One of the most effective overall application security controls is input validation, which checks user input to determine whether it is valid data. For example, an input field for a person's first name might reject the string "';DROP TABLE users" as invalid because it doesn't meet the criteria defined for a proper first name.
Input validation should be used as a first (and not the only) line of defense and is particularly effective against various injection attacks. SQL Injection, Code Injection, Command Injection and Cross-Site Scripting can be effectively mitigated by using strong input validation. While input validation can be very effective at mitigating vulnerabilities, it should not be relied on by itself, but should be used as a part of a defense-in-depth strategy.
Best practices for input validation include:
- Use positive (allowlist) input validation
Positive input validation defines patterns of input that are considered valid by the application and may be processed, whereas a blocklisting approach defines a list of invalid data inputs and rejects those specific ones. Allowlisting is the preferred approach because it is typically easier to define or enumerate all possible valid inputs than it is to define all the possible invalid inputs.
- Implement input validation on the server-side
Input validation should be performed on the server so that an attacker cannot bypass it. Client-side input validation may be used for performance reasons, but it can be easily disabled or modified by an attacker because it is executed on the attacker's computer.
- Centralize all input validation
Keeping input validation functions in one place simplifies maintenance, ensures they are used consistently, and helps prevent duplicate functions, that is, functions that perform similar tasks and were written by different programmers who were unaware that a similar function already existed. Centralizing input validation also provides insight into what validation functions are available to the application.
- Validate all input for length, characters, format, and range of numeric data
If invalid input can bypass the validation process, it might allow an attacker to carry out SQL Injection, XSS and other attacks. Be sure to identify all sources of input and validate each type of input data before it is processed. Enumerating all possible sources of input can be challenging for a large application. In such cases, consider the application as a set of components and review a manageable portion of the application at a time to make sure that each component validates input from all sources. In practice this might mean reviewing each individual page in a web application and making sure that each piece of user-supplied data is validated. Spreadsheets are useful for keeping track of sources of input, types of data, and corresponding validator functions.
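The same inventory of input sources and validators can also be kept in code, which ties in with centralizing validation. The following is a minimal sketch in Python; the field names (`first_name`, `quantity`), the limits, and the function names are hypothetical and chosen only for illustration.

```python
import re
from typing import Callable, Dict

# Hypothetical validators; the limits and patterns are assumptions for this sketch.
def is_first_name(value: str) -> bool:
    return 1 <= len(value) <= 50 and re.fullmatch(r"[A-Za-z]+", value) is not None

def is_quantity(value: str) -> bool:
    return re.fullmatch(r"[0-9]{1,4}", value) is not None

# Single registry: every known input field maps to exactly one validator.
VALIDATORS: Dict[str, Callable[[str], bool]] = {
    "first_name": is_first_name,
    "quantity": is_quantity,
}

def validate_request(params: Dict[str, str]) -> Dict[str, str]:
    """Return params unchanged if every field is known and valid; raise otherwise."""
    for name, value in params.items():
        validator = VALIDATORS.get(name)
        if validator is None:
            # A field that is missing from the inventory is rejected, not ignored.
            raise ValueError("unexpected input field: " + name)
        if not validator(value):
            raise ValueError("invalid value for field: " + name)
    return params
```

Rejecting unknown fields means a new source of input cannot reach the application until someone has added a validator for it.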
The recommended approach for validating input is to validate length, characters, format and then range of numeric data, in that order. Checking the length first is one of the simplest and most relevant steps and helps prevent Regular Expression Denial of Service attacks (if regular expressions are used to perform other validation checks). After checking the length, validate the characters: numeric data should only have numbers in it, names should only have letters in them, etc. Usernames can contain letters and numbers, but special characters such as apostrophes, quotation marks and semicolons should be disallowed.
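The first two steps, length and then characters, might look like the following sketch for a username. The 32-character limit and the name `is_valid_username` are assumptions made for this example, not requirements.

```python
import re

MAX_USERNAME_LENGTH = 32  # assumed limit for this sketch

# Allowlist: only ASCII letters and digits are accepted.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9]+")

def is_valid_username(value: str) -> bool:
    # Step 1: length. This is cheap and bounds the work any later regex has to do.
    if not 1 <= len(value) <= MAX_USERNAME_LENGTH:
        return False
    # Step 2: characters. fullmatch requires the whole string to match the allowlist.
    return USERNAME_PATTERN.fullmatch(value) is not None
```

Because the pattern describes what is allowed, apostrophes, quotation marks and semicolons are rejected without having to be listed.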
If the data being validated has a certain format, that format should also be validated by using regular expressions. For example, U.S. zip codes, social security numbers and credit card numbers are not really numeric data, but they have specific formats that can be checked using regular expressions. One common type of data that is particularly difficult to check using regular expressions is e-mail addresses. If e-mail addresses need to be validated, send a verification e-mail to the address with a unique and difficult-to-predict token. Once the user supplies this unique token, the e-mail address can be considered valid. This is an easy-to-implement technique that is used regularly in Internet forums and other Web sites that require e-mail verification when a new account is being created.
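A minimal sketch of the token step is shown below. It assumes an in-memory dictionary as a stand-in for real storage and leaves out sending the e-mail and expiring old tokens; `issue_token` and `confirm_email` are hypothetical names.

```python
import hmac
import secrets
from typing import Dict

# Stand-in for a database table of pending verifications (assumption for this sketch).
_pending: Dict[str, str] = {}

def issue_token(email: str) -> str:
    # secrets draws from the operating system's cryptographically secure random source.
    token = secrets.token_urlsafe(32)
    _pending[email] = token
    return token  # in a real application this value would be e-mailed to the address

def confirm_email(email: str, supplied_token: str) -> bool:
    expected = _pending.get(email)
    if expected is None:
        return False
    # Constant-time comparison on bytes avoids leaking how much of the token matched.
    if not hmac.compare_digest(expected.encode(), supplied_token.encode()):
        return False
    del _pending[email]  # each token can be used only once
    return True
```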
Lastly, for purely numeric data, the range of the data should be checked against minimum and maximum values to make sure that it is within feasible limits. For example, an order total cannot be negative, so the lower limit is 0, and the upper limit should be lower than the maximum value that can be stored by the data type used to represent the number in memory.
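For the order total example, the full sequence of length, characters and format, then range could be sketched as follows. The upper limit of 1,000,000, the two-decimal format, and the name `parse_order_total` are assumptions for illustration; a real limit would come from the business rules and the storage type.

```python
import re
from decimal import Decimal

MAX_RAW_LENGTH = 12               # assumed maximum length of the raw string
MIN_TOTAL = Decimal("0")          # an order total cannot be negative
MAX_TOTAL = Decimal("1000000")    # assumed business limit, well below storage limits

# Optional sign, ASCII digits, and at most two decimal places.
TOTAL_PATTERN = re.compile(r"-?[0-9]+(?:\.[0-9]{1,2})?")

def parse_order_total(raw: str) -> Decimal:
    # Length first, then characters and format, then numeric range.
    if not 1 <= len(raw) <= MAX_RAW_LENGTH:
        raise ValueError("order total has an invalid length")
    if TOTAL_PATTERN.fullmatch(raw) is None:
        raise ValueError("order total is not a well-formed number")
    total = Decimal(raw)
    if not MIN_TOTAL <= total <= MAX_TOTAL:
        raise ValueError("order total is out of range")
    return total
```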
Input validation is an effective first line of defense, but it should not be relied on exclusively. Often, data that appears valid can be used to exploit an underlying vulnerability. Additionally, be sure to use programming best practices to protect against common vulnerabilities, which include:
- SQL Injection - use prepared statements or use properly parameterized stored procedures.
- Code Injection - remove code that might interpret user-supplied data as code.
- Command Injection - use parameterized APIs for executing external commands.
- Cross-Site Scripting - encode user-supplied data before displaying it on web pages.
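Three of these practices can be sketched with Python's standard library. The `users` table, the function names, and the child command are hypothetical; passing an argument list to `subprocess.run` without a shell is used here as the closest standard-library analogue of a parameterized API for external commands.

```python
import html
import sqlite3
import subprocess
import sys
from typing import List, Tuple

def find_user(conn: sqlite3.Connection, name: str) -> List[Tuple[int, str]]:
    # SQL Injection: the ? placeholder keeps the value separate from the SQL text.
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cursor.fetchall()

def echo_argument(arg: str) -> str:
    # Command Injection: arguments are passed as a list and no shell is involved,
    # so shell metacharacters in arg are never interpreted.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", arg],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.rstrip("\n")

def render_greeting(name: str) -> str:
    # Cross-Site Scripting: encode user-supplied data before placing it in HTML.
    return "<p>Hello, " + html.escape(name) + "!</p>"
```

In each case the user-supplied value is handed over as data through a separate channel, so it is never parsed as SQL, shell syntax, or markup.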
While I will not go into detail here, additional layers of defense should be considered for error handling, strong cryptographic storage, communication security, and logging. Don't forget qualified incident response personnel, either.