I’m a software developer at heart, but my real passion for secure design comes from my experience conducting penetration tests and code reviews on our customers’ applications. I routinely find dozens of vulnerabilities that are the result of bad design, and these are often the most difficult to fix. If a vulnerability is the result of a design problem, a painful redesign may be the only way to fix it. Lack of data validation at trust boundaries is often a major culprit. We recommend a centralized input and data validation architecture, so that trust boundaries are clearly validated and all data validation is done in the same set of routines. This approach is easier to implement correctly and much easier to maintain than a more scattershot approach. It also makes it easy for static analysis tools to check your validation approach: they can track tainted input to ensure it passes through a validation routine as it makes its way from source to sink.

Here is a summary of some best practices:

  1. Identify Trust Boundaries.
    Ensure that entry points between trust boundaries validate all input data explicitly. Make no assumptions about the data. The only exception is inside a routine that you know can only be called by other routines within the same trust boundary.
  2. Constrain, reject, and sanitize input.
    Examine the validation functions to make sure they constrain known valid input first, then reject known bad input, and sanitize the resulting data. Constrain what you allow from the beginning. It is much easier to validate data against known valid types, patterns, and ranges (using an allowlist) than it is to validate data by looking for known bad characters (using a blocklist). The set of valid data is generally far smaller than the set of potentially malicious input. However, for added defense you might want to reject known bad input and then sanitize what remains. Constrain input for type, length, format, and range. Use regular expressions to help constrain text input. Use strong data typing where possible.
  3. Centralize validation.
    Develop a dedicated class or library for input and data validation functions in all but the smallest of applications. A good library includes routines for all of the different types of validation you need to apply, and these can be used in combination if necessary. Trace data from entry point to exit point to know how it should be validated. For instance, you will treat input that is used in a SQL query differently than input that will be echoed back to the user in an HTML page.
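
As a rough illustration of the practices above, here is a minimal sketch of what a small centralized validation module might look like in Python. Everything named here (`ValidationError`, `validate_username`, `validate_age`, `for_html`, `find_user`, and the `users` table) is hypothetical and chosen only for the example; the allowed patterns and ranges are assumptions you would replace with your own rules.

```python
import html
import re
import sqlite3


class ValidationError(ValueError):
    """Raised when input fails a validation rule (hypothetical name)."""


# Allowlist: ASCII letters, digits, and underscore, 3 to 20 characters.
_USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")
# Allowlist: one to three ASCII digits (avoids \d, which also matches
# non-ASCII digits in Python str patterns).
_AGE_RE = re.compile(r"[0-9]{1,3}")


def validate_username(raw):
    """Constrain type, length, and format using an allowlist pattern."""
    if not isinstance(raw, str):
        raise ValidationError("username must be a string")
    # fullmatch requires the whole string to match the pattern.
    if not _USERNAME_RE.fullmatch(raw):
        raise ValidationError("username has invalid characters or length")
    return raw


def validate_age(raw, low=0, high=150):
    """Constrain format first, then convert to a strong type and check range."""
    if not isinstance(raw, str) or not _AGE_RE.fullmatch(raw):
        raise ValidationError("age must be a short string of digits")
    value = int(raw)
    if not low <= value <= high:
        raise ValidationError("age out of range")
    return value


def for_html(text):
    """Encode data that will be echoed back in an HTML page."""
    return html.escape(text, quote=True)


def find_user(conn, username):
    """Use data in a SQL query through a bound parameter, never concatenation."""
    cur = conn.execute(
        "SELECT name, age FROM users WHERE name = ?", (username,)
    )
    return cur.fetchone()
```

An entry point on a trust boundary would call `validate_username` or `validate_age` before doing anything else with the data, and the code that later writes the data out would pick the treatment that matches the destination: `for_html` for a page, a bound parameter for a query. Keeping these routines in one module is what lets a reviewer or a static analysis tool confirm that tainted input passes through them.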

This approach requires great care, however. A bug in a centralized data validation routine will manifest itself in hundreds of ways throughout your application. The fix may be easier, since you can make it in one place, but the consequences of failure are greater as well.