Variant analysis is the process of using a known vulnerability as a seed to find similar problems in your code. Security engineers typically perform variant analysis to identify similar possible vulnerabilities and to ensure these threats are properly fixed across multiple codebases.
Variant analysis usually starts with a known problem. This "seed" vulnerability is often found through pentesting or bug bounty programs, or with techniques such as fuzzing. Once the root cause of a critical bug is identified, security teams often perform manual security audits to find other occurrences of the same problem across the codebase.
With Semmle, we can scale our variant finding over time and across multiple codebases.
Variant analysis does not always follow the same script, yet there are several common techniques used to identify vulnerabilities.
Control flow analysis (CFA) allows you to inspect how the different parts of the source code are executed, and in which order. It is useful for finding vulnerable code paths that are only executed under unlikely circumstances that a developer has not anticipated. Once you have found such a path, you can try to trigger it by providing the application with a malicious payload or environment.
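As a rough illustration of control flow analysis, the sketch below enumerates every acyclic path to a block of interest in a small hand-written control flow graph, together with the branch conditions that must hold along each path. The graph, its block names, and its conditions are all hypothetical. A real analysis would derive the graph from source code rather than from a dictionary.

```python
# Hypothetical control flow graph: block -> list of (branch condition, successor).
CFG = {
    "entry": [("len(buf) > 0", "parse"), ("len(buf) == 0", "exit")],
    "parse": [("header ok", "copy"), ("header bad", "error")],
    "error": [("retry", "copy"), ("no retry", "exit")],
    "copy": [("always", "exit")],
    "exit": [],
}


def paths_to(cfg, start, target):
    """Return the branch conditions of every acyclic path from start to target."""
    results = []

    def walk(block, conditions, visited):
        if block == target:
            results.append(conditions)
            return
        for condition, successor in cfg[block]:
            if successor not in visited:  # skip loops so the search terminates
                walk(successor, conditions + [condition], visited | {successor})

    walk(start, [], {start})
    return results


for conditions in paths_to(CFG, "entry", "copy"):
    print(" -> ".join(conditions))
```

In this toy graph the second path reaches "copy" through the error handler, which is the kind of rarely exercised route a reviewer would want to inspect by hand.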
Data flow analysis (DFA) is the process of tracking data from a source, where it enters an application, to a sink, where the data is used in a potentially harmful way if it's not sanitized along the way.
Taint tracking is data flow analysis applied to untrusted – or tainted – data that is under partial or full control of a user. Tainted data is tracked from the source through method calls and variable assignments – including containers and class members – to a sink. The longer a path is from source to sink, the more difficult it is for a developer to trust the data.
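The source-to-sink idea can be sketched with a toy taint tracker built on Python's standard ast module. This is not how CodeQL works. It is a minimal illustration that only handles straight-line, top-level assignments, and the SOURCES, SINKS, and SANITIZERS sets are assumptions chosen for the example.

```python
import ast

# Assumed for illustration: which calls produce, consume, or clean untrusted data.
SOURCES = {"input"}
SINKS = {"os.system", "eval"}
SANITIZERS = {"shlex.quote"}

SAMPLE = "\n".join([
    "import os, shlex",
    "name = input()",
    'cmd = "echo " + name',
    "os.system(cmd)",
    "safe = shlex.quote(name)",
    'os.system("echo " + safe)',
])


def call_name(call):
    """Return the dotted name of the called function, e.g. 'os.system'."""
    return ast.unparse(call.func)


def is_tainted(expr, tainted):
    """True if expr reads a tainted variable or calls a source without sanitizing."""
    if isinstance(expr, ast.Call):
        if call_name(expr) in SANITIZERS:
            return False
        if call_name(expr) in SOURCES:
            return True
    if isinstance(expr, ast.Name) and expr.id in tainted:
        return True
    return any(is_tainted(child, tainted) for child in ast.iter_child_nodes(expr))


def find_tainted_sinks(source):
    """Return (line, sink name) for each sink call that receives tainted data."""
    tainted, findings = set(), []
    for stmt in ast.parse(source).body:  # top-level statements, in order
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and call_name(node) in SINKS:
                if any(is_tainted(arg, tainted) for arg in node.args):
                    findings.append((node.lineno, call_name(node)))
        if isinstance(stmt, ast.Assign):
            names = [t.id for t in stmt.targets if isinstance(t, ast.Name)]
            if is_tainted(stmt.value, tainted):
                tainted.update(names)
            else:
                tainted.difference_update(names)  # overwritten with clean data
    return findings


print(find_tainted_sinks(SAMPLE))
```

On the sample program, only the first os.system call is reported, because the second one receives data that has passed through the assumed sanitizer.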
Range analysis (or bounds analysis) is used to investigate which possible values a variable can hold, and which values it will never hold. This is useful information in various lines of investigation. For example, range analysis can identify areas in the source code where a programmer assumed an incorrect upper bound on a string's length, subsequently leading to a buffer overflow. Range analysis can also be used to identify areas of dead code.
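One simple way to picture range analysis is interval arithmetic: each variable carries a lower and an upper bound, and operations combine those bounds. The sketch below is a minimal illustration under assumed numbers (a 256-byte buffer, a 16-byte header, and a length field between 0 and 255). The Interval class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Inclusive range of values a variable may hold."""
    lo: int
    hi: int

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def capped(self, upper):
        """Model a guard such as `if x <= upper` on the taken branch."""
        return Interval(self.lo, min(self.hi, upper))

    def may_exceed(self, limit):
        """True if some value in the range is greater than limit."""
        return self.hi > limit

    def never_below(self, bound):
        """True if a branch guarded by `x < bound` would be dead code."""
        return self.lo >= bound


BUFFER_SIZE = 256              # assumed size of the destination buffer
header = Interval(16, 16)      # assumed fixed-size header
name_len = Interval(0, 255)    # assumed one-byte length field

total = name_len + header
print(total, total.may_exceed(BUFFER_SIZE))       # unchecked length

checked = name_len.capped(240) + header
print(checked, checked.may_exceed(BUFFER_SIZE))   # length after a guard

print(total.never_below(0))                       # is `if total < 0` dead?
```

The unchecked total can reach 271, which is larger than the buffer, while the guarded total stays within it. The last check shows how the same bounds reveal that a `total < 0` branch can never run.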
Semantic code search allows you to quickly interrogate a codebase and identify areas of interest for further investigation. This is valuable for finding methods with a particular signature, or variables that may contain credentials.
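Unlike a text search, a semantic search works on the parsed structure of the code. As a small sketch, the example below uses Python's standard ast module to list parameters and assigned variables whose names suggest credentials. The sample source and the CREDENTIAL_HINTS list are assumptions made for illustration.

```python
import ast

# Assumed name fragments that hint at credentials.
CREDENTIAL_HINTS = ("password", "secret", "api_key", "token")

SAMPLE = "\n".join([
    "def connect(host, password):",
    '    api_key = "abc123"',
    '    note = "the password is elsewhere"',
    "    return host",
])


def find_credential_names(source):
    """Return sorted (line, name) pairs for parameters and assigned variables
    whose names contain a credential hint."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.arg):
            name = node.arg                      # function parameter
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            name = node.id                       # variable being assigned
        else:
            continue
        if any(hint in name.lower() for hint in CREDENTIAL_HINTS):
            hits.add((node.lineno, name))
    return sorted(hits)


print(find_credential_names(SAMPLE))
```

A plain grep for "password" would also match the string literal on the third line of the sample. The structural search reports only the parameter and the variable.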
DevSecCon London 2018: Variant Analysis – A critical step in handling vulnerabilities
Traditional variant analysis relies on manual code inspection using standard tools like grep, AWK, and your IDE. The techniques mentioned above can be highly iterative, which makes them tedious and time-consuming when done by hand. They also require security engineers to have intimate knowledge of their codebase as well as a good understanding of various threat models. Given the velocity of most development teams, it's nearly impossible for any security team to keep up and review every commit.
The key to scaling security expertise is never doing the same research twice. While hunting for security vulnerabilities the first time can be fun, repeatedly looking for the same issue is tedious.
Semmle CodeQL provides extensive libraries to perform variant analysis using the common techniques mentioned above. By writing a CodeQL query to codify the diagnosis of a vulnerability, you can easily repeat the analysis across multiple codebases. Semmle LGTM can even analyze new code changes to prevent mistakes from ever reaching production.
In many cases, there's no need to start from scratch. With over 1,600 open source queries contributed by top researchers from the Semmle Security Research Team and our growing customer community, it's easy to refine an existing query to find a specific issue in your code. Sharing your own queries not only helps secure major open source projects, it helps us secure all software, together.
Variant analysis enhances your existing security research process. Check out our blog for an introduction to variant analysis with CodeQL and LGTM.