Analyzing large open source projects
LGTM.com makes CodeQL snapshots of many open source projects available to download for free. However, resource limits are in place to ensure that resources are shared fairly across all projects on LGTM.com. Some particularly large open source projects have build times that exceed these limits, so on LGTM.com they are either only partially analyzed or not analyzed at all.
(Note: These resource limits do not apply to LGTM Enterprise, our licensed code analysis platform. Typical customer deployments build much faster on infrastructure tailored and scaled according to their needs.)
Additionally, LGTM.com only provides CodeQL snapshots of the most recently analyzed version of a project's source code, whereas you may often want to run queries on older versions.
While we work to make these big projects available on LGTM.com, we have decided to regularly provide snapshots of some of them to the community. We will try to update this page regularly with new revisions and new projects of interest.
Apple Darwin XNU
Gimp
Linux
Node.js
WebKit
v8
Yandex ClickHouse
Samba
Qemu
If you want to explore one of these projects using CodeQL, download CodeQL for Eclipse and import the project's snapshot. Instructions on using CodeQL for Eclipse and importing a snapshot can be found in the Semmle help.
Compared with the snapshots that you can download from LGTM.com, these snapshots differ in a few ways that may make queries run more slowly than you would expect. The notes below explain each difference and what you can do to mitigate it.
Notice 1: Cache
Snapshots downloaded from LGTM.com come with a pre-warmed cache. This means that your queries can reuse intermediate data that was already computed when the standard CodeQL queries were run on the snapshot. This intermediate data includes information such as all of the data flow paths.
In contrast, the snapshots on this page do not come with a pre-warmed cache. The cache is tied to the current CodeQL database schema, which usually changes with each new version of CodeQL, and it is cleared when you run QL > Check for Database Upgrades in CodeQL for Eclipse. Because you will probably need to upgrade these databases before first using them, a bundled cache would be cleared immediately and would only bloat the downloads.
As a result, the first time you run queries that use certain libraries (such as data flow), CodeQL for Eclipse may spend a long time computing and caching data for them. Early query runs will therefore take significantly longer than you might expect, especially given the size of these snapshots. Later queries that reuse the cached data should run faster.
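To make the caching behavior described above concrete, here is a toy model in Python. It is a minimal sketch under our own assumptions, not how CodeQL actually stores its cache: the class `SchemaVersionedCache`, its methods, and the `"dataflow"` key are hypothetical. It illustrates the two points made above: an expensive result is computed once and then reused, and changing the schema version (as a database upgrade does) discards everything cached so far.

```python
class SchemaVersionedCache:
    """Toy model of a query-evaluation cache tied to a database schema version.

    Hypothetical illustration only; this is not CodeQL's implementation.
    """

    def __init__(self, schema_version):
        self.schema_version = schema_version
        self._entries = {}
        self.computations = 0  # how many times a result was actually computed

    def get(self, key, compute):
        # Cache hit: reuse a result computed by an earlier query run.
        if key in self._entries:
            return self._entries[key]
        # Cache miss: compute once (slow on a cold cache) and remember it.
        self.computations += 1
        value = compute()
        self._entries[key] = value
        return value

    def upgrade(self, new_schema_version):
        # A schema change invalidates everything computed against the old schema.
        if new_schema_version != self.schema_version:
            self.schema_version = new_schema_version
            self._entries.clear()


cache = SchemaVersionedCache(schema_version=1)
cache.upgrade(2)  # upgrading first costs nothing: the cache is still empty
cache.get("dataflow", lambda: "expensive data flow result")  # computed
cache.get("dataflow", lambda: "expensive data flow result")  # reused
print(cache.computations)  # 1
```

This is also why shipping a pre-warmed cache with these snapshots would not help: the upgrade step would clear it before any query could use it.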
Notice 2: RAM & CPU
By default, CodeQL for Eclipse limits itself to 4 GB of RAM and 1 CPU thread. This is sufficient in most cases (that is, for most snapshots downloaded from LGTM.com), but not for these large snapshots.
To configure RAM and CPU usage, open your Eclipse preferences, go to QL, and increase both Memory for running queries and Number of threads for running queries. Higher values should make CodeQL queries evaluate faster, but keep them within the hardware limits of your computer.
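If you are unsure which values to choose, the hypothetical helper below sketches one rule of thumb in Python: give queries all RAM and CPU threads except a reserve for Eclipse and the operating system, and never go below the defaults of 4 GB and 1 thread. The reserve sizes, the function names, and the assumption that the memory preference is entered in megabytes are ours, not official guidance; RAM detection via `os.sysconf` works only on POSIX systems such as Linux and macOS.

```python
import os

# Defaults used by CodeQL for Eclipse (4 GB of RAM, 1 CPU thread).
DEFAULT_MEMORY_MB = 4 * 1024
DEFAULT_THREADS = 1


def suggest_query_settings(total_ram_mb, cpu_count, reserve_ram_mb=4096, reserve_cpus=1):
    """Suggest values for the QL preferences, leaving headroom for Eclipse and the OS.

    The reserve amounts are a hypothetical rule of thumb. Suggestions never drop
    below the defaults.
    """
    memory_mb = max(DEFAULT_MEMORY_MB, total_ram_mb - reserve_ram_mb)
    threads = max(DEFAULT_THREADS, cpu_count - reserve_cpus)
    return memory_mb, threads


def detect_hardware():
    """Return (total RAM in MB or None if unknown, CPU count)."""
    cpus = os.cpu_count() or 1
    try:
        # POSIX only; os.sysconf does not exist on Windows.
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None, cpus
    return ram_bytes // (1024 * 1024), cpus


if __name__ == "__main__":
    ram_mb, cpus = detect_hardware()
    if ram_mb is None:
        print("Could not detect total RAM; pass it to suggest_query_settings() manually.")
    else:
        memory_mb, threads = suggest_query_settings(ram_mb, cpus)
        print(f"Memory for running queries: {memory_mb} MB")
        print(f"Number of threads for running queries: {threads}")
```

For example, on a machine with 32 GB of RAM and 8 CPU threads, `suggest_query_settings(32768, 8)` returns `(28672, 7)`.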
Notice 3: Database Version
You will probably also need to upgrade the database of most of these snapshots: we maintain them manually, so they won't automatically match the latest version of CodeQL. To upgrade, right-click the snapshot and then click QL > Check for Database Upgrades.
If you're new to CodeQL and would like to get started learning, then consider trying our CTF Challenges.
And if you need any help getting started, or have suggestions for other large projects we should add to this list, feel free to post on our forums or email dev-advocacy@semmle.com.
We hope you all will join us in making Open Source Software more secure!