Analyzing large open source projects

LGTM.com makes QL snapshots of many open source projects available to download for free. However, resource limits are in place to ensure that resources are shared fairly across all projects on LGTM.com. Some particularly large open source projects have build times that exceed these limits, so they are either only partially analyzed on LGTM.com or not analyzed at all.

(Note: These resource limits do not apply to LGTM Enterprise, our licensed code analysis platform. Typical customer deployments build much faster on infrastructure tailored and scaled according to their needs.)

Additionally, LGTM.com only provides QL snapshots for the latest analyzed version of a project's source code, whereas you may often want to run queries on older versions.

While we work to make these big projects available on LGTM.com, we have decided to regularly provide snapshots of some of them to the community. We will make an effort to update this page regularly with new revisions and new, interesting projects.

QL snapshots of large open source projects

Apple Darwin XNU

Gimp

Linux

Node.js

WebKit

v8

Yandex ClickHouse

Samba

Qemu

If you want to explore one of these projects using QL, download QL for Eclipse and import the snapshot of that project. Instructions on how to use QL for Eclipse and how to import a snapshot can be found in the Semmle help.

IMPORTANT NOTICES: large snapshots, performance & database version

Compared to the snapshots that you can download from LGTM.com, these snapshots differ in a few ways that may make running queries on them slower than you would expect. The notices in this section describe those differences and what you can do to mitigate them.

Notice 1: Cache

Snapshots downloaded from LGTM.com come with a pre-warmed cache. This means that queries that you run will take advantage of intermediate data that has already been computed to produce the results for previous query runs (that is, the standard QL queries). This intermediate data includes information such as all of the data flow paths.

In comparison, the snapshots on this page do not come with a pre-warmed cache. This is because the cache is tied to the current QL schema (which usually changes with each new version of QL), and is cleared when you run QL > Check for Database Upgrades in QL for Eclipse. As you will likely need to run database upgrades before first using these snapshots, including a cache doesn't make sense: it would be cleared immediately and would only add unnecessary bloat to the downloads.

As a result, the first time you run queries that use certain libraries (such as data flow), QL for Eclipse may spend a long time evaluating data for them before caching it. This means early query runs will take significantly longer to evaluate than you might expect, especially given the large size of these snapshots. Evaluation should speed up as you run more queries and the cache fills.

Notice 2: RAM & CPU

By default, QL for Eclipse limits itself to 4 GB of RAM and 1 CPU thread. This should be sufficient for most snapshots downloaded from LGTM.com, but it will not be sufficient for these large snapshots.

To configure RAM and CPU usage, open your Eclipse preferences and go to QL, then increase both Memory for running queries and Number of threads for running queries. The higher the values, the faster QL queries should evaluate, but keep them within the hardware limits of your computer.
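As a rough illustration of choosing values that stay within your hardware limits, here is a minimal Python sketch. The function name `suggest_ql_settings`, the 2 GB reserve for the operating system and Eclipse itself, and the rule of leaving one CPU thread free are all assumptions made for this example; they are not Semmle recommendations. Only the 4 GB default comes from the text above.

```python
import os


def suggest_ql_settings(total_ram_mb, cpu_count, reserve_mb=2048, floor_mb=4096):
    """Suggest (memory_mb, threads) for the two QL preferences.

    Hypothetical heuristic, not an official recommendation:
    - leave `reserve_mb` of RAM for the OS and Eclipse itself,
    - never go below the 4 GB default unless the machine has less than that,
    - leave one CPU thread free for everything else.
    """
    memory_mb = max(floor_mb, total_ram_mb - reserve_mb)
    # Never suggest more memory than the machine physically has.
    memory_mb = min(memory_mb, total_ram_mb)
    threads = max(1, cpu_count - 1)
    return memory_mb, threads


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to a single thread.
    cpus = os.cpu_count() or 1
    # Replace 16384 with the amount of RAM (in MB) installed in your machine.
    memory_mb, threads = suggest_ql_settings(16384, cpus)
    print(f"Memory for running queries: {memory_mb} MB")
    print(f"Number of threads for running queries: {threads}")
```

You would then enter the suggested numbers by hand in the Eclipse preferences described above; the sketch only does the arithmetic and does not change any QL for Eclipse settings.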

Notice 3: Database Version

You will also likely have to do a database upgrade on most of these snapshots, because we maintain them manually and they are not automatically kept up to date with the latest version of QL. To do this, right-click the snapshot, then click QL > Check for Database Upgrades.

Learning QL, suggestions, or getting help

If you're new to QL and would like to get started learning, then consider trying our CTF Challenges.

And if you need any help getting started, or have suggestions for other large projects we should add to this list, feel free to post on our forums or email dev-advocacy@semmle.com.

We hope you all will join us in making Open Source Software more secure!
