Semmle CTF 2: U-Boot challenge
Do you want to challenge your vulnerability hunting skills and quickly learn Semmle CodeQL? Your mission, should you choose to accept it, is to find all variants of an attacker-controlled memcpy overflow. You will do this using CodeQL, our simple yet expressive code query technology. To capture the flag, you’ll need to write a query that finds unsafe calls to memcpy by following this step-by-step guide.
The goal of this challenge is to find the 13 remote-code-execution vulnerabilities that our security researchers found in the U-Boot loader. The vulnerabilities can be triggered when U-Boot is configured to use the network for fetching the next stage boot resources. MITRE has issued the following CVEs for the 13 vulnerabilities: CVE-2019-14192, CVE-2019-14193, CVE-2019-14194, CVE-2019-14195, CVE-2019-14196, CVE-2019-14197, CVE-2019-14198, CVE-2019-14199, CVE-2019-14200, CVE-2019-14201, CVE-2019-14202, CVE-2019-14203, and CVE-2019-14204.
Through these vulnerabilities, an attacker on the same network (or one controlling a malicious NFS server) could gain code execution on the U-Boot-powered device. The first two occurrences of the vulnerability were plain memcpy overflows with an attacker-controlled size taken from the network packet without any validation. The memcpy function copies n bytes from memory area src to memory area dest. This is unsafe when the size parsed from the packet is not validated against the destination buffer, because the attacker then fully controls both the data and the length passed to memcpy.
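The pattern described above can be sketched in a few lines of C. This is an illustration only, not code from U-Boot: the wire format (a 4-byte big-endian length followed by the payload), the HEADER_SIZE constant, and both function names are assumptions made for the example.

```c
#include <arpa/inet.h>  /* ntohl */
#include <stddef.h>
#include <stdint.h>
#include <string.h>     /* memcpy */

/* Hypothetical wire format: 4-byte big-endian length, then the payload. */
#define HEADER_SIZE 4u

/* UNSAFE: trusts the length field taken from the packet. */
void copy_payload_unsafe(uint8_t *dst, const uint8_t *pkt)
{
    uint32_t len;

    memcpy(&len, pkt, sizeof len);        /* read the raw length field */
    len = ntohl(len);                     /* attacker-controlled value */
    memcpy(dst, pkt + HEADER_SIZE, len);  /* may write past the end of dst */
}

/* SAFER: rejects lengths that exceed the destination or the received data.
 * Returns 0 on success and -1 if the packet is rejected. */
int copy_payload_checked(uint8_t *dst, size_t dst_size,
                         const uint8_t *pkt, size_t pkt_size)
{
    uint32_t len;

    if (pkt_size < HEADER_SIZE)
        return -1;                        /* too short to hold a length field */

    memcpy(&len, pkt, sizeof len);
    len = ntohl(len);

    if (len > dst_size || len > pkt_size - HEADER_SIZE)
        return -1;                        /* length is not backed by real data */

    memcpy(dst, pkt + HEADER_SIZE, len);
    return 0;
}
```

In the unsafe version nothing relates len to the size of dst, which is the situation the challenge asks you to detect. The checked version compares the length against both the destination size and the number of bytes actually received before copying.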
U-Boot contains hundreds of calls to memcpy and to byte-order conversion helpers such as ntohl and ntohs, which are typically applied to values read from the network. In this challenge, you will use Semmle CodeQL to find those calls. Of course, many of those calls are safe, so throughout this challenge you will refine your query to reduce the number of false positives.
Upon completion of the challenge, you will have a query that is able to find many of the vulnerabilities that allow for remote execution of arbitrary code on U-Boot-powered devices.
The quickest way to get started with CodeQL is to use LGTM's query console. However, if you prefer, you can also install CodeQL and write your queries offline. Instructions for installing CodeQL are included at the end of this document.
If you get stuck, try searching our documentation and blog posts for help and ideas. Below are a few links to help you get started:
The challenge is split into several steps, each of which contains multiple questions.
Question 0.0: Can you work out what the above query is doing?
Question 0.1: Modify the query to find the definition of memcpy.
Question 0.2: ntohl, ntohll, and ntohs can either be functions or macros (depending on the platform where the code is compiled).
As these snapshots for U-Boot were built on Linux, we know they are going to be macros. Write a query to find the definition of these macros.
Hint: The CodeQL Query Console has an auto-completion feature. Hit Ctrl-Space after the from clause to get the list of objects you can query. Wait a second after typing myObject. to get the list of methods.
Hint: We can use a regular expression to write a query that searches for all three macros at once.
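To see why the macro case needs its own query, consider the following minimal sketch of a macro-based byte-order helper. The names MY_SWAB32, MY_NTOHL, and read_length_field are hypothetical and are not U-Boot's actual definitions. The endianness test relies on the __BYTE_ORDER__ predefined macro provided by GCC and Clang, and falls back to assuming a little-endian host when it is missing.

```c
#include <stdint.h>

/* Hypothetical macro-based byte swap; the name is illustrative only. */
#define MY_SWAB32(x) \
    ((uint32_t)((((uint32_t)(x) & 0x000000ffu) << 24) | \
                (((uint32_t)(x) & 0x0000ff00u) <<  8) | \
                (((uint32_t)(x) & 0x00ff0000u) >>  8) | \
                (((uint32_t)(x) & 0xff000000u) >> 24)))

/* Network byte order is big-endian, so a big-endian host needs no swap.
 * Assumption: hosts that do not report big-endian are little-endian. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define MY_NTOHL(x) ((uint32_t)(x))
#else
#define MY_NTOHL(x) MY_SWAB32(x)
#endif

uint32_t read_length_field(uint32_t raw_be)
{
    /* After preprocessing this is a plain expression, not a function call,
     * so a query that only looks for function calls will not see it. */
    return MY_NTOHL(raw_be);
}
```

Because the preprocessor replaces MY_NTOHL before the compiler sees the code, no function named MY_NTOHL exists in the compiled program. A query therefore has to look at macro definitions and macro invocations rather than at function declarations and calls.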
Question 1.0: Find all the calls to memcpy.
Question 1.1: Find all the calls to ntohl, ntohll, and ntohs.
Question 1.2: Find the expressions that resulted in these macro invocations.
For this step, we want to detect cases where some data read from the network will end up being used by a call to memcpy. To do this, we’ll use the Semmle CodeQL taint tracking library and its predicate hasFlowPath, which tells us when data coming from a source flows to a sink. Use the boilerplate provided below to complete your taint tracking query.
Question 2.0: Write a QL class that finds all the top-level expressions associated with macro invocations of ntohl, ntohll, and ntohs.
Question 2.1: Create the configuration class by defining the source and sink. The source should be calls to ntohl, ntohll, or ntohs. The sink should be the size argument of an unsafe call to memcpy.
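As an illustration of what "flows from a source to a sink" means in the C code being analyzed, here is a small sketch. The struct reply layout and the store_name function are hypothetical and do not come from U-Boot. The comments mark where a taint tracking query would see the source, the intermediate step, and the sink.

```c
#include <arpa/inet.h>  /* ntohl */
#include <stddef.h>
#include <stdint.h>
#include <string.h>     /* memcpy */

/* Hypothetical reply layout; not an actual U-Boot structure. */
struct reply {
    uint32_t name_len_be;  /* big-endian length, chosen by the sender */
    uint8_t  name[64];
};

/* Copies the name into dst as a NUL-terminated string.
 * Returns 0 on success and -1 if the length is rejected. */
int store_name(char *dst, size_t dst_size, const struct reply *r)
{
    uint32_t wire_len = ntohl(r->name_len_be);  /* SOURCE: value from the wire */
    size_t n = wire_len;                        /* taint flows through the assignment */

    /* Guard: if this check were removed, the flow into memcpy below
     * would be the kind of path the query is meant to report. */
    if (n > sizeof r->name || n >= dst_size)
        return -1;

    memcpy(dst, r->name, n);                    /* SINK: the size argument n */
    dst[n] = '\0';
    return 0;
}
```

The value never reaches memcpy directly: it passes through a local variable first, and in real code it may also pass through arithmetic, struct fields, or function parameters. Following such indirect paths is what the taint tracking library is for.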
Hint: The source should be an instance of the class you wrote in part 2.0.
Hint: The sink should be the size argument of calls to memcpy.
Question 3.0: There are 13 known vulnerabilities in U-Boot.
The query you completed above probably found 9 of them. See if you can refine your query to find one or more additional vulnerabilities.
Question 3.1: Generalize your query to cover other sources of untrusted input (not only networking), such as data read from an ext4 file system.
If you find yourself stuck writing CodeQL or on any part of the CTF and would like some help, there are a few different things you can do:
We hope you enjoyed this challenge! If you are interested in continuing to use CodeQL for security research, then we recommend installing CodeQL on your own computer. This will enable you to run queries offline. We have also provided these offline instructions for posterity, because the query results on LGTM will change over time as the source code evolves. The instructions below, however, use a snapshot corresponding to revision d0d07ba, which is the revision for which we designed this challenge. To run CodeQL queries offline, follow these steps:
You can download other snapshots for offline use from LGTM. For example, you can download a snapshot for the latest revision of U-Boot here. Every project on LGTM has a download link for downloading the latest snapshot.