Developing a custom analysis to find Heartbleed-like security vulnerabilities

The Heartbleed bug affecting OpenSSL is a classic example of a bounds-checking vulnerability where data is read from beyond a buffer’s boundary due to missing input validation.

In this particular case, there are two vulnerable buffer reads. One of them is in file t1_lib.c and looks like this:

memcpy(bp, pl, payload);

Here, pl is a pointer variable that points into the data field of an ssl3_record_st struct, and payload is an integer variable containing a value read from that same buffer. The length of the data buffer is recorded in the length field of the same struct, but payload is not checked against this field, allowing an attacker to read beyond the boundary of the buffer by providing an artificially large value for payload.

In security terms, payload is tainted since it is read from untrusted input and not verified. There are many general-purpose security analyses that try to find cases such as this one where tainted values are used in sensitive positions such as a memcpy, but in the case of Heartbleed the dataflow involved in determining that payload ultimately stems from user input is quite difficult to track, and we are not aware of any off-the-shelf security analysis that would find the Heartbleed bug. (In response to the discovery of Heartbleed, Coverity extended their taint analysis to consider byte swapping operations as taint sources. Since reading payload from the buffer involves such an operation, their analysis can now find the vulnerability, which it previously was not able to do.)

Where general-purpose analyses fail, a domain-specific analysis is called for. In this example, we will show how to implement an analysis for spotting the pattern leading to the Heartbleed vulnerability in just a few lines of Semmle QL, and then generalize it to identify similar problematic patterns. While such a domain-specific analysis is generally of little use in identifying new, unknown vulnerabilities, it can be extremely useful in checking for additional instances of a known vulnerability.

Starting small

To begin with, let us write a QL query that identifies the concrete pattern explained above. That is, the query should find calls to memcpy where:

  1. The source of the copy operation (that is, the second argument to memcpy) is a pointer into the data buffer of an ssl3_record_st struct
  2. The length of the copy operation (that is, the third argument to memcpy) has not been compared to the length field of any ssl3_record_st struct

Approaching this task step-by-step, we start by writing a query that finds calls to memcpy:

import default

from FunctionCall fc
where fc.getTarget().getName().matches("%memcpy%")
select fc

Note that our definition of what constitutes a call to memcpy is quite generous: any call to a function whose name contains the string memcpy will do. This is because in some environments memcpy may be a macro resolving to an underlying function with a more complicated name.

On an unpatched version of OpenSSL, this query finds 594 results, among them the two calls that we are interested in.

Next, we want to improve our query to return only those calls where the second argument points into the data field of an ssl3_record_st struct. A simple idea would be to try the following:

import default
 
from FunctionCall fc, Struct ssl3_record_st, Field data
where fc.getTarget().getName().matches("%memcpy%") and
      ssl3_record_st.hasName("ssl3_record_st") and
      data = ssl3_record_st.getAField() and data.hasName("data") and
      fc.getArgument(1) = data.getAnAccess()
select fc

We introduce a new variable ssl3_record_st representing the struct we are interested in and a field data representing its field of the same name. The additional conditions in the where clause simply bind these variables and make sure that the second argument of fc is a variable access referencing the data field.

Somewhat disappointingly, this query finds no results. But remember that in the Heartbleed vulnerability, the call to memcpy didn’t reference the data field directly: rather, it copied from a variable pl, which pointed into the data buffer.

Here are the relevant statements initializing and assigning to pl:

int
tls1_process_heartbeat(SSL *s)
    {
    unsigned char *p = &s->s3->rrec.data[0], *pl;
    unsigned short hbtype;
    unsigned int payload;
 
    /* ... */
 
    hbtype = *p++;
    n2s(p, payload);
    pl = p;
 
    /* ... */
 
    if (hbtype == TLS1_HB_REQUEST)
            {
            /* ... */
            memcpy(bp, pl, payload);
            /* ... */
            }
 
 
    /* ... */
 
    }

Note that at first, a different variable p is set up to point to the first element of data; then it is incremented a number of times, before finally being assigned to pl. Thus, in order to know that the memcpy reads from the data field we have to track this dataflow through p and pl.

We could use a general-purpose dataflow library for this, but for this example it will be more instructive to define a simple dataflow predicate from first principles.

We start by defining a pointsInto predicate that determines whether an expression e might evaluate to a pointer value that points into the memory region pointed to by some variable v:

predicate pointsInto(Expr e, Variable v) {
  e = v.getAnAccess() or
  exists (ArrayExpr ae | ae = ((AddressOfExpr)e).getOperand() | ae.getArrayBase() = v.getAnAccess()) or
  pointsInto(((VariableAccess)e).getTarget().getAnAssignedValue(), v)
}

Our predicate considers three cases:

  • If e is an access to v, then clearly it points to the beginning of the memory region pointed to by v.
  • If e is of the form &b[i] where b is an access to v, then it also points into the memory region pointed to by v (specifically, to the i th element of that region).
  • If e is an access to a variable x, where x is assigned a value that points into the memory region pointed to by v, then so does e.

A more sophisticated implementation could consider other types of expressions and allow for more complicated dataflow, but for the purposes of this example, this simple version suffices.

We can now refine our earlier query to use this predicate:

from FunctionCall fc, Struct ssl3_record_st, Field data
where fc.getTarget().getName().matches("%memcpy%") and
      ssl3_record_st.hasName("ssl3_record_st") and
      data = ssl3_record_st.getAField() and data.hasName("data") and
      pointsInto(fc.getArgument(1), data)
select fc

This query now yields six results on OpenSSL, two of which are the Heartbleed vulnerabilities. But of course it will find these results both in the unpatched version and in the patched version, since we have not yet implemented the second part of our analysis that checks whether the copy length is properly validated (which it is in the patched version but not in the unpatched version).

As a first approximation, we could use the following predicate to determine whether a variable v is ever compared to a variable w:

predicate comparedTo(Variable v, Variable w) {
  exists (ComparisonOperation comp |
    comp = v.getAnAccess().getParent+() and
    comp = w.getAnAccess().getParent+()
  )
}

This simply says that v is compared to w if there is some comparison operation (that is, a comparison using any of the C operators ==, !=, <=, <, >=, >) such that both v and w are accessed somewhere in that comparison. (Note that this predicate is flow-insensitive: we do not check that v and w are compared on all possible control flow paths, or even before a certain point in the program: as long as they are compared somewhere, the predicate will succeed.)

We update our query to use this predicate:

from FunctionCall fc, Struct ssl3_record_st, Field data, Field length
where fc.getTarget().getName().matches("%memcpy%") and
      ssl3_record_st.hasName("ssl3_record_st") and
      data = ssl3_record_st.getAField() and data.hasName("data") and
      length = ssl3_record_st.getAField() and length.hasName("length") and
      pointsInto(fc.getArgument(1), data) and
      not comparedTo(((VariableAccess)fc.getArgument(2)).getTarget(), length)
select fc

Surprisingly, even with our very rough implementation of comparedTo we get quite good results: we find the two instances of Heartbleed, plus one false positive. The false positive is from a call to memcpy where the third argument is not compared to the data field, but instead initialized from it in the following way:

if (len > rr->length)
  n = rr->length;
else
  n = (unsigned int)len;
 
memcpy(buf,&(rr->data[rr->off]),n);

This can easily be eliminated by adding another disjunct to our comparedTo predicate:

predicate comparedTo(Variable v, Variable w) {
  v.getAnAssignedValue() = w.getAnAccess() or
  exists (ComparisonOperation comp |
    comp = v.getAnAccess().getParent+() and
    comp = w.getAnAccess().getParent+()
  )
}

This disjunct essentially says that if there is an assignment of the form v = w, we consider v to have been implicitly compared to w.

With this refinement, we now have implemented, in 25 lines of QL, a query that finds exactly the two Heartbleed vulnerabilities on unpatched OpenSSL, and verifies their absence on the patched revision 1.0.1g.

Taking it further

Of course, as it stands the query is extremely specific: it only finds memcpy operations reading from the data field of the ssl3_record_st struct that do not check against length. Similar unchecked reads from other structs are not considered, since for these structs we don’t know which fields encode buffer lengths, and so do not know what to check for.

An interesting approach for getting around this is to make our analysis statistical: if we see that in most cases a memcpy from a field f is accompanied by a bounds check against field g, the cases where there is no such check are probably worth a closer look; they may be harmless, or they may be cases where a developer forgot to insert a required check, leading to a vulnerability. The underlying intuition is that developers probably do the right checks most of the time, so these cases can teach our analysis what checks should be performed.

In QL, extending our query in this way is quite easy. We first introduce two predicates memcpy and guarded_memcpy that capture the concept of a memcpy from a field f, and the concept of a memcpy from a field f with a bounds check against field g, respectively:

predicate memcpy(FunctionCall memcpy, Field f) {
  memcpy.getTarget().getName().matches("%memcpy%") and
  pointsInto(memcpy.getArgument(1), f)
}
 
predicate guarded_memcpy(FunctionCall memcpy, Field f, Field g) {
  memcpy(memcpy, f) and
  comparedTo(((VariableAccess)memcpy.getArgument(2)).getTarget(), g)
}

Next, we add a predicate memcpy_usually_guarded that holds if memcpy operations reading from a field f are guarded by a bounds check against field g in more than 50% of all cases:

predicate memcpy_usually_guarded(Field f, Field g, float p) {
  exists (int total, int guarded |
      total = count(FunctionCall fc | memcpy(fc, f)) and
      guarded = count(FunctionCall fc | guarded_memcpy(fc, f, g)) and
      p = ((float)guarded*100)/total and
      p > 50
  )
}

Note how aggregates make this condition very easy to express: total is the total number of memcpy operations reading from f; guarded is the number of memcpy operations reading from f that are guarded by checking g. Dividing one by the other and multiplying by 100 yields the percentage p of memcpy operations reading from f that guard by checking against g.

The final version of the query now looks as follows (and can be downloaded here):

from FunctionCall memcpy, Struct s, Field f, Field g, float perc
where f = s.getAField() and g = s.getAField() and
      memcpy(memcpy, f) and
      memcpy_usually_guarded(f, g, perc) and
      not guarded_memcpy(memcpy, f, g) and
      forall (Field gg, float pperc | memcpy_usually_guarded(f, gg, pperc) | pperc <= perc)
select memcpy, "memcpy from " + s.toString() + "::" + f +
               " is guarded by comparison against " + s.toString() + "::" + g +
               " in " + perc + "% of all cases, but not here."

The where clauses ensure that:

  1. f and g are fields of the same struct type.
  2. memcpy is a memcpy operation reading from f.
  3. memcpy operations reading from f are usually guarded by checking against g.
  4. This memcpy operation is not guarded by checking against g.

The final forall constrains g to be the field against which reads from f are most often checked in case there is more than one: this simply ensures that we do not get more than one hit for the same memcpy (unless there are two fields with exactly the same percentage of bounds checks).

Our final query yields 16 hits on unpatched OpenSSL, two of which are Heartbleed vulnerabilities. The other 14 are false positives due to incomplete dataflow tracking in this basic implementation; they can easily be eliminated by switching to a more sophisticated dataflow library.