Legacy Modernization with Semmle QL

The challenges of keeping a 30-year-old code base modern

In the mid-2000s, an investment company acquired the source code to a commercial risk management system it had been using. Bringing the code in house gave the company greater control and agility: its engineers could continuously optimize performance and enhance functionality. This, in turn, let the business make better-informed investment decisions faster.

The system is large: nearly 3 million lines of thirty-year-old C++, a considerable amount of Java, and some C# added in recent years. The project team’s roster also changes steadily. Together, these facts pose challenges for the team’s technical leadership:

  • Bringing new engineering talent up to full productivity as quickly as possible
  • Replacing old commercial libraries with newer, open source alternatives
  • Removing dead code to make the system easier to understand and maintain
  • Modernizing old algorithms and designs to take advantage of newer language features and computing platforms

In 2015, the project leaders began using Semmle™ QL to help address these issues. Semmle QL, a fundamental part of the Semmle engineering analytics platform, is a query language that gives developers new ways to explore and understand their source code.

About Semmle QL

QL is a modern variant of Datalog. It suits developers who want to ask arbitrary questions of their code, and of related information about the development team, by querying it the way they would any database. The syntax of QL is modeled on Java, with a strong influence from query languages such as SQL. Its object-oriented syntax, together with support for recursion, allows you to define queries with very sophisticated logic.

According to one of the technical team members at the customer, “the difference between QL and text search tools like Grep is that QL understands the code; how data and logic flow, scoping, typing, and so on. In a short amount of time, it’s given us a much deeper view of how the source code is used. In turn, this has helped our development team become more productive and make better architectural decisions.”

Using QL to find and eliminate aging libraries

The risk management system contained commercial libraries for which the customer still paid license fees. The choice to use those libraries made technical and economic sense to the original vendor back in the 1990s, but today there are free, open source alternatives such as Boost that are a better fit.

The challenge is understanding everything that depends on the original libraries. It’s easy enough to find direct references to the commercial APIs. But you also need to find the code that depends on the functions making those references, the code that depends on that code, and so on. That makes it a recursive search and therefore a natural fit for Semmle QL. Queries such as the one below helped reveal how the aging libraries were used.

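import cpp

// Find every function g that reaches crvLoadMktRates through any chain of calls.
// calls* is the reflexive transitive closure of calls, so f itself is also
// reported as g; use calls+ to list only the callers.
from Function g, Function f
where f.getName() = "crvLoadMktRates" and g.calls*(f)
select f, g, g.getLocation()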

The Semmle QL query above searches C++ code to find all functions g that eventually call a function f named “crvLoadMktRates”, however many layers of function calls lie in between, and reports where each g is located in the code base.
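
To make the recursion concrete, here is a minimal Python sketch of the same idea outside QL. Given a table of direct calls, it walks the edges backwards from crvLoadMktRates and collects every function that reaches it. The call graph is invented for illustration; only the name crvLoadMktRates comes from the query above, and a real QL database is built from the source code rather than from a hand-written table.

```python
from collections import deque

# Hypothetical call graph: caller -> functions it calls directly.
# Only "crvLoadMktRates" appears in the original query; every other
# name here is made up for the example.
CALLS = {
    "main": ["runRiskReport", "parseArgs"],
    "runRiskReport": ["buildCurves"],
    "buildCurves": ["crvLoadMktRates"],
    "crvLoadMktRates": [],
    "parseArgs": [],
}


def transitive_callers(calls, target):
    """Return every function that reaches `target` through one or more calls."""
    # Invert the edges so we can walk from a callee back to its callers.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    found = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in found:  # also guards against cycles
                found.add(caller)
                queue.append(caller)
    return found


print(sorted(transitive_callers(CALLS, "crvLoadMktRates")))
# prints ['buildCurves', 'main', 'runRiskReport']
```

The sketch returns only the callers, which corresponds to calls+ in QL; calls* would additionally pair the target function with itself. In QL the closure is a single operator, whereas a general-purpose language needs an explicit worklist like the one above.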

Semmle QL has accelerated the ongoing process of modernizing the code base and lowered its risk. The end result will be reduced commercial license costs and improved performance.

“A Win-Win-Win” – a new on-boarding experience for developers

In the summer of 2015, the team hired four new developers. Since it was the height of the summer vacation season, the team was spread thin, and fewer people than usual were on hand to mentor the new hires.

The situation inspired one of the senior developers on the team to use QL to help the new developers ramp up. He defined QL queries that returned task lists for the new developers. These tasks included fixing minor coding violations, the kind that can be fixed without understanding the business functionality.

As a result, the new hires were committing code changes on their first day of employment. “It was a triple win,” said the tech lead. “The new hires felt like they were contributing immediately, the code base was improving, and we were reducing the time it took for new hires to reach full productivity…two to three months faster.”

In the new hire’s words…

“The tasks Semmle assigned during my first few days on the job were simple: removing dead code and unused dependencies; changing from pass by value to pass by reference, for example.

Even though they were small, it felt good to commit real code changes on my first day. The experience also helped me understand the code base faster, see what problems were common in a big legacy code base, and learn the team’s preferred C++ coding practices.

The use of Semmle during my on-boarding process was very positive.”