November 20, 2008, 05:46:33 pm *
News:
 
   Home   Help Search Login Register  
Pages: [1]
  Print  
Author Topic: Intra-procedural information  (Read 1241 times)
olhotak
Newbie
*
Posts: 8


« on: April 13, 2007, 06:18:54 pm »

SemmleCode gives access to mostly inter-procedural information, such as classes, methods, fields. I would be interested in more intra-procedural information, such as local variables and the assignments between them. For example, such information would make it possible to express a points-to analysis to the runtime types of objects from allocation site to everywhere they're used. (I'm sure you never expected me to want a points-to analysis  Smiley )
Logged
Oege de Moor
Newbie
*
Posts: 30


« Reply #1 on: April 14, 2007, 09:20:52 am »

Currently the intraprocedural information in the database includes calls and field accesses (details at http://semmle.com/content/view/50/143/), but indeed not enough to do things like points-to, and queries such as "on every path from a call to Lock.acquire() out of the method, Lock.release() is called".

We certainly plan to extend the populator to store the necessary information in the database: for one thing, then SemmleCode is useful for university courses on program analysis, and I happen to give one of those next year  Wink

Clearly the choice of IR to store in relational form is crucial. Obviously we're considering Jimple

http://en.wikipedia.org/wiki/Jimple

That would be extremely convenient for expressing analyses; the downside might be that it's rather expensive to construct from bytecode, thus potentially introducing a bottleneck in population from jars.

Another alternative might be Wala

http://wala.sourceforge.net/wiki/index.php/Main_Page

Or Joeq

http://joeq.sourceforge.net/

Whatever representation we use, it has to be 100% reliable.

Ondrej, do you have any experience with the other IRs besides Jimple? Any recommendations for what SemmleCode should use?



Logged
olhotak
Newbie
*
Posts: 8


« Reply #2 on: May 02, 2007, 02:05:53 pm »

The most distinguishing characteristic of Jimple compared to many other Java IRs is that every variable is given a static type. This is often convenient, but has costs. Computing these types is NP-hard. Considering the hardness of the problem, the algorithm used in Soot [Gagnon, Hendren, Marceau, SAS 2000] is quite fast for typical cases, but can still be a bottleneck when processing many classes. Another issue is how it behaves when the whole program (or at least its entire class hierarchy) is not available: I have not managed to understand how an incomplete hierarchy affects the algorithm, even after discussions with Etienne.

Another difference is that many IRs are in SSA form. Jimple is not, but Shimple is an SSA version of Jimple. For a query tool, SSA would be convenient, I think (without it, one would have to write some reaching-defs subqueries).

If the type inference was done on SSA form, would it still be NP-hard? This would be an interesting question to answer...

My recommendation would be to start with an SSA-based form without the Jimple type inference first. A typed IR could be built on top of it later, either built into the Semmle engine, or perhaps expressed using SemmleCode.
Logged
Pages: [1]
  Print  
 
Jump to: