A Logic Meta Programming Foundation for Example-Driven Pattern Detection in Object-Oriented Programs

Scriptie/masterproef: Doctoral Thesis


The growing number of tools for detecting user-specified software patterns is testament to their valuable applications throughout the development process. In these applications, user-specified software patterns describe code that exhibits characteristics of interest. For instance, violations of the protocol an API expects to be adhered to.

Logic formulas can be used as expressive and descriptive pattern specifications. This merely requires reifying the program under investigation such that variables can range over its elements. Executing a proof procedure will establish whether program elements exhibit the characteristics specified in a formula. However, we have observed that such formulas become convoluted and operational in nature when developers attempt to ensure all instances of a pattern are recalled.

As the behavioral characteristics of a pattern can be implemented in different ways, the corresponding formula either has to enumerate each implementation variant or describe the machine-verifiable behavior shared by these variants. The former formulas are relatively descriptive, but only recall the implementation variants that are enumerated. The latter formulas recall all implementations of the specified behavior, but do so by quantifying over information about the program's behavior. This exposes developers to the intricate details of the program analyses that compute this information.

We have reconciled both approaches by embedding source code excerpts in logic formulas. These excerpts exemplify the prototypical implementation of a pattern's characteristics ---thus ensuring the resulting specifications are descriptive. They are specified in the concrete syntax of the base program augmented with logic variables. To ensure that all implementation variants are recalled, these excerpts are matched against behavioral program information according to multiple matching strategies that vary in leniency. Each match is quantified by the extent to which it exhibits the specified characteristics. The smaller this extent, the more likely the match is a false positive. This establishes a ranking which facilitates assessing a large amount of matches. A logic of quantified truth provides the theoretical foundation for this ranking.

Unique to our matching process is that it incorporates whole-program analyses in its comparison of individual program elements. A semantic analysis ensures correctness. To compare an unqualified and fully qualified type, for instance, a semantic analysis takes the import declarations into account of the compilation units in which they reside. A points-to analysis increases the amount of implementation variants that are recalled. When expressions are compared, for instance, syntactic deviations are allowed as long as they may evaluate to the same object at run-time.

The resulting example-driven approach to pattern detection recalls the implicit implementation variants (i.e. those that are implied by the semantics of the programming language) of a machine-verifiable characteristic specified as a code excerpt that exemplifies its prototypical implementation.
Datum Prijs28 aug 2009
BegeleiderWolfgang De Meuter (Promotor), Dirk Vermeir (Jury), Viviane Jonckers (Jury), Theo D'Hondt (Jury), Ralf Lämmel (Jury), Michael Godfrey (Jury) & Johan Brichau (Co-promotor)

Citeer dit