JVM Advent

The art of static code analysis

The necessity for static analysis of source code …

It is highly likely that most Java developers (and not only Java developers) have used at least one static analysis tool for some task.

To perform static code analysis we typically need a proper representation of the source code, one that is suitable for analysis. A programming language can be described by a formal grammar. Furthermore, a parser can be written or generated following the rules of that grammar to create such a representation (typically a parse tree) from source code. Depending on the type of language we want to represent, we can use different types of formal grammars.

In the early days, it was not uncommon for static code analysis tools to require a manually written parser, which is not a trivial task ….
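To give a feel for what writing a parser by hand involves, here is a minimal sketch of a recursive-descent parser for a tiny arithmetic grammar. The grammar and the class name `TinyExprParser` are assumptions made for illustration only. To stay short, the sketch evaluates the expression directly instead of building a parse tree.

```java
// Hand-written recursive-descent parser for a tiny, hypothetical grammar:
//   expr   = term   { ("+" | "-") term } ;
//   term   = factor { ("*" | "/") factor } ;
//   factor = NUMBER | "(" expr ")" ;
// Each grammar rule becomes one method; the value is computed while parsing.
final class TinyExprParser {
    private final String input;
    private int pos = 0;

    private TinyExprParser(String input) {
        this.input = input;
    }

    /** Parses and evaluates the whole input; rejects trailing characters. */
    static int evaluate(String text) {
        TinyExprParser p = new TinyExprParser(text);
        int value = p.expr();
        p.skipSpaces();
        if (p.pos != p.input.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + p.pos);
        }
        return value;
    }

    // expr = term { ("+" | "-") term }
    private int expr() {
        int value = term();
        while (true) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    // term = factor { ("*" | "/") factor }
    private int term() {
        int value = factor();
        while (true) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    // factor = NUMBER | "(" expr ")"
    private int factor() {
        if (accept('(')) {
            int value = expr();
            if (!accept(')')) {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
            return value;
        }
        skipSpaces();
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
        if (start == pos) {
            throw new IllegalArgumentException("Expected a number at position " + pos);
        }
        return Integer.parseInt(input.substring(start, pos));
    }

    // Consumes the expected character (after optional whitespace) if it is next.
    private boolean accept(char expected) {
        skipSpaces();
        if (pos < input.length() && input.charAt(pos) == expected) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
    }
}
```

Calling `TinyExprParser.evaluate("1 + 2 * 3")` respects operator precedence because `term` is nested inside `expr`. Even for three grammar rules the code needs explicit error handling and whitespace skipping, which hints at how much work a full Java grammar would take.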

Parser generators to the rescue …

Tools can be created to generate parsers from the rules of a target context-free grammar. This is, for example, the case with tools like LEX and YACC, which are written in C and generate C code. At a high level, parser generation is illustrated by the following diagram:

[Diagram: parser generation at a high level]

In the early days of Java, Sun Microsystems developed a parser generator called Jack, which was later renamed to JavaCC (Java Compiler Compiler). Another popular alternative for generating a parser for a Java grammar is ANTLR (ANother Tool for Language Recognition). Both of these parser generators are well supported and written in Java. JavaCC (similar to YACC for C) can combine grammar rules with Java code that is included in the generated parser. However, JavaCC primarily targets Java code generation, while ANTLR is general purpose, has grammars available for a large number of programming languages and can generate parsers in different languages.

Both of these tools work with formal grammars in EBNF form. Following the general diagram above, here is what the process of parser generation looks like in ANTLR, with a simple example of an expression parser generator:

[Diagram: parser generation in ANTLR for a simple expression grammar]

Working directly with the parse tree produced by the parser requires more effort in terms of code analysis. That is why parser generators typically provide the possibility to generate a more concise representation that eliminates extra symbols and provides additional symbol resolution capabilities: the AST (abstract syntax tree).

The process of using ANTLR or JavaCC in a standard Maven/Gradle project is very similar. For ANTLR:
<dependency>
    <groupId>org.antlr</groupId>
    <artifactId>antlr4-runtime</artifactId>
    <version>4.7.1</version>
</dependency>
….
<plugin>
    <groupId>org.antlr</groupId>
    <artifactId>antlr4-maven-plugin</artifactId>
    <version>4.7.1</version>
    <executions>
        <execution>
            <goals>
                <goal>antlr4</goal>
            </goals>
        </execution>
    </executions>
</plugin>
…
For JavaCC:
<dependency>
    <groupId>net.java.dev.javacc</groupId>
    <artifactId>javacc</artifactId>
    <version>7.0.13</version>
</dependency>
…
<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>javacc-maven-plugin</artifactId>
    <version>3.0.1</version>
    <executions>
        <execution>
            <id>javacc</id>
            <goals>
                <goal>javacc</goal>
            </goals>
        </execution>
    </executions>
</plugin>
…
Once this is in place, the generated parser can be used to produce a parse tree, which can have, for example, a listener attached that is invoked for specific rules while the tree is walked. Example using an ANTLR-generated parser:


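import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

String content = "public class Example { public int func(int x){ return x + 10; } }";

// Java20Lexer and Java20Parser are the classes ANTLR generates from the grammar
Java20Lexer lexer = new Java20Lexer(CharStreams.fromString(content));
CommonTokenStream tokens = new CommonTokenStream(lexer);
Java20Parser parser = new Java20Parser(tokens);
ParseTree tree = parser.compilationUnit();

// ExprListener is a custom listener; its callbacks fire as the walker visits tree nodes
ParseTreeWalker walker = new ParseTreeWalker();
ExprListener listener = new ExprListener();
walker.walk(listener, tree);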
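
The walker/listener mechanism used above can be illustrated without any parser generator. The following is a minimal sketch, assuming a simplified tree model: `SimpleNode`, `SimpleListener`, `SimpleWalker` and `MethodCountingListener` are hypothetical names invented for this illustration and are not part of the ANTLR API. The rule names stored in the nodes are also assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a parse tree node (NOT ANTLR's ParseTree).
final class SimpleNode {
    final String rule; // name of the grammar rule that produced this node
    final List<SimpleNode> children = new ArrayList<>();

    SimpleNode(String rule, SimpleNode... kids) {
        this.rule = rule;
        for (SimpleNode kid : kids) {
            children.add(kid);
        }
    }
}

// Callbacks invoked by the walker before and after a node's children are visited.
interface SimpleListener {
    void enter(SimpleNode node);
    void exit(SimpleNode node);
}

// Depth-first traversal that notifies the listener on the way down and on the way up.
final class SimpleWalker {
    static void walk(SimpleListener listener, SimpleNode node) {
        listener.enter(node);
        for (SimpleNode child : node.children) {
            walk(listener, child);
        }
        listener.exit(node);
    }
}

// Example analysis: count nodes produced by a (hypothetical) "methodDeclaration" rule.
final class MethodCountingListener implements SimpleListener {
    int methods = 0;

    @Override
    public void enter(SimpleNode node) {
        if (node.rule.equals("methodDeclaration")) {
            methods++;
        }
    }

    @Override
    public void exit(SimpleNode node) {
        // nothing to do when leaving a node
    }
}
```

The design keeps traversal and analysis separate: the walker knows how to visit the tree, while each listener only reacts to the node kinds it cares about, so several analyses can reuse the same traversal.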

An alternative way to create a parser is to use a parsing expression grammar (PEG). Parboiled is a library that implements this approach: the grammar rules are written directly in Java code as part of the application.
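
To show the idea of writing grammar rules directly in Java, here is a minimal sketch of PEG-style rule combinators. It is not Parboiled's API: `PegRule`, `Peg` and the method names are hypothetical and chosen only for this illustration. The feature specific to PEGs is the ordered choice, where the first alternative that matches wins.

```java
// A rule tries to match at a position and returns the new position, or -1 on failure.
@FunctionalInterface
interface PegRule {
    int match(String input, int pos);
}

// Hypothetical combinators for building PEG-style rules in plain Java.
final class Peg {
    private Peg() {
    }

    // Matches the given text exactly.
    static PegRule literal(String text) {
        return (in, pos) -> in.startsWith(text, pos) ? pos + text.length() : -1;
    }

    // Matches all rules one after another; fails if any of them fails.
    static PegRule sequence(PegRule... rules) {
        return (in, pos) -> {
            int p = pos;
            for (PegRule rule : rules) {
                p = rule.match(in, p);
                if (p < 0) {
                    return -1;
                }
            }
            return p;
        };
    }

    // Ordered choice: the first alternative that matches wins, later ones are not tried.
    static PegRule firstOf(PegRule... alternatives) {
        return (in, pos) -> {
            for (PegRule rule : alternatives) {
                int p = rule.match(in, pos);
                if (p >= 0) {
                    return p;
                }
            }
            return -1;
        };
    }

    // Greedy repetition; never fails, stops on failure or on an empty match.
    static PegRule zeroOrMore(PegRule rule) {
        return (in, pos) -> {
            int p = pos;
            while (true) {
                int next = rule.match(in, p);
                if (next < 0 || next == p) {
                    return p;
                }
                p = next;
            }
        };
    }

    // True if the rule consumes the entire input.
    static boolean matchesFully(PegRule rule, String input) {
        return rule.match(input, 0) == input.length();
    }
}
```

With these helpers a rule for binary numbers could be written as `Peg.sequence(bit, Peg.zeroOrMore(bit))`, where `bit` is `Peg.firstOf(Peg.literal("0"), Peg.literal("1"))`. Because the choice is ordered, `firstOf(literal("a"), literal("ab"))` consumes only `a` from the input `ab`, so the order of alternatives matters in a way it does not for context-free grammars.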

Java libraries to the rescue …

Parser generators and PEG parsers are quite generic. Their grammars also may not be up to date with the desired Java version. As an alternative, a specialized parsing library such as JavaParser or Eclipse JDT can be used.

JavaParser

JavaParser is based on JavaCC, is well maintained and provides support for JDK 21. It generates an AST from the source code and provides enhanced symbol resolution. In addition, it provides capabilities to query the AST via a DSL provided by the library, to generate code from the AST or to modify it. It is really simple to get started with the library. The following example counts the number of methods in a class:

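import java.io.File;
import java.io.FileNotFoundException;
import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.body.MethodDeclaration;

public static int countMethods(File file) throws FileNotFoundException {
   CompilationUnit cu = StaticJavaParser.parse(file);
   int count = 0;
   // findAll returns every MethodDeclaration node in the compilation unit
   for (MethodDeclaration method : cu.findAll(MethodDeclaration.class)) {
      count++;
   }
   return count;
}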

To get started using JavaParser it is sufficient to include the following dependency:

<dependency>
    <groupId>com.github.javaparser</groupId>
    <artifactId>javaparser-symbol-solver-core</artifactId>
    <version>3.25.10</version>
</dependency>

Eclipse JDT

Eclipse JDT (Java Development Tools) is the engine behind the Java editor in the Eclipse IDE and provides advanced capabilities like partial compilation, code completion etc. In the earlier days of Eclipse it was not straightforward to use JDT outside of the Eclipse IDE, primarily because it required a number of extra dependencies to be dragged in as well. Now Eclipse JDT is available as a standalone library via the following dependency:
<dependency>
    <groupId>org.eclipse.jdt</groupId>
    <artifactId>org.eclipse.jdt.core</artifactId>
    <version>3.36.0</version>
</dependency>
The following example implements a method to count the number of methods in a Java class:

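import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.List;

import org.apache.commons.io.FileUtils; // from Apache Commons IO
import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTNode;
import org.eclipse.jdt.core.dom.ASTParser;
import org.eclipse.jdt.core.dom.AbstractTypeDeclaration;
import org.eclipse.jdt.core.dom.BodyDeclaration;
import org.eclipse.jdt.core.dom.CompilationUnit;

@SuppressWarnings("unchecked") // the JDT DOM API returns raw lists
public static int countMethods(File file) throws IOException {
    String source = FileUtils.readFileToString(file, Charset.defaultCharset());

    // Parse the source into an AST using the Java 21 language level
    ASTParser parser = ASTParser.newParser(AST.JLS21);
    parser.setSource(source.toCharArray());
    CompilationUnit unit = (CompilationUnit) parser.createAST(null);

    int count = 0;
    List<AbstractTypeDeclaration> types = unit.types();
    for (AbstractTypeDeclaration type : types) {
        // Only classes and interfaces; enums, records and annotations are skipped
        if (type.getNodeType() == ASTNode.TYPE_DECLARATION) {
            List<BodyDeclaration> bodies = type.bodyDeclarations();
            for (BodyDeclaration body : bodies) {
                if (body.getNodeType() == ASTNode.METHOD_DECLARATION) {
                    count++;
                }
            }
        }
    }
    return count;
}
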
As you can see there are multiple options you can choose from in order to start writing a tool on your own for static analysis of Java code.

Author: Martin Toshev
