Peering through the peephole: build a peephole optimiser using the new Java Class-File API

When you think about code optimisation, you might imagine complex analysis over entire programs, sophisticated JIT compilers, or elaborate data-flow analyses. But some of the most effective optimisations come from a much simpler technique: peephole optimisation. This approach can yield surprisingly good results with relatively little complexity.

What are peephole optimisations?

Imagine looking at a piece of code through a small window – a peephole – that only shows you a few instructions at a time. As you slide this window over your code, you look for patterns that can be replaced with more efficient alternatives. Consider this simple Java code:

int y = foo + 0;

which might compile into the following Java bytecode:

iload_0
iconst_0
iadd

Clearly, adding zero to a value is redundant, and through our peephole we can spot this pattern and remove it.

Applying the optimisation would result in just a single instruction instead of the original three:

iload_0

This optimisation is local – we only needed to look at a small sequence of instructions to identify and apply it. We didn’t need to analyse the entire program or understand complex control flow. This locality is what makes peephole optimisation both powerful and simple.
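To make the idea concrete before touching real bytecode, here is a minimal sketch on a toy model where instructions are plain mnemonic strings. The ToyPeephole class and its removeAddZero method are hypothetical names used only for illustration; they are not part of any API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: instructions are mnemonic strings, not real bytecode.
final class ToyPeephole {
    // Returns a copy of the code with every adjacent "iconst_0", "iadd" pair removed.
    static List<String> removeAddZero(List<String> code) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < code.size()) {
            boolean pushesZero = code.get(i).equals("iconst_0");
            boolean thenAdds = i + 1 < code.size() && code.get(i + 1).equals("iadd");
            if (pushesZero && thenAdds) {
                i += 2; // skip both instructions and emit nothing
            } else {
                out.add(code.get(i));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        var code = List.of("iload_0", "iconst_0", "iadd", "istore_1");
        System.out.println(removeAddZero(code)); // prints [iload_0, istore_1]
    }
}
```

The scan only ever looks at the current instruction and the one after it, which is the locality described above.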

Why build a peephole optimiser?

With modern JVMs performing sophisticated runtime optimisations, including JIT compilation, you might wonder if there’s still value in bytecode optimisation.

Every unnecessary instruction in your class files takes up space, and this space adds up across a large application. Peephole optimisations, especially when combined with other techniques, can lead to significant savings in application size and often to better performance. Mobile developers especially know that every kilobyte counts when users are downloading apps over mobile networks, and that performance matters for users with low-powered devices. This is why tools like ProGuard and R8 are standard in the Android toolchain – they use optimisation techniques including peephole optimisation, alongside tree-shaking, to reduce application size.

Building a peephole optimiser is also a great way to learn how optimisers work and to get familiar with Java bytecode and the instruction patterns that can be optimised. Peephole optimisation is not just useful in the Java world either: it’s a common technique in compilers and optimisers. For example, the InstCombine pass in LLVM applies peephole optimisations to LLVM IR.

In this post, we’ll build a working peephole optimiser using Java’s new Class-File API (a preview feature in Java 23 and targeted as final for Java 24). This API is a modern approach to Java bytecode manipulation, replacing the visitor pattern used by older libraries with modern Java features and idioms.

Java Class-File API

The Class-File API is a new Java API, targeted for Java 24, that aims to provide a standard API for parsing, generating, and transforming Java class files. It is currently in its third iteration as JEP 484, and previously appeared as previews in JEP 457 and JEP 466.

Unlike libraries such as ProGuardCORE, ASM or Byte Buddy, its scope is smaller: only parsing, generating and transforming are in scope, while code analysis features are explicitly out of scope. Since peephole optimisation does not require any complicated code analyses, building such an optimiser on top of the API is relatively straightforward.

Older libraries like ASM and ProGuardCORE (both 20+ years old) make heavy use of the visitor pattern (which made a lot of sense at the time) whereas the new API uses more recent Java idioms and features that weren’t present when the older libraries were designed. Generating a class which prints HelloWorld requires just a handful of lines of Java code:

ClassFile.of()
.buildTo(Path.of("HelloWorld.class"), ClassDesc.of("HelloWorld"), classBuilder -> classBuilder
 .withMethodBody("main", MethodTypeDesc.ofDescriptor("([Ljava/lang/String;)V"), ACC_PUBLIC | ACC_STATIC, 
  codeBuilder -> codeBuilder
   .getstatic(ClassDesc.of("java.lang.System"), "out", ClassDesc.of("java.io.PrintStream"))
   .ldc("Hello World")
   .invokevirtual(ClassDesc.of("java.io.PrintStream"), "println", MethodTypeDesc.ofDescriptor("(Ljava/lang/Object;)V"))
   .return_()
));

In JDK 23, the Class-File API is a preview feature, so the --enable-preview flag needs to be provided to both the javac and java commands; javac additionally requires --release 23 alongside it:

$ javac --release 23 --enable-preview GenerateHelloWorld.java
$ java --enable-preview GenerateHelloWorld
$ java --enable-preview HelloWorld

Now let’s dive in and start building our optimiser!

Reading and writing classes

The overall structure of our optimiser will look like this:

Read a jar file
Read each class file from the jar file
Optimise the bytecode of each class
Write the result to a new jar

The optimised jar will be semantically equivalent to the original jar but (hopefully) smaller in size.

The first thing we’ll need to do is set up a framework for reading and writing jar files; for these we’ll use the tools available in the java.util.jar package. I’ll leave out some error checking to keep things simple but the full code for the optimiser can be found here.

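import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class Optimizer {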
  public static void main(String[] args) {
    var input  = new File(args[0]);
    var output = new File(args[1]);
    optimizeJar(input, output);
  }

  private static void optimizeJar(File input, File output) {
    try (
      var jarFile      = new JarFile(input);
      var outputStream = new JarOutputStream(
                         new BufferedOutputStream(
                         new FileOutputStream(output)))
    ) {
      var entries = jarFile.entries();
      while (entries.hasMoreElements()) {
        var entry = entries.nextElement();

        try (var inputStream = jarFile.getInputStream(entry)) {
          var newEntry = new JarEntry(entry);
          outputStream.putNextEntry(newEntry);

          if (entry.getName().endsWith(".class")) {
            var originalBytes = inputStream.readAllBytes();
            try {
              // TODO: optimise the class file.
              var optimizedBytes = originalBytes;
              outputStream.write(optimizedBytes);
            } catch (Exception e) {
              // If there's an error during optimisation,
              // copy over the original bytes instead.
              System.err.println(
                  "Error optimising " +
                  entry.getName() + ": " +
                  e.getMessage()
              );
              outputStream.write(originalBytes);
            }
          } else {
            // Copy other files across unchanged.
            inputStream.transferTo(outputStream);
          }
          outputStream.closeEntry();
        }
      }
    } catch (IOException e) {
      System.err.println("Error: " + e.getMessage());
    }
  }
}

For now, we just copy class files from the input to the output and the TODO in the snippet shows the location where we’ll need to apply optimisations to the class files.

Creating an input for testing

Before we continue, let’s create a test input jar so that we can exactly control the input bytecode sequences for testing purposes. Of course, we can use the Class-File API for this!

We’ll create a jar that contains a single class named Test, which is equivalent to the following Java code:

public class Test {
  public static void main(String[] args) {
    StringBuilder sb = new StringBuilder();
    sb.append("The length");
    sb.append(" of the");
    sb.append(" arguments array is ");
    sb.append(args.length + 0);
    System.out.println(sb.toString());
  }
}

The Java bytecode can be generated using the Class-File API as follows:

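import java.io.IOException;
import java.lang.classfile.ClassFile;
import java.lang.constant.ClassDesc;
import java.lang.constant.MethodTypeDesc;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

import static java.lang.classfile.ClassFile.ACC_PUBLIC;
import static java.lang.classfile.ClassFile.ACC_STATIC;
import static java.lang.constant.ConstantDescs.CD_int;
import static java.lang.constant.ConstantDescs.CD_void;
import static java.util.jar.Attributes.Name.MAIN_CLASS;
import static java.util.jar.Attributes.Name.MANIFEST_VERSION;

public class TestJarGenerator {
  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("Usage: java --enable-preview TestJarGenerator.java <output.jar>");
      System.exit(1);
    }
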
    byte[] classBytes = ClassFile.of()
       .build(ClassDesc.of("Test"), cb -> cb
       .withMethodBody("main", MethodTypeDesc.ofDescriptor("([Ljava/lang/String;)V"), ACC_PUBLIC | ACC_STATIC, codeBuilder -> codeBuilder
         .new_(ClassDesc.of("java.lang.StringBuilder"))
         .dup()
         .invokespecial(ClassDesc.of("java.lang.StringBuilder"), "<init>", MethodTypeDesc.of(CD_void))
         .ldc("The length")
         .invokevirtual(ClassDesc.of("java.lang.StringBuilder"), "append", MethodTypeDesc.of(ClassDesc.of("java.lang.StringBuilder"), ClassDesc.of("java.lang.String")))
         .ldc(" of the")
         .invokevirtual(ClassDesc.of("java.lang.StringBuilder"), "append", MethodTypeDesc.of(ClassDesc.of("java.lang.StringBuilder"), ClassDesc.of("java.lang.String")))
         .ldc(" arguments array is ")
         .invokevirtual(ClassDesc.of("java.lang.StringBuilder"), "append", MethodTypeDesc.of(ClassDesc.of("java.lang.StringBuilder"), ClassDesc.of("java.lang.String")))
         .aload(0)
         .arraylength()
         .iconst_0()
         .iadd()
         .invokevirtual(ClassDesc.of("java.lang.StringBuilder"), "append", MethodTypeDesc.of(ClassDesc.of("java.lang.StringBuilder"), CD_int))
         .invokevirtual(ClassDesc.of("java.lang.StringBuilder"), "toString", MethodTypeDesc.of(ClassDesc.of("java.lang.String")))
         .getstatic(ClassDesc.of("java.lang.System"), "out", ClassDesc.of("java.io.PrintStream"))
         .swap()
         .invokevirtual(ClassDesc.of("java.io.PrintStream"), "println", MethodTypeDesc.of(CD_void, ClassDesc.of("java.lang.String")))
         .return_()
     ));

    var manifest = new Manifest();
    var attr = manifest.getMainAttributes();
    attr.put(MANIFEST_VERSION, "1.0");
    attr.put(MAIN_CLASS, "Test");

    try (var jos = new JarOutputStream(Files.newOutputStream(Path.of(args[0])), manifest)) {
      var entry = new JarEntry("Test.class");
      jos.putNextEntry(entry);
      jos.write(classBytes);
      jos.closeEntry();
    }
  }
}

This bytecode gives us the opportunity to apply a couple of optimisations: removing redundant zero addition and merging string constants to remove unnecessary StringBuilder.append calls.

If you run the optimiser on the jar file now, the output should be the same as the input:

$ java --enable-preview Optimizer.java input.jar output.jar
$ java -jar output.jar foo bar
The length of the arguments array is 2

You can check the bytecode with the javap tool which will be useful to see the results of the optimisations later:

$ javap -c -v -p -cp output.jar Test
…
23: arraylength
24: iconst_0
25: iadd
…

Now that we can read and write jar files, let’s see how we can transform classes with the Class-File API.

Transforming classes

To optimise the class files in the jar we’ll need to:

Parse the bytes into a ClassModel
Transform the code attributes in the ClassModel
Write the resulting bytes of the transformed ClassModel to the output jar

We’ll create a helper method named optimizeClass to implement the parsing and ClassModel transform. The TODO in optimizeJar can be replaced with a call to the new method:

var optimizedBytes = optimizeClass(originalBytes);
outputStream.write(optimizedBytes);

The method uses the Class-File API to parse the original bytes into a ClassModel and then uses the ClassFile.transform method to apply a transform:

private static byte[] optimizeClass(byte[] bytes) {
 // Parse the class bytes into a class model.
 // Drop line numbers and debug info, to simplify the peephole pattern matching.
 var classModel = ClassFile
     .of(DROP_LINE_NUMBERS, DROP_DEBUG)
     .parse(bytes);

 // When transforming the class, use a new constant pool instead of adding new
 // entries to the existing one.
 return ClassFile
  .of(NEW_POOL)
  .transform(classModel, transformingMethods(
   (methodBuilder, methodElement) -> {
    if (methodElement instanceof CodeAttribute codeAttribute) {
     methodBuilder.withCode(codeBuilder -> {
      // TODO: optimize code
      // For now, copy every code element across unchanged.
      codeAttribute.forEach(codeBuilder::with);
     });
    } else {
     methodBuilder.accept(methodElement);
    }
  }
 ));
}

Some things to pay attention to in the code snippet:

We drop line numbers and debugging information: this makes the peephole pattern matching easier, since this information introduces extra pseudo-instructions in the code in between actual instructions.
We use the NEW_POOL option when creating the transformed class file: by default the original constant pool is used as it is more efficient but it means that the constant pool can grow in size if we add new elements; since we’re interested in making classes smaller, it’s better to use a new constant pool.
We’re using a static helper method ClassTransform.transformingMethods to reduce some of the necessary boilerplate.
The ClassFile.transform method returns a byte array: these are the new, optimised bytes that we’ll write out to the output jar.

If you run the optimiser on the jar file again, the output should still be the same as the input, since we have not yet applied any transformations to the code attributes. We have a TODO in the location where we’ll apply the peephole optimisations to the code attributes.

Class hierarchy resolver

When transforming bytecode we need to be careful about stack map frames. These frames, required since Java 7, help the JVM verify the bytecode’s type safety. They describe the types of values on the operand stack and in local variables at certain points in the code.

By default, stack map frames are automatically generated by the Class-File API when required. However, to generate stack map frames correctly, the API needs to understand the class hierarchy. For example, when branches merge, it needs to find the common supertype of values coming from different paths.

Think of a method that returns a List, where one branch returns a LinkedList and another branch returns an ArrayList: to verify that both types satisfy the method’s return type, we need to look up the class hierarchy to check that LinkedList and ArrayList share List as a common supertype.
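The kind of lookup involved can be sketched with plain reflection. This is only an illustration under stated assumptions: it walks superclass links of classes already loaded in the running JVM, it ignores interfaces entirely, and the CommonSupertypeSketch class and commonSuperclass method are hypothetical names rather than anything the Class-File API provides.

```java
import java.util.ArrayList;
import java.util.LinkedList;

final class CommonSupertypeSketch {
    // Walks up the superclass chain of a until it reaches a class that b can be assigned to.
    // Interfaces and primitives have no superclass chain, so they fall back to Object here.
    static Class<?> commonSuperclass(Class<?> a, Class<?> b) {
        Class<?> candidate = a;
        while (candidate != null && !candidate.isAssignableFrom(b)) {
            candidate = candidate.getSuperclass();
        }
        return candidate == null ? Object.class : candidate;
    }

    public static void main(String[] args) {
        // prints class java.util.AbstractList
        System.out.println(commonSuperclass(LinkedList.class, ArrayList.class));
    }
}
```

Because this sketch only follows superclass links, it reports AbstractList rather than the List interface. The point is that answering the question requires hierarchy information for both classes, which is what a ClassHierarchyResolver supplies when the classes are not already known to the API.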

We don’t have any branches in our simple test case but real applications will have branches, so this is optional for now but you should add it if you want to try the optimiser on real applications.

As we’re reading classes from a jar file, we can provide an implementation of ClassHierarchyResolver that reads classes from the jar file we’re optimising by delegating to a resource-parsing resolver.

public static class JarClassHierarchyResolver implements ClassHierarchyResolver {
 private final ClassHierarchyResolver resourceClassHierarchyResolver;

 public JarClassHierarchyResolver(JarFile jarFile) {
  this.resourceClassHierarchyResolver = ClassHierarchyResolver
   .ofResourceParsing(
    classDesc -> {
     var desc = classDesc.descriptorString();
     // Remove the L and ; from the descriptor
     // e.g. Ljava/lang/Object; -> java/lang/Object
     var internalName = desc
           .substring(1, desc.length() - 1);
     var jarEntry = jarFile
           .getJarEntry(internalName + ".class");

     // Class not found
     if (jarEntry == null) return null;

     try {
       return jarFile.getInputStream(jarEntry);
     } catch (IOException e) {
       // Error reading class
       return null;
     }
    });
 }
 @Override
 public ClassHierarchyInfo getClassInfo(ClassDesc classDesc) {
  return resourceClassHierarchyResolver.getClassInfo(classDesc);
 }
}

We’ll need to create an instance of the resolver in the optimizeJar method and pass it to the optimizeClass method:

private static void optimizeJar(File input, File output) {
... 
 var resolver = ClassHierarchyResolver
   .defaultResolver()
   .orElse(new JarClassHierarchyResolver(jarFile))
   .cached();
...

And we use it in optimizeClass by passing it as an option to the ClassFile.of method:

return ClassFile
  .of(NEW_POOL, ClassHierarchyResolverOption.of(resolver))
  .transform(classModel, transformingMethods(
...

Creating the peephole

Our optimiser is a peephole optimiser which means that we need to create a peephole window of some size that slides through the code elements. We’ll look for patterns in the window and decide if we want to replace them, remove them, or keep them.

We’ll create a window by iterating through the code elements and creating a fixed size array containing the current element and the next 4 elements. This gives us a window size of 5 which can be adjusted according to the length of the patterns that are to be matched but usually the window size for peephole optimisations should be small.

The following method implements the sliding window but does not yet apply any optimisations:

private static void optimizeCodeAttribute(CodeAttribute codeAttribute, CodeBuilder codeBuilder) {
 var elements = codeAttribute.elementList();
 var windowSize = 5;
 var currentIndex = 0;

 while (currentIndex < elements.size()) {
  // Create a fixed size window with up to 
  // windowSize elements and the remainder nulls.
  var window = new CodeElement[windowSize];
  for (int i = 0; i < windowSize && currentIndex + i < elements.size(); i++) {
   window[i] = elements.get(currentIndex + i);
  }

  // TODO: apply optimisations on the window here
 
  // No optimisations, so continue to the next element.
  codeBuilder.accept(elements.get(currentIndex++));
 }
}
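To see what the windows actually contain, here is a standalone sketch of the same windowing logic applied to a list of mnemonic strings instead of CodeElement objects. The WindowDemo class and windowAt method are hypothetical names for illustration, and a window size of 3 is used only to keep the output short.

```java
import java.util.Arrays;
import java.util.List;

final class WindowDemo {
    // Builds a fixed-size window starting at index; slots past the end of the list stay null.
    static String[] windowAt(List<String> elements, int index, int windowSize) {
        String[] window = new String[windowSize];
        for (int i = 0; i < windowSize && index + i < elements.size(); i++) {
            window[i] = elements.get(index + i);
        }
        return window;
    }

    public static void main(String[] args) {
        var code = List.of("aload_0", "arraylength", "iconst_0", "iadd");
        for (int i = 0; i < code.size(); i++) {
            System.out.println(Arrays.toString(windowAt(code, i, 3)));
        }
        // prints:
        // [aload_0, arraylength, iconst_0]
        // [arraylength, iconst_0, iadd]
        // [iconst_0, iadd, null]
        // [iadd, null, null]
    }
}
```

The trailing nulls are convenient rather than a problem: an instanceof check on a null slot is simply false, so patterns that would run past the end of the method never match.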

Don’t forget to update the optimizeClass method to call the optimizeCodeAttribute method:

...
methodBuilder.withCode(codeBuilder -> {
  optimizeCodeAttribute(codeAttribute, codeBuilder);
});
...

If you run the optimiser now, you’ll again see that no transformations occurred: the output will be the same as the input since we simply call codeBuilder.accept with all the original code elements.

Let’s finally implement our first optimisation!

Your first peephole optimisation

At the beginning we introduced the following sequence of instructions where the addition of integer zero is redundant:

iload_0
iconst_0
iadd

There are also other ways we can push the zero integer onto the stack:

ldc 0
bipush 0
sipush 0

In the Class-File API model, all of these instructions are implementations of the ConstantInstruction interface which means we can implement the optimisation for all of these in the same way.

To implement the peephole optimisation we need to check for two consecutive instructions:

A ConstantInstruction with constant value 0
An iadd instruction

The first operand of the addition (pushed by the bytecode instruction before these two) does not matter: that’s the value that we’re adding zero to, so the optimisation can simply remove these two consecutive instructions.

The peephole window is an array and we can check the first and second elements using instanceof, taking advantage of pattern matching to check the constant value and the opcode of the instructions:

if (window[0] instanceof ConstantInstruction c 
      && c.constantValue().equals(0) &&
    window[1] instanceof Instruction i
      && i.opcode() == IADD
   ) {
   // Skip the two matched elements
   // and emit no new elements.
   currentIndex += 2;
   continue;
}

The optimisation does not require emitting any replacement instructions; we simply skip the two matched instructions so that they are not emitted.

If you now run the optimiser on the test input, you will see that two instructions are removed:

$ java --enable-preview Optimizer.java input.jar output.jar
$ java -jar output.jar foo bar
The length of the arguments array is 2
$ diff -w <(unzip -p input.jar Test.class > /tmp/Test1.class && javap -c /tmp/Test1.class  | sed -E 's/#[0-9]+/#/g;s/^[[:space:]]*[0-9]+: //') <(unzip -p output.jar Test.class > /tmp/Test2.class && javap -c /tmp/Test2.class  | sed -E 's/#[0-9]+/#/g;s/^[[:space:]]*[0-9]+: //')
15,16d14
< iconst_0
< iadd

Congratulations, you’ve implemented your first peephole optimisation and saved 2 bytes!
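If you want to track savings across a whole jar rather than eyeballing javap output, one option is to sum the uncompressed size of the class entries before and after. The following is a minimal sketch using only java.util.jar; JarClassSize and totalClassBytes are hypothetical names. Keep in mind that the totals will also reflect other changes the optimiser makes to a class file, such as dropped debug information, not just removed instructions.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

final class JarClassSize {
    // Sums the uncompressed size in bytes of every .class entry in the jar.
    static long totalClassBytes(Path jar) throws IOException {
        long total = 0;
        try (var jarFile = new JarFile(jar.toFile())) {
            var entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.getName().endsWith(".class")) {
                    try (var in = jarFile.getInputStream(entry)) {
                        total += in.readAllBytes().length;
                    }
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java JarClassSize.java <input.jar> <output.jar>");
            return;
        }
        long before = totalClassBytes(Path.of(args[0]));
        long after = totalClassBytes(Path.of(args[1]));
        System.out.println("before=" + before + " after=" + after + " saved=" + (before - after));
    }
}
```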

A StringBuilder optimisation

We’ll implement one more peephole optimisation: when multiple StringBuilder.append invocations with constant strings appear sequentially we will merge them. For example, given the following Java code:

stringBuilder.append("foo").append("bar");

We can remove the second append call by merging the two constant strings:

stringBuilder.append("foobar");

This peephole optimisation is a bit more complicated than the previous one:

We need to match four instructions in the window:
- A constant string instruction
- A StringBuilder.append call
- A second constant string instruction
- A second StringBuilder.append call
We also need to emit a replacement instruction that loads the new concatenated string
Strings in the constant pool have a maximum size of 65535 bytes: we need to check that combining two shorter strings together does not exceed this limit

In Java bytecode, the instructions to match look like this:

ldc "foo"
invokevirtual java/lang/StringBuilder#append(Ljava/lang/String;)Ljava/lang/StringBuilder;
ldc "bar"
invokevirtual java/lang/StringBuilder#append(Ljava/lang/String;)Ljava/lang/StringBuilder;

And the replacement would simply be:

ldc "foobar"
invokevirtual java/lang/StringBuilder#append(Ljava/lang/String;)Ljava/lang/StringBuilder;

This replaces the four original instructions with just two instructions.

The peephole optimisation can be implemented in a similar way as the previous optimisation, using instanceof with pattern matching. Notice the use of the codeBuilder to emit new instructions for the newly created concatenated constant string and invoke instruction.

if (window[0] instanceof ConstantInstruction c1 
      && c1.constantValue() instanceof String s1 &&
    window[1] instanceof InvokeInstruction i1 &&
    i1.owner().asSymbol().equals(ClassDesc.of("java.lang.StringBuilder")) &&
    i1.method().name().equalsString("append") &&
    i1.typeSymbol().equals(MethodTypeDesc.of(ClassDesc.of("java.lang.StringBuilder"), ClassDesc.of("java.lang.String"))) &&
    window[2] instanceof ConstantInstruction c2 
      && c2.constantValue() instanceof String s2 &&
    window[3] instanceof InvokeInstruction i2 &&
    i2.owner().equals(i1.owner()) && i1.method().equals(i2.method()) && i1.type().equals(i2.type())
) {
 var concat = s1 + s2;

 // Emit the concatenated string constant, if it fits.
 if (concat.getBytes(UTF_8).length <= 65535) {
   codeBuilder
      .ldc(concat)
      .invokevirtual(i1.owner().asSymbol(), i1.method().name().stringValue(), i1.typeSymbol());

   // Skip the four matched instructions.
   currentIndex += 4;
   continue;
 }
}
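One caveat about the size check: class files store string constants in modified UTF-8, which is not identical to standard UTF-8. The NUL character takes two bytes instead of one, and each supplementary character is stored as two three-byte surrogates (six bytes) instead of four bytes, so counting standard UTF-8 bytes can underestimate the stored length. A stricter check could compute the modified UTF-8 length directly, as in this sketch; ModifiedUtf8 and encodedLength are hypothetical helper names.

```java
import java.nio.charset.StandardCharsets;

final class ModifiedUtf8 {
    // Number of bytes the string occupies when encoded as modified UTF-8,
    // the encoding used for CONSTANT_Utf8 entries in a class file.
    static int encodedLength(String s) {
        int length = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                length += 1;
            } else if (c <= 0x07FF) {
                length += 2; // U+0080..U+07FF, and U+0000 which is never a single zero byte
            } else {
                length += 3; // U+0800..U+FFFF, including each half of a surrogate pair
            }
        }
        return length;
    }

    public static void main(String[] args) {
        String s = "a\u0000\uD83D\uDE00"; // 'a', NUL, and one supplementary character
        System.out.println(encodedLength(s));                            // prints 9
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);   // prints 6
    }
}
```

With such a helper, the condition in the optimisation would compare encodedLength(concat) against 65535 instead of the standard UTF-8 byte count. For strings made only of ordinary non-NUL characters in the Basic Multilingual Plane, the two counts agree.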

If you now run the optimiser on the test input, you will see that five instructions are removed and one instruction is added:

$ java --enable-preview Optimizer.java input.jar output.jar
$ java -jar output.jar foo bar
The length of the arguments array is 2
$ diff -w <(unzip -p input.jar Test.class > /tmp/Test1.class && javap -c /tmp/Test1.class  | sed -E 's/#[0-9]+/#/g;s/^[[:space:]]*[0-9]+: //') <(unzip -p output.jar Test.class > /tmp/Test2.class && javap -c /tmp/Test2.class  | sed -E 's/#[0-9]+/#/g;s/^[[:space:]]*[0-9]+: //')
7,9c7
< ldc           #                 // String The length
< invokevirtual #            // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
< ldc           #                 // String  of the
---
> ldc           #                 // String The length of the
15,16d12
< iconst_0
< iadd

Peephole optimisations are often applied multiple times because some optimisations enable other optimisations. For example, in the test input after one optimisation pass we end up with the opportunity for another constant string append optimisation:

ldc "The length of the"
invokevirtual java/lang/StringBuilder#append(Ljava/lang/String;)Ljava/lang/StringBuilder;
ldc " arguments array is "
invokevirtual java/lang/StringBuilder#append(Ljava/lang/String;)Ljava/lang/StringBuilder;

You can make a simple change in the optimizeJar method to apply the optimisations multiple times to a class, by calling optimizeClass again with the optimised bytes from the previous call:

...
var optimizedBytes = originalBytes;
for (int pass = 0; pass < numberOfPasses; pass++) {
  optimizedBytes = optimizeClass(resolver, optimizedBytes);
}
outputStream.write(optimizedBytes);
...
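Instead of a fixed number of passes, another option is to repeat until a pass no longer changes anything, with an upper bound as a safety net. The sketch below shows that loop in isolation, with a stand-in pass in place of optimizeClass; FixedPointPasses and runUntilStable are hypothetical names, and the approach assumes a pass returns byte-for-byte identical output when no pattern matches.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

final class FixedPointPasses {
    // Applies the pass repeatedly until its output stops changing or maxPasses is reached.
    static byte[] runUntilStable(byte[] input, UnaryOperator<byte[]> pass, int maxPasses) {
        byte[] current = input;
        for (int i = 0; i < maxPasses; i++) {
            byte[] next = pass.apply(current);
            if (Arrays.equals(next, current)) {
                break; // nothing changed, so further passes would not help
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for a real optimisation pass: drops one trailing zero byte per call.
        UnaryOperator<byte[]> dropTrailingZero = bytes ->
            bytes.length > 0 && bytes[bytes.length - 1] == 0
                ? Arrays.copyOf(bytes, bytes.length - 1)
                : bytes;

        byte[] result = runUntilStable(new byte[] {1, 2, 0, 0, 0}, dropTrailingZero, 10);
        System.out.println(Arrays.toString(result)); // prints [1, 2]
    }
}
```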

There are many other peephole optimisations that can be applied to StringBuilder calls, arithmetic instructions and more: try adding some new optimisations yourself!

Next steps

Peephole optimisation demonstrates that sometimes the simplest approaches can yield impressive results. By focusing on local patterns, we can achieve meaningful improvements without complex whole-program analysis. While these local optimisations are valuable on their own, they become even more powerful when combined with global optimisations like method inlining. For example, when a method is inlined, it often creates new opportunities for peephole optimisation that weren’t visible before.

The new Class-File API makes implementing peephole optimisations, and code transformations generally, for Java bytecode more straightforward than ever, providing a modern interface for bytecode manipulation that is part of the Java standard library.

As a next step, try extending the optimiser with your own patterns and optimisations, for example try implementing more arithmetic optimisations where constants are involved.

You can find the full code for the optimiser over on GitHub.

Author: James Hamilton

I’m a compiler engineer working at Guardsquare on JVM/Android related tools & libraries including ProGuardCORE, ProGuard and DexGuard.
