Own your heap: Iterate class instances with JVMTI

Today I want to talk about a side of Java that most of us don’t see or use every day: to be more exact, low-level bindings, some native code, and how to perform a little magic. We won’t get to the true source of magic on the JVM, but performing a few small miracles is within reach of a single post.

I spend my days researching, writing and coding on the RebelLabs team at ZeroTurnaround, a company that creates tools for Java developers that mostly run as javaagents. If you want to enhance the JVM without rewriting it, or get any real power over it, you often have to dive into the beautiful world of Java agents. These come in two flavors: javaagents written in Java and native ones. In this post we’ll concentrate on the latter.

Note: this GeeCON Prague presentation by Anton Arhipov, who is an XRebel product lead, is a good starting point for learning about javaagents written entirely in Java: Having fun with Javassist.

In this post we’ll create a small native JVM agent, explore how to expose native methods to the Java application and find out how to make use of the Java Virtual Machine Tool Interface.

If you’re looking for a practical takeaway from the post, here’s a spoiler: we’ll be able to count how many instances of a given class are present on the heap.

Imagine that you are Santa’s trustworthy hacker elf and the big red has the following challenge for you:
Santa: My dear Hacker Elf, could you write a program that will point out how many Thread objects are currently hidden in the JVM’s heap?
Another elf that doesn’t like to challenge himself would answer: It’s easy and straightforward, right?


return Thread.getAllStackTraces().size();

But what if we want to over-engineer our solution to answer this question about any given class? Say we want to implement the following interface:


public interface HeapInsight {
    int countInstances(Class klass);
}

Yeah, that’s impossible, right? What if you receive String.class as an argument? Have no fear, we’ll just have to go a bit deeper into the internals of the JVM. One thing available to JVM library authors is JVMTI, the Java Virtual Machine Tool Interface. It was added ages ago, and many tools that seem magical make use of it. JVMTI offers two things:

  • a native API
  • an instrumentation API to monitor and transform the bytecode of classes loaded into the JVM.

For the purpose of our example, we’ll need access to the native API. What we want to use is the IterateThroughHeap function, which lets us provide a custom callback to execute for every object of a given class.

First of all, let’s make a native agent that will load and echo something to make sure that our infrastructure works.

A native agent is something written in C/C++ and compiled into a dynamic library that the JVM loads before any Java code runs. If you’re not proficient in C++, don’t worry, plenty of elves aren’t, and it won’t be hard. My approach to C++ includes two main tactics: programming by coincidence and avoiding segfaults. Since I managed to write and comment the example code for this post, we can go through it together. Note: consider the paragraph above a disclaimer; don’t put this code into any environment of value to you.

Here’s how you create your first native agent:


#include <iostream>
#include <jvmti.h>

using namespace std;

// The JVM calls this function when it loads the agent library.
JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *jvm, char *options, void *reserved)
{
    cout << "A message from my SuperAgent!" << endl;
    return JNI_OK;
}

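The important part of this code is that we declare a function called Agent_OnLoad, following the JVMTI documentation for dynamically linked agents: the JVM looks up a function with exactly this name and signature in the library and calls it when the agent is loaded. Any return value other than JNI_OK (zero) signals an error and terminates the VM.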

Save the file as, for example, native-agent.cpp, and let’s see about turning it into a library.

I’m on OS X, so I compile it with clang++, which, unlike plain clang, also links the C++ standard library that cout comes from. To save you a bit of googling, here’s the full command (adjust the JDK path to match your installation):


clang++ -shared -undefined dynamic_lookup -o agent.so -I /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/include/ -I /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/include/darwin native-agent.cpp

This creates an agent.so file that is a library ready to serve us. To test it, let’s create a dummy hello world Java class.


package org.shelajev;

public class Main {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}

When you run it with an -agentpath option pointing to agent.so, you should see the following output:


java -agentpath:agent.so org.shelajev.Main
A message from my SuperAgent!
Hello World!
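The -agentpath option also accepts an options string after an equals sign (-agentpath:agent.so=...), and the JVM hands that string to Agent_OnLoad as its options argument without interpreting it. Below is a minimal sketch of a parser, assuming we adopt the comma-separated key=value style that the JDWP agent uses (-agentlib:jdwp=transport=dt_socket,server=y). The helper parseAgentOptions and the option names in the example are hypothetical, and it treats a NULL pointer as "no options" to be safe.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: splits an agent options string such as
// "verbose,class=java.lang.String" into key/value pairs.
// JVMTI passes the options through as one raw string; the
// comma/equals format is our own convention, not part of JVMTI.
std::map<std::string, std::string> parseAgentOptions(const char* options) {
    std::map<std::string, std::string> result;
    if (options == nullptr) {
        return result; // treat a missing string as "no options"
    }
    std::stringstream stream(options);
    std::string pair;
    while (std::getline(stream, pair, ',')) {
        if (pair.empty()) {
            continue; // skip empty entries such as in "a,,b"
        }
        std::string::size_type eq = pair.find('=');
        if (eq == std::string::npos) {
            result[pair] = ""; // bare flag, e.g. "verbose"
        } else {
            result[pair.substr(0, eq)] = pair.substr(eq + 1);
        }
    }
    return result;
}
```

Inside Agent_OnLoad you would call parseAgentOptions(options), and a run could then look like java -agentpath:agent.so=verbose,class=java.lang.String org.shelajev.Main.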

Great job! We now have everything in place to make it actually useful. First of all, we need an instance of jvmtiEnv. We obtain it from the JavaVM *jvm pointer that the JVM passes to Agent_OnLoad, but our native method will run later, outside of Agent_OnLoad, so we have to store the jvmtiEnv somewhere globally accessible. We do that by declaring a global struct to hold it.


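#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <jvmti.h>

using namespace std;

// Holds state that must outlive Agent_OnLoad.
typedef struct {
    jvmtiEnv *jvmti;
} GlobalAgentData;

static GlobalAgentData *gdata;

JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *jvm, char *options, void *reserved)
{
    jvmtiEnv *jvmti = NULL;
    jvmtiCapabilities capa;
    jvmtiError error;

    // put a jvmtiEnv instance at jvmti.
    jint result = jvm->GetEnv((void **) &jvmti, JVMTI_VERSION_1_1);
    if (result != JNI_OK) {
        printf("ERROR: Unable to access JVMTI!\n");
        return JNI_ERR; // a non-zero return value makes the JVM abort start-up
    }

    // add a capability to tag objects, which IterateThroughHeap requires
    (void)memset(&capa, 0, sizeof(jvmtiCapabilities));
    capa.can_tag_objects = 1;
    error = jvmti->AddCapabilities(&capa);
    if (error != JVMTI_ERROR_NONE) {
        printf("ERROR: Unable to add the can_tag_objects capability (%d)!\n", (int) error);
        return JNI_ERR;
    }

    // store jvmti in a global data
    gdata = (GlobalAgentData*) malloc(sizeof(GlobalAgentData));
    gdata->jvmti = jvmti;
    return JNI_OK;
}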

We also updated the code to add the capability to tag objects, which IterateThroughHeap requires. The preparations are done: the JVMTI instance is initialized and available to us. Let’s offer it to our Java code via JNI.

JNI stands for Java Native Interface, a standard way to call native code from a Java application. The Java part is pretty straightforward: add the following native countInstances method declaration to the Main class:


package org.shelajev;

public class Main {
    public static void main(String[] args) {
        System.out.println("Hello World!");
        int a = countInstances(Thread.class);
        System.out.println("There are " + a + " instances of " + Thread.class);
    }

    private static native int countInstances(Class klass);
}

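To accommodate the native method, we must change our native agent code. Note that the Java side doesn’t call System.loadLibrary: the agent library is already loaded, and HotSpot also searches agent libraries when it resolves native methods. I’ll explain the code in a minute, but for now add the following function definitions to native-agent.cpp: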


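// Called by IterateThroughHeap once for every reported object.
extern "C"
jint JNICALL objectCountingCallback(jlong class_tag, jlong size, jlong* tag_ptr, jint length, void* user_data)
{
    int* count = (int*) user_data;
    *count += 1;
    return JVMTI_VISIT_OBJECTS;
}

// Implements Main.countInstances(Class); the name follows the JNI naming convention.
extern "C"
JNIEXPORT jint JNICALL Java_org_shelajev_Main_countInstances(JNIEnv *env, jclass thisClass, jclass klass)
{
    int count = 0;
    jvmtiHeapCallbacks callbacks;
    (void)memset(&callbacks, 0, sizeof(callbacks));
    callbacks.heap_iteration_callback = &objectCountingCallback;
    // heap_filter 0: no tag-based filtering; klass: only report instances of this class
    jvmtiError error = gdata->jvmti->IterateThroughHeap(0, klass, &callbacks, &count);
    if (error != JVMTI_ERROR_NONE) {
        return -1; // report the failure to the Java side
    }
    return count;
}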

Java_org_shelajev_Main_countInstances is the more interesting one. Its name follows the JNI naming convention: it starts with Java_, followed by the fully qualified class name with underscores instead of dots, then an underscore and the method name from the Java code. Also, don’t forget the JNIEXPORT declaration, which exports the function from the library so that the JVM can find it, and extern "C", which keeps the C++ compiler from mangling the function name.
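The underscore-separated name is the simple case. The JNI specification also defines escapes for characters that can’t appear in a C identifier: _1 for an underscore, _2 for ;, _3 for [, and _0xxxx (lowercase hex) for other characters, such as the $ in nested class names. Below is a sketch of those rules, assuming plain ASCII input; jniMangle, jniShortName and jniLongName are hypothetical helpers, not part of any JDK API. The long form (the short name, then __ and the mangled argument signature) is what the JVM tries after the short name, so that overloaded native methods can be told apart.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: escapes one part of a JNI native method name.
// Assumes ASCII input; '.' is accepted as a package separator for convenience.
std::string jniMangle(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
            out += static_cast<char>(c);
        } else if (c == '.' || c == '/') {
            out += '_';   // package separator
        } else if (c == '_') {
            out += "_1";
        } else if (c == ';') {
            out += "_2";
        } else if (c == '[') {
            out += "_3";
        } else {
            char buf[8];
            // any other character becomes _0 plus four lowercase hex digits, e.g. '$' -> _00024
            std::snprintf(buf, sizeof(buf), "_0%04x", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Short name: Java_ + mangled class name + _ + mangled method name.
std::string jniShortName(const std::string& className, const std::string& method) {
    return "Java_" + jniMangle(className) + "_" + jniMangle(method);
}

// Long name: short name + __ + mangled argument part of the method descriptor.
std::string jniLongName(const std::string& className, const std::string& method,
                        const std::string& argSignature) {
    return jniShortName(className, method) + "__" + jniMangle(argSignature);
}
```

For the countInstances method in this post, whose descriptor is (Ljava/lang/Class;)I, jniLongName("org.shelajev.Main", "countInstances", "Ljava/lang/Class;") yields Java_org_shelajev_Main_countInstances__Ljava_lang_Class_2, which would also be found by the JVM.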

Inside the Java_org_shelajev_Main_countInstances we specify the objectCountingCallback function as a callback and call IterateThroughHeap with the parameters that came from the Java application.

Note that our native method is static, so the arguments in the C counterpart are:

JNIEnv *env, jclass thisClass, jclass klass

For an instance method, they would be a bit different:

JNIEnv *env, jobject thisInstance, jclass klass

where thisInstance points to the this object of the Java method call.

The signature of objectCountingCallback comes directly from the documentation (it matches jvmtiHeapIterationCallback), and the body does nothing more than increment the int that we pass in through user_data.
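To see the callback contract in isolation, here is a simulation that compiles without the JDK headers. iterateFakeHeap is a hypothetical stand-in for IterateThroughHeap, and jlong, jint and the two visit flags are local stand-ins, not the real jvmti.h definitions. The sketch shows that user_data can carry a whole struct rather than a single int (here a count plus a running total of the size argument, which JVMTI reports in bytes), and that a callback can stop the walk early, the way returning JVMTI_VISIT_ABORT does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local stand-ins so this sketch compiles without jvmti.h.
typedef std::int64_t jlong;
typedef std::int32_t jint;
const jint kVisitObjects = 0x1; // stand-in for JVMTI_VISIT_OBJECTS
const jint kVisitAbort = 0x2;   // stand-in for JVMTI_VISIT_ABORT

// Same parameter list as jvmtiHeapIterationCallback.
typedef jint (*HeapIterationCallback)(jlong class_tag, jlong size, jlong* tag_ptr,
                                      jint length, void* user_data);

// Hypothetical object record in a pretend heap.
struct FakeObject {
    jlong size;  // object size in bytes
    jint length; // array length, or -1 for non-array objects
    jlong tag;   // object tag, 0 means untagged
};

// Hypothetical stand-in for IterateThroughHeap: calls the callback once per
// object and stops as soon as the callback returns the abort flag.
void iterateFakeHeap(std::vector<FakeObject>& heap, HeapIterationCallback callback,
                     void* user_data) {
    for (FakeObject& obj : heap) {
        jint flags = callback(0, obj.size, &obj.tag, obj.length, user_data);
        if (flags & kVisitAbort) {
            return;
        }
    }
}

// State passed through user_data.
struct HeapStats {
    int count;        // objects visited so far
    jlong totalBytes; // sum of the reported sizes
    int limit;        // stop after this many objects; 0 means no limit
};

jint statsCallback(jlong class_tag, jlong size, jlong* tag_ptr, jint length,
                   void* user_data) {
    assert(user_data != nullptr);
    HeapStats* stats = static_cast<HeapStats*>(user_data);
    stats->count += 1;
    stats->totalBytes += size;
    if (stats->limit > 0 && stats->count >= stats->limit) {
        return kVisitAbort; // enough objects seen, end the iteration early
    }
    return kVisitObjects;
}
```

In the real agent, a HeapStats pointer would simply replace &count as the last argument of IterateThroughHeap, and the callback would be declared as jint JNICALL like objectCountingCallback.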

Boom! All done! Thank you for your patience. If you’re still reading this, you’re ready to test all the code above.

Compile the native agent again and run the Main class. This is what I see:


java -agentpath:agent.so org.shelajev.Main
Hello World!
There are 7 instances of class java.lang.Thread

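If I add a Thread t = new Thread(); line to the main method, I see 8 instances on the heap. Sounds like it actually works. Your thread count will almost certainly be different; don’t worry, that’s normal. IterateThroughHeap reports every Thread object on the heap, including the JVM’s own bookkeeping threads (such as the JIT compiler threads), threads that were never started, like our t, and even unreachable objects that haven’t been garbage-collected yet.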

Now, if I want to count the number of String instances on the heap, it’s just a matter of changing the argument class. A truly generic solution; Santa would be happy, I hope.

Oh, if you’re interested, it finds 2423 instances of String for me, a pretty high number for such a small application. Also,


return Thread.getAllStackTraces().size();

gives me 5, not 8, because it only reports live threads, so it excludes the hidden bookkeeping threads and our unstarted t! Talk about trivial solutions, eh?

Now that you’re armed with this knowledge, I’m not saying you’re ready to write your own JVM monitoring or enhancing tools, but it is definitely a start.

In this post we went from zero to writing a native Java agent that compiles, loads and runs successfully. It uses JVMTI to obtain insight into the JVM that isn’t accessible otherwise. The corresponding Java code calls the native library and interprets the result.
This is often the approach the most miraculous JVM tools take, and I hope that some of the magic has been demystified for you.

What do you think, does it clarify agents for you? Let me know! Find me and chat with me on twitter: @shelajev.

This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!