(Part 1 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al

For a number of months I have been meaning to review a cache of articles and videos on topics like Performance tuning, JVM, GC in Java, Mechanical Sympathy, etc., and I finally took the time to do it. Maybe this was the point in my intellectual progress at which I needed to do such a thing!
Thanks to Attila-Mihaly for giving me the opportunity to write a post for his yearly newsletter, the Java Advent Calendar; a review of various Java-related topics fits the bill! The selection of videos and articles is purely random, based on the order in which they came to my knowledge. My hidden agenda is mainly to go through them to understand and broaden my own knowledge, and at the same time to share any insights with others along the way.

I’ll be covering three reviews of talks by Attila Szegedi (1 talk) and Ben Evans (2 talks). They speak on the subject of Java Performance and the GC. The first talk, by Attila, covers a lot of his experience as an Engineer at Twitter, so it is packed with information drawn from first-hand experience of production systems in the field. Making use of thin objects instead of fat ones is one of the buzzwords in his talk.

In his two talks Ben covers Performance, the JVM and GC in great depth. He points out people’s misconceptions about Performance, the JVM and GC, and the fact that people don’t have certain run-time flags enabled in production. He also explains how the underlying machinery works and why it works the way it does, how efficient that machinery is, and what to do (and not to do) to get good throughput out of it.

Here I go with my commentary. I decided to start with Attila Szegedi’s talk as I quite liked the title…


Everything I Ever Learned About JVM Performance Tuning @Twitter by Attila Szegedi
(video & slides)

At the time of the talk Attila worked for Twitter, where he learnt a lot about the internals of the JVM and the Java language itself; Twitter is an organisation where tuning and optimising JVMs and keeping latency low are de facto practices.
He covers interesting topics like:
– contributors to latency
– finished code not ready for production
– areas of performance tuning (primarily memory tuning and lock contention tuning)
– Memory footprint tuning (OOME, inefficient tuning, FAT data)
– FAT data: a term coined by him, and how to resolve the issues it creates (pretty in-depth and interesting). Learn how many bytes the various data types occupy in Java / JVM languages.
Some deep-dive topics are covered, with compressed object pointers being one of the suggestions (including a pitfall). Certain types in Scala 2.7.7 are inefficient, as revealed by a JVM profiler. Do not use Thrift, as it is not a friend of low latency: its objects are heavy (it adds between 52 and 72 bytes of overhead per object), it does not support 32-bit floats, etc. Be careful with thread locals: they stick around and use more resources than expected.
Attila shares his insight into the concept of the performance triangle. GC is the biggest threat to performance on the JVM. The old gen uses the concurrent collector (ConcCollector), while the new gen goes through the STW process; he also lists a number of throughput and low-pause collectors.
Improve GC by taking advantage of the adaptive sizing policy and giving it a target to work towards. Use a throughput collector with or without the adaptive policy and benchmark the results. He takes us through the various -XX:+Print… flags and explains their uses. Keep fragmentation low and avoid full GC stops. There is lots of detail on the workings of the GC and what can be done to improve it (tuning both new and old gens).
Latency that is not GC-related: thread coordination optimisation. Barriers and half-barriers can be used when working with threads to improve latency, along with some tricks when using atomic values and AtomicReferences. The Cassandra slab allocator helps efficiency and performance; do not write your own memory manager. Attila is no longer a fan of soft references: great in theory but not in practice, as more GC cycles are needed to clear them!

Conclusion: know your code, as it is often the root of your problems; frameworks can many a time be the cause of performance issues. Lots can be done to squeeze performance out of the programs you write if you know how best to use the fundamental building blocks and data structures of your development environment. It’s a hard game to maintain the best throughput and get the best performance out of the JVM.

— Recommend watching the video, lots more covered than the synopsis above  —


9 Fallacies of Java Performance by Ben Evans (blog)

In this article Ben goes about busting old myths and assumptions about Java, its performance, GC, etc. The areas covered are:
1) Java is slow
2) A single line of Java means anything in isolation
3) A micro-benchmark means what you think it does
4) Algorithmic slowness is the most common cause of performance problems
5) Caching solves everything
6) All apps need to be concerned about Stop-The-World
7) Hand-rolled Object Pooling is appropriate for a wide range of apps
8) CMS is always a better choice of GC than Parallel Old
9) Increasing the heap size will solve your memory problem
– JIT compiled code is as fast as C++ in many cases
– JIT compiler can optimize away dead and unused code, even on the basis of profiling data. In JVMs like JRockit, the JIT can decompose object operations.
– For best results don’t prematurely optimize, instead correct your performance hot spots. 
– Richard Feynman once said: “The first principle is that you must not fool yourself – and you are the easiest person to fool” – something to keep in mind when thinking of writing Java micro-benchmarks.
The point is that the ideas people have in their minds about Java are often the opposite of reality. Ben is basically suggesting that the masses revisit these ideas and draw conclusions based on facts, not on assumptions or old beliefs.
– GC, database access, misconfiguration, etc. are more likely to cause application slowness than algorithms.
– Measure, don’t guess! Use empirical production data to uncover the true causes of performance problems.
– Don’t just add a cache to redirect the problem elsewhere and add complexity to the system, but collect basic usage statistics (miss rate, hit rate, etc.) to prove that the caching layer is actually adding value.
– If the users haven’t complained or you are not in the low-latency stack, don’t worry about Stop-The-World pauses (circa 200 ms, depending on the heap size).
– Object pooling is very difficult and should only be used when GC pauses are unacceptable, and intelligent attempts at tuning and refactoring have been unable to reduce pauses to an acceptable level.
– Before deciding that CMS is your correct GC strategy, you should first determine that STW pauses from Parallel Old are unacceptable and can’t be tuned. Ben stresses: be sure that all metrics are obtained on a production-equivalent system.
– Understanding the dynamics of object allocation and lifetime before changing heap size or tuning other parameters is essential. Acting without measuring can make matters worse. The tenuring distribution information from the garbage collector is especially important here.
Conclusion: the GC subsystem has incredible potential for tuning and for producing data to guide that tuning. Use a tool to analyse the logs: either handwritten scripts with some graph generation, or a visual tool such as the (open-source) GCViewer or a commercial product.
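
To make the caching point above concrete, here is a minimal sketch of a read-through cache that records the basic usage statistics Ben mentions (hit rate, miss rate). The class name CountingCache and its methods are my own illustrative assumptions, not something from the article, and the sketch is deliberately not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical read-through cache that counts hits and misses (not thread-safe). */
public class CountingCache<K, V> {
    private final Map<K, V> store = new HashMap<>();
    private final Function<K, V> loader; // the slow lookup the cache is meant to avoid
    private long hits;
    private long misses;

    public CountingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        V value = store.get(key);
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        value = loader.apply(key);
        if (value != null) {
            store.put(key, value); // null results are not cached in this sketch
        }
        return value;
    }

    public long hits() { return hits; }

    public long misses() { return misses; }

    /** Fraction of lookups served from the cache; 0.0 before any lookup. */
    public double hitRate() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

If the hit rate stays low under production-like traffic, the cache is mostly adding complexity rather than value, which is exactly the situation the article warns about.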


Visualizing Java GC by Ben Evans (video & slides)

There are misunderstandings and shortcomings in people’s understanding of GC. It’s not just Mark & Sweep. Many run-times these days have GC! There are two schools of thought: GC and reference counting. Humans make mistakes, unlike machines, and memory management requires high levels of precision. True GC is incredibly efficient, whereas reference counting is expensive; GC is described as having been pioneered by Java (comments from +Gil Tene: On the correctness side, I’d be careful saying “pioneered by Java” for anything in GC. Java’s GC semantics are fairly classic, and present no new significant problems that predating environments did not. Most core GC techniques used in JVMs were researched and well known in other environments (smalltalk, lisp, etc.) and are also available in other Runtimes. While it is fair to say that from a practical perspective, JVMs tend to have the most mature GC mechanisms these days, that’s because Java is a natural place to apply new GC techniques that actually work. But innovation and pioneering in GC is not strongly tied to Java.)

The allocation list is where all objects are rooted from. You can’t get an accurate picture of all the objects of a running live application at any given point in time without stopping the application; that’s why we have STW (Stop-The-World) (comments from +Gil Tene: In addition, the notion that “you can’t get an accurate picture of all the objects of a running object at any given point of time of a running live application without stopping the application that’s why we have STW (Stop-The-World)!” is wrong. Concurrent marking and concurrent compaction are very real things that achieve just that without stopping the application. “Just needs some good engineering”, and “you just can’t do X” are very different things.)

Golden rules of GC
– must collect all the garbage (sensitive rule)
– must never collect a live object
(trick: but they are never created equal)

HotSpot is a C/C++/Assembly application. The heap is a contiguous block of memory with different memory pools: Young Gen, Old Gen and PermGen. Objects are created by application (mutator) threads and removed by GC. Applications are not slow due to GC all the time.
PermGen: not desirable, going away in Java 8 (known issue: causes OOMEs), to be replaced by Metaspace outside the heap (in native memory).
GC is based on ‘Weak generational hypothesis’ – objects die young, or die old – found out through empirical research. (comments from +Michael Barker: I think this statement:
“GC is based on ‘Weak generational hypothesis’ – objects die young, or die old – found out through empirical research.”
Is not correct.  I think I can guess at what you mean, but you may want to consider rewording it so that it is not misleading.  There are GC implementations in real world VMs that are not generational collectors.

comments from +Kirk Pepperdine: Indeed. the ParcPlace VM had 7 different memory spaces that have a strong resemblance to the todays generational spaces. There is Eden with two hemi spaces plus 4 other spaces for different types of long lived data.)
Re-worded version: GC in the JVM is based on ‘Weak generational hypothesis’ – objects die young, or die old – found out through empirical research.

Tenuring threshold is the number of GCs an object survives before it gets moved to the Old Gen (Tenuring space). JavaFX is bundled with jdk7u6 and up.
Source code of JavaFX Memory Visualizer written in Java replacing the Flash version  – https://github.com/kittylyst/jfx-mem – written using FlexML (FXML). An extensive explanation of how the program is written in FlexML, a nice programming language – uses the builder pattern in combination with DSL like expressions. The program models the way GC works and how objects are created, destroyed and moved about the different pools. 
List of mandatory flags, which do not have any performance impact
-verbose:gc
-Xloggc:<pathtofile>
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
All the information needed about an executing application and its GC is recorded by the above. The talk also covers basic heap sizing flags. The advice to set the heap size flags to equal values no longer applies with recent versions of the JDK. There are also more than 200 flags for the GC and VM, not including all the undocumented ones.
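
Since the talk stresses that many production systems run without these flags, here is a small sketch for checking whether a running JVM was started with GC logging switched on. It relies on the standard RuntimeMXBean.getInputArguments() call; the class name GcLoggingCheck is hypothetical, and the -Xlog:gc prefix (unified logging on JDK 9 and later) is my own addition rather than something covered in the talk.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class GcLoggingCheck {
    /** Returns true if any argument switches on GC logging (legacy flags or unified logging). */
    public static boolean enablesGcLogging(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.equals("-verbose:gc")
                    || arg.startsWith("-Xloggc:")
                    || arg.equals("-XX:+PrintGCDetails")
                    || arg.startsWith("-Xlog:gc")) { // unified logging, JDK 9+ (assumption: not from the talk)
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Only sees the arguments the JVM was started with, not flags changed afterwards.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("GC logging enabled at startup: " + enablesGcLogging(jvmArgs));
    }
}
```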
GC log files are useful for post-processing, but sometimes are not recorded correctly. MXBeans impact the running application and do not give more information than the log files.
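
For comparison with the log files, this is roughly what the MXBean route looks like: a minimal sketch using the standard java.lang.management API to read each collector’s name, collection count and accumulated collection time. The class name GcStats is hypothetical. Note how coarse these counters are next to what a GC log records.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcStats {
    /** One line per collector: name, number of collections, accumulated time in milliseconds. */
    public static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both counters return -1 when the collector does not report them.
            lines.add(String.format("%s: count=%d, timeMs=%d",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        snapshot().forEach(System.out::println);
    }
}
```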
GC log files have a general format giving information on changes in allocation, occupancy, tenuring info, collection info, etc. There is an explosion of GC log file formats and not much tooling out there. Many of the free tools provide some sort of dashboard-like output showing various GC-related metrics; the commercial versions have a better approach and more useful information in general.

Premature promotion – under pressure of creation of new objects, objects are moved directly from YG to OG without going through the Survivor spaces.
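
The interplay between the tenuring threshold and premature promotion can be sketched with a toy model. This is not HotSpot’s actual algorithm: the class TenuringModel, its promotion rule (promote once the age reaches the threshold) and the assumption that every survivor stays reachable are all simplifications of mine, meant only to show how a full Survivor space pushes young objects straight into the Old Gen.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of young-generation ageing; a simplification, not HotSpot's real behaviour. */
public class TenuringModel {
    private final int tenuringThreshold; // collections an object survives before being tenured
    private final int survivorCapacity;  // number of objects the Survivor space can hold
    private final List<Integer> survivorAges = new ArrayList<>();
    private int promoted;                // tenured after reaching the threshold
    private int prematurelyPromoted;     // pushed to Old Gen because Survivor space was full

    public TenuringModel(int tenuringThreshold, int survivorCapacity) {
        this.tenuringThreshold = tenuringThreshold;
        this.survivorCapacity = survivorCapacity;
    }

    /** Simulates one young collection in which liveInEden Eden objects are still reachable. */
    public void youngCollection(int liveInEden) {
        List<Integer> next = new ArrayList<>();
        // Existing survivors (assumed to stay live) age by one collection.
        for (int age : survivorAges) {
            int newAge = age + 1;
            if (newAge >= tenuringThreshold) {
                promoted++;
            } else if (next.size() < survivorCapacity) {
                next.add(newAge);
            } else {
                prematurelyPromoted++;
            }
        }
        // Live Eden objects have now survived their first collection.
        for (int i = 0; i < liveInEden; i++) {
            if (next.size() < survivorCapacity) {
                next.add(1);
            } else {
                prematurelyPromoted++; // no room in Survivor space: straight to Old Gen
            }
        }
        survivorAges.clear();
        survivorAges.addAll(next);
    }

    public int promoted() { return promoted; }

    public int prematurelyPromoted() { return prematurelyPromoted; }

    public int survivorCount() { return survivorAges.size(); }
}
```

With a threshold of 3 and room for two survivors, a collection that finds three live Eden objects promotes one of them prematurely, while the other two are tenured two collections later.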

Use tools, measure and don’t guess!

Conclusion: know the facts, and find out the details if they are not known, but do not guess or assume. False conceptions have at times led to assumptions and an incorrect understanding of the JVM and the GC process. Don’t just change flags or use tools; know why you are doing so and what they do. For example, switching on GC logging (with the appropriate flags enabled) does not have a visible impact on the performance of the JVM but is a boon in the medium to long run.


— Highly recommend watching the video, lots more covered than the synopsis above, Ben has explained GC in the simplest form one could, covering many important details  —

As it is not practical to review all such videos and articles, a number of them have been provided in the links below for further study. In many cases I have paraphrased or directly quoted what the authors have to say, to preserve the message and meaning they wished to convey. A follow-on to this blog post will appear in the same space under the title (Part 2 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al on 19th Dec 2013.

Thanks

Thanks to +Gil Tene, +Michael Barker, @Ryan Rawson, +Kirk Pepperdine, and +Richard Warburton for reading the post and providing useful feedback.

Feel free to post your comments below or tweet at @theNeomatrix369!

Useful resources


    This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

    Under the JVM hood – Classloaders

    By Simon Maple (@sjmaple), ZeroTurnaround Technical Evangelist

    Classloaders are a low level and often ignored aspect of the Java language among many developers. At ZeroTurnaround, our developers have had to live, breathe, eat, drink and almost get intimate with classloaders to produce the JRebel technology which interacts at a classloader level to provide live runtime class reloading, avoiding lengthy rebuilds/repackaging/redeploying cycles. Here are some of the things we’ve learnt around classloaders including some debugging tips which will hopefully save you time and potential headdesking in the future.

    A classloader is just a plain java object

    Yes, it’s nothing clever. Well, other than the system classloader in the JVM, a classloader is just a Java object! It’s an abstract class, ClassLoader, which can be extended by a class you create. Here is the (simplified) API:

    public abstract class ClassLoader {
        public Class loadClass(String name);
        protected Class defineClass(byte[] b);
        public URL getResource(String name);
        public Enumeration getResources(String name);
        public ClassLoader getParent();
    }

    Looks pretty straightforward, right? Let’s take a look method by method. The central method is loadClass, which just takes a String class name and returns the actual Class object. If you’ve used classloaders before, this is probably the most familiar method, as it’s the most used in day-to-day coding. defineClass is a final method in the JVM that takes a byte array from a file or a location on the network and produces the same outcome, a Class object.

    A classloader can also find resources from a classpath. It works in a similar way to the loadClass method. There are a couple of methods, getResource and getResources, which return a URL or an Enumeration of URLs which point to the resource which represents the name passed as input to the method.

    Every classloader has a parent; getParent returns the classloader’s parent, which is not Java inheritance related, rather a linked-list style connection. We will look into this in a little more depth later on.
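
The linked-list style connection can be seen by simply following getParent until it runs out. A minimal sketch (the class name ParentChain is hypothetical); note that getParent may return null, which some implementations use to represent the bootstrap classloader, so the loop stops there:

```java
import java.util.ArrayList;
import java.util.List;

public class ParentChain {
    /** Lists the classloader chain from the given loader up to the bootstrap loader. */
    public static List<String> of(ClassLoader loader) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        chain.add("bootstrap (represented as null)");
        return chain;
    }

    public static void main(String[] args) {
        of(ParentChain.class.getClassLoader()).forEach(System.out::println);
    }
}
```

The exact class names printed depend on the JVM version and on the container the code runs in.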

    Classloaders are lazy, so classes are only ever loaded when they are requested at runtime. Classes are loaded by the resource which invokes the class, so a class, at runtime, could be loaded by multiple classloaders depending on where they are referenced from and which classloader loaded the classes which referen… oops, I’ve gone cross-eyed! Let’s look at some code.

    public class A {
        public void doSmth() {
            B b = new B();
            b.doSmthElse();
        }
    }

    Here we have class A calling the constructor of class B within its doSmth method. Under the covers this is what is happening:

    A.class.getClassLoader().loadClass("B");

    The classloader which originally loaded class A is invoked to load the class B.

    Classloaders are hierarchical, but like children, they don’t always ask their parents

    Every classloader has a parent classloader. When a classloader is asked for a class, it will typically go straight to the parent classloader first, calling loadClass, which may in turn ask its parent, and so on. If two classloaders with the same parent are asked to load the same class, it would only be done once, by the parent. It gets very troublesome when two classloaders load the same class separately, as this can cause problems which we’ll look at later.

    When the JEE spec was designed, the web classloader was designed to work the opposite way – great. Let’s take a look at the figure below as our example.  
     


    Module WAR1 has its own classloader and prefers to load classes itself rather than delegate to its parent, the classloader scoped by App1.ear. This means different WAR modules, like WAR1 and WAR2, cannot see each other’s classes. The App1.ear module has its own classloader and is parent to the WAR1 and WAR2 classloaders. The App1.ear classloader is used by the WAR1 and WAR2 classloaders when they need to delegate a request up the hierarchy, i.e. when a class is required outside of the WAR classloader scope. Effectively the WAR classes override the EAR classes where both exist. Finally, the EAR classloader’s parent is the container classloader. The EAR classloader will delegate requests to the container classloader, but it does not do it in the same way as the WAR classloader, as the EAR classloader will actually prefer to delegate up rather than prefer local classes. As you can see, this is getting quite hairy and is different to the plain JSE class loading behaviour.

    The flat classpath

    We talked about how the system classloader looks to the classpath to find classes that have been requested. This classpath could include directories or JAR files, and the order in which they are looked through is actually dependent on the JVM you are using. There may be multiple copies or versions of the class you require on the classpath, but you will always get the first instance of the class found on the classpath. It’s essentially just a list of resources, which is why it’s referred to as flat. As a result, the classpath list can often be relatively slow to iterate through when looking for a resource.

    Problems can occur when applications that are using the same classpath want to use different versions of a class; let’s use Hibernate as an example. When two versions of Hibernate JARs exist on the classpath, one version cannot be higher up the classpath for one application than it is for the other, which means both will have to use the same version. One way around this is to bloat the application (WAR) with all the libraries necessary, so that they use their local resources, but this then leads to big applications which are hard to maintain. Welcome to JAR hell! OSGi provides a solution here, as it allows versioning of JAR files, or bundles, which results in a mechanism to allow wiring to particular versions of JAR files, avoiding the flat classpath problems.

    How do I debug my class loading errors?

    NoClassDefFoundError/ClassNotFoundException?

     

    So, you’ve got an error/exception like the ones above. Well, does the class actually exist? Don’t bother looking in your IDE, as that’s where you compiled your class; it must be there, otherwise you’d get a compile-time error. This is a runtime exception, so it’s in the runtime that we want to look for the class it says we’re missing… but where do you start? Consider the following piece of code…

    Arrays.toString((((URLClassLoader) Test.class.getClassLoader())
    .getURLs()));

    This code returns a string listing all the JARs and directories on the classpath of the classloader the class Test is using. So now we can see if the JAR or location our mystery class should exist in is actually on the classpath. If it does not exist, add it! If it does exist, check the JAR/directory to make sure your class actually exists in that location and add it if it’s missing. These are the two typical problems which result in this error case.
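
One caveat when trying the snippet above on a newer JDK: since Java 9 the application classloader is no longer a URLClassLoader, so the cast throws a ClassCastException. Below is a hedged sketch that keeps the original approach where it works and otherwise falls back to the java.class.path system property. The class name ClasspathDump is hypothetical, and the fallback only describes the application classpath, so it is not a faithful answer for container or custom classloaders.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class ClasspathDump {
    /** Lists where the given loader looks for classes, as far as this sketch can tell. */
    public static List<String> entries(ClassLoader loader) {
        List<String> result = new ArrayList<>();
        if (loader instanceof URLClassLoader) {
            // Same idea as the original snippet, but guarded by an instanceof check.
            for (URL url : ((URLClassLoader) loader).getURLs()) {
                result.add(url.toString());
            }
        } else {
            // Fallback: only meaningful for the application classloader.
            String classpath = System.getProperty("java.class.path", "");
            for (String entry : classpath.split(File.pathSeparator)) {
                if (!entry.isEmpty()) {
                    result.add(entry);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        entries(ClasspathDump.class.getClassLoader()).forEach(System.out::println);
    }
}
```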

    NoSuchMethodError/NoSuchFieldError/AbstractMethodError/IllegalAccessError?

     

    Now it’s getting interesting! These are all subclasses of IncompatibleClassChangeError. We know the classloader has found the class we want (by name), but clearly it hasn’t found the right version. Here we have a class called Test which is making an invocation to another class, Util, but BANG, we get an exception! Let’s look at the next snippet of code to debug:

    Test.class.getClassLoader().getResource(Util.class.getName()
    .replace('.', '/') + ".class");

    We’re calling getResource on the classloader of class Test. This returns us the URL of the Util resource. Notice we’ve replaced the ‘.’ with a ‘/’ and added a ‘.class’ at the end of the String. This changes the package and classname of the class we’re looking for (from the perspective of the classloader) into a directory structure and filename on the filesystem. Neat. This will show us the exact class we have loaded and we can make sure it’s the correct version. We can use javap -private on the class at a command prompt to list its members and check which methods and fields actually exist. You can easily see the structure of the class and validate whether it’s you or the Java runtime which is going crazy! Believe me, at one stage or another you’ll question both, and nearly every time it will be you! :o)

    LinkageError/ClassCastException/IllegalAccessError

     

    These can occur if two different classloaders load the same class and they try to interact… ouch! Yes, it’s now getting a bit hairy. This can cause problems as we do not know if they will load the classes from the same place. How can this happen? Let’s look at the following code, still in the Test class:

    Factory.instance().sayHello();

    The code looks pretty clean and safe, and it’s not clear how an error could emerge from this line. We’re calling a static factory method to get us an instance of the Util class and are invoking a method on it. Let’s look at this supporting image to show the reason why an exception is being thrown.


    Here we can see a web classloader (which loaded the Test class) will prefer local classes, so when it makes reference to a class, it will be loaded by the web classloader, if possible. Fairly straightforward so far. The Test class uses the Factory class to get hold of an instance of the Util class, which is fairly typical practice in Java, but the Factory class doesn’t exist in the WAR as it is an external library. This is no problem, as the web classloader can delegate to the shared classloader, which can see the Factory class. Note that the shared classloader is now loading its own version of the Util class, as when the Factory instantiates the class, it uses the shared classloader (as shown in the first example earlier). The Factory class returns the Util object (created by the shared classloader) back to the WAR, which then tries to use the class, and effectively cast the class to a potentially different version of the same class (the Util class visible to the web classloader). BOOM!

    We can run the same code as before from within both places (The Factory.instance() method and the Test class) to see where each of our Util classes are being loaded from.

    Test.class.getClassLoader().getResource(Util.class.getName()
    .replace('.', '/') + ".class");
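
The BOOM described above can be reproduced outside a container. The following is a minimal sketch under a few assumptions of mine: DuplicateLoading, its nested Util stand-in and LocalFirstLoader are hypothetical names, and the compiled .class file for Util must be readable as a resource through the parent classloader (which is the case when the code is compiled to disk and run from the classpath). The second loader defines its own copy of Util instead of delegating, much like a web classloader that prefers local classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class DuplicateLoading {
    /** Stand-in for the Util class from the text. */
    public static class Util { }

    /** Defines Util itself instead of delegating; everything else goes parent-first. */
    static class LocalFirstLoader extends ClassLoader {
        LocalFirstLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(Util.class.getName())) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the class file through the parent, using the resource-name trick from the text. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                for (int n; (n = in.read(buf)) != -1; ) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Returns true if an instance created via the second loader cannot be cast to Util. */
    public static boolean castFails() {
        try {
            ClassLoader parent = DuplicateLoading.class.getClassLoader();
            Class<?> other = new LocalFirstLoader(parent).loadClass(Util.class.getName());
            Object instance = other.getDeclaredConstructor().newInstance();
            try {
                Util util = (Util) instance; // same class name, different defining loader
                return false;
            } catch (ClassCastException expected) {
                return true;
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not load the second copy of Util", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Cast fails across classloaders: " + castFails());
    }
}
```

The two Class objects share a name but have different defining classloaders, so the JVM treats them as unrelated types. That is why the cast fails even though the bytes are identical.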

    Hopefully this has given you an insight into the world of classloading, and instead of not understanding the classloader, you can now appreciate it with a hint of fear and uncertainty! Thanks for reading and making it to the end. We’d all like to wish you a Merry Christmas and a happy new year from ZeroTurnaround!  Happy coding!
     