Exploring the performance improvements of new Inline Types
Project Valhalla is one of the most interesting projects regarding the future of Java and of all JVM. It is still in an experimental state, but in this post we will look at how to try it, and we will implement a small program to verify the kind of performance improvements are possible with it.
Valhalla’s aim is to bring a new Inline Types (aka Value Types) that will: “Codes like a class works like an int”.
Currently, in the JVM there are 8 kinds of primitive types, each one is associated with a “signature letter” of the alphabet:
B byte signed byte
C char Unicode character encoded with UTF-16
D double double-precision floating-point value
F float single-precision floating-point value
I int integer
J long long integer
S short signed short
Z boolean true or false
On top of these, we have objects:
L ClassName reference an instance of class ClassName
You may have noted the signature letter if you printed out the result of toString()
method for a generic object.
Project Valhalla introduces a new letter for a new kind of types:
Q ClassName inline types
These new types will eventually replace the primitive types in our applications, removing current boxed types (Integer and such) and bringing a new seamless world based on Reference and Inline types only.
This is still in the future, for the moment let’s enjoy what is already available in the current Early-Access build.
Getting Started
To simplify the set-up, the Project Valhalla team released an Early-Access Build on August the 30th.
You can download it from here and configure it in JAVA_HOME as a normal JDK (version 14): http://jdk.java.net/valhalla/
Unfortunately at the moment (December 2019) IntelliJ and Gradle are not able to compile a source with the new features of the language. We need to use the JDK command-line tools or ant.
I shared on Github a repository with several examples and the ant build scripts needed to run them.
A Valhalla POINT
Let’s start from something very simple, let’s define a Point type with two fields for the coordinates.
inline public class Point {
public int x;
public int y;
public Point(int x, int y) {
this.x = x;
this.y = y;
}
}
The only change is the inline keyword on line 1. Point is an Inline Type and now we will look at the differences with a normal class.
Code like a Class
- It can be created using new
- As you can see, you can declare fields, constructors, and methods like a normal class
- It can implement an Interface, overriding the methods.
- It can be used by generics and can have generic parameters
Work like an int
- The type Point is immutable. The fields x and y are automatically treated as final. We can decide if we want them private or public but we cannot change them.
- Point has automatically generated methods equals, toString, and hashcode. We can test its equality using the == operator like int and double.
- Point has no null value. If you want to represent a null Point you need to use the new “inline widening” which are a kind of boxing on steroids.
- Point instances are not allocated separately from their references, they are represented directly in memory, like int now.
- Point has a default value, that corresponds to Point(0,0). It can be created with Point.default.
- Point cannot inherit from another type and nothing can inherit from it.
Work in progress
Some characteristics are still unfinished and they may not work in the current build or change how they work in future releases. In particular:
- It is not possible to use synchronization operations on Inline Types. So no synchronized methods, locks, wait/notify.
- Reflection: It’s not possible to distinguish them from reference objects. There will be some special interface for this.
- Thread-safe: it’s currently not possible to have a volatile behavior and to use them in Atomic operations.
For a more comprehensive look at Project Valhalla features, see the articles linked at the end. Here we will concentrate on the performance aspect of it.
Compact Arrays
One of the most exciting features of the Inline Classes is the way in which arrays are created.
In the case of primitives, each position of the array has the direct representation of the type; for example, the 64 bits of a long in case of an array of long.
In the case of the referenced object instead, the array will only contain the reference to the object allocated memory on the heap.
This means that reading an object from an array involves first reading the reference and then fetching the memory from its actual location. If the object has referenced fields, those would also cause data to be fetched from a remote memory location.
Inline Types on the other side work like primitives.
To have a taste of how much this matters, let’s make a simple test program to sum all trades from a given account in a big array.
Here is our trade’s simple representation, with an amount, an account and a traded security.
public class TradeRef {
public final double amount;
public final String account;
public final String security;
public TradeRef(double amount, String account, String security) {
this.amount = amount;
this.account = account;
this.security = security;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
TradeRefEncoded that = (TradeRefEncoded) o;
return Double.compare(that.amount, amount) == 0 &&
account == that.account &&
security == that.security;
}
@Override
public int hashCode() {
return Objects.hash(amount, account, security);
}
}
Similarly, we will define a TradeInline
class, identical with the inline modifier:
inline public class TradeInline {
final double amount;
final String account;
final String security;
public TradeInline(double amount, String account, String security) {
this.amount = amount;
this.account = account;
this.security = security;
}
}
After the compilation, we can use javap
from the command line to print the generated ByteCode:
javap -s antbuild/com/ubertob/ministring/TradeInline.class
And this is the output:
public final value class com.ubertob.ministring.TradeInline {
final double amount;
descriptor: D
final java.lang.String account;
descriptor: Ljava/lang/String;
final java.lang.String security;
descriptor: Ljava/lang/String;
public final int hashCode();
descriptor: ()I
public final boolean equals(java.lang.Object);
descriptor: (Ljava/lang/Object;)Z
public final java.lang.String toString();
descriptor: ()Ljava/lang/String;
public static com.ubertob.ministring.TradeInline com.ubertob.ministring.TradeInline(double, java.lang.String, java.lang.String);
descriptor: (DLjava/lang/String;Ljava/lang/String;)Qcom/ubertob/ministring/TradeInline;
}
Note the value modifier in the first line and the methods hashCode, equals, and toString that are not present in our class.
encoding a string into a long
Our TradeInline
class has still two String inside as fields. Can we inline them?
Unfortunately at the moment is not possible to flatten String and Arrays inside an Inline type. There are future plans for something called Array 2.0 that would allow for that.
What we can do now instead, is to squeeze a String inside a long
type, although we need to accept some limitations.
The long
type has 64 bits, so using a base 64 encoding (6 bits) we can store up to 10 characters inside.
In our case, we assume that the Security and the Account have a maximum length of 10 characters and they are composed only of uppercase letters, numbers and a limited number of special characters.
This is the code of our MiniString
inline type, with the encoding and decoding static functions:
inline public class MiniString {
long raw;
public MiniString(String str) {
raw = encode(str);
}
public String get() {
return decode(raw);
}
public static final int MAX_MINISTR_LEN = 10;
public static final int MINI_STR_BASE = 64;
public static final String letters = "=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 _-!?.$&%@#:[]{}()*<>:;',/^";
public static long encode(String str) {
String prepared = prepareString(str);
long encoded = 0;
for (char c : prepared.toCharArray()) {
int x = letters.indexOf(c);
encoded = encoded * MINI_STR_BASE + x;
}
return encoded;
}
private static String prepareString(String str) {
StringBuilder prepared = new StringBuilder();
for (char c : str.toCharArray()) {
if (letters.indexOf(c) >= 0)
prepared.append(c);
if (prepared.length() > MAX_MINISTR_LEN)
break;
}
return prepared.toString();
}
public static String decode(long number) {
StringBuilder decoded = new StringBuilder();
long remaining = number;
while (true) {
int mod = (int) ( remaining % MINI_STR_BASE);
char c = letters.charAt(mod);
decoded.insert(0, c);
if ( remaining < MINI_STR_BASE)
break;
remaining = remaining / MINI_STR_BASE;
}
return decoded.toString();
}
}
We will define a TradeMiniString
inline type using our new MiniString
for Account and Security.
This is the generated ByteCode:
public final value class com.ubertob.ministring.TradeMiniString {
final double amount;
descriptor: D
final com.ubertob.ministring.MiniString account;
descriptor: Qcom/ubertob/ministring/MiniString;
final com.ubertob.ministring.MiniString security;
descriptor: Qcom/ubertob/ministring/MiniString;
public final int hashCode();
descriptor: ()I
public final boolean equals(java.lang.Object);
descriptor: (Ljava/lang/Object;)Z
public final java.lang.String toString();
descriptor: ()Ljava/lang/String;
public static com.ubertob.ministring.TradeMiniString com.ubertob.ministring.TradeMiniString(double, java.lang.String, java.lang.String);
descriptor: (DLjava/lang/String;Ljava/lang/String;)Qcom/ubertob/ministring/TradeMiniString;
}
Note how the descriptor for the MiniString
account starts with the Q of inline types and not with the L of the reference types.
Finally, to have a fair comparison, we also define a TradeRefEncoded
class using the same trick to encode an Account and Security in a field of type long
.
You can find the complete code in my GitHub repository.
Before looking at the performance, let’s look how our Trade objects are using the memory.
First the standard TradeRef
object:
On the left, there is our big array. Then a pointer to the TradeRef object, and then two other pointers to the strings.
Depending on the order of creation, these can be quite far apart in our heap.
This is a performance problem because fetching memory from different regions is one of the slowest operations for a CPU.
This classic post from Martin Thomson is one of the better explanations:
https://mechanical-sympathy.blogspot.com/2012/08/memory-access-patterns-are-important.html
Let’s look at a simple diagram of the InlineTrade value memory representation now:
Here the whole value is directly on the array. For this reason, the array is actually bigger than in the first case. If you don’t allocate all objects, in other words, if your array mostly contains null values, this can be a drawback.
Finally, let’s see how the TradeMiniString is allocated in the memory:
You can see that it stays completely inside the array element, with no external pointers.
This is possible only because we accept big limitations on what can be contained in those strings; still, it’s an acceptable compromise when using strings only for storing ID from a database or the stock ticker symbols.
Performance Comparison
To compare the performance we can create four arrays of 5 million elements, one for each of our trade types:
static final int arraySize = 5_000_000;
public TradeRef[] tradeRefs = new TradeRef[arraySize];
public TradeRefEncoded[] tradesRefEncoded = new TradeRefEncoded[arraySize];
public TradeInline[] tradesInline = new TradeInline[arraySize];
public TradeMiniString[] tradesMiniString = new TradeMiniString[arraySize];
Then we fill them with identical random values:
public static void generatingTradesAndBrowing() {
var tr = new TradeRepository();
tr.fillWithRandomData();
var searcherRef = new TradeRefBrowser(tr.tradeRefs);
var searcherInline = new InlineTradeBrowser(tr.tradesInline);
var searcherRefEncoded = new TradeRefEncodedBrowser(tr.tradesRefEncoded);
var searcherMiniString = new MiniStringTradeBrowser(tr.tradesMiniString);
var account = tr.tradeRefs[1000].account;
while (true) {
benchmarks(searcherRef, searcherInline, searcherRefEncoded, searcherMiniString, account);
}
}
And finally, we do searches on each one repeatedly, printing out the time elapsed.
private static void benchmarks(TradeRefBrowser searcherRef, InlineTradeBrowser searcherInline, TradeRefEncodedBrowser searcherRefEncoded, MiniStringTradeBrowser searcherMiniString, String account) {
cronoSum(() -> searcherRef.sumByAccountFor(account), "Ref with for");
cronoSum(() -> searcherRef.sumByAccountStream(account), "Ref with stream");
cronoSum(() -> searcherRefEncoded.sumByAccountFor(account), "RefEncoded with for");
cronoSum(() -> searcherRefEncoded.sumByAccountStream(account), "RefEncoded with stream");
cronoSum(() -> searcherInline.sumByAccountFor(account), "Inline with for");
cronoSum(() -> searcherMiniString.sumByAccountFor(account), "MiniString with for");
}
We don’t care much about specific numbers here, so we just keep looping and print the results. The timings tend to stabilize after a few minutes when the Java Hotspot compiler optimizes and inlines most of the methods.
We can filter and sum using two methods: a for loop or a nicer stream map and reduce:
public double sumByAccountStream(String account){
return Arrays.stream(repo)
.filter( trade -> trade.account.equals(account))
.map(trade -> trade.amount)
.reduce(0.0, (a, b) -> a+b);
}
public double sumByAccountFor(String account){
double res = 0;
for (int i = 0; i < repo.length; i++) {
TradeRef tradeRef = repo[i];
if (tradeRef.account.equals(account))
res = res + tradeRef.amount;
}
return res;
}
Unfortunately, streams are not (yet) working with Inline Classes:
Exception in thread "main" java.lang.ClassFormatError: Illegal class name "Qcom/ubertob/ministring/TradeMiniString;" in class file <Unknown>
at java.base/jdk.internal.misc.Unsafe.defineAnonymousClass0(Native Method)
at java.base/jdk.internal.misc.Unsafe.defineAnonymousClass(Unsafe.java:1345)
at java.base/java.lang.invoke.InnerClassLambdaMetafactory.spinInnerClass(InnerClassLambdaMetafactory.java:324)
at java.base/java.lang.invoke.InnerClassLambdaMetafactory.buildCallSite(InnerClassLambdaMetafactory.java:192)
at java.base/java.lang.invoke.LambdaMetafactory.metafactory(LambdaMetafactory.java:329)
at java.base/java.lang.invoke.BootstrapMethodInvoker.invoke(BootstrapMethodInvoker.java:127)
at java.base/java.lang.invoke.CallSite.makeSite(CallSite.java:307)
at java.base/java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(MethodHandleNatives.java:259)
at java.base/java.lang.invoke.MethodHandleNatives.linkCallSite(MethodHandleNatives.java:249)
at com.ubertob.ministring.MiniStringTradeBrowser.sumByAccountStream(Unknown Source)
So for the moment we can only measure performance using for loops.
Fancy Graphs (finally)
The actual numbers are not very important here, so I just grabbed a significative sample running the application on my Linux laptop.
Rather than the numbers, we are interested in how fast they are relative to each other.
We can see here how in this case the improvement is really huge: inline TradeMiniString is more than 20x faster!
Event compared with the TradeRefEncoded is still 6x faster!
This seems almost too good to be true, so we can use a better method to measure the actual performance.
Brendan Gregg wrote some very useful tools to use Linux kernel perf tool to profile Java programs.
There are several blog posts on how to produce and how to read flame graphs:
https://medium.com/netflix-techblog/java-in-flames-e763b3d32166
And here are our flames:
You can find the actual svg file in the GitHub repository here.
We can see four green blocks (including the one with the arrow) that are a representation of how much time these four methods used the CPU. In this graph, the green blocks are Java calls, while the red ones are kernel system calls.
The spikes are other methods, related to the print out of results and garbage collection. We can ignore them.
From the left, the first block is the InlineTrade, using normal strings. It is clearly smaller than the last two blocks which represent the TradeRef and TradeRefEncoded.
As you may have guessed, the block pointed by the arrow is the TradeMiniString duration, it is so brief that you can read only a few characters of the name.
The relative duration of each method from the perf profile is reported in this graph:
The relative timings are similar, Inline is 6 times faster than Ref and Inline Encoded is 5.5 times faster than Ref Encoded.
Conclusions
It is still too early for precise measurements, but it is already clear that Valhalla inline types have the potential to bring a massive performance boost for a certain type of critical applications.
There are still many rough edges and painful decisions to make but the overall shape of the Valhalla project is very exciting.
Since the last Early-Access build there have already been many improvements in the source code, let’s hope to have another stable build to test soon with more features.
resources to learn more
The full code of this and other examples:
https://github.com/uberto/testValueTypes
A Recent video about Valhalla at Devoxx BE
https://devoxx.be/talk/?id=41660
Brian Goetz on the State of Valhalla
http://cr.openjdk.java.net/~briangoetz/valhalla/sov/02-object-model.html
An in-depth explanation of Inline Types from Ben Evans:
https://www.infoq.com/articles/inline-classes-java
My post on the previous early-access build:
https://medium.com/@ramtop/a-taste-of-value-types-1a8a136fcfe2