Grails Application with MySQL container

With the container ecosystem evolving over the past years, developers have started to become aware of a solution that can enhance their productivity and deployment time. I will keep the instructions as simple and detailed as possible. For people completely new to Docker, I recommend this video:

For a more in-depth understanding and the commands available, you can take a look at the official documentation. I also recommend this short read to understand the differences between containers and VMs.

Let’s say you’re developing feature X, which requires data from a relational database to be kept up to date for you to perform your development and manual testing. This probably means that you need some kind of mechanism to refresh the data, e.g. some kind of sync job that fetches the most up-to-date data for you.

Depending on the size of your data, such a refresh can take anywhere from a few seconds to several minutes. And what if you add new columns or want to change column types?

For this tutorial you will need:

Java 8

Grails 3 (you can easily install it via http://sdkman.io/)

Docker (follow the steps at https://docs.docker.com/v1.8/installation/ for your OS)

MySQL client

Tutorial Project: DockerTutorial

I am performing this tutorial on a MacBook with OS X Yosemite. Where the steps differ for Linux installations, I will mention the steps for both.

Start the Docker Quickstart Terminal and make sure Docker works properly:

docker --version

You should see something along the lines of

Docker version

An IDE is not necessary, but if you want to use one, you can just import the project as a Gradle project.

Now we need to download the MySQL Docker image. Due to a current bug in the official image, we have to use a different image on MacOS. The issue affects only MacOS and can be followed here.

docker pull dgraziotin/mysql – For MacOS

docker pull mysql:5.6 – For Linux

Before we can do anything with our image, make sure that any MySQL server instance running locally is shut down; otherwise it will conflict with the container’s port mapping.

Now, we can start our mysql image.

docker-run
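The screenshot of the command is missing here; based on the full command shown later in this tutorial, the initial run (without a data volume yet) presumably looks like this:

docker run -t -i -p 3306:3306 --name docker-db dgraziotin/mysql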

The above command starts our container, named docker-db, in the terminal window, where we can monitor the container logs. This creates an admin user with a randomly generated password, printed in the logs. You should see a message similar to the following:

db admin user

To verify that our container is working properly, let’s try connecting to it via our mysql client:

mysql login
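The exact invocation is in the screenshot above; a hedged sketch of it (using the admin user and the generated password from the logs):

mysql -h <host> -P 3306 -u admin -p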

Where host should be 0.0.0.0 for Linux.

For MacOS, it is slightly different. Docker on MacOS makes use of a virtual machine: when you start Docker with the Quickstart Terminal, it actually creates a virtual machine with the Docker daemon running inside it. We can get the IP we need by running docker-machine ls, which results in an output similar to the following:
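The output was shown as a screenshot; it looks roughly like this (the values will differ on your machine, 192.168.99.100 being merely the common VirtualBox default):

NAME      ACTIVE   DRIVER       STATE     URL
default   *        virtualbox   Running   tcp://192.168.99.100:2376

The host part of the URL column (here 192.168.99.100) is what we pass as the host to the mysql client.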

Let’s create our database:

 create db
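The statement itself is in the screenshot; a hedged equivalent (the database name docker_tutorial is illustrative, use whatever your application expects):

CREATE DATABASE docker_tutorial;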

Let’s keep the mysql window open as we’ll come back to it later on.

For our Grails application, we will need to change the database driver and connection settings for it to connect properly to our MySQL container. Navigate to the grails-app/conf folder and open application.yml. Change the host, port and password to the values you used above.

app config
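The screenshot shows the resulting configuration; a hedged sketch of the relevant dataSource section (host, database name and password are illustrative):

dataSource:
    driverClassName: com.mysql.jdbc.Driver
    url: jdbc:mysql://192.168.99.100:3306/docker_tutorial
    username: admin
    password: <generated password>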

Ok, now that our app is configured properly and our container is running, let’s take it for a spin. Navigate to the root folder where you extracted the project files and run grails run-app.

When the application starts, let’s open it via http://localhost:8080, where we will be presented with the Grails scaffolding and a couple of controllers.

When refreshing your local environment, you need a mechanism that populates your database with the new data. This can be a set of scripts and/or database dumps. Insert and update statements can take quite a bit of time, depending on the database size and the amount of data required. One alternative could be to simply get the dump we need and then sync that dump with a database container.

One could argue: “How would that be different from simply using a local MySQL instance?”

What if you need to keep your current database state? Let’s say you found a bug or some other kind of issue that can be replicated with that specific set of data. With Docker, you can just spin up another container and mount another folder while keeping the one you need.

Or what if you have non-developers on the team, e.g. designers? Wouldn’t it make their life easier if they just had to run a simple script that starts the container, so they can do their work on an up-to-date database?

Another bonus point is that you do not need to have anything at all related to MySQL server installed on your machine.

How do we achieve this behaviour? We can achieve it through a data volume. The approach that I am going to show involves mapping a folder on our machine to a volume inside the container. First, let’s create our data folder, e.g. /tmp/docker-db-data

Next, let’s add our data files into that folder. I’ve populated a small sample database for the purpose of the tutorial:

Now, we will execute the container again, but this time with a data volume.

docker run -t -i -p 3306:3306 -v /tmp/docker-db-data:/mysql --name docker-db dgraziotin/mysql

The -v flag specifies the mapping from our host to the container. When running the container with those flags, we are saying: “I want to run a MySQL container named docker-db that stores the database data in my /tmp/docker-db-data folder.”

Let’s jump back to our application folder and execute grails run-app. If you navigate to one of the two scaffolding pages, you will see some Accounts and Transactions.

This concludes the introduction to this series, where we demonstrated how to incorporate a sample application with a container running a relational database.

Reactive Development Using Vert.x

Lately, it seems like we’re hearing about the latest and greatest frameworks for Java: tools like Ninja, SparkJava, and Play. But each one is opinionated and makes you feel like you need to redesign your entire application to make use of its wonderful features. That’s why I was so relieved when I discovered Vert.x. Vert.x isn’t a framework, it’s a toolkit: un-opinionated and liberating. Vert.x doesn’t want you to redesign your entire application to make use of it, it just wants to make your life easier. Can you write your entire application in Vert.x? Sure! Can you add Vert.x capabilities to your existing Spring/Guice/CDI applications? Yep! Can you use Vert.x inside of your existing JavaEE applications? Absolutely! And that’s what makes it amazing.

Background

Vert.x was born when Tim Fox decided that he liked a lot of what was being developed in the NodeJS ecosystem, but he didn’t like some of the trade-offs of working in V8: Single-threadedness, limited library support, and JavaScript itself. Tim set out to write a toolkit which was unopinionated about how and where it is used, and he decided that the best place to implement it was on the JVM. So, Tim and the community set out to create an event-driven, non-blocking, reactive toolkit which in many ways mirrored what could be done in NodeJS, but also took advantage of the power available inside of the JVM. Node.x was born and it later progressed to become Vert.x.

Overview

Vert.x is designed around an event bus, which is how different parts of the application communicate in a non-blocking, thread-safe manner. Parts of it were modeled after the Actor methodology exhibited by Erlang and Akka. It is also designed to take full advantage of today’s multi-core processors and highly concurrent programming demands. As such, all Vert.x VERTICLES are single-threaded by default. Unlike NodeJS though, Vert.x can run MANY verticles in MANY threads. Additionally, you can specify that some verticles are “worker” verticles and CAN be multi-threaded. And to really add some icing on the cake, Vert.x has low-level support for multi-node clustering of the event bus via the use of Hazelcast. It has gone on to include many other amazing features which are too numerous to list here, but you can read more in the official Vert.x docs.

The first thing you need to know about Vert.x is, similar to NodeJS, never block the current thread. Everything in Vert.x is set up, by default, to use callbacks/futures/promises. Instead of doing synchronous operations, Vert.x provides async methods for doing most I/O and processor intensive operations which might block the current thread. Now, callbacks can be ugly and painful to work with, so Vert.x optionally provides an API based on RxJava which implements the same functionality using the Observer pattern. Finally, Vert.x makes it easy to use your existing classes and methods by providing the executeBlocking(Function f) method on many of its asynchronous APIs. This means you can choose how you prefer to work with Vert.x instead of the toolkit dictating to you how it must be used.
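For example, a hedged sketch of executeBlocking in its Vert.x 3.x two-handler form (someSlowCall is a hypothetical blocking method, not part of Vert.x):

vertx.executeBlocking(future -> {
    // runs on a worker thread, so blocking here is safe
    String result = someSlowCall();
    future.complete(result);
}, res -> {
    // runs back on the event loop once the blocking code finishes
    if (res.succeeded()) {
        System.out.println("Got: " + res.result());
    } else {
        res.cause().printStackTrace();
    }
});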

The second thing to know about Vert.x is that it is composed of verticles, modules, and nodes. Verticles are the smallest unit of logic in Vert.x, and are usually represented by a single class. Verticles should be simple and single-purpose, following the UNIX philosophy. A group of verticles can be put together into a module, which is usually packaged as a single JAR file. A module represents a group of related functionality which, when taken together, could represent an entire application or just a portion of a larger distributed application. Lastly, nodes are single instances of the JVM which are running one or more modules/verticles. Because Vert.x has clustering built-in from the ground up, Vert.x applications can span nodes either on a single machine or across multiple machines in multiple geographic locations (though latency can hinder performance).

Example Project

Now, I’ve been to a number of Meetups and conferences lately where the first thing they show you when talking about reactive programming is to build a chat room application. That’s all well and good, but it doesn’t really help you to completely understand the power of reactive development. Chat room apps are simple and simplistic. We can do better. In this tutorial, we’re going to take a legacy Spring application and convert it to take advantage of Vert.x. This has multiple purposes: It shows that the toolkit is easy to integrate with existing Java projects, it allows us to take advantage of existing tools which may be entrenched parts of our ecosystem, and it also lets us follow the DRY principle in that we don’t have to rewrite large swathes of code to get the benefits of Vert.x.

Our legacy Spring application is a contrived simple example of a REST API using Spring Boot, Spring Data JPA, and Spring REST. The source code can be found in the “master” branch HERE. There are other branches which we will use to demonstrate the progression as we go, so it should be simple for anyone with a little experience with git and Java 8 to follow along. Let’s start by examining the Spring Configuration class for the stock Spring application.


@SpringBootApplication
@EnableJpaRepositories
@EnableTransactionManagement
@Slf4j
public class Application {
    public static void main(String[] args) {
        ApplicationContext ctx = SpringApplication.run(Application.class, args);

        System.out.println("Let's inspect the beans provided by Spring Boot:");

        String[] beanNames = ctx.getBeanDefinitionNames();
        Arrays.sort(beanNames);
        for (String beanName : beanNames) {
            System.out.println(beanName);
        }
    }

    @Bean
    public DataSource dataSource() {
        EmbeddedDatabaseBuilder builder = new EmbeddedDatabaseBuilder();
        return builder.setType(EmbeddedDatabaseType.HSQL).build();
    }

    @Bean
    public EntityManagerFactory entityManagerFactory() {
        HibernateJpaVendorAdapter vendorAdapter = new HibernateJpaVendorAdapter();
        vendorAdapter.setGenerateDdl(true);

        LocalContainerEntityManagerFactoryBean factory = new LocalContainerEntityManagerFactoryBean();
        factory.setJpaVendorAdapter(vendorAdapter);
        factory.setPackagesToScan("com.zanclus.data.entities");
        factory.setDataSource(dataSource());
        factory.afterPropertiesSet();

        return factory.getObject();
    }

    @Bean
    public PlatformTransactionManager transactionManager(final EntityManagerFactory emf) {
        final JpaTransactionManager txManager = new JpaTransactionManager();
        txManager.setEntityManagerFactory(emf);
        return txManager;
    }
}

As you can see at the top of the class, we have some pretty standard Spring Boot annotations. You’ll also see an @Slf4j annotation which is part of the lombok library, which is designed to help reduce boilerplate code. We also have @Bean annotated methods for providing access to the JPA EntityManager, the TransactionManager, and the DataSource. Each of these items provides injectable objects for the other classes to use. The remaining classes in the project are similarly simplistic. There is a Customer POJO which is the Entity type used in the service. There is a CustomerDAO which is created via Spring Data. Finally, there is a CustomerEndpoints class which is the JAX-RS annotated REST controller.

As explained earlier, this is all standard fare in a Spring Boot application. The problem with this application is that, for the most part, it has limited scalability. You would either run this application inside of a Servlet container, or with an embedded server like Jetty or Undertow. Either way, each request ties up a thread and is thus wasting resources while it waits for I/O operations.

Switching over to the Convert-To-Vert.x-Web branch, we can see that the Application class has changed a little. We now have some new @Bean annotated methods for injecting the Vertx instance itself, as well as an instance of ObjectMapper (part of the Jackson JSON library). We have also replaced the CustomerEndpoints class with a new CustomerVerticle. Pretty much everything else is the same.

The CustomerVerticle class is annotated with @Component, which means that Spring will instantiate that class on startup. It also has its start method annotated with @PostConstruct so that the verticle is launched on startup. Looking at the actual content of the code, we see our first bit of Vert.x code: Router.

The Router class is part of the vertx-web library and allows us to use a fluent API to define HTTP URLs, methods, and header filters for our request handling. Adding the BodyHandler instance to the default route allows a POST/PUT body to be processed and converted to a JSON object which Vert.x can then process as part of the RoutingContext. The order of routes in Vert.x CAN be significant. If you define a route which has some sort of glob matching (* or regex), it can swallow requests for routes defined after it unless you implement chaining. Our example shows 3 routes initially.


    @PostConstruct
    public void start() throws Exception {
        Router router = Router.router(vertx);
        router.route().handler(BodyHandler.create());
        router.get("/v1/customer/:id")
                .produces("application/json")
                .blockingHandler(this::getCustomerById);
        router.put("/v1/customer")
                .consumes("application/json")
                .produces("application/json")
                .blockingHandler(this::addCustomer);
        router.get("/v1/customer")
                .produces("application/json")
                .blockingHandler(this::getAllCustomers);
        vertx.createHttpServer().requestHandler(router::accept).listen(8080);
    }

Notice that the HTTP method is defined, the “Accept” header is defined (via consumes), and the “Content-Type” header is defined (via produces). We also see that we are passing the handling of the request off via a call to the blockingHandler method. A blocking handler for a Vert.x route accepts a RoutingContext object as its only parameter. The RoutingContext holds the Vert.x Request object, Response object, and any parameters/POST body data (like “:id”). You’ll also see that I used method references rather than lambdas to insert the logic into the blockingHandler (I find it more readable). Each handler for the 3 request routes is defined in a separate method further down in the class. These methods basically just call the methods on the DAO, serialize or deserialize as needed, set some response headers, and end() the request by sending a response. Overall, pretty simple and straightforward.


    private void addCustomer(RoutingContext rc) {
        try {
            String body = rc.getBodyAsString();
            Customer customer = mapper.readValue(body, Customer.class);
            Customer saved = dao.save(customer);
            if (saved!=null) {
                rc.response().setStatusMessage("Accepted").setStatusCode(202).end(mapper.writeValueAsString(saved));
            } else {
                rc.response().setStatusMessage("Bad Request").setStatusCode(400).end("Bad Request");
            }
        } catch (IOException e) {
            rc.response().setStatusMessage("Server Error").setStatusCode(500).end("Server Error");
            log.error("Server error", e);
        }
    }

    private void getCustomerById(RoutingContext rc) {
        log.info("Request for single customer");
        Long id = Long.parseLong(rc.request().getParam("id"));
        try {
            Customer customer = dao.findOne(id);
            if (customer==null) {
                rc.response().setStatusMessage("Not Found").setStatusCode(404).end("Not Found");
            } else {
                rc.response().setStatusMessage("OK").setStatusCode(200).end(mapper.writeValueAsString(dao.findOne(id)));
            }
        } catch (JsonProcessingException jpe) {
            rc.response().setStatusMessage("Server Error").setStatusCode(500).end("Server Error");
            log.error("Server error", jpe);
        }
    }

    private void getAllCustomers(RoutingContext rc) {
        log.info("Request for all customers");
        List<Customer> customers = StreamSupport.stream(dao.findAll().spliterator(), false).collect(Collectors.toList());
        try {
            rc.response().setStatusMessage("OK").setStatusCode(200).end(mapper.writeValueAsString(customers));
        } catch (JsonProcessingException jpe) {
            rc.response().setStatusMessage("Server Error").setStatusCode(500).end("Server Error");
            log.error("Server error", jpe);
        }
    }

“But this is more code and messier than my Spring annotations and classes”, you might say. That CAN be true, but it really depends on how you implement the code. This is meant to be an introductory example, so I left the code very simple and easy to follow. I COULD use an annotation library for Vert.x to implement the endpoints in a manner similar to JAX-RS. In addition, we have gained a massive scalability improvement. Under the hood, Vert.x Web uses Netty for low-level asynchronous I/O operations, thus providing us the ability to handle MANY more concurrent requests (limited by the size of the database connection pool).

We’ve already made some improvements to the scalability and concurrency of this application by using the Vert.x Web library, but we can improve things a little more by implementing the Vert.x EventBus. By separating the database operations into worker verticles instead of using blockingHandler, we can handle request processing more efficiently. This is shown in the Convert-To-Worker-Verticles branch. The Application class has remained the same, but we have changed the CustomerVerticle class and added a new class called CustomerWorker. In addition, we added a new library called Spring Vert.x Extension which provides Spring dependency injection support to Vert.x verticles. Start off by looking at the new CustomerVerticle class.


    @PostConstruct
    public void start() throws Exception {
        log.info("Successfully created CustomerVerticle");
        DeploymentOptions deployOpts = new DeploymentOptions().setWorker(true).setMultiThreaded(true).setInstances(4);
        vertx.deployVerticle("java-spring:com.zanclus.verticles.CustomerWorker", deployOpts, res -> {
            if (res.succeeded()) {
                Router router = Router.router(vertx);
                router.route().handler(BodyHandler.create());
                router.get("/v1/customer/:id")
                        .produces("application/json")
                        .handler(rc -> {
                            // Build fresh DeliveryOptions per request so headers do not
                            // accumulate across requests on a shared instance.
                            DeliveryOptions opts = new DeliveryOptions()
                                    .setSendTimeout(2000)
                                    .addHeader("method", "getCustomer")
                                    .addHeader("id", rc.request().getParam("id"));
                            vertx.eventBus().send("com.zanclus.customer", null, opts, reply -> handleReply(reply, rc));
                        });
                router.put("/v1/customer")
                        .consumes("application/json")
                        .produces("application/json")
                        .handler(rc -> {
                            DeliveryOptions opts = new DeliveryOptions()
                                    .setSendTimeout(2000)
                                    .addHeader("method", "addCustomer");
                            vertx.eventBus().send("com.zanclus.customer", rc.getBodyAsJson(), opts, reply -> handleReply(reply, rc));
                        });
                router.get("/v1/customer")
                        .produces("application/json")
                        .handler(rc -> {
                            DeliveryOptions opts = new DeliveryOptions()
                                    .setSendTimeout(2000)
                                    .addHeader("method", "getAllCustomers");
                            vertx.eventBus().send("com.zanclus.customer", null, opts, reply -> handleReply(reply, rc));
                        });
                vertx.createHttpServer().requestHandler(router::accept).listen(8080);
            } else {
                log.error("Failed to deploy worker verticles.", res.cause());
            }
        });
    }

The routes are the same, but the implementation code is not. Instead of using calls to blockingHandler, we have now implemented proper async handlers which send out events on the event bus. None of the database processing is happening in this Verticle anymore. We have moved the database processing to a Worker Verticle which has multiple instances to handle multiple requests in parallel in a thread-safe manner. We are also registering a callback for when those events are replied to so that we can send the appropriate response to the client making the request. Now, in the CustomerWorker Verticle we have implemented the database logic and error handling.

@Override
public void start() throws Exception {
    vertx.eventBus().consumer("com.zanclus.customer").handler(this::handleDatabaseRequest);
}

public void handleDatabaseRequest(Message<Object> msg) {
    String method = msg.headers().get("method");

    DeliveryOptions opts = new DeliveryOptions();
    try {
        String retVal;
        switch (method) {
            case "getAllCustomers":
                retVal = mapper.writeValueAsString(dao.findAll());
                msg.reply(retVal, opts);
                break;
            case "getCustomer":
                Long id = Long.parseLong(msg.headers().get("id"));
                retVal = mapper.writeValueAsString(dao.findOne(id));
                msg.reply(retVal);
                break;
            case "addCustomer":
                retVal = mapper.writeValueAsString(
                                    dao.save(
                                            mapper.readValue(
                                                    ((JsonObject)msg.body()).encode(), Customer.class)));
                msg.reply(retVal);
                break;
            default:
                log.error("Invalid method '" + method + "'");
                opts.addHeader("error", "Invalid method '" + method + "'");
                msg.fail(1, "Invalid method");
        }
    } catch (IOException | NullPointerException e) {
        log.error("Problem parsing JSON data.", e);
        msg.fail(2, e.getLocalizedMessage());
    }
}

The CustomerWorker worker verticles register a consumer for messages on the event bus. The string which represents the address on the event bus is arbitrary, but it is recommended to use a reverse-tld style naming structure so that it is simple to ensure that the addresses are unique (“com.zanclus.customer”). Whenever a new message is sent to that address, it will be delivered to one, and only one, of the worker verticles. The worker verticle then calls handleDatabaseRequest to do the database work, JSON serialization, and error handling.

There you have it. You’ve seen that Vert.x can be integrated into your legacy applications to improve concurrency and efficiency without having to rewrite the entire application. We could have done something similar with an existing Google Guice or JavaEE CDI application. All of the business logic could remain relatively untouched while we tie in Vert.x to add reactive capabilities. The next steps are up to you. Some ideas for where to go next include Clustering, WebSockets, and VertxRx for ReactiveX sugar.

Writing BDD tests with Cucumber JVM

Cucumber JVM is an excellent tool to write your BDD tests. In this article I would like to give an introduction to BDD with Cucumber JVM.

Let’s get started…

What is BDD?


In a nutshell, BDD tries to solve the problem of “understanding requirements with examples”.

BDD tools
There are a lot of tools available for BDD and, interestingly, you can find quite a few vegetable names in the list 😉 Cucumber, Spinach, Lettuce, JBehave, Twist etc. Out of these, Cucumber is simple and easy to use.

Cucumber-JVM
Cucumber is written in Ruby and Cucumber JVM is an implementation of Cucumber for popular JVM languages like Java, Scala, Groovy, Clojure etc.

Cucumber Stack
We write features and scenarios in a “Ubiquitous” Language and then implement them with the step definitions and support code.

Feature file and Gherkin
You first begin by writing a .feature file. A feature file conventionally starts with the Feature keyword followed by a Scenario. Each scenario consists of multiple steps. Cucumber uses Gherkin for this. Gherkin is a Business Readable, Domain Specific Language that lets you describe software’s behavior without detailing how that behavior is implemented.
Example:

Feature: Placing bets       
 Scenario: Place a bet with cash balance         
 Given I have an account with cash balance of 100        
 When I place a bet of 10 on "SB_PRE_MATCH"      
 Then the bet should be placed successfully      
 And the remaining balance in my account should be 90

As you can see, the feature file reads more like a spoken language, with Gherkin keywords like Feature, Scenario, Given, When, Then, And, But and # (for comments).

Step Definitions
Once you have finalized the feature file with different scenarios, the next stage is to give life to the scenarios by writing your step definitions. Cucumber uses regular expressions to map the steps to the actual step definitions. Step definitions can be written in the JVM language of your choice. The keywords are ignored while mapping the step definitions.
So, in reference to the above example feature, we will have to write step definitions for all four steps. Use the IDE plugin to generate the stubs for you.

import cucumber.api.java.en.And;
import cucumber.api.java.en.Given;
import cucumber.api.java.en.Then;
import cucumber.api.java.en.When;

public class PlaceBetStepDefs {

    @Given("^I have an account with cash balance of (\\d+)$")
    public void accountWithBalance(int balance) throws Throwable {
        // Write code here that turns the phrase above into concrete actions
        // throw new PendingException();
    }

    @When("^I place a bet of (\\d+) on \"(.*?)\"$")
    public void placeBet(int stake, String product) throws Throwable {
        // Write code here that turns the phrase above into concrete actions
        // throw new PendingException();
    }

    @Then("^the bet should be placed successfully$")
    public void theBetShouldBePlacedSuccessfully() throws Throwable {
        // Write code here that turns the phrase above into concrete actions
        // throw new PendingException();
    }

    @And("^the remaining balance in my account should be (\\d+)$")
    public void assertRemainingBalance(int remaining) throws Throwable {
        // Write code here that turns the phrase above into concrete actions
        // throw new PendingException();
    }
}

Support Code
The next step is to back your step definitions with support code. You can, for example, do a REST call to execute the step, or do a database call, or use a web driver like Selenium. It is entirely up to the implementation. Once you get the response, you can assert it against the results you are expecting or map it to your domain objects.
For example, you can use the Selenium web driver to simulate logging into a site:

protected WebDriver driver;

@Before("@startbrowser")
public void setup() {
    System.setProperty("webdriver.chrome.driver", "C:\\devel\\projects\\cucumberworkshop\\chromedriver.exe");
    driver = new ChromeDriver();
}

@Given("^I open google$")
public void I_open_google() throws Throwable {
    driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);
    driver.get("https://www.google.co.uk");
}

Expressive Scenarios
Cucumber provides more options to organize your scenarios better.

  • Background – use this to define steps which are common to all scenarios
  • Data Tables – you can write the input data in table format
  • Scenario Outline – a placeholder for your scenario which can be executed for a set of data called Examples (see the sketch below)
  • Tags and sub folders to organize your features – tags are more like sticky notes for documentation
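For instance, a hedged sketch of how the betting scenario above could be rewritten as a Scenario Outline (the stake and balance values are illustrative):

Scenario Outline: Place a bet with cash balance
 Given I have an account with cash balance of <balance>
 When I place a bet of <stake> on "SB_PRE_MATCH"
 Then the bet should be placed successfully
 And the remaining balance in my account should be <remaining>

 Examples:
 | balance | stake | remaining |
 | 100     | 10    | 90        |
 | 50      | 25    | 25        |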

Dependency Injection
More often than not, you might have to pass information created in one step to another. For example, you create a domain object in your first step and then you need to use it in your second step. The clean way to achieve this is through Dependency Injection. Cucumber provides modules for the main DI containers like Spring, Guice, Pico etc.
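As a hedged sketch (the TestContext class is illustrative and this assumes the cucumber-picocontainer module is on the classpath), a shared context object can simply be constructor-injected into each step definition class:

public class TestContext {
    // per-scenario shared state; the DI container creates a fresh instance for each scenario
    public int accountBalance;
}

public class PlaceBetStepDefs {

    private final TestContext context;

    // PicoContainer injects the same TestContext instance into
    // every step definition class used by the current scenario
    public PlaceBetStepDefs(TestContext context) {
        this.context = context;
    }
}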

Executing Cucumber
It is very easy to run Cucumber from the IntelliJ IDEA IDE. It can also be integrated with your build system. You can also control the tests you want to run with different options.
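For build integration, a hedged sketch of a JUnit runner class for the cucumber-junit module (the package and feature path are illustrative):

import cucumber.api.CucumberOptions;
import cucumber.api.junit.Cucumber;
import org.junit.runner.RunWith;

@RunWith(Cucumber.class)
@CucumberOptions(
        features = "src/test/resources/features", // where the .feature files live
        glue = "com.example.stepdefs")            // package containing the step definitions
public class RunCucumberTest {
}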

Reporting Options
There are a lot of plugins available for reporting. For example, you could use the Master Thought plugin for the reports.

References
The Cucumber for Java book – an excellent book, and all you need to get started
Documentation
GitHub link
That’s all folks. Hope you liked it. Have a good Christmas! Enjoy.

How to write a java agent

For vmlens, a lightweight java race condition catcher, we are using a java agent to trace field accesses. Here are the lessons we learned implementing such an agent.

The Start

Create an agent class with a “public static void premain(String args, Instrumentation inst)” method. Put this class into a jar file with a manifest pointing to the agent class. The premain method will be called before the main method of the application.
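A minimal sketch of such a class (the package name is taken from the manifest below; the body is illustrative):

package com.anarsoft.trace.agent;

import java.lang.instrument.Instrumentation;

public class Agent {

    // called by the JVM before the application's main method
    public static void premain(String args, Instrumentation inst) {
        // set up class loading and register a ClassFileTransformer here
    }
}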

Manifest-Version: 1.0
Ant-Version: Apache Ant 1.9.2
Created-By: 1.8.0_05-b13 (Oracle Corporation)
Built-By: Thomas Krieger
Implementation-Vendor: Anarsoft
Implementation-Title: VMLens Agent
Implementation-Version: 2.0.0.201511181111
Can-Retransform-Classes: true
Premain-Class: com.anarsoft.trace.agent.Agent
Boot-Class-Path: agent_bootstrap.jar

The MANIFEST.MF file from vmlens.

Class loader magic part 1

The agent class will be loaded by the system class loader. But we have to avoid version conflicts between the classes used by the agent and those used by the application. In particular, the frameworks used in the agent should not be visible to the application classes. So we use a dedicated URLClassLoader to load all other agent classes:

// remember the currently used classloader
ClassLoader contextClassLoader = Thread.currentThread().getContextClassLoader();

// create and set a dedicated URLClassLoader with no parent,
// so that agent classes cannot clash with application classes
URLClassLoader classloader = new URLClassLoader(urlList.toArray(new URL[]{}), null);
Thread.currentThread().setContextClassLoader(classloader);

// load and execute the agent runtime
String agentName = "com.anarsoft.trace.agent.runtime.AgentRuntimeImpl";
AgentRuntime agentRuntime = (AgentRuntime) classloader.loadClass(agentName).newInstance();

// reset the classloader
Thread.currentThread().setContextClassLoader(contextClassLoader);

Class loader magic part 2

Now we use ASM to add our static callback methods when a field is accessed. To make sure that these classes are visible in every other class, they have to be loaded by the bootstrap classloader. To achieve this, they have to be in the java package namespace and the jar containing them has to be on the boot class path.

package java.anarsoft.trace.agent.bootstrap.callback;

public class FieldAccessCallback {

    public static void getStaticField(int field, int methodId) {
    }
}

A callback class from vmlens. It has to be in the java package namespace to be visible in all classes.

Boot-Class-Path: agent_bootstrap.jar

The boot class path entry in the MANIFEST.MF file from vmlens.
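For illustration, a hedged sketch of how such a callback might be wired in with ASM (the field and method ids are hypothetical; the real vmlens instrumentation is more involved):

import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

class FieldAccessMethodVisitor extends MethodVisitor {

    FieldAccessMethodVisitor(MethodVisitor mv) {
        super(Opcodes.ASM5, mv);
    }

    @Override
    public void visitFieldInsn(int opcode, String owner, String name, String desc) {
        if (opcode == Opcodes.GETSTATIC) {
            // push hypothetical field and method ids, then call the callback
            super.visitLdcInsn(42);
            super.visitLdcInsn(7);
            super.visitMethodInsn(Opcodes.INVOKESTATIC,
                    "java/anarsoft/trace/agent/bootstrap/callback/FieldAccessCallback",
                    "getStaticField", "(II)V", false);
        }
        super.visitFieldInsn(opcode, owner, name, desc);
    }
}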

VMLens, a lightweight java race condition catcher, is built as a java agent. We know, writing java agents can be a tricky business. So, if you have any questions, just ask them in a comment below.

Functional Data Structures in Java 8 with Javaslang

Java 8’s lambdas (λ) empower us to create wonderful APIs. They incredibly increase the expressiveness of the language.

Javaslang leveraged lambdas to create various new features based on functional patterns. One of them is a functional collection library that is intended to be a replacement for Java’s standard collections.

Javaslang Collections

(This is just a bird’s view, you will find a human-readable version below.)

Functional Programming

Before we deep-dive into the details about the data structures I want to talk about some basics. This will make it clear why I created Javaslang and specifically new Java collections.

Side-Effects

Java applications are typically full of side-effects. They mutate some sort of state, maybe the outer world. Common side-effects are changing objects or variables in place, printing to the console, and writing to a log file or to a database. Side-effects are considered harmful if they affect the semantics of our program in an undesirable way.

For example, if a function throws an exception and this exception is handled, it is considered a side-effect that affects our program. Furthermore, exceptions are like non-local goto statements: they break the normal control-flow. However, real-world applications do perform side-effects.

int divide(int dividend, int divisor) {
    // throws if divisor is zero
    return dividend / divisor;
}

In a functional setting we are in the favorable situation to encapsulate the side-effect in a Try:

// = Success(result) or Failure(exception)
Try<Integer> divide(Integer dividend, Integer divisor) {
    return Try.of(() -> dividend / divisor);
}

This version of divide does not throw anymore. We made the possible failure explicit by using the type Try.
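As a quick, hedged illustration of working with such a value (using Try’s map, in the comment style of the examples above):

// = Success(5)
Try<Integer> result = divide(10, 2);

// = Failure(ArithmeticException)
Try<Integer> failure = divide(10, 0);

// mapping only applies to the success case
// = Failure(ArithmeticException), the function is never called
failure.map(i -> i * 2);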

Mario Fusco on Functional Programming

Referential Transparency

A function, or more generally an expression, is called referentially transparent if a call can be replaced by its value without affecting the behavior of the program. Simply spoken, given the same input, the output is always the same.

// not referential transparent
Math.random();

// referential transparent
Math.max(1, 2);

A function is called pure if all expressions involved are referentially transparent. An application composed of pure functions will most probably just work if it compiles. We are able to reason about it. Unit tests are easy to write and debugging becomes a relic of the past.

Thinking in Values

Rich Hickey, the creator of Clojure, gave a great talk about The Value of Values. The most interesting values are immutable values. The main reason is that immutable values

  • are inherently thread-safe and hence do not need to be synchronized
  • are stable regarding equals and hashCode and thus are reliable hash keys
  • do not need to be cloned
  • behave type-safe when used in unchecked covariant casts (Java-specific)

The key to a better Java is to use immutable values paired with referentially transparent functions.

Javaslang provides the necessary controls and collections to accomplish this goal in every-day Java programming.

Data Structures in a Nutshell

Javaslang’s collection library comprises a rich set of functional data structures built on top of lambdas. The only interface they share with Java’s original collections is Iterable. The main reason is that the mutator methods of Java’s collection interfaces do not return an object of the underlying collection type.

We will see why this is so essential by taking a look at the different types of data structures.

Mutable Data Structures

Java is an object-oriented programming language. We encapsulate state in objects to achieve data hiding and provide mutator methods to control the state. The Java collections framework (JCF) is built upon this idea.

interface Collection<E> {
    // removes all elements from this collection
    void clear();
}

Today I comprehend a void return type as a smell. It is evidence that side-effects take place, state is mutated. Shared mutable state is an important source of failure, not only in a concurrent setting.

Immutable Data Structures

Immutable data structures cannot be modified after their creation. In the context of Java they are widely used in the form of collection wrappers.

List<String> list = Collections.unmodifiableList(otherList);

// Boom!
list.add("why not?");

There are various libraries that provide us with similar utility methods. The result is always an unmodifiable view of the specific collection. Typically it will throw at runtime when we call a mutator method.

Persistent Data Structures

A persistent data structure does preserve the previous version of itself when being modified and is therefore effectively immutable. Fully persistent data structures allow both updates and queries on any version.

Many operations perform only small changes. Just copying the previous version wouldn’t be efficient. To save time and memory, it is crucial to identify similarities between two versions and share as much data as possible.

This model does not impose any implementation details. This is where functional data structures come into play.

Functional Data Structures

Also known as purely functional data structures, these are immutable and persistent. The methods of functional data structures are referentially transparent.

Javaslang features a wide range of the most-commonly used functional data structures. The following examples are explained in-depth.

Linked List

One of the most popular and also simplest functional data structures is the (singly) linked List. It has a head element and a tail List. A linked List behaves like a Stack which follows the last in, first out (LIFO) method.

In Javaslang we instantiate a List like this:

// = List(1, 2, 3)
List<Integer> list1 = List.of(1, 2, 3);

Each of the List elements forms a separate List node. The tail of the last element is Nil, the empty List.

List 1

This enables us to share elements across different versions of the List.

// = List(0, 2, 3)
List<Integer> list2 = list1.tail().prepend(0);

The new head element 0 is linked to the tail of the original List. The original List remains unmodified.

List 2

These operations take place in constant time, in other words they are independent of the List size. Most of the other operations take linear time. In Javaslang this is expressed by the interface LinearSeq, which we may already know from Scala.

If we need data structures that are queryable in constant time, Javaslang offers Array and Vector. Both have random access capabilities.

The Array type is backed by a Java array of objects. Insert and remove operations take linear time. Vector is in-between Array and List. It performs well in both areas, random access and modification.

In fact the linked List can also be used to implement a Queue data structure.

Queue

A very efficient functional Queue can be implemented based on two linked Lists. The front List holds the elements that are dequeued, the rear List holds the elements that are enqueued. Both operations enqueue and dequeue perform in O(1).

Queue<Integer> queue = Queue.of(1, 2, 3)
                            .enqueue(4)
                            .enqueue(5);

The initial Queue is created of three elements. Two elements are enqueued on the rear List.

Queue 1

If the front List runs out of elements when dequeueing, the rear List is reversed and becomes the new front List.

Queue 2

When dequeueing an element we get a pair of the first element and the remaining Queue. It is necessary to return the new version of the Queue because functional data structures are immutable and persistent. The original Queue is not affected.

Queue<Integer> queue = Queue.of(1, 2, 3);

// = (1, Queue(2, 3))
Tuple2<Integer, Queue<Integer>> dequeued =
        queue.dequeue();

What happens when the Queue is empty? Then dequeue() will throw a NoSuchElementException. To do it the functional way we would rather expect an optional result.

// = Some((1, Queue()))
Queue.of(1).dequeueOption();

// = None
Queue.empty().dequeueOption();

An optional result may be further processed, regardless of whether it is empty or not.

// = Queue(1)
Queue<Integer> queue = Queue.of(1);

// = Some((1, Queue()))
Option<Tuple2<Integer, Queue<Integer>>>
        dequeued = queue.dequeueOption();

// = Some(1)
Option<Integer> element =
        dequeued.map(Tuple2::_1);

// = Some(Queue())
Option<Queue<Integer>> remaining =
        dequeued.map(Tuple2::_2);

Sorted Set

Sorted Sets are data structures that are more frequently used than Queues. We use binary search trees to model them in a functional way. These trees consist of nodes with up to two children and values at each node.

We build binary search trees in the presence of an ordering, represented by an element Comparator. All values of the left subtree of any given node are strictly less than the value of the given node. All values of the right subtree are strictly greater.

// = TreeSet(1, 2, 3, 4, 6, 7, 8)
SortedSet<Integer> xs =
        TreeSet.of(6, 1, 3, 2, 4, 7, 8);

Binary Tree 1

Searches on such trees run in O(log n) time. We start the search at the root and decide if we found the element. Because of the total ordering of the values we know where to search next, in the left or in the right branch of the current tree.

// = TreeSet(1, 2, 3);
SortedSet<Integer> set = TreeSet.of(2, 3, 1, 2);

// = TreeSet(3, 2, 1);
Comparator<Integer> c = (a, b) -> b - a;
SortedSet<Integer> reversed =
        TreeSet.of(c, 2, 3, 1, 2);

Most tree operations are inherently recursive. The insert function behaves similarly to the search function. When the end of a search path is reached, a new node is created and the whole path is reconstructed up to the root. Existing child nodes are referenced whenever possible. Hence, the insert operation takes O(log n) time and space.

// = TreeSet(1, 2, 3, 4, 5, 6, 7, 8)
SortedSet<Integer> ys = xs.add(5);

Binary Tree 2

In order to maintain the performance characteristics of a binary search tree it needs to be kept balanced. All paths from the root to a leaf need to have roughly the same length.

In Javaslang we implemented a binary search tree based on a Red/Black Tree. It uses a specific coloring strategy to keep the tree balanced on inserts and deletes. To read more about this topic please refer to the book Purely Functional Data Structures by Chris Okasaki.

State of the Collections

Generally we are observing a convergence of programming languages. Good features make it, others disappear. But Java is different: it is bound forever to be backward compatible. That is a strength, but it also slows down evolution.

Lambda brought Java and Scala closer together, yet they are still so different. Martin Odersky, the creator of Scala, recently mentioned in his BDSBTB 2015 keynote the state of the Java 8 collections.

He described Java’s Stream as a fancy form of an Iterator. The Java 8 Stream API is an example of a lifted collection. What it does is define a computation and link it to a specific collection in another explicit step.

// i + 1
i.prepareForAddition()
 .add(1)
 .mapBackToInteger(Mappers.toInteger())

This is how the new Java 8 Stream API works. It is a computational layer above the well known Java collections.

// = ["1", "2", "3"] in Java 8
Arrays.asList(1, 2, 3)
      .stream()
      .map(Object::toString)
      .collect(Collectors.toList())

Javaslang is greatly inspired by Scala. This is how the above example looks in Javaslang:

// = Stream("1", "2", "3") in Javaslang
Stream.of(1, 2, 3).map(Object::toString)

Within the last year we put much effort into implementing the Javaslang collection library. It comprises the most widely used collection types.

Seq

We started our journey by implementing sequential types. We already described the linked List above. Stream, a lazy linked List, followed. It allows us to process possibly infinitely long sequences of elements.

Seq

All collections are Iterable and hence can be used in enhanced for-statements.

for (String s : List.of("Java", "Advent")) {
    // side effects and mutation
}

We could accomplish the same by internalizing the loop and injecting the behavior using a lambda.

List.of("Java", "Advent").forEach(s -> {
    // side effects and mutation
});

Anyway, as we previously saw, we prefer expressions that return a value over statements that return nothing. By looking at a simple example, we will soon recognize that statements add noise and divide what belongs together.

String join(String... words) {
    StringBuilder builder = new StringBuilder();
    for(String s : words) {
        if (builder.length() > 0) {
            builder.append(", ");
        }
        builder.append(s);
    }
    return builder.toString();
}

The Javaslang collections provide us with many functions to operate on the underlying elements. This allows us to express things in a very concise way.

String join(String... words) {
    return List.of(words)
               .intersperse(", ")
               .fold("", String::concat);
}

Most goals can be accomplished in various ways using Javaslang. Here we reduced the whole method body to fluent function calls on a List instance. We could even remove the whole method and directly use our List to obtain the computation result.

List.of(words).mkString(", ");

In a real world application we are now able to drastically reduce the number of lines of code and hence lower the risk of bugs.

Set and Map

Sequences are great. But to be complete, a collection library also needs different types of Sets and Maps.

Set and Map

We described how to model sorted Sets with binary tree structures. A sorted Map is nothing else than a sorted Set containing key-value pairs and having an ordering for the keys.

The HashMap implementation is backed by a Hash Array Mapped Trie (HAMT). Accordingly the HashSet is backed by a HAMT containing key-key pairs.

Our Map does not have a special Entry type to represent key-value pairs. Instead we use Tuple2 which is already part of Javaslang. The fields of a Tuple are enumerated.

// = (1, "A")
Tuple2<Integer, String> entry = Tuple.of(1, "A");

Integer key = entry._1;
String value = entry._2;

Maps and Tuples are used throughout Javaslang. Tuples are indispensable for handling multi-valued return types in a general way.

// = HashMap((0, List(2, 4)), (1, List(1, 3)))
List.of(1, 2, 3, 4).groupBy(i -> i % 2);

// = List((a, 0), (b, 1), (c, 2))
List.of('a', 'b', 'c').zipWithIndex();

At Javaslang, we explore and test our library by implementing the 99 Euler Problems. It is a great proof of concept. Please don’t hesitate to send pull requests.

Hands On!

I really hope this article has sparked your interest in Javaslang. Even if you do use Java 7 (or below) at work, as I do, it is possible to follow the idea of functional programming. It will be of great good!

Please make sure Javaslang is part of your toolbelt in 2016.

Happy hacking!

PS: Questions? @_Javaslang or Gitter chat

The Java/JVM Advent has started!

The Java Advent has started (our first article of the year)! Enjoy 24 articles about a JVM related topic – one each day – written by the enthusiastic team at Java Advent HQ! Thank you for writing and thank you for reading.

To make sure that you don’t miss any of the posts, subscribe.

Also, if you like what you’re reading, be sure to share it with others who might be interested (be it on the social networks or in real-life).

Cheers!

Native Speed File Backed Large Data Storage In ‘Pure’ Java

Motivation:

All this started with the realisation that I could not afford a big enough computer. Audio processing requires huge amounts of memory. Audacity, an amazing free audio processor, manages this using a file backed storage system. This is a common approach for such issues where we store a huge amount of information and want random access to it. So, I wanted to develop a system for Sonic Field (my pet audio processing/synthesis project) which gave the same powerful disk based memory approach but in pure Java.

I got this to work late last year and discussed it (briefly) in the Java Advent Calendar (http://www.javaadvent.com/2014/12/a-serpentine-path-to-music.html) overview of Sonic Field. Disk based memory allows Sonic Field to process audio systems which require huge amounts of memory on my humble 16 gigabyte laptop. For example, this recent piece took over 50 gigabytes of memory to create:

Whilst this was a breakthrough, it was also inefficient. Memory intensive operations like mixing were a bottleneck in this system. Here I turn Java into a memory powerhouse by implementing the same system but very much more efficiently. I suspect I am getting near the limit at which Java is no longer at a performance disadvantage to C++.

Last year I gave a high level overview of the method; this year I am deep diving into the implementation and performance details. In so doing I will explain how we can remove the overhead of traditional Java memory access techniques and then expand the ideas for a more general approach to sharing and persisting large memory systems in JVM programming.

What Is Segmented Storage?

I admit there are a lot of concepts here. The first one to get our heads around is just how inefficient normal memory management of large memory systems is in Java. Let me be very clear indeed: I am not talking about garbage collection. Years of experience with both Java and C++ have taught me that neither collected nor explicit heap management is efficient or easy to get right. I am not discussing this at all. The issues with the JVM’s management of large memory systems are because of its bounds checking and object model. This is thrown into sharp focus when working with memory pools.

As latency or throughput performance become more critical than memory use there comes a point where one has to break out memory pools. Rather than a memory system which mixes everything together in one great glorious heap, we have pools of same sized objects. This requires more memory than a pure heap if the pool is not fully used or if the elements being mapped into pool chunks are smaller than the chunks themselves. However, pools are very fast indeed to manage.

In this post I am going to discuss pool backed segmented storage. Segmented storage is based on a pool, but allows allocation of larger storage containers than a single pool chunk. The idea is that a storage container (say 1 gigabyte) can be made up of a selection of chunks (say 1 megabyte each). The segmented storage region is not necessarily made up of contiguous chunks. Indeed, this is its most important feature. It is made up of equal sized chunks from a backing pool, but the chunks are scattered across virtual address space and might not even be in order. With this we have something with the request and release efficiency of a pool but closer to the memory use efficiency of a heap, and without any concerns over fragmentation.

Let us look at what a pool looks like first; then we can come back to segmentation.

A pool, in this discussion, consists of these parts:

  1. A pool (not necessarily all in one data structure) of chunks of equal sized memory.
  2. One or more lists of used chunks.
  3. One list of free chunks.

To create a segmented memory allocation from a pool we have a loop:

  1. Create a container (array or some such) of memory chunks. Call this the segment list for the allocation.
  2. Take a chunk of memory off the free list and add it to the segment list.
  3. See if the segment list contains equal or more total memory than required.
  4. If not repeat from 2.

Now we have an allocation segment list which has at least enough memory for the requirement. When we free this memory we simply put the chunks back on the free list. We can see from this that very quickly the chunks on the free list will no longer be in order and even if we were to sort them by address, they still would not be contiguous. Thus any allocation will have enough memory but not in any contiguous order.

Here is a worked example:

We will consider 10 chunks of 1 megabyte which we can call 1, 2…10 and which are initially in order.

Start:
  Free List: 1 2 3 4 5 6 7 8 9 10
Allocate a 2.5 megabyte store:
  Free List: 1 2 3 4 5 6 7
  Allocated Store A: 8 9 10
Allocate a 6 megabyte store:
  Free List: 1
  Allocated Store A: 8 9 10
  Allocated Store B: 7 6 5 4 3 2
Free Allocated Store A:
  Free List: 10 9 8 1
  Allocated Store B: 7 6 5 4 3 2
Allocate a 3.1 megabyte store:
  Free List:
  Allocated Store B: 7 6 5 4 3 2
  Allocated Store C: 10 9 8 1
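A minimal Java sketch of the allocate/free cycle described above (the Deque-based free list and class names are illustrative, not Sonic Field’s actual code):

import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class SegmentedPool {
    static final long CHUNK_LEN = 1024 * 1024;
    private final Deque<ByteBuffer> freeList = new ArrayDeque<>();

    // take chunks off the free list until the segment list is big enough;
    // assumes enough free chunks are available
    List<ByteBuffer> allocate(long bytesRequired) {
        List<ByteBuffer> segmentList = new ArrayList<>();
        long total = 0;
        while (total < bytesRequired) {
            segmentList.add(freeList.pop());
            total += CHUNK_LEN;
        }
        return segmentList;
    }

    // freeing simply returns the chunks, in whatever order, to the free list
    void free(List<ByteBuffer> segmentList) {
        segmentList.forEach(freeList::push);
    }
}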

One can note that such an approach is good for some situations in systems like 64 bit C++, but its true power is for Java. In current JVMs the maximum addressable array or ByteBuffer contains only 2^31 elements. Segmented storage offers an efficient way of addressing much greater quantities of memory and backing that memory with memory mapped files if required. Consider that we need 20 billion doubles: we cannot allocate them into an array or a ByteBuffer, but we can use segmented memory to achieve our goal.

Using anonymous virtual memory in Java for very large memory objects can be inefficient. In use cases where we want to handle very much more memory than the RAM on the machine, we are better off using memory mapped files than just using anonymous swap space. This means that the JVM is not competing with other programs for swap space (to an extent), but, more importantly, garbage collected memory distributes object access, which is particularly poor for anonymous virtual memory. We want to concentrate access on particular pages in the time domain so that we attract as few hard page faults as possible. I have discussed other concepts in this area here: https://jaxenter.com/high-speed-multi-threaded-virtual-memory-in-java-105629.html.

Given this, if we narrow our requirement to 20 billion doubles in a memory mapped file, then we are not even going to be able to use magic in sun.misc.Unsafe (see later) to help. Without JNI, the largest memory mapped file ‘chunk’ we can manage in Java is just 2^31 bytes. It is this requirement for memory mapped files, and the inherent allocation/freeing efficiency of segmented storage approaches, that led me to use it for Sonic Field (where I often need to manage over 100 gigabytes of memory on a 16 gigabyte machine).

Drilling Into The Implementation:

We now have a clear set of ideas to implement. We need mapped byte buffers. Each buffer is a chunk in a pool of free chunks. When we want to allocate a storage container we take some of these mapped byte buffer chunks out of the free pool and put them into our container. When the container is freed we return our chunks to the free pool. Simple, efficient and clean.

Also, one important thing to note is that the mapped byte buffers are actually java.nio.DirectByteBuffer objects with file backed memory. We will use this concept later; for now we can just think of them as ByteBuffers.
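As a hedged sketch (the file name and offset bookkeeping are illustrative), one such file backed chunk can be created via FileChannel.map:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// map one CHUNK_LEN sized region of the backing file into memory
RandomAccessFile file = new RandomAccessFile("/tmp/storage.bin", "rw");
FileChannel channel = file.getChannel();
long offset = chunkIndex * CHUNK_LEN; // chunkIndex is hypothetical bookkeeping
MappedByteBuffer chunk = channel.map(FileChannel.MapMode.READ_WRITE, offset, CHUNK_LEN);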

Sonic Field is the project for which I developed the technique of segmented storage using mapped byte buffers (see https://github.com/nerds-central/SonicFieldRepo). In that code base I have defined the following:

    private static final long  CHUNK_LEN        = 1024 * 1024;

To get at a sample we can consider each chunk to be a CHUNK_LEN ByteBuffer. Before my speedup work, the code for accessing an element from an allocated memory chunk was:

   private static final long  CHUNK_SHIFT      = 20;
   private static final long  CHUNK_MASK       = CHUNK_LEN - 1;
...
   public final double getSample(int index)
   {
       long bytePos = index << 3;                    // 8 bytes per double
       long pos = bytePos & CHUNK_MASK;              // byte offset within the chunk
       long bufPos = (bytePos - pos) >> CHUNK_SHIFT; // which chunk holds the sample
       return chunks[(int) bufPos].getDouble((int) pos);
   }

So the allocated segment list in this case is an array of ByteBuffers, and a lookup proceeds as follows:

  1. Find the index into the list by dividing the index required by the chunk size (use shift for efficiency).
  2. Find the index into the found chunk by taking the modulus (use binary and for efficiency).
  3. Look up the actual value using the getDouble intrinsic method (looks like a method but the compiler knows about it and elides the method call).
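
To make the arithmetic concrete (with CHUNK_LEN of 2^20 bytes and numbers of my own choosing): for sample index 300,000, bytePos is 300,000 << 3 = 2,400,000; pos is 2,400,000 & (2^20 - 1) = 302,848, the byte offset within a chunk; and bufPos is (2,400,000 - 302,848) >> 20 = 2, so the double lives at byte offset 302,848 of chunk 2.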

All this looks fine, but it does not work out all that well because there are some fundamental issues with the way Java lays out objects in memory which prevent segmented access from being properly optimised. On the face of it, accessing a segmented memory area should be a few very fast shift and logic operations and an indirect lookup, but that does not work out that way for Java; all the problems happen in this line:

 return chunks[(int) bufPos].getDouble((int) pos);

This is what this line has to do:

  1. Look up the chunks object from its handle.
  2. Bounds check.
  3. Get the data from its data area.
  4. From that object handle for the ByteBuffer, look up the actual object.
  5. Look up its length dynamically (it can change so this is a safe point and an object field lookup).
  6. Bounds check.
  7. Retrieve the data.

Really? Yes, the JVM does all of that, which is quite painful. Not only is it a lot of instructions, it also requires jumping around in memory with all the consequent cache line flushing and memory pauses.

How can we improve on this? Remember that our ByteBuffers are DirectByteBuffers; this means that their data is not stored on the Java heap but is located at the same virtual address throughout the object’s lifetime. I bet you have guessed that the key here is using sun.misc.Unsafe. Yes, it is; we can bypass all this object lookup by using offheap memory. To do so means bending a few Java and JVM rules, but the dividends are worth it.

From now on everything I discuss is relevant to Java 1.8 x86_64. Future versions might break this approach as it is not standards compliant.

Consider this:

   private static class ByteBufferWrapper
   {
       public long       address;
       public ByteBuffer buffer;
       public ByteBufferWrapper(ByteBuffer b) throws
                      NoSuchMethodException,
                      SecurityException,
                      IllegalAccessException,
                      IllegalArgumentException,
                      InvocationTargetException
       {
           Method addM = b.getClass().getMethod("address");
           addM.setAccessible(true);
           address = (long) addM.invoke(b);
           buffer = b;
       }
   }

What we are doing is getting the address in memory of the data stored in a DirectByteBuffer. To do this I use reflection, as DirectByteBuffer is package private. DirectByteBuffer has a method on it called address() which returns a long. On x86_64 the size of an address (64 bits) is the same as a long. Whilst a long is signed, we can just use it as binary data and ignore its numerical value. So the long returned from address() is actually the virtual address of the start of the buffer’s storage area.

Unlike ‘normal’ JVM storage (e.g. arrays) the storage of a DirectByteBuffer is ‘off heap’. It is virtual memory just like any other, but it is not owned by the garbage collector and cannot be moved by the garbage collector; this makes a huge difference to how quickly and by which techniques we can access it. Remember, the address returned by address() never changes for a given DirectByteBuffer object; consequently, we can use this address ‘forever’ and avoid object lookups.
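
A quick usage sketch of the wrapper (inside a suitable try block, and under this article’s Java 1.8 assumption – later, module-enforcing JVMs will refuse this reflective access):

    ByteBuffer direct = ByteBuffer.allocateDirect(1024);       // a DirectByteBuffer
    ByteBufferWrapper wrapped = new ByteBufferWrapper(direct); // reflective address grab
    System.out.printf("storage starts at 0x%x%n", wrapped.address);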

Introducing sun.misc.Unsafe:

Whilst it would be lovely to believe that calling getDouble(int) on a DirectByteBuffer is super efficient, it does not appear to be so. The bounds check slows it down despite the method being intrinsic [a magic function which the JVM JIT compiler knows about and can replace with machine code rather than compiling in a normal fashion]. However, with our address we can now use sun.misc.Unsafe to access the storage.

Rather than:

b.getDouble(pos);

We can:

unsafe.getDouble(address+pos);

The unsafe version is also intrinsic and compiles down to pretty much the same machine code as a C compiler (like gcc) would produce. In other words, it is as fast as it can get; there are no object dereferences or bounds checks, it just loads a double from an address.

The store equivalent is:

unsafe.putDouble(address+pos,value);

What is this ‘unsafe’ thing? We get hold of it with another reflection hack:

   private static Unsafe getUnsafe()
   {
       try
       {
           Field f = Unsafe.class.getDeclaredField("theUnsafe");
           f.setAccessible(true);
           return (Unsafe) f.get(null);
       }
       catch (Exception e)
       {
           throw new RuntimeException(e);
       }
   }
   private static final Unsafe unsafe = getUnsafe();

It is important to load the unsafe singleton into a final static field. This allows the compiler to assume the object reference never changes and so generate the most optimal code.

Now we have very fast acquisition of data from a DirectByteBuffer but we have a segmented storage model so we need to get the address for the correct byte buffer very quickly. If we store these in an array we risk the array bounds check and the array object dereference steps. We can get rid of these by further use of unsafe and offheap memory.

   private final long  chunkIndex;
...
   try
   {
       // Allocate the memory for the index - final so do it here.
       // l is the number of doubles the container must hold.
       long size = (1 + ((l << 3) >> CHUNK_SHIFT)) << 3;
       allocked = chunkIndex = unsafe.allocateMemory(size);
       if (allocked == 0)
       {
           throw new RuntimeException("Out of memory allocating " + size);
       }
       makeMap(l << 3l);
   }
   catch (Exception e)
   {
       throw new RuntimeException(e);
   }

Again we use the ‘final’ trick to let the compiler make the very best optimisations. The final here is a long which is just an address. We can directly allocate offheap memory using unsafe; the imaginatively named function to do this is allocateMemory(long). This returns a long which we store in chunkIndex. allocateMemory(long) actually allocates bytes, but we want to store what is effectively an array of longs (addresses); this is what the bit twiddling logic is doing when it computes size.
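
As a worked example with numbers of my own: for l = 300,000 doubles, l << 3 gives 2,400,000 bytes of sample data; shifting right by CHUNK_SHIFT gives 2 (two complete chunks); the + 1 rounds up to 3 index entries to cover the partly filled third chunk; and the final << 3 turns 3 longs into 24 bytes of offheap index. (When the data exactly fills its chunks this rounds one entry high, which is harmless.)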

Now that we have a chunk of offheap memory large enough to store the addresses for the DirectByteBuffer segments for our storage container we can put the addresses in and retrieve them using unsafe.

During storage construction we:

    // now we have the chunks we get the address of the underlying memory
    // of each and place that in the off heap lookup so we no longer
    // reference them via objects but purely as raw memory
   long offSet = 0;
   for (ByteBufferWrapper chunk : chunks)
   {
       unsafe.putAddress(chunkIndex + offSet, chunk.address);
       offSet += 8;
   }

Which means our new code for getting and setting data can be very simple indeed:

   private long getAddress(long index)
   {
       long bytePos = index << 3;                    // 8 bytes per double
       long pos = bytePos & CHUNK_MASK;              // byte offset within the chunk
       long bufPos = (bytePos - pos) >> CHUNK_SHIFT; // which chunk
       long address = chunkIndex + (bufPos << 3);    // slot in the offheap address index
       return unsafe.getAddress(address) + pos;      // chunk base address plus offset
   }

   /* (non-Javadoc)
    * @see com.nerdscentral.audio.SFSignal#getSample(int)
    */
   @Override
   public final double getSample(int index)
   {
       return unsafe.getDouble(getAddress(index));
   }

   /* (non-Javadoc)
    * @see com.nerdscentral.audio.SFSignal#setSample(int, double)
    */
   @Override
   public final double setSample(int index, double value)
   {
       unsafe.putDouble(getAddress(index), value);
       return value;
   }

The wonderful thing about this is the complete lack of object manipulation or bounds checking. OK, if someone asks for a sample which is out of bounds, the JVM will crash. That might not be a good thing. This sort of programming is very alien to many Java coders and we need to take its dangers very seriously. However, it is really quite fast compared to the original.

In my experiments, I have found that the default JVM inline settings are a little too conservative to get the best out of this approach. I have seen large speedups (up to a two times performance improvement) with the following command line tweaks.

-XX:MaxInlineSize=128 -XX:InlineSmallCode=1024

These just let the JVM do a better job of utilising the extra performance available through not being forced to perform bounds checks and object lookups. In general, I would not advise fiddling with JVM inline settings, but in this case I have real benchmark experience showing a benefit for complex offheap access work.
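
For example, a benchmark run might be launched like this (the main class name here is just a placeholder):

java -XX:MaxInlineSize=128 -XX:InlineSmallCode=1024 com.example.BenchMain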

Testing – How Much Faster Is It?

I wrote the following piece of Jython to test:

import math
from java.lang import System

sf.SetSampleRate(192000)
count=1000
ncount=100

def test():
   t1=System.nanoTime()
   for i in range(1,ncount):
       signal=sf.Mix(+signal1,+signal2)
       signal=sf.Realise(signal)
       -signal
   t2=System.nanoTime()
   d=(t2-t1)/1000000.0
   print "Done: " + str(d)
   return d

signal1=sf.Realise(sf.WhiteNoise(count))
signal2=sf.Realise(sf.WhiteNoise(count))
print "WARM"
for i in range(1,100):
   test()
   
print "Real"
total=0.0
for i in range(1,10):
   total+=test()

print "Mean " + str(total/9.0)

-signal1
-signal2

What this does is create some stored doubles and then repeatedly create new ones, reading from the old into the new over and over. Remember that we are using segmented storage backed by a pool; consequently, we only truly allocate that storage initially, and after that the ‘chunks’ are simply recycled. This architecture means that our execution time is dominated by executing getSample and setSample, not by allocation or any other paraphernalia.

How much faster is our off heap system? On my Macbook Pro Retina I7 machine with Java 1.8.0 I got these figures for the ‘Real’ (i.e. post warm up) operations (smaller is better):

For the unsafe memory model:

  • Done: 187.124
  • Done: 175.007
  • Done: 181.124
  • Done: 175.384
  • Done: 180.497
  • Done: 180.688
  • Done: 183.309
  • Done: 178.901
  • Done: 181.746
  • Mean 180.42

For the traditional memory model:

  • Done: 303.008
  • Done: 328.763
  • Done: 299.701
  • Done: 315.083
  • Done: 306.809
  • Done: 302.515
  • Done: 304.606
  • Done: 300.291
  • Done: 342.436
  • Mean 311.468

So our unsafe memory model is 1.73 times (311.468 / 180.42) faster than the traditional Java approach!

Why Is It 1.73 Times Faster?

We can see why if we look back at the list of things required just to read a double with the traditional DirectByteBuffer and array approach:

  1. Look up the chunks object from its handle.
  2. Bounds check.
  3. Get the data from its data area.
  4. From that object handle for the ByteBuffer, look up the actual object.
  5. Look up its length dynamically (it can change so this is a safe point and an object field lookup).
  6. Bounds check.
  7. Retrieve the data.

With the new approach we have:

  1. Retrieve the address of the chunk
  2. Retrieve the data from that chunk

Not only are far fewer machine instructions issued, the memory access is also much more localised, which almost certainly enhances cache usage during data processing.

The source code for the fast version of the storage system as described here is: https://github.com/nerds-central/SonicFieldRepo/blob/cf6a1b67fb8dd07126b0b1274978bd850ba76931/SonicField/src/com/nerdscentral/audio/SFData.java

I am hoping that you, the reader, have spotted one big problem I have not addressed! My code is allocating offheap memory whenever it creates a segmented storage container. However, this memory will not be freed by the garbage collector. We could try to free it with finalizers, but there are many reasons why this is not such a great idea.

My solution is to use explicit resource management. Sonic Field uses try with resources to manage its memory via reference counts. When the reference count for a particular storage container hits zero, the container is freed, which places its storage chunks back on the free list and uses unsafe to free the address lookup memory.
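
A minimal sketch of the shape this takes, assuming the illustrative ChunkPool from earlier and the final static unsafe singleton from above (the class and method names are mine, not Sonic Field’s):

    import java.util.List;

    // Illustrative only: an AutoCloseable wrapper which returns chunks to the
    // free list and releases the offheap index when its reference count hits zero.
    final class SegmentedStore implements AutoCloseable
    {
        private final ChunkPool pool;
        private final List<Integer> chunks;  // chunk ids owned by this store
        private final long indexAddress;     // offheap index from unsafe.allocateMemory
        private int refCount = 1;

        SegmentedStore(ChunkPool pool, long bytes, long indexAddress)
        {
            this.pool = pool;
            this.chunks = pool.allocate(bytes);
            this.indexAddress = indexAddress;
        }

        synchronized SegmentedStore retain()
        {
            ++refCount;                          // a new reference keeps the store alive
            return this;
        }

        @Override
        public synchronized void close()
        {
            if (--refCount == 0)
            {
                pool.free(chunks);               // chunks go back on the free list
                unsafe.freeMemory(indexAddress); // 'unsafe' is the singleton shown earlier
            }
        }
    }

Used with try with resources, close() runs even when an exception is thrown, so neither the chunks nor the offheap index can leak.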

Other Uses And New Ideas

Nearly a year ago now I posted ‘Java Power Features To Stay Relevant‘; I guess it was a controversial post and not everyone I have spoken to about my ideas finds them agreeable (to say the least). Nevertheless, I still believe the JVM has a challenge on its hands. The complex multi-threaded model of Java and the JVM itself is not necessarily the huge benefit people think it should be in the world of multi-core computing. There is still a lot of interest in using multiple small processes which communicate via shared memory or sockets. With the slow but inevitable increase in RDMA based networking, these approaches will seem more and more natural to people.

Java and JVM languages seem to have managed to make themselves uniquely unable to take advantage of these shifts in thinking. By developing a ‘walled garden’ approach the JVM has become very efficient at working internally but not great at working with other processes. This is a performance issue and also a stability issue; no matter how hard we try, there is always a chance the JVM will crash or enter an unstable state (OutOfMemoryError anyone?). In production systems this often necessitates several small JVM instances working together so if one goes away the production system stays up. Memory mapped files are a great way to help with persisting data even when a JVM process goes away.

All these issues lead me to another reason I am very interested in efficient offheap, mapped file architectures for the JVM. This technology sits at the overlap of shared memory and mapped file technologies, which are now driving forces behind high speed, stable production environments. Whilst the system I discussed here is for a single JVM, using offheap atomics (see here: http://nerds-central.blogspot.co.uk/2015/05/synchronising-sunmiscunsafe-with-c.html) we can put the free list offheap and share it between processes. Shared memory queues can then also give interprocess arbitration of segmented storage allocation and utilisation. Suddenly, the segmented storage model becomes an efficient way for multiple processes, both JVM and other technologies (Python, C++ etc.), to share large, file persisted memory systems.

Right now there are some issues. The biggest of these is that whilst Java supports shared memory via memory mapped files, it does not support pure shared memory. File mapping is an advantage if we are interested in large areas of memory (as in this example), but it is an unnecessary performance issue for small areas of rapidly changing memory which do not require persistence. I would like to see a true shared memory library in the JDK; this is unlikely to happen any time soon (see my point about a walled garden). JNI offers a route, but then JNI has many disadvantages as well. Maybe project Panama will give the required functionality and finally break down the JVM’s walls.

To bring all this together, the next trick I want to try is mapping files to a ramdisk (there is an interesting write up on this here: http://www.jamescoyle.net/knowledge/951-the-difference-between-a-tmpfs-and-ramfs-ram-disk). This should be quite easy on Linux and would let us place interprocess queues in pure RAM shared memory areas without using JNI. With this piece done, a pure Java high speed interprocess shared memory model would be in sight. Maybe that will have to wait for next year’s calendar?

Our authors – Markus Eisele

Markus Eisele is a Developer Advocate at Red Hat and focuses on JBoss Middleware. He has been working with Java EE servers from different vendors for more than 14 years and talks about his favorite topics around Java EE at conferences all over the world. He has been a principal consultant and has worked with different customers on all kinds of Java EE related applications and solutions. Besides that, he has always been a prolific blogger, writer and tech editor for different Java EE related books. He is an active member of the German DOAG e.V. and its representative on the iJUG e.V. As a Java Champion and former ACE Director he is well known in the community.