Doing microservices with micro-infra-spring

We've been working at 4financeit for the last couple of months on some open source solutions for microservices. I will be publishing some articles related to microservices and our tools, and this is the first of (hopefully) many that I will write in the upcoming weeks (months?) on the Too much coding blog.

This article will be an introduction to the micro-infra-spring library showing how you can quickly set up a microservice using our tools.

Introduction 

Before you start, it is crucial to remember that it's not enough to just use our tools to have microservices. You can check out my slides about microservices and the issues that we have dealt with while adopting them at 4financeit.

4financeit microservices 12.2014 at Lodz JUG from Marcin Grzejszczak

Here you can find a video of me talking about microservices at 4finance (it's from 19.09.2014, so it's pretty outdated)


It's also worth checking out Martin Fowler's articles about microservices, Todd Hoff's Microservices – Not a Free Lunch! or The Strengths and Weaknesses of Microservices by Abel Avram.

Is a monolith bad?

No, it isn't! The most important thing to remember when starting with microservices is that they will complicate your life in terms of operations, metrics, deployment and testing. Of course they bring plenty of benefits, but if you are unsure of what to pick – monolith or microservices – then my advice is to go the monolith way.

All the benefits of microservices – code autonomy, doing one thing well, getting rid of package dependencies – can also be achieved in monolithic code, so try to write your applications with such approaches and your life will get simpler for sure. How to achieve that? That's complicated, but here are a couple of hints that I can give you:

  • try to do DDD. No, you don't have DDD just because your entities have methods. Try to use the concept of aggregate roots
  • try not to make dependencies on packages from different roots. If you have two different bounded contexts like com.blogspot.toomuchcoding.client and com.blogspot.toomuchcoding.loan – go for high cohesion and low coupling – emit events, call a REST endpoint, send JMS messages or talk via a strictly defined API. Do not reuse the internals of those packages – take a look at the next point, which deals with encapsulation
  • take your high school notes and read about encapsulation again. Most of us make the mistake of thinking that if we make a field private and add an accessor to it then we have encapsulation. That's not true! I really like the example of Slawek Sobotka (article in Polish) who shows a common approach to encapsulation:

    human.getStomach().getBowels().getContent().add(new Sausage())

    instead of

    human.eat(new Sausage())

  • add to your IDE's class generation template that you want your new classes to be package scoped by default – what should be publicly available are interfaces and a really limited number of classes
  • start doing what's crucial in terms of tracking microservice requests and measuring business and technical data in your own application! Gather metrics, set up correlation ids for your messages (a sketch of such a filter follows right after this list) and add service discovery if you have multiple monoliths.
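
To make the correlation id point concrete, here is a minimal sketch of a servlet filter that sets one up for logging. This is an illustration of ours, not micro-infra-spring's actual implementation; the header name and class name are made up:

import java.io.IOException;
import java.util.UUID;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import org.slf4j.MDC;

public class CorrelationIdFilter implements Filter {

    // hypothetical header name – pick one convention and keep it everywhere
    static final String CORRELATION_ID_HEADER = "correlationId";

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String correlationId = ((HttpServletRequest) req).getHeader(CORRELATION_ID_HEADER);
        if (correlationId == null) {
            correlationId = UUID.randomUUID().toString();
        }
        MDC.put(CORRELATION_ID_HEADER, correlationId); // every log line now carries the id
        try {
            chain.doFilter(req, res);
        } finally {
            MDC.remove(CORRELATION_ID_HEADER); // don't leak the id to pooled threads
        }
    }

    @Override
    public void init(FilterConfig config) { }

    @Override
    public void destroy() { }
}

Forward the same header in every outgoing REST call or message and you can follow a single business request across all your monoliths (or, later, microservices).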

I’m a hipster – I want microservices!

Let's assume that you know what you are doing, you have evaluated all the pros and cons and you want to go down the microservice way. You have a devops culture in your company and people are eager to start working on multiple codebases. How do you start? Pick our tools and you won't regret it 😉

Clone a repo and get to work

We have set up a working template on GitHub, with a UI – boot-microservice-gui – and without one – boot-microservice. If you clone our repo and start working with it, you get a service that:
  • uses micro-infra-spring library
  • is written in Groovy
  • uses Spring Boot
  • is built with Gradle (set up for 4finance – but that’s really easy to change)
  • is JDK8 compliant
  • contains an example of a business scenario
What you then have to do is:
  • check out the slides above to see our approach to microservices
  • remove the packages com/ofg/twitter from src/main and src/test
  • alter microservice.json to support your requirements
  • write your code!
Why should you use our repo?
  • you don’t have to set up anything – we’ve already done it for you
  • the time required to start developing a feature is close to zero

Aren’t we duplicating Spring Cloud?

In fact we're not. We're using it in our libraries ourselves (right now for property storage in a Git repository). We have some different approaches, to service discovery for instance, but in general we are extending Spring Cloud's features rather than duplicating them.

Conclusions

If you want to go down the microservice way you have to be well aware of the issues related to that approach. If you know what you’re doing you can use our libraries and our microservice templates to have a fast start into feature development. 

What’s next

On my blog at toomuchcoding.blogspot.com I'll write about different features of the micro-infra-spring library, with more emphasis on the configuration of specific features that are not that well known but equally cool as the rest 😉 I'll also write some articles on how we approached splitting the monolith, but you'll have to wait some time for that 😉

This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

How jOOQ Helps Pretend that Your Stored Procedures are a Part of Java

In this year’s Java Advent Calendar, we’re thrilled to have been asked to feature a mini-series showing you a couple of advanced and very interesting topics that we’ve been working on when developing jOOQ.

The series consists of:

  • How jOOQ Leverages Generic Type Safety in its DSL
  • How jOOQ Allows for Fluent Functional-Relational Interactions in Java 8
  • How jOOQ Helps Pretend that Your Stored Procedures are a Part of Java

Don't miss any of these!

How jOOQ helps pretend that your stored procedures are a part of Java

This article was originally published in full on the jOOQ blog.

Stored procedures are an interesting way to approach data processing. Some Java developers tend to steer clear of them for rather dogmatic reasons, such as:

  • They think that the database is the wrong place for business logic
  • They think that the procedural aspect of those languages is ill-suited for their domain

But in practice, stored procedures are an excellent means of handling data manipulations, simply because they can execute complex logic right where the data is. This completely removes the effects that network latency and bandwidth would otherwise have on your application. As we're looking into supporting SAP HANA for jOOQ 3.6, we can tell you that running jOOQ's 10000 integration test queries connecting from a local machine to the cloud takes a lot longer. If you absolutely want to stay in Java land, then you had better also deploy your Java application into the cloud, close to that database (SAP HANA obviously offers that feature). But much better than that, move some of the logic into the database!

If you're doing calculations on huge in-memory data sets, you had better get your code into that same memory, rather than shuffling memory pieces around between possibly separate physical memory addresses. Companies like Hazelcast essentially do the same, except that their in-memory database is written in Java, so you can also write your "stored procedure" in Java.

With SQL databases, procedural SQL languages are king. And because of their tight integration with SQL, they’re much superior for the job than any Java based stored procedure architecture.

I know, but JDBC's CallableStatement… Arrrgh!

Yes. As ever so often (and as mentioned before in our previous articles), one very important reason why many Java developers don't like working with SQL is JDBC. Binding to a database via JDBC is extremely tedious and keeps us from working efficiently. Let's have a look at a couple of PL/SQL binding examples:

Assume we’re working on an Oracle-port of the popular Sakila database (originally created for MySQL). This particular Sakila/Oracle port was implemented by DB Software Laboratory and published under the BSD license.

Here’s a partial view of that Sakila database.

ERD created with vertabelo.com – learn how to use Vertabelo with jOOQ

Now, let’s assume that we have an API in the database that doesn’t expose the above schema, but exposes a PL/SQL API instead. The API might look something like this:

CREATE TYPE LANGUAGE_T AS OBJECT (
  language_id SMALLINT,
  name CHAR(20),
  last_update DATE
);
/

CREATE TYPE LANGUAGES_T AS TABLE OF LANGUAGE_T;
/

CREATE TYPE FILM_T AS OBJECT (
  film_id int,
  title VARCHAR(255),
  description CLOB,
  release_year VARCHAR(4),
  language LANGUAGE_T,
  original_language LANGUAGE_T,
  rental_duration SMALLINT,
  rental_rate DECIMAL(4,2),
  length SMALLINT,
  replacement_cost DECIMAL(5,2),
  rating VARCHAR(10),
  special_features VARCHAR(100),
  last_update DATE
);
/

CREATE TYPE FILMS_T AS TABLE OF FILM_T;
/

CREATE TYPE ACTOR_T AS OBJECT (
  actor_id numeric,
  first_name VARCHAR(45),
  last_name VARCHAR(45),
  last_update DATE
);
/

CREATE TYPE ACTORS_T AS TABLE OF ACTOR_T;
/

CREATE TYPE CATEGORY_T AS OBJECT (
  category_id SMALLINT,
  name VARCHAR(25),
  last_update DATE
);
/

CREATE TYPE CATEGORIES_T AS TABLE OF CATEGORY_T;
/

CREATE TYPE FILM_INFO_T AS OBJECT (
  film FILM_T,
  actors ACTORS_T,
  categories CATEGORIES_T
);
/

You'll notice immediately that this is essentially just a 1:1 copy of the schema, in this case modelled as Oracle SQL OBJECT and TABLE types, apart from the FILM_INFO_T type, which acts as an aggregate.

Now, our DBA (or our database developer) has implemented the following API for us to access the above information:

CREATE OR REPLACE PACKAGE RENTALS AS
  FUNCTION GET_ACTOR(p_actor_id INT) RETURN ACTOR_T;
  FUNCTION GET_ACTORS RETURN ACTORS_T;
  FUNCTION GET_FILM(p_film_id INT) RETURN FILM_T;
  FUNCTION GET_FILMS RETURN FILMS_T;
  FUNCTION GET_FILM_INFO(p_film_id INT) RETURN FILM_INFO_T;
  FUNCTION GET_FILM_INFO(p_film FILM_T) RETURN FILM_INFO_T;
END RENTALS;
/

This, ladies and gentlemen, is how you can now…

… tediously access the PL/SQL API with JDBC

So, in order to avoid the awkward CallableStatement with its OUT parameter registration and JDBC escape syntax, we’re going to fetch a FILM_INFO_T record via a SQL statement like this:

try (PreparedStatement stmt = conn.prepareStatement(
         "SELECT rentals.get_film_info(1) FROM DUAL");
     ResultSet rs = stmt.executeQuery()) {

    // STRUCT unnesting here...
}

So far so good. Luckily, there is Java 7’s try-with-resources to help us clean up those myriad JDBC objects. Now how to proceed? What will we get back from this ResultSet? A java.sql.Struct:

while (rs.next()) {
    Struct film_info_t = (Struct) rs.getObject(1);

    // And so on...
}

Now, the brave ones among you would continue downcasting the java.sql.Struct to an even more obscure and arcane oracle.sql.STRUCT, which contains almost no Javadoc, but tons of deprecated additional, vendor-specific methods.

For now, let's stick with the "standard API", though. Let's continue navigating our STRUCT:

while (rs.next()) {
    Struct film_info_t = (Struct) rs.getObject(1);

    Struct film_t = (Struct) film_info_t.getAttributes()[0];
    String title = (String) film_t.getAttributes()[1];
    Clob description_clob = (Clob) film_t.getAttributes()[2];
    String description = description_clob.getSubString(1, (int) description_clob.length());

    Struct language_t = (Struct) film_t.getAttributes()[4];
    String language = (String) language_t.getAttributes()[1];

    System.out.println("Film : " + title);
    System.out.println("Description: " + description);
    System.out.println("Language : " + language);
}

This could go on and on. The pain has only started – we haven't even covered arrays yet. The details can be seen in the original article; a short taste follows below.
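
Just to give you that taste – a sketch in the spirit of the code above, not the original article's exact listing – unnesting the nested ACTORS_T table with plain JDBC looks roughly like this:

while (rs.next()) {
    Struct film_info_t = (Struct) rs.getObject(1);

    // ACTORS_T is the second attribute of FILM_INFO_T
    java.sql.Array actors_t = (java.sql.Array) film_info_t.getAttributes()[1];

    // getArray() returns an Object[] whose elements are again Structs
    for (Object element : (Object[]) actors_t.getArray()) {
        Struct actor_t = (Struct) element;
        System.out.println(
            "Actor : " + actor_t.getAttributes()[1]
          + " " + actor_t.getAttributes()[2]);
    }
}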

Anyway. Now that we’ve finally achieved this, we can see the print output:

Film       : ACADEMY DINOSAUR
Description: A Epic Drama of a Feminist And a Mad
             Scientist who must Battle a Teacher in
             The Canadian Rockies
Language   : English
Actors     :
  PENELOPE GUINESS
  CHRISTIAN GABLE
  LUCILLE TRACY
  SANDRA PECK
  JOHNNY CAGE
  MENA TEMPLE
  WARREN NOLTE
  OPRAH KILMER
  ROCK DUKAKIS
  MARY KEITEL

When will this madness stop?

It’ll stop right here!

So far, this article read like a tutorial (or rather: medieval torture) on how to deserialise nested user-defined types from Oracle SQL to Java (don't get me started on serialising them again!).

In the next section, we’ll see how the exact same business logic (listing Film with ID=1 and its actors) can be implemented with no pain at all using jOOQ and its source code generator. Check this out:

// Simply call the packaged stored function from
// Java, and get a deserialised, type safe record
FilmInfoTRecord film_info_t = Rentals.getFilmInfo1(
    configuration, new BigInteger("1"));

// The generated record has getters (and setters)
// for type safe navigation of nested structures
FilmTRecord film_t = film_info_t.getFilm();

// In fact, all these types have generated getters:
System.out.println("Film : " + film_t.getTitle());
System.out.println("Description: " + film_t.getDescription());
System.out.println("Language : " + film_t.getLanguage().getName());

// Simply loop nested type safe array structures
System.out.println("Actors : ");
for (ActorTRecord actor_t : film_info_t.getActors()) {
    System.out.println(
        "  " + actor_t.getFirstName()
      + " " + actor_t.getLastName());
}

System.out.println("Categories : ");
for (CategoryTRecord category_t : film_info_t.getCategories()) {
    System.out.println(category_t.getName());
}

Is that it?

Yes!

Wow, I mean, this is just as though all those PL/SQL types and procedures / functions were actually part of Java. All the caveats that we've seen before are hidden behind those generated types and implemented in jOOQ, so you can concentrate on what you originally wanted to do: access the data objects and do meaningful work with them, not serialise / deserialise them!

Not convinced yet?

I told you not to get me started on serialising the types to JDBC. And I won't, but here's how to serialise the types with jOOQ, because that's a piece of cake!

Let's consider this other aggregate type, which returns a customer's rental history:

CREATE TYPE CUSTOMER_RENTAL_HISTORY_T AS OBJECT (
  customer CUSTOMER_T,
  films FILMS_T
);
/

And the full PL/SQL package specs:

CREATE OR REPLACE PACKAGE RENTALS AS
  FUNCTION GET_ACTOR(p_actor_id INT) RETURN ACTOR_T;
  FUNCTION GET_ACTORS RETURN ACTORS_T;
  FUNCTION GET_CUSTOMER(p_customer_id INT) RETURN CUSTOMER_T;
  FUNCTION GET_CUSTOMERS RETURN CUSTOMERS_T;
  FUNCTION GET_FILM(p_film_id INT) RETURN FILM_T;
  FUNCTION GET_FILMS RETURN FILMS_T;
  FUNCTION GET_CUSTOMER_RENTAL_HISTORY(p_customer_id INT) RETURN CUSTOMER_RENTAL_HISTORY_T;
  FUNCTION GET_CUSTOMER_RENTAL_HISTORY(p_customer CUSTOMER_T) RETURN CUSTOMER_RENTAL_HISTORY_T;
  FUNCTION GET_FILM_INFO(p_film_id INT) RETURN FILM_INFO_T;
  FUNCTION GET_FILM_INFO(p_film FILM_T) RETURN FILM_INFO_T;
END RENTALS;
/

So, when calling RENTALS.GET_CUSTOMER_RENTAL_HISTORY, we can find all the films that a customer has ever rented. Let's do that for all customers whose FIRST_NAME is "JAMIE" – and this time, we're using Java 8:

// We call the stored function directly inline in
// a SQL statement
dsl().select(Rentals.getCustomer(
          CUSTOMER.CUSTOMER_ID
      ))
     .from(CUSTOMER)
     .where(CUSTOMER.FIRST_NAME.eq("JAMIE"))

// This returns Result<Record1<CustomerTRecord>>
// We unwrap the CustomerTRecord and consume
// the result with a lambda expression
     .fetch()
     .map(Record1::value1)
     .forEach(customer -> {
         System.out.println("Customer  :");
         System.out.println("- Name    : " + customer.getFirstName() + " " + customer.getLastName());
         System.out.println("- E-Mail  : " + customer.getEmail());
         System.out.println("- Address : " + customer.getAddress().getAddress());
         System.out.println("            " + customer.getAddress().getPostalCode() + " " + customer.getAddress().getCity().getCity());
         System.out.println("            " + customer.getAddress().getCity().getCountry().getCountry());

         // Now, let's send the customer over the wire again to
         // call that other stored procedure, fetching his
         // rental history:
         CustomerRentalHistoryTRecord history =
             Rentals.getCustomerRentalHistory2(dsl().configuration(), customer);

         System.out.println("  Customer Rental History :");
         System.out.println("    Films :");

         history.getFilms().forEach(film -> {
             System.out.println("      Film        : " + film.getTitle());
             System.out.println("        Language  : " + film.getLanguage().getName());
             System.out.println("        Description : " + film.getDescription());

             // And then, let's call the first procedure again
             // in order to get a film's actors and categories
             FilmInfoTRecord info =
                 Rentals.getFilmInfo2(dsl().configuration(), film);

             info.getActors().forEach(actor -> {
                 System.out.println("          Actor    : " + actor.getFirstName() + " " + actor.getLastName());
             });

             info.getCategories().forEach(category -> {
                 System.out.println("          Category : " + category.getName());
             });
         });
     });

… and a short extract of the output produced by the above:

Customer  :
- Name    : JAMIE RICE
- E-Mail  : JAMIE.RICE@sakilacustomer.org
- Address : 879 Newcastle Way
            90732 Sterling Heights
            United States
  Customer Rental History :
    Films :
      Film        : ALASKA PHANTOM
        Language  : English
        Description : A Fanciful Saga of a Hunter
                      And a Pastry Chef who must
                      Vanquish a Boy in Australia
          Actor    : VAL BOLGER
          Actor    : BURT POSEY
          Actor    : SIDNEY CROWE
          Actor    : SYLVESTER DERN
          Actor    : ALBERT JOHANSSON
          Actor    : GENE MCKELLEN
          Actor    : JEFF SILVERSTONE
          Category : Music
      Film        : ALONE TRIP
        Language  : English
        Description : A Fast-Paced Character
                      Study of a Composer And a
                      Dog who must Outgun a Boat
                      in An Abandoned Fun House
          Actor    : ED CHASE
          Actor    : KARL BERRY
          Actor    : UMA WOOD
          Actor    : WOODY JOLIE
          Actor    : SPENCER DEPP
          Actor    : CHRIS DEPP
          Actor    : LAURENCE BULLOCK
          Actor    : RENEE BALL
          Category : Music

If you’re using Java and PL/SQL…

… then you should download the free trial right now and experiment with jOOQ and Oracle.

The Oracle port of the Sakila database is available from this URL for free, under the terms of the BSD license:

https://github.com/jOOQ/jOOQ/tree/master/jOOQ-examples/Sakila/oracle-sakila-db

Finally, it is time to enjoy writing PL/SQL again!

And things get even better!

jOOQ is free and Open Source for use with Open Source databases, and it offers commercial licensing for use with commercial databases. So, if you’re using Firebird, MySQL, or PostgreSQL, you can leverage all your favourite database’s procedural SQL features and bind them easily to Java for free!

For more information about jOOQ or jOOQ’s DSL API, consider these resources:

That’s it with this year’s mini-series on jOOQ. Have a happy Holiday season!
This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

How jOOQ Allows for Fluent Functional-Relational Interactions in Java 8

In this year’s Java Advent Calendar, we’re thrilled to have been asked to feature a mini-series showing you a couple of advanced and very interesting topics that we’ve been working on when developing jOOQ.

The series consists of:

  • How jOOQ Leverages Generic Type Safety in its DSL
  • How jOOQ Allows for Fluent Functional-Relational Interactions in Java 8
  • How jOOQ Helps Pretend that Your Stored Procedures are a Part of Java

Don't miss any of these!

How jOOQ allows for fluent functional-relational interactions in Java 8

In yesterday’s article, we’ve seen How jOOQ Leverages Generic Type Safety in its DSL when constructing SQL statements. Much more interesting than constructing SQL statements, however, is executing them.

Yesterday, we’ve seen a sample PL/SQL block that reads like this:

BEGIN
    FOR rec IN (
        SELECT first_name, last_name FROM customers
        UNION
        SELECT first_name, last_name FROM staff
    )
    LOOP
        INSERT INTO people (first_name, last_name)
        VALUES (rec.first_name, rec.last_name);
    END LOOP;
END;

And you won’t be surprised to see that the exact same thing can be written in Java with jOOQ:

for (Record2<String, String> rec :
    dsl.select(CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME).from(CUSTOMERS)
       .union(
        select(STAFF.FIRST_NAME, STAFF.LAST_NAME).from(STAFF))
) {
    dsl.insertInto(PEOPLE, PEOPLE.FIRST_NAME, PEOPLE.LAST_NAME)
       .values(rec.getValue(CUSTOMERS.FIRST_NAME), rec.getValue(CUSTOMERS.LAST_NAME))
       .execute();
}

This is a classic, imperative-style, PL/SQL-inspired approach to iterating over result sets and performing actions one by one.

Java 8 changes everything!

With Java 8, lambdas appeared – and much more importantly, Streams did, along with tons of other useful features. The simplest way to migrate the above foreach loop to Java 8's "callback hell" would be the following:

dsl.select(CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME).from(CUSTOMERS)
   .union(
    select(STAFF.FIRST_NAME, STAFF.LAST_NAME).from(STAFF))
   .forEach(rec -> {
       dsl.insertInto(PEOPLE, PEOPLE.FIRST_NAME, PEOPLE.LAST_NAME)
          .values(rec.getValue(CUSTOMERS.FIRST_NAME), rec.getValue(CUSTOMERS.LAST_NAME))
          .execute();
   });

This is still very simple. How about this: let's fetch a couple of records from the database, stream them, map them using some sophisticated Java function, and reduce them into a batch update statement! Whew… here's the code:

dsl.selectFrom(BOOK)
   .where(BOOK.ID.in(2, 3))
   .orderBy(BOOK.ID)
   .fetch()
   .stream()
   .map(book -> book.setTitle(book.getTitle().toUpperCase()))
   .reduce(
       dsl.batch(update(BOOK).set(BOOK.TITLE, (String) null).where(BOOK.ID.eq((Integer) null))),
       (batch, book) -> batch.bind(book.getTitle(), book.getId()),
       (b1, b2) -> b1
   )
   .execute();

Awesome, right? Again, with comments:

// Here, we simply select a couple of books from the database
dsl.selectFrom(BOOK)
   .where(BOOK.ID.in(2, 3))
   .orderBy(BOOK.ID)
   .fetch()

   // Now, we stream the result as a Java 8 Stream
   .stream()

   // Now we map all book titles using the "sophisticated" Java function
   .map(book -> book.setTitle(book.getTitle().toUpperCase()))

   // Now, we reduce the books into a batch update statement...
   .reduce(

       // ... which is initialised with empty bind variables
       dsl.batch(update(BOOK).set(BOOK.TITLE, (String) null).where(BOOK.ID.eq((Integer) null))),

       // ... and then we bind each book's values to the batch statement
       (batch, book) -> batch.bind(book.getTitle(), book.getId()),

       // ... this is just a dummy combiner function, because we only operate on one batch instance
       (b1, b2) -> b1
   )

   // Finally, we execute the produced batch statement
   .execute();

Awesome, right? Well, if you’re not too functional-ish, you can still resort to the “old ways” using imperative-style loops. Perhaps, your coworkers might prefer that:

BatchBindStep batch = dsl.batch(update(BOOK).set(BOOK.TITLE, (String) null).where(BOOK.ID.eq((Integer) null)));

for (BookRecord book :
    dsl.selectFrom(BOOK)
       .where(BOOK.ID.in(2, 3))
       .orderBy(BOOK.ID)
) {
    batch.bind(book.getTitle(), book.getId());
}

batch.execute();

So, what’s the point of using Java 8 with jOOQ?

Java 8 might change a lot of things. Mainly, it changes the way we reason about functional data transformation algorithms. Some of the above ideas might've been a bit over the top. But the principal idea is that whatever your source of data is, if you think about that data in terms of Java 8 Streams, you can very easily transform (map) those streams into other types of streams, as we did with the books. And nothing keeps you from collecting books that contain changes into batch update statements for batch execution.

Another example is one where we claimed that Java 8 also changes the way we perceive ORMs. ORMs are very stateful, object-oriented things that help manage database state in an object-graph representation, with lots of nice features like optimistic locking, dirty checking, and implementations that support long conversations. But they're quite terrible at data transformation. First off, they're much, much inferior to SQL in terms of data transformation capabilities. This is topped by the fact that object graphs and functional programming don't really work well together, either.

With SQL (and thus with jOOQ), you'll often stay on a flat tuple level. Tuples are extremely easy to transform. The following example shows how you can use an H2 database to query for INFORMATION_SCHEMA meta information such as table names, column names, and data types, collect that information into a data structure, and then map that data structure into new CREATE TABLE statements:

DSL.using(c)
   .select(
       COLUMNS.TABLE_NAME,
       COLUMNS.COLUMN_NAME,
       COLUMNS.TYPE_NAME
   )
   .from(COLUMNS)
   .orderBy(
       COLUMNS.TABLE_CATALOG,
       COLUMNS.TABLE_SCHEMA,
       COLUMNS.TABLE_NAME,
       COLUMNS.ORDINAL_POSITION
   )
   .fetch()  // jOOQ ends here
   .stream() // Streams start here
   .collect(groupingBy(
       r -> r.getValue(COLUMNS.TABLE_NAME),
       LinkedHashMap::new,
       mapping(
           r -> r,
           toList()
       )
   ))
   .forEach(
       (table, columns) -> {
           // Just emit a CREATE TABLE statement
           System.out.println(
               "CREATE TABLE " + table + " (");

           // Map each column record into a String
           // containing the column specification,
           // and join them using comma and
           // newline. Done!
           System.out.println(
               columns.stream()
                      .map(col -> "  " + col.getValue(COLUMNS.COLUMN_NAME)
                                + " " + col.getValue(COLUMNS.TYPE_NAME))
                      .collect(Collectors.joining(",\n"))
           );

           System.out.println(");");
       }
   );

The above statement will produce something like the following SQL script:

CREATE TABLE CATALOGS (
  CATALOG_NAME VARCHAR
);
CREATE TABLE COLLATIONS (
  NAME VARCHAR,
  KEY VARCHAR
);
CREATE TABLE COLUMNS (
  TABLE_CATALOG VARCHAR,
  TABLE_SCHEMA VARCHAR,
  TABLE_NAME VARCHAR,
  COLUMN_NAME VARCHAR,
  ORDINAL_POSITION INTEGER,
  COLUMN_DEFAULT VARCHAR,
  IS_NULLABLE VARCHAR,
  DATA_TYPE INTEGER,
  CHARACTER_MAXIMUM_LENGTH INTEGER,
  CHARACTER_OCTET_LENGTH INTEGER,
  NUMERIC_PRECISION INTEGER,
  NUMERIC_PRECISION_RADIX INTEGER,
  NUMERIC_SCALE INTEGER,
  CHARACTER_SET_NAME VARCHAR,
  COLLATION_NAME VARCHAR,
  TYPE_NAME VARCHAR,
  NULLABLE INTEGER,
  IS_COMPUTED BOOLEAN,
  SELECTIVITY INTEGER,
  CHECK_CONSTRAINT VARCHAR,
  SEQUENCE_NAME VARCHAR,
  REMARKS VARCHAR,
  SOURCE_DATA_TYPE SMALLINT
);

That's data transformation! If you're as excited as we are, read the original article to see exactly how this example works.

Conclusion

Java 8 has changed everything in the Java ecosystem. Finally, we can implement functional, transformative algorithms easily using Streams and lambda expressions. SQL is also a very functional and transformative language. With jOOQ and Java 8, you can extend data transformation directly from your type safe SQL result into Java data structures, and back into SQL. These things aren't possible with JDBC. These things weren't possible prior to Java 8.

jOOQ is free and Open Source for use with Open Source databases, and it offers commercial licensing for use with commercial databases.

For more information about jOOQ or jOOQ’s DSL API, consider these resources:

Stay tuned for tomorrow’s article “How jOOQ helps pretend that your stored procedures are a part of Java”
This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

How jOOQ Leverages Generic Type Safety in its DSL

In this year’s Java Advent Calendar, we’re thrilled to have been asked to feature a mini-series showing you a couple of advanced and very interesting topics that we’ve been working on when developing jOOQ.

The series consists of:

  • How jOOQ Leverages Generic Type Safety in its DSL
  • How jOOQ Allows for Fluent Functional-Relational Interactions in Java 8
  • How jOOQ Helps Pretend that Your Stored Procedures are a Part of Java

Don't miss any of these!

How jOOQ leverages generic type safety in its DSL

Few Java developers are aware of this, but SQL is a very type safe language. In the Java ecosystem, if you’re using JDBC, you’re operating on dynamically constructed SQL strings, which are sent to the server for execution – or failure. Some IDEs may have started to be capable of introspecting parts of your static SQL, but often you’re concatenating predicates to form a very dynamic query:

String sql = "SELECT a, b, c FROM table WHERE 1 = 1";

if (someCondition)
    sql += " AND id = 3";

if (someOtherCondition)
    sql += " AND value = 42";

These concatenations quickly turn nasty and are one of the reasons why Java developers don't really like SQL.

SQL as written via JDBC. Image (c) by Greg Grossmeier. License CC-BY-SA 2.0

But interestingly, PL/SQL or T-SQL developers never complain about SQL in this way. In fact, they feel quite the opposite. Look at how SQL is nicely embedded in a typical PL/SQL block:

BEGIN

    -- The record type of "rec" is inferred by the compiler
    FOR rec IN (

        -- This compiles only when I have matching
        -- degrees and types of both UNION subselects!
        SELECT first_name, last_name FROM customers
        UNION
        SELECT first_name, last_name FROM staff
    )
    LOOP

        -- This compiles only if rec really has
        -- first_name and last_name columns
        INSERT INTO people (first_name, last_name)

        -- Obviously, VALUES must match the above target table
        VALUES (rec.first_name, rec.last_name);
    END LOOP;
END;

Now, we can most certainly discuss syntax. Whether you like SQL's COBOLesque syntax or not is a matter of taste, and a matter of habit, too. But one thing is clear: SQL is absolutely type safe, and most sane people would consider that a very good thing. Read The Inconvenient Truth About Dynamic vs. Static Typing for more details.

The same can be achieved in Java!

JDBC’s lack of type safety is a brilliant feature for the low-level API that JDBC is. At some point, we need an API that can simply send SQL strings over the wire without knowing anything about the wire protocol, and retrieve back cursors of arbitrary / unknown type. However, if we don’t execute our SQL directly via JDBC, but maintain a type safe SQL AST (Abstract Syntax Tree) prior to query execution, then we might actually anticipate the returned type of our statements.

jOOQ's DSL (domain-specific language) API works exactly like that. When you create SQL statements with jOOQ, you're implicitly creating an AST, both for your Java compiler and for your runtime environment. Here's how that works:

DSL.using(configuration)
   .select(CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME).from(CUSTOMERS)
   .union(
    select(STAFF.FIRST_NAME, STAFF.LAST_NAME).from(STAFF))
   .fetch();

If we look closely at what the above query really does, we’ll see that we’re calling one of several overloaded select() methods on jOOQ’s DSLContext class, namely DSLContext.select(Field, Field), the one that takes two argument columns.

The whole API looks like this, and we’ll see immediately after why this is so useful:

<T1> SelectSelectStep<Record1<T1>>
    select(Field<T1> field1);
<T1, T2> SelectSelectStep<Record2<T1, T2>>
    select(Field<T1> field1, Field<T2> field2);
<T1, T2, T3> SelectSelectStep<Record3<T1, T2, T3>>
    select(Field<T1> field1, Field<T2> field2, Field<T3> field3);
// and so on...

So, by explicitly passing two columns to the select() method, you have chosen the second one of the above methods that returns a DSL type that is parameterised with Record2, or more specifically, with Record2<String, String>. Yes, the String parameter bindings are inferred from the very columns that we passed to the select() call, because jOOQ’s code generator reverse-engineers your database schema and generates those classes for you.

The generated Customers class really looks like this (simplified):

// All table references are listed here:
class Tables {
    Customers CUSTOMERS = new Customers();
    Staff STAFF = new Staff();
}

// All tables have an individual class each, with columns inside:
class Customers {
    final Field<String> FIRST_NAME = ...
    final Field<String> LAST_NAME = ...
}

As you can see, all type information is already available to you, automatically, as you have defined those types only once in the database. No need to define them again in Java.
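
As a small illustration of ours (not from the original article), that generated type information carries all the way through to the result – the Record2<String, String> lets you read both columns without a single cast:

Result<Record2<String, String>> result =
DSL.using(configuration)
   .select(CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME)
   .from(CUSTOMERS)
   .fetch();

for (Record2<String, String> rec : result) {
    String firstName = rec.value1(); // typed accessor, no cast
    String lastName  = rec.value2();
}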

Generic type information is ubiquitous

The interesting part is the UNION. The union() method on the DSL API simply looks like this:

public interface SelectUnionStep<R extends Record> {
    SelectOrderByStep<R> union(Select<? extends R> select);
}

If we go back to our statement, we can see that the type of the object upon which we call union() is really this type:

SelectUnionStep<Record2<String, String>>

… thus, the method union() that we’re calling is really expecting an argument of this type:

union(Select<? extends Record2<String, String>> select);

… which essentially means that we’ll get a compilation error if we don’t provide two string columns also in the second subselect:

DSL.using(configuration)
   .select(CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME).from(CUSTOMERS)
   .union(
// ^^^^^ doesn't compile, wrong argument type!
    select(STAFF.FIRST_NAME).from(STAFF))
   .fetch();

or also:

DSL.using(configuration)
   .select(CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME).from(CUSTOMERS)
   .union(
// ^^^^^ doesn't compile, wrong argument type!
    select(STAFF.FIRST_NAME, STAFF.DATE_OF_BIRTH).from(STAFF))
   .fetch();

Static type checking helps find bugs early

… indeed! All of the above bugs can be found at compile-time because your Java compiler will not accept the wrong SQL statements. When writing dynamic SQL, this can be incredibly subtle, as the different UNION subselects may not be created all at the same place. You may have a complex DAO that generates the SQL across several methods. With this kind of generic type safety, you can continue to do so, safely.
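
Here's a small sketch of what that could look like. The two helper methods are hypothetical, but since every jOOQ DSL step implements Select<R>, the compiler still checks that both subselects line up when they are finally combined:

Select<Record2<String, String>> customerNames() {
    return DSL.using(configuration)
              .select(CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME)
              .from(CUSTOMERS);
}

Select<Record2<String, String>> staffNames() {
    return DSL.using(configuration)
              .select(STAFF.FIRST_NAME, STAFF.LAST_NAME)
              .from(STAFF);
}

// composed somewhere else entirely – still type checked:
Result<Record2<String, String>> people =
    customerNames().union(staffNames()).fetch();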

As mentioned before, this extends through the whole API. Check out…

IN predicates

This compiles:

// Get all customers whose first name corresponds to a staff first name
DSL.using(configuration)
.select().from(CUSTOMERS)
.where(CUSTOMERS.FIRST_NAME.in(
select(STAFF.FIRST_NAME).from(STAFF)
))
.fetch();

This doesn’t compile:

DSL.using(configuration)
   .select().from(CUSTOMERS)
   .where(CUSTOMERS.FIRST_NAME.in(
       // ^^ wrong argument type!
       select(STAFF.FIRST_NAME, STAFF.LAST_NAME).from(STAFF)
   ))
   .fetch();

But this compiles:

// Get all customers whose first and last names both correspond
// to a staff member's first and last names
DSL.using(configuration)
   .select().from(CUSTOMERS)
   .where(row(CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME).in(
       select(STAFF.FIRST_NAME, STAFF.LAST_NAME).from(STAFF)
   ))
   .fetch();

Notice the use of row() to construct a row value expression, an extremely useful but little known SQL feature.

INSERT statements

This compiles:

DSL.using(configuration)
   .insertInto(CUSTOMERS, CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME)
   .values("John", "Doe")
   .execute();

This doesn’t compile:

DSL.using(configuration)
   .insertInto(CUSTOMERS, CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME)
   .values("John")
   // ^^^^^^ Invalid number of arguments
   .execute();

Conclusion

Internal domain-specific languages can express a lot of type safety in Java – almost as much as the external language itself implements. In the case of SQL – which is a very type safe language – this is particularly true and interesting.

jOOQ has been designed to create as little cognitive friction as possible for any Java developer who wants to write embedded SQL in Java, i.e. the Java code will look and feel exactly like the SQL code that it represents. At the same time, jOOQ has been designed to offer as much compile-time type safety as possible in the Java language (or also in Scala, Groovy, etc.).

jOOQ is free and Open Source for use with Open Source databases, and it offers commercial licensing for use with commercial databases.

For more information about jOOQ or jOOQ’s DSL API, consider these resources:

Stay tuned for tomorrow’s article “How jOOQ allows for fluent functional-relational interactions in Java 8”
This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

Managing Package Dependencies with Degraph

A large part of the art of software development is keeping the complexity of a system as low as possible. But what is complexity anyway? While the exact semantics vary quite a bit depending on who you ask, most would probably agree that it has a lot to do with the number of parts in a system and their interactions.

Consider a marble in space, i.e. a planet, moon or star. Without any interaction this is as boring as a system can get. Nothing happens. If the marble moves, it keeps moving in exactly the same way. To be honest, there isn't even a way to determine if it is moving. Boooring.

Add a second marble to the system and let them attract each other, like earth and moon. Now the system is more interesting. The two objects circle each other if they aren't too fast. Somewhat interesting.

Now add a third object. In the general case things get so interesting that we can't even predict what is going to happen. The whole system didn't just become complex, it became chaotic. You now have a three body problem. In the general case this problem cannot be solved, i.e. we cannot predict what will happen with the system. But there are some special cases, especially the case where two of the objects are very close to each other, like earth and moon, while the third one is so far away that the first two objects behave just like one. In this case you can approximate the system with two two-body systems.

But what has this to do with Java? This sounds more like physics.

I think software development is similar in some aspects. A complete application is way too complicated to be understood as a whole. To fight this complexity we divide the system into parts (classes) that can be understood on their own, and that hide their inner complexity so that, when we look at the larger picture, we don't have to worry about every single code line in a class, but only about the class as one entity. This is actually very similar to what physicists do with systems.

But let’s look at the scale of things. The basic building block of software is the code line. And to keep the complexity in check we bundle code lines that work together in methods. How many code lines go into a single method varies, but it is in the order of 10 lines of code.
Next you gather methods into classes. How many methods go into a single class? Typically in the order of 10 methods!

And then? We bundle 100-10000 classes in a single jar! I hope I’m not the only one who thinks something is amiss.

I'm not sure what will come out of Project Jigsaw, but currently Java only offers packages as a way to bundle classes. Packages aren't a powerful abstraction, yet they are the only one we have, so we'd better use them.

Most teams do use packages, but not in a well structured way; rather in an ad hoc way. The result is similar to trying to consider moon and sun as one part of the system, and the earth as the other part. The result might work, but it is probably as intuitive as Ptolemy's planetary model. Instead, decide on criteria for how you want to differentiate your packages. I personally call them slicings, inspired by an article by Oliver Gierke. Possible slicings, in order of importance, are:

  • the deployable jar file the class should end up in
  • the use case / feature / part of the business model the class belongs to
  • the technical layer the class belongs to

The packages this results in will look like this: <domain>.<deployable>.<domain part>.<layer>
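
For example, in a hypothetical web shop with two deployables, shop and backoffice, this scheme could yield packages like the following (all names made up):

com.example.shop.customer.domain
com.example.shop.customer.persistence
com.example.shop.order.web
com.example.backoffice.reporting.persistence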

It should be easy to decide where a class goes. And it should also keep the packages at a reasonable size, even when you don’t use the separation by technical layer.

But what do you gain from this? It is easier to find classes, but that's about it. You need one more rule to make this really worthwhile: there must be no cyclic dependencies!

This means: if a class in package A references a class in package B, no class in B may reference any class in A. This also applies if the reference is indirect, via multiple other packages. But that is still not enough. Slices should be cycle free as well, so if a domain part X references a different domain part Y, the reverse dependency must not exist!

This will indeed put some rather strict rules on your package and dependency structure. The benefit is that your code base becomes very flexible.

Without such a structure, splitting your project into multiple parts will probably be rather difficult. Ever tried to reuse part of an application in a different one, just to realize that you basically have to include most of the application in order to get it to compile? Ever tried to deploy different parts of an application to different servers, just to realize you can't? It certainly happened to me before I used the approach mentioned above. But with this more strict structure, the parts you may want to reuse will almost on their own end up at the end of the dependency chain, so you can take them and bundle them in their own jar, or just copy the code into a different project and have it compile in a very short time.

Also, while trying to keep your packages and slices cycle free, you'll be forced to think hard about what each package involved is really about. Something that has improved my code base considerably in many cases.

So there is one problem left: dependencies are hard to see. Without a tool, it is very difficult to keep a code base cycle free. Of course there are plenty of tools that check for cycles, but cleaning up these cycles is tough, and the way most tools present these cycles doesn't help very much. I think what one needs are two things:

  1. a simple test, that can run with all your other tests and fails when you create a dependency circle.
  2. a tool that visualizes all the dependencies between classes, while at the same time showing in which slice each class belongs.

Surprise! I can recommend such a great tool: Degraph! (I'm the author, so I might be biased.)

You can write tests in JUnit like this:



assertThat(
    classpath().including("de.schauderhaft.**")
               .printTo("degraphTestResult.graphml")
               .withSlicing("module", "de.schauderhaft.(*).*.**")
               .withSlicing("layer", "de.schauderhaft.*.(*).**"),
    is(violationFree())
);

The test will analyze everything on the classpath that starts with de.schauderhaft. It will slice the classes in two ways: by taking the third part of the package name and by taking the fourth part of the package name. So a class named de.schauderhaft.customer.persistence.HibernateCustomerRepository ends up in the module customer and in the layer persistence. And the test will make sure that modules, layers and packages are cycle free.

And if it finds a dependency circle, it will create a graphml file, which you can open using the free graph editor yEd. With a little layout work you get results like the following, where the dependencies that result in circular dependencies are marked in red.

Again for more details on how to achieve good usable layouts I have to refer to the documentation of Degraph.

Also note that the graphs are colored mainly green with a little red, which nicely fits the season!

This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

Thread local storage in Java

One of the lesser known features among developers is thread-local storage. The idea is simple, and the need for it comes in scenarios where we need data that is… well, local to the thread. For example, if we have two threads that refer to the same global variable, but we want them to have separate values, independently initialized of each other.

Most major programming languages have an implementation of the concept. For example, C++11 even has the thread_local keyword, while Ruby has chosen an API approach.

Java has also had an implementation of the concept, with java.lang.ThreadLocal<T> and its subclass java.lang.InheritableThreadLocal<T>, since version 1.2, so nothing new and shiny here.
Let's say that for some reason we need a Long specific to our thread. Using ThreadLocal, that would simply be:

public class ThreadLocalExample {

  public static class SomethingToRun implements Runnable {

    private ThreadLocal<Long> threadLocal = new ThreadLocal<Long>();

    @Override
    public void run() {
      System.out.println(Thread.currentThread().getName() + " " + threadLocal.get());

      try {
        Thread.sleep(2000);
      } catch (InterruptedException e) {
      }

      threadLocal.set(System.nanoTime());
      System.out.println(Thread.currentThread().getName() + " " + threadLocal.get());
    }
  }


  public static void main(String[] args) {
    SomethingToRun sharedRunnableInstance = new SomethingToRun();

    Thread thread1 = new Thread(sharedRunnableInstance);
    Thread thread2 = new Thread(sharedRunnableInstance);

    thread1.start();
    thread2.start();
  }

}
One possible sample run of the above code results in:


Thread-0 null
Thread-0 132466384576241
Thread-1 null
Thread-1 132466394296347
At the beginning the value is null for both threads. Each of them clearly works with a separate value, since setting the value to System.nanoTime() on Thread-0 has no effect on the value of Thread-1 – exactly as we wanted: a thread scoped Long variable.

One nice side effect shows up when the thread calls multiple methods from various classes: they will all be able to use the same thread scoped variable without major API changes. Since the value is not explicitly passed through, one might argue that this is difficult to test and bad for design, but that is a separate topic altogether.
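
A tiny sketch of that pattern (all names made up): any class running on the same thread can read the value without it being passed through every method signature in between.

public class CurrentUser {

    private static final ThreadLocal<String> USER = new ThreadLocal<String>();

    public static void set(String userName) { USER.set(userName); }

    public static String get() { return USER.get(); }

    // important with thread pools – see the memory leak section below
    public static void clear() { USER.remove(); }
}

// at the entry point of the request:
CurrentUser.set("jamie");

// ... and in some service or DAO method deep down the call stack:
String auditedBy = CurrentUser.get();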

In what areas are popular frameworks using Thread Locals?

Spring, being one of the most popular frameworks in Java, uses ThreadLocals internally in many parts, as easily shown by a simple GitHub search. Most of the usages are related to the current user's actions or information. This is actually one of the main uses for ThreadLocals in the JavaEE world: storing information for the current request, as in RequestContextHolder:


private static final ThreadLocal<RequestAttributes> requestAttributesHolder =
    new NamedThreadLocal<RequestAttributes>("Request attributes");
Or the current JDBC connection user credentials in UserCredentialsDataSourceAdapter.

If we get back to RequestContextHolder, we can use this class to access all of the current request information from anywhere in our code.
A common use case for this is LocaleContextHolder, which helps us store the current user's locale.
Mockito uses the same mechanism to store its current "global" configuration, and if we take a look at almost any framework out there, there is a high chance we'll find ThreadLocals as well.

Thread Locals and Memory Leaks

So we have learned about this awesome little feature – let's use it all over the place! We can do that, but a few Google searches show that most people out there say ThreadLocal is evil. That's not exactly true: it is a nice utility, but in some contexts it is easy to create a memory leak with it.

“Can you cause unintended object retention with thread locals? Sure you can. But you can do this with arrays too. That doesn’t mean that thread locals (or arrays) are bad things. Merely that you have to use them with some care. The use of thread pools demands extreme care. Sloppy use of thread pools in combination with sloppy use of thread locals can cause unintended object retention, as has been noted in many places. But placing the blame on thread locals is unwarranted.” – Joshua Bloch

It is very easy to create a memory leak in your server code using ThreadLocal if it runs on an application server. ThreadLocal context is associated with the thread where it runs, and will be garbage collected once the thread is dead. Modern app servers use a pool of threads instead of creating new ones on each request, meaning you can end up holding large objects indefinitely in your application. Since the thread pool belongs to the app server, our memory leak could remain even after we unload our application. The fix for this is simple: free up resources you do not need, as the sketch below shows.
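
As a minimal sketch (reusing the hypothetical CurrentUser holder from above), the pattern is simply a try/finally around the work the pooled thread performs:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService pool = Executors.newFixedThreadPool(4);
pool.submit(() -> {
    CurrentUser.set("jamie");
    try {
        // ... do the actual work for this task ...
    } finally {
        // without this, the pooled thread keeps a reference to the
        // value forever – exactly the leak described above
        CurrentUser.clear();
    }
});

In a web application the same try/finally typically lives in a servlet filter wrapping chain.doFilter().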

Another ThreadLocal misuse concerns API design. Often I have seen RequestContextHolder (which holds a ThreadLocal) used all over the place, in the DAO layer for example. Later on, if one were to call the same DAO methods outside a request – from a scheduler, for example – one would get a very bad surprise.
This creates black magic, and many maintenance developers who will eventually figure out where you live and pay you a visit. Even though the variables in a ThreadLocal are local to the thread, they are very much global in your code. So make sure you really need this thread scope before you use it.

More info on the topic

http://en.wikipedia.org/wiki/Thread-local_storage
http://www.appneta.com/blog/introduction-to-javas-threadlocal-storage/
https://plumbr.eu/blog/how-to-shoot-yourself-in-foot-with-threadlocals
http://stackoverflow.com/questions/817856/when-and-how-should-i-use-a-threadlocal-variable
https://plumbr.eu/blog/when-and-how-to-use-a-threadlocal
https://weblogs.java.net/blog/jjviana/archive/2010/06/09/dealing-glassfish-301-memory-leak-or-threadlocal-thread-pool-bad-ide
https://software.intel.com/en-us/articles/use-thread-local-storage-to-reduce-synchronization

This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

CMS Pipelines … for NetRexx on the JVM

This year I want to tell you about a new and exciting addition to NetRexx (which, incidentally, just turned 19 years old the day before yesterday). NetRexx, as some of you know, is the first alternative language for the JVM; it stems from IBM and has been free and open source since 2011 (http://www.netrexx.org). It is a happy marriage of the Rexx language (Michael Cowlishaw, IBM, 1979) and the JVM. NetRexx can run compiled ahead of time, as .class files for maximum performance, or interpreted, for a quick development cycle, or very dynamic production of code. After the addition of Scripting in version 3.03 last year, the new release (3.04, somewhere at the end of 2014) includes Pipes.

We know what pipes are, I hear you say, but what are Pipes? A Pipeline, also called a Hartmann Pipeline, is a concept that extends and improves pipes as they are known from Unix and other operating systems. The name pipe indicates an inter-process communication mechanism, as well as the programming paradigm it has introduced. Compared to Unix pipes, Hartmann Pipelines offer multiple input and output streams, more complex pipe topologies, and a lot more – too much for this short article, but worthy of your study.

Pipelines were first implemented on VM/CMS, one of IBM's mainframe operating systems. This version was later ported to TSO to run under MVS and has been part of several product configurations. Pipelines are widely used by VM users, in a symbiotic relationship with REXX, the interpreted language that also has its origins on this platform. Pipes in the NetRexx version are compiled by a special Pipes Compiler that has been integrated with NetRexx. The resulting code can run on every platform that has a JVM (Java Virtual Machine), including z/VM and z/OS for that matter. This portable version of Pipelines was started by Ed Tomlinson in 1997 under the name of njpipes, when NetRexx was still very new, and was open sourced in 2011, soon after the NetRexx translator itself. It was integrated into the NetRexx translator in 2014 and will be released integrated in the NetRexx distribution for the first time with version 3.04. It answers the eternal question posed to the development team by every z/VM programmer we ever met: "But … Does It Have Pipes?" It also marks the first time that a non-charge Pipelines product runs on z/OS. But of course most of you will be running Linux, Windows or OSX, where NetRexx and Pipes also run splendidly.

NetRexx users are very conscious of code size and performance – for example because applications also run on limited JVM specifications such as Java ME, in Lego robots and on Androids and Raspberries – and generally are proud and protective of the NetRexx runtime, which weighs in at 37K (yes, 37 kilobytes; it even shrunk a few bytes over the years). For this reason, the Pipes Compiler and the Stages are packaged in NetRexxF.jar – F is for Full, and this jar also includes the Eclipse Java compiler, which makes NetRexx a standalone package that only needs a JRE for development. There is a NetRexxC.jar for those who have a working Java SDK and only want to compile NetRexx. So we have NetRexxR.jar at 37K, NetRexxC.jar at 322K, and the full NetRexx kaboodle in 2.8MB – still small compared to some other JVM languages.

The pipeline terminology is a metaphor derived from plumbing. Fitting two or more pipe segments together yields a pipeline. Water flows in one direction through the pipeline. There is a source, which could be a well or a water tower; water is pumped through the pipe into the first segment, then through the other segments until it reaches a tap, and most of it will end up in the sink. A pipeline can be increased in length with more segments of pipe, and this illustrates the modular concept of the pipeline. When we discuss pipelines in relation to computing we have the same basic structure, but instead of water that passes through the pipeline, data is passed through a series of programs (stages) that act as filters. Data must come from some place and go to some place. Analogous to the well or the water tower, there are device drivers that act as a source of the data, while the tap or the sink represents the place the data is going to, for example some output device like your terminal window or a file on disk, or a network destination. Just as water, data in a pipeline flows in one direction, by convention from left to right.

A program that runs in a pipeline is called a stage. A program can run in more than one place in a pipeline – these occurrences function independently of each other. The pipeline specification is processed by the pipeline compiler, and it must be contained in a character string; on the command line, it needs to be between quotes, while when contained in a file, it needs to be between the delimiters of a NetRexx string. An exclamation mark (!) is used as the stage separator, while the solid vertical bar | can be used as an option when specifying the local option for the pipe, after the pipe name. When looking at two adjacent segments in a pipeline, we call the left stage the producer and the stage on the right the consumer, with the stage separator as the connector.

A device driver reads from a device (for instance a file, the command prompt, a machine console or a network connection) or writes to a device; in some cases it can both read and write. Examples of device drivers are diskr for disk read and diskw for disk write; these read and write data from and to files. A pipeline can take data from one input device and write it to a different device. Within the pipeline, data can be modified in almost any way imaginable by the programmer. The simplest process for the pipeline is to read data from the input side and copy it unmodified to the output side. The pipeline compiler connects these programs; it uses one program for each device and connects them together. All pipeline segments run on their own thread and are scheduled by the pipeline scheduler. The inherent characteristic of the pipeline is that any program can be connected to any other program, because each obtains data and sends data through a device independent standard interface. The pipeline usually processes one record (or line) at a time. The pipeline reads a record from the input, processes it and sends it to the output. It continues until the input source is drained.

Until now everything was just theory, but now we are going to show how to compile and run a pipeline. The executable script pipe is included in the NetRexx distribution to specify a pipeline and to compile NetRexx source that contains pipelines. Pipelines can be specified on the command line or in a file, but will always be compiled to a .class file for execution in the JVM.

pipe "(hello) literal "hello world" ! console"

This specifies a pipeline consisting of a source stage literal that puts a string ("hello world") into the pipeline, and a console sink that puts the string on the screen. The pipe compiler will echo the source of the pipe to the screen – or issue messages when something was mistyped. The name of the class file is the name of the pipe, here specified between parentheses. Options also go there. We execute the pipe by typing:

java hello

Now we have shown the obligatory example, we can make it more interesting by adding a reverse stage in between:

pipe "(hello) literal "hello world" ! reverse ! console"

When this is executed, it dutifully types "dlrow olleh". If we replace the string after literal with arg(), we can then start the hello pipeline with an argument to reverse, and we run it with:

java hello a man a plan a canal panama

and it will respond:

amanap lanac a nalp a nam a

which goes to show that without ignoring spaces, no palindrome is very convincing – which we can remedy with a change to the pipeline: use the change stage to take out the spaces:

pipe "(hello) literal arg() ! change /" "// ! console"

Now for the interesting parts. Whole pipeline topologies can be added, webservers can be built, relational databases (all with a JDBC driver) can be queried. For people who are familiar with the z/VM CMS Pipelines product, most of its reference manual is relevant for this implementation. We are working on the new documentation to go with NetRexx 3.04.

Pipes for NetRexx are the work of Ed Tomlinson and Jeff Hennick, with contributions by Chuck Moore, myself, and others. Pipes were the first occasion on which I laid eyes on NetRexx, and I am happy they have now found their place in the NetRexx open source distribution. To have a look at it, download the NetRexx source from the Kenai site (https://kenai.com/projects/netrexx) and build a 3.04 version yourself. Alternatively, wait until the 3.04 package hits http://www.netrexx.org.
This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

Lightweight Integration with Java EE and Camel

Enterprise Java has different flavors and perspectives: the plain platform technology known as Java EE, different frameworks and integration aspects, and finally use-cases which involve data-centric user interfaces or specific visualizations. The most prominent problem which isn’t solved by Java EE itself is “integration”. There are plenty of products out there from well-known vendors which solve all kinds of integration problems and promise to deliver complete solutions, but as a developer, all you need from time to time is a solution that just works. This is the ultimate “Getting Started Resource” for Java EE developers when it comes to system integration.

A Bit Of Integration Theory
Integration challenges are nothing new. Ever since there have been different kinds of systems and the need to combine their data, this has been a central topic. Gregor Hohpe and Bobby Woolf started to collect a set of basic patterns they used to solve their customers’ integration problems. These Enterprise Integration Patterns (EIPs) can be considered the bible of integration. They try to find a common vocabulary and body of knowledge around asynchronous messaging architectures by defining 65 integration patterns. Forrester calls those “The core language of EAI”.

What Is Apache Camel?
Apache Camel offers you the interfaces for the EIPs, the base objects, commonly needed implementations, debugging tools, a configuration system, and many other helpers which will save you a ton of time when you want to implement your solution to follow the EIPs. It’s a complete production-ready framework. But it does not stop at those initially defined 65 patterns: it extends them with over 150 ready-to-use components which solve different problems around endpoint, system or technology integration. At a high level, Camel consists of a CamelContext which contains a collection of Component instances. A Component is essentially a factory of Endpoint instances. You can explicitly configure Component instances in Java code or an IoC container like Spring, Guice or CDI, or they can be auto-discovered using URIs.
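To make that concrete, here is a minimal standalone sketch (class name and endpoint URIs are mine, not from any particular project): the file and log components are auto-discovered purely from the URI schemes, with no explicit Component registration.

import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class UriDiscoverySketch {
    public static void main(String[] args) throws Exception {
        CamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // "file" and "log" are resolved to Component instances
                // behind the scenes, purely from the URI scheme
                from("file://orders?noop=true").to("log:orders?level=INFO");
            }
        });
        context.start();
        Thread.sleep(5000); // let the file consumer poll a few times
        context.stop();
    }
}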

Why Should A Java EE Developer Care?
Enterprise projects require us to. Dealing with all sorts of system integrations has always been a challenging topic. You can either choose the complex road by using messaging systems, wiring them into your application and implementing everything yourself, or go the heavyweight road by using different products. I have always been a fan of more pragmatic solutions, and this is what Camel actually is: comparably lightweight, easy to bootstrap and coming with a huge amount of pre-built integration components which let the developer focus on solving the business requirement behind it, without having to learn new APIs or tooling. Camel comes with a Java-based fluent API, Spring or Blueprint XML configuration files, and even a Scala DSL. So no matter which base you jump off from, you’ll always find something that you already know.

How To Get Started?
Did I get you? Want to give it a try? That’s easy, too. You have different ways according to the frameworks and platform you use. Looking back at the post title, this one is going to focus on Java EE.
So, the first thing you can do is just bootstrap Camel yourself. All you need are the camel-core and camel-cdi dependencies. Setting up a plain Java EE 7 Maven project and adding those two is more than sufficient.

<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-core</artifactId>
    <version>${camel.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-cdi</artifactId>
    <version>${camel.version}</version>
</dependency>
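
The ${camel.version} placeholder refers to a Maven property you define yourself; as a hedged example, any recent 2.x release current at the time of writing should do:

<properties>
    <camel.version>2.14.0</camel.version>
</properties>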

The next thing you need to do is find a place to inject your first CamelContext.


@Inject
CdiCamelContext context;

After everything is injected, you can start adding routes to it. A more complete example can be found in my CamelEE7 project on GitHub. Just fork it and go ahead. This one will work on any Java EE application server. If you are on WildFly already, you can also take full advantage of the WildFly-Camel subsystem.
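A minimal, hedged sketch of what that could look like with camel-cdi (bean and endpoint names are invented; depending on your setup the context may start automatically, so treat the explicit start() as an assumption):

import javax.annotation.PostConstruct;
import javax.ejb.Singleton;
import javax.ejb.Startup;
import javax.inject.Inject;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.cdi.CdiCamelContext;

@Singleton
@Startup
public class RouteStarter {

    @Inject
    CdiCamelContext context;

    @PostConstruct
    void startRoutes() {
        try {
            context.addRoutes(new RouteBuilder() {
                @Override
                public void configure() {
                    // fire every five seconds and write a greeting to the log
                    from("timer://greeting?period=5000")
                        .setBody(constant("Hello from Camel"))
                        .to("log:greetings?level=INFO");
                }
            });
            context.start();
        } catch (Exception e) {
            throw new RuntimeException("Failed to start Camel routes", e);
        }
    }
}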

The WildFly Camel Subsystem
The strategy of wildfly-camel is that a user can “just use” the Camel core/component APIs in deployments that WildFly supports already. In other words, Camel should “just work” in standard Java EE deployments. The binaries are provided by the platform, and the deployment should not need to worry about module/wiring details.
Defining and deploying Camel contexts can be done in different ways. You can either define a context directly in your standalone-camel.xml server configuration, or deploy it as part of your web app, either as a single XML file with the predefined -camel-context.xml file suffix or as part of another WildFly supported deployment as a META-INF/jboss-camel-context.xml file.
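A hedged example of what such a descriptor could contain (the route itself is invented; the namespace follows the Camel Spring schema):

<camelContext id="system-context-1" xmlns="http://camel.apache.org/schema/spring">
    <route>
        <from uri="direct:start"/>
        <transform>
            <simple>Hello ${body}</simple>
        </transform>
    </route>
</camelContext>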
The WildFly Camel test suite uses the WildFly Arquillian managed container. This can connect to an already running WildFly instance or alternatively start up a standalone server instance when needed. A number of test enrichers have been implemented that allow you to have WildFly Camel specific types injected into your Arquillian test cases; you can inject a CamelContextFactory or a CamelContextRegistry as an @ArquillianResource.
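A hedged sketch of such a test (the deployment content is trimmed to the bare minimum, and the wildfly-camel package names should be verified against the version you use):

import org.jboss.arquillian.container.test.api.Deployment;
import org.jboss.arquillian.junit.Arquillian;
import org.jboss.arquillian.test.api.ArquillianResource;
import org.jboss.shrinkwrap.api.ShrinkWrap;
import org.jboss.shrinkwrap.api.spec.JavaArchive;
import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.wildfly.extension.camel.CamelContextRegistry;

@RunWith(Arquillian.class)
public class CamelRegistryTest {

    // injected by the WildFly Camel test enrichers
    @ArquillianResource
    CamelContextRegistry contextRegistry;

    @Deployment
    public static JavaArchive createDeployment() {
        return ShrinkWrap.create(JavaArchive.class, "camel-registry-test.jar");
    }

    @Test
    public void registryShouldBeInjected() {
        Assert.assertNotNull("CamelContextRegistry injected", contextRegistry);
    }
}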
If you want to get started with that, take a look at my more detailed blog post.

Finding Examples

If you are excited and have everything up and running, it is time to dig into some examples. The first place to look is the examples directory in the distribution. There is an example for everything that you might need.
One of the most important use-cases is the tight integration with ActiveMQ. Assume that you have a bunch of JMS messages that need to be converted into files stored in a filesystem: this is a perfect Camel job. You need to configure the ActiveMQ component in addition to what you’ve seen above; it allows messages to be sent to a JMS queue or topic, or consumed from a JMS queue or topic, using Apache ActiveMQ.
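The registration itself is a one-liner. A hedged sketch (the broker URL is an assumption on my side, here an embedded, non-persistent broker); note that the component name matches the test-jms scheme used in the route below:

import org.apache.activemq.camel.component.ActiveMQComponent;

// register ActiveMQ under the "test-jms" scheme referenced by the route
context.addComponent("test-jms",
        ActiveMQComponent.activeMQComponent("vm://localhost?broker.persistent=false"));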
The following code shows what it takes to route JMS messages from the test.queue queue to the file component, which consumes them and stores them to disk.


context.addRoutes(new RouteBuilder() {
    public void configure() {
        from("test-jms:queue:test.queue").to("file://test");
    }
});

Imagine doing this yourself. Want more sophisticated examples? With Twitter integration? Or other technologies? There are plenty of examples out there to pick from – probably one of the most exciting aspects of Camel. It is lightweight, stable and has been out there for years. Make sure to also follow the mailing lists and the discussion forums.

This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

Self-healing applications: are they real?

This post is about an application where the first solution to each and every IT problem – “have you tried turning it off and on again?” – can actually do more harm than good. Instead, we have an application that can literally heal itself: it fails at the beginning, but starts running smoothly after some time. To give an example of such an application in action, we recreated it in the simplest form possible, gathering inspiration from what is now a five-year-old post from Heinz Kabutz’s Java Newsletter:



package eu.plumbr.test;

public class HealMe {
    private static final int SIZE = (int) (Runtime.getRuntime().maxMemory() * 0.6);

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 1000; i++) {
            allocateMemory(i);
        }
    }

    private static void allocateMemory(int i) {
        try {
            {
                byte[] bytes = new byte[SIZE];
                System.out.println(bytes.length);
            }

            byte[] moreBytes = new byte[SIZE];
            System.out.println(moreBytes.length);

            System.out.println("I allocated memory successfully " + i);

        } catch (OutOfMemoryError e) {
            System.out.println("I failed to allocate memory " + i);
        }
    }
}

The code above allocates two chunks of memory in a loop. Each of those allocations is equal to 60% of the total available heap size. As the allocations occur sequentially in the same method, one might expect this code to keep throwing java.lang.OutOfMemoryError: Java heap space errors and never successfully complete the allocateMemory() method.

Let us start with the static analysis of the source code:

  1. At a first quick examination, this code really cannot complete, because we try to allocate more memory than is available to the JVM.
  2. If we look closer, we can notice that the first allocation takes place in a scoped block, meaning that the variables defined in this block are visible only inside it. This indicates that bytes should be eligible for GC after the block is completed. And so our code should in fact run fine right from the beginning, as by the time it tries to allocate moreBytes, the previous allocation bytes should be dead.
  3. If we now look into the compiled class file (javap -c -p will show it), we will see the following bytecode:


    private static void allocateMemory(int);
    Code:
    0: getstatic #3 // Field SIZE:I
    3: newarray byte
    5: astore_1
    6: getstatic #4 // Field java/lang/System.out:Ljava/io/PrintStream;
    9: aload_1
    10: arraylength
    11: invokevirtual #5 // Method java/io/PrintStream.println:(I)V
    14: getstatic #3 // Field SIZE:I
    17: newarray byte
    19: astore_1
    20: getstatic #4 // Field java/lang/System.out:Ljava/io/PrintStream;
    23: aload_1
    24: arraylength
    25: invokevirtual #5 // Method java/io/PrintStream.println:(I)V
    ---- cut for brevity ----

    Here we see that on offsets 3-5 the first array is allocated and stored into the local variable with index 1. Then, on offset 17, another array is going to be allocated. But the first array is still referenced by the local variable, and so the second allocation should always fail with an OOM. The bytecode interpreter just cannot let the GC clean up the first array, because it is still strongly referenced.

Our static code analysis has given us two reasons why the presented code should not run successfully (points 1 and 3) and one reason why it should (point 2). Which one out of those three is the correct one? Let us actually run it and see for ourselves. It turns out that both conclusions were correct. First, the application fails to allocate memory. But after some time (on my Mac OS X with Java 8 it happens at iteration #255) the allocations start succeeding:


java -Xmx2g eu.plumbr.test.HealMe
1145359564
I failed to allocate memory 0
1145359564
I failed to allocate memory 1

… cut for brevity ...

I failed to allocate memory 254
1145359564
I failed to allocate memory 255
1145359564
1145359564
I allocated memory successfully 256
1145359564
1145359564
I allocated memory successfully 257
1145359564
1145359564

Self-healing code is a reality! Skynet is near…

In order to understand what is really happening, we need to think about what changes during program execution. The obvious answer is, of course, that Just-In-Time compilation can occur. If you recall, Just-In-Time compilation is a JVM built-in mechanism to optimize code hotspots. For this, the JIT monitors the running code and, when a hotspot is detected, compiles your bytecode into native code, performing different optimizations such as method inlining and dead code elimination in the process.

Let’s see if this is the case by turning on the following command line options and relaunching the program:



-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:+LogCompilation

This will generate a log file, in our case named hotspot_pid38139.log, where 38139 is the PID of the java process. In this file the following line can be found:



<task_queued compile_id='94' method='HealMe allocateMemory (I)V' bytes='83' count='256' iicount='256' level='3' stamp='112.305' comment='tiered' hot_count='256'/>

This means that after executing the allocateMemory() method 256 times, the C1 compiler has decided to queue this method for C1 tier 3 compilation. You can get more information about tiered compilation’s levels and different thresholds here. And so our first 256 iterations were run in interpreted mode, where the bytecode interpreter, being a simple stack machine, cannot know in advance whether some variable (bytes in this case) will be used further on or not. But the JIT sees the whole method at once and so can deduce that bytes will not be used anymore and is, in fact, GC eligible. Thus garbage collection can eventually take place and our program has magically self-healed. Now, I can only hope none of the readers will actually be responsible for debugging such a case in production. But in case you wish to make someone’s life a misery, introducing code like this into production would be a sure way to achieve that.
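As a side note, if you ever need the allocation to succeed from iteration zero, independently of when the JIT kicks in, the textbook remedy is to drop the strong reference yourself. A hedged variant of allocateMemory() (the helper name is mine):

private static void allocateMemoryDeterministically(int i) {
    try {
        byte[] bytes = new byte[SIZE];
        System.out.println(bytes.length);
        // drop the strong reference explicitly: now even the bytecode
        // interpreter has no live reference to the first array
        bytes = null;

        byte[] moreBytes = new byte[SIZE];
        System.out.println(moreBytes.length);
        System.out.println("I allocated memory successfully " + i);
    } catch (OutOfMemoryError e) {
        System.out.println("I failed to allocate memory " + i);
    }
}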

This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

Own your heap: Iterate class instances with JVMTI

Today I want to talk about a different Java that most of us don’t see and use every day – to be more exact, about lower-level bindings, some native code and how to perform some small magic. While we won’t get to the true source of magic on the JVM, performing some small miracles is within the reach of a single post.

I spend my days researching, writing and coding on the RebelLabs team at ZeroTurnaround, a company that creates tools for Java developers that mostly run as javaagents. It’s often the case that if you want to enhance the JVM without rewriting it, or get any decent power over it, you have to dive into the beautiful world of Java agents. These come in two flavors: Java agents and native ones. In this post we’ll concentrate on the latter.

Note, this GeeCON Prague presentation by Anton Arhipov, who is an XRebel product lead, is a good starting point to learn about javaagents written entirely in Java: Having fun with Javassist.

In this post we’ll create a small native JVM agent, explore the possibility of exposing native methods into the Java application and find out how to make use of the Java Virtual Machine Tool Interface.

If you’re looking for a practical takeaway from the post, we’ll be able to, spoiler alert, count how many instances of a given class are present on the heap.

Imagine that you are Santa’s trustworthy hacker elf and the big red has the following challenge for you:
Santa: My dear Hacker Elf, could you write a program that will point out how many Thread objects are currently hidden in the JVM’s heap?
Another elf that doesn’t like to challenge himself would answer: It’s easy and straightforward, right?


return Thread.getAllStackTraces().size();

But what if we want to over-engineer our solution to be able to answer this question about any given class? Say we want to implement the following interface:


public interface HeapInsight {
    int countInstances(Class klass);
}

Yeah, that’s impossible, right? What if you receive String.class as an argument? Have no fear, we’ll just have to go a bit deeper into the internals of the JVM. One thing that is available to JVM library authors is JVMTI, the Java Virtual Machine Tool Interface. It was added ages ago, and many tools that seem magical make use of it. JVMTI offers two things:

  • a native API
  • an instrumentation API to monitor and transform the bytecode of classes loaded into the JVM.

For the purpose of our example, we’ll need access to the native API. What we want to use is the IterateThroughHeap function, which lets us provide a custom callback to execute for every object of a given class.

First of all, let’s make a native agent that will load and echo something to make sure that our infrastructure works.

A native agent is something written in C/C++ and compiled into a dynamic library, to be loaded before we even start thinking about Java. If you’re not proficient in C++, don’t worry, plenty of elves aren’t, and it won’t be hard. My approach to C++ includes two main tactics: programming by coincidence and avoiding segfaults. So, since I managed to write and comment the example code for this post, collectively we can get through it. Note: the paragraph above should serve as a disclaimer; don’t put this code into any environment of value to you.

Here’s how you create your first native agent:


#include <iostream>
#include <jvmti.h>

using namespace std;

JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *jvm, char *options, void *reserved)
{
    cout << "A message from my SuperAgent!" << endl;
    return JNI_OK;
}

The important part of this declaration is that we declare a function called Agent_OnLoad, which follows the documentation for the dynamically linked agents.

Save the file as, for example, native-agent.cpp, and let’s see what we can do about turning it into a library.

I’m on OS X, so I use clang to compile it; to save you a bit of googling, here’s the full command:


clang -shared -undefined dynamic_lookup -o agent.so -I /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/include/ -I /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/include/darwin native-agent.cpp
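
If you’re on Linux rather than OS X, a roughly equivalent invocation (assuming JAVA_HOME points to a JDK) would be:

g++ -shared -fPIC -o agent.so -I $JAVA_HOME/include -I $JAVA_HOME/include/linux native-agent.cpp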

This creates an agent.so file that is a library ready to serve us. To test it, let’s create a dummy hello world Java class.


package org.shelajev;
public class Main {
public static void main(String[] args) {
System.out.println("Hello World!");
}
}

When you run it with a correct -agentpath option pointing to the agent.so, you should see the following output:


java -agentpath:agent.so org.shelajev.Main
A message from my SuperAgent!
Hello World!

Great job! We now have everything in place to make it actually useful. First of all, we need an instance of jvmtiEnv, which is available through the JavaVM *jvm while we are in Agent_OnLoad, but is not available later. So we have to store it somewhere globally accessible. We do that by declaring a global struct to store it.


#include <iostream>
#include <jvmti.h>
#include <cstdio>   // printf
#include <cstdlib>  // malloc
#include <cstring>  // memset

using namespace std;

typedef struct {
    jvmtiEnv *jvmti;
} GlobalAgentData;

static GlobalAgentData *gdata;

JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *jvm, char *options, void *reserved)
{
    jvmtiEnv *jvmti = NULL;
    jvmtiCapabilities capa;
    jvmtiError error;

    // put a jvmtiEnv instance at jvmti.
    jint result = jvm->GetEnv((void **) &jvmti, JVMTI_VERSION_1_1);
    if (result != JNI_OK) {
        printf("ERROR: Unable to access JVMTI!\n");
    }

    // add a capability to tag objects
    (void)memset(&capa, 0, sizeof(jvmtiCapabilities));
    capa.can_tag_objects = 1;
    error = (jvmti)->AddCapabilities(&capa);

    // store jvmti in globally accessible data
    gdata = (GlobalAgentData*) malloc(sizeof(GlobalAgentData));
    gdata->jvmti = jvmti;
    return JNI_OK;
}

We also updated the code to add a capability to tag objects, which we’ll need for iterating through the heap. The preparations are now done: we have the JVMTI instance initialized and available for us. Let’s offer it to our Java code via JNI.

JNI stands for Java Native Interface, a standard way to include native code calls into a Java application. The Java part will be pretty straightforward, add the following countInstances method definition to the Main class:


package org.shelajev;

public class Main {
    public static void main(String[] args) {
        System.out.println("Hello World!");
        int a = countInstances(Thread.class);
        System.out.println("There are " + a + " instances of " + Thread.class);
    }

    private static native int countInstances(Class klass);
}

To accommodate the native method, we must change our native agent code. I’ll explain it in a minute, but for now add the following function definitions there:


extern "C"
JNICALL jint objectCountingCallback(jlong class_tag, jlong size, jlong* tag_ptr, jint length, void* user_data)
{
int* count = (int*) user_data;
*count += 1;
return JVMTI_VISIT_OBJECTS;
}

extern "C"
JNIEXPORT jint JNICALL Java_org_shelajev_Main_countInstances(JNIEnv *env, jclass thisClass, jclass klass)
{
int count = 0;
jvmtiHeapCallbacks callbacks;
(void)memset(&callbacks, 0, sizeof(callbacks));
callbacks.heap_iteration_callback = &objectCountingCallback;
jvmtiError error = gdata->jvmti->IterateThroughHeap(0, klass, &callbacks, &count);
return count;
}

Java_org_shelajev_Main_countInstances is more interesting here: its name follows the convention, starting with Java_, then the _-separated fully qualified class name, then the method name from the Java code. Also, don’t forget the JNIEXPORT declaration, which says that the function is exported into the Java world.

Inside the Java_org_shelajev_Main_countInstances we specify the objectCountingCallback function as a callback and call IterateThroughHeap with the parameters that came from the Java application.

Note that our native method is static, so the arguments in the C counterpart are:

 
JNIEnv *env, jclass thisClass, jclass klass

for an instance method they would be a bit different:

 
JNIEnv *env, jobject thisInstance, jclass klass

where thisInstance points to the this object of the Java method call.

Now the definition of the objectCountingCallback comes directly from the documentation. And the body does nothing more than increment an int.

Boom! All done! Thank you for your patience. If you’re still reading this, you’re ready to test all the code above.

Compile the native agent again and run the Main class. This is what I see:


java -agentpath:agent.so org.shelajev.Main
Hello World!
There are 7 instances of class java.lang.Thread

If I add a Thread t = new Thread(); line to the main method, I see 8 instances on the heap. Sounds like it actually works. Your thread count will almost certainly be different; don’t worry, that’s normal, because the count includes the JVM bookkeeping threads that do compilation, GC, etc.

Now, if I want to count the number of String instances on the heap, it’s just a matter of changing the argument class. A truly generic solution – Santa would be happy, I hope.

Oh, and if you’re interested, it finds 2423 instances of String for me. A pretty high number for such a small application. Also,


return Thread.getAllStackTraces().size();

gives me 5, not 8, because it excludes the bookkeeping threads! Talk about trivial solutions, eh?

Now that you’re armed with this knowledge and this tutorial, I’m not saying you’re ready to write your own JVM monitoring or enhancing tools, but it is definitely a start.

In this post we went from zero to writing a native Java agent that compiles, loads and runs successfully. It uses JVMTI to obtain insight into the JVM that is not accessible otherwise. The corresponding Java code calls the native library and interprets the result.
This is often the approach the most miraculous JVM tools take, and I hope that some of the magic has been demystified for you.

What do you think, does it clarify agents for you? Let me know! Find me and chat with me on twitter: @shelajev.

This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!