Site icon JVM Advent

Batch Updates in Java Persistence Technologies

Relational data access often needs to insert or update multiple rows in a database as part of a single operation. In such scenarios, it is a good idea to use the batch update facilities built on top of JDBC, in order to submit multiple SQL commands as a single request to the underlying database. This reduces the number of roundtrips to the database, hence improving the result time of the operation.

JDBC batched updates

The Statement interface and its subinterfaces, PreparedStatement and CallableStatement support executing multiple SQL statements as a batch, by maintaining a collection of these statements that the application can add to using the method Statement.addBatch(sql). When the batch of statements is ready to be executed, the method Statement.executeBatch() can be called to execute them in one unit. To clear the current batch, the application can call the method Statement.clearBatch(). Only statements that return an update count are eligible for batch execution; select statements will throw a BatchUpdateException.

Example

The following code uses a Statement‘s batch to add a student to a course:

try(Connection connection = dataSource.getConnection()) {
    connection.setAutoCommit(false);

    try(Statement statement = connection.createStatement()) {
        statement.addBatch("insert into student values (14, 'John Doe')");
        statement.addBatch("insert into course values (3, 'Biology')");
        statement.addBatch("insert into student_courses values (14, 3)");

        int[] updateCounts = statement.executeBatch();
        connection.commit();

    } catch(BatchUpdateException ex) {
        connection.rollback();
        ... // do something with exception
    }
}

Another example using a PreparedStatement. Given a customer table, we want to import a list of customers. Notice that the method addBatch does not take an SQL string here, instead it adds the specified parameters to the prepared statement’s batch of commands.

connection.setAutoCommit(false);
try(PreparedStatement statement = connection.prepareStatement("insert into customer values (?, ?)")) {
    int n = 0;
    for(Customer customer : customers) {
        statement.setInt(1, ++n);
        statement.setString(2, customer.getName());
        statement.addBatch();
    }
    int[] updateCounts = statement.executeBatch();
    connection.commit();
} catch(BatchUpdateException ex) {
    connection.rollback();
    ... // do something with exception
}
Switching off auto-commit

One important thing to notice in the above code snippets is the call to connection.setAutoCommit(false), which allows the application to control when to commit the transaction. In the previous code, we only commit the transaction when all statements are executed successfully. In case of a BatchUpdateException thrown because of a failed statement, we roll back the transaction so that no effect happens on the database. We could have decided to examine the BatchUpdateException (as we’ll see shortly) to see which statement(s) failed and still decide to commit the statements that were processed successfully.

Disabling auto-commit mode should always be done when executing a batch of updates. Otherwise, the result of the updates depends on the behavior of the JDBC driver: it may or may not commit the successful statements.

Update counts and BatchUpdateException

The method Statement.executeBatch() returns an array of integers where each value is the number of affected rows by the corresponding statement. The order of values matches the order in which statements are added to the batch. Specifically, each element of the array is:

  1. an integer >= 0, reflecting the affected row count by the update statement,
  2. or the constant Statement.SUCCESS_NO_INFO, indicating that the statement was successful but the affected row count is unknown.

In case one of the statements failed, or was not a valid update statement, the method executeBatch() throws a BatchUpdateException. The exception can be examined by calling BatchUpdateException.getUpdateCounts(), which returns an array of integers. There are two possible scenarios:

  1. If the JDBC driver allows continuing the processing of remaining statements upon a failed one, then the result of BatchUpdateException.getUpdateCounts() is an array containing as many integers as there were statements in the batch, where the integers correspond to the affected row count for successful statements, except for the failed ones where the corresponding array element will be the constant Statement.EXECUTE_FAILED.
  2. If the JDBC driver does not continue upon a failed statement, then the result of BatchUpdateException.getUpdateCounts() is an array containing the affected row count for all successful statements until the first failed one.

Batch updates using Spring’s JdbcTemplate

Spring offers a convenient class as part of its support for JDBC. It reduces the amount of boilerplate code required when using plain JDBC such as processing result sets and closing resources. It also makes batch updates easier, as shown in the following example:

List<Customer> customers = ...;

jdbcTemplate.batchUpdate("insert into customer values (?, ?)",
             new BatchPreparedStatementSetter() {

    @Override
    public void setValues(PreparedStatement ps, int i) throws SQLException {
        ps.setLong(1, customers.get(i).getId());
        ps.setString(2, customers.get(i).getName());
    }

    @Override
    public int getBatchSize() {
        return customers.size();
    }
});

Batch updates using Hibernate

Hibernate can also make use of JDBC’s batching facility when generating the statements corresponding to its persistence operations. The main configuration property is hibernate.jdbc.batch_size which specifies the maximum batch size. This setting can be overriden for a specific session using the method Session.setJdbcBatchSize(). Hibernate will use the value specified in the method on the current session, and if not set it uses the value in the global session factory-level setting hibernate.jdbc.batch_size.

The earlier example that stores a list of customers would use the persistence methods in the Session instance:

Transaction transaction = null;
try (Session session = sessionFactory.openSession()) {
    transaction = session.getTransaction();
    transaction.begin();

    for (Customer customer : customers) {
        session.persist(customer);
    }

    transaction.commit();
} catch (RuntimeException ex) {
    if (transaction != null) {
        transaction.rollback();
    }
    throw ex;
}

When the transaction.commit() is invoked, Hibernate will send the SQL statements that insert the customer rows. If batching is enabled as described earlier (either via hibernate.jdbc.batch_size or by calling Session.setJdbcBatchSize(batchSize)), then all the generated statements will be sent as a single request. Otherwise, each statement is sent as a single request.

When employing batched updates in Hibernate for a large number of entity objects, it is a good practice to flush the session and clear its cache periodically as opposed to flushing the session at the end of the transaction. This reduces memory usage by the session cache because it holds entities that are in persistent state:

Transaction transaction = null;
try (Session session = sessionFactory.openSession()) {
    transaction = session.getTransaction();
    transaction.begin();

    int n = 0;
    for (Customer customer : customers) {
        if (++n % batchSize == 0) {
            // Flush and clear the cache every batch
            session.flush();
            session.clear();
        }
        session.persist(customer);
    }

    transaction.commit();
} catch (RuntimeException ex) {
    if (transaction != null) {
        transaction.rollback();
    }
    throw ex;
}

One important thing to know is that batch insert (not update or delete) doesn’t work with entities using identity columns (i.e. whose generation strategy is GenerationType.IDENTITY, because Hibernate needs to generate the identifier when persisting the entity and in this case the value can only be generated by sending the insert statement.

It should be noted that the above applies equally if the application uses an EntityManager instead of directly using a Session.

Batch updates using jOOQ

jOOQ also supports batch updates easily. Here’s an example that follows the earlier examples:

DSLContext create = ...;
BatchBindStep batch = create.batch(create.insertInto(CUSTOMER, ID, NAME)
                                         .values((Integer) null, null));
int n = 0;
for (Customer customer : customers) {
    batch.bind(++n, customer.getName());
}
int[] updateCounts = batch.execute();

Summary

All major Java persistence technologies support batch mode updates to relational databases leveraging the JDBC API. Such mode can improve performance for applications involving heavy workloads by reducing the number of network roundtrips to the database server.

[See the original article here]

Author: Mahmoud Anouti

Senior software engineer with over 7 years of Java development experience. Helps organize Beirut Java User Group. Check out his blog.

Exit mobile version