Introduction
We have recently migrated source code from Hibernate ORM to JDBC (Spring JDBC template) based implementation. Performance has been improved 10 times. This post describes the use-case, bench-marking and the migration steps.
We have recently migrated source code from Hibernate ORM to JDBC (Spring JDBC template) based implementation. Performance has been improved 10 times. This post describes the use-case, bench-marking and the migration steps.
Use case
A tree structure in database is getting populated from a deep file system (directory structure) having around 75000 nodes. Each node (directory) contains text files, which get parsed based on business rules and then populate the database ( BRANCHs representing a node, tables referring to branch, tree_nodes). The database tables was well mapped to Hibernate JPA entries and on saving Branch object, its relevant entities were automatically getting saved. Whole operation was performed in recursive manner by traversing over directory tree. Each table’s primary key is auto generated from a separate sequence.
As per development level performance testing, it was estimated that initial tree will take 30 hours to load whole tree. This was not acceptable, as UAT cannot be started without this migration. (Env : Oracle 11g, JBoss 6, JDK 6, Hibernate, Spring 3)
Bench-marking: Hibernate vs Spring JDBC
Data model
Create table and sequence
CREATE SEQUENCE customer_ID_SEQ START WITH 1 INCREMENT BY 1;
create table customer (id bigint not null, name varchar(200), primary key (id));
Model
@Entity
public class Customer {
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE,
generator = "CUSTOMER_ID_SEQ")
@SequenceGenerator(name = "CUSTOMER_ID_SEQ",
sequenceName = "CUSTOMER_ID_SEQ", allocationSize = 1)
private long id;
private String name; // application assigned
@Column(name = "NAME")
…
…
Approaches
1.
Approach 1 : Existing
@Transactional
public void bulkPersist(List<Customer> entities) {
for (Customer entity : entities) {
em.persist(entity);
}
}
2.
Approach 2 : Batch size - Set hibernate batch-size property and
flushing after a batch same.
@Transactional
public void bulkPersist(List<Customer> entities) {
int i = 0;
for (Customer entity : entities) {
em.persist(entity);
i++;
if (i % batchSize == 0) {
flush();
clear();
}
}
}
3.
Approach 3 : JDBC template
@Transactional
public void bulkPersist (final List<Customer> entities) {
template.batchUpdate("insert into customer (id, name) values (CUSTOMER_ID_SEQ.nextval, ?)", new
BatchPreparedStatementSetter() {
@Override
public void setValues(PreparedStatement ps, int i) throws SQLException {
ps.setString(1,
entities.get(i).getName());
}
@Override
public int getBatchSize() {
return entities.size();
}
});
}
Result
Approach
3 is coming out 10 time faster
than that of Approach 1. Approach 2 has also improved the performance but
lessor then that of Approach 3;
Migration
As per benchmarking, Spring JDBC template’s ‘batchUpdate’ is
the fastest as compared to hibernate based approaches. Now while migrating code
base from hibernate to JDBC, biggest issue was to resolve the generated ID
referred in other queries.
Since ID is getting generated by a sequence and Spring JDBC
template’s ‘batchUpdate’ do not have provision to fetch generated IDs.
Hibernate was doing this automatically by updating ID field of the entity
object.
So solve this, we had fetched bulk ids from the sequence, in a
single query:
List<Integer> ids = template.queryForList("select customer_id_seq.nextval
from (select level from dual connect
by LEVEL <=" +
entities.size() +")", Integer.class);
Set ids in entity object while
iterating BatchPreparedStatementSetter.setValues
method
public void setValues(PreparedStatement ps, int i) throws SQLException {
Customer
customer = entities.get(i);
customer.setId(ids.get(i));
ps.setInt(1,
customer.getId());
ps.setString(2, customer.getName());
}
This way entity object get populated in the same way as it
is done in hibernate, without any special iteration/processing. Once entity object is populated, all the dependent batches
can be fired so that entity.getId returns the correct value.
We were able to migrate Hibernate based codebase to
Spring JDBC with minimal changes in source code. To save JVM memory, we have also implemented batching over
it. Processed only 2000 nodes at a time.
After porting codebase to Spring JDBC Template, we measured
performance for the tree load, on local database and it was coming out around 3
hrs. We run the same on UAT environment, and the whole tree get loaded in 1
hour.