Thursday, July 16, 2015

Choose ORM carefully

Introduction

We recently migrated a codebase from Hibernate ORM to a JDBC-based implementation (Spring JDBC template), and performance improved roughly 10x. This post describes the use case, the benchmarking, and the migration steps.

Use case

A tree structure in the database is populated from a deep file system (directory structure) of around 75,000 nodes. Each node (directory) contains text files, which are parsed according to business rules and then used to populate the database (a BRANCH represents a node, with other tables, such as TREE_NODES, referring to the branch). The database tables were mapped to Hibernate JPA entities, so saving a Branch object automatically saved its related entities. The whole operation was performed recursively by traversing the directory tree. Each table's primary key is auto-generated from a separate sequence.
Development-level performance testing estimated that the initial load of the whole tree would take around 30 hours. This was not acceptable, as UAT could not start without this migration. (Env: Oracle 11g, JBoss 6, JDK 6, Hibernate, Spring 3)
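As a sketch of the traversal step (class and method names here are ours, not from the project's code), the recursive walk over the directory tree might look like this, using plain java.io.File since the environment was JDK 6:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class TreeWalker {

  // Recursively collect every directory under root; in the real system each
  // directory becomes a BRANCH row and its text files are parsed into
  // dependent rows. This sketch only gathers the nodes.
  public static List<File> collectNodes(File root) {
    List<File> nodes = new ArrayList<File>();
    nodes.add(root);
    File[] children = root.listFiles();
    if (children != null) {
      for (File child : children) {
        if (child.isDirectory()) {
          nodes.addAll(collectNodes(child));
        }
      }
    }
    return nodes;
  }
}
```

In the actual load, each collected node would trigger the parse-and-persist step for its files, which is where the ORM cost dominates.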

Bench-marking: Hibernate vs Spring JDBC

Data model

Create table and sequence

CREATE SEQUENCE CUSTOMER_ID_SEQ START WITH 1 INCREMENT BY 1;
CREATE TABLE customer (id NUMBER(19) NOT NULL, name VARCHAR2(200), PRIMARY KEY (id));

Model

@Entity
public class Customer {

  @Id
  @GeneratedValue(strategy = GenerationType.SEQUENCE,
      generator = "CUSTOMER_ID_SEQ")
  @SequenceGenerator(name = "CUSTOMER_ID_SEQ",
      sequenceName = "CUSTOMER_ID_SEQ", allocationSize = 1)
  private long id;

  @Column(name = "NAME")
  private String name; // application assigned

  // getters and setters omitted
}

Approaches

1. Approach 1: Existing (plain persist loop)

@Transactional
public void bulkPersist(List<Customer> entities) {
  for (Customer entity : entities) {
    em.persist(entity);
  }
}

2. Approach 2: Batch size - set the hibernate.jdbc.batch_size property and flush/clear the EntityManager after each batch of the same size.

@Transactional
public void bulkPersist(List<Customer> entities) {
  int i = 0;
  for (Customer entity : entities) {
    em.persist(entity);
    i++;

    if (i % batchSize == 0) {
      em.flush();
      em.clear();
    }
  }
}
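For Approach 2 to actually batch the JDBC inserts, the Hibernate batch size must also be set in configuration. A typical fragment (e.g., in persistence.xml; the value 30 is illustrative and should match the flush interval used in the loop) might be:

```xml
<!-- enable JDBC statement batching; value is illustrative -->
<property name="hibernate.jdbc.batch_size" value="30"/>
<!-- group inserts by entity type so Hibernate can batch them effectively -->
<property name="hibernate.order_inserts" value="true"/>
```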
3. Approach 3: JDBC template

@Transactional
public void bulkPersist(final List<Customer> entities) {
  template.batchUpdate(
      "insert into customer (id, name) values (CUSTOMER_ID_SEQ.nextval, ?)",
      new BatchPreparedStatementSetter() {

        @Override
        public void setValues(PreparedStatement ps, int i) throws SQLException {
          ps.setString(1, entities.get(i).getName());
        }

        @Override
        public int getBatchSize() {
          return entities.size();
        }
      });
}

Result

Approach 3 came out about 10 times faster than Approach 1. Approach 2 also improved performance, but by less than Approach 3.

Migration

As per the benchmarking, Spring JDBC template's batchUpdate is the fastest of the three approaches. While migrating the codebase from Hibernate to JDBC, the biggest issue was resolving the generated IDs referenced by other queries.

The ID is generated by a sequence, and Spring JDBC template's batchUpdate has no provision for fetching generated IDs. Hibernate had done this automatically by updating the ID field of the entity object.

To solve this, we fetched a block of IDs from the sequence in a single query:

List<Integer> ids = template.queryForList(
    "select customer_id_seq.nextval from (select level from dual connect by level <= "
        + entities.size() + ")", Integer.class);
     
Then set the ID on each entity object while iterating in BatchPreparedStatementSetter.setValues (the insert statement now binds both columns: insert into customer (id, name) values (?, ?)):

public void setValues(PreparedStatement ps, int i) throws SQLException {
  Customer customer = entities.get(i);
  customer.setId(ids.get(i));
  ps.setLong(1, customer.getId());
  ps.setString(2, customer.getName());
}
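The bulk-id query is plain string concatenation, which is easy to get wrong. A small helper sketch (class and method names are ours, not from the original code) makes it explicit:

```java
// Hypothetical helper isolating the Oracle bulk-sequence query. The
// "connect by level" trick turns DUAL into a row generator, so a single
// round trip returns 'count' sequence values.
public class SequenceBatch {

  public static String buildQuery(String sequenceName, int count) {
    return "select " + sequenceName
        + ".nextval from (select level from dual connect by level <= " + count + ")";
  }
}
```

Calling template.queryForList(SequenceBatch.buildQuery("customer_id_seq", entities.size()), Integer.class) then yields one ID per entity, paired by index inside setValues.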

This way the entity objects are populated just as they were with Hibernate, without any special iteration or post-processing. Once the entity objects are populated, all dependent batches can be fired, and entity.getId() returns the correct value.

We were able to migrate the Hibernate-based codebase to Spring JDBC with minimal changes to the source. To limit JVM memory use, we also batched the overall load, processing only 2000 nodes at a time.
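The 2000-node chunking can be sketched as a simple list partition (the helper name is ours, not from the original code):

```java
import java.util.ArrayList;
import java.util.List;

public class Chunker {

  // Split a large work list into fixed-size chunks so each chunk can be
  // persisted and released independently, keeping JVM memory bounded.
  public static <T> List<List<T>> partition(List<T> items, int chunkSize) {
    List<List<T>> chunks = new ArrayList<List<T>>();
    for (int start = 0; start < items.size(); start += chunkSize) {
      int end = Math.min(start + chunkSize, items.size());
      chunks.add(new ArrayList<T>(items.subList(start, end)));
    }
    return chunks;
  }
}
```

Each chunk is then passed through bulkPersist in its own transaction, so at most one chunk's worth of entities is live on the heap at a time.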

After porting the codebase to Spring JDBC template, we measured the tree load against a local database: it came out at around 3 hours. On the UAT environment, the whole tree loaded in 1 hour.