Hibernate Search and JPA tutorial

In this tutorial we will show how to upgrade a JPA + Maven application by adding Hibernate Search functionality.

Hibernate Search and JPA – part 1

Hibernate Search brings the power of full-text search engines (such as Apache Lucene) to your applications. It also addresses some shortcomings of using Apache Lucene directly: it takes care of index synchronization and correctly manages the transformation from free-text queries to domain objects.

Before diving into a simple example, we will address a common beginner question: why do we need Hibernate Search? Couldn’t we just use the Criteria API or plain queries?

Traditional queries and Criteria work well in most cases, provided that you know their limitations. For example, consider the following query:

Query query = sessionFactory.getCurrentSession().createQuery("from User u where u.email like :email");
List<User> userList = query.setParameter("email", "john%").list();

This query might perform badly if the amount of data to search is large and the searched attributes are not indexed in the database. On the other hand, with Hibernate Search you can offer a “Google-like” search on free-text input which intelligently matches different fields or even different entity types; implementing such a feature with Criteria or SQL is very complex and won’t give you results as good.

In order to achieve this, Hibernate Search creates and uses indexes on the fields which will be part of your searches. The great news is that, once you have created your index, Hibernate Search will handle all future index updates.

As the documentation says:

By default, every time an object is inserted, updated or deleted through Hibernate, Hibernate Search updates the according Lucene index.
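To make the idea of an index that follows inserts and deletes more concrete, here is a minimal sketch in plain Java (assuming Java 8 or later). It is not how Lucene or Hibernate Search is implemented: the class ToyInvertedIndex and its methods are hypothetical names chosen for illustration, and the tokenization is deliberately naive.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index: maps each lower-cased word to the ids of the records containing it. */
final class ToyInvertedIndex {

    private final Map<String, Set<Long>> postings = new HashMap<>();

    /** Adds a record to the index (the bookkeeping needed on insert). */
    void add(long id, String text) {
        // Naive tokenization: lower-case, then split on anything that is not a letter or digit
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
    }

    /** Removes a record from the index (the bookkeeping needed on delete). */
    void remove(long id) {
        postings.values().forEach(ids -> ids.remove(id));
        postings.values().removeIf(Set::isEmpty);
    }

    /** Returns the ids of all records containing the given word. */
    Set<Long> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

Every insert, update or delete has to be mirrored by calls such as add and remove; this is the synchronization work that Hibernate Search does for you against the real Lucene index.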

On top of the generated indexes, it can be quite easy to perform data mining, document similarity search, and so on. For example, you can build tag clouds without needing your users to actually tag content by hand: you already have vectors of frequencies for all terms in your database.
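As a rough illustration of the kind of data a tag cloud is built from, the following plain Java sketch (Java 8 or later) counts how often each word occurs in a few strings. In a real application the counts would come from the Lucene index rather than being recomputed; TermFrequencies is a hypothetical helper, not part of Hibernate Search.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Counts word occurrences; the result is the raw material for a tag cloud. */
final class TermFrequencies {

    /** Returns a map from lower-cased word to the number of times it occurs in the texts. */
    static Map<String, Integer> count(Iterable<String> texts) {
        Map<String, Integer> frequencies = new TreeMap<>();
        for (String text : texts) {
            for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) {
                    // start at 1 for a new word, otherwise add 1 to the existing count
                    frequencies.merge(token, 1, Integer::sum);
                }
            }
        }
        return frequencies;
    }
}
```

The words with the highest counts would then be rendered in the largest font.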

OK, let’s get back to our JPA + Maven example, which uses an Employee and a Department class. In this example we will perform search queries on the name attribute of the Employee. Here’s the domain class:

package com.mastertheboss.domain;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;

import org.hibernate.search.annotations.Analyze;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Index;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.IndexedEmbedded;
import org.hibernate.search.annotations.Store;

@Entity
@Indexed
public class Employee {
    @Id
    @GeneratedValue
    private Long id;
    
    @Field(index=Index.YES, analyze=Analyze.YES, store=Store.NO)
    private String name;
    
    @IndexedEmbedded
    @ManyToOne
    private Department department;

    public Employee() {}

    public Employee(String name, Department department) {
        this.name = name;
        this.department = department;
    }
    

    public Employee(String name) {
        this.name = name;
    }

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Department getDepartment() {
        return department;
    }

    public void setDepartment(Department department) {
        this.department = department;
    }

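    @Override
    public String toString() {
        // department is null when the Employee(String) constructor is used
        String departmentName = (department != null) ? department.getName() : null;
        return "Employee [id=" + id + ", name=" + name + ", department="
                + departmentName + "]";
    }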

}

What has changed in the class: first, we have declared the persistent class as indexable. This is done by annotating the class with @Indexed (all entities not annotated with @Indexed will be ignored by the indexing process).

Next, the parameter index=Index.YES ensures that the text will be indexed, while analyze=Analyze.YES ensures that the text will be analyzed using the default Lucene analyzer. Analysis splits the text into words, lower-cases them and drops common words like ‘a’ or ‘the’, so these do not get in the way of your searches. We will talk more about analyzers a little later on. The third parameter we specify within @Field, store=Store.NO, ensures that the actual data will not be stored in the index.
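To give a feeling for what an analyzer does to a piece of text, here is a simplified sketch in plain Java. It is not the Lucene analyzer: ToyAnalyzer is a hypothetical class and its stop-word list is an assumption made up for this example, much shorter than the one a real analyzer uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Simplified analysis chain: lower-case, split into words, drop stop words. */
final class ToyAnalyzer {

    // Illustrative stop-word list only; not the list used by Lucene
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "and"));

    /** Returns the tokens that would end up in the index for the given text. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

With this sketch, the text “The Captain of the Nautilus” is reduced to the two tokens captain and nautilus. The same processing is applied to the search terms, which is why a search for “Captain” can match regardless of case.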

Finally, although not used in this example, we might need to search on the embedded Department class (for example on the Department name). So we need to add the @IndexedEmbedded annotation to the Department department field, which refers to the object containing the searchable text, and we also need to add the @Field(index=Index.TOKENIZED, store=Store.NO) annotation to the Department fields which are searchable.

package com.mastertheboss.domain;

import java.util.ArrayList;
import java.util.List;

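import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.OneToMany;

// Hibernate Search imports required by the @Field annotation on name
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Index;
import org.hibernate.search.annotations.Store;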

@Entity
public class Department {

    @Id
    @GeneratedValue
    private Long id;

    @Field(index=Index.TOKENIZED, store=Store.NO)
    private String name;
    
    @OneToMany(mappedBy="department",cascade=CascadeType.PERSIST)
    private List<Employee> employees = new ArrayList<Employee>();
    

    public Department() {
        super();
    }
    public Department(String name) {
        this.name = name;
    }
    public Long getId() {
        return id;
    }
    public void setId(Long id) {
        this.id = id;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public List<Employee> getEmployees() {
        return employees;
    }
    public void setEmployees(List<Employee> employees) {
        this.employees = employees;
    }
}

Changes needed in your configuration

As mentioned, Hibernate Search uses the Apache Lucene search engine to do its indexing. To start using Hibernate Search, you’ll need to configure the location of the index files, as well as a directory provider (we’ll just use the default, filesystem). This is done in your JPA/Hibernate configuration. In this example we are using JPA, so here’s the persistence.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<persistence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd"
    version="2.0" xmlns="http://java.sun.com/xml/ns/persistence">


    <persistence-unit name="persistenceUnit"
        transaction-type="RESOURCE_LOCAL">

        <class>com.mastertheboss.domain.Employee</class>
        <class>com.mastertheboss.domain.Department</class>
        <properties>

            <property name="javax.persistence.jdbc.driver" value="com.mysql.jdbc.Driver" />
            <property name="javax.persistence.jdbc.url" value="jdbc:mysql://localhost:3306/mysqldb" />
            <property name="javax.persistence.jdbc.user" value="user" />
            <property name="javax.persistence.jdbc.password" value="password" />

            <property name="hibernate.dialect" value="org.hibernate.dialect.MySQLDialect" />

            <!-- Hibernate Search configuration -->
            <property name="hibernate.search.default.directory_provider"
                value="filesystem" />
            <property name="hibernate.search.default.indexBase" value="/var/lucene/indexes" />
        </properties>

    </persistence-unit>


</persistence>
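The index location is usually different on each machine, so you may prefer not to hard-code it. One possible approach, sketched below, is to build the two Hibernate Search properties in Java and read the index location from a system property. The class SearchProperties and the system property name app.indexBase are assumptions invented for this example; only the two hibernate.search.* keys come from the configuration above.

```java
import java.util.HashMap;
import java.util.Map;

/** Builds the Hibernate Search settings shown in persistence.xml as a Map. */
final class SearchProperties {

    /** Hypothetical system property used to override the index location. */
    static final String INDEX_BASE_PROPERTY = "app.indexBase";

    /** Returns the search settings, falling back to the location used in this tutorial. */
    static Map<String, String> overrides() {
        Map<String, String> props = new HashMap<>();
        props.put("hibernate.search.default.directory_provider", "filesystem");
        props.put("hibernate.search.default.indexBase",
                System.getProperty(INDEX_BASE_PROPERTY, "/var/lucene/indexes"));
        return props;
    }
}
```

The map can then be passed as the second argument of Persistence.createEntityManagerFactory("persistenceUnit", SearchProperties.overrides()); properties supplied this way take precedence over those in persistence.xml.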

 


That’s it for the domain classes. Now we need to modify the JpaTest class in order to perform the actual search. The common approach is to create a Lucene query and then wrap it in a JPA or Hibernate Query object.
The Lucene query can in turn be built either with the Lucene API or with the Hibernate Search query DSL. In this example we will use the latter approach.

If you want to learn more about both options, see the reference documentation:

http://docs.jboss.org/hibernate/search/4.1/reference/en-US/html_single/#search-query-lucene-api

http://docs.jboss.org/hibernate/search/4.1/reference/en-US/html_single/#search-query-querydsl

package com.mastertheboss.jpa;

import java.util.List;

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.EntityTransaction;
import javax.persistence.Persistence;

import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.query.dsl.QueryBuilder;

import com.mastertheboss.domain.Employee;
import com.mastertheboss.domain.Department;

public class JpaTest {

    private EntityManager manager;

    public JpaTest(EntityManager manager) {
        this.manager = manager;
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        EntityManagerFactory factory = Persistence
                .createEntityManagerFactory("persistenceUnit");
        EntityManager manager = factory.createEntityManager();
        JpaTest test = new JpaTest(manager);

        EntityTransaction tx = manager.getTransaction();
        tx.begin();
        try {
            test.createEmployees();
            tx.commit();
        } catch (Exception e) {
            // roll back so that a failed insert is not committed
            tx.rollback();
            e.printStackTrace();
        }

        test.listEmployees(manager);

        System.out.println(".. done");
    }

    private void createEmployees() {
        int numOfEmployees = manager
                .createQuery("Select a From Employee a", Employee.class)
                .getResultList().size();
        if (numOfEmployees == 0) {
            Department department = new Department("java");
            manager.persist(department);

            manager.persist(new Employee("Jakab Gipsz", department));
            manager.persist(new Employee("Captain Nemo", department));

        }
    }

    private void listEmployees(EntityManager em) {

        FullTextEntityManager fullTextEntityManager = org.hibernate.search.jpa.Search
                .getFullTextEntityManager(em);

        try {
            fullTextEntityManager.createIndexer().startAndWait();
        } catch (InterruptedException e) {
            // restore the interrupt flag so callers can see the interruption
            Thread.currentThread().interrupt();
            e.printStackTrace();
        }

        em.getTransaction().begin();

        QueryBuilder qb = fullTextEntityManager.getSearchFactory()
                .buildQueryBuilder().forEntity(Employee.class).get();
        org.apache.lucene.search.Query query = qb.keyword().onFields("name")
                .matching("Captain").createQuery();

        // wrap Lucene query in a javax.persistence.Query
        javax.persistence.Query persistenceQuery = fullTextEntityManager
                .createFullTextQuery(query, Employee.class);

        // execute search
        List<Employee> result = persistenceQuery.getResultList();
        System.out.println("num of employess:" + result);
        for (Employee next : result) {
            System.out.println("next employee: " + next);
        }
        em.getTransaction().commit();
        em.close();

    }

}

The core search functionality is contained in the listEmployees method, which lists the employees filtered by name. Let’s look at the first part of it:

        FullTextEntityManager fullTextEntityManager =
                org.hibernate.search.jpa.Search.getFullTextEntityManager(em);

        try {
            fullTextEntityManager.createIndexer().startAndWait();
        } catch (InterruptedException e) {
            // restore the interrupt flag so callers can see the interruption
            Thread.currentThread().interrupt();
            e.printStackTrace();
        }

This code builds the initial Lucene index (remember that Hibernate Search will transparently index every entity persisted, updated or removed, but you have to create the initial index yourself for data that already exists in the database). For our small data set, this operation takes less than a minute. For larger data sets, it wouldn’t make sense to re-index everything every time you start the application.
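One simple way to avoid re-indexing on every start is to remember that the initial indexing has already been done. The sketch below does this with a marker file; it is only one possible approach, and both the class IndexBootstrap and the marker file name are hypothetical (Hibernate Search neither creates nor reads such a file).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Remembers, through a marker file, whether the initial mass indexing has run. */
final class IndexBootstrap {

    /** Hypothetical marker file name; not something Hibernate Search knows about. */
    private static final String MARKER = ".initial-index-done";

    /** Returns true until markIndexed has been called for this directory. */
    static boolean needsInitialIndexing(Path indexBase) {
        return !Files.exists(indexBase.resolve(MARKER));
    }

    /** Records that the initial mass indexing has completed. */
    static void markIndexed(Path indexBase) {
        try {
            Files.createDirectories(indexBase);
            Path marker = indexBase.resolve(MARKER);
            if (!Files.exists(marker)) {
                Files.createFile(marker);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In listEmployees you would then call createIndexer().startAndWait() only when needsInitialIndexing(...) returns true for the configured index location, and call markIndexed(...) once it has finished.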

Next, using the QueryBuilder, you can build queries. It is important to realize that the end result of a QueryBuilder is a Lucene query. For this reason you can easily mix and match queries generated via Lucene’s query parser, or Query objects you have assembled with the Lucene programmatic API, with those built by the Hibernate Search DSL.

Running the Hibernate Search example

The simplest way to run the example is to use Maven, adding the Hibernate Search dependency:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mastertheboss</groupId>
    <artifactId>EclipseJPAExample</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.21</version>
        </dependency>
        <dependency>
            <groupId>org.hibernate.javax.persistence</groupId>
            <artifactId>hibernate-jpa-2.0-api</artifactId>
            <version>1.0.1.Final</version>
        </dependency>
        <dependency>
            <groupId>org.hibernate</groupId>
            <artifactId>hibernate-entitymanager</artifactId>
            <version>4.0.1.Final</version>
        </dependency>
        <dependency>
            <groupId>org.hibernate</groupId>
            <artifactId>hibernate-search</artifactId>
            <version>4.1.1.Final</version>
        </dependency>
    </dependencies>
</project>

Let’s run the main class as usual:

$ mvn compile exec:java -Dexec.mainClass=com.mastertheboss.jpa.JpaTest
[INFO] Scanning for projects...
. . . . .
INFO: HSEARCH000034: Hibernate Search 4.1.1.Final
12-ott-2012 10.41.32 org.hibernate.search.impl.ConfigContext getLuceneMatchVersion
WARN: HSEARCH000075: Configuration setting hibernate.search.lucene_version was not specified, using LUCENE_CURRENT.
12-ott-2012 10.41.33 org.hibernate.search.indexes.serialization.avro.impl.AvroSerializationProvider <init>
INFO: HSEARCH000079: Serialization protocol version 1.0
12-ott-2012 10.41.36 org.hibernate.search.impl.SimpleIndexingProgressMonitor addToTotalCount
INFO: HSEARCH000027: Going to reindex 2 entities
12-ott-2012 10.41.36 org.hibernate.search.impl.SimpleIndexingProgressMonitor indexingCompleted
INFO: HSEARCH000028: Reindexed 2 entities
num of employess:[Employee [id=2, name=Captain Nemo, department=java]]
next employee: Employee [id=2, name=Captain Nemo, department=java]