August 24, 2008

Using Hibernate Full-Text Search from within JBoss Seam

Adding full-text search is easy, right?

select * from blog where content ilike '%queryString%';

That's how many sites implement their first crack at content searching. It works, but it does not give the search experience users expect today. For example, if the search query is "computer software" and the blog contains text like "We provide both computers and software", the simple LIKE query will not match. The text never contains the exact phrase "computer software", because the word "and" sits between the two words. Trying to solve such problems with complex SQL constructions becomes a nightmare and gives poor results. Even worse, LIKE-based queries have severe performance problems as the data set grows, because a pattern with a leading wildcard cannot use an ordinary index. The database must scan the entire corpus, row by row, one character at a time.
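The following minimal Java sketch makes the difference concrete. It uses only the plain JDK, no Lucene. It contrasts a case-insensitive substring test, which stands in for ILIKE '%...%', with word-level matching, where a document matches if it shares any word with the query. The class and method names are hypothetical. The tokenizer is a crude stand-in for a real analyzer and does no stemming, so "computer" still does not match "computers".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class LikeVsTokens {
    // Stand-in for SQL "content ILIKE '%query%'": a case-insensitive substring test
    static boolean likeMatch(String content, String query) {
        return content.toLowerCase(Locale.ROOT).contains(query.toLowerCase(Locale.ROOT));
    }

    // Crude stand-in for an analyzer: lowercase, then split on anything
    // that is not a letter or digit (no stemming, no stop words)
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Word-level matching with OR semantics: any shared word is a hit
    static boolean tokenMatch(String content, String query) {
        Set<String> docWords = new HashSet<>(tokenize(content));
        for (String word : tokenize(query)) {
            if (docWords.contains(word)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String content = "We provide both computers and software";
        String query = "computer software";
        System.out.println("LIKE match:  " + likeMatch(content, query));  // false
        System.out.println("Token match: " + tokenMatch(content, query)); // true, via "software"
    }
}
```

The word-level version finds the post only because "software" matches. This sketch also re-tokenizes every document on every query. Lucene instead tokenizes once, at indexing time, and looks words up in an inverted index, which is what removes the full scan.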

Lucky for us JBoss Seam developers, we have a much better solution.

Hibernate Search, based on Apache Lucene

In a typical EJB scenario deployed on JBoss Application Server, Seam components use a JPA EntityManager, provided by Hibernate, to read and write entities in the database.

We will make one change to this scenario: we will replace the EntityManager with a FullTextEntityManager. FullTextEntityManager extends the EntityManager interface, so it can be used in exactly the same way, but it adds full-text search capabilities when you need them.

Also, one more component is added to the mix: a Lucene index on disk, which Hibernate Search keeps in sync with the database.

Setting it up

You'll need the following JARs in your EAR's lib directory:

  1. hibernate-commons-annotations.jar
  2. lucene-core.jar
  3. hibernate-search.jar

Add the following properties to the properties element of your persistence unit in persistence.xml:

<property name="hibernate.search.default.indexBase"
          value="/path/to/index/directory"/>
<!-- Not needed with Hibernate Annotations 3.3, which registers these listeners automatically -->
<property name="hibernate.ejb.event.post-insert"
          value="org.hibernate.search.event.FullTextIndexEventListener"/>
<property name="hibernate.ejb.event.post-update"
          value="org.hibernate.search.event.FullTextIndexEventListener"/>
<property name="hibernate.ejb.event.post-delete"
          value="org.hibernate.search.event.FullTextIndexEventListener"/>

Now let's start searching.

Annotating the entities

Let's start with a simple Blog class:

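import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class Blog {
    private int id;
    private String content;

    @Id @GeneratedValue @Column
    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getContent() {
        return content;
    }

    public void setContent(String content) {
        this.content = content;
    }

    // other fields, getters and setters here
}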

This is a very simplified entity that holds a single blog entry. Now let's add the necessary annotations:

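import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

import org.hibernate.search.annotations.DocumentId;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Index;
import org.hibernate.search.annotations.Indexed;

@Entity
@Indexed
public class Blog {
    private int id;
    private String content;

    @Id @GeneratedValue @Column @DocumentId
    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    // indexed as a tokenized Lucene field named after the property: "content"
    @Field(index = Index.TOKENIZED)
    public String getContent() {
        return content;
    }

    public void setContent(String content) {
        this.content = content;
    }

    // other fields, getters and setters here
}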

Some very brief notes on these annotations:

@Indexed tells Hibernate Search that we want to create a full-text index for this entity. Once the application has run, you'll see a directory named after the entity's fully qualified class name inside the index directory you specified in persistence.xml.

@DocumentId tells Hibernate Search which property identifies this document within its index. A Lucene index built by a web crawler would typically identify each document by its URL. These entities live in a database and have no URLs, so we mark the primary key with @DocumentId instead.

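@Field(index=Index.TOKENIZED) tells Hibernate Search that this property should be indexed as a Lucene field, and that its string value should first be tokenized (split into individual words). By default the field is named after the JavaBean property, so getContent() yields a field called content, which is the name the search code uses.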
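Taken together, these annotations describe an inverted index: a map from terms to document identifiers. The plain-Java toy below is not Hibernate Search code, and its class and method names are hypothetical. A boolean flag mimics Index.TOKENIZED, and an int plays the role of the @DocumentId value. A lookup therefore yields ids, which Hibernate Search would then use to load the matching entities from the database.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class DocumentIdIndex {
    // term -> ids of the documents containing it (the "postings")
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    void add(int documentId, String value, boolean tokenized) {
        if (tokenized) {
            // split into lowercase words, like a (much simplified) analyzer
            for (String term : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!term.isEmpty()) post(term, documentId);
            }
        } else {
            // un-tokenized: the whole value becomes a single term
            post(value, documentId);
        }
    }

    private void post(String term, int documentId) {
        postings.computeIfAbsent(term, k -> new TreeSet<>()).add(documentId);
    }

    // A hit yields document ids, not entities
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        String content = "We provide both computers and software";
        DocumentIdIndex tokenizedIdx = new DocumentIdIndex();
        DocumentIdIndex wholeValueIdx = new DocumentIdIndex();
        tokenizedIdx.add(42, content, true);
        wholeValueIdx.add(42, content, false);
        System.out.println(tokenizedIdx.lookup("software"));  // [42]
        System.out.println(wholeValueIdx.lookup("software")); // []
    }
}
```

The second lookup comes back empty because an un-tokenized field (Index.UN_TOKENIZED in Hibernate Search 3.x) stores the whole value as one term. That suits identifiers and exact-match values rather than blog prose.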

That's all we need to do to the entity classes.

Using it

Let's create a Seam component which executes searches. We'll show the component first, and then show how it's integrated with the XHTML pages.

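import java.util.Collections;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.FullTextQuery;
import org.jboss.seam.ScopeType;
import org.jboss.seam.annotations.In;
import org.jboss.seam.annotations.Name;
import org.jboss.seam.annotations.Scope;

@Name("searchAction")
@Scope(ScopeType.CONVERSATION)
public class SearchAction {
    private static final Logger logger =
            Logger.getLogger(SearchAction.class.getName());

    // we are using the Seam-managed persistence context, with a twist:
    // Seam injects it as a FullTextEntityManager
    @In private FullTextEntityManager entityManager;

    // the queryString field is added in "Using @RequestParameter" below

    private List resultList;
    public List getResultList() { return resultList; }

    public void search() {
        // nothing to search for (e.g. the page was opened without a query)
        if (queryString == null || queryString.trim().length() == 0) {
            resultList = Collections.emptyList();
            return;
        }
        // search the "content" field, analyzing the query with StandardAnalyzer
        final MultiFieldQueryParser parser = new MultiFieldQueryParser(
                new String[] { "content" }, new StandardAnalyzer());
        try {
            final org.apache.lucene.search.Query luceneQuery =
                    parser.parse(queryString);
            // only Blog entities are returned
            final FullTextQuery ftq =
                    entityManager.createFullTextQuery(luceneQuery, Blog.class);
            resultList = ftq.getResultList();
        } catch (ParseException ex) {
            logger.log(Level.SEVERE, "Search terms: " + queryString, ex);
        }
    }
}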

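Note that we use a FullTextEntityManager, which is injected just like a normal Seam-managed persistence context (SMPC). When creating the parser, we specify "content" as the field to search, together with a StandardAnalyzer. The query must be analyzed the same way the field was indexed, and StandardAnalyzer is what Hibernate Search uses by default. The parser turns the user's query string into a Lucene Query. When wrapping that Query in a FullTextQuery, we specify the entity class we want back. From there it works like an ordinary JPA query: getResultList() returns Blog entities.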
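The following plain-Java sketch (no Lucene) shows what the parser does with a multi-word query. It builds a query string roughly in the style MultiFieldQueryParser produces with its default OR operator. Every word is tried against every listed field, and a document matching any clause is returned. The "title" field in the second call is hypothetical, because the Blog entity above indexes only content. The exact string Lucene prints may differ, and real parsing also runs each word through the analyzer, whereas this sketch only lowercases.

```java
import java.util.Locale;
import java.util.StringJoiner;

public class MultiFieldExpansion {
    // For each word in the user's query, build one group that tries the word
    // in every field; groups are separated by spaces, meaning OR.
    static String expand(String[] fields, String userQuery) {
        String trimmed = userQuery.trim();
        if (trimmed.isEmpty()) return "";
        StringJoiner groups = new StringJoiner(" ");
        for (String word : trimmed.toLowerCase(Locale.ROOT).split("\\s+")) {
            StringJoiner perField = new StringJoiner(" ");
            for (String field : fields) perField.add(field + ":" + word);
            // only wrap in parentheses when there is more than one alternative
            groups.add(fields.length > 1 ? "(" + perField + ")" : perField.toString());
        }
        return groups.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand(new String[] {"content"}, "computer software"));
        // content:computer content:software
        System.out.println(expand(new String[] {"title", "content"}, "computer software"));
        // (title:computer content:computer) (title:software content:software)
    }
}
```

With OR semantics, the blog post from the introduction matches on content:software even though the exact phrase never appears.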

Getting the search string

It would be easy to use a normal Seam-style form to specify the query string, but we don't want to. We want a RESTful-style form that submits with GET instead of POST, so that search results have friendly, bookmarkable URLs. To do this, we'll use a @RequestParameter annotation and a page action.

Using @RequestParameter

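Add this member to SearchAction, along with a getter so the search form can echo the current query back:

import org.jboss.seam.annotations.web.RequestParameter;

@RequestParameter private String queryString;
public String getQueryString() { return queryString; }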
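Because the form submits with GET, the query ends up in the URL itself, and that is what makes the results bookmarkable. This plain-Java sketch shows the round trip. It encodes the search terms much as a browser submits the form, then decodes the parameter roughly the way the servlet container does before Seam injects it into the @RequestParameter field. The class and method names are hypothetical, and the Charset overloads of URLEncoder/URLDecoder require JDK 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class BookmarkableSearchUrl {
    // Builds the GET URL for the search form (spaces become '+')
    static String searchUrl(String queryString) {
        return "/search-results.seam?queryString="
                + URLEncoder.encode(queryString, StandardCharsets.UTF_8);
    }

    // Splits the query part into decoded name/value pairs
    static Map<String, String> parseQuery(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = url.indexOf('?');
        if (q < 0) return params;
        for (String pair : url.substring(q + 1).split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        String url = searchUrl("computer software");
        System.out.println(url); // /search-results.seam?queryString=computer+software
        System.out.println(parseQuery(url).get("queryString")); // computer software
    }
}
```

Because the whole search lives in the URL, opening /search-results.seam?queryString=computer+software again reruns the same search, which a POST-based JSF form cannot offer.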

The page action

Assume the search results page is search-results.xhtml. You could put a page action in the application's global pages.xml file, but I prefer not to: putting everything in a single file makes that file unmanageable. Instead, create a small search-results.page.xml file next to search-results.xhtml:

<page action="#{searchAction.search}">
</page>

This file causes the searchAction.search method to be called whenever the page is viewed. The @RequestParameter annotation injects the queryString request parameter into the component, so together they connect the URL's query string to the Seam component.

The form

We won't use a standard Seam / JSF form, because we're not using a true value binding, and also we want to use a GET action, not a POST action. The form looks like this:

<form action="/search-results.seam" method="get">
    
    <input type="text" 
           name="queryString"
           value="#{searchAction.queryString}"/>
    
    <input type="hidden" name="#{org.jboss.seam.core.manager.conversationIdParameter}" 
           value="#{conversation.id}"/>
    
    <input type="submit"
           value="Search"/>
    
</form>
    

Note the idiom of using a hidden input field to pass the conversation ID. Remove it if you don't want the conversation to propagate to the results page.

It works

Create a results page, as usual, to display the results in a table or however else you would like them presented. If you are doing this with an existing database, you may be disappointed to find no results after adding the search annotations!

This happens because the Lucene index must be initialized. It is updated automatically whenever entities are inserted, updated, or deleted through Hibernate (that is the job of the event listeners), but rows that were already in the database have never been indexed. We use a simple class like this to build the index:

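import java.util.List;

import org.hibernate.search.jpa.FullTextEntityManager;
import org.jboss.seam.annotations.In;
import org.jboss.seam.annotations.Name;
import org.jboss.seam.annotations.security.Restrict;

@Name("searchReIndex")
@Restrict("#{s:hasRole('admin')}")  // note: it's good to put a security restriction on this class
public class SearchReIndex {
    @In private FullTextEntityManager entityManager;

    @SuppressWarnings("unchecked")
    public String reindex() {
        // load every existing Blog row and add it to the Lucene index
        final List<Blog> entries =
                entityManager.createQuery("select b from Blog b").getResultList();
        for (Blog b : entries) entityManager.index(b);
        return null; // stay on the same page
    }
}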

This can be triggered from an XHTML page:

<h:form>
   <h:commandButton style="font-size:150%;" value="REBUILD THE SEARCH INDEX"
                 action="#{searchReIndex.reindex}"/>
</h:form>

Rebuild the index, try the search, and find the entries.
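Why do existing rows stay invisible until the index is rebuilt? The toy model below is plain Java, not Hibernate Search, and all of its names are hypothetical. Its index is updated only by writes that go through persist(), which stands in for the event listeners. A row loaded behind the model's back is therefore missing from search results until reindex() walks the whole table, much as SearchReIndex loops over every Blog and calls index().

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ReindexModel {
    final Map<Integer, String> table = new LinkedHashMap<>();    // stands in for the Blog table
    final Map<String, Set<Integer>> index = new HashMap<>();      // term -> ids

    // Pre-existing data: written straight to the table, so no indexing happens
    void loadLegacyRow(int id, String content) { table.put(id, content); }

    // A write that goes through Hibernate: the listener indexes it immediately
    void persist(int id, String content) {
        table.put(id, content);
        indexRow(id, content);
    }

    void indexRow(int id, String content) {
        for (String term : content.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // Index every row currently in the table (safe to repeat: sets ignore duplicates)
    void reindex() {
        for (Map.Entry<Integer, String> row : table.entrySet()) indexRow(row.getKey(), row.getValue());
    }

    Set<Integer> search(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        ReindexModel m = new ReindexModel();
        m.loadLegacyRow(1, "Old post about software");
        m.persist(2, "New post about software");
        System.out.println(m.search("software")); // [2]: the legacy row is missing
        m.reindex();
        System.out.println(m.search("software")); // [1, 2]
    }
}
```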

Conclusion: the search has just begun

Lucene has many capabilities and options beyond this short introduction. But even this taste shows you the power of Lucene and Hibernate Search. We've also shown how to create friendly URLs using @RequestParameter and page actions. Go ahead and try out our search box in the upper right to see full text search in action.

There are many more capabilities to explore from this starting point.

It's a great system. We've just built a complete full text search engine for our simple blog with a couple dozen lines of simple code.