August 24, 2008
Using Hibernate Full-Text Search from within JBoss Seam
Adding full-text search is easy, right?
select * from blog where content ilike '%queryString%';
That's how many sites implement their first crack at content searching.
It works but does not give the search experience users expect today.
For example, if the search query is "computer software"
and the blog contains text like "We provide both computers and
software", the simple like query finds nothing: the word "and" (and the
plural "s") sits between the two search terms, so the query string never
appears as one contiguous substring. Trying to solve such problems with complex
SQL constructions becomes a nightmare and gives poor results. Even worse,
these like-based queries perform badly as the data set grows, because a pattern
with a leading wildcard cannot use an ordinary index. The database must
scan the entire corpus, one character at a time.
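To make the difference concrete, here is a minimal sketch in plain Java (no database or Lucene involved) that compares a substring test, which is what the like query does, with a term-by-term test. The class name, the helper names, and the plural-stripping normalize step are assumptions made up for this illustration; a real full-text engine decides how words are split and normalized through its analyzer, and its behavior may differ.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class LikeVersusTerms {
    // Same idea as: content ilike '%query%' (case-insensitive substring test).
    static boolean likeMatch(String content, String query) {
        return content.toLowerCase(Locale.ROOT).contains(query.toLowerCase(Locale.ROOT));
    }

    // Crude normalizer, an assumption for this sketch only:
    // lowercase the word and strip a single trailing "s".
    static String normalize(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        return (w.length() > 3 && w.endsWith("s")) ? w.substring(0, w.length() - 1) : w;
    }

    // Split text into a set of normalized terms.
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<String>();
        for (String word : text.split("[^A-Za-z0-9]+")) {
            if (!word.isEmpty()) {
                result.add(normalize(word));
            }
        }
        return result;
    }

    // True when every query term occurs somewhere in the content.
    static boolean termMatch(String content, String query) {
        return terms(content).containsAll(terms(query));
    }

    public static void main(String[] args) {
        String content = "We provide both computers and software";
        System.out.println(likeMatch(content, "computer software")); // prints false
        System.out.println(termMatch(content, "computer software")); // prints true
    }
}
```

The substring test fails for the reason given above, while the term-based test succeeds because word order and the words in between no longer matter.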
Lucky for us JBoss Seam developers, we have a much better solution.
Hibernate Search, based on Apache Lucene
In a typical EJB use scenario, as deployed on JBoss Application Server, these are the pieces of the architecture:
- The javax.persistence.EntityManager provides the interface to the persistent entities.
- The EntityManager is actually provided by Hibernate. The Hibernate session can be accessed by getDelegate().
We will make a change to this scenario: we will replace the EntityManager
with a FullTextEntityManager. FullTextEntityManager
extends the EntityManager interface, so it can be used in exactly the
same way, but it adds full-text search capabilities when needed.
Also, one more component is added to the mix:
- Apache Lucene, which is the underlying indexer and full text search engine.
Setting it up
You'll need the following JARs in your EAR's lib directory:
- hibernate-commons-annotations.jar
- lucene-core.jar
- hibernate-search.jar
Modify persistence.xml with the following lines:
<property name="hibernate.search.default.indexBase"
value="/path/to/index/directory"/>
<!-- The listeners below are not needed with Hibernate Annotations (HA) 3.3 -->
<property name="hibernate.ejb.event.post-insert"
value="org.hibernate.search.event.FullTextIndexEventListener"/>
<property name="hibernate.ejb.event.post-update"
value="org.hibernate.search.event.FullTextIndexEventListener"/>
<property name="hibernate.ejb.event.post-delete"
value="org.hibernate.search.event.FullTextIndexEventListener"/>
Now let's start searching.
Annotating the entities
Let's start with a simple Blog class:
@Entity
public class Blog {
    private int id;
    @Column
    @Id @GeneratedValue public int getId() {
        return id;
    }
    public void setId(int id) {
        this.id = id;
    }
    private String content;
    public String getContent() {
        return content;
    }
    // setter and other fields here
}
This is a very simplified entity that holds a single blog entry. Now let's add the necessary annotations:
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import org.hibernate.search.annotations.Index;
import org.hibernate.search.annotations.DocumentId;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
@Entity
@Indexed
public class Blog {
    private int id;
    @Column
    @Id @GeneratedValue @DocumentId public int getId() {
        return id;
    }
    public void setId(int id) {
        this.id = id;
    }
    private String content;
    // the index field name is derived from the property name: "content"
    @Field(index=Index.TOKENIZED)
    public String getContent() {
        return content;
    }
    // setter and other fields here
}
Some very brief notes on these annotations:
@Indexed tells Hibernate Search that we want
to create a full-text index for this entity. In fact, once you run it,
you'll see a directory named after the fully qualified class name in the index
storage directory you specified in persistence.xml.
@DocumentId tells Hibernate Search which value is the identifier
for this document within its index. A Lucene index of web pages would
typically identify each document by its URL. These entities are in a
database, so we don't have URLs. Instead we mark the entity's primary key
with @DocumentId.
@Field(index=Index.TOKENIZED) tells Hibernate Search
that this is a field we are interested in and it is a string which should
be tokenized (split into words).
That's all we need to do to the entity classes.
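To picture what a tokenized field and a document identifier give the search engine, here is a minimal sketch of an inverted index in plain Java. It is not how Lucene or Hibernate Search is implemented; the class and method names are invented for this illustration, and the tokenizer is a simple split on non-alphanumeric characters. The int id plays the role of the @DocumentId value and the String the role of the tokenized content field.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TinyInvertedIndex {
    // term -> IDs of the documents whose content contains that term
    private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();

    // Tokenize the content and record the document ID under every token.
    void index(int id, String content) {
        for (String token : content.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue;
            }
            Set<Integer> ids = postings.get(token);
            if (ids == null) {
                ids = new TreeSet<Integer>();
                postings.put(token, ids);
            }
            ids.add(id);
        }
    }

    // Look up one term; no scan over the stored text is needed.
    Set<Integer> lookup(String term) {
        Set<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<Integer>emptySet() : ids;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.index(1, "We provide both computers and software");
        index.index(2, "Software for Seam developers");
        System.out.println(index.lookup("software")); // prints [1, 2]
        System.out.println(index.lookup("seam"));     // prints [2]
    }
}
```

A lookup returns identifiers rather than text, which is why the entity needs a document ID: the identifiers are what lets the matching rows be loaded from the database afterwards.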
Using it
Let's create a Seam component which executes searches. We'll show the component first, and then show how it's integrated with the XHTML pages.
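import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;
// Lucene 2.x package names
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.FullTextQuery;
import org.jboss.seam.ScopeType;
import org.jboss.seam.annotations.In;
import org.jboss.seam.annotations.Name;
import org.jboss.seam.annotations.Scope;
@Name("searchAction")
@Scope(ScopeType.CONVERSATION)
public class SearchAction {
    private static final Logger logger = Logger.getLogger(SearchAction.class.getName());
    // we are using the Seam-managed persistence context, with a twist
    @In private FullTextEntityManager entityManager;
    private List resultList;
    public List getResultList() { return resultList; }
    public void search() {
        // queryString is the @RequestParameter member introduced in the next section;
        // it is null when the page is requested without a query
        if (queryString == null || queryString.trim().length() == 0) return;
        final MultiFieldQueryParser parser = new MultiFieldQueryParser(
            new String[] { "content" }, new StandardAnalyzer());
        try {
            final org.apache.lucene.search.Query luceneQuery =
                parser.parse(queryString);
            final FullTextQuery ftq =
                entityManager.createFullTextQuery(luceneQuery, Blog.class);
            resultList = ftq.getResultList();
        } catch (ParseException ex) {
            // an unparseable query leaves the previous results untouched
            logger.log(Level.SEVERE, "Search terms: " + queryString, ex);
        }
    }
}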
Note that we use a FullTextEntityManager which is injected
just like a normal Seam-managed persistence context (SMPC) would be.
When creating the parser, we specify the "content" field as the field
we are interested in. We create a Lucene Query. In
the FullTextQuery we specify the class we want to retrieve.
Otherwise this is as familiar as an ordinary query.
Getting the search string
It would be easy to use a normal Seam-style form to specify the query string.
But we don't want to do that. We want a RESTful-style form, which is a GET action
instead of a POST. Search queries should be friendly, bookmarkable URLs.
To do this, we'll use a @RequestParameter annotation and a page action.
Using @RequestParameter
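Add this member to SearchAction, with a getter so that the search form can redisplay the value:
@RequestParameter private String queryString;
public String getQueryString() { return queryString; }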
The page action
Assume the search results page is search-results.xhtml.
You could put a page action in the application's global
pages.xml file. However, I prefer not to do that. Putting
everything in a single file results in a file which is unmanageable.
Instead, a simple search-results.page.xml file is created next to the page:
<page action="#{searchAction.search}">
</page>
This simple file will result in the searchAction.search
method being called when the page is viewed. This works with the
@RequestParameter annotation to link the query parameter string
to the Seam component.
The form
We won't use a standard Seam / JSF form, because we're not using a true value binding, and also we want to use a GET action, not a POST action. The form looks like this:
<form action="/search-results.seam" method="get">
    <input type="text"
           name="queryString"
           value="#{searchAction.queryString}"/>
    <input type="hidden" name="#{org.jboss.seam.core.manager.conversationIdParameter}"
           value="#{conversation.id}"/>
    <input type="submit"
           value="Search"/>
</form>
Note the idiom for using a hidden input field to pass the
conversation ID. Remove this if you don't want it to propagate.
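Because the form uses GET, the browser encodes the text field into the URL itself, which is what makes a search bookmarkable. The following plain-Java sketch builds the same kind of URL by hand, leaving out the conversation ID parameter. The class and method names are invented for this illustration; the path and the queryString parameter name are the ones used in the form above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SearchUrl {
    // Builds the URL a GET form submission would request for the given search text.
    static String forQuery(String queryString) {
        try {
            // URLEncoder applies application/x-www-form-urlencoded rules,
            // the same encoding browsers use for form fields.
            return "/search-results.seam?queryString=" + URLEncoder.encode(queryString, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required of every Java runtime
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }

    public static void main(String[] args) {
        // prints /search-results.seam?queryString=computer+software
        System.out.println(forQuery("computer software"));
    }
}
```

A URL of this shape can be linked to or bookmarked directly: the page action runs the search whenever the page is requested, with or without the form.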
It works
Create a result page, as usual, to display the results in a table format, or however else you would like them to be presented. If you are doing this with an existing database, you may be disappointed to find no results after you added the search annotations!
This happens because the Lucene indexes must be initialized.
They are automatically updated when the FullTextEntityManager
is used for creates, updates and deletes, but the data
which are already in the database (if any) won't have been indexed.
We use a simple class that looks like this to build the initial index:
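import java.util.List;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.jboss.seam.annotations.In;
import org.jboss.seam.annotations.Name;
import org.jboss.seam.annotations.security.Restrict;
@Name("searchReIndex")
@Restrict("#{s:hasRole('admin')}") // note: it's good to put a security restriction on this class
public class SearchReIndex {
    @In private FullTextEntityManager entityManager;
    @SuppressWarnings("unchecked") // JPA's getResultList() returns a raw List
    public String reindex() {
        final List<Blog> entries = entityManager.createQuery("select b from Blog b").getResultList();
        for (Blog b : entries) entityManager.index(b);
        return null; // null outcome: stay on the same page
    }
}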
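The class above loads every Blog row into one list, which is fine for a small blog. For a larger table, one option is to index page by page. The plain-Java sketch below shows only the batching loop. PageSource and Indexer are hypothetical interfaces invented for this illustration: the first stands in for a paged query (for example one using setFirstResult and setMaxResults on a JPA Query), the second for the call to entityManager.index(). Transaction and memory handling in a real application would need more care than shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BatchedReindex {
    /** Hypothetical stand-in for a paged database query. */
    interface PageSource<T> {
        List<T> page(int firstResult, int maxResults);
    }

    /** Hypothetical stand-in for entityManager.index(entity). */
    interface Indexer<T> {
        void index(T entity);
    }

    // Fetches and indexes batchSize entities at a time; returns how many were indexed.
    static <T> int reindex(PageSource<T> source, Indexer<T> indexer, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int total = 0;
        while (true) {
            List<T> batch = source.page(total, batchSize);
            for (T entity : batch) {
                indexer.index(entity);
            }
            total += batch.size();
            if (batch.size() < batchSize) {
                return total; // a short (or empty) page means no rows are left
            }
        }
    }

    public static void main(String[] args) {
        final List<String> rows = Arrays.asList("a", "b", "c", "d", "e");
        final List<String> indexed = new ArrayList<String>();
        // In-memory page source that mimics firstResult/maxResults semantics.
        PageSource<String> source = (first, max) ->
            rows.subList(Math.min(first, rows.size()), Math.min(first + max, rows.size()));
        Indexer<String> indexer = indexed::add;
        System.out.println(reindex(source, indexer, 2)); // prints 5
        System.out.println(indexed);                     // prints [a, b, c, d, e]
    }
}
```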
This can be triggered from an XHTML page:
<h:form>
<h:commandButton style="font-size:150%;" value="REBUILD THE SEARCH INDEX"
action="#{searchReIndex.reindex}"/>
</h:form>
Rebuild the index, try the search, and find the entries.
Conclusion: the search has just begun
Lucene has many capabilities and options beyond this short introduction. But even this
taste shows you the power of Lucene and Hibernate Search. We've also shown how
to create friendly URLs using @RequestParameter and page actions. Go ahead
and try out our search box in the upper right to see full text search
in action.
There are many more capabilities to explore from this starting point:
- Use URLrewrite to create "static HTML" looking URLs
- Create an RSS feed based on a search
- Integrate both database results and HTML crawler results in an output page
- Explore the powerful search options to optimize queries
- Of course, combine full text searching with all the other search capabilities of EJB QL
It's a great system. We've just built a complete full text search engine for our simple blog with a couple dozen lines of simple code.
