Integrate Apache Pluto With Lucene Search Engine Example Tutorial


Information retrieval isn't an optional luxury for your application. The best applications provide cross-site search, which minimizes the effort users spend trying to find or locate a piece of information.

You typically choose a portal to build your application because of the many advantages it provides. One of the most important is the ability to integrate with modern search engines, which gives users one central location from which all content can be searched, including static content such as HTML pages and files on the file system.

This tutorial guides you step by step through integrating the Apache Pluto portal with Apache Lucene search (the code uses the Lucene 4.x API).

Lucene Concept

Lucene is a search engine library made up of many components that work together to produce the results you want. Getting familiar with these components first will help you get the most out of this tutorial.

Lucene provides two key functions: creating an index and executing a user's query against it. Your application is responsible for setting up both, and the two operations are performed separately.
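To make the two separate phases concrete, here is a toy, in-memory model in plain Java. It is a sketch under simplifying assumptions, not Lucene's implementation or API: the class name ToyInvertedIndex and its methods are hypothetical, the "analysis" is only lowercasing and splitting on non-alphanumeric characters, and a query matches the documents that contain all of its terms. Indexing builds an inverted index (each term mapped to the ids of the documents containing it); querying analyzes the query text the same way and looks its terms up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy model of Lucene's two phases (indexing and querying); not Lucene's API.
final class ToyInvertedIndex {
    // Inverted index: term -> sorted ids of the documents that contain it.
    private final Map<String, Set<Integer>> postings = new TreeMap<>();
    private final List<String> documents = new ArrayList<>();

    // Tiny stand-in for an analyzer: lowercase, split on non-letters/non-digits.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Phase 1, indexing: record every term of the title. Returns the new document id.
    int addDocument(String title) {
        int docId = documents.size();
        documents.add(title);
        for (String term : tokenize(title)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Phase 2, querying: ids of documents containing every query term.
    List<Integer> search(String queryText) {
        List<String> terms = tokenize(queryText);
        if (terms.isEmpty()) return Collections.emptyList();
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return new ArrayList<>(result);
    }

    String title(int docId) {
        return documents.get(docId);
    }
}
```

With the six tutorial titles used by the Indexer indexed, searching for "pluto" returns document 0, while searching for "lucene" returns nothing, because "LuceneSearch" was indexed as the single term "lucenesearch". Lucene's StandardAnalyzer behaves the same way on that title, since it does not split camel-case words.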

The figure below shows the first step: indexing your documents (contents).

Figure: Lucene Search - Creating Index

Querying the index is depicted in the following figure:

Figure: Lucene Search - Searching Index

The sections below describe each of the components involved in creating and querying the index.

Documents

A Lucene index consists of documents, and each Lucene document represents one indexed object. This object could be a database record, a web page, a Java object, and so on.

Each document consists of a set of fields, and each field is a name/value pair that represents a piece of content. Examples of such fields are title, summary, and content.

A Lucene document is represented by an object of the org.apache.lucene.document.Document class.

Analyzer

The Analyzer is the heart of Lucene: you use an Analyzer both when creating the Lucene index and when querying it. An Analyzer turns free text into tokens (terms) that can be searched later on.

Lucene provides many Analyzer implementations, so you can choose the one that best fits your application. When you add a document to the index, Lucene uses the analyzer to process the text of every tokenized field in that document. The query text should normally be processed by the same analyzer, so that query terms match the indexed terms.

You can find the different Analyzer types under the org.apache.lucene.analysis package and its subpackages (for example, StandardAnalyzer lives in org.apache.lucene.analysis.standard).
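The following plain-Java sketch imitates, in a much simplified way, the steps StandardAnalyzer applies: split text into words, lowercase them, and drop English stop words. The class ToyAnalyzer is hypothetical, the tokenization rule (split on anything that is not a letter or digit) is an assumption, and the stop-word list is only a small assumed subset; Lucene's StandardTokenizer follows the Unicode word-break rules and its default stop set is larger.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified analyzer for illustration; not Lucene's StandardAnalyzer.
final class ToyAnalyzer {
    // A few English stop words (assumed subset, for illustration only).
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to", "in", "with"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // 1. Tokenize: split on runs of characters that are neither letters nor digits.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) continue;
            // 2. Normalize case so "Pluto" and "pluto" become the same term.
            String token = raw.toLowerCase(Locale.ROOT);
            // 3. Drop stop words, which carry little search value.
            if (!STOP_WORDS.contains(token)) tokens.add(token);
        }
        return tokens;
    }
}
```

Because the same analysis runs at index time and at query time, a user who types "PLUTO" still matches a title indexed as "Pluto".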

Query

A Query object is what you use to search the Lucene index. There are several ways to build one, depending on the kind of search you need. Refer to the Lucene API documentation to learn more about the following Query types:

  • TermQuery
  • BooleanQuery
  • WildcardQuery
  • PhraseQuery
  • PrefixQuery
  • MultiPhraseQuery
  • FuzzyQuery
  • RegexpQuery
  • TermRangeQuery
  • NumericRangeQuery
  • ConstantScoreQuery
  • DisjunctionMaxQuery
  • MatchAllDocsQuery
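Several of these classes (PrefixQuery, WildcardQuery, FuzzyQuery, RegexpQuery, TermRangeQuery) are multi-term queries: Lucene first finds the indexed terms that match the pattern and then looks up the documents containing those terms. The plain-Java sketch below mimics only that term-selection step for TermQuery, PrefixQuery and WildcardQuery over a small term dictionary. It is not Lucene code; the class ToyTermMatcher is hypothetical, and it assumes WildcardQuery's pattern rules, where * matches any character sequence and ? matches exactly one character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.regex.Pattern;

// Hypothetical illustration of how different query types select terms; not Lucene's API.
final class ToyTermMatcher {
    private final SortedSet<String> terms;

    ToyTermMatcher(SortedSet<String> terms) {
        this.terms = terms;
    }

    // TermQuery-like: only the exact term, if it was indexed.
    List<String> term(String t) {
        List<String> out = new ArrayList<>();
        if (terms.contains(t)) out.add(t);
        return out;
    }

    // PrefixQuery-like: in a sorted dictionary, terms sharing a prefix are contiguous.
    List<String> prefix(String p) {
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(p)) {
            if (!t.startsWith(p)) break;
            out.add(t);
        }
        return out;
    }

    // WildcardQuery-like: '*' = any sequence, '?' = exactly one character.
    List<String> wildcard(String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') regex.append(".*");
            else if (c == '?') regex.append('.');
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        Pattern compiled = Pattern.compile(regex.toString());
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (compiled.matcher(t).matches()) out.add(t);
        }
        return out;
    }
}
```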

Field

As stated earlier, a field is a name/value pair that represents one piece of metadata or content of a Lucene document. Each field may be indexed, stored and/or tokenized. Indexed fields are searchable, and Lucene processes them when the indexer adds the document to the index; stored fields keep their original value so that it can be returned with the search results.

Splitting a document's fields into individual tokens is the job of the Lucene Analyzer. The Field class lives in the org.apache.lucene.document package.
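In the Lucene 4.x document API this choice is mostly made by the field class: TextField is indexed and tokenized, StringField is indexed as a single untokenized term, and StoredField is only stored. The plain-Java sketch below (the class ToyFieldTerms is hypothetical, not Lucene code) models which terms each kind of field contributes, and shows why the difference matters: a tokenized title can be found by any of its words, while an untokenized one only matches the exact whole value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of the terms a field contributes to the index; not Lucene's API.
final class ToyFieldTerms {
    // Tokenized (TextField-like): one lowercase term per word.
    static List<String> tokenizedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Untokenized (StringField-like): the whole value is a single term, case preserved.
    static List<String> untokenizedTerms(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value);
        return terms;
    }

    // A single-term lookup succeeds only if exactly that term was indexed.
    static boolean matches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }
}
```

This is why a TextField suits the tutorial titles: users can search for any single word of a title.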

TopScoreDocCollector

TopScoreDocCollector is a Collector implementation that collects the top-scoring hits and returns them as a TopDocs object. IndexSearcher uses it to implement TopDocs-based search. Hits are sorted by score in descending order and then, when scores are tied, by docID in ascending order.
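That ordering rule can be reproduced in plain Java with a bounded priority queue. This is a simplified sketch under assumptions (a hypothetical ToyTopCollector class and Hit type, scores supplied by the caller), not Lucene's TopScoreDocCollector, but it follows the contract described above: count every match, keep only the best numHits, and order them by score descending, then by document id ascending.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical top-N collector following the TopScoreDocCollector ordering rule.
final class ToyTopCollector {
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Best hits first: higher score, then lower doc id.
    static final Comparator<Hit> BEST_FIRST =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                      .thenComparingInt(h -> h.doc);

    private final int numHits;
    // Heap ordered worst-first, so the weakest retained hit is evicted first.
    private final PriorityQueue<Hit> queue;
    private int totalHits;

    ToyTopCollector(int numHits) {
        this.numHits = numHits;
        this.queue = new PriorityQueue<>(BEST_FIRST.reversed());
    }

    void collect(int doc, float score) {
        totalHits++;                  // every match is counted
        queue.add(new Hit(doc, score));
        if (queue.size() > numHits) {
            queue.poll();             // drop the current worst hit
        }
    }

    int totalHits() { return totalHits; }

    // Counterpart of topDocs().scoreDocs: the retained hits, best first.
    List<Hit> topHits() {
        List<Hit> hits = new ArrayList<>(queue);
        hits.sort(BEST_FIRST);
        return hits;
    }
}
```

Note that totalHits can be larger than the number of hits returned: with numHits set to 10, as in the portlet below, only the ten best matches are kept even if more documents match.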

IndexSearcher

In the sample below, we use IndexSearcher (org.apache.lucene.search.IndexSearcher) to run the provided Query against our index.

To get an IndexSearcher object, pass an IndexReader to its constructor. Once you invoke search on the IndexSearcher with a Collector, the collector is filled with the search results, and you can call collector.topDocs().scoreDocs to get a ScoreDoc array describing the matching documents.

Hits

In older Lucene releases, the search method of the IndexSearcher class returned an org.apache.lucene.search.Hits object containing the matching documents, so that you could access, process and display them in whatever form you wanted. Hits was removed in Lucene 3.0; with the Lucene 4.x API used in this tutorial, results come back as a TopDocs object whose scoreDocs array holds one ScoreDoc per hit.

Hits was not just a simple collection. It mainly provided three methods, each of which has a counterpart in the current API:

  • public final Document doc(int n) throws IOException returned a Document containing all of the document's fields that were stored when the document was indexed. Today you call searcher.doc(scoreDoc.doc).
  • public final int length() returned the number of search results that matched the query. Today this is TopDocs.totalHits.
  • public final float score(int n) throws IOException returned the calculated score of the n-th hit. Today this is the ScoreDoc.score field.

Index Building – Indexer

The following sample shows how you can use the Lucene API to index a set of JournalDev tutorials. This index lets you search for any tutorial that the JournalDev site provides.

The index assumes that you're looking for tutorials by their title.

Indexer.java


package com.journaldev.portlet;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {
	static {
		IndexWriter writer = null;
		try {
			System.out.println("Initialize of Indexer ::");
			// Create an analyzer
			StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
			// Create a Lucene directory backed by the file system
			File indexDir = new File("D:\\LuceneSearch\\store");
			Directory dir = new SimpleFSDirectory(indexDir);
			System.out.println("Clean Index ::");
			// listAll() fails if the directory does not exist yet (first run), so only clean an existing index
			if (indexDir.exists()) {
				for (String fileName : dir.listAll()) {
					dir.deleteFile(fileName);
				}
			}
			// Create index configuration writer
			IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);
			// Create writer
			writer = new IndexWriter(dir, config);
			// Tutorial Topics
			String [] topics = {"Apache Pluto Tutorial","Hibernate Tutorial","Spring Tutorial","JSP & Servlet Tutorial","Primefaces Tutorial","LuceneSearch Tutorial"};

			for (String topic : topics) {
				// Create document
				Document doc = new Document();
				// Add a tokenized title field whose value is also stored
				doc.add(new TextField("title", topic, Field.Store.YES));
				// Write document
				writer.addDocument(doc);
			}
			// Commit changes so that newly opened readers can see them
			writer.commit();
			System.out.println("All Tutorials Are Indexed ::");
		}
		catch (Exception e) {
			e.printStackTrace();
		}
		finally {
			// Always close the writer, even on failure, to release the index's write lock
			if (writer != null) {
				try {
					writer.close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
	}
}

Here are some notes on the code above:

  • There are several types of index storage (Directory implementations): you can use RAMDirectory, for example, instead of a physical location. The indexer above uses SimpleFSDirectory (a simple file-system directory) as the location of the index's segments.
  • The indexing code lives in a static initializer, so it runs once, when the Indexer class is initialized. Loading the class with Class.forName, as the portlet below does, triggers it.
  • To avoid indexing the same documents again each time the Indexer is loaded, the code first removes any existing files from the index directory.
  • We use StandardAnalyzer to generate the tokens.
  • The tutorial topics offered by the JournalDev site are hard-coded in the String [] topics array.
  • For every tutorial, a document with a single title field is created and added to the index. The field is a TextField, so it is tokenized, and Field.Store.YES stores its value so that the title can be displayed with the search results.
  • All changes to the index are committed with writer.commit().
  • The IndexWriter is then closed, which releases the index's write lock so that another IndexWriter can open the same directory.
  • If you forget to close your writer once its work is finished, any other attempt to open an IndexWriter on the same directory fails with a LockObtainFailedException, because the unclosed writer still holds the write lock.
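The "one writer at a time" rule comes from the index's write lock (the write.lock file). By default, Lucene 4.x file-system directories use NativeFSLockFactory, which is built on java.nio file locks, so the idea can be illustrated with plain FileChannel.tryLock. The sketch below is not Lucene code: the class ToyWriteLock is hypothetical and the lock file is a temporary file chosen for the example. It only shows that a second "writer" in the same JVM cannot obtain the lock until the first one releases it, which is what closing the IndexWriter does.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of an exclusive write lock; not Lucene's implementation.
final class ToyWriteLock {
    private final FileChannel channel;
    private FileLock lock;

    ToyWriteLock(Path lockFile) {
        try {
            channel = FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Like opening an IndexWriter: succeeds only if nobody else holds the lock.
    boolean obtain() {
        try {
            lock = channel.tryLock();   // null if another process holds the lock
        } catch (OverlappingFileLockException alreadyHeldInThisJvm) {
            lock = null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lock != null;
    }

    // Like IndexWriter.close(): release the lock so another writer can proceed.
    void release() {
        try {
            if (lock != null) lock.release();
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Path createTempLockFile() {
        try {
            return Files.createTempFile("toy-write", ".lock");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```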

Simple Lucene Search Portlet

Below is a simple Lucene search portlet that searches the index created above.

Remember that doView renders the portlet's view, while processAction handles actions triggered against the portlet.

LuceneSearch.java


package com.journaldev.portlet;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;

import javax.portlet.ActionRequest;
import javax.portlet.ActionResponse;
import javax.portlet.GenericPortlet;
import javax.portlet.PortletException;
import javax.portlet.PortletURL;
import javax.portlet.RenderRequest;
import javax.portlet.RenderResponse;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.QueryBuilder;
import org.apache.lucene.util.Version;

public class LuceneSearch extends GenericPortlet{
	static {
		try {
			// Load (and initialize) the Indexer, whose static block builds the index
			Class.forName("com.journaldev.portlet.Indexer");
		} catch (ClassNotFoundException e) {
			e.printStackTrace();
		}
	}

	// Dedicated monitor: hits is reassigned, so it must not be used as the lock itself
	private final Object lock = new Object();
	private ScoreDoc [] hits = new ScoreDoc[0];
	private IndexSearcher searcher = null;

	public void doView(RenderRequest request, RenderResponse response) throws PortletException, IOException {
		synchronized(lock){
			// Set the content type before getting the writer
			response.setContentType("text/html");
			PrintWriter out = response.getWriter();
			String status = request.getParameter("status");
			if(status == null || status.equals("initial")){
				// Print out the form tag; the action URL is submitted with POST
				out.print("<form method=\"POST\" action=\""+response.createActionURL()+"\">");
				// Print out the search input
				out.print("<p>Search for your favorite tutorial that JournalDev has presented : <input type=\"text\" "
						+ "name=\"query\" id=\"query\"/></p>");
				// Print out the search command
				out.print("<br/> "
						+ "<input type=\"submit\" value=\"Search\"/>");
				// Close form
				out.print("</form>");
			}
			else {
				// Print out the result (searcher is non-null whenever hits is non-empty)
				for(ScoreDoc hit : hits){
					Document d = searcher.doc(hit.doc);
					out.print("<p>Tutorial Is <span style='font-style: oblique;font-weight: bolder;'>"+d.get("title")+"</span> <span style='color:red'>With Score :"+hit.score+"</span></p>");
				}
				// Build a render URL that carries status=initial
				PortletURL searchAgain = response.createRenderURL();
				searchAgain.setParameter("status", "initial");
				out.print("<br/>"
						+ "<a href=\""+searchAgain+"\">Search Again</a>");
				// Close the reader once the results are rendered, and drop the stale
				// results so that a later re-render does not touch the closed reader
				if(searcher != null){
					searcher.getIndexReader().close();
					searcher = null;
					hits = new ScoreDoc[0];
				}
			}
		}
	}

	public void processAction(ActionRequest request, ActionResponse response) throws PortletException, IOException {
		synchronized (lock){
			// Reset the hits and close any reader left open by a previous search
			hits = new ScoreDoc[0];
			if(searcher != null){
				searcher.getIndexReader().close();
				searcher = null;
			}
			// Create an analyzer (the same type that was used for indexing)
			StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
			// Create a Lucene directory
			Directory dir = new SimpleFSDirectory(new File("D:\\LuceneSearch\\store"));
			// Open index reader (DirectoryReader.open replaces the deprecated IndexReader.open)
			IndexReader reader = DirectoryReader.open(dir);
			// Create index searcher
			searcher = new IndexSearcher(reader);

			// Build a phrase query on the title field using QueryBuilder
			String text = request.getParameter("query");
			// createPhraseQuery returns null when analysis yields no terms (e.g. empty input)
			Query query = (text == null) ? null : new QueryBuilder(analyzer).createPhraseQuery("title", text);
			if(query != null){
				// Create a collector for the top 10 hits
				TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
				// Search using the query and fill the collector with the matching documents
				searcher.search(query, collector);
				// Acquire the hits
				this.hits = collector.topDocs().scoreDocs;
			}
			response.setRenderParameter("status", "searched");
		}
	}

}

Here is a detailed explanation of the code listed above:

  • Following good portlet design, doView is used to display both the search form and the search results, while processAction handles the user's query and does the actual search work.
  • The LuceneSearch portlet loads the Indexer class, so the index is created before any search runs.
  • Two instance variables, hits and searcher, carry the results of the last search from processAction to doView. Because the portlet container normally creates a single portlet object per portlet definition, these fields are shared by all users; in a multi-user portal, keeping the results in the user's PortletSession would be safer.
  • If the request parameter status is null or equal to initial, a search form is displayed so that the user can enter the title of the tutorial he or she is looking for.
  • If status is neither null nor initial, the user has submitted a search, and the results are ready to be displayed.
  • To keep concurrent requests from leaving the results in an inconsistent state, both doView and processAction do their work inside a synchronized block.
  • When the user clicks the search button, processAction is executed and the search starts.
  • The hits array is filled with the matching documents, and the status render parameter is set to searched.
  • IndexWriter and IndexReader are used for writing to and reading from the index, respectively.
  • doView runs once processAction has finished.
  • After the results are displayed, the IndexReader is closed. Lucene readers do not take a lock, and an open reader does not stop a writer from working; only a second IndexWriter is blocked, by the write lock. Closing the reader releases its open files, which matters, for example, on Windows, where files that are still open typically cannot be deleted, so the Indexer could not clean the index directory.
  • The results are displayed together with their scores.
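The portlet builds its query with QueryBuilder.createPhraseQuery, so a multi-word input only matches titles that contain those words next to each other and in the same order; Lucene checks this with the term positions it records in the index. The plain-Java sketch below illustrates the idea with a positional index. It is not Lucene code: the class ToyPhraseIndex is hypothetical, its analysis is simply lowercasing and splitting on non-alphanumeric characters, and it ignores scoring and phrase slop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical positional index illustrating phrase matching; not Lucene code.
final class ToyPhraseIndex {
    // One entry per document: term -> positions where the term occurs in the title.
    private final List<Map<String, List<Integer>>> docs = new ArrayList<>();

    // Simplified analysis: lowercase, split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Index a title, recording the position of every term.
    void add(String title) {
        Map<String, List<Integer>> positions = new HashMap<>();
        List<String> terms = tokenize(title);
        for (int pos = 0; pos < terms.size(); pos++) {
            positions.computeIfAbsent(terms.get(pos), k -> new ArrayList<>()).add(pos);
        }
        docs.add(positions);
    }

    // Ids of documents in which the phrase's terms occur at consecutive positions, in order.
    List<Integer> searchPhrase(String phrase) {
        List<String> terms = tokenize(phrase);
        List<Integer> matches = new ArrayList<>();
        if (terms.isEmpty()) return matches;
        for (int docId = 0; docId < docs.size(); docId++) {
            Map<String, List<Integer>> positions = docs.get(docId);
            List<Integer> starts = positions.getOrDefault(terms.get(0), Collections.emptyList());
            for (int start : starts) {
                boolean inOrder = true;
                for (int i = 1; i < terms.size() && inOrder; i++) {
                    List<Integer> p = positions.get(terms.get(i));
                    inOrder = p != null && p.contains(start + i);
                }
                if (inOrder) {
                    matches.add(docId);
                    break;   // one occurrence is enough for this document
                }
            }
        }
        return matches;
    }
}
```

Under this model, "pluto tutorial" matches "Apache Pluto Tutorial" but "tutorial pluto" does not, even though both words appear in the title; the portlet's phrase query behaves the same way.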

Simple Lucene Search Portlet Demo

The screenshots below show the normal flow once the portlet is deployed into Apache Pluto. This tutorial assumes that you're already familiar with Apache Pluto and know how to create a portal page and deploy your portlet onto it.

If you haven't done this before, start with the Introduction Into Apache Pluto tutorial.

Figure: Lucene Search - Initial View
Figure: Lucene Search - Initial View - User Fill in Query
Figure: Lucene Search - Result View

Summary

Search functionality is a key feature of most modern sites. Most applications today don't keep their data in a single location; they typically need to search across database records, HTML pages, Word documents and many other sources. The best solution is a single search engine that works against all of these types of data through a uniform interface.

This tutorial helps you get started with the Lucene search engine and shows how to create a search portlet. Share your feedback, and download the source code to experiment with it yourself.
