Archive for April, 2006

Lucene and the Zend Framework

Thursday, April 27th, 2006

One of the most talked-about features of the Zend Framework is its port of the Apache Lucene project, a Java-based full-text search engine library. The port, Zend_Search_Lucene, is written entirely in PHP, so PHP developers can use Lucene-style search without additional PHP extensions, Java, or even a database.

The theory is that Zend_Search_Lucene overcomes the usual limitations of relational databases for full-text search, with features such as:

  • Fast indexing
  • Ranked result sets
  • A powerful but simple query syntax
  • The ability to index multiple fields
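To make that list a little more concrete, here is a toy sketch of the idea behind a Lucene-style index. It is written in Python purely for illustration and is not Zend_Search_Lucene's code or API: the `TinyIndex` class, the `field:term` query handling, the default `body` field and the simple TF-IDF scoring are all my own assumptions, and the real thing is far more sophisticated.

```python
import math
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps (field, term) -> {doc_id: term count}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id, **fields):
        """Index every field of a document under its own field name."""
        self.doc_count += 1
        for field, text in fields.items():
            for term in text.lower().split():
                counts = self.postings[(field, term)]
                counts[doc_id] = counts.get(doc_id, 0) + 1

    def search(self, query):
        """Rank documents for a query made of 'term' or 'field:term' tokens."""
        scores = defaultdict(float)
        for token in query.lower().split():
            field, _, term = token.rpartition(":")
            field = field or "body"  # assumption: bare terms search "body"
            matches = self.postings.get((field, term), {})
            if not matches:
                continue
            # Terms found in fewer documents contribute a larger weight.
            idf = math.log(1 + self.doc_count / len(matches))
            for doc_id, tf in matches.items():
                scores[doc_id] += tf * idf
        # Highest score first: this is the "ranked result set".
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)


index = TinyIndex()
index.add("moby", title="Moby Dick", body="the whale hunts the whale ship")
index.add("alice", title="Alice in Wonderland", body="alice follows the rabbit")
index.add("jonah", title="Jonah", body="a whale swallows jonah")

for doc_id, score in index.search("whale"):
    print(doc_id, round(score, 3))
```

Note that a search only touches the postings for the terms in the query and never scans the documents themselves, which is roughly why this kind of index can stay fast as the collection grows.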

Lucene is well known for its speed. For an example, have a look at DamnFastDotLucene. This demo site tests the performance of a .NET implementation of Lucene on quite a large set of documents:

  • 9150 text files from Project Gutenberg
  • The total size of indexed documents is 3.5 GB
  • The index size is 880 MB
  • The hardware: Pentium 4 3 GHz 800/1MB Cache, 1 GB DDRII Kingston 533, Western Digital Raptor 80GB

The result: it takes approximately the same time to search 5 MB of text as it does to search 3.5 GB of text. My queries were coming back in less than 0.125 seconds. That is fast.

That was .NET though. What about the PHP implementation in the Zend Framework?

The reality for PHP developers using the Zend Framework may be a little different from the hype. Some developers are reporting that Zend_Search_Lucene is significantly slower than equivalent queries run against MySQL or PostgreSQL. Have a look at the following comments in the Zend Framework Mailing List for details.

To be fair, these are very early days for the Zend Framework and its Lucene port: the project is still in early alpha. Even so, it is already being adopted by the community for live projects.

If you want to learn more about Zend_Search_Lucene, I recommend the following links:

If you have any experiences with Zend_Search_Lucene that you would like to share, I would appreciate hearing about them.