Lucene and the Zend Framework
One of the most talked-about features of the Zend Framework is its port of Apache Lucene, a Java-based full-text search engine library. The Zend Framework allows PHP developers to use Lucene without additional PHP extensions, Java, or even a database.
The theory is that Zend_Search_Lucene overcomes the usual limitations of relational databases with features such as:
- Fast indexing
- Ranked result sets
- A powerful but simple query syntax
- The ability to index multiple fields
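To make the feature list concrete, here is a minimal sketch of indexing a document with two fields and running a fielded query. It assumes the Zend Framework 1.x API with the library on the include path; the alpha release may differ in detail. The index location, field names, and sample text are made up for illustration.

```php
<?php
// Assumes the Zend Framework 1.x library is on the include path.
require_once 'Zend/Search/Lucene.php';

// Hypothetical index location; the index lives in plain files, no database needed.
$indexPath = sys_get_temp_dir() . '/zsl-example-index';
$index = Zend_Search_Lucene::create($indexPath);

// One document with two fields (the field names are illustrative).
$doc = new Zend_Search_Lucene_Document();
// Text(): tokenized, indexed, and stored, so it can be read back from a hit.
$doc->addField(Zend_Search_Lucene_Field::Text('title', 'Moby Dick'));
// UnStored(): tokenized and indexed, but the text itself is not kept in the index.
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents', 'Call me Ishmael. Some years ago...'));
$index->addDocument($doc);
$index->commit();

// The query syntax can target individual fields and combine terms.
$hits = $index->find('title:moby AND contents:ishmael');

// Each hit carries a relevance score alongside its stored fields.
foreach ($hits as $hit) {
    printf("%.3f  %s\n", $hit->score, $hit->title);
}
```

Storing only the short title and leaving the body unstored keeps the index smaller, at the cost of having to fetch the full text from elsewhere when displaying a result.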
Lucene is well known for its speed. For an example, have a look at DamnFastDotLucene - this demo site tests the performance of a .NET implementation of Lucene on quite a large set of documents:
- 9,150 text files from Project Gutenberg
- The total size of indexed documents is 3.5 GB
- The index size is 880 MB
- The hardware: Pentium 4 3 GHz 800/1 MB cache, 1 GB DDR2 Kingston 533, Western Digital Raptor 80 GB
The result: searching 3.5 GB of text takes approximately the same time as searching 5 MB. I was getting query times of less than 0.125 seconds. That is fast.
That was .NET, though. What about the PHP implementation in the Zend Framework?
The reality for PHP developers using the Zend Framework may be a little different from the hype. Some developers report that Zend_Search_Lucene is significantly slower than equivalent queries run against MySQL or PostgreSQL. Have a look at the following comments on the Zend Framework mailing list for details.
To be fair, it is still very early days for the Zend Framework and Lucene - the project is in early alpha. However, it is already being adopted by the community for live projects.
If you want to learn more about Zend_Search_Lucene, I recommend the following links:
If you have any experiences with Zend_Search_Lucene that you would like to share, I would appreciate hearing about them.
