How to offload MySQL server with Sphinx by Vladimir Fedorkov
Speaker: Vladimir Fedorkov of Sphinx.
The talk started out with a very nice handout of candies to everyone in the audience.
What is Sphinx? Another (C++) daemon on your boxes. It can be queried via an API (PHP, Python, etc.) or via a MySQL-compatible protocol with SQL queries (SphinxQL). Some query examples are in the slides; here’s one about SphinxSE in the KB.
MyISAM full-text search (FTS) is good but becomes slow with half a million documents. InnoDB has FTS now, but he has not tried it, and neither had anyone in the audience, so nobody could say how it compares with MyISAM FTS.
Geographical distance is the distance between two points measured along the surface of the earth, with each point given as a pair of float values (latitude, longitude). Sphinx supports this with GEODIST(Lat, Long, Lat2, Long2).
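To make the GEODIST idea concrete, here is a minimal Python sketch of a great-circle distance using the haversine formula. It is an illustration only, not Sphinx's actual implementation. It follows the documented GEODIST convention of taking coordinates in radians and returning meters; the function name, the Earth radius constant, and the sample coordinates are assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (assumed value)

def geodist(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula: a is the squared half-chord length between the points
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Brussels to Paris, with coordinates converted from degrees to radians
brussels = (math.radians(50.8503), math.radians(4.3517))
paris = (math.radians(48.8566), math.radians(2.3522))
print(round(geodist(*brussels, *paris) / 1000), "km")
```

In a real setup the computation would be done by Sphinx itself, so that results can be filtered or sorted by distance on the search side.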
Segments are good for price ranges on a site, date ranges, etc. Use INTERVAL(field, x0, x1, …, xN), which returns the index of the range that the field value falls into.
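A small Python sketch of the segmenting idea behind INTERVAL, assuming the documented behaviour: the result is 0 when the value is below the first point, 1 when it lies between the first and second points, and so on, with the points in ascending order. The function name, price points, and labels are made up for this example.

```python
from bisect import bisect_right

def interval(value, *points):
    """Return how many of the ascending boundary points are <= value."""
    return bisect_right(points, value)

price_points = (50, 100, 500)
labels = ["under 50", "50 to 99", "100 to 499", "500 and up"]

for price in (20, 50, 99, 750):
    # The segment index doubles as an index into the label list
    print(price, "->", labels[interval(price, *price_points)])
```

The segment index can then be used for grouping, for instance to count matching products per price range.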
Keep huge text collections out of the database: keep the metadata inside the database but the actual text outside of it, in the filesystem. With sql_file_field = path_to_file_text, Sphinx treats the column value as a file path and indexes the contents of that file instead of text stored in MySQL. max_file_field_buffer needs to be set properly, because it caps the size of the files that can be loaded.
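The pattern of keeping metadata in the database and text on disk can be sketched in Python as follows. This only simulates the idea and is not how the Sphinx indexer is written: the rows stand in for a MySQL result set, the limit constant is a hypothetical stand-in for max_file_field_buffer, and skipping oversized files is this sketch's reading of the documented behaviour.

```python
import os
import tempfile

MAX_FILE_FIELD_BUFFER = 8 * 1024 * 1024  # hypothetical limit in bytes for this sketch

def load_documents(rows, max_bytes=MAX_FILE_FIELD_BUFFER):
    """Yield (doc_id, title, text), reading text from the path stored in each row."""
    for doc_id, title, path in rows:
        if os.path.getsize(path) > max_bytes:
            continue  # file is larger than the buffer limit: skipped in this sketch
        with open(path, encoding="utf-8") as handle:
            yield doc_id, title, handle.read()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "doc1.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("hello world from the filesystem")
    # Stand-in for metadata rows fetched from MySQL: id, title, path to the text
    rows = [(1, "First document", path)]
    docs = list(load_documents(rows))
    print(docs)
```

The database row stays small (an id, a title, a path) while the bulky text is only read at indexing time.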
You can do proximity search with Sphinx: for example, find the words “hello” and “world” within a ten-word block.
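In Sphinx's extended query syntax a proximity search is written with a tilde after the quoted phrase, e.g. "hello world"~10. The Python sketch below shows the underlying idea with a naive sliding window over tokens. It is an illustration under simplifying assumptions (plain regex tokenization, a fixed window size), and Sphinx's exact distance semantics differ in detail; the function name is made up for this example.

```python
import re

def within_block(text, words, block_size):
    """True if every word occurs inside some run of block_size consecutive tokens."""
    tokens = re.findall(r"\w+", text.lower())
    wanted = {w.lower() for w in words}
    for start in range(len(tokens)):
        # Check the window of block_size tokens beginning at this position
        window = tokens[start:start + block_size]
        if wanted.issubset(window):
            return True
    return False

print(within_block("Hello there, big wide world", ["hello", "world"], 10))
print(within_block("Hello there, big wide world", ["hello", "world"], 3))
```

The first call finds both words inside a single ten-word block; the second fails because the words are five tokens apart and a three-word block cannot cover both.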
Resources: the documentation, a book by O’Reilly: Introduction to Search with Sphinx: From installation to relevance tuning (sold out at the FOSDEM O’Reilly booth!), and their community page including wiki, forum, etc.