Colin Charles Agenda

MySQL Full Text Search by Alex Rubin

Download the PDF: http://www.mysqlfulltextsearch.com/full_text.pdf

Default search by relevance, default sort is by relevance

Boolean search is also popular. cats AND dogs. No default sorting, so you need to order the results yourself

Phrase search

MySQL Full Text Index, only available with MyISAM, and it supports natural language and Boolean search. ft_min_word_len – 4 characters per word by default is indexed. Frequency based ranking, doesn’t count distance between words

SELECT * FROM articles WHERE MATCH (title,body) AGAINST (‘database’ IN NATURAL LANGUAGE MODE);

For Boolean, you use AGAINST (‘cat AND dog’ IN BOOLEAN MODE).

n-gram fulltext plugin for CJK languages are available as plugins

DRBD and MySQL FullText search? DRBD requires InnoDB, when there is a failover, DRBD needs to perform a reovery. Fulltext only works for MyISAM. So ou create a “FullText” slave MyISAM table with FullText indexes. The slide (diagram) is most useful for this, naturally.

Speed up FT search? Fit the index into memory. key_buffer = total size of full text index (max=4GB). You can preload FT indexes into buffer.

You can manually partition. Partitioning decreases index and table size, so search is faster. Application needs changing of course. MySQL 5.1 partitioning features, do not support FTs.

Order by/Group by is a performance killer. Using order by date, is much slower than with no order by.

Real World Performance Killer
SELECT … FROM `ft` WHERE MATCH `album` AGAINST (‘the way i am’)
The above query, is very slow! It took like 13 seconds or so.

Note the stopword list and ft_min_word_len. I is not a stopword, but “the”, “way”, and “am” are stopwords.

ft_min_word_len = 1 will mean that all words except “i” will be filtered out with the standard stoplist. “i” is contained in lots of text!

Search with error correction? Use soundex() MySQL function (sounds similar). select soundex(“Dilane”) should equate to Dylan. You can sort it either by popularity or Levenstein distance (either by a stored procedure or a UDF).

Sphinx – nice, open source, can be faster than MySQL full text index on a large dataset, supports multi-node clustering out of the box. It is however an external solution that isn’t built-in, and needs to be integrated.

MySQL 5.0: need to patch source code. MySQL 5.1: copy Sphinx plugin to the plugin_dir.

You can set Sphinx to be MySQL’s storage engine if you like.

Resources

Technorati Tags: , , , , , , , ,