MySQL Full Text Search

Today I played to optimize a slow one SQL request of the type

SELECT * FROM 'table' WHERE `field` LIKE '%word%'

Where is the problematic moment here – last part ‘% word%’ and in even greater specificity the sign % before the word, for which we do. Wildcard symbol % ,before any value, directly turns our request directly into slow, because in this way our query stops using field indexes. Solutions as always, but they are not always clear 😆 In general MySQL have a solution to this problem with fulltext search field indexing. How to change the field is much written in the documentation, but I will quickly describe how the above query changes, because we will come to a little drama at the end. Follow by applying the fulltext to the box above, the request must change in type:

SELECT * FROM `table` WHERE MATCH (field) AGAINST ('word')

Thus, the structure is obvious and there is no need for unnecessary discussion. The above application will take effect, if the word, for which you are requesting is at least 4 characters, the default is this value, if you want to modify it you must specify the value, която желаете в my.cnf в частта [mysqld] with the declaration ft_min_word_len= 3 or 2, 1 не е добър избор очевидно 😉 . After changing the value and restarting the mysql server you need to repair your tables, so that the new indexing can take effect. So far, everything is clear: I make the changes, restart, I rebuild the indexes and make the request and it returns to me 0 line 😀 I check with

SHOW VARIABLES

I see that values, which I have set have come into force, I rebuild the indexes again – the same result. 🙄 Unpleasant, very unpleasant. From here, a big swearing and digging for the key to the shed began 😀 Which turned out to be quite, quite interesting. Generally, when I started reading the documentation for I don't know which way and I came to an interesting passage

Such a technique works best with large collections (in fact, it was carefully tuned this way). For very small tables, word distribution does not adequately reflect their semantic value, and this model may sometimes produce bizarre results. For example, although the word “MySQL” is present in every row of the articles table shown earlier, a search for the word produces no results

ГРЕДА 😳 Дам табличката ми беше малка – it was a test after all. Our query in a large table with over 2 000 000 order and there things fell asleep. Well, the problem is already clear. To make the decision clear, I will mention briefly, that full text search supports 3 advanced modes BOOLEAN , EXPRESSIONS and NATURAL LANGUAGE as the latter works by default. You can check the documentation for the different modes, I will explain with 2-3 words for BOOLEAN because it contains the key. It supports AND logical operators, OR , NOT and so on and you can do various spells with the searched phrases, to have one, to have no other and so on. It also supports the symbol *, which is the equivalent of the wildcard symbol % 😉 It is useful, when the search word is below the length of ft_min_word_len or for small plates ;). At least for me on a table with about 100 the order does a perfect job. It remains only to see the completed request:

SELECT * FROM `table` WHERE MATCH (field)
AGAINST ('*word*' IN BOOLEAN MODE)

Here comes the moment whether indexing with the wildcard symbol works for us – the answer is I don't know. Basically I think, that yes, because nothing else is said in the documentation, but the documentation obviously doesn't say or show many things 😀

Enhanced by Zemanta

2 comments

    1. Е при големи таблици вече има други решения 😉 partitions да речем или други механизмни за fulltext search като Sphinx

Leave a Reply

Your email address will not be published. Required fields are marked *

Anti SPAM *