FULL TEXT SEARCHING
MySQL supports the use of several underlying database
engines. Not all engines support full-text searching as is
described in this tutorial. The two most commonly used engines
are MyISAM and InnoDB former supports full-text searching and
the latter does not.
Understanding Full-Text Searching
Not All Engines Support Full-Text Searching As will be
explained in Tutorial 26, "Creating and Manipulating Tables,"
MySQL supports the use of several underlying database engines.
Not all engines support full-text searching as is described in
this tutorial. The two most commonly used engines are MyISAM
and InnoDB; the former supports full-text searching and the
latter does not. This is why, although most of the sample
tables used in this book were created to use InnoDB, one (the
productnotes table) was created to use MyISAM. If you need
full-text searching functionality in your applications, keep
this in mind.
In Tutorial 13, "Using Wildcard Filtering," you were
introduced to the LIKE keyword that is used to match text (and
partial text) using wildcard operators. Using LIKE it is
possible to locate rows that contain specific values or parts
of values, regardless of the location of those values within
row columns.
In Tutorial 14, "Searching Using Regular Expressions,"
text-based searching was taken one step further with the
introduction to using regular expressions to match column
values. Using regular expressions, it is possible to write very
sophisticated matching patterns to locate the desired rows.
But as useful as these search mechanisms are, they have
several very important limitations:
- Performance Wildcard and regular expression matching
usually requires that MySQL try and match each and every
row in a table (and table indexes are rarely of use in
these searches). As such, these searches can be very
time-consuming as the number of rows to be searched
grows.
- Explicit control Using wildcard and regular expression
matching, it is very difficult (and not always possible) to
explicitly control what is and what is not matched. An
example of this is a search specifying a word that must be
matched, a word that must not be matched, and a word that
may or may not be matched but only if the first word is
indeed matched.
- Intelligent results Although wildcard- and regular
expression based searching provide for very flexible
searching, neither provide an intelligent way to select
results. For example, searching for a specific word would
return all rows that contained that word, and not
distinguish between rows that contain a single match and
those that contained multiple matches (ranking them as
potentially better matches). Similarly, searches for a
specific word would not find rows that did not contain that
word but did contain other related words.
All of these limitations and more are addressed by full-text
searching. When full-text searching is used, MySQL does not
need to look at each row individually, analyzing and processing
each word individually. Rather, an index of the words (in
specified columns) is created by MySQL, and searches can be
made against those words. MySQL can thus quickly and
efficiently determine which words match (which rows contain
them), which don't, how often they match, and so on.
Using Full-Text Searching
In order to perform full-text searches, the columns to be
searched must be indexed and constantly re-indexed as data
changes. MySQL handles all indexing and re-indexing
automatically after table columns have been appropriately
designated.
After indexing, SELECT can be used with Match() and
Against() to actually perform the searches.
Enabling Full-Text Searching Support
Generally, full-text searching is enabled when a table is
created. The CREATE TABLE statement (which will be introduced
in Tutorial 26) accepts a FULLTEXT clause, which is a
comma-delimited list of the columns to be indexed.
The following CREATE statement demonstrates the use of the
FULLTEXT clause:
• Input
CREATE TABLE productnotes
(
note_id
int
NOT NULL AUTO_INCREMENT,
prod_id
char(10) NOT NULL,
note_date datetime
NOT NULL,
note_text
text NULL
,
PRIMARY KEY(note_id),
FULLTEXT(note_text)
) ENGINE=MyISAM;
• Analysis
We'll look at the CREATE TABLE statement in detail in
Tutorial 26. For now, just note that this CREATE TABLE
statement defines table productnotes and lists the columns that
it is to contain. One of those columns is named note_text, and
it is indexed by MySQL for full-text searching as instructed by
the clause FULLTEXT(note_text). Here FULLTEXT indexes a single
column, but multiple columns may be specified if needed.
Once defined, MySQL automatically maintains the index. When
rows are added, updated, or deleted, the index is automatically
updated accordingly.
FULLTEXT may be specified at table creation
time, or later on (in which case all existing data would have
to be immediately indexed).
Tip
Don't Use FULLTEXT When
Importing Data Updating indexes takes timenot a lot of time,
but time nonetheless. If you are importing data into a new
table, you should not enable FULLTEXT indexing at that time.
Rather, first import all of the data, and then modify the table
to define FULLTEXT. This makes for a much faster data import
(and the total time needed to index all data will be less than
the sum of the time needed to index each row
individually).
Performing Full-Text Searches
After indexing, full-text searches are performed using two
functions: Match() to specify the columns to be searched and
Against() to specify the search expression to be used.
Here is a basic example:
• Input
SELECT note_text
FROM productnotes
WHERE Match(note_text) Against('rabbit');

• Output
+-------------------------------------------------------------------+
| note_text |
+-------------------------------------------------------------------|
| Customer complaint: rabbit has been able to detect trap, food |
| apparently less effective now. |
| Quantity varies, sold by the sack load. All guaranteed to be |
| bright and orange, and suitable for use as rabbit bait. |
+-------------------------------------------------------------------+
• Analysis
The SELECT statement retrieves a single row, note_text. For
the WHERE clause, a full-text search is performed.
Match(note_text) instructs MySQL to perform the search against
that named column, and Against('rabbit') specifies the word
rabbit as the search text. As two rows contained the word
rabbit, those two rows were returned.
Note
Use Full Match() Specification The value passed to
Match() must be the same as the one used in the FULLTEXT()
definition. If multiple columns are specified, all of them must
be listed (and in the correct order).
Searches Are Not Case Sensitive Full-text searches are
not case sensitive, unless BINARY mode (not covered in this
tutorial) is used.
The truth is that the search just performed could just as
easily have used a LIKE clause, as seen here:
• Input
SELECT note_text
FROM productnotes
WHERE note_text LIKE '%rabbit%';

• Output
+-------------------------------------------------------------------+
| note_text |
+-------------------------------------------------------------------|
| Quantity varies, sold by the sack load. All guaranteed to be |
| bright and orange, and suitable for use as rabbit bait. |
| Customer complaint: rabbit has been able to detect trap, food |
| apparently less effective now. |
+-------------------------------------------------------------------+
• Analysis
This SELECT retrieves the same two rows, but the order is
different (although that may not always be the case).
Neither of the two SELECT statements contained an ORDER BY
clause. The latter (using LIKE) returns data in no particularly
useful order. But the former (using full-text searching)
returns data ordered by how well the text matched. Both rows
contained the word rabbit, but the row that contained the word
rabbit as the third word ranked higher than the row that
contained it as the twentieth word. This is important. An
important part of full-text searching is the ranking of
results. Rows with a higher rank are returned first (as there
is a higher degree of likelihood that those are the ones you
really wanted).
To demonstrate how ranking works, look at this example:
• Input
SELECT note_text,
Match(note_text)
Against('rabbit') AS rank
FROM productnotes;

• Output
+-------------------------------------------------+-----------------+
| note_text | rank |
+-------------------------------------------------+-----------------+
| Customer complaint: Sticks not individually | 0 |
| wrapped, too easy to mistakenly detonate all | |
| at once. Recommend individual wrapping. | |
| Can shipped full, refills not available. Need | 0 |
| to order new can if refill needed. | |
| Safe is combination locked, combination not | 0 |
| provided with safe. This is rarely a problem | |
| as safes are typically blown up or dropped by | |
| customers. | |
| Quantity varies, sold by the sack load. All | 1.5905543170914 |
| guaranteed to be bright and orange, and | |
| suitable for as rabbit bait. | |
| Included fuses are short and have been known to | 0 |
| detonate too quickly for some customers. Longer | |
| fuses are available (item FU1) and should be | |
| recommended. | |
| Matches not included, recommend purchase of | 0 |
| matches or detonator (item DTNTR). | |
| Please note that no returns will be accepted if | 0 |
| safe opened using explosives. | |
| Multiple customer returns, anvils failing to | 0 |
| drop fast enough or falling backwards on | |
| purchaser. Recommend that customer considers | |
| using heavier anvils. | |
| Item is extremely heavy. Designed for dropping, | 0 |
| not recommended for use with slings, ropes, | |
| pulleys, or tightropes. | |
| Customer complaint: rabbit has been able to | 1.6408053837485 |
| detect trap, food apparently less effective | |
| now. | |
| Shipped unassembled, requires common tools | 0 |
| (including oversized hammer). | |
| Customer complaint: Circular hole in safe floor | 0 |
| can apparently be easily cut with handsaw. | |
| Customer complaint: Not heavy enough to | 0 |
| generate flying stars around head of victim. | |
| If being purchased for dropping, recommend | |
| ANV02 or ANV03 instead. | |
| Call from individual trapped in safe plummeting | 0 |
| to the ground, suggests an escape hatch be | |
| added. Comment forwarded to vendor. | |
+-------------------------------------------------+-----------------+
• Analysis
Here Match() and Against() are used in the SELECT instead of
the WHERE clause. This causes all rows to be returned (as there
is no WHERE clause). Match() and Against() are used to create a
calculated column (with the alias rank) which contains the
ranking value calculated by the full-text search. The ranking
is calculated by MySQL based on the number of words in the row,
the number of unique words, the total number of words in the
entire index, and the number of rows that contain the word. As
you can see, the rows that do not contain the word rabbit have
a rank of 0 (and were therefore not selected by the WHERE
clause in the previous example). The two rows that do contain
the word rabbit each have a rank value, and the one with the
word earlier in the text has a higher rank value than the one
in which the word appeared later.
This helps demonstrate how full-text searching eliminates
rows (those with a rank of 0), and how it sorts results (by
rank in descending order).
Note
Ranking Multiple Search Terms If multiple search terms
are specified, those that contain the most matching words will
be ranked higher than those with less (or just a single
match).
As you can see, full-text searching offers functionality not
available with simple LIKE searches. And as data is indexed,
full-text searches are considerably faster, too.
Using Query Expansion
Query expansion is used to try to widen the range of
returned full-text search results. Consider the following
scenario. You want to find all notes with references to anvils
in them. Only one note contains the word anvils, but you also
want any other rows that may be related to your search, even if
the specific word anvils is not contained within them.
This is a job for query expansion. When query expansion is
used, MySQL makes two passes through the data and indexes to
perform your search:
- First, a basic full-text search is performed to find
all rows that match the search criteria.
- Next, MySQL examines those matched rows and selects all
useful words (we'll explain how MySQL figures out what is
useful and what is not shortly).
- Then, MySQL performs the full-text search again, this
time using not just the original criteria, but also all of
the useful words.
Using query expansion you can therefore find results that
might be relevant, even if they don't contain the exact words
for which you were looking.
Note
MySQL Version 4.1.1 or Later Only Query expansion
functionality was introduced in MySQL 4.1.1, and can therefore
not be used in prior versions.
Here is an example. First, a simple full-text search,
without query expansion:
• Input
SELECT note_text
FROM productnotes
WHERE Match(note_text) Against('anvils');

• Output
+-------------------------------------------------------------------+
| note_text |
+-------------------------------------------------------------------+
| Multiple customer returns, anvils failing to drop fast enough or |
| falling backwards on purchaser. Recommend that customer considers |
| using heavier anvils. |
+-------------------------------------------------------------------+
• Analysis
Only one row contains the word anvils, so only one row is
returned.
Here is the same search, this time using query
expansion:
• Input
SELECT note_text
FROM productnotes
WHERE Match(note_text) Against('anvils' WITH QUERY
EXPANSION);

• Output
+-------------------------------------------------------------------+
| note_text |
+-------------------------------------------------------------------+
| Multiple customer returns, anvils failing to drop fast enough or |
| falling backwards on purchaser. Recommend that customer considers |
| using heavier anvils. |
| Customer complaint: Sticks not individually wrapped, too easy to |
| mistakenly detonate all at once. Recommend individual wrapping. |
| Customer complaint: Not heavy enough to generate flying stars |
| around head of victim. If being purchased for dropping, recommend |
| ANV02 or ANV03 instead. |
| Please note that no returns will be accepted if safe opened using |
| explosives. |
| Customer complaint: rabbit has been able to detect trap, food |
| apparently less effective now. |
| Customer complaint: Circular hole in safe floor can apparently be |
| easily cut with handsaw. |
| Matches not included, recommend purchase of matches or detonator |
| (item DTNTR). |
+-------------------------------------------------------------------+
• Analysis
This time seven rows were returned. The first contains the
word anvils and is thus ranked highest. The second row has
nothing to do with anvils, but as it contains two words that
are also in the first row (customer and recommend) it was
retrieved, too. The third row also contains those same two
words, but they are further into the text and further apart,
and so it was included, but ranked third. And this third row
does indeed refer to anvils (by their product name).
As you can see, query expansion greatly increases the number
of rows returned, but in doing so also increases the number of
returns that you might not actually want.
Tip
The More Rows the Better The
more rows in your table (and the more text within those rows),
the better the results returned when using query
expansion.
Boolean Text Searches
MySQL supports an additional form of full-text searching
called boolean mode. In Boolean mode you may provide specifics
as to
- Words to be matched
- Words to be excluded (if a row contained this word it
would not be returned, even though other specified words
were matched)
- Ranking hints (specifying which words are more
important than others so they can be ranked higher)
- Expression grouping
- And more
Tip
Useable Even Without a
FULLTEXT Index Boolean mode differs from the full-text search
syntax used thus far in that it may be used even if no FULLTEXT
index is defined. However, this would be a very slow operation
(and the performance would degrade further as data volume
increased).
To demonstrate what IN BOOLEAN MODE does, here is a simple
example:
• Input
SELECT note_text
FROM productnotes
WHERE Match(note_text) Against('heavy' IN BOOLEAN MODE);

• Output
+-------------------------------------------------------------------+
| note_text |
+-------------------------------------------------------------------+
| Item is extremely heavy. Designed for dropping, not recommended |
| for use with slings, ropes, pulleys, or tightropes. |
| Customer complaint: Not heavy enough to generate flying stars |
| around head of victim. If being purchased for dropping, recommend |
| ANV02 or ANV03 instead. |
+-------------------------------------------------------------------+
• Analysis
This full-text search retrieves all rows containing the word
heavy (there are two of them). The keywords IN BOOLEAN MODE are
specified, but no boolean operators are actually specified and
so the results are just as if boolean mode had not been
specified.
Note
IN BOOLEAN MODE Behaves Differently Although the results
in this example are the same as they would be without IN
BOOLEAN MODE, there is an important difference in behavior
(even if it did not manifest itself in this particular
example). I'll point these out in the use notes later in this
tutorial.
To match the rows that contain heavy but not any word
beginning with rope, the following can be used:
• Input
SELECT note_text
FROM productnotes
WHERE Match(note_text) Against('heavy rope*' IN BOOLEAN
MODE);

• Output
+-------------------------------------------------------------------+
| note_text |
+-------------------------------------------------------------------+
|Item is extremely heavy. Designed for dropping, not recommended for|
|use with slings, ropes, pulleys, or tighttropes. |
| Customer complaint: Not heavy enough to generate flying stars |
| around head of victim. If being purchased for dropping, recommend |
| ANV02 or ANV03 instead. |
+-------------------------------------------------------------------+
• Analysis
This time only two rows is returned. Again, the word heavy
is matched, but this time rope* instructs MySQL to explicitly
exclude any row that contains rope* (any word beginning with
rope, including ropes, which is why one of the rows was
excluded).
Note
Code Change Needed in MySQL 4.x If you are using MySQL
4.x, the previous example might not have returned any rows at
all. This is the result of a bug in the processing of the *
operator. To use this example in MySQL 4.x, use -ropes instead
of -rope* (exclude ropes instead of any word beginning with
rope).
You have now seen two full-text search boolean operators: -
excludes a word and * is the truncation operator (think of it
as a wildcard used at the end of a word).
Table 18.1 lists all of the supported boolean operators.
Table 18.1. Full-Text Boolean
Operators
|
Privilege
|
Description
|
|
+
|
Include, word must be
present.
|
|
-
|
Exclude, word must not
be present.
|
|
>
|
Include, and increase
ranking value.
|
|
<
|
Include, and decrease
ranking value.
|
|
()
|
Group words into sub
expressions (allowing them to be included,
excluded, ranked, and so forth as a
group).
|
|
~
|
Negate a word's ranking
value.
|
|
*
|
Wildcard at end of
word.
|
|
""
|
Defines a phrase (as
opposed to a list of individual words, the
entire phrase is matched for inclusion or
exclusion).
|
Here are some more examples to demonstrate the use of some
of these operators:
• Input
SELECT note_text
FROM productnotes
WHERE Match(note_text) Against('+rabbit +bait"' IN BOOLEAN
MODE);
• Analysis
This search matches rows that contain both the words rabbit
and bait.
• Input
SELECT note_text
FROM productnotes
WHERE Match(note_text) Against('rabbit bait' IN BOOLEAN
MODE);

• Analysis
Without operators specified, this search matches rows that
contain at least one of rabbit or bait.
• Input
SELECT note_text
FROM productnotes
WHERE Match(note_text) Against('"rabbit bait"' IN BOOLEAN
MODE);

• Analysis
This search matches the phrase rabbit bait instead of the
two words rabbit and bait.
• Input
SELECT note_text
FROM productnotes
WHERE Match(note_text) Against('>rabbit <carrot' IN
BOOLEAN MODE);

• Analysis
Match both rabbit and carrot, increasing the rank of the
former and decreasing the rank of the latter.
• Input
SELECT note_text
FROM productnotes
WHERE Match(note_text) Against('+safe +(<combination)' IN
BOOLEAN
MODE);

• Analysis
This search matches the words safe and combination, lowering
the ranking of the latter.
Note
Ranked, but Not Sorted In boolean mode, rows will not be
returned sorted descending by ranking score.
Full-Text Search Usage Notes
Before finishing this tutorial, here are some important
notes pertaining to the use of full-text searching:
When indexing full-text data, short words are ignored and
are excluded from the index. Short words are defined as those
having three or fewer characters (this number can be changed if
needed).
- MySQL comes with a built-in list of stop words, words
that are always ignored when indexing full-text data. This
list can be overridden if needed. (Refer to the MySQL
documentation to learn how to accomplish this.)
- Many words appear so frequently that searching on them
would be useless (too many results would be returned). As
such, MySQL honors a 50% rule if a word appears in 50% or
more rows, it is treated as a stop word and is effectively
ignored. (The 50% rule is not used for IN BOOLEAN
MODE).
- Full-text searching never returns any results if there
are fewer than three rows in a table (because every word is
always in at least 50% of the rows).
- Single quote characters in words are ignored. For
example, don't is indexed as don't.
- Languages that don't have word delimiters (including
Japanese and Chinese) will not return full-text results
properly.
As already noted, full-text searching is only supported in
the MyISAM database engine.
Note
No Proximity Operators One feature supported by many
full-text search engines is proximity searching, the ability to
search for words that are near each other (in the same
sentence, in the same paragraph, or no more than a specific
number of words apart, and so on). Proximity operators are not
yet supported by MySQL full-text searching, although this is
planned for a future release.
Summary
In this tutorial, you learned why full-text searching is
used, and how to use the MySQL Match() and Against() functions
to perform these searches. You also learned about query
expansion as a way to increase the chances of finding related
matches, and how to use boolean mode for more granular lookup
control.
[Next]
|