MySQL全文检索解析器ngram自版本5.7.6可用。
ngram_token_size变量
ngram_token_size
默认为2,表示按最小两个字来分词,比如"十六",单个词如"六"是检索不出来的,需要将ngram_token_size
设置为1。
ngram_token_size
变量的两种设置方式:
1.启动mysqld
命令时指定
mysqld --ngram_token_size=2
2.修改MySQL配置文件
[mysqld]
ngram_token_size=2
改完之后重启MySQL Server才能生效。
FULLTEXT索引 ngram parser
创建一个带全文检索索引的表
CREATE TABLE `content` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`data_id` int DEFAULT '0',
`body` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
PRIMARY KEY (`id`),
KEY `data_id` (`data_id`) USING BTREE,
FULLTEXT KEY `body` (`body`) WITH PARSER `ngram`
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
建表时如果没创建全文索引,还可以新增
ALTER TABLE content ADD FULLTEXT INDEX body (body) WITH PARSER ngram;
# 等价于
CREATE FULLTEXT INDEX body ON body (body) WITH PARSER ngram;
如果所使用的MySQL版本不支持ngram,则会报错
1128 - Function 'ngram' is not defined
这个时候只能升级到较高版本了(5.7.6+)。
查询
1. 自然语言模式检索
包含论语
或孟子
的查询
select *, MATCH (body) AGAINST ('论语 孟子' IN NATURAL LANGUAGE MODE) as score from content where MATCH (body) AGAINST ('论语 孟子' IN NATURAL LANGUAGE MODE) order by score desc
score
是相似度打分。
2. Boolean模式检索
包含论语
, 同时包含孟子
的查询
select *, MATCH (body) AGAINST ('+论语 +孟子' IN BOOLEAN MODE) as score from content where MATCH (body) AGAINST ('+论语 +孟子' IN BOOLEAN MODE) order by score desc
包含论语
,但不包含孟子
的查询
select *, MATCH (body) AGAINST ('+论语 -孟子' IN BOOLEAN MODE) as score from content where MATCH (body) AGAINST ('+论语 -孟子' IN BOOLEAN MODE) order by score desc
包含论语
,或包含孟子
的查询
select *, MATCH (body) AGAINST ('论语 孟子' IN BOOLEAN MODE) as score from content where MATCH (body) AGAINST ('论语 孟子' IN BOOLEAN MODE) order by score desc