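This series presents 23 very useful query examples for Elasticsearch. Because of its length, the series is split into six articles; this is the second. You are welcome to follow the big-data technology blog's WeChat public account: iteblog_hadoop.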
23 Very Useful Elasticsearch Query Examples (2)
23 Very Useful Elasticsearch Query Examples (3)
23 Very Useful Elasticsearch Query Examples (4)
23 Very Useful Elasticsearch Query Examples (5)
23 Very Useful Elasticsearch Query Examples (6)
Fuzzy Queries
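Fuzzy matching can be enabled on Match and Multi-Match queries to tolerate spelling mistakes. The degree of fuzziness is measured as the Levenshtein distance between the search term and the original word. It is used as follows: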
{
  "query": {
    "multi_match": {
      "query": "comprihensiv guide",
      "fields": ["title", "summary"],
      "fuzziness": "AUTO"
    }
  },
  "_source": ["title", "summary", "publish_date"],
  "size": 1
}

[Response]
{
  "took": 208,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5961596,
    "hits": [
      {
        "_index": "iteblog_book_index",
        "_type": "book",
        "_id": "4",
        "_score": 0.5961596,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      }
    ]
  }
}
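Note: in the query above we set fuzziness to AUTO, which is equivalent to a value of 2 when the term is longer than 5 characters. However, 80% of human spelling mistakes have an edit distance of 1, so setting fuzziness to 1 may improve your search performance. For details, see the relevant chapter of Elasticsearch: The Definitive Guide.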
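To make the edit-distance idea concrete, here is a minimal Python sketch that computes the Levenshtein distance for the misspelled term in the query above and compares it with what AUTO would allow. The function names levenshtein and auto_fuzziness are illustrative only. The thresholds for short terms (no edits up to 2 characters, one edit for 3 to 5 characters) are the documented Elasticsearch defaults rather than something stated in this article, and the sketch uses plain Levenshtein distance, whereas Elasticsearch by default also counts a swap of two adjacent characters as a single edit.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance using insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term: str) -> int:
    """Maximum edits allowed by AUTO, assuming the default length thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


typed, indexed = "comprihensiv", "comprehensive"
distance = levenshtein(typed, indexed)

print(distance)                           # 2
print(auto_fuzziness(typed))              # 2
print(distance <= auto_fuzziness(typed))  # True: AUTO accepts this term
print(distance <= 1)                      # False: fuzziness 1 would reject it
```

The misspelling needs two edits (one substitution and one insertion), so with fuzziness set to 1 the term comprihensiv on its own would no longer match comprehensive. That is the trade-off behind the performance advice above.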
Wildcard Query
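A wildcard query lets us match against a pattern instead of specifying a complete term. ? matches any single character; * matches zero or more characters. For example, to find all records in which an author's name starts with the letter t, we can use: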
{
  "query": {
    "wildcard": {
      "authors": "t*"
    }
  },
  "_source": ["title", "authors"],
  "highlight": {
    "fields": {
      "authors": {}
    }
  }
}

[Response]
{
  "took": 37,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "iteblog_book_index",
        "_type": "book",
        "_id": "1",
        "_score": 1,
        "_source": {
          "authors": ["clinton gormley", "zachary tong"],
          "title": "Elasticsearch: The Definitive Guide"
        },
        "highlight": {
          "authors": ["zachary <em>tong</em>"]
        }
      },
      {
        "_index": "iteblog_book_index",
        "_type": "book",
        "_id": "2",
        "_score": 1,
        "_source": {
          "authors": ["grant ingersoll", "thomas morton", "drew farris"],
          "title": "Taming Text: How to Find, Organize, and Manipulate It"
        },
        "highlight": {
          "authors": ["<em>thomas</em> morton"]
        }
      },
      {
        "_index": "iteblog_book_index",
        "_type": "book",
        "_id": "4",
        "_score": 1,
        "_source": {
          "authors": ["trey grainger", "timothy potter"],
          "title": "Solr in Action"
        },
        "highlight": {
          "authors": ["<em>trey</em> grainger", "<em>timothy</em> potter"]
        }
      }
    ]
  }
}
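The highlights in the response show that the pattern is applied to individual terms, not to the whole author string: zachary tong matches only because of the term tong. The following Python sketch imitates that behaviour outside Elasticsearch. It is an approximation under stated assumptions: the helper names wildcard_to_regex and matching_terms are made up for illustration, and splitting on whitespace after lowercasing only stands in for whatever analyzer the authors field really uses.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate ? and * into a regular expression; everything else is literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matching_terms(pattern: str, values: list) -> list:
    """Return the terms of a multi-valued field that match the whole pattern."""
    regex = wildcard_to_regex(pattern)
    # Assumption: lowercase + whitespace split approximates the field's analyzer.
    terms = [term for value in values for term in value.lower().split()]
    return [term for term in terms if regex.fullmatch(term)]


# Author values copied from the response above.
books = {
    "1": ["clinton gormley", "zachary tong"],
    "2": ["grant ingersoll", "thomas morton", "drew farris"],
    "4": ["trey grainger", "timothy potter"],
}

for book_id, authors in books.items():
    print(book_id, matching_terms("t*", authors))
# 1 ['tong']
# 2 ['thomas']
# 4 ['trey', 'timothy']
```

The matched terms line up with the highlighted fragments in the response, which is a quick way to check that a wildcard pattern does what we expect before running it against a real index.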
Regexp Query
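Elasticsearch also supports regular expression queries, which allow more complex patterns than wildcard queries. For example, to find records in which an author's name starts with the letter t, continues with any number of characters in the range a-z, and ends with the letter y, we can query as follows: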
{
  "query": {
    "regexp": {
      "authors": "t[a-z]*y"
    }
  },
  "_source": ["title", "authors"],
  "highlight": {
    "fields": {
      "authors": {}
    }
  }
}

[Response]
{
  "took": 25,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "iteblog_book_index",
        "_type": "book",
        "_id": "4",
        "_score": 1,
        "_source": {
          "authors": ["trey grainger", "timothy potter"],
          "title": "Solr in Action"
        },
        "highlight": {
          "authors": ["<em>trey</em> grainger", "<em>timothy</em> potter"]
        }
      }
    ]
  }
}
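A regexp query has to match an entire term, which is why only the book with the authors trey and timothy is returned, while thomas and tong are not. The Python sketch below reproduces this with re.fullmatch. It is only an approximation: Python's regular expression syntax is not identical to the Lucene syntax Elasticsearch uses (the two agree for this simple pattern), the helper name regexp_terms is hypothetical, and lowercasing plus whitespace splitting is an assumed stand-in for the field's analyzer.

```python
import re


def regexp_terms(pattern: str, values: list) -> list:
    """Return the terms that the pattern matches completely, start to end."""
    regex = re.compile(pattern)
    # Assumption: lowercase + whitespace split approximates the field's analyzer.
    terms = [term for value in values for term in value.lower().split()]
    return [term for term in terms if regex.fullmatch(term)]


# Author values copied from the wildcard and regexp responses in this article.
books = {
    "1": ["clinton gormley", "zachary tong"],
    "2": ["grant ingersoll", "thomas morton", "drew farris"],
    "4": ["trey grainger", "timothy potter"],
}

for book_id, authors in books.items():
    print(book_id, regexp_terms("t[a-z]*y", authors))
# 1 []
# 2 []
# 4 ['trey', 'timothy']
```

Only book 4 has terms that both start with t and end with y, which agrees with the single hit in the response.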
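Because of its length, this series is split into six parts. Follow the 过往记忆 big-data technology blog to keep up with big-data articles; WeChat public account: iteblog_hadoop.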