23 Very Useful Elasticsearch Query Examples (2)

 株野 2017-11-03

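  This series presents 23 very useful Elasticsearch queries. Because of its length, the series is split into six parts, and this is the second. To follow the big-data technology blog, see the WeChat public account: iteblog_hadoop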

Fuzzy Queries

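  Fuzzy matching can be enabled on Match and Multi-Match queries to tolerate spelling errors. The degree of fuzziness is measured as the Levenshtein distance from the original term. For example: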

{
    "query": {
        "multi_match" : {
            "query" : "comprihensiv guide",
            "fields": ["title", "summary"],
            "fuzziness": "AUTO"
        }
    },
    "_source": ["title", "summary", "publish_date"],
    "size": 1
}
 
[Results]
 
{
    "took": 208,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0.5961596,
        "hits": [
            {
                "_index": "iteblog_book_index",
                "_type": "book",
                "_id": "4",
                "_score": 0.5961596,
                "_source": {
                    "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
                    "title": "Solr in Action",
                    "publish_date": "2014-04-05"
                }
            }
        ]
    }
}
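
  The request body above can also be built and sent from code. The following is a minimal Python sketch, not the author's method: the host `localhost:9200` is an assumption, the index name is taken from the `_index` value in the results, and the helper names `build_fuzzy_search` and `make_request` are hypothetical.

```python
import json
from urllib.request import Request

# Assumed for illustration: a local node, and the index name seen in the results.
ES_URL = "http://localhost:9200/iteblog_book_index/_search"


def build_fuzzy_search(text, fields, fuzziness="AUTO", size=1):
    """Return the multi_match request body shown above as a Python dict."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
                "fuzziness": fuzziness,
            }
        },
        "_source": ["title", "summary", "publish_date"],
        "size": size,
    }


def make_request(body, url=ES_URL):
    """Wrap the body in an HTTP POST request object; nothing is sent here."""
    data = json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=data, headers=headers, method="POST")


body = build_fuzzy_search("comprihensiv guide", ["title", "summary"])
request = make_request(body)
print(request.get_method(), request.full_url)
print(json.dumps(body, indent=4))
```

  To run the search against a live cluster, pass `request` to `urllib.request.urlopen` and decode the JSON response.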

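  Note: the query above sets fuzziness to AUTO, which is equivalent to a value of 2 when the term is longer than 5 characters. However, about 80% of human misspellings have an edit distance of 1, so setting fuzziness to 1 may improve search performance. See the relevant chapter of Elasticsearch: The Definitive Guide for details.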
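
  To make the edit-distance idea concrete, here is a small Python sketch of plain Levenshtein distance together with the length rule that AUTO applies by default (exact match for terms of 0 to 2 characters, one edit for 3 to 5, two edits above 5). It only illustrates the idea and is not how Elasticsearch implements fuzzy matching internally; the function names are hypothetical.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Edits allowed by AUTO under its default length thresholds (3 and 6)."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


for typed, indexed in [("comprihensiv", "comprehensive"), ("guide", "guide")]:
    allowed = auto_fuzziness(typed)
    distance = levenshtein(typed, indexed)
    print(typed, indexed, distance, allowed, distance <= allowed)
```

  The misspelled "comprihensiv" is two edits away from "comprehensive" (one substitution and one insertion), so it matches under AUTO but would not match with fuzziness set to 1.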

Wildcard Query

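  A wildcard query lets us specify a pattern to match instead of the complete term. ? matches any single character; * matches zero or more characters. For example, to find all records in which an author name contains a word starting with the letter t, we can use: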

{
    "query": {
        "wildcard" : {
            "authors" : "t*"
        }
    },
    "_source": ["title", "authors"],
    "highlight": {
        "fields" : {
            "authors" : {}
        }
    }
}
 
[Results]
 
{
    "took": 37,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "hits": {
        "total": 3,
        "max_score": 1,
        "hits": [
            {
                "_index": "iteblog_book_index",
                "_type": "book",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "authors": [
                        "clinton gormley",
                        "zachary tong"
                    ],
                    "title": "Elasticsearch: The Definitive Guide"
                },
                "highlight": {
                    "authors": [
                        "zachary <em>tong</em>"
                    ]
                }
            },
            {
                "_index": "iteblog_book_index",
                "_type": "book",
                "_id": "2",
                "_score": 1,
                "_source": {
                    "authors": [
                        "grant ingersoll",
                        "thomas morton",
                        "drew farris"
                    ],
                    "title": "Taming Text: How to Find, Organize, and Manipulate It"
                },
                "highlight": {
                    "authors": [
                        "<em>thomas</em> morton"
                    ]
                }
            },
            {
                "_index": "iteblog_book_index",
                "_type": "book",
                "_id": "4",
                "_score": 1,
                "_source": {
                    "authors": [
                        "trey grainger",
                        "timothy potter"
                    ],
                    "title": "Solr in Action"
                },
                "highlight": {
                    "authors": [
                        "<em>trey</em> grainger",
                        "<em>timothy</em> potter"
                    ]
                }
            }
        ]
    }
}
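
  The highlights above show that the pattern is matched against individual terms, which is why "zachary tong" is a hit. The following Python sketch imitates that behaviour with the standard `fnmatch` module. It rests on two assumptions: splitting on whitespace stands in for the field's analyzer, and `fnmatch` is only an approximation of the wildcard syntax (it additionally understands `[seq]` character sets, which the wildcard query described here does not). The helper name `matching_terms` is hypothetical.

```python
from fnmatch import fnmatchcase

# Author lists copied from the results above.
authors_by_book = {
    "Elasticsearch: The Definitive Guide": ["clinton gormley", "zachary tong"],
    "Taming Text: How to Find, Organize, and Manipulate It": [
        "grant ingersoll",
        "thomas morton",
        "drew farris",
    ],
    "Solr in Action": ["trey grainger", "timothy potter"],
}


def matching_terms(authors, pattern):
    """Return the whitespace-separated terms that match the wildcard pattern."""
    terms = [term for name in authors for term in name.split()]
    return [term for term in terms if fnmatchcase(term, pattern)]


for title, authors in authors_by_book.items():
    print(title, "->", matching_terms(authors, "t*"))
```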

Regexp Query

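  Elasticsearch also supports regular-expression queries, which allow more complex patterns than wildcard queries. For example, to find records with an author term that starts with t, continues with any number of characters in a-z, and ends with y, we can query as follows: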

{
    "query": {
        "regexp" : {
            "authors" : "t[a-z]*y"
        }
    },
    "_source": ["title", "authors"],
    "highlight": {
        "fields" : {
            "authors" : {}
        }
    }
}
 
[Results]
 
{
    "took": 25,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
            {
                "_index": "iteblog_book_index",
                "_type": "book",
                "_id": "4",
                "_score": 1,
                "_source": {
                    "authors": [
                        "trey grainger",
                        "timothy potter"
                    ],
                    "title": "Solr in Action"
                },
                "highlight": {
                    "authors": [
                        "<em>trey</em> grainger",
                        "<em>timothy</em> potter"
                    ]
                }
            }
        ]
    }
}
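
  The same pattern can be tried locally with Python's `re` module. This is only a sketch under stated assumptions: whitespace splitting stands in for the analyzer, and Python's regex dialect differs from the Lucene one used by Elasticsearch, although `t[a-z]*y` means the same in both. Lucene patterns must match the whole term, so `fullmatch` is used rather than `search`. The helper name `anchored_matches` and the term "gritty" are made up for illustration.

```python
import re

# Author names copied from the results shown in this article.
authors = [
    "clinton gormley",
    "zachary tong",
    "thomas morton",
    "trey grainger",
    "timothy potter",
]
pattern = re.compile(r"t[a-z]*y")


def anchored_matches(names, regex):
    """Return the whitespace-separated terms that the regex matches in full."""
    terms = [term for name in names for term in name.split()]
    return [term for term in terms if regex.fullmatch(term)]


print(anchored_matches(authors, pattern))

# "gritty" is a hypothetical term: an unanchored search finds "tty" inside it,
# but the whole term does not start with t, so an anchored match fails.
print(bool(pattern.search("gritty")), bool(pattern.fullmatch("gritty")))
```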
