How to run fuzzy queries on Chinese text with Elasticsearch

Published 2021-12-21 · 663 views


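  • First, Elasticsearch broadly divides queries into two kinds of clauses: leaf query clauses and compound query clauses. The former restrict a query to a specific column (what ES calls a field); the latter combine multiple queries, leaf clauses included.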
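  • Using fuzzy, one of the term-level queries (which are leaf query clauses), I ran into a serious problem: as soon as the keyword was longer than three Chinese characters, nothing came back. The cause is that fuzzy is built around the Damerau-Levenshtein edit distance between individual words, which suits English words but not a run of Chinese characters. After a long search, I found the fix under Proximity Search in Query String, one of the full text queries: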
Source: https://www.elastic.co/guide/en/elasticsearch/reference/6.4/query-dsl-query-string-query.html#_proximity_searches
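  • Worth noting: Query String, under full text queries, also has a syntax that is fully equivalent to fuzzy. It is written fox quick~5 (the ~ follows a bare term), not "fox quick"~5 (the ~ follows a quoted phrase). Because it behaves exactly like fuzzy, it does nothing to solve the Chinese problem.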
  • In short, Proximity Search works on the words within a phrase rather than the letters within a word, so it solves the problem:
        {
            "query": {
                "query_string": {
                    "query": "\""+keyword+"\"~3",
                    "default_field": field
                }
            },
            "size": 10,
            "from": 0,
            "sort": []
        }
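
The request body above can be wrapped in a small helper. This is a minimal Python sketch, not the author's code: the function name build_proximity_query, the field name "title", and the escaping step are assumptions added for illustration. Escaping backslashes and double quotes is there so a keyword that contains a quote cannot end the quoted phrase early.

```python
import json


def build_proximity_query(keyword, field, slop=3, size=10, offset=0):
    """Build a query_string request body that matches `keyword` as a phrase with slop."""
    # Assumption: escape backslashes and double quotes so the keyword
    # stays inside the quoted phrase of the query_string syntax.
    escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "query": {
            "query_string": {
                # "<phrase>"~N is the proximity (phrase slop) form,
                # as opposed to term~N, which is the fuzzy form.
                "query": f'"{escaped}"~{slop}',
                "default_field": field,
            }
        },
        "size": size,
        "from": offset,
        "sort": [],
    }


# "title" is a hypothetical field name.
body = build_proximity_query("模糊查询", "title")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The resulting dict can be serialized and sent as the body of a search request; the slop of 3 mirrors the ~3 in the original snippet.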

On the difference between full text queries and term-level queries, the source is: Term level queries | Elasticsearch Guide [6.4] | Elastic

Full text queries analyze the query string before executing. Term-level queries operate on the exact terms stored in the inverted index, without analysis; they normalize terms only for keyword fields that have a normalizer property.

The full text queries are usually used for running queries on full text fields like the body of an email. Term level queries are usually used for structured data like numbers, dates, and enums, rather than full text fields. 
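
This distinction offers one plausible reading of why fuzzy stopped matching longer Chinese keywords. The toy model below is only a sketch under stated assumptions, not Elasticsearch's actual implementation: it assumes the field was indexed with one term per Chinese character, that the whole unanalyzed keyword is compared against each indexed term, and that at most two edits are allowed. The names edit_distance and fuzzy_matches are hypothetical.

```python
def edit_distance(a, b):
    """Edit distance with adjacent transpositions (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swapping two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def fuzzy_matches(keyword, indexed_text, max_edits=2):
    """Toy model: compare the whole keyword against each single-character term."""
    terms = set(indexed_text)  # assumption: one indexed term per character
    return any(edit_distance(keyword, term) <= max_edits for term in terms)


print(fuzzy_matches("模糊查", "中文模糊查询"))    # True: 3 characters, 2 deletions reach "模"
print(fuzzy_matches("模糊查询", "中文模糊查询"))  # False: 4 characters need at least 3 edits
```

Under these assumptions, a keyword of four or more characters is always at least three edits away from any single-character term, which is consistent with the empty results described earlier. A phrase query with slop compares positions of whole terms instead, so keyword length is not a limit.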

