
Learning ElasticSearch: Multi-Condition Compound Queries, Query Validation, and Worked Examples

2023-09-12 18:03:35

Multi-condition compound queries

bool

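Elasticsearch uses the bool query to combine multiple conditions. A bool query supports the following clauses:

  • must: a document must satisfy this condition
  • must_not: a document must not satisfy this condition
  • should: a document should satisfy this condition. should clauses adjust the relevance score of the results. Note that if the bool query has no must or filter clause, a document has to match at least one should clause. If there is a must or filter clause, matching should is optional and should only adjusts the score
  • filter: a document must satisfy this condition, but filter does not contribute to the score. It is used purely for filtering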
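
The default rule for should can be sketched in a few lines of Python. This is only an illustration of the documented default for minimum_should_match, not Elasticsearch code; the helper name default_minimum_should_match and the sample dictionaries are made up for this example.

```python
def default_minimum_should_match(bool_query: dict) -> int:
    """Return how many should clauses a document has to match by default.

    Hypothetical helper: it mirrors the default rule only and ignores an
    explicit "minimum_should_match" setting.
    """
    has_should = bool(bool_query.get("should"))
    has_must_or_filter = bool(bool_query.get("must")) or bool(bool_query.get("filter"))
    # With only should clauses, at least one of them has to match.
    if has_should and not has_must_or_filter:
        return 1
    # Otherwise should clauses are optional and only influence the score.
    return 0


only_should = {"should": [{"match": {"name": "k"}}]}
with_filter = {
    "should": [{"match": {"name": "k"}}],
    "filter": [{"range": {"num": {"gte": 0, "lte": 10}}}],
}

print(default_minimum_should_match(only_should))  # 1
print(default_minimum_should_match(with_filter))  # 0
```

If a should clause has to be mandatory even alongside must or filter, the bool query accepts an explicit minimum_should_match value.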

The following example shows how to use it:

GET class_1/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "name": "apple"
        }}
      ],
      "must_not": [
        {"term": {
          "num": {
            "value": "5"
          }
        }}
      ],
      "should": [
        {"match": {
          "name": "k"
        }}
      ],
      "filter": [
        {"range": {
          "num": {
            "gte": 0,
            "lte": 10
          }
        }}
      ]
    }
  }
}

Response:

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.752627,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "b8fcCoYB090miyjed7YE",
        "_score" : 0.752627,
        "_source" : {
          "name" : "I eat apple so haochi1~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "ccfcCoYB090miyjed7YE",
        "_score" : 0.752627,
        "_source" : {
          "name" : "I eat apple so haochi3~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "cMfcCoYB090miyjed7YE",
        "_score" : 0.7389809,
        "_source" : {
          "name" : "I eat apple so zhen haochi2~",
          "num" : 1
        }
      }
    ]
  }
}

constant_score

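The constant_score query assigns a fixed score, set through boost, to every matching document. It is typically used in place of a bool query that contains only a filter clause.

Here is how to use it: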

GET class_1/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "num": 6
        }
      },
      "boost": 1.2
    }
  }
}

Response:

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.2,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "h2Fg-4UBECmbBdQA6VLg",
        "_score" : 1.2,
        "_source" : {
          "name" : "b",
          "num" : 6
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.2,
        "_source" : {
          "name" : "l",
          "num" : 6
        }
      }
    ]
  }
}
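
As a rough sketch, the two request bodies can be built side by side in Python to see how they relate. The helper names constant_score_body and filter_only_bool_body are invented for this example, and the bodies are plain dictionaries that still have to be sent to the _search endpoint by whatever client you use.

```python
import json


def constant_score_body(filter_clause: dict, boost: float = 1.0) -> dict:
    """Wrap a filter so every matching document gets the same score (boost)."""
    return {"query": {"constant_score": {"filter": filter_clause, "boost": boost}}}


def filter_only_bool_body(filter_clause: dict) -> dict:
    """The bool query that constant_score typically replaces."""
    return {"query": {"bool": {"filter": [filter_clause]}}}


term = {"term": {"num": 6}}

print(json.dumps(constant_score_body(term, boost=1.2), indent=2))
print(json.dumps(filter_only_bool_body(term), indent=2))
```

Both bodies select the same documents. The difference is the score each hit carries: the constant_score form returns the boost value, as in the response above.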

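Query validation & analysis

Validation

Elasticsearch provides the /_validate/query endpoint to check whether a query is valid. Note that it validates the query itself without executing it.

Example: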

GET class_1/_validate/query?explain
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "name": "apple"
        }}
      ]
    }
  }
}

Normal response:

{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "valid" : true,
  "explanations" : [
    {
      "index" : "class_1",
      "valid" : true,
      "explanation" : "+name:apple"
    }
  ]
}

Change the name field to name1 and run the request again:

{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "valid" : true,
  "explanations" : [
    {
      "index" : "class_1",
      "valid" : true,
      "explanation" : """+MatchNoDocsQuery("unmapped fields [name1]")"""
    }
  ]
}

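The query is still reported as valid, but the explanation flags the problem: because name1 is not a mapped field, the query was rewritten to MatchNoDocsQuery and can never match a document.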
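
Since valid stays true in this case, a check on valid alone would miss the problem. A minimal Python sketch of a stricter check is shown below, assuming the response has already been parsed into a dictionary; the function name unmatched_explanations is hypothetical, and the sample data is copied from the response above.

```python
def unmatched_explanations(validate_response: dict) -> list:
    """Return explanations that are valid but rewritten to match no documents."""
    return [
        item
        for item in validate_response.get("explanations", [])
        if item.get("valid") and "MatchNoDocsQuery" in item.get("explanation", "")
    ]


# Parsed copy of the response shown above.
response = {
    "_shards": {"total": 1, "successful": 1, "failed": 0},
    "valid": True,
    "explanations": [
        {
            "index": "class_1",
            "valid": True,
            "explanation": '+MatchNoDocsQuery("unmapped fields [name1]")',
        }
    ],
}

for item in unmatched_explanations(response):
    print(item["index"], "->", item["explanation"])
```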

Analysis

Adding the explain parameter, /_validate/query?explain, shows how Elasticsearch interprets a query.

Example:

GET class_1/_validate/query?explain
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "name": "apple so"
        }}
      ]
    }
  }
}

Response:

{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "valid" : true,
  "explanations" : [
    {
      "index" : "class_1",
      "valid" : true,
      "explanation" : "+(name:apple name:so)"
    }
  ]
}

As "explanation" : "+(name:apple name:so)" shows, the query text apple so was tokenized into two terms, name:apple and name:so.
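
The shape of that explanation can be imitated with a small Python sketch. This is a deliberate simplification: lowercasing and splitting on whitespace stands in for the field's real analyzer, and the function name explain_match is invented here. It only reproduces the two explanation strings shown in this section.

```python
def explain_match(field: str, text: str) -> str:
    """Imitate the explanation string for a match query inside a must clause.

    Assumption: a lowercase + whitespace split approximates the analyzer.
    """
    tokens = text.lower().split()
    clauses = " ".join(f"{field}:{token}" for token in tokens)
    # The leading "+" marks a required (must) clause; several terms are
    # grouped in parentheses and combined as alternatives.
    return f"+({clauses})" if len(tokens) > 1 else f"+{clauses}"


print(explain_match("name", "apple so"))  # +(name:apple name:so)
print(explain_match("name", "apple"))     # +name:apple
```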

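Sorting

Default sorting

In the earlier examples, results are sorted by _score in descending order by default, so the best matches come first. Computing _score, however, costs query performance.

When _score is not needed, build the query with filter or constant_score instead.

filter example: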

GET class_1/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {
          "num": 1
        }}
      ]
    }
  }
}

Response:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "b8fcCoYB090miyjed7YE",
        "_score" : 0.0,
        "_source" : {
          "name" : "I eat apple so haochi1~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "ccfcCoYB090miyjed7YE",
        "_score" : 0.0,
        "_source" : {
          "name" : "I eat apple so haochi3~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "cMfcCoYB090miyjed7YE",
        "_score" : 0.0,
        "_source" : {
          "name" : "I eat apple so zhen haochi2~",
          "num" : 1
        }
      }
    ]
  }
}

In the results, every _score is 0.0, which shows that no scoring was performed.

constant_score example:

GET class_1/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "num": 1
        }
      },
      "boost": 1.2
    }
  }
}

Response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.2,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "b8fcCoYB090miyjed7YE",
        "_score" : 1.2,
        "_source" : {
          "name" : "I eat apple so haochi1~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "ccfcCoYB090miyjed7YE",
        "_score" : 1.2,
        "_source" : {
          "name" : "I eat apple so haochi3~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "cMfcCoYB090miyjed7YE",
        "_score" : 1.2,
        "_source" : {
          "name" : "I eat apple so zhen haochi2~",
          "num" : 1
        }
      }
    ]
  }
}

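As you can see, every returned score is the value specified by boost.

Custom sorting

Custom sorting covers most scenarios. Elasticsearch uses the sort parameter to define the order. A field sort is ascending by default; descending order is requested with order set to desc:

  • Ascending
{"sort":["num"]}
  • Descending (desc means descending)
{"sort":[{"num":{"order":"desc"}}]}

tips

  • Elasticsearch relies on doc values, a column-oriented storage structure, to sort on fields
  • text fields do not have doc values, so you cannot sort on a text field directly
  • Sorting on a text field can be enabled by setting fielddata=true in the field mapping, but this is not recommended: it is extremely expensive at query time and rarely produces the ordering you actually want

Single-field sorting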

GET class_1/_search
{
    "sort": [
        "num"
    ]
}

Response:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 11,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "b8fcCoYB090miyjed7YE",
        "_score" : null,
        "_source" : {
          "name" : "I eat apple so haochi1~",
          "num" : 1
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "ccfcCoYB090miyjed7YE",
        "_score" : null,
        "_source" : {
          "name" : "I eat apple so haochi3~",
          "num" : 1
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "cMfcCoYB090miyjed7YE",
        "_score" : null,
        "_source" : {
          "name" : "I eat apple so zhen haochi2~",
          "num" : 1
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "h2Fg-4UBECmbBdQA6VLg",
        "_score" : null,
        "_source" : {
          "name" : "b",
          "num" : 6
        },
        "sort" : [
          6
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "l",
          "num" : 6
        },
        "sort" : [
          6
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "num" : 9,
          "name" : "e",
          "age" : 9,
          "desc" : [
            "hhhh"
          ]
        },
        "sort" : [
          9
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "name" : "f",
          "age" : 10,
          "num" : 10
        },
        "sort" : [
          10
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "RWlfBIUBDuA8yW5cu9wu",
        "_score" : null,
        "_source" : {
          "name" : "一年級",
          "num" : 20
        },
        "sort" : [
          20
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "iGFt-4UBECmbBdQAnVJe",
        "_score" : null,
        "_source" : {
          "name" : "g",
          "age" : 8
        },
        "sort" : [
          9223372036854775807
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "iWFt-4UBECmbBdQAnVJg",
        "_score" : null,
        "_source" : {
          "name" : "h",
          "age" : 9
        },
        "sort" : [
          9223372036854775807
        ]
      }
    ]
  }
}

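The results are sorted by num in ascending order, the default. Documents without a num field come last, with the sort value 9223372036854775807.

Now the descending order: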

GET class_1/_search
{
    "sort": [
        {"num": {"order":"desc"}}
    ]
}

Response:

{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 11,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "RWlfBIUBDuA8yW5cu9wu",
        "_score" : null,
        "_source" : {
          "name" : "一年級",
          "num" : 20
        },
        "sort" : [
          20
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "name" : "f",
          "age" : 10,
          "num" : 10
        },
        "sort" : [
          10
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "num" : 9,
          "name" : "e",
          "age" : 9,
          "desc" : [
            "hhhh"
          ]
        },
        "sort" : [
          9
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "h2Fg-4UBECmbBdQA6VLg",
        "_score" : null,
        "_source" : {
          "name" : "b",
          "num" : 6
        },
        "sort" : [
          6
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "l",
          "num" : 6
        },
        "sort" : [
          6
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "b8fcCoYB090miyjed7YE",
        "_score" : null,
        "_source" : {
          "name" : "I eat apple so haochi1~",
          "num" : 1
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "ccfcCoYB090miyjed7YE",
        "_score" : null,
        "_source" : {
          "name" : "I eat apple so haochi3~",
          "num" : 1
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "cMfcCoYB090miyjed7YE",
        "_score" : null,
        "_source" : {
          "name" : "I eat apple so zhen haochi2~",
          "num" : 1
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "iGFt-4UBECmbBdQAnVJe",
        "_score" : null,
        "_source" : {
          "name" : "g",
          "age" : 8
        },
        "sort" : [
          -9223372036854775808
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "iWFt-4UBECmbBdQAnVJg",
        "_score" : null,
        "_source" : {
          "name" : "h",
          "age" : 9
        },
        "sort" : [
          -9223372036854775808
        ]
      }
    ]
  }
}

The results are now in descending order.
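
In both responses the documents without a num field come last, with a sort value of 9223372036854775807 for ascending order and -9223372036854775808 for descending order. The Python sketch below mimics that behavior locally. The helper names are hypothetical, and the sample documents are a reduced copy of the ones above.

```python
LONG_MAX = 2**63 - 1   # 9223372036854775807
LONG_MIN = -(2**63)    # -9223372036854775808


def sort_value(doc: dict, field: str, order: str = "asc") -> int:
    """Sort key for one document; a missing field gets a sentinel that sorts last."""
    if field in doc:
        return doc[field]
    return LONG_MAX if order == "asc" else LONG_MIN


def sort_docs(docs: list, field: str, order: str = "asc") -> list:
    """Order documents by one field, keeping documents without it at the end."""
    return sorted(
        docs,
        key=lambda doc: sort_value(doc, field, order),
        reverse=(order == "desc"),
    )


docs = [
    {"name": "g", "age": 8},
    {"name": "f", "age": 10, "num": 10},
    {"name": "b", "num": 6},
]

print([d["name"] for d in sort_docs(docs, "num", "asc")])   # ['b', 'f', 'g']
print([d["name"] for d in sort_docs(docs, "num", "desc")])  # ['f', 'b', 'g']
```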

Multi-field sorting

GET class_1/_search
{
    "sort": [
        "num", "age"
    ]
}
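
The request above sorts by num first and uses age only to break ties between documents with the same num. A local Python sketch of that ordering follows. The sample documents and the multi_field_sort helper are made up for illustration, and missing values are placed last with the same sentinel seen in the single-field responses.

```python
LONG_MAX = 2**63 - 1  # sentinel for a missing value in ascending order


def multi_field_sort(docs: list, fields: list) -> list:
    """Sort ascending by each field in turn; later fields only break ties."""
    def key(doc: dict) -> tuple:
        return tuple(doc.get(field, LONG_MAX) for field in fields)
    return sorted(docs, key=key)


# Hypothetical sample documents, not taken from the class_1 index.
docs = [
    {"id": "a", "num": 6, "age": 9},
    {"id": "b", "num": 1, "age": 30},
    {"id": "c", "num": 6, "age": 8},
    {"id": "d", "num": 6},
]

print([d["id"] for d in multi_field_sort(docs, ["num", "age"])])  # ['b', 'c', 'a', 'd']
```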

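scroll pagination

Recall the from+size pagination covered earlier. By default, Elasticsearch limits from+size pagination to 10000 results. Fetching larger volumes of data in bulk with from+size becomes very expensive.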
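
The limit applies to the sum of from and size, not to size alone. A minimal Python check is sketched below, assuming the default window of 10000 (the index.max_result_window setting); the function name page_is_allowed is hypothetical.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_is_allowed(from_: int, size: int, max_window: int = MAX_RESULT_WINDOW) -> bool:
    """Return True if a from+size page stays inside the result window."""
    return from_ + size <= max_window


print(page_is_allowed(9990, 10))  # True: covers results 9990..9999
print(page_is_allowed(9991, 10))  # False: would reach past result 10000
```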

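Many real-world data sets are far larger than that, for example system log data. Elasticsearch provides the /_search/scroll endpoint for scrolling through results. It works like a snapshot of the cluster's data (covering every shard) taken at the moment the query starts and kept for a period of time. Once the configured expiry time has passed, the snapshot is deleted when Elasticsearch is idle.

Note that because the scroll reads from a snapshot, changes made to the data after the snapshot was created are not visible in that scroll.

Initialize the snapshot & keep it for 10 minutes

Example query: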

GET class_1/_search?scroll=10m
{
  "query": {
    "match_phrase": {
      "name": "apple"
    }
  },
  "size": 2
}

Response:

{
  "_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoAwAAAAAAAAXoFjEwWkdOMkxLUTVPZEMzM01ZdHhPc1EAAAAAAAACABZjUy1CemQwQVFfU3BUeGs2OGk0R1Z3AAAAAAAAAgEWY1MtQnpkMEFRX1NwVHhrNjhpNEdWdw==",
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.752627,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "b8fcCoYB090miyjed7YE",
        "_score" : 0.752627,
        "_source" : {
          "name" : "I eat apple so haochi1~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "ccfcCoYB090miyjed7YE",
        "_score" : 0.752627,
        "_source" : {
          "name" : "I eat apple so haochi3~",
          "num" : 1
        }
      }
    ]
  }
}

As shown, 2 documents are returned together with a snapshot ID (_scroll_id), which later requests use to keep scrolling:

Scroll using the snapshot ID

GET /_search/scroll
{
 "scroll": "10m", 
 "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoAwAAAAAAAAXoFjEwWkdOMkxLUTVPZEMzM01ZdHhPc1EAAAAAAAACABZjUy1CemQwQVFfU3BUeGs2OGk0R1Z3AAAAAAAAAgEWY1MtQnpkMEFRX1NwVHhrNjhpNEdWdw=="
}

Response:

{
  "_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoAwAAAAAAAAXoFjEwWkdOMkxLUTVPZEMzM01ZdHhPc1EAAAAAAAACABZjUy1CemQwQVFfU3BUeGs2OGk0R1Z3AAAAAAAAAgEWY1MtQnpkMEFRX1NwVHhrNjhpNEdWdw==",
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.752627,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "cMfcCoYB090miyjed7YE",
        "_score" : 0.7389809,
        "_source" : {
          "name" : "I eat apple so zhen haochi2~",
          "num" : 1
        }
      }
    ]
  }
}

Scroll once more:

{
  "_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoAwAAAAAAAAXoFjEwWkdOMkxLUTVPZEMzM01ZdHhPc1EAAAAAAAACABZjUy1CemQwQVFfU3BUeGs2OGk0R1Z3AAAAAAAAAgEWY1MtQnpkMEFRX1NwVHhrNjhpNEdWdw==",
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.752627,
    "hits" : [ ]
  }
}

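Some readers may wonder how scrolling advances when every request uses the same scroll_id. The results make it clear:

  • First, a snapshot kept for 10 minutes was created with a page size of 2, and the initial request returned 2 documents
  • Scrolling with the scroll_id then returned 1 document, because the snapshot holds only 3 documents in total and 2 had already been returned
  • Scrolling again returned an empty result, because all the data had been read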
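
The walkthrough above can be reproduced with an in-memory stand-in, which also shows why the same scroll_id is enough: the read position lives on the server side. FakeScrollIndex and scroll_all are hypothetical names written for this sketch and are not part of any Elasticsearch client. A real loop would send the two requests shown earlier instead.

```python
class FakeScrollIndex:
    """In-memory stand-in for an index with scroll support (illustration only)."""

    def __init__(self, docs: list):
        self._docs = docs
        self._contexts = {}

    def search(self, size: int) -> dict:
        """Open a scroll context over a snapshot and return the first page."""
        scroll_id = f"scroll-{len(self._contexts) + 1}"
        self._contexts[scroll_id] = {"snapshot": list(self._docs), "pos": 0, "size": size}
        return self.scroll(scroll_id)

    def scroll(self, scroll_id: str) -> dict:
        """Return the next page; the cursor is stored with the context."""
        ctx = self._contexts[scroll_id]
        page = ctx["snapshot"][ctx["pos"]:ctx["pos"] + ctx["size"]]
        ctx["pos"] += len(page)
        return {"_scroll_id": scroll_id, "hits": {"hits": page}}


def scroll_all(index: FakeScrollIndex, size: int) -> list:
    """Keep scrolling until a page comes back empty."""
    pages = []
    response = index.search(size=size)
    while response["hits"]["hits"]:
        pages.append(response["hits"]["hits"])
        response = index.scroll(response["_scroll_id"])
    return pages


index = FakeScrollIndex(["haochi1", "haochi3", "haochi2"])
pages = scroll_all(index, size=2)
print([len(page) for page in pages])  # [2, 1]
```

The loop stops on the first empty page, which corresponds to the third response above.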
