
Still Haven't Learned CUD in ES?

2020-09-23 22:30:21

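Recently, while grinding through my day job, I had to work with es but was not familiar with how to query documents in it. So I have spent the past two weeks studying es and skimming Elasticsearch: The Definitive Guide. A bit of slacking off, and another day is gone.

es is a real-time distributed search and analytics engine built on Lucene. Today we will skip its use cases and talk about creating, deleting and modifying indices.

Environment: CentOS 7, Elasticsearch 6.8.3, JDK 8

(The latest es is version 7, which requires JDK 11 or later, so I installed es 6.8.3 instead.)

All examples below use the student index.

1. Creating an Index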

PUT   http://192.168.197.100:9200/student
{
    "mappings":{
      "_doc":{ //"_doc" is the mapping type; in es6 an index can hold only one type
        "properties":{
          "id": {
              "type": "keyword"
          },
          "name":{
            "type":"text",
            "index":true,
            "analyzer":"standard"
          },
          "age":{
            "type":"integer",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above":256
              }
            }
          },
          "birthday":{
            "type":"date"
          },
          "gender":{
            "type":"keyword"
          },
          "grade":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                 "ignore_above":256
              }
            }
          },
          "class":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                 "ignore_above":256
              }
            }
          }
        }
      }
    },
    "settings":{
      //number of primary shards
      "number_of_shards" : 1,
      //number of replicas per primary shard
      "number_of_replicas" : 1
    }
}

 

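The difference between the text and keyword field types:

(1) text: the value is split into tokens by an analyzer, so it is used for full-text search

(2) keyword: the value is not tokenized and is stored as a single term, so it is used for exact matching, sorting and aggregations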
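To make the distinction concrete, here is a toy model in Python. It is not how es works internally; `index_as_text` and `index_as_keyword` are names invented for this sketch, and the "analyzer" is just a lowercase-and-split regex.

```python
import re


def index_as_text(value):
    """Toy stand-in for a text field: lowercase and split into word tokens."""
    return re.findall(r"[a-z0-9]+", value.lower())


def index_as_keyword(value):
    """Toy stand-in for a keyword field: the whole value is one single term."""
    return [value]


value = "Grade Seven"
text_terms = index_as_text(value)
keyword_terms = index_as_keyword(value)

print(text_terms)     # ['grade', 'seven']
print(keyword_terms)  # ['Grade Seven']

# A search for the single word "seven" can hit the text field...
print("seven" in text_terms)           # True
# ...but the keyword field only matches the exact original value.
print("seven" in keyword_terms)        # False
print("Grade Seven" in keyword_terms)  # True
```

This is also why the mapping above gives grade and class a keyword sub-field: the text version serves search, the keyword version serves exact matches and aggregations.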

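The index attribute controls whether a field is indexed, that is, whether it can be searched. In es 6.x it takes a boolean:

(1) true (the default): the field is indexed and can be searched

(2) false: the field is not indexed and cannot be searched

The values analyzed, not_analyzed and no belong to the string type of es 2.x and earlier, which es 5 split into text and keyword. A text field is always analyzed (fuzzy matching, loosely comparable to LIKE in SQL), a keyword field is never analyzed (exact matching, comparable to "=" in SQL), and "index":false plays the role of the old no.

The analyzer attribute sets the analyzer. For Chinese text the ik analyzer is the usual choice, and you can also define a custom analyzer.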

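The number_of_shards attribute is the number of primary shards. The default is 5 in es 6.x, and it cannot be changed after the index is created.

The number_of_replicas attribute is the number of replica copies of each primary shard. The default is 1, and it can be changed later.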
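A quick bit of arithmetic shows what these two settings mean in practice. The function names below are made up for this sketch. The second function relies on the fact that es never places a replica on the same node as its primary.

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Primaries plus all of their replica copies for one index."""
    return number_of_shards * (1 + number_of_replicas)


def nodes_needed_for_all_copies(number_of_replicas):
    """Each copy of a shard needs its own node, so one node per copy."""
    return 1 + number_of_replicas


print(total_shard_copies(1, 1))        # 2  (the student index as created)
print(total_shard_copies(1, 2))        # 3  (after raising replicas to 2)
print(total_shard_copies(5, 1))        # 10 (the 6.x defaults)
print(nodes_needed_for_all_copies(1))  # 2
```

On a single-node test machine like mine, this means the replicas stay unassigned and the cluster health shows yellow rather than green.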

On success, the following JSON is returned:

{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "student"
}
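If you would rather script this than click around in a REST client, here is a minimal Python sketch using only the standard library. It assumes the same host as above and a trimmed-down mapping. `build_create_index_request` and `is_acknowledged` are names invented for this illustration, and the request is only built, not sent.

```python
import json
import urllib.request

# Assumed base URL, copied from the examples in this post.
ES_URL = "http://192.168.197.100:9200"


def build_create_index_request(index, body):
    """Build (but do not send) the PUT request that creates an index."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def is_acknowledged(response_text):
    """Return True if a create-index response reports success."""
    reply = json.loads(response_text)
    return bool(reply.get("acknowledged")) and bool(reply.get("shards_acknowledged"))


# A shortened version of the student mapping, for illustration only.
body = {
    "mappings": {"_doc": {"properties": {"name": {"type": "text"}}}},
    "settings": {"number_of_shards": 1, "number_of_replicas": 1},
}
request = build_create_index_request("student", body)
print(request.get_method(), request.full_url)

# To actually send it: urllib.request.urlopen(request).read().decode("utf-8")
sample_reply = '{"acknowledged": true, "shards_acknowledged": true, "index": "student"}'
print(is_acknowledged(sample_reply))
```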

 

Once the index exists, how do you inspect its mapping?

GET http://192.168.197.100:9200/student/_mapping

 

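In es6, an index can contain only one type, such as the "_doc" used above.

Comparing es with a relational database, using the rough analogy from the Definitive Guide: an index corresponds to a database, a type to a table, a document to a row, and a field to a column.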

 

2. Modifying an Index

//Change the number of replicas to 2
PUT http://192.168.197.100:9200/student/_settings
{
  "number_of_replicas":2
}
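Since number_of_shards is fixed once the index exists, a script that updates settings can check for it before sending anything. The sketch below is my own illustration, not part of any es client. `STATIC_SETTINGS` lists only the one static setting discussed in this post, and `build_settings_update` is a hypothetical helper.

```python
import json

# Settings fixed at index creation time (only the one named in this post).
STATIC_SETTINGS = {"number_of_shards"}


def build_settings_update(index, changes):
    """Return (method, path, body) for a settings update, rejecting static settings."""
    blocked = STATIC_SETTINGS.intersection(changes)
    if blocked:
        raise ValueError(f"cannot change after creation: {sorted(blocked)}")
    return "PUT", f"/{index}/_settings", json.dumps(changes)


method, path, body = build_settings_update("student", {"number_of_replicas": 2})
print(method, path, body)

try:
    build_settings_update("student", {"number_of_shards": 3})
except ValueError as err:
    print("rejected:", err)
```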

 

3. Deleting an Index

//Delete a single index
DELETE http://192.168.197.100:9200/student

//Delete all indices
DELETE http://192.168.197.100:9200/_all
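Deleting _all is irreversible, so in a script it is worth making it hard to do by accident. Below is a small Python sketch of such a guard. `build_delete_index_request` and its `allow_all` flag are hypothetical names of mine. On the server side, es also has the action.destructive_requires_name setting, which, as far as I know, serves the same purpose.

```python
ES_URL = "http://192.168.197.100:9200"  # host used throughout this post


def build_delete_index_request(index, allow_all=False):
    """Return (method, url) for deleting an index, refusing broad targets by default."""
    is_broad = index == "_all" or "*" in index
    if is_broad and not allow_all:
        raise ValueError(f"refusing to delete broad target {index!r}")
    return "DELETE", f"{ES_URL}/{index}"


print(build_delete_index_request("student"))

try:
    build_delete_index_request("_all")
except ValueError as err:
    print("blocked:", err)

# Deleting everything has to be requested explicitly.
print(build_delete_index_request("_all", allow_all=True))
```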

 

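4. The Default standard Analyzer vs. the ik Analyzer

The default analyzer in es is standard. For English it splits text on word boundaries (mostly whitespace and punctuation) and lowercases each token. For Chinese it breaks the text into individual characters, which makes it a poor choice for Chinese.

For example, standard applied to English:

//This API shows how a piece of text is tokenized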
POST http://192.168.197.100:9200/_analyze
{
  "text":"the People's Republic of China",
  "analyzer":"standard"
}

 

Result:

{
    "tokens": [
        {
            "token": "the",
            "start_offset": 0,
            "end_offset": 3,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "people's",
            "start_offset": 4,
            "end_offset": 12,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "republic",
            "start_offset": 13,
            "end_offset": 21,
            "type": "<ALPHANUM>",
            "position": 2
        },
        {
            "token": "of",
            "start_offset": 22,
            "end_offset": 24,
            "type": "<ALPHANUM>",
            "position": 3
        },
        {
            "token": "china",
            "start_offset": 25,
            "end_offset": 30,
            "type": "<ALPHANUM>",
            "position": 4
        }
    ]
}

 

Applied to Chinese:

POST http://192.168.197.100:9200/_analyze
{
  "text":"中華人民共和國萬歲",
  "analyzer":"standard"
}

 

Result:

{
    "tokens": [
        {
            "token": "中",
            "start_offset": 0,
            "end_offset": 1,
            "type": "<IDEOGRAPHIC>",
            "position": 0
        },
        {
            "token": "華",
            "start_offset": 1,
            "end_offset": 2,
            "type": "<IDEOGRAPHIC>",
            "position": 1
        },
        {
            "token": "人",
            "start_offset": 2,
            "end_offset": 3,
            "type": "<IDEOGRAPHIC>",
            "position": 2
        },
        {
            "token": "民",
            "start_offset": 3,
            "end_offset": 4,
            "type": "<IDEOGRAPHIC>",
            "position": 3
        },
        {
            "token": "共",
            "start_offset": 4,
            "end_offset": 5,
            "type": "<IDEOGRAPHIC>",
            "position": 4
        },
        {
            "token": "和",
            "start_offset": 5,
            "end_offset": 6,
            "type": "<IDEOGRAPHIC>",
            "position": 5
        },
        {
            "token": "國",
            "start_offset": 6,
            "end_offset": 7,
            "type": "<IDEOGRAPHIC>",
            "position": 6
        },
        {
            "token": "萬",
            "start_offset": 7,
            "end_offset": 8,
            "type": "<IDEOGRAPHIC>",
            "position": 7
        },
        {
            "token": "歲",
            "start_offset": 8,
            "end_offset": 9,
            "type": "<IDEOGRAPHIC>",
            "position": 8
        }
    ]
}
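The behaviour in both results can be imitated with a few lines of Python. This is only a toy approximation, assuming ASCII words and common CJK ideographs. The real standard analyzer follows the Unicode text segmentation rules, and `toy_standard_analyze` is a name I made up.

```python
import re

# One token per run of ASCII letters/digits (inner apostrophes allowed),
# or per single CJK ideograph. A simplification for illustration only.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*|[\u4e00-\u9fff]")


def toy_standard_analyze(text):
    """Return (token, start_offset, end_offset) triples with lowercased tokens."""
    return [
        (match.group().lower(), match.start(), match.end())
        for match in TOKEN_PATTERN.finditer(text)
    ]


english = toy_standard_analyze("the People's Republic of China")
chinese = toy_standard_analyze("中華人民共和國萬歲")

print(english)
print([token for token, _, _ in chinese])
```

For these two inputs the tokens and offsets line up with the _analyze output above: five lowercased English words, and nine one-character Chinese tokens.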

 

 

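The ik analyzer, a plugin that has to be installed separately, segments Chinese into words. It provides two analyzers: ik_smart and ik_max_word.

(1) ik_smart: the coarsest-grained segmentation, producing as few tokens as possible

For example: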

POST http://192.168.197.100:9200/_analyze
{
  "text":"中華人民共和國萬歲",
  "analyzer":"ik_smart"
}

 

Result:

{
    "tokens": [
        {
            "token": "中華人民共和國",
            "start_offset": 0,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "萬歲",
            "start_offset": 7,
            "end_offset": 9,
            "type": "CN_WORD",
            "position": 1
        }
    ]
}

 

(2) ik_max_word: the finest-grained segmentation, splitting the text into as many words as possible

For example:

POST http://192.168.197.100:9200/_analyze
{
  "text":"中華人民共和國萬歲",
  "analyzer":"ik_max_word"
}

 

Result:

{
    "tokens": [
        {
            "token": "中華人民共和國",
            "start_offset": 0,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "中華人民",
            "start_offset": 0,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "中華",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "華人",
            "start_offset": 1,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "人民共和國",
            "start_offset": 2,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "人民",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 5
        },
        {
            "token": "共和國",
            "start_offset": 4,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 6
        },
        {
            "token": "共和",
            "start_offset": 4,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 7
        },
        {
            "token": "國",
            "start_offset": 6,
            "end_offset": 7,
            "type": "CN_CHAR",
            "position": 8
        },
        {
            "token": "萬歲",
            "start_offset": 7,
            "end_offset": 9,
            "type": "CN_WORD",
            "position": 9
        },
        {
            "token": "萬",
            "start_offset": 7,
            "end_offset": 8,
            "type": "TYPE_CNUM",
            "position": 10
        },
        {
            "token": "歲",
            "start_offset": 8,
            "end_offset": 9,
            "type": "COUNT",
            "position": 11
        }
    ]
}
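To get a feel for coarse versus fine granularity, here is a toy dictionary-based segmenter in Python. It is emphatically not IK's real algorithm. The dictionary is a handful of words I typed in by hand, and `coarse_grained` and `fine_grained` are hypothetical names. The coarse mode is a plain greedy longest match, and the fine mode lists every dictionary word found at every position.

```python
# Tiny hand-made dictionary; a real IK install ships a far larger one.
DICTIONARY = {
    "中華人民共和國", "中華人民", "中華", "華人",
    "人民共和國", "人民", "共和國", "共和", "萬歲",
}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def fine_grained(text):
    """Every dictionary word at every start position, longest first."""
    found = []
    for start in range(len(text)):
        for end in range(min(len(text), start + MAX_WORD_LEN), start, -1):
            if text[start:end] in DICTIONARY:
                found.append((text[start:end], start, end))
    return found


def coarse_grained(text):
    """Greedy longest match, scanning left to right."""
    found, start = [], 0
    while start < len(text):
        for end in range(min(len(text), start + MAX_WORD_LEN), start, -1):
            if text[start:end] in DICTIONARY:
                found.append((text[start:end], start, end))
                start = end
                break
        else:
            # No dictionary word starts here: emit the single character.
            found.append((text[start], start, start + 1))
            start += 1
    return found


sentence = "中華人民共和國萬歲"
print(coarse_grained(sentence))
print([word for word, _, _ in fine_grained(sentence)])
```

With this dictionary the coarse mode yields the same two tokens as ik_smart, and the fine mode yields the nine multi-character words from the ik_max_word result. The real output also contains a few single-character tokens that this toy does not produce.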

 

The ik analyzer applied to English:

POST http://192.168.197.100:9200/_analyze
{
  "text":"the People's Republic of China",
  "analyzer":"ik_smart"
}

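Result: the unimportant words (stop words such as "the" and "of") are dropped, whereas the standard analyzer keeps them. (My English has decayed to the point where I no longer know what kind of words a, an and the are. As a Chinese speaker, that is nothing to be proud of.)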

{
    "tokens": [
        {
            "token": "people",
            "start_offset": 4,
            "end_offset": 10,
            "type": "ENGLISH",
            "position": 0
        },
        {
            "token": "s",
            "start_offset": 11,
            "end_offset": 12,
            "type": "ENGLISH",
            "position": 1
        },
        {
            "token": "republic",
            "start_offset": 13,
            "end_offset": 21,
            "type": "ENGLISH",
            "position": 2
        },
        {
            "token": "china",
            "start_offset": 25,
            "end_offset": 30,
            "type": "ENGLISH",
            "position": 3
        }
    ]
}

 

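5. Adding a Document

Fields can be added freely: with the default dynamic mapping, a field that is not yet in the mapping is added to it automatically.

//1 is the value of "_id"; it must be unique within the index, and es generates one if you leave it out of the URL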
POST http://192.168.197.100:9200/student/_doc/1
{
  "id":1,
  "name":"tom",
  "age":20,
  "gender":"male",
  "grade":"7",
  "class":"1"
}
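The two ways of choosing the "_id" differ only in the URL. Here is a short Python sketch of that rule. `build_index_document_request` is a name invented for this example. It uses PUT when an id is given, though POST, as in the request above, works for that case as well.

```python
ES_URL = "http://192.168.197.100:9200"  # host used throughout this post


def build_index_document_request(index, doc_id=None, doc_type="_doc"):
    """Return (method, url): explicit id in the URL, or none so es generates one."""
    if doc_id is None:
        # No id in the URL: es assigns a random unique _id.
        return "POST", f"{ES_URL}/{index}/{doc_type}"
    return "PUT", f"{ES_URL}/{index}/{doc_type}/{doc_id}"


print(build_index_document_request("student", 1))
print(build_index_document_request("student"))
```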

 

 

6. Updating a Document

POST http://192.168.197.100:9200/student/_doc/1/_update
{
  "doc":{
    "name":"jack"
  }
}
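The "doc" wrapper means a partial update: only the fields you send are changed, and the rest of the stored document is kept. This Python sketch imitates that merge on plain dictionaries. `apply_partial_update` is my own helper, not an es API, and the stored document is the one added in the previous section.

```python
def apply_partial_update(source, partial):
    """Merge `partial` into a copy of `source`; nested objects merge recursively."""
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_update(merged[key], value)
        else:
            merged[key] = value
    return merged


stored = {"id": 1, "name": "tom", "age": 20, "gender": "male", "grade": "7", "class": "1"}
updated = apply_partial_update(stored, {"name": "jack"})

print(updated["name"], updated["age"])  # jack 20
print(stored["name"])                   # tom (the original dict is untouched)
```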

 

7. Deleting a Document

//1 is the value of "_id"
DELETE http://192.168.197.100:9200/student/_doc/1

 

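That was a brief tour of creating, modifying and deleting indices, and of adding, updating and deleting documents in es. To keep this post from getting too long, document queries will be covered in the next one.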

