Elastic Search 基础介绍

ElasticSearch 是一个分布式、可扩展、实时的搜索与数据分析引擎。

功能介绍

  • 全文搜索
  • 结构化数据的实时统计
  • 数据分析
  • 复杂的语言处理
  • 地理位置和对象间关联关系

应用场景

  • Wikipedia 使用 Elasticsearch 提供带有高亮片段的全文搜索
  • Stack Overflow 将地理位置查询融入全文检索中去,并且使用 more-like-this 接口去查找相关的问题与答案
  • GitHub 使用 Elasticsearch 对1300亿行代码进行查询

特点

  • 一个分布式的实时文档存储,每个字段 可以被索引与搜索
  • 一个分布式实时分析搜索引擎
  • 能胜任上百个服务节点的扩展,并支持 PB 级别的结构化或者非结构化数据

操作

索引结构分析

路径 /megacorp/employee/1 包含了三部分的信息:

megacorp:索引名称,类似数据库
employee:类型名称,类似数据表
1:特定雇员的ID,类似表里的ID

写入

1
2
3
4
5
6
7
8
9
curl -X PUT "localhost:9200/megacorp/employee/1" -H 'Content-Type: application/json' -d'
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
'

查询

1
2
3
4
5
6
7
8
9
10
curl  'localhost:9200/megacorp/employee/1?pretty'
{"_index":"megacorp","_type":"employee","_id":"2","_version":1,"found":true,"_source":
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
}

搜索

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
curl -X GET "10.96.83.188:9200/megacorp/employee/_search"
{"took":71,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":3,"max_score":1.0,"hits":[{"_index":"megacorp","_type":"employee","_id":"2","_score":1.0,"_source":
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
},{"_index":"megacorp","_type":"employee","_id":"1","_score":1.0,"_source":
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
},{"_index":"megacorp","_type":"employee","_id":"3","_score":1.0,"_source":
{
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
}]}}

简单查询 query_string

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
curl -X GET "10.96.83.188:9200/megacorp/employee/_search?q=last_name:Smith"
{"took":31,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":0.2876821,"hits":[{"_index":"megacorp","_type":"employee","_id":"2","_score":0.2876821,"_source":
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
},{"_index":"megacorp","_type":"employee","_id":"1","_score":0.2876821,"_source":
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
}]}}

查询表达式搜索 query

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
curl -X GET "10.96.83.188:9200/megacorp/employee/_search" -H 'Content-Type: application/json' -d'
{
"query" : {
"match" : {
"last_name" : "Smith"
}
}
}
'

{"took":9,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":0.2876821,"hits":[{"_index":"megacorp","_type":"employee","_id":"2","_score":0.2876821,"_source":
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
},{"_index":"megacorp","_type":"employee","_id":"1","_score":0.2876821,"_source":
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
}]}}

更复杂的搜索 过滤器 filter

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
curl -X GET "10.96.83.188:9200/megacorp/employee/_search" -H 'Content-Type: application/json' -d'
{
"query" : {
"bool": {
"must": {
"match" : {
"last_name" : "smith"
}
},
"filter": {
"range" : {
"age" : { "gt" : 30 }
}
}
}
}
}
'
{"took":15,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.2876821,"hits":[{"_index":"megacorp","_type":"employee","_id":"2","_score":0.2876821,"_source":
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
}]}}

全文搜索

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
curl -X GET "10.96.83.188:9200/megacorp/employee/_search" -H 'Content-Type: application/json' -d'
{
"query" : {
"match" : {
"about" : "rock climbing"
}
}
}
'
{"took":8,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":0.53484553,"hits":[{"_index":"megacorp","_type":"employee","_id":"1","_score":0.53484553,"_source":
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
},{"_index":"megacorp","_type":"employee","_id":"2","_score":0.26742277,"_source":
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
}]}}

Elasticsearch 默认按照相关性得分排序,即每个文档跟查询的匹配程度。
第一个最高得分的结果很明显:John Smith 的 about 属性清楚地写着 “rock climbing”
Jane Smith也作为结果返回,原因是她的 about 属性里提到了 “rock” 。因为只有 “rock” 而没有 “climbing” ,所以她的相关性得分低于 John 的。

这是完全区别于传统关系型数据库的一个概念,数据库中的一条记录要么匹配要么不匹配。

短语搜索

想要精确匹配一系列单词或者短语。
匹配同时包含 “rock” 和 “climbing” ,并且 二者以短语 “rock climbing” 的形式紧挨着的雇员记录。
为此对 match 查询稍作调整,使用一个叫做 match_phrase 的查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
 curl -X GET "10.96.83.188:9200/megacorp/employee/_search" -H 'Content-Type: application/json' -d'
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
}
}
'
{"took":13,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.53484553,"hits":[{"_index":"megacorp","_type":"employee","_id":"1","_score":0.53484553,"_source":
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
}]}}

高亮搜索

返回结果与之前一样,与此同时结果中还多了一个叫做 highlight 的部分。这个部分包含了 about 属性匹配的文本片段,并以 HTML 标签 封装

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
curl -X GET "10.96.83.188:9200/megacorp/employee/_search" -H 'Content-Type: application/json' -d'
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
},
"highlight": {
"fields" : {
"about" : {}
}
}
}
'
{"took":36,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.53484553,"hits":[{"_index":"megacorp","_type":"employee","_id":"1","_score":0.53484553,"_source":
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
,"highlight":{"about":["I love to go <em>rock</em> <em>climbing</em>"]}}]}}

分析

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
curl -X GET "10.96.83.188:9200/megacorp/employee/_search" -H 'Content-Type: application/json' -d'
{
"aggs": {
"all_interests": {
"terms": { "field": "interests" }
}
}
}
'


{
...
"hits": { ... },
"aggregations": {
"all_interests": {
"buckets": [
{
"key": "music",
"doc_count": 2
},
{
"key": "forestry",
"doc_count": 1
},
{
"key": "sports",
"doc_count": 1
}
]
}
}
}

本文标题:Elastic Search 基础介绍

文章作者:Craze lee

发布时间:2019年04月11日 - 17:04

最后更新:2019年04月25日 - 10:04

原始链接:http://craze-lee.github.io/2019/04/11/ElasticSearch/ES基础介绍/

许可协议: 署名-非商业性使用-禁止演绎 4.0 国际 转载请保留原文链接及作者。

您的支持将鼓励我继续创作!