Elasticsearch Tutorial – Power Up Your Searches

Last updated on Jan 28, 2021
Sr Research Analyst at Edureka. A techno freak who likes to explore different technologies. Likes to follow the technology trends in market and write...

In my previous blog on What is Elasticsearch, I introduced Elasticsearch, talked about its advantages, and walked through the installation on Windows. I also discussed the basic concepts and the different API conventions present in Elasticsearch. But whatever I discussed in the previous blog is just the tip of the iceberg. In this Elasticsearch tutorial blog, I will introduce the features that make Elasticsearch fast and popular among its competitors. I will also introduce you to the different APIs present in Elasticsearch and show how you can perform different searches using them.

Below are the topics that I will be discussing in this Elasticsearch tutorial blog:

  • Elasticsearch APIs
  • Query DSL
  • Mapping
  • Analysis
  • Modules

So, let’s get started with the very first topic of this Elasticsearch tutorial blog.

 

Elasticsearch APIs – Elasticsearch Tutorial

This section of the Elasticsearch tutorial blog talks about the various kinds of APIs supported by Elasticsearch. Let’s understand each of them in detail.

Document API

Elasticsearch provides both single document APIs and multi-document APIs.

  1. SINGLE DOCUMENT API
    • Index API
    • Get API
    • Update API 
    • Delete API
  2. MULTI-DOCUMENT API
    • Multi Get API
    • Bulk API
    • Delete By Query API
    • Update By Query API
    • Reindex API

Now that you know about the different types of Document APIs, let’s implement CRUD operations with them.

Index API

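The index API is responsible for adding or updating a typed JSON document in a specific index and making it searchable. Note that mapping types such as “kpop” were deprecated in Elasticsearch 7.x and removed in 8.x, so the typed paths used in this tutorial apply to older versions. The following example inserts a JSON document into the “playlist” index, under a type called “kpop”, with an id of 1: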

PUT /playlist/kpop/1
{
 "title" : "Beautiful Life",
 "artist" : "Crush",
 "album" : "Goblin",
 "year" : 2017
}

Get API

The get API is responsible for fetching a typed JSON document from the index based on its unique id. The following example gets the JSON document with id 2 from the “playlist” index, under the type “kpop”:

GET /playlist/kpop/2

Update API

The update API is responsible for updating a document based on a script or a partial document provided. The operation fetches the document from the index, applies the change, and then indexes the result back. To make sure no other updates happen between the “get” and the “reindex”, it uses versioning. The following example updates the JSON document with id 1 in the “playlist” index, under the type “kpop”, by adding a new field called “time”:

POST /playlist/kpop/1/_update
{
 "doc" : {
  "time" : 5
 }
}
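
The get, modify, reindex cycle with a version check can be illustrated with a small sketch in plain Python. This is not Elasticsearch code: TinyStore, VersionConflict, and update are hypothetical names invented for this illustration, and the store lives entirely in memory.

```python
class VersionConflict(Exception):
    """Raised when a document changed between the read and the write."""


class TinyStore:
    """Toy in-memory store that keeps a version number per document."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source dict)

    def get(self, doc_id):
        version, source = self._docs[doc_id]
        return version, dict(source)  # copy so callers cannot mutate the store

    def index(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(doc_id)
        self._docs[doc_id] = (current_version + 1, dict(source))
        return current_version + 1


def update(store, doc_id, script, retries=3):
    """Get the document, apply `script` to it, and write it back if unchanged."""
    for _ in range(retries):
        version, source = store.get(doc_id)
        script(source)
        try:
            return store.index(doc_id, source, expected_version=version)
        except VersionConflict:
            continue  # another writer got there first; re-read and try again
    raise VersionConflict(doc_id)


store = TinyStore()
store.index(1, {"title": "Beautiful Life", "artist": "Crush", "album": "Goblin", "year": 2017})
new_version = update(store, 1, lambda doc: doc.update(time=5))
print(new_version, store.get(1)[1])
```

A write that carries a stale version number is rejected instead of silently overwriting the newer document, which is the guarantee the versioning step is there to provide.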

Delete API

The delete API is responsible for deleting a typed JSON document from a specific index based on its unique id. The following example deletes the JSON document with id 3 from the “playlist” index, under the type “kpop”:

DELETE /playlist/kpop/3
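
If you prefer to issue these requests from a program rather than a console, a rough sketch using only the Python standard library could look like the following. The base URL is an assumption (a local node on port 9200), build_request is a hypothetical helper, and the requests are only built here, not sent. The _update path with a partial "doc" body is the typed update endpoint of older Elasticsearch versions.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local single-node cluster


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


song = {"title": "Beautiful Life", "artist": "Crush", "album": "Goblin", "year": 2017}

requests_to_send = [
    build_request("PUT", "/playlist/kpop/1", song),                           # index
    build_request("GET", "/playlist/kpop/1"),                                 # get
    build_request("POST", "/playlist/kpop/1/_update", {"doc": {"time": 5}}),  # update
    build_request("DELETE", "/playlist/kpop/1"),                              # delete
]

for req in requests_to_send:
    print(req.get_method(), req.full_url)

# To run a request against a live cluster:
#     with urllib.request.urlopen(req) as resp:
#         print(json.load(resp))
```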

Search API

The search API is responsible for searching the content within Elasticsearch. You can search either by sending a GET request with the query as a string parameter in the URI, or by sending a POST request with the query in the message body. Search requests can span multiple indices and multiple types.

Various parameters can be passed in a search operation that uses the Uniform Resource Identifier (URI):

| Parameter | Description |
|---|---|
| q | Specifies the query string |
| lenient | When set to true, format-based errors are ignored |
| fields | Fetches the response from selected fields only |
| sort | Sorts the result |
| timeout | Restricts the search time |
| terminate_after | Restricts the response to a specific number of documents in each shard |
| from | Specifies the start index of the hits to return |
| size | Specifies the number of hits to return |

Now that you are familiar with the search parameters, let’s see how you can perform a search across multiple indices and types.

  1. Multi-Index

    In Elasticsearch, you can search for the documents present in all the indices or in some particular indices. The following example searches the “playlist” and “my_playlist” indices for JSON documents where the year is 2014:

    GET playlist,my_playlist/_search?q=year:2014

    For example, a document such as the following would match:

    {
     "title" : "MAMACITA",
     "artist" : "SuJu",
     "album" : "MAMACITA",
     "year" : 2014,
     "time" : 4
    }
  2. Multi-Type

    You can also search all the documents in a particular index across all types or in some specified type. The following example searches the “playlist” index, under all types, for JSON documents where the year is 2017:

    GET playlist/_search?q=year:2017
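
To see how the URI parameters and the index list fit together, here is a small sketch that assembles a search path in Python. The search_path helper is hypothetical and written only for this illustration. It builds the string and does not contact a cluster.

```python
from urllib.parse import urlencode


def search_path(indices=None, **params):
    """Return the path and query string for a URI search.

    `indices` is a list of index names; None or an empty list targets all indices.
    """
    target = ",".join(indices) if indices else "_all"
    query = urlencode(params)  # percent-encodes values, e.g. ":" becomes "%3A"
    return f"/{target}/_search" + (f"?{query}" if query else "")


print(search_path(["playlist", "my_playlist"], q="year:2014"))
print(search_path(["playlist"], q="year:2017", sort="year:desc", size=5, timeout="2s"))
print(search_path(q="artist:EXO", terminate_after=100))

# "from" is a reserved word in Python, so it has to be passed through a dict.
print(search_path(["playlist"], **{"from": 10, "size": 5}))
```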

The next section of this Elasticsearch tutorial talks about aggregations and the aggregation types supported by Elasticsearch.

Aggregations

In Elasticsearch, the aggregations framework is responsible for providing aggregated data based on a search query. Aggregations can be composed together in order to build complex summaries of the data. For a better understanding, consider each aggregation as a unit of work that builds analytic information over a set of documents available in Elasticsearch. Various types of aggregations are available, each with its own purpose and output. For simplification, they are grouped into 4 major families:

  1. Bucketing

    Bucketing aggregations build buckets, where each bucket is associated with a key and a document criterion. Whenever the aggregation is executed, the criteria of all the buckets are evaluated on every document. Each time a criterion matches, the document is considered to “fall in” the relevant bucket.

  2. Metric

    Metric aggregations keep track of and compute metrics over a set of documents.

  3. Matrix

    Matrix aggregations operate on multiple fields. They produce a matrix result out of the values extracted from the requested document fields. Matrix aggregations do not support scripting.

  4. Pipeline

    Pipeline aggregations aggregate the output of other aggregations and their associated metrics.
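
The difference between the bucketing, metric, and pipeline families can be illustrated in plain Python, without Elasticsearch. In this sketch the song list is sample data (the third entry is made up), and terms_buckets and avg_metric are hypothetical helpers that only mimic the idea of a terms bucket and an avg metric.

```python
from collections import defaultdict

# Sample documents; the third one is invented for the illustration.
songs = [
    {"artist": "Crush", "year": 2017, "time": 5},
    {"artist": "SuJu", "year": 2014, "time": 4},
    {"artist": "SuJu", "year": 2017, "time": 3},
]


def terms_buckets(docs, field):
    """Bucketing: place each document in the bucket keyed by its `field` value."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return dict(buckets)


def avg_metric(docs, field):
    """Metric: compute a single number over a set of documents."""
    return sum(doc[field] for doc in docs) / len(docs)


by_artist = terms_buckets(songs, "artist")
summary = {
    artist: {"doc_count": len(docs), "avg_time": avg_metric(docs, "time")}
    for artist, docs in by_artist.items()
}

# Pipeline: work on the output of the aggregations above, not on the documents.
max_avg_time = max(bucket["avg_time"] for bucket in summary.values())

print(summary)
print(max_avg_time)
```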

The following example shows how a basic aggregation is structured:

"aggregations" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        }
        [,"meta" : { [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]?
    }
    [,"<aggregation_name_2>" : { ... } ]*
}
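
As a concrete instance of that structure, the sketch below builds a search body with a terms aggregation and a nested avg sub-aggregation. The aggregation names songs_per_artist and average_time are made up, and the artist.keyword field is an assumption: it exists only if the “artist” field was mapped with a keyword sub-field.

```python
import json

search_body = {
    "size": 0,  # return only the aggregation results, no hits
    "aggregations": {
        "songs_per_artist": {                      # <aggregation_name>
            "terms": {"field": "artist.keyword"},  # <aggregation_type> with its body
            "aggregations": {                      # sub-aggregation run per bucket
                "average_time": {"avg": {"field": "time"}}
            },
        }
    },
}

print(json.dumps(search_body, indent=2))
```

The printed JSON can be sent as the body of a request to the _search endpoint of an index.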

Indices API

In Elasticsearch, the indices APIs are responsible for managing individual indices, index settings, aliases, mappings, and index templates. They cover operations such as creating, deleting, opening, and closing an index.

Cluster API

The Cluster API in Elasticsearch is responsible for fetching information about a cluster and its nodes and making further changes in them. 

The next section of this Elasticsearch tutorial blog talks about the Query DSL provided by Elasticsearch.

Query DSL – Elasticsearch Tutorial

Elasticsearch provides a full Query DSL which is based on JSON and is responsible for defining queries. The Query DSL consists of two types of clauses:

  1. Leaf Query Clauses

    In Elasticsearch, the leaf query clauses search for a particular value in a particular field, as the match, term, and range queries do. These queries can be used by themselves as well.

  2. Compound Query Clauses

    In Elasticsearch, the compound query clauses wrap other leaf or compound queries. These queries are used for combining multiple queries in a logical fashion or for altering their behavior.
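
To make the two clause types concrete, here is a sketch that composes a bool compound query out of a match and a range leaf clause. The helper functions are hypothetical conveniences for building the JSON. The match, range, and bool query names themselves are part of the Query DSL.

```python
import json


def match(field, text):
    """Leaf clause: full-text match on one field."""
    return {"match": {field: text}}


def value_range(field, gte, lte):
    """Leaf clause: range of values on one field."""
    return {"range": {field: {"gte": gte, "lte": lte}}}


def bool_query(must=(), filters=(), must_not=()):
    """Compound clause: combine other clauses with boolean logic."""
    clauses = {}
    if must:
        clauses["must"] = list(must)
    if filters:
        clauses["filter"] = list(filters)
    if must_not:
        clauses["must_not"] = list(must_not)
    return {"bool": clauses}


search_body = {
    "query": bool_query(
        must=[match("artist", "EXO")],
        filters=[value_range("year", 2014, 2017)],
    )
}

print(json.dumps(search_body, indent=2))
```

Because a compound clause accepts any clause, the result of bool_query can itself be passed into another bool_query.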

The following example shows a simple join query:

POST /my_playlist/_search
{
 "query": {
  "has_child": {
   "type": "kpop",
   "query": {
    "match": {
     "artist": "EXO"
    }
   }
  }
 }
}
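
Elasticsearch also supports geo queries, which work on two types of geo fields:

  1. geo_point: fields that support lat/lon pairs
  2. geo_shape: fields that support points, lines, circles, polygons, multi-polygons, etc.

The following example uses a geo_distance filter to match documents whose “location” field lies within 150 km of a given point. When a point is written as an array, the order is [lon, lat]:

{
 "query": {
  "bool": {
   "filter": {
    "geo_distance": {
     "distance": "150km",
     "location": [42.056098, 86.674299]
    }
   }
  }
 }
}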
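
What a distance filter does conceptually can be sketched in plain Python with the haversine formula. This is only an illustration and not how Elasticsearch computes distances internally. The cities and their approximate coordinates are sample data, and haversine_km and within_distance are hypothetical helpers.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(docs, center, max_km):
    """Keep the documents whose "location" lies within max_km of center (lat, lon)."""
    lat0, lon0 = center
    return [
        doc for doc in docs
        if haversine_km(lat0, lon0, doc["location"]["lat"], doc["location"]["lon"]) <= max_km
    ]


# Sample documents with approximate city coordinates.
venues = [
    {"name": "Incheon", "location": {"lat": 37.4563, "lon": 126.7052}},
    {"name": "Busan", "location": {"lat": 35.1796, "lon": 129.0756}},
]

seoul = (37.5665, 126.9780)
nearby = within_distance(venues, seoul, max_km=150)
print([doc["name"] for doc in nearby])
```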

The next part of this Elasticsearch tutorial blog talks about mapping in Elasticsearch.

Mapping – Elasticsearch Tutorial

In Elasticsearch, mapping is responsible for defining how a document and its fields are stored and indexed. The following example creates the “playlist” index with a simple mapping:

PUT /playlist
{
 "mappings": {
  "report": {
   "_all": {
    "enabled": true
   },
   "properties": {
    "title": { "type": "text" },
    "artist": { "type": "text" },
    "album": { "type": "text" },
    "year": { "type": "integer" }
   }
  }
 }
}

The following section of this Elasticsearch tutorial blog will introduce you to the analysis process in Elasticsearch.

Analysis – Elasticsearch Tutorial

In Elasticsearch, analysis is the process of converting text into tokens or terms. These tokens are then added to the inverted index so that they can be searched. This process of analysis is performed by an analyzer. An analyzer can be of two types:

  1. Built-in analyzer
  2. Custom analyzer defined per index

If no analyzer is defined, the built-in standard analyzer performs the analysis by default. The following example creates the “cities” index and explicitly assigns the standard analyzer to the “title” field:

PUT cities
{
 "mappings": {
  "metropolitan": {
   "properties": {
    "title": {
     "type": "text",
     "analyzer": "standard"
    }
   }
  }
 }
}
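
The path from text to tokens to an inverted index can be sketched in a few lines of Python. The analyze function below is a deliberately rough stand-in (split on anything that is not an ASCII letter or digit, then lowercase) and not the real standard analyzer. The third title is made up for the illustration.

```python
import re
from collections import defaultdict


def analyze(text):
    """Rough stand-in for an analyzer: split on non-alphanumerics and lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def build_inverted_index(docs):
    """Map each term to the sorted ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Analyze the query the same way and return ids matching any of its terms."""
    hits = set()
    for term in analyze(query):
        hits.update(index.get(term, []))
    return sorted(hits)


titles = {1: "Beautiful Life", 2: "MAMACITA", 3: "A Beautiful Day"}
inverted = build_inverted_index(titles)

print(analyze("Beautiful Life"))
print(inverted["beautiful"])
print(search(inverted, "BEAUTIFUL"))
```

Because the query goes through the same analysis as the indexed text, a search for “BEAUTIFUL” still finds the documents indexed under the lowercase term.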

The next part of this Elasticsearch tutorial blog talks about the different modules provided by Elasticsearch.

Modules – Elasticsearch Tutorial

Elasticsearch is composed of different modules, which are responsible for various aspects of its functionality. The settings of each module can be one of the following two kinds:

  1. static – These settings must be configured at the node level and must be set on every relevant node.
  2. dynamic – These settings can be updated dynamically on a live cluster.

| Module | Description |
|---|---|
| Cluster-level routing and shard allocation | Responsible for the settings which control where, when, and how shards are allocated to nodes. |
| Discovery | Responsible for discovering a cluster and maintaining the state of all the nodes in it. |
| Gateway | Responsible for maintaining the cluster state and the shard data across full cluster restarts. |
| HTTP | Responsible for managing the communication between HTTP clients and the Elasticsearch APIs. |
| Indices | Responsible for maintaining the settings that are set globally for every index. |
| Network | Responsible for controlling the default network settings. |
| Node Client | Responsible for starting a node in a cluster. |
| Painless | Default scripting language, responsible for the safe use of inline and stored scripts. |
| Plugins | Responsible for enhancing the basic Elasticsearch functionality in a custom manner. |
| Scripting | Enables the user to use scripts to evaluate custom expressions. |
| Snapshot/Restore | Responsible for creating snapshots of individual indices or an entire cluster into a remote repository. |
| Thread pools | Responsible for holding several thread pools in order to improve how thread memory consumption is managed within a node. |
| Transport | Responsible for configuring the transport networking layer. |
| Tribe Nodes | Responsible for joining one or more clusters and acting as a federated client across them. |
| Cross-Cluster Search | Responsible for executing search requests across more than one cluster without joining them, acting as a federated client across them. |
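
As a small illustration of a dynamic setting, the sketch below builds the request for the cluster update settings API, which accepts settings under a “transient” or “persistent” key. The cluster_settings_update helper is hypothetical, and the request is only built, not sent. The example setting, cluster.routing.allocation.enable, belongs to the shard allocation module. Static settings cannot be changed this way and have to be set in each node's configuration before it starts.

```python
import json


def cluster_settings_update(settings, persistent=False):
    """Build the method, path, and body for a cluster settings update.

    Transient settings last until a full cluster restart; persistent ones survive it.
    """
    scope = "persistent" if persistent else "transient"
    return "PUT", "/_cluster/settings", {scope: settings}


method, path, body = cluster_settings_update(
    {"cluster.routing.allocation.enable": "all"}
)

print(method, path)
print(json.dumps(body, indent=2))
```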

This brings us to the end of this Elasticsearch tutorial blog. I hope I was able to clearly explain the different Elasticsearch APIs and how to use them.

If you want to get trained in Elasticsearch and wish to search and analyze large datasets with ease, then check out the ELK Stack Training by Edureka, a trusted online learning company with a network of more than 250,000 satisfied learners spread across the globe.

Got a question for us? Please mention it in the comments section and we will get back to you.
