What Is Elasticsearch - Getting Started | ELK Stack

In today’s IT world, approximately 2.5 quintillion bytes of data are generated every day. This data comes mainly from sources such as social media sites, video sharing sites, and medium to large-scale organizations. It is often referred to as a data ocean or, in more general terms, Big Data. Much of this data is unstructured, scattered, and of little value in isolation. To make sense of it, you need analytics tools. Many analytics tools are available in the market that let you explore, record, access, analyze, and process unstructured data. Among these tools, Elasticsearch stands out. In this blog on what is Elasticsearch, I’ll explain it from the ground up.

But before moving ahead in this what is Elasticsearch blog, let’s take a quick glance at the topics I will be explaining:

What Is Elasticsearch?
Elasticsearch Advantages
Elasticsearch Installation
Elasticsearch Basic Concepts
API Conventions In Elasticsearch

The following part of this Elasticsearch tutorial blog introduces Elasticsearch in detail.

What Is Elasticsearch?

Elasticsearch is a search engine based on Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.

– Wikipedia

In other words, Elasticsearch is an open source, standalone database server developed in Java. It is used mainly for full-text search and analysis. It takes in unstructured data from various sources and stores it in a format that is highly optimized for language-based searches. As mentioned above, Elasticsearch uses Apache Lucene at its core for indexing and searching. Since Lucene is just a library, working with it directly is quite complex. You don’t have to worry about that, though, because Elasticsearch hides the complexity behind an HTTP RESTful API that uses JSON as the data exchange format. Using Elasticsearch, you can store, search, and analyze large volumes of data quickly and efficiently. It is especially useful when dealing with semi-structured data such as natural language.

Now that you know what Elasticsearch is, let’s dig a little into its history.

Elasticsearch is a product of the company Elastic, which was founded in 2012. Elasticsearch is one of its major open source products, along with Logstash, Kibana, and Beats. Elastic also provides several commercial products such as Marvel, Shield, Watcher, and Found.

In 2004, Shay Banon created the forerunner to Elasticsearch, called Compass. The rest of its evolution is depicted in the following timeline:

[Figure: timeline of the evolution of Elasticsearch]

In the following section of this blog on what is Elasticsearch, you’ll find out which features make Elasticsearch stand out from the rest.

Advantages Of Elasticsearch

Following are a few of its advantages:

Scalability: Elasticsearch is easy to scale and reliable. This helps simplify complex architectures and saves time when implementing projects.
Speed: Elasticsearch uses distributed inverted indices to find the best matches for your full-text searches. This makes it fast even when searching very large data sets.
Easy-to-use API: Elasticsearch provides simple RESTful APIs and uses schema-free JSON documents, which makes indexing, searching, and querying data easy.
Multilingual: One of Elasticsearch’s most distinctive features is that it is multilingual. It supports documents written in a wide variety of languages, such as Arabic, Brazilian Portuguese, Chinese, English, French, Hindi, and Korean.
Document-Oriented: Elasticsearch stores complex real-world entities as structured JSON documents and indexes all fields by default to make the data searchable. Since the data is not split into rows and columns, you can perform complex full-text searches easily.
Auto-completion: Elasticsearch also provides autocompletion functionality. By predicting a word from just a few characters, autocompletion speeds up human-computer interaction.
Schema-Free: Elasticsearch is schema-free in that it accepts JSON documents without a predefined schema. It tries to detect the data structure, indexes the data, and thus makes it searchable.
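
To make the "inverted index" idea from the Speed point more concrete, here is a minimal sketch in plain Python. It is only a toy illustration of the data structure (a map from each term to the documents that contain it), not how Lucene or Elasticsearch actually implement it; the sample documents and the function names build_inverted_index and search are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, keyed by id.
docs = {
    1: "Elasticsearch is a search engine",
    2: "Lucene is a search library",
    3: "Kibana visualizes data",
}
index = build_inverted_index(docs)
print(sorted(search(index, "search engine")))  # [1]
print(sorted(search(index, "is a search")))    # [1, 2]
```

Because a lookup goes straight from a term to the matching document ids, a query does not have to scan every document, which is the intuition behind the speed claim above.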

Let’s now see how to install Elasticsearch on Windows in the following section of this what is Elasticsearch blog.

Installation

STEP I – Install the latest Java version or, if you already have Java installed, check its version using the java -version command in cmd.

NOTE: Java version must be 7 or higher.

STEP II – Go to https://www.elastic.co/downloads.

STEP III – Click on Download to get the zip file.

STEP IV – Once the file is downloaded, unzip it and extract the contents.

STEP V – Go to elasticsearch-x.y.z > bin.

STEP VI – Inside bin folder, find elasticsearch.bat file and double-click on it to start the Elasticsearch server.

STEP VII – Wait for the server to start.

STEP VIII – Open a browser and go to localhost:9200 to check whether the server is running.

STEP IX – If the browser shows a JSON response from the server, everything is fine.

STEP X – The last thing you need to do is add the Sense (beta) plugin, which acts as a developer’s interface to Elasticsearch.
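
The browser check in STEP VIII can also be done from a script. The sketch below uses only the Python standard library and assumes a local, unsecured server at http://localhost:9200, as in this tutorial; newer installations that enable security by default would need HTTPS and credentials instead. The function names and the sample payload are made up for illustration, and the actual values in the response depend on your installation.

```python
import json
import urllib.error
import urllib.request


def summarize_root_response(body):
    """Pull a few fields out of the JSON returned by the root endpoint."""
    info = json.loads(body)
    return {
        "name": info.get("name"),
        "cluster_name": info.get("cluster_name"),
        "version": info.get("version", {}).get("number"),
    }


def check_server(url="http://localhost:9200", timeout=3):
    """Return a summary dict if the server answers, otherwise None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return summarize_root_response(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return None


# Illustrative payload only; a real server fills in its own values.
sample = (
    '{"name": "node-1", "cluster_name": "elasticsearch", '
    '"version": {"number": "x.y.z"}, "tagline": "You Know, for Search"}'
)
print(summarize_root_response(sample))
```

Calling check_server() while the server from STEP VI is running should return a small dictionary; a result of None suggests the server is not reachable at that address.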

Elasticsearch Basic Concepts

Before diving deeper into Elasticsearch, there are a few concepts you must be familiar with.

Near Real-Time

Elasticsearch is a near real-time search platform, which means it periodically refreshes the set of searchable documents. By default, this happens once per second. Thus, there is a slight latency between the time you index a document and the time it becomes searchable.

Index

An index is a collection of documents that have similar characteristics. Elasticsearch stores its data in one or more indices; in SQL terms, an index is loosely comparable to a database. You write documents to an index and read them from it. An index is identified by a unique name, which must be all lowercase. This name is used to refer to the index when performing operations on the documents in it. A single cluster can contain any number of indices.

Document

In Elasticsearch, a document is the basic unit of information that can be indexed. A document consists of fields, each of which is identified by its name and can contain one or more values. Documents are schema-free and may each have a different set of fields. A document is expressed in JSON (JavaScript Object Notation). An index can store any number of documents.
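
As a quick illustration of what such documents can look like, here is a small sketch that builds two JSON documents in Python. The field names and values are hypothetical; the point is only that a field can hold one or more values and that two documents in the same index need not share the same set of fields.

```python
import json

# Two hypothetical documents; the field names are made up for illustration.
doc_a = {
    "title": "What Is Elasticsearch",
    "tags": ["search", "elk"],  # a field holding more than one value
    "views": 120,
}
doc_b = {
    "title": "Kibana Basics",
    "author": {"name": "Jane"},  # a field that doc_a does not have
}

for doc in (doc_a, doc_b):
    print(json.dumps(doc))

# Only the fields present in both documents.
shared_fields = set(doc_a) & set(doc_b)
print(sorted(shared_fields))  # ['title']
```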

Type

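In Elasticsearch, a type is defined for documents that have a common set of fields. It is a logical category or partition of an index whose semantics are completely up to the user. In older versions you can define more than one type within an index. Note that indices created in Elasticsearch 6.0 or later can hold only a single type, and types were removed entirely in later releases.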

Node

A node is a single instance of the Elasticsearch server that stores data and participates in the cluster’s indexing and searching. A node is identified by a name; by default, a random Universally Unique IDentifier (UUID) is assigned to the node at startup. This name is used for administration purposes: it lets you identify which servers in your network correspond to which nodes in your Elasticsearch cluster.

Cluster

A cluster is a collection of one or more Elasticsearch nodes (servers) that work together. It holds all of your data and provides indexing and search capabilities across all the nodes. This distributed nature makes it possible to handle data that is too large for a single node. Like a node, a cluster is identified by a unique name, which defaults to “elasticsearch”. A node can only be part of a cluster if it is set up to join that cluster by name, which is why the cluster name is so important.

Shards

An index can hold a volume of data that exceeds the capacity of a single server. To solve this problem, Elasticsearch lets you subdivide an index into multiple pieces called shards. You can define the number of shards when creating an index. Each shard is a fully functional and independent “index” that can be hosted on any node within the cluster.
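
One way to picture how documents end up in different shards is a hash-and-modulo scheme: hash a routing key (for example the document id) and take the remainder after dividing by the number of shards. The sketch below shows that general idea only. It uses zlib.crc32 as a stand-in hash, which is an assumption made for this example and not the hash function Elasticsearch uses; the function name pick_shard is also made up.

```python
import zlib


def pick_shard(routing_key, number_of_shards):
    """Map a routing key to a shard number in the range [0, number_of_shards)."""
    # Stand-in hash for illustration; not the one Elasticsearch uses internally.
    digest = zlib.crc32(routing_key.encode("utf-8"))
    return digest % number_of_shards


number_of_shards = 3
doc_ids = ["1", "2", "3", "4", "5", "6"]
placement = {doc_id: pick_shard(doc_id, number_of_shards) for doc_id in doc_ids}
print(placement)
```

With a scheme like this, the same key always maps to the same shard as long as the shard count stays the same, which hints at why the number of shards is something you decide when creating the index.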

Replicas

To guard against failures, such as a shard or node going offline, it’s always recommended to have a failover mechanism. As a solution, Elasticsearch provides replicas. A replica is an additional copy of a shard and can serve queries just like the original shard.
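
To see how shards and replicas add up, here is a small sketch that builds a settings body of the kind you could send when creating an index, using the number_of_shards and number_of_replicas settings, and counts the resulting shard copies. The helper names and the chosen values (3 shards, 1 replica) are just assumptions for this example.

```python
import json


def index_settings(number_of_shards, number_of_replicas):
    """Build a create-index request body with shard and replica counts."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }


def total_shard_copies(number_of_shards, number_of_replicas):
    """Original shards plus all of their replica copies."""
    return number_of_shards * (1 + number_of_replicas)


body = index_settings(number_of_shards=3, number_of_replicas=1)
print(json.dumps(body))
print(total_shard_copies(3, 1))  # 6
```

So an index with 3 shards and 1 replica per shard is spread over 6 shard copies in total, and any of them can answer a query.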

API Conventions

The Elasticsearch REST APIs are accessed using JSON over HTTP. Elasticsearch uses the following conventions throughout the REST API:

Multiple Indices: Most API operations can target multiple indices, which lets you run an operation across several indices with a single request. Some of the notations used for these requests are:
1. Comma-separated notations (demo1,demo2,demo3)
2. Wildcard notations (demo*,de*o2,+demo3,-demo3)
3. _all keyword for all indices
4. URL Query String Parameters (ignore_unavailable, allow_no_indices, expand_wildcards)
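
The notations above all end up in the request URL. As a rough sketch, the helper below assembles the path for a search request from a list of index names and optional query string parameters. The function name search_path is made up for this example, and it only builds the string; it does not send anything to a server.

```python
from urllib.parse import urlencode


def search_path(indices=None, **params):
    """Build a _search path for the given indices (all indices if none given)."""
    target = ",".join(indices) if indices else "_all"
    path = f"/{target}/_search"
    if params:
        path += "?" + urlencode(params)
    return path


print(search_path(["demo1", "demo2", "demo3"]))
# /demo1,demo2,demo3/_search
print(search_path(["demo*"], ignore_unavailable="true", allow_no_indices="true"))
# /demo*/_search?ignore_unavailable=true&allow_no_indices=true
print(search_path())
# /_all/_search
```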
Date Math Support in Index Name: You can search a range of time-series indices by using date math index name resolution. This limits the number of indices being searched, which reduces the load on the cluster and improves performance. The index name is written in a format like: <static_name{date_math_expr{date_format|time_zone}}>
1. static_name: Represents the static text part of the name.
2. date_math_expr: Represents a dynamic date math expression which computes the date dynamically.
3. date_format: Represents the optional format in which the computed date should be rendered.
4. time_zone: Represents the optional time zone.
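
Here is a small sketch of how such a name can be put together and what it stands for. It assumes daily indices with a static prefix such as logstash- (a hypothetical name), the default date format yyyy.MM.dd, and whole-day offsets only. The resolve_locally function merely imitates on the client side what the server would compute, so treat it as an illustration rather than the actual resolution logic. Since the expression contains characters like <, {, and /, it is percent-encoded before being placed in a URL.

```python
from datetime import date, timedelta
from urllib.parse import quote


def date_math_name(static_name, days_back=0):
    """Build a date math index name rounded to the day, optionally N days back."""
    expr = "now/d" if days_back == 0 else f"now/d-{days_back}d"
    return f"<{static_name}{{{expr}}}>"


def encode_for_url(name):
    """Percent-encode the expression so it can be used in a request path."""
    return quote(name, safe="")


def resolve_locally(static_name, today, days_back=0):
    """Imitate the resolved name, assuming the yyyy.MM.dd date format."""
    day = today - timedelta(days=days_back)
    return f"{static_name}{day:%Y.%m.%d}"


name = date_math_name("logstash-")
print(name)                  # <logstash-{now/d}>
print(encode_for_url(name))  # %3Clogstash-%7Bnow%2Fd%7D%3E
print(resolve_locally("logstash-", date(2024, 3, 22)))               # logstash-2024.03.22
print(resolve_locally("logstash-", date(2024, 3, 22), days_back=1))  # logstash-2024.03.21
```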
Common Options: A few of the common options are:
1. Pretty Result
2. Human Readable Output
3. Date Math
4. Response Filtering
5. Flat Settings
6. Parameter
7. No Values
8. Time Units
9. Byte Size Units
10. Unit-less quantities
11. Distance Units
12. Fuzziness
13. Enabling Stack Traces
14. Request Body In Query String
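
To give a feel for one of these options, time units, here is a sketch that converts duration strings such as 30s or 2h into milliseconds. It covers only a subset of the suffixes (d, h, m, s, ms) and whole numbers, and the function name to_millis is made up for this example; it is not part of any Elasticsearch client.

```python
import re

# Milliseconds per unit for the subset of suffixes handled in this sketch.
_UNIT_TO_MILLIS = {"d": 86_400_000, "h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}


def to_millis(value):
    """Convert a duration string like '30s' or '500ms' to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|d|h|m|s)", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_TO_MILLIS[unit]


print(to_millis("30s"))    # 30000
print(to_millis("2h"))     # 7200000
print(to_millis("500ms"))  # 500
```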
URL-based Access Control: You can use a proxy with URL-based access control to secure access to Elasticsearch indices. For some requests, Elasticsearch lets you specify an index both in the URL and on each individual request within the request body, for example:
1. multi-search
2. multi-get
3. bulk
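
To show what "an index on each individual request within the request body" means, here is a sketch that assembles a bulk request body in Python. Each document is preceded by an action line that names its target index, and every line ends with a newline. The helper name bulk_body, the index names, and the documents are made up for this example, and older versions may expect additional metadata (such as a type) in the action line.

```python
import json


def bulk_body(actions):
    """Build a newline-delimited bulk body from (index_name, doc_id, document) tuples."""
    lines = []
    for index_name, doc_id, document in actions:
        # Action line: says what to do and names the target index for this item.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(document))
    return "\n".join(lines) + "\n"


body = bulk_body([
    ("demo1", "1", {"title": "first"}),
    ("demo2", "2", {"title": "second"}),
])
print(body, end="")
```

Because the target index travels inside the body here, a proxy that looks only at the URL cannot see it. To my knowledge there is a setting, rest.action.multi.allow_explicit_index, that can be set to false to reject such body-level index names, but check the documentation for your version before relying on it.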

This brings us to the end of this blog on what is Elasticsearch. I hope I was able to clearly explain what Elasticsearch is and what its basic components are. For more advanced concepts and practical demonstrations, you can refer to my next blog, Elasticsearch Tutorial.

If you want to get trained in Elasticsearch and wish to search and analyze large datasets with ease, then check out the ELK Stack Training by Edureka, a trusted online learning company with a network of more than 250,000 satisfied learners spread across the globe.

Got a question for us? Please mention it in the comments section and we will get back to you.
