Elasticsearch Interview Questions And Answers

Top 22 Elasticsearch Interview Questions And Answers

1) What is Elasticsearch?

Elasticsearch is a search engine based on Apache Lucene, a free, open-source information retrieval library written in Java. It provides a distributed, full-text search engine with an HTTP web interface and schema-free JSON documents. It is one of the most popular enterprise search engines.

Elasticsearch is developed in Java. It was created by Shay Banon and first released in 2010 as open source under the Apache License 2.0; since version 7.11 the official distribution has been released under the Elastic License and SSPL instead, with AGPLv3 added as a further option in 2024. It can be used to search all kinds of documents. It provides scalable, near real-time search and supports multitenancy, letting you store, search, and analyze large volumes of data quickly. It is normally used as the underlying engine that powers applications with complex search features and requirements and large amounts of data.


2) What are the features of Elasticsearch?

Following are the key features of Elasticsearch:

● Elasticsearch is scalable up to petabytes of structured and unstructured data.
● It uses denormalization to improve search performance.
● Elasticsearch is one of the most popular enterprise search engines and is used by many big organizations.
● It was originally released as open source under the Apache licence (newer versions use different licence terms).

Many big organizations across the globe use Elasticsearch as their key enterprise search engine to deal with the problem of big data and to let users explore very large volumes of data at very high speed. Some of the organizations using Elasticsearch are StackOverflow, Wikipedia, The Guardian, GitHub etc.


3) What are the core concepts of Elasticsearch?

Following are the core concepts of Elasticsearch:
● Near Realtime (NRT)
● Cluster
● Node
● Index
● Document
● Shards and Replicas


4) What is Near Realtime (NRT) in Elasticsearch?

Elasticsearch offers Near Realtime (NRT) search, i.e. there is only a slight delay between the time a document is indexed and the time it becomes available for search. With default settings this latency is about one second, because the index is refreshed once per second.
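
The delay described above comes from Elasticsearch's periodic index refresh, which is controlled by the `index.refresh_interval` setting (1s by default). As a minimal sketch, the Python snippet below only builds the JSON body that could be sent with `PUT /schools/_settings` to change that interval. The helper name `refresh_settings_body` is hypothetical, the `schools` index is borrowed from the examples later in this article, and nothing is sent to a cluster.

```python
import json

def refresh_settings_body(interval: str) -> str:
    """Build the JSON body for an index-settings update that changes
    how often newly indexed documents become visible to search."""
    return json.dumps({"index": {"refresh_interval": interval}})

# Body for: PUT /schools/_settings  (index name assumed for illustration)
body = refresh_settings_body("5s")
print(body)
```

A longer interval usually trades search freshness for indexing throughput, so the value chosen here is only an example.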


5) What is a Cluster in Elasticsearch?

A cluster is a collection of one or more nodes that together hold the data and provide indexing and search capabilities.


6) What is the default name of a cluster in Elasticsearch?

The default name of a cluster is 'elasticsearch'.

This name is important because a node can only be part of a cluster if it is set up to join the cluster by that name.


7) What is a Node in Elasticsearch?

A node is a single server that is part of the cluster, stores data, and participates in the cluster's indexing and search capabilities.

Multiple nodes work together to form an Elasticsearch cluster.


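8) What is the default name of a node in Elasticsearch?

The default name depends on the version. Older releases derive it from a random Universally Unique Identifier (UUID) assigned to the node at startup, while version 7.0 and later use the machine's hostname as the default node name.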


9) What is an Index in Elasticsearch?

An index is a collection of documents that have similar characteristics.

A single cluster can contain as many indexes as you want.


10) What are Shards in Elasticsearch?

Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent "index" that can be hosted on any node in the cluster.


11) Why are Shards important in Elasticsearch?

Following are the reasons why shards are important in Elasticsearch:
● They allow you to horizontally split/scale the content volume.
● They allow you to distribute and parallelize operations across shards, potentially on multiple nodes.
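
How documents end up spread across shards can be illustrated with a toy routing function: hash a routing key (by default the document ID) and take the result modulo the number of primary shards. The sketch below is a simplification under stated assumptions. Elasticsearch uses its own hash function and a slightly more involved formula, so `zlib.crc32` is only a stand-in, and `pick_shard` is a hypothetical name.

```python
import zlib
from collections import Counter

def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Toy routing: hash the routing key and take it modulo the shard count.
    zlib.crc32 is a stand-in hash for illustration, not Elasticsearch's own."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards

# Route 1000 made-up document IDs onto 5 primary shards and count the spread.
doc_ids = [str(i) for i in range(1000)]
placement = Counter(pick_shard(doc_id, 5) for doc_id in doc_ids)
print(sorted(placement.items()))
```

The same idea suggests why the number of primary shards is normally fixed when an index is created: changing the divisor would send existing routing keys to different shards.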


12) What are Replicas in Elasticsearch?

Elasticsearch allows you to make one or more copies of your index's shards into what are called replica shards, or replicas for short.

This provides high availability and fault tolerance.


13) Why are Replicas important in Elasticsearch?

Following are the reasons why replicas are important in Elasticsearch:
● They provide high availability in case a shard or node fails.
● They allow you to scale out search volume/throughput, since searches can be executed on all replicas in parallel.


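14) How many shards and replicas are allocated to an index by default in Elasticsearch?

In versions before 7.0, each index in Elasticsearch is allocated 5 primary shards and 1 replica by default. From version 7.0 onward, the default is 1 primary shard and 1 replica.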
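
These values can also be set explicitly when an index is created. As a rough sketch, the Python below builds a create-index body using the `number_of_shards` and `number_of_replicas` settings and works out how many shard copies result. The helper names are hypothetical, the figures 5 and 1 are simply the ones quoted in the answer above, and no request is actually sent.

```python
import json

def index_settings(primary_shards: int, replicas: int) -> dict:
    """Body for creating an index with explicit shard and replica counts."""
    return {"settings": {"number_of_shards": primary_shards,
                         "number_of_replicas": replicas}}

def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Every primary shard exists once, plus `replicas` extra copies of it."""
    return primary_shards * (1 + replicas)

body = index_settings(5, 1)
print(json.dumps(body))
print(total_shard_copies(5, 1))  # 5 primaries + 5 replica shards = 10
```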


15) What are the advantages of Elasticsearch?

Following are the key advantages of Elasticsearch:

● Elasticsearch is developed in Java, so it is compatible with almost every platform.
● Elasticsearch is near real time; in other words, an added document becomes searchable after one second or so.
● It is distributed, so it scales easily and can be integrated into any big organization.
● Backups are easy to take using snapshots.


16) What are the disadvantages of Elasticsearch?

Following are the disadvantages of Elasticsearch:

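● It does not have multi-format support for request and response data: these are handled mainly as JSON, unlike Apache Solr, which also handles formats such as CSV and XML.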
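● Older versions have no built-in authentication or authorization system; basic security features are included in the free distribution only since versions 6.8 and 7.1.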
● Elasticsearch is not an ACID compliant system.
● Elasticsearch queries are written in its own JSON-based Query DSL rather than SQL; an SQL interface was added only in version 6.3.


17) What is the Elasticsearch search API?

In Elasticsearch, the search APIs are used to search the content. A user can search either by sending a GET request with the query string as a parameter, or by sending the query in the message body of a POST request. Most of the search APIs are multi-index and multi-type.
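
The two styles can be sketched in Python as follows. The snippet only builds the request path and body and does not contact a cluster. The `schools` index and the `city` field come from the sample data later in this article, while the helper names `uri_search` and `body_search` are hypothetical.

```python
import json
from urllib.parse import urlencode

def uri_search(index: str, query_string: str) -> str:
    """GET-style search: the query travels in the `q` URL parameter."""
    return f"/{index}/_search?" + urlencode({"q": query_string})

def body_search(index: str, field: str, text: str) -> tuple:
    """Request-body search: the query travels as JSON in the message body."""
    path = f"/{index}/_search"
    body = json.dumps({"query": {"match": {field: text}}})
    return path, body

print(uri_search("schools", "city:Delhi"))
print(body_search("schools", "city", "Delhi"))
```

The URI form is handy for quick checks, while the request-body form can express the full Query DSL.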


18) What is multi-index in Elasticsearch?

Elasticsearch allows you to search for documents across all indices or only in some specific indices.
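
In practice the target indices are named in the request path: a comma-separated list selects specific indices, and omitting the index (or using `_all`) searches every index. The small Python sketch below builds such paths. `search_path` is a hypothetical helper, and `colleges` is a made-up index name used next to the `schools` index from the later examples.

```python
def search_path(indices=None) -> str:
    """Build the _search path for all indices, one index, or several."""
    if not indices:
        return "/_search"  # no index given: search every index
    return "/" + ",".join(indices) + "/_search"

print(search_path())                         # all indices
print(search_path(["schools"]))              # a single index
print(search_path(["schools", "colleges"]))  # several specific indices
```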


19) What is multi-type in Elasticsearch?

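Multi-type means that we can search the documents in an index across all types or only in some specified types. This applies to older versions only: mapping types were deprecated in 6.x and removed in later releases, so an index now holds a single document type.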


20) What are aggregations in Elasticsearch?

The aggregations framework collects the data selected by the search query and summarizes it. It consists of many building blocks that can be combined to build complex summaries of the data.

Basic syntax or structure of aggregation is:

"aggregations" : {
   "<aggregation_name>" : {
      "<aggregation_type>" : {
         <aggregation_body>
      }
  
      [,"meta" : { [<meta_data_body>] } ]?
      [,"aggregations" : { [<sub_aggregation>]+ } ]?
   }
}
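
To show how this structure nests, here is a minimal Python sketch that assembles a bucket aggregation with a metrics sub-aggregation as a plain dictionary. The helper `aggregation` and the names `by_city` and `avg_fees` are hypothetical. It assumes the `city` field is mapped in a way that allows a `terms` aggregation, uses `aggs` as the short form of `aggregations`, and sets `"size": 0` so that only aggregation results would be returned.

```python
import json

def aggregation(name, agg_type, body, sub_aggs=None):
    """Return {name: {agg_type: body}} with optional nested sub-aggregations."""
    node = {agg_type: body}
    if sub_aggs:
        node["aggs"] = sub_aggs
    return {name: node}

# Average fees per city: a terms bucket with an avg metric nested inside it.
avg_fees = aggregation("avg_fees", "avg", {"field": "fees"})
by_city = aggregation("by_city", "terms", {"field": "city"}, sub_aggs=avg_fees)
request = {"size": 0, "aggs": by_city}
print(json.dumps(request, indent=2))
```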


21) What are the different types of aggregations?

Following are the different types of aggregations:

● Metrics Aggregations
● Bucket Aggregations


22) What are the different metrics aggregations?

Metrics aggregations help in computing metrics either from field values or from scripts. Following are the different types of metrics aggregations:

● Avg Aggregation
● Cardinality Aggregation
● Extended Stats Aggregation
● Max Aggregation
● Min Aggregation
● Sum Aggregation

Avg Aggregation - It is used to get the average of a numeric field. Example:

Request: 

{
   "aggs":{
      "avg_fees":{"avg":{"field":"fees"}}
   }
}

Response: 

{
   "took":44, "timed_out":false, "_shards":{"total":5, "successful":5, "failed":0},
   "hits":{
      "total":4, "max_score":1.0, "hits":[
         {
            "_index":"schools", "_type":"school", "_id":"2", "_score":1.0,
            "_source":{
               "name":"SPaul School", "description":"ICSES Affiliation",
               "street":"Dawarka", "city":"Delhi", "state":"Delhi", 
               "zip":"110075", "location":[18.5733056, 57.0122136], "fees":5000, 
               "tags":["Good Faculty", "Great Sports"], "rating":"4.5"
            }
         },
   
         {
            "_index":"schools", "_type":"school", "_id":"1", "_score":1.0,
            "_source":{
               "name":"Central School", "description":"CBSEW Affiliation",
               "street":"Sagan", "city":"papola", "state":"HP", "zip":"165715",
               "location":[31.8955385, 76.8380405], "fees":2200, 
               "tags":["Senior Secondary", "beautiful campus"], "rating":"3.3"
            }
         },

         {
            "_index":"schools", "_type":"school", "_id":"4", "_score":1.0,
            "_source":{
               "name":"Central High School", "description":"CBSE Affiliation",
               "street":"Gagan", "city":"papola", "state":"HP", "zip":"868815",
               "location":[41.8955385, 79.8380405], "fees":2200, 
               "tags":["Senior Secondary", "Great infrastructure"], "rating":"3.9"
            }
         },
   
         {
            "_index":"schools", "_type":"school", "_id":"3", "_score":1.0,
            "_source":{
               "name":"Crescent School", "description":"State Board Affiliation",
               "street":"Tonk Road", "city":"Jaipur", "state":"RJ", 
               "zip":"176114", "location":[25.8535922, 35.8923988], "fees":2500, 
               "tags":["Labs"], "rating":"4.5"
            }
         }
      ]
   }, "aggregations":{"avg_fees":{"value":2975.0}}
}

Cardinality Aggregation - It gives an approximate count of the distinct values of a field. Example:

Request body: 
{
   "aggs":{
      "distinct_name_count":{"cardinality":{"field":"name"}}
   }
}

Extended Stats Aggregation - It generates statistics (such as count, min, max, average, sum, variance, and standard deviation) for a specific numeric field. Example:

Request body: 
{
   "aggs" : {
      "fees_stats" : { "extended_stats" : { "field" : "fees" } }
   }
}

Max Aggregation - It gives the maximum value of a numeric field. Example:

Request body: 
{
   "aggs" : {
      "max_fees" : { "max" : { "field" : "fees" } }
   }
}

Min Aggregation - It gives the minimum value of a numeric field. Example:

Request body: 
{
   "aggs" : {
      "min_fees" : { "min" : { "field" : "fees" } }
   }
}

Sum Aggregation - It gives the sum of the values of a numeric field. Example:

Request body: 
{
   "aggs" : {
      "total_fees" : { "sum" : { "field" : "fees" } }
   }
}
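
To make the metrics above concrete, the following sketch computes them locally in plain Python for the four `fees` values that appear in the sample hits of the Avg example, and builds a matching request body. It runs no query against a cluster. `metric_request` is a hypothetical helper, and `"size": 0` is added on the assumption that only the aggregation result is wanted, not the hits.

```python
fees = [5000, 2200, 2200, 2500]  # "fees" values from the sample hits above

def metric_request(name: str, agg_type: str, field: str) -> dict:
    """Request body for a single-field metrics aggregation, without hits."""
    return {"size": 0, "aggs": {name: {agg_type: {"field": field}}}}

# Plain-Python equivalents of what each aggregation would report for `fees`.
local_results = {
    "avg": sum(fees) / len(fees),
    "min": min(fees),
    "max": max(fees),
    "sum": sum(fees),
    "cardinality": len(set(fees)),  # exact here; Elasticsearch approximates
}

print(metric_request("avg_fees", "avg", "fees"))
print(local_results)
```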