Solr is an open-source search platform commonly used to build search applications. It is built on top of Lucene, a full-text search engine library. Solr is enterprise-ready, fast, and highly scalable, and applications built with it can be sophisticated and deliver high performance.
Solr was developed by Yonik Seeley in 2004. In 2006 it was made open source under the Apache License. Solr is a scalable, ready-to-deploy search engine optimized to search large volumes of text-centric data.
It can be used with Hadoop, and like other NoSQL databases it can also be used for storage.
Solr is a non-relational data storage and processing technology.
Following are the features of Solr:
● RESTful API services can be used to communicate with Solr: we submit documents to Solr in file formats such as XML, JSON, and CSV, and get results back in the same formats. So there is no need for Java programming skills while working with Apache Solr.
● Solr provides full-text search features such as tokenization, phrase queries, spell checking, wildcards, and auto-complete.
● Depending on an organization's needs, Solr can be adjusted and deployed on systems of any size (big or small): standalone, distributed, cloud, etc.
● Components of Solr are highly customizable.
● Solr can be used as a big-data-scale NoSQL database, where search tasks can be distributed across a cluster.
● Solr provides an easy-to-use, feature-rich user interface.
● Solr is a highly scalable product.
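As a concrete illustration of the REST-style interface described above, the following Python sketch builds a /select query URL. The host, port, and core name ("techproducts") are assumptions for illustration; the query combines a phrase query with a wildcard filter and asks for JSON output:

```python
from urllib.parse import urlencode

# Base URL of a hypothetical core named "techproducts" on a local Solr.
# (Host, port, and core name are assumptions, not from this document.)
base = "http://localhost:8983/solr/techproducts/select"

params = {
    "q": '"apache solr"',  # phrase query
    "fq": "name:solr*",    # wildcard filter query
    "wt": "json",          # response format: json, xml, or csv
    "rows": 10,            # number of results to return
}
url = base + "?" + urlencode(params)
print(url)
```

Sending a GET request to such a URL (with a running Solr) returns results in the format named by the `wt` parameter, which is how the same data can come back as XML, JSON, or CSV.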
Following are the major building blocks (components) of Apache Solr:
● Request Handler - The requests we send to Apache Solr are processed by these request handlers. The requests might be query requests or index update requests.
● Search Component - A search component is a feature such as spell checking, querying, faceting, or hit highlighting. Search components are registered within search handlers.
● Query Parser - The Apache Solr query parser parses the queries that we pass to Solr and verifies the queries for syntactical errors.
● Response Writer - It generates the formatted output for the user queries. Solr supports response formats such as XML, JSON, CSV, etc.
● Analyzer/tokenizer - Apache Solr analyzes the content, divides it into tokens, and passes these tokens to Lucene. An analyzer in Apache Solr examines the text of fields and generates a token stream.
● Update Request Processor - Whenever we send an update request to Apache Solr, the request is run through a set of plugins (signature, logging, indexing) collectively known as an update request processor chain.
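To show what the output of a response writer looks like, here is a sketch that parses a hand-written sample of Solr's JSON response format; the document fields and values are made up for illustration:

```python
import json

# A minimal, hand-written example of the JSON a Solr response writer
# produces for a /select request (field values are invented).
raw = """
{
  "responseHeader": {"status": 0, "QTime": 3},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "1", "name": "Apache Solr"},
      {"id": "2", "name": "Apache Lucene"}
    ]
  }
}
"""
resp = json.loads(raw)

# status 0 in the responseHeader means the request succeeded.
for doc in resp["response"]["docs"]:
    print(doc["id"], doc["name"])
```

The responseHeader carries request metadata (status, query time), while the response section holds the matching documents, which is the structure a client walks regardless of which components handled the request.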
An instance is the Solr application server running inside a JVM. The Solr home directory holds a reference to each Solr instance, and one or more cores can be configured to run in each instance.
An application that runs multiple indexes can have multiple cores in each instance.
The term $SOLR_HOME refers to the home directory, which contains all the information regarding the cores, along with their indexes, configurations, and dependencies.
In a distributed environment, the data is partitioned among multiple Solr instances, and each chunk of that data is called a shard. A shard contains a subset of the whole index.
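The idea of hash-partitioning an index into shards can be sketched with a toy router. Note this is only an illustration of the concept, not SolrCloud's actual document routing; the shard count and document ids are made up:

```python
import hashlib

NUM_SHARDS = 3  # assumed cluster size, for illustration only

def shard_for(doc_id: str) -> int:
    """Pick a shard by hashing the document id (illustrative only)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Each document lands deterministically on exactly one shard,
# so every shard holds a subset of the whole index.
docs = ["doc-1", "doc-2", "doc-3", "doc-4"]
placement = {d: shard_for(d) for d in docs}
print(placement)
```

Because the hash is deterministic, both indexing and querying can locate a document's shard without a central lookup table.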
ZooKeeper is an Apache project that SolrCloud uses for centralized configuration and coordination, to manage the cluster, and to elect a leader.
In SolrCloud, a copy of a shard that runs in a node is known as a replica.
A leader is also a replica of a shard; it distributes the requests of SolrCloud to the remaining replicas.
A cluster has a logical index that is known as a collection.
In SolrCloud, each single instance of Solr is regarded as a node.
Following are the main configuration files in Apache Solr:
● solr.xml - This file is in the $SOLR_HOME directory and contains SolrCloud-related information.
● schema.xml - It contains the whole schema, defining the fields and field types of the documents to be indexed.
● solrconfig.xml - It contains the definitions and core-specific configurations related to request handling and response formatting.
● core.properties - This file contains the configurations specific to the core.
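For illustration, a core.properties file can be as small as a single name entry, with other properties falling back to defaults; the core name here is just an example:

```
# core.properties — minimal example; unlisted properties use defaults
name=Solr_sample
```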
Following commands need to be used to start Solr:
[Hadoop@localhost ~]$ cd Solr/
[Hadoop@localhost Solr]$ cd bin/
[Hadoop@localhost bin]$ ./solr start
This starts Solr in the background, listening on the default port 8983.
Following command can be used to start Solr in foreground:
[Hadoop@localhost bin]$ ./solr start -f
Following command can be used to start Solr on another port:
[Hadoop@localhost bin]$ ./solr start -p 8984
Following command should be used to stop Solr:
[Hadoop@localhost bin]$ ./solr stop -p 8983
Following command can be used to restart Solr:
[Hadoop@localhost bin]$ ./solr restart -p 8983
Following command can be used to check the status of a running Solr instance:
[Hadoop@localhost bin]$ ./solr status
The status command can be used to search for and report on the Solr instances running on your computer.
A Solr core is a running instance of a Lucene index together with all the Solr configuration files needed to use it. A core must be created to perform operations like indexing and analyzing. A Solr application may contain one or more cores, and if necessary, two cores in an application can communicate with each other.
Here is how to create a schemaless core using the create command:
[Hadoop@localhost bin]$ ./solr create -c Solr_sample
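Once a core exists, documents can be indexed over HTTP through its /update handler. The sketch below only constructs the request for the Solr_sample core without sending it, since sending would require a running server; the host and port assume a default local install:

```python
import json
from urllib.request import Request

# Build (but do not send) an index-update request for the Solr_sample
# core. Host and port assume a default local Solr (an assumption here).
doc = {"id": "1", "title": "Hello Solr"}
req = Request(
    url="http://localhost:8983/solr/Solr_sample/update?commit=true",
    data=json.dumps([doc]).encode("utf-8"),  # updates take a JSON list of docs
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.get_method(), req.full_url)
```

With a running server, passing this request to urllib.request.urlopen would index the document; commit=true makes it searchable immediately.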