Apache Cassandra is a highly scalable, high-performance distributed database designed to handle large amounts of data across multiple servers while providing high availability with no single point of failure. It is a type of NoSQL database.
● Cassandra was developed at Facebook for inbox search.
● It was open-sourced by Facebook in July 2008.
● Cassandra was accepted into the Apache Incubator in March 2009.
A NoSQL database provides a mechanism to store and retrieve data other than the tabular relations used in relational databases. These databases are typically schema-free, support easy replication, have simple APIs, are eventually consistent, and can handle huge amounts of data.
Following are the primary objectives of NoSQL databases:
● Simplicity of design.
● Horizontal scaling.
● Finer control over availability.
Following are the features of Cassandra:
● Elastic scalability
● Always on architecture
● Flexible data storage
● Fast linear-scale performance
● Easy data distribution
● Transaction support (row-level atomicity and lightweight transactions, not full multi-row ACID transactions)
● Fast writes
The design goal of Cassandra is to handle big data workloads across multiple nodes without any single point of failure. Cassandra uses a peer-to-peer distributed architecture, and data is distributed among all the nodes in a cluster.
● All the nodes in a cluster play the same role. Each node is independent and, at the same time, interconnected with the other nodes.
● Each node in a cluster can accept read and write requests, regardless of where the data is located in the cluster.
● When a node goes down, read/write requests can be served from other nodes in the network. This provides continuous availability.
In Cassandra, one or more nodes in a cluster act as replicas for a given piece of data. If some replicas respond to a read with an out-of-date value, Cassandra returns the most recent value to the client. It then performs a read repair in the background to update the stale replicas.
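The read path described above can be sketched in a few lines of Python. This is a simplified model, not Cassandra's implementation: the `Replica` class and `read_with_repair` function are hypothetical names, each replica stores a `(value, timestamp)` pair per key, and the repair runs inline rather than in the background.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical replica: maps each key to a (value, write_timestamp) pair."""
    name: str
    data: dict


def read_with_repair(replicas, key):
    """Return the newest value for key and bring stale replicas up to date."""
    # Collect the stored cell from every replica that has the key.
    cells = [r.data[key] for r in replicas if key in r.data]
    if not cells:
        return None
    # The cell with the highest write timestamp is treated as the most recent.
    newest_value, newest_ts = max(cells, key=lambda cell: cell[1])
    # Read repair: overwrite any replica whose copy is missing or older.
    # (Assumes timestamps are non-negative; done inline here for simplicity.)
    for r in replicas:
        if r.data.get(key, (None, -1))[1] < newest_ts:
            r.data[key] = (newest_value, newest_ts)
    return newest_value
```

After a call, every replica passed in holds the same newest cell, which is the effect that read repair aims for.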
Cassandra uses the gossip protocol in the background, which allows the nodes to communicate with each other and detect any faulty nodes in the cluster.
Following are the key components of Cassandra:
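As a rough illustration of the gossip idea, the sketch below merges heartbeat tables exchanged between two nodes and flags nodes whose heartbeat has stopped advancing. The function names and the "did not advance between two snapshots" rule are assumptions made for this example; Cassandra's real failure detector is more sophisticated.

```python
def merge_gossip(local, remote):
    """Merge two heartbeat tables (node -> counter), keeping the higher counter."""
    merged = dict(local)
    for node, beat in remote.items():
        if beat > merged.get(node, -1):
            merged[node] = beat
    return merged


def suspected_down(previous, current):
    """Nodes whose heartbeat counter did not advance between two snapshots."""
    return sorted(node for node, beat in current.items()
                  if previous.get(node) == beat)
```

Each node would call `merge_gossip` whenever it exchanges state with a peer, so fresh heartbeats spread through the cluster without any central coordinator.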
● Data center
● Commit log
● Bloom filter
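Of the components listed above, the Bloom filter is the easiest to illustrate in isolation. It is a probabilistic structure that reports a key as either definitely absent or possibly present, which lets a read skip data files that cannot contain the key. The sketch below is a generic Bloom filter, not Cassandra's implementation; the class name, default sizes, and SHA-256-based hashing are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key was never added; True means it may have been.
        return all(self.bits[pos] for pos in self._positions(key))
```

A `False` answer is always reliable, so a lookup can skip the expensive check; a `True` answer still has to be verified against the actual data.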
Users can access Cassandra through its nodes using Cassandra Query Language (CQL). CQL treats the database (keyspace) as a container of tables. Programmers work with CQL either through cqlsh, an interactive prompt, or through separate application language drivers.
A Cassandra database is distributed over several machines that operate together. The outermost container is known as the cluster. For failure handling, every node contains a replica, and in case of a failure, the replica takes charge. Cassandra arranges the nodes of a cluster in a ring and assigns data to them.
A keyspace is the outermost container for data in Cassandra. The basic attributes of a keyspace are:
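The ring arrangement can be pictured as follows: each node holds a token, a key is hashed to a token, and the key belongs to the first node at or after that token, wrapping around at the end. This is a minimal sketch under assumptions: `token_for`, `Ring`, the 32-bit token space, and the MD5 hash are illustrative choices, not Cassandra's actual partitioner.

```python
import bisect
import hashlib


def token_for(key, ring_size=2**32):
    """Hash a partition key to a position on the ring (illustrative hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % ring_size


class Ring:
    def __init__(self, node_tokens):
        # node_tokens: dict of node name -> token owned by that node.
        self._entries = sorted((tok, name) for name, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def owner_of_token(self, token):
        # First node whose token is >= the given token; wrap past the end.
        idx = bisect.bisect_left(self._tokens, token) % len(self._tokens)
        return self._entries[idx][1]

    def owner(self, key):
        return self.owner_of_token(token_for(key))
```

For example, with nodes at tokens 100, 200, and 300, a token of 101 falls to the node at 200, and a token of 301 wraps around to the node at 100.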
● Replication factor
● Replica placement strategy
● Column families
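To show how the replication factor and a placement strategy interact, the sketch below places replicas on the next nodes clockwise around the ring, in the spirit of a simple, single-data-center strategy. The function name and the `(token, node)` list format are assumptions for this example.

```python
import bisect


def simple_replicas(ring, token, replication_factor):
    """Pick replica nodes for a token.

    ring: list of (token, node_name) pairs sorted by token.
    Returns the owner of the token followed by the next nodes clockwise,
    up to replication_factor nodes (capped at the ring size).
    """
    if not ring:
        return []
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```

With four nodes and a replication factor of 3, each piece of data lives on three of the four nodes, so losing any single node still leaves two copies reachable.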
A column family is a container for an ordered collection of rows. Each row, in turn, is an ordered collection of columns. Following are the basic attributes of a column family:
A super column is a special kind of column, so it is also a key-value pair, but its value is a map of sub-columns.
Here are the key points:
● Cassandra deals with unstructured data whereas RDBMS deals with structured data.
● Cassandra has a flexible schema.
● In Cassandra, a table is a list of "nested key-value pairs". (ROW x COLUMN key x COLUMN value)
● A keyspace is the outermost container that contains data corresponding to an application.
● Tables (column families) are the entities of a keyspace.
● A row is the unit of replication in Cassandra.
● A column is the unit of storage in Cassandra.
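The "nested key-value pairs" view from the points above can be mimicked with plain dictionaries: keyspace, then table, then row key, then column name, then column value. The `insert` and `select` helpers are hypothetical names for this sketch and only model the shape of the data, not storage or replication.

```python
def insert(keyspace, table, row_key, column, value):
    """Store one column value under keyspace[table][row_key][column]."""
    keyspace.setdefault(table, {}).setdefault(row_key, {})[column] = value


def select(keyspace, table, row_key, column=None):
    """Return one column value, or a copy of the whole row if column is None."""
    row = keyspace.get(table, {}).get(row_key, {})
    return dict(row) if column is None else row.get(column)
```

Because each row is its own map, two rows of the same table can hold different sets of columns, which is what the flexible schema point refers to.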
The Cluster class is the main entry point of the driver. It belongs to the com.datastax.driver.core package. Following are the key methods of this class:
● Session connect()
● void close()
● static Cluster.Builder builder()
The Cluster.Builder class is used to configure and instantiate a Cluster. Following are the key methods of this class:
● Cluster.Builder addContactPoint(String address)
● Cluster build()
Session is an interface in the driver. It holds the connections to the Cassandra cluster, and you use it to execute CQL queries. It belongs to the com.datastax.driver.core package. Following are the key methods of this interface:
● ResultSet execute(Statement statement)
● ResultSet execute(String query)
● PreparedStatement prepare(RegularStatement statement)
● PreparedStatement prepare(String query)
The prepare methods prepare the provided query, which is supplied either as a RegularStatement or as a query string.
cqlsh stands for Cassandra Query Language shell. By default, Cassandra provides this prompt so that users can communicate with it. Using this shell, you can execute Cassandra Query Language (CQL) statements.
Using cqlsh, you can do the following:
● Define a schema
● Insert data
● Execute a query
cqlsh also provides shell commands, including:
● CONSISTENCY: Shows the current consistency level, or sets a new consistency level.
● EXPAND: Expands the output of a query vertically.
● PAGING: Enables or disables query paging.
● TRACING: Enables or disables request tracing.
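The consistency level that the first command above shows or sets determines how many replicas must respond to a request. As a small worked example, the helpers below compute a quorum for a given replication factor and check the usual rule of thumb that reads see the latest write when read and write replica counts overlap. The function names are illustrative only.

```python
def quorum(replication_factor):
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor
```

With a replication factor of 3, a quorum is 2 replicas, so quorum reads combined with quorum writes overlap, while reading and writing at a single replica each do not.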
Following are the CQL data definition commands:
● Create Keyspace
● Drop Keyspace
● Alter Keyspace
● Create Table
● Drop Table
● Alter Table
● Create Index
● Drop Index
Following are the CQL data manipulation commands: