November 11, 2019

#Cassandra Part 3

Define commit log.
It is a crash-recovery mechanism. Every write operation is appended to the commit log on disk before it is applied in memory, so if a node crashes before its in-memory data has been written out, the commit log is replayed on restart to recover those writes.

Explain the concept of tunable consistency in Cassandra. Name the types of tunable consistency.
Consistency is about keeping the rows of Cassandra data synchronized and up to date across all of their replicas. Cassandra's tunable consistency lets us choose, per operation, the consistency level (how many replicas must respond to a read or acknowledge a write) that best suits our use case. Cassandra supports two types of consistency: eventual consistency and strong consistency.

Eventual consistency guarantees that, if no new updates are made to a given data item, all accesses will eventually return the last updated value. A system that has reached this state is said to have achieved replica convergence.

Strong consistency is achieved when the following condition holds:
R + W > N
where N is the number of replicas, and W and R are the numbers of replicas that must acknowledge a write and respond to a read, respectively, for the operation to succeed.
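
The condition works because any set of R replicas and any set of W replicas out of N must share at least one replica when R + W > N, so every read touches at least one replica that has the latest write. The Python sketch below is not Cassandra code; it checks that overlap by brute force. The function names are hypothetical, and quorum() assumes the usual majority rule (n // 2 + 1).

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Smallest majority of n replicas (the usual QUORUM size)."""
    return n // 2 + 1


def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """The condition from the text: R + W > N."""
    return r + w > n


def every_read_sees_latest_write(n: int, r: int, w: int) -> bool:
    """Brute force: does every possible read set of r replicas overlap
    every possible write set of w replicas?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


for r, w in [(1, 1), (quorum(3), quorum(3)), (1, 3), (3, 1)]:
    print(f"N=3 R={r} W={w}: R+W>N is {is_strongly_consistent(3, r, w)}, "
          f"overlap guaranteed: {every_read_sees_latest_write(3, r, w)}")
```

With three replicas, reading and writing at quorum (2 + 2 > 3) is strongly consistent, while reading and writing at one replica each (1 + 1 ≤ 3) is only eventually consistent.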

Describe Memtable.
Memtables are in-memory, write-back caches that hold data in key and column format. The data in a memtable is sorted by key, and each column family (table) has its own memtable, from which column data is retrieved by key. A memtable accumulates writes until it is full and then flushes them to disk as an SSTable.
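
The buffer-then-flush behaviour can be sketched as a toy Python class. This is an illustration under simplifying assumptions, not Cassandra's implementation: the Memtable class and the max_rows threshold are hypothetical (Cassandra flushes based on memory usage, not a row count), and the sketch sorts only at flush time, whereas a real memtable keeps its data sorted as writes arrive.

```python
class Memtable:
    """Toy memtable for a single table (column family)."""

    def __init__(self, max_rows: int = 3):
        self.max_rows = max_rows  # hypothetical flush threshold, counted in rows
        self.rows = {}            # row key -> {column name: value}

    def put(self, key, column, value):
        """Buffer a write; return the flushed, key-sorted rows when full, else None."""
        self.rows.setdefault(key, {})[column] = value
        if len(self.rows) >= self.max_rows:
            return self.flush()
        return None

    def get(self, key, column):
        """Read column data via the row key, straight from memory."""
        return self.rows.get(key, {}).get(column)

    def flush(self):
        """Emit the rows sorted by key (the future SSTable) and start empty again."""
        flushed = [(key, dict(sorted(cols.items()))) for key, cols in sorted(self.rows.items())]
        self.rows = {}
        return flushed


memtable = Memtable(max_rows=3)
memtable.put("carol", "city", "Oslo")
memtable.put("alice", "city", "Lima")
print(memtable.get("alice", "city"))  # Lima
flushed = memtable.put("bob", "city", "Pune")
print(flushed)  # [('alice', {'city': 'Lima'}), ('bob', {'city': 'Pune'}), ('carol', {'city': 'Oslo'})]
```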

What is an SSTable? How is it different from a table in a relational database?
SSTable stands for ‘Sorted String Table.’ It is Cassandra's on-disk data file, and it is where regularly flushed memtables are written.

SSTables are stored on disk, and each Cassandra table typically has several of them. SSTables are immutable, i.e., no data can be added to or removed from an SSTable once it has been written. For each SSTable, Cassandra also creates separate component files, such as a partition index, a partition summary, and a Bloom filter.
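
Because the rows in an SSTable are sorted by key and never change, a key can be found by binary search instead of a full scan. The sketch below is a hypothetical in-memory stand-in for an SSTable (the class name and its tuple-based layout are assumptions); the sorted key tuple plays a role loosely similar to the partition index. An update or delete never rewrites this object; in Cassandra it goes to a newer SSTable, and the versions are reconciled on read and during compaction.

```python
from bisect import bisect_left


class SSTable:
    """Toy SSTable: rows sorted by key once, then never modified."""

    def __init__(self, rows):
        ordered = sorted(rows, key=lambda row: row[0])
        self._keys = tuple(key for key, _ in ordered)        # sorted keys (index stand-in)
        self._values = tuple(value for _, value in ordered)  # tuples cannot be changed in place

    def get(self, key):
        """Binary-search the sorted keys instead of scanning every row."""
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None


sstable = SSTable([("carol", "Oslo"), ("alice", "Lima"), ("bob", "Pune")])
print(sstable.get("bob"))   # Pune
print(sstable.get("dave"))  # None
```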

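What is a Bloom Filter?
A Bloom filter is a probabilistic data structure associated with each SSTable. It is stored off-heap (outside the Java heap, in native memory), and Cassandra checks it to determine whether an SSTable might contain the requested data before performing any disk I/O. A Bloom filter can return false positives but never false negatives, so an SSTable whose filter answers "no" can be skipped entirely.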
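
A minimal Bloom filter can be sketched in Python as a bit array plus several hash functions. This is a hedged illustration, not Cassandra's implementation: the class, its sizes, and the use of SHA-256 with a seed prefix to derive several hash positions are assumptions chosen for simplicity (Cassandra uses its own hashing and sizing).

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: k hash positions per key in an m-bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str):
        # Derive num_hashes positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent": the SSTable can be skipped without disk I/O.
        return all(self.bits[pos] for pos in self._positions(key))


bloom = BloomFilter()
for key in ["alice", "bob", "carol"]:
    bloom.add(key)
print(bloom.might_contain("alice"))  # True: keys that were added are never missed
print(bloom.might_contain("zoe"))    # almost certainly False, but a false positive is possible
```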

How does Cassandra write?
Cassandra is designed for fast writes. It performs a write in two steps: first it appends the write to the commit log on disk, and then it applies it to an in-memory structure known as the memtable. The write is considered successful only after both steps succeed. When a memtable fills up, it is flushed to disk as an SSTable (sorted string table).
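
The ordering of the two steps, and why it makes crash recovery possible, can be sketched as follows. The Node class is hypothetical and a Python list stands in for the on-disk commit log; flushing is omitted (in Cassandra, commit log segments can be discarded once the memtable data they cover has been flushed to SSTables).

```python
import json


class Node:
    """Toy single-node write path: commit log first, then memtable."""

    def __init__(self, commit_log=None):
        # A list stands in for the on-disk, append-only commit log.
        self.commit_log = commit_log if commit_log is not None else []
        self.memtable = {}  # in-memory only: lost on a crash

    def write(self, key, value):
        self.commit_log.append(json.dumps({"key": key, "value": value}))  # 1. durable log
        self.memtable[key] = value                                        # 2. memtable
        return "ack"  # acknowledged only after both steps succeed

    def replay(self):
        """Rebuild the memtable from the commit log after a restart."""
        for line in self.commit_log:
            entry = json.loads(line)
            self.memtable[entry["key"]] = entry["value"]


node = Node()
node.write("alice", "Lima")
node.write("alice", "Quito")
node.write("bob", "Pune")

# Simulate a crash: the memtable is lost, but the commit log survives on disk.
restarted = Node(commit_log=node.commit_log)
restarted.replay()
print(restarted.memtable)  # {'alice': 'Quito', 'bob': 'Pune'}
```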

What are the differences between a node, a cluster, and a data center in Cassandra?
A node is a single machine running Cassandra, whereas a cluster is the full collection of nodes that together store the database's data.

A data center is a group of related nodes within a cluster. Data centers are useful when serving customers in different geographical areas, because the nodes of a cluster can be grouped into different data centers, for example by location.

Explain the concept of compaction in Cassandra.
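Compaction is a maintenance process in which Cassandra merges SSTables and reorganizes their data to optimize the data structures on disk: obsolete versions of data are discarded, and the number of SSTables a read has to check is reduced. Because every memtable flush creates a new SSTable, compaction is what keeps the number of SSTables under control.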

There are two types of compaction in Cassandra.
  • Minor compaction starts automatically when a new SSTable is created. Cassandra merges similarly sized SSTables into one (this is how the default size-tiered compaction strategy works).
  • Major compaction is triggered manually with nodetool compact. It compacts all the SSTables of a column family into one.
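
At its core, compaction (minor or major) is a merge of several key-sorted SSTables into one, in which only the newest version of each key survives (last write wins by timestamp). The sketch below models each SSTable as a hypothetical list of (key, timestamp, value) cells and uses Python's heapq.merge to stream the sorted inputs; it ignores tombstones, timestamp ties, and on-disk formats.

```python
import heapq


def compact(*sstables):
    """Merge key-sorted SSTables of (key, timestamp, value) cells into one,
    keeping only the newest cell for each key (last write wins)."""
    compacted = []
    # heapq.merge streams the already-sorted inputs; ordering by (key, -timestamp)
    # puts the newest version of each key first.
    for key, ts, value in heapq.merge(*sstables, key=lambda cell: (cell[0], -cell[1])):
        if compacted and compacted[-1][0] == key:
            continue  # an older version of a key we already kept
        compacted.append((key, ts, value))
    return compacted


older = [("alice", 1, "Lima"), ("bob", 1, "Pune")]
newer = [("alice", 5, "Quito"), ("carol", 3, "Oslo")]
print(compact(older, newer))
# [('alice', 5, 'Quito'), ('bob', 1, 'Pune'), ('carol', 3, 'Oslo')]
```

The merged output is itself sorted by key, so it can be written out as a single new SSTable, and the input SSTables can then be deleted.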
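
What is a Tombstone in Cassandra?
A tombstone is a marker written in place of deleted data, such as a deleted column. Because SSTables are immutable, a delete cannot remove data in place; instead, the tombstone shadows the older data. The shadowed data is discarded during compaction, and the tombstone itself is purged once it is older than the table's gc_grace_seconds. Tombstones are important because Cassandra is eventually consistent: a replica that missed a delete could otherwise return the deleted data again, so the tombstone must survive long enough for the deletion to reach every replica.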
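
The sketch below shows how a tombstone hides older data on read and when compaction may drop it. The function names are hypothetical, and timestamps are simplified to seconds (Cassandra's write timestamps are in microseconds); only the 10-day default for gc_grace_seconds comes from Cassandra.

```python
TOMBSTONE = object()        # marker written in place of deleted data
GC_GRACE_SECONDS = 864_000  # Cassandra's default gc_grace_seconds (10 days)


def read(cells):
    """Resolve one key from (timestamp, value) cells gathered across SSTables."""
    if not cells:
        return None
    _, value = max(cells, key=lambda cell: cell[0])  # newest cell wins
    return None if value is TOMBSTONE else value


def compact_key(cells, now, gc_grace=GC_GRACE_SECONDS):
    """What compaction keeps for one key: only the newest cell, and no
    tombstone that is older than the grace period."""
    if not cells:
        return []
    ts, value = max(cells, key=lambda cell: cell[0])
    if value is TOMBSTONE and now - ts > gc_grace:
        return []            # tombstone and the data it shadowed are both gone
    return [(ts, value)]     # older, shadowed cells are discarded


cells = [(100, "Lima"), (200, TOMBSTONE)]  # written at t=100, deleted at t=200
print(read(cells))  # None: the tombstone hides "Lima"
print(compact_key(cells, now=300) == [(200, TOMBSTONE)])   # True: still within grace
print(compact_key(cells, now=200 + GC_GRACE_SECONDS + 1))  # []
```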

What is a Super Column in Cassandra?
A super column is a special element that groups related columns together. It is a key–value pair whose value is a set of columns.

It is a sorted array of columns, and the data follows this hierarchy: keyspace > column family > super column > column (often illustrated as nested JSON).

Like row keys, super columns hold no independent values of their own; they are only used to group other columns. Note that the super column keys that appear in different rows do not have to match.

What is the difference between a Column and a Super Column?
Both are tuples consisting of a name and a value. However, a column's value is a string, while a super column's value is a map of columns, which can hold different data types.

Unlike a column, a super column has no third component, the timestamp.
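
The two structures can be modelled with Python dataclasses to make the difference visible. This is a hypothetical model, not a Cassandra API: Column carries a name, a value, and a timestamp, while SuperColumn carries only a name and a map of columns. The nested dictionary at the end also includes the row key level, which sits between the column family and the super column.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Column:
    name: str
    value: str
    timestamp: int  # the third component that super columns lack


@dataclass
class SuperColumn:
    name: str
    # No value or timestamp of its own: it only groups other columns.
    columns: Dict[str, Column] = field(default_factory=dict)

    def add(self, column: Column) -> None:
        self.columns[column.name] = column


# keyspace > column family > row key > super column > column
home = SuperColumn("home_address")
home.add(Column("city", "Oslo", timestamp=1))
home.add(Column("zip", "0150", timestamp=1))
keyspace = {"Users": {"alice": {home.name: home}}}
print(sorted(keyspace["Users"]["alice"]["home_address"].columns))  # ['city', 'zip']
```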

Name the management tools in Cassandra.
DataStax OpsCenter and SPM.

DataStax OpsCenter is a web-based management and monitoring solution for Cassandra clusters and DataStax Enterprise. It is free to download, and an additional edition of OpsCenter is also available.

SPM primarily monitors Cassandra metrics along with various OS and JVM metrics. Besides Cassandra, SPM also monitors Hadoop, Spark, Solr, Storm, ZooKeeper, and other Big Data platforms. The main features of SPM include correlation of events and metrics, distributed transaction tracing, real-time graphs with zooming, anomaly detection, and heartbeat alerting.

-K Himaanshu Shuklaa..
