November 11, 2019

#Cassandra Part 1: NoSQL Database

NoSQL Database
  • A NoSQL database, sometimes called "Not Only SQL", is a database that provides a mechanism to store and retrieve data modeled in ways other than the tabular relations used in relational databases.
  • These types of databases are typically schema-free, support easy replication, have a simple API, are eventually consistent, and can handle huge amounts of data.
  • The primary objectives of a NoSQL database are simplicity of design, horizontal scaling, and finer control over availability.
  • SQL was designed as a query language for relational databases, which are usually table-based, much like a spreadsheet. In a relational database, records are stored as rows, and the columns represent the fields of each row. SQL allows us to query within and between the tables of a relational database.
  • NoSQL databases, on the other hand, are more flexible: they allow us to define fields as we create a record.
  • Nested values are common in NoSQL databases. We can have hashes, arrays, and objects, and then nest more objects, arrays, and hashes within those.
  • Fields are also not standardized between records: every record in a NoSQL database can have a different structure.
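
As a rough illustration of the flexible, nested records described above, the following Python sketch models a collection as a plain list of dictionaries. The `users` collection and its field names are made up for this example and do not correspond to any particular database's API.

```python
# Hypothetical "users" collection: each record is a free-form document.
users = [
    {"id": 1, "name": "Asha", "emails": ["asha@example.com"]},
    {
        "id": 2,
        "name": "Ravi",
        # A nested object that itself holds an array of objects.
        "address": {
            "city": "Pune",
            "phones": [{"type": "mobile", "number": "555-0100"}],
        },
    },
    {"id": 3, "nickname": "zed"},  # this record has no "name" field at all
]


def field_names(records):
    """Return the set of top-level fields used by each record."""
    return [set(record) for record in records]


def cities(records):
    """Collect the city from every record that happens to have an address."""
    return [r["address"]["city"] for r in records if "address" in r]


print(field_names(users))
print(cities(users))
```

Because no schema is enforced, code that reads such records has to check whether a field exists before using it, as `cities` does.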

Difference between a NoSQL database and a relational database
  • A relational database supports a powerful query language (SQL), whereas a NoSQL database typically supports a much simpler query language.
  • A relational database has a fixed schema; a NoSQL database has no fixed schema.
  • A relational database follows ACID (Atomicity, Consistency, Isolation, and Durability), whereas a NoSQL database is typically only 'eventually consistent'.
  • A relational database supports transactions, whereas a NoSQL database typically does not support them, or supports them only in a limited form.

Name the different types of NoSQL databases.
There are four main types of NoSQL database:
  • Key-value store
  • Document store
  • Column store
  • Graph database
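
To make the four types a little more concrete, here is a minimal Python sketch of how the same fact (a user who lives in a city) might be shaped in each model. The structures, keys, and the `LIVES_IN` label are assumptions chosen for illustration; real products differ in many details.

```python
import json

# Key-value store: an opaque value looked up by a single key.
key_value = {"user:42": json.dumps({"name": "Asha", "city": "Pune"})}

# Document store: the value is a structured document with nested fields.
document = {"_id": 42, "name": "Asha", "address": {"city": "Pune"}}

# Column store: a row key maps to a set of named columns.
column_family = {"42": {"name": "Asha", "city": "Pune"}}

# Graph database: nodes plus labeled edges between them.
graph = {
    "nodes": {42: {"name": "Asha"}, 7: {"name": "Pune"}},
    "edges": [(42, "LIVES_IN", 7)],
}


def city_of(user_id):
    """Follow the LIVES_IN edge from a user node to the city's name."""
    for source, label, target in graph["edges"]:
        if source == user_id and label == "LIVES_IN":
            return graph["nodes"][target]["name"]
    return None


print(json.loads(key_value["user:42"])["city"])
print(document["address"]["city"])
print(column_family["42"]["city"])
print(city_of(42))
```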

Distributed Database
  • Distributed means splitting data or tasks across multiple machines.
  • In Cassandra, no single node (a machine in a cluster is usually called a node) holds all the data; each node holds just a chunk of it.
  • The main advantage is that we are not limited by the storage and processing capabilities of a single machine. If the data grows in the future, we can add more machines.
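
The idea of each node holding only a chunk of the data can be sketched with simple hash-based placement. This is a deliberately simplified model (hash modulo the node count), not Cassandra's actual partitioner, and the node names are hypothetical.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def node_for(key, nodes):
    """Pick the node responsible for a key by hashing the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys, nodes):
    """Group keys by the node that would store them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


keys = [f"user:{i}" for i in range(12)]
placement = distribute(keys, NODES)
for node, held in placement.items():
    print(node, held)
```

Every key lands on exactly one node, so adding nodes spreads the same data over more machines. With plain modulo placement, changing the node count moves most keys, which is one reason real systems use more careful schemes.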

High Availability
  • A high-availability system is one that is ready to serve any request at any time.
  • It is usually achieved by adding redundancy, so that if one part fails, another part of the system can serve the request without the client noticing.
  • Cassandra is robust software: nodes joining and leaving the cluster are handled automatically.
  • With proper settings, Cassandra can be made failure-resistant, so that no data is lost when individual servers fail.

Replication
  • Replication is achieved by frequently copying data from a database on one computer/server to a database on another, so that all users share the same level of information.
  • Cassandra has a powerful replication mechanism.
  • It treats every node in the same manner and has no master-slave concept. In Cassandra, data need not be written to a specific server (a master), and we need not wait until the data is written to all the nodes that replicate it (slaves). This means that, depending on the configured consistency level, success can be returned to the client as soon as the data is written to at least one server.
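
The masterless write described above can be modeled in a few lines of Python. This is only a toy sketch: the `Replica` class, the `write` function, and the `required_acks` parameter are invented for illustration, and the code is synchronous, so it models counting acknowledgements rather than returning early.

```python
class Replica:
    """A toy replica that stores key/value pairs in memory."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}

    def write(self, key, value):
        """Store the value and acknowledge it; return False if the node is down."""
        if not self.up:
            return False
        self.data[key] = value
        return True


def write(replicas, key, value, required_acks=1):
    """Send the write to every replica; succeed if enough of them acknowledge."""
    acks = sum(1 for replica in replicas if replica.write(key, value))
    return acks >= required_acks


replicas = [Replica("n1"), Replica("n2", up=False), Replica("n3")]
print(write(replicas, "k", "v"))                   # one acknowledgement is enough
print(write(replicas, "k", "v", required_acks=3))  # n2 is down, so this cannot succeed
```

No replica is special here: any of them can accept the write, and the number of acknowledgements required is a tunable trade-off between latency and consistency.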

Define replication factor.
The data on a node is replicated: it is copied to other nodes to ensure fault tolerance. The replication factor is the total number of copies of each piece of data kept on different nodes.
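
As a small worked example, the sketch below picks the nodes that would hold the copies of a key for a given replication factor by walking around an ordered list of nodes. The ring layout, node names, and the rule that the replication factor may not exceed the node count are assumptions of this sketch, not a description of Cassandra's internals.

```python
import hashlib

RING = ["n1", "n2", "n3", "n4", "n5"]  # hypothetical nodes, in ring order


def replicas_for(key, ring, replication_factor):
    """Return the nodes that hold the copies of `key`."""
    if replication_factor > len(ring):
        raise ValueError("replication factor exceeds the number of nodes")
    # Hash the key to find the first node, then walk clockwise for the rest.
    start = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(replication_factor)]


print(replicas_for("user:1", RING, 1))  # a single copy
print(replicas_for("user:1", RING, 3))  # three copies on three different nodes
```

With a replication factor of 3, any two of the chosen nodes can fail and one copy of the key still survives.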

Define replication strategy.
A replication strategy defines how the replicas are placed in a cluster. There are two main replication strategies: SimpleStrategy and NetworkTopologyStrategy.
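
In CQL, the strategy and replication factor are chosen per keyspace. The Python helper below only builds the statement text, so it can be read without a running cluster; the keyspace names (`demo_simple`, `demo_multi_dc`) and datacenter names (`dc1`, `dc2`) are placeholders. SimpleStrategy takes a single `replication_factor`, while NetworkTopologyStrategy takes a replica count per datacenter.

```python
def cql_map(options):
    """Render a dict as a CQL map literal, quoting keys and string values."""
    parts = []
    for key, value in options.items():
        rendered = f"'{value}'" if isinstance(value, str) else str(value)
        parts.append(f"'{key}': {rendered}")
    return "{" + ", ".join(parts) + "}"


def create_keyspace_cql(name, replication):
    """Build a CREATE KEYSPACE statement for the given replication options."""
    return f"CREATE KEYSPACE {name} WITH replication = {cql_map(replication)};"


simple = create_keyspace_cql(
    "demo_simple",
    {"class": "SimpleStrategy", "replication_factor": 3},
)
multi_dc = create_keyspace_cql(
    "demo_multi_dc",
    {"class": "NetworkTopologyStrategy", "dc1": 3, "dc2": 2},
)
print(simple)
print(multi_dc)
```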

Can we change the replication factor on a live cluster?
Yes, but a repair must then be run so that the existing data is re-replicated to match the new replica count.
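
The procedure can be sketched as a list of commands: one `ALTER KEYSPACE` statement, followed by a repair on each node. The helper below only generates the command text; the keyspace name `demo`, the node names, and the choice of SimpleStrategy are assumptions, and the exact `nodetool repair` options to use depend on the Cassandra version.

```python
def change_replication_factor_steps(keyspace, new_rf, nodes):
    """List (where to run, command) pairs for changing a keyspace's replication factor."""
    alter = (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {new_rf}}};"
    )
    steps = [("cqlsh", alter)]
    # Repair every node so existing data is copied to its new replicas.
    steps += [(node, f"nodetool repair {keyspace}") for node in nodes]
    return steps


for where, command in change_replication_factor_steps("demo", 3, ["n1", "n2", "n3"]):
    print(f"[{where}] {command}")
```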

-K Himaanshu Shuklaa..
