November 11, 2019

#Cassandra Part 4

Command To Create Keyspace
CREATE KEYSPACE EmployeeDetails
WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 3};

To verify whether the table is created or not we can use the command Describe. If we use this command over keyspaces, it will display all the keyspaces created.
DESCRIBE keyspaces;


Creating The Table
CREATE TABLE employee_details(
id uuid,
emp_name text,
emp_id int,
emp_designation text,
emp_city text,
emp_level int,
PRIMARY KEY(id));

We need to select our primary keys properly, because they will be used to fetch the data using the select statements in where clause.

Change The Sorting Order
Cassandra gives us the option to change the default sort order of rows on the disk, but we have to do this when we create tables. To do this, we use the CLUSTERING ORDER BY clause. Cassandra does not provide a mechanism to sort query results at query time, so we have to consider sort order when creating a table. Here is an example of ordering by emp_level in descending order.

CREATE TABLE employee_details(
id uuid,
emp_name text,
emp_id int,
emp_designation text,
emp_city text,
emp_level int,
PRIMARY KEY(id))

with CLUSTERING ORDER BY (emp_level DESC)

Creating Indexes
Suppose we want tp fetch employees based on the city name, e.g:

SELECT * FROM employee_details WHERE emp_city='Munich';

We will get below error message when we execute above query (since emp_city is not the primary key).

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

The above error indicates that Cassandra cannot execute this query because it will involve filtering, or scanning through all of the rows, and that can have unpredictable performance implications.

To fetch the employees based on the city, we need to create an index. We can create an index on emp_city using the following command.

CREATE INDEX emp_city_idx ON employee_details(emp_city);

This will create a new index on employee_details tables, which will allow us to fetch employees by using emp_city in where clause.

We need to be extra careful while using secondary indexes, because they create tables which are updated along with the base table on which the index is defined. Both tables need to be updated when data is inserted/ updated, which will in turn lead to additional overhead. It is important not to use secondary indexes on columns with very low or very high numbers of unique values because they'll provide little benefit to querying, but will still incur additional overhead.

-K Himaanshu Shuklaa..

No comments:

Post a Comment