Top Apache Cassandra Interview Questions You Must Prepare In 2024

Last updated on Mar 18, 2024
Kurt is a Big Data and Data Science Expert, working as a Research Analyst at Edureka. He is keen to work with Machine Learning,...

Schema-less databases are the latest buzzword in the IT world. Geek programmers seem to love their flexibility and low cost, and these attributes have fired up many a start-up. NoSQL databases are schema-agnostic: information can be stored without doing any upfront schema design. With so much demand in the industry for NoSQL, let's have a look at the top Cassandra interview questions you must know if you are going to apply for a NoSQL Database Developer or NoSQL Database Administrator role. You can also check out the details of relational databases, functions, queries, variables, etc. with the SQL Course.

The salary trend for people with Cassandra experience is quite high. So let's begin with the Cassandra interview questions.

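I've divided this blog of Cassandra Interview Questions into 3 parts:

  1. General NoSQL Interview Questions
  2. Beginners Cassandra Interview Questions
  3. Advanced Cassandra Interview Questions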

General NoSQL Interview Questions

1. What are the key features of any NoSQL Database?

Features of NoSQL Database

Feature | Description
Schema Agnostic | Information can be stored without doing any upfront schema design.
Auto-Sharding & Elastic | NoSQL allows the workload to automatically spread across any number of servers.
Highly Distributable | A cluster of servers can be used to hold a single large database.
Easily Scalable | Allows easy scaling to adapt to the data volume and complexity of cloud applications.
Integrated Caching | Cached data in system memory is transparent to the application developers and the operations team.

2. What is a NoSQL Database?

3. What are the different types of NoSQL Databases?

There are mainly 4 types of NoSQL databases: Key-Value Store, Document Store, Column Store and Graph databases.

4. What is Key-Value Store DB? Explain with an example.

All of the data within the database consists of an indexed key and a value. A key may correspond to one or multiple values (like a hash table). Key-value stores provide great performance and can be scaled very easily as per business needs.
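To make the key-value model concrete, here is a minimal in-memory sketch. The class `KeyValueStore` and its methods are hypothetical illustrations, not the API of any real product; a Python dict plays the role of the hash table, and each key maps to a list so one key can hold several values.

```python
from collections import defaultdict


class KeyValueStore:
    """Toy key-value store: a hash table from a key to one or more values."""

    def __init__(self):
        # defaultdict(list) lets a single key accumulate multiple values.
        self._data = defaultdict(list)

    def put(self, key, value):
        # Append rather than overwrite, so one key can hold several values.
        self._data[key].append(value)

    def get(self, key):
        # Hash lookup: average O(1), independent of how many keys are stored.
        return list(self._data.get(key, []))

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:42:name", "Alice")
store.put("user:42:cart", "book")
store.put("user:42:cart", "pen")
print(store.get("user:42:cart"))  # ['book', 'pen']
```

The key naming scheme (`user:42:cart`) is just one common convention; the store itself treats keys as opaque strings.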

5. What is Document Store DB? Explain with an example.

The data record is a JSON/XML representation of key-value pairs, and every record can have a different set of fields.
Document DBs are similar to key-value stores, but the difference is that the value associated with each key is a document.
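A small sketch of the document model, using hypothetical employee records: each key maps to a JSON-like document, the documents do not share a fixed set of fields, and queries can filter on fields inside the documents. The `find` helper is illustrative only and not any database's query API.

```python
import json

# Each record is a JSON-like document keyed by an id; fields may differ per record.
documents = {
    "emp1": {"name": "Alice", "dept": "Sales", "skills": ["CRM", "Excel"]},
    "emp2": {"name": "Bob", "dept": "IT"},  # no "skills" field at all
    "emp3": {"name": "Carol", "dept": "IT", "manager": "emp2"},
}


def find(collection, **criteria):
    """Return ids of documents whose fields match all given criteria."""
    return sorted(
        doc_id
        for doc_id, doc in collection.items()
        if all(doc.get(field) == value for field, value in criteria.items())
    )


print(find(documents, dept="IT"))  # ['emp2', 'emp3']
print(json.dumps(documents["emp3"]))  # the stored document serialized as JSON
```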

6. What is Column Store DB? Explain with an example.

Data is stored in cells that are grouped in columns of data rather than as rows of data. Columns are logically grouped into column families.
One row may have one or multiple data records, which are indexed by a partition key.
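The logical layout of a wide-column store can be sketched as nested maps: column family, then partition key, then columns. This nested-dictionary layout is an assumption made purely for illustration (Cassandra's on-disk format is very different), and the `users`/`logins` data is hypothetical.

```python
from collections import defaultdict

# column_family -> partition_key -> {column_name: value}
table = defaultdict(lambda: defaultdict(dict))


def put(column_family, partition_key, column, value):
    table[column_family][partition_key][column] = value


def get_row(column_family, partition_key):
    # Return the row's columns sorted by name, as a comparator would order them.
    row = table[column_family].get(partition_key, {})
    return sorted(row.items())


put("users", "u1", "name", "Alice")
put("users", "u1", "email", "alice@example.com")
put("users", "u2", "name", "Bob")  # u2 has fewer columns than u1: rows are sparse
put("logins", "u1", "2024-03-18", "ok")

print(get_row("users", "u1"))  # [('email', 'alice@example.com'), ('name', 'Alice')]
```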

7. What is Graph DB? Explain with an example.

This is the type of NoSQL database that uses a flexible graph representation. Its key purpose is to store relationships between nodes.

Here, the nodes are Id 1, 2 and 3, and the properties of node 1 are Name and Age.
The edges are Id 100, 101, 102, 103, 104 and 105.
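A minimal property-graph sketch: nodes carry properties (such as Name and Age), edges carry an id and a label, and queries traverse relationships. All names, ages, edge endpoints and labels below are hypothetical sample data, not a reconstruction of the original example.

```python
from collections import defaultdict

# Nodes carry properties; edges are (edge_id, source, target, label).
nodes = {
    1: {"Name": "Alice", "Age": 30},
    2: {"Name": "Bob", "Age": 25},
    3: {"Name": "Chess Club"},
}
edges = [
    (100, 1, 2, "knows"),
    (101, 2, 1, "knows"),
    (102, 1, 3, "member_of"),
]

# Adjacency list: for each node, its outgoing (label, target) pairs.
adjacency = defaultdict(list)
for edge_id, src, dst, label in edges:
    adjacency[src].append((label, dst))


def neighbours(node_id, label=None):
    """Follow outgoing edges, optionally only those with a given label."""
    return [dst for lbl, dst in adjacency[node_id] if label is None or lbl == label]


print([nodes[n]["Name"] for n in neighbours(1)])  # ['Bob', 'Chess Club']
```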

Beginners Cassandra Interview Questions

8. What is Apache Cassandra?

Apache Cassandra is a free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

9. What are the features of Apache Cassandra?

Apache Cassandra has a lot of features; some of the ones that make it stand out from the crowd are:

10. What are the Different types of Data Model?

There are mainly 3 types/stages of data model:

11. What are the Key Differences between Cassandra and Traditional RDBMS?

12. What are the different Database Elements of Cassandra?

There are 4 main Cassandra Database Elements:

13. What is CQLSH? And why is it used?

cqlsh is the command-line shell for communicating with a Cassandra database using CQL (Cassandra Query Language). By using cqlsh, you can do the following things:

14. What is a YAML file in Cassandra?

The cassandra.yaml file is the main configuration file for Cassandra. After changing properties in the cassandra.yaml file, you must restart the node for the changes to take effect.

15. What are Clusters in Cassandra?

The outermost structure in Cassandra is the cluster. A cluster is a container for keyspaces.
It is sometimes called the ring, because Cassandra assigns data to nodes in the cluster by arranging them in a ring.
Each node holds replicas for a different range of data.
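A minimal sketch of the ring idea: each node is assigned a token, and a partition belongs to the first node clockwise whose token is greater than or equal to the partition's token, wrapping around at the end of the ring. The node names and the small integer tokens are hypothetical and chosen for readability; real tokens come from the partitioner's full range.

```python
import bisect

# Hypothetical 4-node ring: (token, node). Real tokens come from the partitioner.
RING = sorted([(0, "node-A"), (25, "node-B"), (50, "node-C"), (75, "node-D")])
TOKENS = [token for token, _ in RING]


def owner(token):
    """Return the node whose range contains `token` (first node clockwise)."""
    i = bisect.bisect_left(TOKENS, token)
    # Past the last node's token, wrap around to the first node in the ring.
    return RING[i % len(RING)][1]


for t in (10, 25, 60, 90):
    print(t, owner(t))  # 10 node-B, 25 node-B, 60 node-D, 90 node-A
```

With this convention a node owns the range from the previous node's token (exclusive) up to its own token (inclusive).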

16. What is a Keyspace in Cassandra?

A keyspace is the outermost container for data in Cassandra. Like a relational database, a keyspace has a name and a set of attributes that define keyspace-wide behaviour. The keyspace is used to group Column families together.

17. How is a Keyspace created in Cassandra, and what parameters are used?

CREATE KEYSPACE ABC
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'}
AND durable_writes = true;

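The parameters used while creating a keyspace are:

  1. replication: the replica placement strategy ('class') and the replication factor
  2. durable_writes: whether to use the commit log for updates on this keyspace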

18. What are durable writes?

Durable writes provide a means to instruct Cassandra whether or not to use the commit log for updates on the current keyspace.
This option is not mandatory. The default value for durable_writes is true.

19. What do you mean by replication factor?

Cassandra stores copies (called replicas) of each row based on the row key. The replication factor is the number of nodes that hold a copy (replica) of each row of data; for example, a replication factor of 3 means three nodes store each row.

20. What do you mean by replication Strategy?

The replica placement strategy refers to how the replicas will be placed in the ring.
Cassandra ships with different strategies for determining which nodes will get copies of which keys.
There are mainly two types of strategies: SimpleStrategy and NetworkTopologyStrategy.

21. What is Simple Strategy?

It is used for single-datacenter clusters. It places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring, without considering rack or datacenter location.
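The placement rule above can be sketched in a few lines, assuming a hypothetical four-node ring with small integer tokens. The function name `simple_strategy` is illustrative; it mirrors the described behaviour (owner first, then the next nodes clockwise, racks ignored) rather than Cassandra's actual source.

```python
import bisect

# Hypothetical ring of (token, node, rack). SimpleStrategy ignores the rack.
RING = sorted([
    (0, "node-A", "rack1"),
    (25, "node-B", "rack1"),
    (50, "node-C", "rack2"),
    (75, "node-D", "rack2"),
])


def simple_strategy(token, replication_factor):
    """First replica on the token's owner, the rest on the next nodes clockwise."""
    tokens = [t for t, _, _ in RING]
    start = bisect.bisect_left(tokens, token) % len(RING)
    count = min(replication_factor, len(RING))
    return [RING[(start + i) % len(RING)][1] for i in range(count)]


print(simple_strategy(30, 3))  # ['node-C', 'node-D', 'node-A']
```

Note that node-C and node-D share `rack2`; SimpleStrategy does not care, which is why it is meant for single-datacenter setups.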

22. What is Network Topology Strategy?

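It is used when a cluster is deployed across multiple datacenters. It lets you specify how many replicas to place in each datacenter, and within each datacenter it places replicas on distinct racks where possible. This way, reads can be satisfied locally without incurring cross-datacenter latency, and failure scenarios can be handled.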
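A simplified sketch of per-datacenter placement, assuming a hypothetical two-datacenter ring: for each datacenter, walk the ring clockwise from the key's token, prefer nodes on racks not yet used, and fall back to same-rack nodes only if racks run out. This is an approximation of NetworkTopologyStrategy's behaviour for illustration, not its actual implementation.

```python
import bisect
from collections import namedtuple

Node = namedtuple("Node", "token name dc rack")

# Hypothetical two-datacenter ring; tokens, names, DCs and racks are illustrative.
RING = sorted([
    Node(0, "dc1-a", "DC1", "r1"),
    Node(10, "dc2-a", "DC2", "r1"),
    Node(20, "dc1-b", "DC1", "r1"),
    Node(30, "dc2-b", "DC2", "r2"),
    Node(40, "dc1-c", "DC1", "r2"),
    Node(50, "dc2-c", "DC2", "r1"),
])


def network_topology_strategy(token, rf_per_dc):
    """Pick replicas per datacenter, walking clockwise and preferring new racks."""
    start = bisect.bisect_left([n.token for n in RING], token) % len(RING)
    walk = RING[start:] + RING[:start]  # ring order starting at the token's owner
    replicas = []
    for dc, rf in rf_per_dc.items():
        chosen, racks_used, same_rack = [], set(), []
        for node in (n for n in walk if n.dc == dc):
            if len(chosen) == rf:
                break
            if node.rack not in racks_used:
                chosen.append(node)
                racks_used.add(node.rack)
            else:
                same_rack.append(node)  # fallback if there are fewer racks than rf
        chosen.extend(same_rack[: rf - len(chosen)])
        replicas.extend(n.name for n in chosen)
    return replicas


print(network_topology_strategy(15, {"DC1": 2, "DC2": 2}))
# ['dc1-b', 'dc1-c', 'dc2-b', 'dc2-c']
```

For token 0 with two DC1 replicas, the sketch skips `dc1-b` (same rack as `dc1-a`) and picks `dc1-c`, which is the rack-awareness the strategy is known for.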

23. What is a Column Family?

A column family is a container for an ordered collection of rows, each of which is itself an ordered collection of columns. We can freely add any column to any column family at any time, depending on our needs. The comparator value indicates how columns will be sorted when they are returned in a query.

24. What is a Row in Cassandra? and What are the different elements of it?

A row is a collection of sorted columns. It is the smallest unit that stores related data in Cassandra. Any component of a row can store data or metadata.

The different elements/parts of a row are:

25. What is a Primary Key? And what are its different types?

The primary key is the column (or set of columns) used to uniquely identify a row.

There are 3 types of primary keys:

  1. Single primary key
  2. Compound primary key
  3. Composite partition key

These were some beginner-level Cassandra interview questions you must know about.

So, let's move ahead with some advanced Cassandra interview questions.

Advanced Cassandra Interview Questions

26. Differentiate between the various types of Primary Keys in Cassandra.

Single primary key: the single column is also called the partition key. Data is partitioned on the basis of that column, and it is spread across different nodes on the basis of the partition key.

Compound primary key: race_name is the partition key and race_position is the clustering key. Data will be partitioned on the basis of race_name and clustered on the basis of race_position. Clustering is the process that sorts data within the partition. Retrieval of rows is very efficient when the rows for a partition key are stored in order, based on the clustering column.

Composite partition key: race_year and race_name together form the composite partition key, so data is partitioned on the basis of both columns and clustered on the basis of rank. It is used when a single partition would otherwise hold too much data.
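The difference between a compound primary key and a composite partition key can be simulated by grouping rows into partitions and sorting each partition by its clustering column. The rows and cyclist names are hypothetical; the column names follow the race examples above. The CQL forms in the comments are standard PRIMARY KEY syntax.

```python
from collections import defaultdict

# Hypothetical rows using the race columns from the examples above.
rows = [
    {"race_year": 2024, "race_name": "Tour", "race_position": 3, "rank": 3, "cyclist": "C"},
    {"race_year": 2024, "race_name": "Tour", "race_position": 1, "rank": 1, "cyclist": "A"},
    {"race_year": 2023, "race_name": "Tour", "race_position": 2, "rank": 2, "cyclist": "B"},
    {"race_year": 2024, "race_name": "Giro", "race_position": 1, "rank": 1, "cyclist": "D"},
]


def build_partitions(rows, partition_key, clustering_key):
    """Group rows by partition key, then sort each partition by clustering key."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in partition_key)
        partitions[key].append(row)
    for part in partitions.values():
        part.sort(key=lambda r: r[clustering_key])
    return dict(partitions)


# Compound primary key: PRIMARY KEY (race_name, race_position)
by_name = build_partitions(rows, ["race_name"], "race_position")
print([r["cyclist"] for r in by_name[("Tour",)]])  # ['A', 'B', 'C']

# Composite partition key: PRIMARY KEY ((race_year, race_name), rank)
by_year_name = build_partitions(rows, ["race_year", "race_name"], "rank")
print(sorted(by_year_name))  # [(2023, 'Tour'), (2024, 'Giro'), (2024, 'Tour')]
```

Splitting the key on race_year produces more, smaller partitions, which is exactly why a composite partition key helps when one partition grows too large.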

27. Differentiate between Static and Dynamic CQL Tables.

  1. A static table uses a relatively static set of column names and is similar to a relational database table.
  2. A dynamic table allows you to pre-compute result sets and store them in a single row for efficient data retrieval.

28. Differentiate between Drop and Truncate in CQLSH

  1. The DROP TABLE command drops the specified table, including all of its data, from the keyspace.
  2. The TRUNCATE command permanently deletes all the rows of the table, but the table itself remains in the keyspace.

29. What is Gossip Protocol?

Gossip protocol in Cassandra is a peer-to-peer communication protocol in which nodes choose among themselves with whom to exchange state information. The nodes exchange information about themselves and about the other nodes they have gossiped about, so all nodes quickly learn about all other nodes in the cluster.
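A toy simulation of the spreading behaviour, assuming push-pull exchanges with one random peer per node per round and a per-node heartbeat version where the higher version wins. The five node names and the random seed are arbitrary; real Cassandra gossip carries more state (generation numbers and application states) than this sketch.

```python
import random

NODES = ["n1", "n2", "n3", "n4", "n5"]


def gossip_round(views, rng):
    """Every node exchanges its view with one randomly chosen peer (push-pull)."""
    for node in NODES:
        peer = rng.choice([n for n in NODES if n != node])
        merged = dict(views[node])
        for endpoint, version in views[peer].items():
            # A higher heartbeat version means fresher information about that endpoint.
            if version > merged.get(endpoint, -1):
                merged[endpoint] = version
        # After the exchange both sides hold the same merged view.
        views[node] = merged
        views[peer] = dict(merged)


def simulate(seed=7, max_rounds=1000):
    rng = random.Random(seed)
    views = {n: {n: 0} for n in NODES}  # each node starts knowing only itself
    for rnd in range(1, max_rounds + 1):
        for n in NODES:
            views[n][n] += 1  # each node bumps its own heartbeat version
        gossip_round(views, rng)
        if all(len(view) == len(NODES) for view in views.values()):
            return rnd, views
    return None, views


rounds, views = simulate()
print(f"every node learned about every other node after {rounds} round(s)")
```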

30. How does gossip Protocol Work?

31. How does gossip Protocol help in Failure Detection?

The process of acknowledging messages helps in failure detection. When a node is down or failing, it is unable to send or receive messages, and hence its acknowledgements are not received.
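A minimal sketch of acknowledgement-based detection, assuming a fixed timeout and explicit timestamps; the class `AckFailureDetector` is hypothetical. Cassandra itself uses an accrual (phi) failure detector that adapts to observed heartbeat intervals instead of a single fixed timeout, so treat this only as an illustration of the principle described above.

```python
class AckFailureDetector:
    """Suspect a peer when no acknowledgement has arrived within `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_ack = {}  # peer -> time of the most recent acknowledgement

    def record_ack(self, peer, now):
        self.last_ack[peer] = now

    def is_suspected(self, peer, now):
        last = self.last_ack.get(peer)
        # Never heard from, or silent for longer than the timeout: suspect it.
        return last is None or now - last > self.timeout


fd = AckFailureDetector(timeout=5.0)
fd.record_ack("n2", now=0.0)
fd.record_ack("n3", now=0.0)
fd.record_ack("n2", now=4.0)  # n2 keeps answering; n3 has gone silent
print(fd.is_suspected("n2", now=8.0), fd.is_suspected("n3", now=8.0))  # False True
```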

32. What are partitions and Tokens in Cassandra?

33. What are the different types of Partitioners in Cassandra? Explain.

Murmur3Partitioner: uses a 64-bit hash value of the partition key, with a token range of -2^63 to 2^63 - 1.

RandomPartitioner: uses MD5 hash values, with a token range of 0 to 2^127 - 1.
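The two token ranges, and how evenly spaced initial tokens could be computed within each, can be sketched as follows. Murmur3 is not in Python's standard library, so only the MD5 side is hashed here. The `md5_token` derivation (digest read as a signed big-endian 128-bit integer, then made non-negative) is an assumption about how RandomPartitioner derives tokens; function names are hypothetical.

```python
import hashlib

MURMUR3_MIN, MURMUR3_MAX = -2**63, 2**63 - 1  # Murmur3Partitioner token range
RANDOM_MIN, RANDOM_MAX = 0, 2**127 - 1        # RandomPartitioner token range


def md5_token(partition_key: bytes) -> int:
    """RandomPartitioner-style token (assumed derivation): |signed MD5 digest|."""
    digest = hashlib.md5(partition_key).digest()  # 16 bytes = 128 bits
    return abs(int.from_bytes(digest, "big", signed=True))


def evenly_spaced_tokens(num_nodes, lo=RANDOM_MIN, hi=RANDOM_MAX):
    """One initial token per node, spaced evenly across a partitioner's range."""
    width = hi - lo + 1
    return [lo + i * width // num_nodes for i in range(num_nodes)]


print(md5_token(b"user:42"))
print(evenly_spaced_tokens(4, MURMUR3_MIN, MURMUR3_MAX))
# [-9223372036854775808, -4611686018427387904, 0, 4611686018427387904]
```

Because the hash output is spread uniformly, evenly spaced tokens give each node roughly the same share of the keys.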

34. What do you mean by Snitch? Name a few

A snitch determines which datacenters and racks nodes belong to. Snitches inform Cassandra about the network topology so that Cassandra can distribute replicas accordingly; specifically, the replication strategy places the replicas based on the information provided by the snitch.

There are many types of snitches, to name a few:

35. How does Cassandra perform write operations?

When a write request comes to a node:

All writes are automatically partitioned and replicated throughout the cluster. Cassandra periodically consolidates the SSTables, discarding unnecessary data.
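A toy model of a single node's write path, assuming a tiny memtable limit so a flush is easy to observe: each write is appended to the commit log (unless the keyspace's durable_writes option is off), applied to the memtable, and the memtable is flushed to an immutable, key-sorted SSTable when full. The class `WritePathNode` and its parameters are hypothetical; real flushes are triggered by memory thresholds, and commit log segments are recycled rather than simply cleared.

```python
class WritePathNode:
    """Toy write path: commit log -> memtable -> flush to immutable, sorted SSTables."""

    def __init__(self, memtable_limit=3, durable_writes=True):
        self.commit_log = []   # append-only log, replayed after a crash
        self.memtable = {}     # in-memory table holding the latest cells
        self.sstables = []     # flushed, immutable, key-sorted tables (newest last)
        self.memtable_limit = memtable_limit
        self.durable_writes = durable_writes

    def write(self, key, value, timestamp):
        if self.durable_writes:
            self.commit_log.append((key, value, timestamp))  # 1. commit log
        current = self.memtable.get(key)
        if current is None or timestamp >= current[1]:
            self.memtable[key] = (value, timestamp)          # 2. memtable (newest wins)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                                     # 3. flush when full

    def flush(self):
        # Persist the memtable as a key-sorted SSTable and start a fresh one.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs these log entries


node = WritePathNode(memtable_limit=2)
node.write("k1", "a", 1)
node.write("k2", "b", 2)  # memtable reaches the limit and is flushed
node.write("k1", "c", 3)
print(len(node.sstables), node.memtable)  # 1 {'k1': ('c', 3)}
```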

36. Explain the terms Memtable, CommitLog and SSTables.

37. What is the use of Coordinator Node in Read?

Reads are simple because clients can connect to any node in the cluster to perform them. If a client connects to a node that doesn't have the data it's trying to read, the node it's connected to acts as the coordinator node: it forwards the request to the replicas that own the data and returns the result to the client.
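A minimal sketch of the coordinator's role, assuming the client is connected to some node that holds none of the data: the coordinator forwards the read to the replicas for the key and returns the value with the newest write timestamp. The replica contents, the `PLACEMENT` map and the `replicas_to_ask` parameter are hypothetical simplifications.

```python
# Hypothetical replica contents: node -> {key: (value, write_timestamp)}
REPLICAS = {
    "node-B": {"user:1": ("alice@old.example", 100)},
    "node-C": {"user:1": ("alice@new.example", 200)},
    "node-D": {},  # this replica has not received the key
}
PLACEMENT = {"user:1": ["node-B", "node-C", "node-D"]}  # key -> replica nodes


def coordinate_read(key, replicas_to_ask):
    """Forward the read to replicas and return the value with the newest timestamp."""
    responses = []
    for node in PLACEMENT.get(key, [])[:replicas_to_ask]:
        cell = REPLICAS[node].get(key)
        if cell is not None:
            responses.append(cell)
    if not responses:
        return None
    value, _timestamp = max(responses, key=lambda cell: cell[1])
    return value


print(coordinate_read("user:1", replicas_to_ask=1))  # alice@old.example
print(coordinate_read("user:1", replicas_to_ask=2))  # alice@new.example
```

Asking only one replica can return stale data, which is why the number of replicas consulted matters.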

38. How does Cassandra perform Read operation? Explain

39. What do you mean by Compaction?

Compaction is the process of merging accumulated data files (SSTables) into fewer, larger files. It frees up space by discarding obsolete data and improves read performance by reducing the number of seeks required.
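A minimal sketch of compaction, assuming each SSTable is a list of `(key, (value, timestamp))` pairs: keep only the newest cell per key, drop deleted keys, and emit one sorted table. The `TOMBSTONE` marker is hypothetical, and this sketch drops tombstones immediately, whereas Cassandra keeps them for a grace period (`gc_grace_seconds`) so deletes can reach all replicas.

```python
TOMBSTONE = object()  # hypothetical marker for a deleted value


def compact(sstables):
    """Merge SSTables: keep the newest cell per key, drop deleted keys, sort by key."""
    newest = {}
    for table in sstables:
        for key, (value, timestamp) in table:
            if key not in newest or timestamp > newest[key][1]:
                newest[key] = (value, timestamp)
    return sorted(
        (key, cell) for key, cell in newest.items() if cell[0] is not TOMBSTONE
    )


old = [("a", (1, 10)), ("b", (2, 10))]
new = [("a", (5, 20)), ("b", (TOMBSTONE, 30)), ("c", (7, 20))]
print(compact([old, new]))  # [('a', (5, 20)), ('c', (7, 20))]
```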

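40. What is Anti-Entropy and how is it associated with the Merkle Tree?

Anti-entropy is the replica synchronization mechanism that ensures data on different nodes is updated to the newest version.
Cassandra uses a Merkle tree for anti-entropy repair. A Merkle tree is a hash tree whose leaves are hashes of the values of individual keys and whose parent nodes are hashes of their children, so two replicas can locate mismatched data by comparing hashes from the root down instead of comparing all of the data.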
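A minimal Merkle tree sketch, assuming a power-of-two number of leaves, each standing for one hypothetical token range. Two replicas build trees over their data and descend only into subtrees whose hashes differ, which pinpoints the ranges that need repair. SHA-256 is an arbitrary choice for this sketch; the function names are illustrative.

```python
import hashlib


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_merkle(leaves):
    """Build a Merkle tree bottom-up; returns levels, leaves first and root last."""
    if not leaves or len(leaves) & (len(leaves) - 1):
        raise ValueError("this sketch expects a power-of-two number of leaves")
    level = [_hash(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of its two children concatenated.
        level = [_hash(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff_ranges(a, b, depth=None, index=0):
    """Descend only where hashes differ; return indexes of mismatched leaves."""
    if depth is None:
        depth = len(a) - 1  # start at the root
    if a[depth][index] == b[depth][index]:
        return []
    if depth == 0:
        return [index]
    return (diff_ranges(a, b, depth - 1, 2 * index)
            + diff_ranges(a, b, depth - 1, 2 * index + 1))


replica1 = [b"range0:a=1", b"range1:b=2", b"range2:c=3", b"range3:d=4"]
replica2 = [b"range0:a=1", b"range1:b=2", b"range2:c=9", b"range3:d=4"]
print(diff_ranges(build_merkle(replica1), build_merkle(replica2)))  # [2]
```

Only range 2 would need to be streamed between the replicas; identical subtrees are skipped after a single hash comparison.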

41. Explain the different types of Repairs.

Anti-entropy repair is very useful and is often recommended to be run periodically to keep data in sync.

42. What is Hinted Handoff?

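Hinted handoff is a mechanism to ensure availability, fault tolerance and graceful degradation in Cassandra. When a replica is unavailable during a write, the coordinator stores a hint containing the missed write and replays it to the replica once that replica is back. The node holding the hint knows when the unavailable node comes back online because of gossip.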
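A toy model of hinted handoff, assuming the coordinator learns about up/down state changes through a gossip callback. The class `Coordinator` and the method `on_gossip_state_change` are hypothetical names; real hints are persisted to disk and expire after a configurable window.

```python
class Coordinator:
    """Toy hinted handoff: buffer writes for down replicas, replay when they return."""

    def __init__(self, replicas):
        self.replicas = replicas  # node -> {key: value}
        self.up = {node: True for node in replicas}
        self.hints = {node: [] for node in replicas}

    def write(self, key, value):
        for node, data in self.replicas.items():
            if self.up[node]:
                data[key] = value
            else:
                self.hints[node].append((key, value))  # store a hint instead

    def on_gossip_state_change(self, node, is_up):
        # Gossip tells the coordinator when an unavailable node comes back online.
        self.up[node] = is_up
        if is_up:
            for key, value in self.hints[node]:
                self.replicas[node][key] = value  # replay the missed writes
            self.hints[node].clear()


coord = Coordinator({"n1": {}, "n2": {}, "n3": {}})
coord.on_gossip_state_change("n3", is_up=False)
coord.write("k", "v")
print(coord.replicas["n3"], coord.hints["n3"])  # {} [('k', 'v')]
coord.on_gossip_state_change("n3", is_up=True)
print(coord.replicas["n3"])  # {'k': 'v'}
```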

43. What do you mean by Logging in Cassandra?

Logs are written to the system.log and debug.log files in the Cassandra logging directory.
We can configure logging programmatically or manually. The simplest way to get a picture of what's happening in your database is to change the logging level to make the output more verbose; by default, it is set to INFO.

44. Explain the different Logging levels in Cassandra.

45. What is JMX? And how is it useful in Cassandra?

JMX (Java Management Extensions) is a Java technology that supplies tools for managing and monitoring Java applications and services. Cassandra uses JMX to enable remote management of its servers.

46. What are snapshots and how do you create one in Cassandra?

A snapshot represents the state of the data files at a particular point in time. The snapshot command (nodetool snapshot) is used when taking a backup: it creates hard links to the SSTables in the snapshots folder, which can later be used to restore the node.
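Why hard links make snapshots cheap can be shown with a small sketch: linking a file copies no data, and the snapshot keeps the file's contents alive even after the original is deleted (for example by compaction). The `snapshot` function, the directory layout and the SSTable file name are hypothetical; this only illustrates the hard-link idea, not nodetool's implementation.

```python
import os
import tempfile


def snapshot(data_dir, tag):
    """Hard-link every file in data_dir into data_dir/snapshots/<tag>/."""
    target = os.path.join(data_dir, "snapshots", tag)
    os.makedirs(target)
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        if os.path.isfile(src):  # skips the snapshots directory itself
            os.link(src, os.path.join(target, name))  # no data is copied
    return target


with tempfile.TemporaryDirectory() as data_dir:
    path = os.path.join(data_dir, "nb-1-big-Data.db")  # hypothetical SSTable name
    with open(path, "w") as f:
        f.write("rows...")
    snap = snapshot(data_dir, "before-upgrade")
    os.remove(path)  # the original file goes away, e.g. after compaction
    with open(os.path.join(snap, "nb-1-big-Data.db")) as f:
        print(f.read())  # rows...
```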

47. Why is JConsole used? What are its different elements?

JConsole is used to monitor and analyze server activity. Once you've connected to a server, the default view includes four major categories about your server's state, which are updated constantly:

48. Explain Nodetool Utility.

The Nodetool Utility is a command-line utility that comes out of the box with Cassandra and is a great tool for administration and monitoring. It communicates with JMX to perform operational and monitoring tasks exposed by MBeans.

49. What are Roles in CQLSH?

Roles enable authorization management on a larger scale than per-user security can provide. A role is created and may then be granted to other roles, so hierarchical sets of permissions can be built with them.
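The hierarchy can be sketched as a graph in which a role's effective permissions are its own plus those of every role granted to it, directly or indirectly. In CQL this corresponds to statements such as `GRANT reader TO writer;`; the role names, permissions and the `ks.users` table below are hypothetical.

```python
# Hypothetical roles: role -> (directly granted permissions, roles granted to it)
ROLES = {
    "reader": ({("SELECT", "ks.users")}, set()),
    "writer": ({("MODIFY", "ks.users")}, {"reader"}),
    "app_admin": ({("ALTER", "ks.users")}, {"writer"}),
    "alice": (set(), {"app_admin"}),
}


def effective_permissions(role, seen=None):
    """Union of a role's own permissions and those of every role granted to it."""
    seen = set() if seen is None else seen
    if role in seen:  # guard against revisiting a role
        return set()
    seen.add(role)
    own, granted = ROLES[role]
    perms = set(own)
    for parent in granted:
        perms |= effective_permissions(parent, seen)
    return perms


print(sorted(effective_permissions("alice")))
# [('ALTER', 'ks.users'), ('MODIFY', 'ks.users'), ('SELECT', 'ks.users')]
```

Granting `app_admin` to a new user gives that user the whole chain of permissions in one step, which is the scaling benefit over per-user grants.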

50. What is Python Stress test in Cassandra?

Early Cassandra releases came with a Python utility called py_stress that could be used to run a stress test on a Cassandra cluster. Its successor, the cassandra-stress tool, is a Java-based stress testing utility for basic benchmarking and load testing of a Cassandra cluster. It is an effective tool for populating a cluster and stress testing CQL tables and queries.

So, I hope these Cassandra Interview Questions helped you to brush up your knowledge of Apache Cassandra.

Got a question for us? Please mention it in the comments section and we will get back to you at the earliest.

If you wish to build a career in the domain of Cassandra and gain expertise in NoSQL databases, enroll in the live online Edureka Apache Cassandra Certification Training, which comes with 24*7 support to guide you throughout your learning period.