01 Apache Cassandra – Introduction Notes

1. What is Cassandra?

Apache Cassandra is a distributed NoSQL database designed for managing large amounts of structured, semi-structured, and unstructured data.
Originally developed at Facebook and open-sourced in 2008, it later became a top-level Apache Software Foundation project.
It is built for high availability, fault tolerance, horizontal scalability, and high write performance.

2. Key Features

Distributed & Decentralized: No single point of failure. Every node is equal (peer-to-peer architecture).
Highly Scalable: Scales horizontally by adding more nodes without downtime.
Fault Tolerant: Each piece of data is replicated to several nodes, so the cluster keeps serving requests when individual nodes fail.
High Write Throughput: Optimized for fast write operations.
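Flexible Schema: Early versions exposed schema-optional, dynamic column families. With CQL, tables have a declared schema, but columns can be added without downtime and a row does not have to populate every column.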
Tunable Consistency: Lets you configure consistency levels per operation.
Multi-Data Center Replication: Replicas can be placed across data centers, configured per keyspace.
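
As a rough illustration of tunable consistency, the sketch below computes how many replicas each level waits for and applies the usual rule of thumb that a read overlaps the latest acknowledged write when R + W > RF (replication factor). The function names are hypothetical and only ONE, QUORUM, and ALL are modeled; this is arithmetic, not a driver API.

```python
def required_replicas(level: str, replication_factor: int) -> int:
    """Return how many replica acknowledgements a consistency level waits for."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when the read set and write set must share at least one replica."""
    w = required_replicas(write_level, replication_factor)
    r = required_replicas(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor for the demo
    for write_level, read_level in [("ONE", "ONE"),
                                    ("QUORUM", "QUORUM"),
                                    ("ALL", "ONE")]:
        overlap = reads_see_latest_write(write_level, read_level, rf)
        print(f"write={write_level:6} read={read_level:6} overlap={overlap}")
```

Writing and reading at QUORUM is a common middle ground: it tolerates one unavailable replica out of three while still guaranteeing overlap.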

3. Data Model Basics

Inspired by Google Bigtable (data model and storage engine) and Amazon Dynamo (distribution and replication).
Data is organized as:
- Keyspace: Top-level namespace (like a database).
- Table (Column Family): Similar to an RDBMS table.
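- Row: Identified by a Primary Key, which consists of a partition key and optional clustering columns.
- Column: Declared in the table schema, but a row only stores the columns it actually sets (sparse rows).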
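
The hierarchy above can be pictured as nested maps. The following is only a mental model in plain Python, not how Cassandra stores data; the helper names and the sensor example are made up for illustration. It assumes a primary key made of one partition key plus one clustering key.

```python
from collections import defaultdict


def new_keyspace():
    """keyspace -> table -> partition key -> clustering key -> {column: value}."""
    return defaultdict(lambda: defaultdict(dict))


def upsert(keyspace, table, partition_key, clustering_key, **columns):
    """Insert or update a row; like a CQL INSERT, this overwrites only the given columns."""
    row = keyspace[table][partition_key].setdefault(clustering_key, {})
    row.update(columns)


def read_partition(keyspace, table, partition_key):
    """Return the rows of one partition, ordered by clustering key."""
    partition = keyspace[table].get(partition_key, {})
    return [(ck, partition[ck]) for ck in sorted(partition)]


if __name__ == "__main__":
    ks = new_keyspace()
    # Hypothetical table: partition key = sensor id, clustering key = timestamp.
    upsert(ks, "readings", "sensor-1", 20, temperature=21.5)
    upsert(ks, "readings", "sensor-1", 10, temperature=20.0, humidity=0.4)
    upsert(ks, "readings", "sensor-1", 20, humidity=0.5)  # same key: merges columns
    for clustering_key, row in read_partition(ks, "readings", "sensor-1"):
        print(clustering_key, row)
```

Note that the two rows hold different sets of columns at first, which is the sparse-row behavior described in the Column entry.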

4. Core Concepts

Partitioner: Hashes a row's partition key to a token, which determines the nodes that store the row.
Replica: One of the copies of a piece of data; each copy lives on a different node.
Consistency Level: Defines how many replicas must respond before considering a write/read successful (e.g., ONE, QUORUM, ALL).
Hinted Handoff: When a replica node is down, the coordinator temporarily stores the writes meant for it as hints and delivers them once the node is back.
Gossip Protocol: Nodes exchange state information with each other periodically.
SSTable (Sorted String Table): Immutable, sorted files written to disk when memtables are flushed, and merged over time by compaction.
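
The Partitioner and Replica entries above can be sketched as a small token ring. This is a simplified model under stated assumptions: MD5 stands in for Cassandra's default Murmur3 hash, each node gets a single token derived from its name (real clusters typically use many virtual nodes), and replicas are chosen by walking the ring clockwise, similar in spirit to SimpleStrategy. The class and method names are hypothetical.

```python
import bisect
import hashlib


class TokenRing:
    """Toy consistent-hash ring: maps a partition key to its replica nodes."""

    def __init__(self, nodes, replication_factor=3):
        # Cannot keep more replicas than there are nodes.
        self.replication_factor = min(replication_factor, len(nodes))
        # One token per node, sorted to form the ring.
        self.ring = sorted((self._token(node), node) for node in nodes)
        self.tokens = [token for token, _ in self.ring]

    @staticmethod
    def _token(key: str) -> int:
        # Assumption: MD5 as a stand-in hash; take the first 8 bytes as the token.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def replicas(self, partition_key: str):
        """First node whose token is >= the key's token, then the next ones clockwise."""
        start = bisect.bisect_left(self.tokens, self._token(partition_key))
        start %= len(self.ring)  # wrap around past the largest token
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.replication_factor)]


if __name__ == "__main__":
    ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
    for key in ["user-1", "user-2", "user-3"]:
        print(key, "->", ring.replicas(key))
```

Because placement depends only on the hash of the partition key, any node can work out where a row lives without consulting a central directory.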

5. Write & Read Path

Write Path:
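- Write → Commit Log → MemTable → SSTable.
- Each write is appended to the commit log for durability and applied to the in-memory MemTable; when a MemTable fills up, it is flushed to disk as a new SSTable.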
Read Path:
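- Query → Check MemTable → Check Bloom Filter → Check SSTables.
- The Bloom filter of each SSTable lets the read skip files that cannot contain the requested partition; data found in the MemTable and the remaining SSTables is merged, with the newest timestamp winning.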
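
The two paths above can be mimicked with a toy, single-node storage engine. This is a sketch under heavy simplification: it leaves out caches, indexes, compaction, and deletes, uses a logical counter instead of real timestamps, and every class name here is hypothetical rather than part of Cassandra.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Sorted snapshot of a memtable; never modified after creation."""

    def __init__(self, rows):
        self.rows = dict(sorted(rows.items()))
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


class ToyNode:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only durability log
        self.memtable = {}     # key -> (timestamp, value)
        self.sstables = []     # flushed tables, oldest first
        self.memtable_limit = memtable_limit
        self.clock = 0         # logical timestamp

    def write(self, key, value):
        self.clock += 1
        self.commit_log.append((self.clock, key, value))  # 1. commit log
        self.memtable[key] = (self.clock, value)          # 2. memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                                  # 3. SSTable

    def flush(self):
        self.sstables.append(SSTable(self.memtable))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replay

    def read(self, key):
        candidates = []
        if key in self.memtable:
            candidates.append(self.memtable[key])
        for table in self.sstables:
            if not table.bloom.might_contain(key):
                continue  # Bloom filter says "definitely not here"
            if key in table.rows:
                candidates.append(table.rows[key])
        if not candidates:
            return None
        # Newest timestamp wins when several versions exist.
        return max(candidates, key=lambda item: item[0])[1]


if __name__ == "__main__":
    node = ToyNode(memtable_limit=2)
    node.write("a", "v1")
    node.write("b", "v1")   # memtable is full: flushed to an SSTable
    node.write("a", "v2")   # newer version stays in the memtable
    print(node.read("a"), node.read("b"), node.read("missing"))
```

The read for "a" finds one version in the memtable and an older one in the SSTable, which is why reads have to merge by timestamp instead of stopping at the first hit.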

6. Use Cases

Real-time big data applications.
Logging and time-series data.
IoT sensor data storage.
Messaging systems, recommendation engines.

7. Comparison to RDBMS

Feature	Cassandra	RDBMS
Schema	Flexible (declared per table in CQL; rows may leave columns unset)	Fixed (strict schema)
Joins	Not supported	Supported
Scalability	Horizontal	Vertical (mostly)
ACID compliance	Limited (BASE; row-level atomicity and isolation, lightweight transactions)	Yes (ACID compliant)
Query Language	CQL (Cassandra Query Language)	SQL

8. Cassandra Query Language (CQL)

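Similar to SQL in syntax, but more limited.
No joins or subqueries. Built-in aggregates (COUNT, MIN, MAX, SUM, AVG) exist, but they are best kept to a single partition rather than used as in traditional SQL.
Sample (assumes a keyspace has already been selected with USE):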

CREATE TABLE users (
   user_id UUID PRIMARY KEY,
   name TEXT,
   email TEXT
);

INSERT INTO users (user_id, name, email)
VALUES (uuid(), 'John Doe', 'john@example.com');
