01 Apache Cassandra – Introduction Notes
1. What is Cassandra?
- Apache Cassandra is a distributed NoSQL database designed for managing large amounts of structured, semi-structured, and unstructured data.
- Originally developed at Facebook, it was open-sourced in 2008 and became a top-level Apache Software Foundation project in 2010.
- Built for high availability, fault tolerance, scalability, and performance.
2. Key Features
- Distributed & Decentralized: No single point of failure. Every node is equal (peer-to-peer architecture).
- Highly Scalable: Scales horizontally by adding more nodes without downtime.
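- Fault Tolerant: Each row is replicated to several nodes, so the cluster keeps serving requests when individual nodes fail.
- High Write Throughput: A write is a sequential append to a commit log plus an in-memory update, which avoids random disk I/O.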
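- Flexible Schema: Early versions exposed schema-optional, dynamic column families. With CQL, every table has a declared schema, but columns can be added or dropped without downtime and a row does not have to set every column.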
- Tunable Consistency: Lets you configure consistency levels per operation.
- Multi-Data Center Replication: Replicas can be placed in several data centers, with a replication factor set per data center.
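The Tunable Consistency feature above comes down to simple arithmetic over the replication factor. The following is a minimal Python sketch, not Cassandra code: the function names are made up for these notes, it covers only the ONE, QUORUM, and ALL levels, and it assumes a single data center. It relies on the usual rule that a read is guaranteed to overlap the latest acknowledged write when read replicas + write replicas > replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas in a QUORUM: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must acknowledge an operation at a consistency level."""
    required = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return required[level]


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when read and write replica sets must overlap (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor for the demonstration
    for w, r in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
        print(f"write={w} read={r} overlap={reads_see_latest_write(w, r, rf)}")
```

Under these assumptions, QUORUM writes paired with QUORUM reads overlap for any replication factor, while ONE/ONE trades that guarantee for lower latency.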
3. Data Model Basics
- Cassandra combines a data model inspired by Google Bigtable with a distribution design inspired by Amazon Dynamo.
Data is organized as:
- Keyspace: Top-level namespace (like a database).
- Table (Column Family): Similar to an RDBMS table.
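- Row: Identified by a Primary Key, which consists of a partition key (decides which nodes store the row) and optional clustering columns (decide the sort order inside a partition).
- Column: A named, typed value declared in the table schema. A row stores only the columns it has values for, so rows in the same table can be sparsely populated.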
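The hierarchy above can be pictured as nested mappings. This is only a mental model in Python, assuming a hypothetical in-memory dictionary called `cluster_data` and helper names invented for these notes; it ignores data types, clustering columns, and distribution across nodes.

```python
# Hypothetical in-memory stand-in for the hierarchy:
# keyspace -> table -> primary key -> {column name: value}
cluster_data = {}


def insert_row(keyspace, table, primary_key, **columns):
    """Store column values under a primary key, creating levels as needed."""
    rows = cluster_data.setdefault(keyspace, {}).setdefault(table, {})
    # Writing to an existing primary key overwrites only the given columns,
    # mirroring the upsert behaviour of a CQL INSERT.
    rows.setdefault(primary_key, {}).update(columns)


def select_row(keyspace, table, primary_key):
    """Return the row stored under a primary key, or None if it is absent."""
    return cluster_data.get(keyspace, {}).get(table, {}).get(primary_key)


if __name__ == "__main__":
    insert_row("app", "users", "u1", name="John Doe", email="john@example.com")
    insert_row("app", "users", "u2", name="Jane Roe")  # email left unset
    insert_row("app", "users", "u1", email="jd@example.com")  # upsert on u1
    print(select_row("app", "users", "u1"))
    print(select_row("app", "users", "u2"))
```

The second row leaves `email` unset, and the third call updates an existing row instead of failing on a duplicate key.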
4. Core Concepts
- Partitioner: Hashes a row's partition key to a token, which determines the nodes that store the row.
- Replica: One copy of a row. The replication factor sets how many nodes hold a copy.
- Consistency Level: Defines how many replicas must respond before considering a write/read successful (e.g., ONE, QUORUM, ALL).
- Hinted Handoff: When a replica is down, the coordinator node stores the write as a hint and replays it to the replica once it comes back.
- Gossip Protocol: Nodes exchange state information with each other periodically.
- SSTable (Sorted String Table): Immutable files stored on disk after memtables are flushed.
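How the Partitioner and Replica concepts fit together can be sketched with a toy token ring. This is a simplified illustration, not Cassandra's implementation: the `Ring` class and `token` function are hypothetical names, MD5 stands in for the hash a real partitioner would use, each node owns a single token (no virtual nodes), and replicas are simply the next nodes clockwise on the ring.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 used here as a stand-in hash)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy token ring: one token per node, replicas chosen clockwise."""

    def __init__(self, nodes, replication_factor):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds number of nodes")
        self.replication_factor = replication_factor
        # Sort nodes by their position on the ring.
        self.entries = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str):
        """Return the nodes that hold copies of the given partition key."""
        # First node whose token is >= the key's token; wrap past the end.
        start = bisect.bisect_left(self.tokens, token(partition_key))
        count = len(self.entries)
        return [self.entries[(start + i) % count][1]
                for i in range(self.replication_factor)]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
    for key in ["user-1", "user-2", "user-3"]:
        print(key, "->", ring.replicas(key))
```

Because placement depends only on the hash of the partition key, any node can work out where a row lives without consulting a central directory.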
5. Write & Read Path
Write Path:
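- Write → Commit Log → MemTable → SSTable.
- A write is appended to the commit log for durability and applied to the in-memory MemTable; the node acknowledges it at that point.
- When a MemTable fills up, it is flushed to disk as a new immutable SSTable.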
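The write path can be sketched as a toy, single-node store. This is a teaching model under stated assumptions, not Cassandra's storage engine: `ToyNode` and `memtable_limit` are invented names, the commit log is a Python list instead of a file, and a flush is triggered by a row count instead of memory size.

```python
class ToyNode:
    """Toy model of the write path: commit log -> memtable -> SSTable."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # in-memory, mutable view of recent writes
        self.sstables = []     # immutable, key-sorted snapshots (oldest first)
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. record for durability
        self.memtable[key] = value            # 2. apply in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush when the memtable is full

    def flush(self):
        # An SSTable is written once, sorted by key, and never modified.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Flushed writes no longer need to be replayed after a crash.
        self.commit_log.clear()


if __name__ == "__main__":
    node = ToyNode(memtable_limit=2)
    node.write("b", 1)
    node.write("a", 2)   # second write fills the memtable and triggers a flush
    node.write("c", 3)
    print("sstables:", node.sstables)
    print("memtable:", node.memtable)
```

No step in `write` reads or rewrites existing on-disk data, which is the property that keeps writes cheap.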
Read Path:
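- Query → Check MemTable → Check Bloom Filter → Check SSTables.
- Each SSTable has a Bloom filter, which lets a read skip SSTables that definitely do not contain the requested partition. Data found in the MemTable and the remaining SSTables is merged, and the value with the most recent timestamp wins.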
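The role of the Bloom filter in the read path can be illustrated with a small sketch. It is a simplification built on assumptions: `BloomFilter`, `ToySSTable`, and `read` are hypothetical names, the filter sizes are arbitrary, and the read returns the first match from the newest source instead of merging values by timestamp as a real read would.

```python
import hashlib


class BloomFilter:
    """May report false positives, never false negatives."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToySSTable:
    """Immutable key/value snapshot with its own Bloom filter."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


def read(key, memtable, sstables):
    """Check the memtable first, then SSTables from newest to oldest."""
    if key in memtable:
        return memtable[key]
    for sstable in reversed(sstables):
        if not sstable.bloom.might_contain(key):
            continue  # definitely absent: skip the (expensive) lookup
        if key in sstable.rows:
            return sstable.rows[key]
    return None


if __name__ == "__main__":
    old = ToySSTable({"a": 1, "b": 2})
    new = ToySSTable({"b": 20})
    print(read("b", {"c": 3}, [old, new]))
    print(read("a", {"c": 3}, [old, new]))
    print(read("zzz", {"c": 3}, [old, new]))
```

A false positive only costs one wasted lookup, because the SSTable itself is still checked before a value is returned.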
6. Use Cases
- Real-time big data applications.
- Logging and time-series data.
- IoT sensor data storage.
- Messaging systems, recommendation engines.
7. Comparison to RDBMS
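| Feature | Cassandra | RDBMS |
|---|---|---|
| Schema | Declared per table; columns can be added or dropped online | Fixed (strict schema) |
| Joins | Not supported | Supported |
| Scalability | Horizontal | Vertical (mostly) |
| ACID compliance | Limited: atomic, isolated writes within a partition and lightweight transactions, but no general-purpose multi-statement transactions (often described as BASE) | Yes (ACID compliant) |
| Query Language | CQL (Cassandra Query Language) | SQL |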
8. Cassandra Query Language (CQL)
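- Similar to SQL in syntax, but more restricted: queries are expected to filter on the partition key.
- No joins or subqueries. Built-in aggregates such as COUNT, MIN, MAX, SUM, and AVG exist, but they are most efficient when restricted to a single partition.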
- Sample:
CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    name TEXT,
    email TEXT
);
INSERT INTO users (user_id, name, email)
    VALUES (uuid(), 'John Doe', 'john@example.com');