02 Apache Cassandra Architecture Notes
1. Peer-to-Peer Architecture
- No master/slave: all nodes are equal (peer nodes).
- Every node can handle read/write requests.
- Nodes communicate via the Gossip Protocol to share state and topology information.
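The state exchange above can be sketched as a push-style gossip loop. This is a toy Python sketch under simplifying assumptions (heartbeat counters only; real Cassandra gossip also carries generation numbers and richer endpoint state):

```python
import random

def gossip_round(states, fanout=1):
    """One gossip round: every node pushes its view to `fanout` random
    peers; a receiver keeps the highest heartbeat it has seen per node."""
    for node in list(states):
        for peer in random.sample([n for n in states if n != node], fanout):
            for name, heartbeat in states[node].items():
                if heartbeat > states[peer].get(name, -1):
                    states[peer][name] = heartbeat

# Three nodes; each starts out knowing only its own heartbeat counter.
states = {n: {n: 0} for n in ("A", "B", "C")}
states["A"]["A"] = 5                       # A bumps its heartbeat
while any(s.get("A") != 5 for s in states.values()):
    gossip_round(states)                   # the rumour spreads in a few rounds
print(states["B"]["A"], states["C"]["A"])  # 5 5
```

Each node only talks to a few random peers per round, yet information about A reaches the whole cluster quickly; that is the property gossip relies on.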
2. Key Components of Cassandra Architecture
Component | Description
--- | ---
Node | Basic unit in Cassandra; stores and manages data.
Cluster | Collection of connected nodes.
Data Center | Logical grouping of nodes (used for replication and geographic distribution).
Commit Log | Every write is first recorded in the commit log for durability.
MemTable | In-memory data structure where data is temporarily held after a write.
SSTable | Immutable data file created when a MemTable is flushed to disk.
Bloom Filter | Probabilistic structure to check whether data might exist in an SSTable.
Gossip Protocol | Peer-to-peer communication protocol nodes use to exchange state info.
3. Data Distribution and Partitioning
- Partitioner decides which node stores which data.
- Default: Murmur3Partitioner (hash-based).
- Consistent Hashing is used to distribute data across the ring (cluster).
- Virtual Nodes (vnodes): Each node owns multiple token ranges (helps with balanced data distribution and easier scaling).
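A minimal sketch of consistent hashing with vnodes, assuming MD5 as a stand-in for Murmur3 (any uniform hash shows the idea; `build_ring` and `owner` are illustrative names, not Cassandra APIs):

```python
import bisect
import hashlib

def token(key: str) -> int:
    """Stand-in hash; Cassandra's default is Murmur3, but any uniform
    hash over a fixed token space illustrates the mechanism."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 64)

def build_ring(nodes, vnodes_per_node=8):
    """Each physical node owns several positions (vnodes) on the ring."""
    return sorted((token(f"{n}#{v}"), n)
                  for n in nodes for v in range(vnodes_per_node))

def owner(ring, key):
    """Walk clockwise to the first vnode token at or past hash(key)."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token(key)) % len(ring)
    return ring[i][1]

ring = build_ring(["node1", "node2", "node3"])
print(owner(ring, "user:42"))   # deterministic: the same key always maps
                                # to the same node
```

Because each physical node contributes many vnodes, adding or removing a node moves only small slices of the token space rather than whole contiguous ranges.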
4. Replication
- Each piece of data is replicated on multiple nodes.
- Defined per keyspace using a replication strategy:
- SimpleStrategy: best for a single data center.
- NetworkTopologyStrategy: best for multiple data centers.
- Replication Factor (RF): the number of copies kept of each piece of data.
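Replica placement under SimpleStrategy can be sketched as a clockwise walk collecting RF distinct nodes. A toy Python version (the `replicas` helper is hypothetical, not a driver API):

```python
import bisect
import hashlib

def token(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 64)

def replicas(ring, key, rf):
    """SimpleStrategy-style placement: start at the key's ring position
    and walk clockwise, collecting RF distinct physical nodes."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(key))
    out = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in out:
            out.append(node)
        if len(out) == rf:
            break
    return out

nodes = ("n1", "n2", "n3", "n4")
ring = sorted((token(f"{n}#{v}"), n) for n in nodes for v in range(4))
print(replicas(ring, "sensor:7", rf=3))   # 3 distinct nodes
```

NetworkTopologyStrategy refines this same walk by also skipping nodes until the per-data-center replica counts are satisfied.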
5. Consistency and Availability
- Cassandra uses Tunable Consistency:
- You can specify the consistency level per query: ONE, QUORUM, ALL, etc.
- Uses the BASE model: Basically Available, Soft state, Eventually consistent.
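The arithmetic behind tunable consistency fits in a few lines: QUORUM is a majority of RF, and a read is guaranteed to see the latest acknowledged write only when R + W > RF. The helper names below are illustrative:

```python
def quorum(rf: int) -> int:
    """QUORUM consistency level: a majority of the RF replicas."""
    return rf // 2 + 1

def read_sees_latest_write(r: int, w: int, rf: int) -> bool:
    """The read and write replica sets are guaranteed to overlap
    (so a read sees the latest acknowledged write) iff R + W > RF."""
    return r + w > rf

rf = 3
print(quorum(rf))                                          # 2
print(read_sees_latest_write(quorum(rf), quorum(rf), rf))  # True
print(read_sees_latest_write(1, 1, rf))                    # False: ONE+ONE may read stale data
```

This is why QUORUM reads plus QUORUM writes are the common recipe for strong consistency, while ONE/ONE trades that guarantee for latency and availability.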
6. Read Path
- Client sends read request to any node (Coordinator Node).
- Each replica consulted checks:
- MemTable (in-memory),
- Bloom Filter (one per SSTable, to skip files that cannot contain the key),
- SSTables on disk.
- Coordinator gathers data from replicas based on consistency level.
- Read Repair and Hinted Handoff may occur to ensure consistency.
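The Bloom filter step can be illustrated with a toy filter (real Cassandra sizes its filters from SSTable statistics; this sketch just shows the no-false-negatives property that makes skipping SSTables safe):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: `might_contain` can return false positives
    but never false negatives, so a negative answer lets a read skip an
    SSTable entirely."""
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p          # set k bit positions per key

    def might_contain(self, key):
        return all(self.bits >> p & 1 for p in self._positions(key))

bf = BloomFilter()
bf.add("user:42")
print(bf.might_contain("user:42"))   # True: this SSTable is worth reading
```

A "maybe" answer still costs a disk read, but a "no" answer is always correct, which is exactly what the read path needs when there are many SSTables on disk.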
7. Write Path
- Write sent to any node (Coordinator Node).
- Data is written to:
- Commit Log (for durability),
- MemTable (cached in memory).
- When full, the MemTable is flushed to disk as an immutable SSTable.
- An acknowledgement is sent to the client once the number of replicas required by the consistency level have confirmed the write.
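The steps above can be sketched as a toy write path (the `Node` class and its tiny MemTable limit are illustrative, not Cassandra internals):

```python
class Node:
    """Write path sketch: append to the commit log first (durability),
    then update the MemTable; flush to an immutable SSTable when full."""
    def __init__(self, memtable_limit=2):
        self.commit_log = []
        self.memtable = {}
        self.sstables = []              # immutable snapshots, newest last
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))       # 1. durable record
        self.memtable[key] = value                 # 2. in-memory write
        if len(self.memtable) >= self.memtable_limit:
            self.sstables.append(dict(self.memtable))  # 3. flush to disk
            self.memtable = {}

node = Node()
node.write("a", 1)
node.write("b", 2)      # hits the limit, so the MemTable is flushed
node.write("c", 3)
print(len(node.sstables), node.memtable)   # 1 {'c': 3}
```

Because every write lands in the commit log before the MemTable, a crashed node can rebuild its in-memory state by replaying the log on restart.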
8. Internal Processes
- Compaction: Merges multiple SSTables into a single one to optimize reads and reclaim space.
- Hinted Handoff: Temporary storage of missed writes for a failed node.
- Anti-Entropy Repair: Ensures data consistency across replicas using Merkle trees.
- Snitch: Determines data center and rack location for a node (used in replica placement).
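Compaction's key-level merge can be sketched in a few lines, assuming tables are passed oldest first so newer values win (real compaction also honours write timestamps and tombstones):

```python
def compact(sstables):
    """Compaction sketch: merge several SSTables into one; for a key
    present in more than one table, the newest value wins (tables are
    ordered oldest to newest here)."""
    merged = {}
    for table in sstables:
        merged.update(table)
    return merged

older = {"a": 1, "b": 2}
newer = {"b": 20, "c": 3}
print(compact([older, newer]))   # {'a': 1, 'b': 20, 'c': 3}
```

Collapsing overlapping SSTables this way is what keeps read amplification bounded: fewer files to consult per key, and obsolete versions are dropped to reclaim space.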
Diagram (Text Version)
+-------------+          +-------------+
|   Client    |          |   Client    |
+------+------+          +------+------+
       |                        |
       +-----------+------------+
                   v
       +----------------------+
       |   Coordinator Node   |
       +----------------------+
           |              |
     write to RF replica nodes
           |              |
     +-----+-----+  +-----+-----+
     |  Node 1   |  |  Node 2   |
     +-----------+  +-----------+