02 Apache Cassandra Architecture Notes

1. Peer-to-Peer Architecture

  • No master/slave architecture: all nodes are equal peers.
  • Every node can handle read and write requests.
  • Nodes communicate via the Gossip Protocol to share state and topology information (a toy sketch follows this list).
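
To make the gossip idea concrete, here is a minimal Python sketch of gossip-style state exchange. It is an illustration, not Cassandra's implementation: the Node class, node names, and heartbeat counters are all invented for the example.

    import random

    # Each node tracks a heartbeat version for every peer; each round it
    # bumps its own heartbeat and merges state with one random peer.
    class Node:
        def __init__(self, name, peers):
            self.name = name
            self.state = {p: 0 for p in peers}   # peer name -> newest version seen

        def tick(self):
            self.state[self.name] += 1           # a node bumps only its own entry

        def gossip_with(self, other):
            # Both sides converge on the newest version of every entry.
            for node, version in other.state.items():
                self.state[node] = max(self.state.get(node, 0), version)
            other.state = dict(self.state)

    names = ["n1", "n2", "n3"]
    nodes = {n: Node(n, names) for n in names}
    for _ in range(5):                           # a few gossip rounds
        for node in nodes.values():
            node.tick()
            peer = random.choice([p for p in names if p != node.name])
            node.gossip_with(nodes[peer])
    print(nodes["n1"].state)                     # n1 has learned about every peer

After enough rounds every node's view converges, which is how state spreads without any central coordinator.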

2. Key Components of Cassandra Architecture

Component         Description
---------         -----------
Node              Basic unit in Cassandra; stores and manages data.
Cluster           Collection of connected nodes.
Data Center       Logical grouping of nodes, used for replication and geographic distribution.
Commit Log        Append-only log where every write is first recorded, for durability.
MemTable          In-memory structure that holds recently written data.
SSTable           Immutable data file created on disk when a MemTable is flushed.
Bloom Filter      Probabilistic structure for checking whether data might exist in an SSTable (sketched after this table).
Gossip Protocol   Peer-to-peer protocol nodes use to exchange state information.
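
Since the Bloom filter is the least intuitive entry above, here is a minimal Python sketch of the idea. The class name, bit-array size, and MD5-based hashing are assumptions for illustration; Cassandra sizes its real filters from the table's bloom_filter_fp_chance setting.

    import hashlib

    # "Might exist" can be a false positive; "does not exist" is always correct.
    class BloomFilter:
        def __init__(self, size=1024, hashes=3):
            self.size = size
            self.hashes = hashes
            self.bits = [False] * size

        def _positions(self, key):
            # Derive k bit positions from salted digests of the key.
            for i in range(self.hashes):
                digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
                yield int(digest, 16) % self.size

        def add(self, key):
            for pos in self._positions(key):
                self.bits[pos] = True

        def might_contain(self, key):
            return all(self.bits[pos] for pos in self._positions(key))

    bf = BloomFilter()
    bf.add("user:42")
    print(bf.might_contain("user:42"))   # True
    print(bf.might_contain("user:99"))   # almost certainly False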

3. Data Distribution and Partitioning

  • Partitioner decides which node stores which data.
    • Default: Murmur3Partitioner (hash-based).
  • Consistent Hashing is used to distribute data across the ring (cluster); a toy ring is sketched after this list.
  • Virtual Nodes (vnodes): Each node owns multiple token ranges (helps with balanced data distribution and easier scaling).
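
Here is a toy consistent-hash ring with vnodes in Python. MD5 stands in for Murmur3 so the sketch stays dependency-free; the node names and vnode count are placeholders.

    import bisect
    import hashlib

    def token(key: str) -> int:
        # Stand-in for Cassandra's Murmur3 token of the partition key.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class Ring:
        def __init__(self, nodes, vnodes=8):
            # Each physical node owns several virtual-node tokens on the ring.
            self.entries = sorted(
                (token(f"{node}#{v}"), node)
                for node in nodes for v in range(vnodes)
            )
            self.tokens = [t for t, _ in self.entries]

        def owner(self, key: str) -> str:
            # A key belongs to the first vnode clockwise from its token.
            i = bisect.bisect(self.tokens, token(key)) % len(self.entries)
            return self.entries[i][1]

    ring = Ring(["node1", "node2", "node3"])
    print(ring.owner("user:42"))   # the same key always maps to the same node

Because each node contributes many small token ranges, adding or removing a node moves only a fraction of the data, which is the point of vnodes.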

4. Replication

  • Each piece of data is replicated on multiple nodes.
  • Defined per keyspace using a replication strategy (see the CQL example after this list):
    • SimpleStrategy: single data center only; mainly for dev/test clusters.
    • NetworkTopologyStrategy: data-center aware; recommended for production and for multi-data-center deployments.
  • Replication Factor (RF): Number of copies of each piece of data.
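
As a sketch of declaring a replication strategy, the following uses the DataStax Python driver (cassandra-driver) to create a keyspace. The keyspace name, data center names, and replica counts are placeholders.

    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()

    # RF of 3 in dc1 and 2 in dc2; the map keys must match the data center
    # names reported by the cluster's snitch.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS app_ks
        WITH replication = {
            'class': 'NetworkTopologyStrategy',
            'dc1': 3,
            'dc2': 2
        }
    """)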

5. Consistency and Availability

  • Cassandra uses Tunable Consistency:
    • You can specify a consistency level per query: ONE, QUORUM, ALL, etc. (see the driver example after this list).
  • Uses BASE model:
    • Basically Available, Soft state, Eventually consistent.
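
A sketch of setting the consistency level per query with the DataStax Python driver; the keyspace, table, and column names are placeholders.

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("app_ks")

    # QUORUM: a majority of replicas must respond before the query succeeds.
    read = SimpleStatement(
        "SELECT name FROM users WHERE id = %s",
        consistency_level=ConsistencyLevel.QUORUM,
    )
    row = session.execute(read, (42,)).one()

    # ONE: fastest and weakest; a single replica's acknowledgement is enough.
    write = SimpleStatement(
        "INSERT INTO users (id, name) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.ONE,
    )
    session.execute(write, (42, "ada"))

A useful rule of thumb: if read level R and write level W satisfy R + W > RF, every read overlaps the most recent acknowledged write.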

6. Read Path

  1. The client sends a read request to any node, which acts as the Coordinator Node.
  2. The coordinator forwards the request to replicas; each replica checks, in order (illustrated in the sketch after this list):
    • its MemTable (in memory),
    • each SSTable's Bloom filter (to skip files that cannot contain the key),
    • the remaining SSTables on disk.
  3. The coordinator collects responses from enough replicas to satisfy the consistency level.
  4. Read Repair may run to bring stale replicas back in sync.
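
The replica-local part of that order can be sketched in a few lines of Python; plain dicts and sets stand in for the real MemTable, SSTable, and Bloom filter structures.

    # Check the MemTable first, then SSTables newest to oldest, skipping any
    # whose Bloom filter says the key is definitely absent.
    def read(key, memtable, sstables):
        if key in memtable:
            return memtable[key]
        for sstable in sstables:              # assume newest first
            if key not in sstable["bloom"]:   # a set stands in for the filter
                continue                      # definitely not in this file
            if key in sstable["data"]:
                return sstable["data"][key]
        return None

    memtable = {"k1": "v1-new"}
    sstables = [
        {"bloom": {"k2"}, "data": {"k2": "v2"}},
        {"bloom": {"k1", "k3"}, "data": {"k1": "v1-old", "k3": "v3"}},
    ]
    print(read("k1", memtable, sstables))   # "v1-new": MemTable wins
    print(read("k3", memtable, sstables))   # "v3": found via the older SSTable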

7. Write Path

  1. Write sent to any node (Coordinator Node).
  2. Data is written to:
    • Commit Log (for durability),
    • MemTable (cached in memory).
  3. When the MemTable reaches a size threshold, it is flushed to disk as an immutable SSTable (sketched after this list).
  4. An acknowledgement is sent to the client once the number of replicas required by the consistency level have confirmed the write.
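
A toy version of the replica-local write path, with a text file standing in for the commit log and a dict for the MemTable; the class name, log file name, and flush threshold are invented for the example.

    class Storage:
        def __init__(self, log_path="commitlog.txt", flush_at=3):
            self.log_path = log_path
            self.flush_at = flush_at
            self.memtable = {}
            self.sstables = []            # newest first

        def write(self, key, value):
            with open(self.log_path, "a") as log:
                log.write(f"{key}={value}\n")   # 1. commit log, for replay
            self.memtable[key] = value          # 2. MemTable, in memory
            if len(self.memtable) >= self.flush_at:
                self.flush()

        def flush(self):
            # 3. The MemTable becomes an immutable SSTable; a real node would
            # also recycle the corresponding commit-log segment here.
            self.sstables.insert(0, dict(self.memtable))
            self.memtable = {}

    store = Storage()
    for i in range(4):
        store.write(f"k{i}", f"v{i}")
    print(store.sstables, store.memtable)   # one flushed SSTable, one pending write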

8. Internal Processes

  • Compaction: Merges multiple SSTables into one to optimize reads and reclaim space (a toy merge follows this list).
  • Hinted Handoff: Temporarily stores missed writes for a down node and replays them when it returns.
  • Anti-Entropy Repair: Ensures data consistency across replicas using Merkle trees.
  • Snitch: Determines data center and rack location for a node (used in replica placement).
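
A toy compaction in Python: merge several SSTables, keeping only the newest value for each key. Real compaction uses per-cell timestamps and also purges expired tombstones; here "newest first" list order stands in.

    def compact(sstables):
        merged = {}
        for sstable in sstables:              # newest first
            for key, value in sstable.items():
                merged.setdefault(key, value) # keep the first (newest) value
        return merged

    sstables = [
        {"k1": "v1-new"},                     # newest
        {"k1": "v1-old", "k2": "v2"},         # oldest
    ]
    print(compact(sstables))                  # {'k1': 'v1-new', 'k2': 'v2'}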

Diagram (Text Version)

+-------------+          +-------------+
|   Client    |          |   Client    |
+------+------+          +------+------+
       |                        |
       +-----------+------------+
                   |
                   v
        +----------------------+
        |   Coordinator Node   |
        +----------+-----------+
                   |
   Write to RF nodes (Replica Nodes)
                   |
           +-------+-------+
           |               |
      +----+----+     +----+----+
      |  Node 1 |     |  Node 2 |
      +---------+     +---------+