Distributed SQL with CockroachDB — Architectural Overview
CockroachDB is a key-value data store with a SQL front end that offers the following features:
· Dynamically distributed data
· Distributed transactions
· Geo-partitioning
· Fault tolerance and resilience
CockroachDB supports ANSI SQL as the query language for interacting with the underlying key-value store.
Data is presented in a relational format of tables, rows, and columns. A set of tables forms a database, and a cluster can contain multiple databases.
SQL queries reach your cluster through the PostgreSQL wire protocol. This makes connecting your application to the cluster simple by supporting most PostgreSQL-compatible drivers, as well as many PostgreSQL ORMs, such as GORM (Go) and Hibernate (Java).
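Because CockroachDB speaks the PostgreSQL wire protocol, an existing PostgreSQL driver needs only a connection string pointing at the cluster. The sketch below shows what such a DSN might look like; the user, host, and database names are placeholder assumptions, while port 26257 is CockroachDB's default SQL port.

```python
from urllib.parse import urlparse

# A hypothetical connection string for a CockroachDB cluster. The scheme
# is "postgresql" because CockroachDB speaks the PostgreSQL wire protocol,
# so any PostgreSQL driver (psycopg, pgx, JDBC, ...) can consume it as-is.
# "app_user", "cockroach-node-1", and "bank" are placeholders.
dsn = "postgresql://app_user@cockroach-node-1:26257/bank?sslmode=verify-full"

parts = urlparse(dsn)
print(parts.hostname, parts.port, parts.path)
```

The only CockroachDB-specific pieces are the port and the TLS settings; the rest is exactly what the same driver would use against PostgreSQL.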
Components of the physical plan are sent to one or more nodes for execution. On each node, CockroachDB spawns a logical processor to compute a part of the query. Logical processors inside or across nodes communicate with each other over a logical flow of data. The combined results of the query are sent back to the first node that received the query (the gateway node), which returns them to the SQL client.
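The scatter/gather shape of this execution model can be sketched with a toy example: each "node" runs a local processor over the ranges it holds, and the gateway node merges the partial results. This is a simplified model of the data flow, not CockroachDB's actual execution engine; the node names and values are made up.

```python
# Toy model of distributed SQL execution for a query like SELECT sum(x).
# Each node holds some ranges of the table locally (values are arbitrary).
node_ranges = {
    "node1": [3, 1, 4],
    "node2": [1, 5],
    "node3": [9, 2, 6],
}

def local_processor(values):
    # Each node computes a partial aggregate over its local data.
    return sum(values)

# "Scatter": every node runs its processor in parallel (sequential here).
partials = {node: local_processor(vals) for node, vals in node_ranges.items()}

# "Gather": the gateway node combines the partial results for the client.
total = sum(partials.values())
```

Pushing the partial aggregation to where the data lives is what keeps the amount of data flowing back to the gateway small.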
Data Distribution and Storage
To make all the data available across all the nodes, CockroachDB stores the data in a monolithic sorted map of key-value pairs. This key space contains all the information about your databases, tables, and indexes, and their locations.
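The idea of one ordered key space holding every row of every table can be sketched with a plain dictionary. The key format below (`/Table/<id>/<pk>`) mirrors the spirit of CockroachDB's key encoding but is a simplified illustration, not the real binary encoding; the table IDs and rows are invented.

```python
# A toy model of the monolithic sorted key space: every row of every
# table lives in one ordered map. Keys are simplified strings here;
# real CockroachDB keys are binary-encoded tuples.
kv = {}

def put_row(table_id, pk, row):
    # Primary-index key: /Table/<table_id>/<primary key value>
    kv[f"/Table/{table_id}/{pk}"] = row

put_row(53, 1, {"name": "alice"})
put_row(53, 2, {"name": "bob"})
put_row(54, 1, {"sku": "widget"})

# Because the map is sorted, scanning one table is just reading a
# contiguous span of keys.
table_53 = {k: v for k, v in sorted(kv.items()) if k.startswith("/Table/53/")}
```

Keeping rows of the same table adjacent in the sorted map is what lets a table scan translate into a single contiguous key-range read.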
Below is a comparison diagram of how data is indexed in CockroachDB versus a traditional RDBMS.
Ranges and System Key Spaces
The key space is split into contiguous chunks called ranges, each up to 512 MiB in size. Multiple copies of these ranges are stored across the nodes of a CockroachDB cluster.
The monolithic sorted map comprises two fundamental elements:
System Key Space
- System data, which include meta ranges that describe the locations of data in your cluster (among many other cluster-wide and local data elements).
- Meta ranges are treated mostly like normal ranges and are accessed and replicated just like other elements of your cluster’s KV data.
Table Key Space
- User data, which stores your cluster’s table data.
- Each table and its secondary indexes initially map to a single range, where each key-value pair in the range represents a single row in the table (also called the primary index because the table is sorted by the primary key) or a single row in a secondary index. As soon as a range reaches 512 MiB in size, it splits into two ranges. This process continues as a table and its indexes continue growing.
- Once a table is split across multiple ranges, it’s likely that the table and secondary indexes will be stored in separate ranges. However, a range can still contain data for both the table and a secondary index.
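The size-based split described above can be sketched as a simple packing exercise: walk the sorted rows and cut a new range whenever the current one would exceed the 512 MiB cap. This is a simplified illustration of the splitting rule (real CockroachDB splits a range near its midpoint and rebalances replicas); the row sizes are invented.

```python
RANGE_MAX_BYTES = 512 * 1024 * 1024  # 512 MiB default range size

def split_ranges(rows):
    """Greedily pack sorted (key, size_bytes) rows into ranges,
    starting a new range whenever the cap would be exceeded."""
    ranges, current, size = [], [], 0
    for key, nbytes in rows:
        if current and size + nbytes > RANGE_MAX_BYTES:
            ranges.append(current)
            current, size = [], 0
        current.append(key)
        size += nbytes
    if current:
        ranges.append(current)
    return ranges

# Five sorted rows of 200 MiB each: no single range may hold three.
rows = [(f"k{i}", 200 * 1024 * 1024) for i in range(1, 6)]
ranges = split_ranges(rows)
```

Because rows stay sorted within and across ranges, the table remains one contiguous span of the key space even after many splits.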
High Availability and Raft Consensus
To make CockroachDB fault-tolerant, it maintains multiple copies (replicas) of each range across different nodes.
To achieve this, CockroachDB uses the Raft consensus algorithm, which requires a quorum of replicas to agree on any change to a range before that change is committed.
Because 3 is the smallest number that can achieve quorum (i.e., 2 out of 3), CockroachDB’s high availability (known as multi-active availability) requires 3 nodes.
The number of failures that can be tolerated is (replication factor − 1) / 2. This means that to survive 1 node going offline, the replication factor must be at least 3; similarly, to survive 2 nodes going offline, the replication factor must be at least 5.
The replication factor can be configured at multiple levels: cluster, database, and table.
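The fault-tolerance arithmetic above is simple enough to state directly in code; the sketch below just encodes the (replication factor − 1) / 2 rule with integer division.

```python
def failures_tolerated(replication_factor):
    # A Raft quorum is a strict majority, so the group keeps making
    # progress as long as fewer than half of the replicas are lost.
    return (replication_factor - 1) // 2

# Replication factor 3 survives 1 failure; 5 survives 2; an even
# factor like 4 buys no extra tolerance over 3.
print(failures_tolerated(3), failures_tolerated(4), failures_tolerated(5))
```

This is also why even replication factors are rarely used: they add a replica without increasing the number of failures tolerated.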
For a better understanding of the Raft consensus algorithm, please follow the link below.