Distributed SQL with CockroachDB: An Architectural Overview

Introduction

CockroachDB is a distributed key-value store with a SQL frontend. It offers the following features:

· Dynamically distributed data

· Distributed transactions

· Geo-partitioning

· Fault tolerance and resilience

SQL Engine

CockroachDB supports ANSI SQL as the query language for interacting with the underlying key-value store.

Data is presented in a familiar relational format: rows and columns organized into tables. A set of tables forms a database, and a cluster can contain multiple databases.

SQL queries reach your cluster through the PostgreSQL wire protocol. This makes connecting your application to the cluster simple by supporting most PostgreSQL-compatible drivers, as well as many PostgreSQL ORMs, such as GORM (Go) and Hibernate (Java).
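As a minimal sketch of what this looks like in practice, the Go program below connects with a plain PostgreSQL driver. It assumes a local, insecure single-node cluster (for example one started with `cockroach start-single-node --insecure`); 26257 is CockroachDB's default SQL port, and adjust the connection string for your own setup.

    package main

    import (
        "database/sql"
        "fmt"
        "log"

        _ "github.com/lib/pq" // any PostgreSQL-compatible driver works over pgwire
    )

    func main() {
        // Connection string assumes a local insecure cluster; change as needed.
        db, err := sql.Open("postgres",
            "postgresql://root@localhost:26257/defaultdb?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // A round trip over the PostgreSQL wire protocol.
        var version string
        if err := db.QueryRow("SELECT version()").Scan(&version); err != nil {
            log.Fatal(err)
        }
        fmt.Println(version)
    }

Nothing here is CockroachDB-specific on the client side: because the wire protocol is PostgreSQL's, the same code would run against a PostgreSQL server.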

Components of the physical plan are sent to one or more nodes for execution. On each node, CockroachDB spawns a logical processor to compute part of the query. Logical processors within and across nodes communicate with each other over logical flows of data. The combined results of the query are sent back to the gateway node (the node that originally received the query), which returns them to the SQL client.
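You can watch this planning from any SQL client by asking CockroachDB to EXPLAIN a statement. Below is a sketch in Go, reusing the db handle from the connection example above; the users table is hypothetical, and the columns EXPLAIN returns vary between CockroachDB versions, so the helper prints whatever comes back.

    // printRows prints any result set generically; EXPLAIN's output
    // columns differ across CockroachDB versions.
    func printRows(rows *sql.Rows) error {
        defer rows.Close()
        cols, err := rows.Columns()
        if err != nil {
            return err
        }
        vals := make([]interface{}, len(cols))
        for i := range vals {
            vals[i] = new(sql.NullString)
        }
        for rows.Next() {
            if err := rows.Scan(vals...); err != nil {
                return err
            }
            for i := range cols {
                fmt.Print(vals[i].(*sql.NullString).String, " ")
            }
            fmt.Println()
        }
        return rows.Err()
    }

    // Ask the gateway node how it plans to execute the query.
    rows, err := db.Query("EXPLAIN SELECT count(*) FROM users")
    if err != nil {
        log.Fatal(err)
    }
    if err := printRows(rows); err != nil {
        log.Fatal(err)
    }

Running EXPLAIN (DISTSQL) instead returns a link to a diagram of the physical plan, showing which processors run on which nodes.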

Data Distribution and Storage

Key-Value Pairs

To make all data available across all nodes, CockroachDB stores it in a monolithic sorted map of key-value pairs. This key space contains all the information in your cluster: databases, tables, indexes, and their locations.

[Figure: how data is indexed in CockroachDB vs. a traditional RDBMS]
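To make the contrast concrete, here is a simplified sketch of how rows of a hypothetical users(id INT PRIMARY KEY, name STRING) table might appear in the sorted key space, written in the human-readable key format that CockroachDB's debug tooling prints. The table ID 52 is made up, and the real keys are binary-encoded:

    /Table/52/1/1/0 -> ('alice')    key = /Table/<table ID>/<index ID>/<primary key>/<column family>
    /Table/52/1/2/0 -> ('bob')      the primary index stores the row data itself
    /Table/52/2/'alice'/1 -> ()     a secondary index on name maps back to the primary key

Unlike a traditional RDBMS, where indexes point at row locations in separate storage, here the tables and indexes are themselves encoded directly into the one sorted key space.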

Ranges and System Key Spaces

The key space is split into contiguous chunks called ranges, each up to 512 MiB in size by default. Multiple copies (replicas) of each range are stored across the nodes of a CockroachDB cluster.
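This layout is visible through SQL: SHOW RANGES lists the ranges that back a table and which nodes hold replicas of each. A sketch, again reusing the db handle and printRows helper from the earlier examples (users is hypothetical, and the output columns vary by version):

    // List the ranges backing the table, including replica placement.
    rows, err := db.Query("SHOW RANGES FROM TABLE users")
    if err != nil {
        log.Fatal(err)
    }
    if err := printRows(rows); err != nil {
        log.Fatal(err)
    }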

The monolithic sorted map comprises two fundamental elements:

· System key space, which holds cluster metadata, such as the meta ranges that track where every piece of data lives

· Table key space, which holds the actual SQL table and index data

High Availability and Raft Consensus

To make CockroachDB fault-tolerant, it maintains multiple copies (replicas) of each range across different nodes.

To achieve this, CockroachDB uses the Raft consensus algorithm, which requires a quorum (a majority) of replicas to agree on any change to a range before that change is committed.

Because 3 is the smallest replication factor that can lose a replica and still reach quorum (2 out of 3), CockroachDB's high availability (known as multi-active availability) requires at least 3 nodes.

The number of node failures that can be tolerated is (replication factor − 1) / 2, rounded down. For 1 node to go offline, the replication factor must be at least 3; for 2 nodes to go offline, it must be at least 5.
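That arithmetic as a small Go sketch:

    // maxFailures returns how many nodes can fail while a range with the
    // given replication factor still keeps a Raft quorum (a majority).
    func maxFailures(replicationFactor int) int {
        return (replicationFactor - 1) / 2 // integer division rounds down
    }

    // maxFailures(3) == 1, maxFailures(5) == 2, maxFailures(4) == 1:
    // an even replication factor adds storage cost without adding tolerance.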

The replication factor can be controlled at multiple levels: for individual tables, for databases, or for the whole cluster.
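For example, raising the replication factor of one database so it can survive two simultaneous node failures might look like the sketch below, reusing the db handle from earlier; bank is a hypothetical database.

    // Replication is controlled through zone configurations. This sets
    // a replication factor of 5 for everything in the `bank` database.
    if _, err := db.Exec(
        "ALTER DATABASE bank CONFIGURE ZONE USING num_replicas = 5",
    ); err != nil {
        log.Fatal(err)
    }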

For a better understanding of the Raft consensus algorithm, see this interactive visualization:

http://thesecretlivesofdata.com/raft/
