Articles


CruzDB architecture part 6: garbage collection

Feb 2018

CruzDB architecture part 5: database catalog

Feb 2018

Welcome back to our series on the architecture of CruzDB. In the previous post we discussed how afterimages and persistent pointers are managed in order to enable parallel I/O. Today’s post is short and covers the implementation of the database catalog.

CruzDB architecture part 4: afterimage management

Feb 2018

It’s time for the fourth post in our ongoing series about the architecture of CruzDB. In the previous post we took a detailed look at transaction processing and exactly what happens when a transaction intention in the log is replayed. We saw how a commit or abort decision is made, and how a new database state is created when a transaction commits. In this post we take a look at the challenge of increasing transaction throughput by writing database snapshots in parallel.

CruzDB architecture part 3: transaction management

Feb 2018

This is the third post in a series of articles exploring the architecture of CruzDB. In the previous post we saw how the copy-on-write tree structure used in CruzDB is serialized into the log, and today we’ll examine how intentions in the log are replayed to produce new versions of the database. We’ll see how transactions are analyzed to determine if they commit or abort, and also how metadata is stored in the database itself to accelerate the conflict analysis process.

CruzDB architecture part 2: copy-on-write trees

Feb 2018

This is the second post in a series of articles examining the architecture of CruzDB. In the previous post we examined the basics of how transactions are stored in an underlying shared-log, and began to discuss the distributed operation of the database. In this post we are going to examine how the database is physically stored in the log as a serialized copy-on-write binary tree. In addition, we’ll cover some complications that appear in a distributed setting, such as duplicate snapshots, and how they are handled.

CruzDB architecture part 1: it's a log-structured database

Feb 2018

CruzDB is a distributed shared-data key-value store that manages all its data in a single, high-performance distributed shared-log. It’s been one of the most challenging and interesting projects I’ve hacked on, and this post is the first in a series that will explore the current implementation of the system and, critically, where things are headed.