Column Oriented Database

TL;DR

A column-oriented database is faster at reading data than a row-oriented database

When I think about reading some data from my database, I usually think about how to access a row. However, I have come across the concept of a column-oriented database, which means that a table is organized by its columns rather than rows. Interesting. and I’ve never heard of it, so I decided to dig into this a little bit.

Databases like PostgreSQL, MySQL, and MariaDB are row-oriented databases, which is probably why I usually think about accessing rows in a table. Databases like BigQuery are known to be column-oriented. I have been told that BigQuery is fast, so I guess column-oriented databases are fast.

Let’s look into a row-oriented database first. and let’s think about a table that looks like this

Name	City	Age
John	New York	33
James	Seoul	27
David	London	27

The data above is stored in a disk like below Row data stored in a disk

Let’s say you would like to insert some data into the table like this. It won’t be that big of a problem because it would look like this.

Name	City	Age
John	New York	33
James	Seoul	27
David	London	27
Paul	Chicago	22

And if you were to add that row into a disk, you just gotta put it at the end of your original data.

A row added into a row-oriented database

Reading from a row-oriented database isn’t too bad. but it actually allocates more memory when you try to aggregate some data. Let’s say you want to get the average age of all the people from the table. Then you would have to access the entire row. And it gets worse if your disk is too small so that a disk can hold only one row. That means that you have to read from every single disk to calculate the age

a row-oriented database split into several disks

Now let’s talk about a column-oriented database. The very first table you saw would look like this in a disk.

Row data stored in a disk

This means that adding data to the table could be inefficient because it would look like this.

data added into a disk from a column-oriented database

But it’s actually not a problem if your disk holds only one column. Instead of accessing every single disk that you did when the row-oriented case, you just have to access only one disk like this.

a column-oriented database split into several disks

And in this case, if you were to add some data, you just have to append new data at the end of each disk.

Reading some specific person’s data from a column-oriented database is also. fast because the columns are indexed.

Data added to a column-oriented database

May 24, 2022

Database

AI Enthusiast and a Software EngineerJason Kang on LinkedIn