By now you should be familiar with a relational data model, a set of tuples, in other words, a two-dimensional table. Accordingly, such a system has two storage options: row-wise or column-wise. Both of these data storage types have their specifics and pros and cons of application. A row-oriented management system is a traditional one, that was commonly used in most popular relational databases for the last decades. Let's explore the basic principles of a row database to find out in which cases it is a more efficient way of storing data and when it is better to choose it over the columnar option.
Definition of a Row database
A row-oriented database is a database management system that arranges data records by rows, storing them close to one another in memory. In other words, data is placed horizontally in a table, forming a straight line.
Row-oriented storage system implies working only with one 'row' record at a time. If you want to read that line, you get all columns of that row at once.
Let's consider an example with a table named Travel destinations:
| id | Country | Tour price |
|---|---|---|
| 1 | China | 2000 USD |
| 2 | USA | 4000 USD |
| 3 | Australia | 5500 USD |
In a row-oriented database this table will be stored on a disk as follows:
| 1 | China | 2000 USD | 2 | USA | 4000 USD | 3 | Australia | 5500 USD |
Let's say we want to add a new country to our list:
| 4 | Argentina | 4700 USD |
This operation is pretty simple. We just have to append this record to the existing line:
| 1 | China | 2000 USD | 2 | USA | 4000 USD | 3 | Australia | 5500 USD | 4 | Argentina | 4700 USD |
Data extraction in row DBMS is also fast and simple when it comes to getting data from one or several rows.
Let's say we want to get information on China from our table of travel destinations. Since we do not need to get data (columns) on other countries, we will read only this line and get all the necessary columns at once.
| 1 | China | 2000 USD |
However, if we want to aggregate the data stored in our row RDBS, that would require additional disk space and processing time.
How so? Here is an example. Imagine that we want to figure out the average price of all our four destinations. In order to perform this we have to load all twelve data cells into memory in order to pull out the price data for aggregation. In case we have our data stored in columns, we need to process with four cells of price column only. The more data we have in our row DBMS, the more time- and space-consuming its processing is.
As you can see, a row-oriented structure is especially beneficial for those applications, where you want to write (insert, update) or extract one line at a time. However, it has certain limitations when working with huge amounts of data.
Benefits of Row databases
- Simplicity of inserting and deleting data. Row-oriented DBs have a simple structure hence records are easy to read and write.
- Best option for online transactions. Row-oriented formats are best suited for streaming recording because in case of failure the information can be restored (resynchronized) from the last synchronization point.
- Support of atomic transactions. What's this? Suppose I want to transfer X dollars from Mike to Jane. I want to do 3 things: decrease Mike's balance by X, increase Jane's balance by X, and document the transaction. I can designate these actions as an atomic unit of the database—all actions will happen, or none. This protects against crash errors.
Disadvantages of Row databases
- Time-consuming data processing. Even if only a few values from a string need to be retrieved, we still have to read the entire string from the disk into memory. This results in slower reads and selective queries, as well as higher disk space consumption.
- Insufficient compression. Standard mechanisms of compression in row-oriented DBs are not as efficient as those applied to similar data in column-oriented data stores.
- Not suitable for data analytics. Because of time- and space-consuming operations and low compression while data processing, row-oriented DBs are not efficient for big data. In other words, they could not be used for analytical processing (OLAP) applications.
Most popular row databases
Row-oriented databases are considered to be conventional and are typically transactional, aimed at handling a large number of transactions with less data volume. Row DBMS are mainly used in OLTP systems (Online Transaction Processing), which imply transaction-related applications that use a large number of short, atomic, database operations (insert, update, delete). Those applications include banking, e-commerce, and other day-to-day apps. Below you can see a list of popular row DBMS:
- MySQL is an open-source relational database management system. Currently, this DBMS is one of the most popular in web applications. The vast majority of content management systems use MySQL, and almost all web frameworks support MySQL already at the basic configuration level.
- PostgreSQL is an open-source free database, dynamic and extensible. Its application ranges from small applications to data warehouses.
- Oracle is a database commonly used to perform online transaction processing (OLTP), data warehouse (DW), and mixed (OLTP and DW) database workloads.
Conclusion
In this topic, we've studied the row DBMS (row-oriented) structure and discovered its pros and cons. We've figured out that row databases could be the best solution for transactional processing (OLTP) applications, where fast inserting and deleting of data is crucial. Whereas if you need to proceed with more complex queries and aggregations, or you need increased compression and less space to store data, you'd better choose another option, for instance, columnar DBMS.
Let's move on to some practical tasks on row-oriented databases. Are you ready to test your knowledge?