
Introduction

We are already acquainted with NoSQL databases, which use various data models to store and process data. These databases suit data-intensive applications that need low latency and flexible data models. Today we are going to explore one of the popular NoSQL data models, the Document model, and its peculiarities.

What is Document DB

A Document database stores data as documents, each matched to a particular key. In application code, data is usually represented as an object or a document in a JSON-like format. Below you can see an example of a document with personal data, where the value of the address key is an embedded document:

{
  "name": "Elon Musk",
  "address": {
    "street": "1 Tesla Road",
    "city": "Austin",
    "state": "Texas"
  }
}

A Document store is convenient for developers because they can store and query data using the same document model they use in application code. Documents are flexible and hierarchical, so their structure can evolve with the needs of the application. What's more, Document databases offer flexible indexing, ad hoc queries, and analytics over collections of documents.
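To make the "same model in code and in storage" point concrete, here is a minimal Python sketch that uses only the standard library and no database. The variable names are illustrative assumptions, and JSON serialization stands in for whatever format a real Document database would use.

```python
import json

# The nested structure the application works with...
person = {
    "name": "Elon Musk",
    "address": {
        "street": "1 Tesla Road",
        "city": "Austin",
        "state": "Texas",
    },
}

# ...can be serialized to a JSON document and read back unchanged.
serialized = json.dumps(person)
restored = json.loads(serialized)

# Embedded fields are reached by walking down the hierarchy.
city = restored["address"]["city"]
print(city)  # Austin
```

No mapping layer between tables and objects is needed here: the document that is stored has the same shape as the object the code uses.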

The Document model fits catalogs, user profiles, and content management systems well, where each document is unique and changes over time. Popular Document databases include MongoDB, Elasticsearch, Amazon DocumentDB, ArangoDB, Azure Cosmos DB (formerly Azure DocumentDB), and others.

BSON format in Document DB

BSON is the binary encoding of JSON-like documents that MongoDB uses when storing documents in collections. A BSON document is an ordered list of elements, each containing a field's name, type, and value. Field names are strings. Take a look at the table with some of the BSON types below:

| Type | Number | Alias | Details |
|---|---|---|---|
| String | 2 | "string" | stores string values: "name": "Elon", "last name": "Musk" |
| 32-bit Integer / 64-bit Integer | 16 / 18 | "int" / "long" | stores integer values: "year_of_birth": 1971 |
| Double | 1 | "double" | stores floating-point values: "private_assets_in_billion": 40.3, "cash_in_billion": 5.7 |
| Array | 4 | "array" | stores an ordered list of values, which may be of different types: "creations": ["Tesla","SpaceX_Falcon","Hyperloop"] |
| Boolean | 8 | "bool" | takes only a boolean value (true or false) |
| Object | 3 | "object" | stores an embedded document inside a MongoDB document; its fields are key-value pairs of any other type: person={"name": "Elon Musk", "year_of_birth": 1971, "private_assets_in_billion": 40.3} |
| ObjectId | 7 | "objectId" | unique identifier for a document stored in the DB |
| Timestamp | 17 | "timestamp" | stores a 64-bit timestamp (seconds plus a counter); MongoDB uses it mostly internally, and the Date type is usually the better choice for application data |
| Date | 9 | "date" | stores a date and time as milliseconds since the Unix epoch; the shell displays it with the ISODate wrapper |
| Binary Data | 5 | "binData" | represents arrays of bytes |
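The element structure described above (type, field name, value) can be sketched by hand. The following Python example is a simplified illustration that follows the public BSON layout for a few scalar types only. The function names encode_element and encode_document are hypothetical, and this is not how any particular driver is implemented.

```python
import struct


def encode_element(name: str, value) -> bytes:
    """Encode one field as: type byte, field name (C string), value."""
    key = name.encode("utf-8") + b"\x00"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # string = length prefix (includes the trailing zero byte) + bytes
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + key + struct.pack("<i", value)  # type 16, "int"
        return b"\x12" + key + struct.pack("<q", value)      # type 18, "long"
    if isinstance(value, float):
        return b"\x01" + key + struct.pack("<d", value)      # type 1, "double"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_document(doc: dict) -> bytes:
    """Encode a flat dict as: total length, elements in order, zero byte."""
    body = b"".join(encode_element(k, v) for k, v in doc.items())
    # The 4-byte length counts itself and the terminating zero byte.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"
```

For instance, encode_document({"hello": "world"}) yields 22 bytes, and the byte right after the length prefix is 2, the type number of String from the table.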

BSON adds support for data types that JSON lacks, such as binary data and dates. It is also designed to be comparatively fast to encode and decode. For example, integers are stored as fixed-width 32-bit or 64-bit binary values, so they do not have to be converted to and from text. As a result, BSON uses more space than JSON for small integers, but it is much faster to parse.
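The space side of this trade-off can be seen on a single value. The sketch below is a simplified Python illustration that compares only the bytes of the value itself, not the overhead of a whole document. The variable names are assumptions made for the example.

```python
import json
import struct

small = 7

as_json_text = json.dumps(small).encode("utf-8")  # the digit "7" as text
as_bson_int32 = struct.pack("<i", small)           # fixed 4 bytes, little-endian

print(len(as_json_text), len(as_bson_int32))  # 1 4

# Decoding the binary form is a single fixed-width read, with no digit parsing.
(decoded,) = struct.unpack("<i", as_bson_int32)
```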

Below is a JSON document that includes string, integer, double, array, and object data types:

{
  "first_name": "Elon",
  "last_name": "Musk",
  "year_of_birth": 1971,
  "title": "SpaceX CEO",
  "private_assets_in_billion": 40.3,
  "creations": ["Tesla","SpaceX_Falcon","Hyperloop"],
  "ex-girlfriend": {
     "name": "Grimes",
     "year_of_birth": 1988
  }
}
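To connect this document with the type table, here is a small Python sketch that reports which BSON alias each field would plausibly get. The helper bson_alias is hypothetical. The rule of using "int" when an integer fits into 32 bits and "long" otherwise is an assumption about typical driver behavior, and some clients (for example, a JavaScript shell) treat plain numbers as doubles.

```python
def bson_alias(value) -> str:
    """Return the BSON alias a value would likely be stored under."""
    if isinstance(value, bool):  # must come before int: bool is an int subclass
        return "bool"
    if isinstance(value, str):
        return "string"
    if isinstance(value, int):
        return "int" if -2**31 <= value < 2**31 else "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported type: {type(value).__name__}")


document = {
    "first_name": "Elon",
    "last_name": "Musk",
    "year_of_birth": 1971,
    "title": "SpaceX CEO",
    "private_assets_in_billion": 40.3,
    "creations": ["Tesla", "SpaceX_Falcon", "Hyperloop"],
    "ex-girlfriend": {"name": "Grimes", "year_of_birth": 1988},
}

aliases = {field: bson_alias(value) for field, value in document.items()}
for field, alias in aliases.items():
    print(f"{field}: {alias}")
```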

What is ObjectId

An Object Identifier (ObjectId) is a 12-byte data type used to identify documents. Every document must have an _id field. The field can be of almost any type (arrays are not allowed), but it is usually an ObjectId. The ObjectId type was designed to be lightweight while still producing keys that are unique across machines and processes. If you create multiple ObjectIds in quick succession, you'll notice that only the last few digits change each time. This has to do with how ObjectIds are created.

The 12 bytes are formed as follows:

  • The first 4 bytes are a timestamp: the creation time in seconds since the Unix epoch. This information implicitly contains the creation date of the document.
  • The next 3 bytes are the machine identifier, designed to prevent different machines from accidentally creating the same ObjectId. This is usually a hash of the machine's hostname.
  • To ensure uniqueness between processes on the same machine, the next 2 bytes are taken from the ID of the process (PID).
  • The remaining 3 bytes are an incrementing counter, which ensures uniqueness within a given second. It allows one process on one machine to create up to 16,777,216 (2^24) ObjectIds per second.

Note that current MongoDB versions replace the machine identifier and the PID with a single 5-byte random value generated once per process; the timestamp and the counter are unchanged.
Byte layout of a generated ObjectId:

Bytes:  0 1 2 3   | 4 5 6   | 7 8 | 9 10 11
Field:  Timestamp | Machine | PID | Counter
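The layout described above can be sketched in a few lines of Python. This is an illustration of the timestamp / machine / PID / counter scheme from this section, not MongoDB's actual implementation. The function make_object_id, the use of SHA-256 for the machine hash, and the random starting value of the counter are assumptions made for the example.

```python
import hashlib
import itertools
import os
import socket
import time

# Counter for the last 3 bytes; starts at a random value (an assumption here).
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))


def make_object_id(now=None) -> bytes:
    """Build 12 bytes: 4 timestamp + 3 machine + 2 PID + 3 counter."""
    timestamp = int(time.time() if now is None else now)
    # First 3 bytes of a hash of the hostname stand in for the machine id.
    machine = hashlib.sha256(socket.gethostname().encode("utf-8")).digest()[:3]
    pid = os.getpid() & 0xFFFF           # keep only the low 2 bytes of the PID
    counter = next(_counter) % 2**24     # wrap around after 16,777,216 values
    return (
        timestamp.to_bytes(4, "big")
        + machine
        + pid.to_bytes(2, "big")
        + counter.to_bytes(3, "big")
    )


first = make_object_id()
second = make_object_id()
print(first.hex())
print(second.hex())
```

When the two calls happen within the same second, the printed values differ only in the trailing counter digits, which matches the observation that only the last few digits change between ObjectIds created in quick succession.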

Generating Unique ID

In MongoDB, every document in a collection must have a unique _id field that plays the role of a primary key.

A primary key in an SQL database is a column (or a set of columns) whose value uniquely identifies a particular record in the table.

If the _id field is missing in the inserted document, the MongoDB driver automatically creates an ObjectId for it.
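The two rules above (a unique _id, generated automatically when absent) can be modeled with a toy in-memory collection. The Python sketch below is an assumption-laden illustration: ToyCollection is a hypothetical class, not a MongoDB driver API, and 12 random bytes stand in for a real ObjectId.

```python
import os


class ToyCollection:
    """In-memory stand-in for a collection; not a real driver API."""

    def __init__(self):
        self._docs = {}

    def insert_one(self, doc: dict):
        doc = dict(doc)  # work on a copy, leave the caller's dict untouched
        if "_id" not in doc:
            # Placeholder for a generated ObjectId: 12 random bytes as hex.
            doc["_id"] = os.urandom(12).hex()
        if doc["_id"] in self._docs:
            raise KeyError(f"duplicate _id: {doc['_id']!r}")
        self._docs[doc["_id"]] = doc
        return doc["_id"]


users = ToyCollection()
generated_id = users.insert_one({"name": "Elon Musk"})   # _id is created
explicit_id = users.insert_one({"_id": 1, "name": "Grimes"})  # _id is kept
print(generated_id, explicit_id)
```

Inserting a second document with _id equal to 1 would raise an error here, which mirrors the role of a primary key.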

Let's say you are working with MongoDB and you want to create a unique identifier for your document in ObjectId format. How would you implement this?

You have two options:

1. Generate a new ObjectId using the code below:

RandomOId=ObjectId()

2. Create an ObjectId from your own 12-byte value, written as a 24-character hexadecimal string:

OwnOId=ObjectId("5349b4ddd2781d08c09890f5")

If you want to convert your ObjectId to string format, use its str attribute as follows:

StrOId=OwnOId.str

As a result, you get the string value 5349b4ddd2781d08c09890f5.
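Outside the shell, the same relationship between the 12 bytes and their 24-character string form can be sketched in plain Python. The helper names below are hypothetical and use only the standard library; they are not part of any MongoDB driver.

```python
import re

_HEX24 = re.compile(r"[0-9a-fA-F]{24}")


def object_id_from_hex(text: str) -> bytes:
    """Turn a 24-character hexadecimal string into the 12 raw bytes."""
    if not _HEX24.fullmatch(text):
        raise ValueError("expected exactly 24 hexadecimal characters")
    return bytes.fromhex(text)


def object_id_to_hex(raw: bytes) -> str:
    """Turn 12 raw bytes back into the 24-character string form."""
    if len(raw) != 12:
        raise ValueError("expected exactly 12 bytes")
    return raw.hex()


raw = object_id_from_hex("5349b4ddd2781d08c09890f5")
print(len(raw), object_id_to_hex(raw))
```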

As we already know, the first 4 bytes of an ObjectId contain the document's creation time. However, this timestamp is not a value of the standard Date type. Imagine that you need to extract the creation date of a particular document in ISO date format. How would you do that? See the code below:

DocumentCreationData=OwnOId.getTimestamp()

That is how you get the document creation date in the following format: ISODate("2014-04-12T21:49:17Z").
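What getTimestamp() does can be reproduced by hand: the first 8 hexadecimal characters of the id are the creation time in seconds since the Unix epoch. The Python sketch below shows this with the standard library only; creation_time is a hypothetical helper name.

```python
from datetime import datetime, timezone


def creation_time(object_id_hex: str) -> datetime:
    """Decode the leading 4-byte timestamp of an ObjectId hex string."""
    seconds = int(object_id_hex[:8], 16)  # 0x5349b4dd == 1397339357
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = creation_time("5349b4ddd2781d08c09890f5")
print(created.isoformat())  # 2014-04-12T21:49:17+00:00
```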

Conclusion

In this topic we have studied the following aspects:

  • how the document data structure is formed and what the main benefits of the document data model are
  • what the BSON format is and how it is used in Document DB
  • what the ObjectId type is and how it is formed
  • how to generate a unique id while working with MongoDB

We hope this information will be helpful for you while working with Document databases. Let's move further and practice a little!
