Base64 and other bases

For historical reasons, some communication systems transfer data as text characters: they either cannot transmit 8-bit binary data at all or cannot do so safely. Binary-to-text encoding schemes were created to translate binary data into text for such systems. Today they are used not only there but in several other areas as well. Let's look at some of these encoding schemes, how they work, and where you can use them, beginning with the most popular one.

Base64

In simple terms, Base64 is an encoding scheme that translates binary data to text using 64 ASCII symbols. Many programming languages, including Python, PHP, C#, Go, and Ruby, provide implementations of it. The 64 symbols represent all possible combinations of 0s and 1s in 6 bits of binary data. Here is an index table with the 64 ASCII symbols.

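[Figure: Base64 index table. Indices 0 to 25 map to A-Z, 26 to 51 to a-z, 52 to 61 to 0-9, 62 to +, and 63 to /.]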
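As a rough sketch, the standard Base64 index table (the alphabet defined in RFC 4648) can be rebuilt in Python from the uppercase letters, the lowercase letters, the digits, and the two symbols + and /. The names BASE64_ALPHABET and symbol_for are illustrative choices, not part of any library.

```python
import string

# Standard Base64 alphabet: index 0-25 -> A-Z, 26-51 -> a-z,
# 52-61 -> 0-9, 62 -> '+', 63 -> '/'.
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)


def symbol_for(index: int) -> str:
    """Return the Base64 symbol for a 6-bit value (0..63)."""
    if not 0 <= index < 64:
        raise ValueError("a 6-bit group must be in the range 0..63")
    return BASE64_ALPHABET[index]


if __name__ == "__main__":
    # Print a few entries from the boundaries of each range in the table.
    for i in (0, 25, 26, 51, 52, 61, 62, 63):
        print(f"{i:2d} ({i:06b}) -> {symbol_for(i)}")
```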

When do you need this encoding scheme? Primarily when you want to transmit 8-bit binary data over protocols designed to carry text data. The main applications are:

  • transferring binary data, like attachments in email;
  • inserting binary data such as images into text files like HTML, CSS, XML, or scripts;
  • serving as part of encryption programs that follow the OpenPGP protocol;
  • converting binary data to include it in URLs.
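Two of these applications can be sketched with Python's standard base64 module. The helper names to_data_uri and to_url_token are hypothetical. The example also assumes the URL-safe variant of the alphabet for URLs, because the standard alphabet contains + and /, which have special meaning there.

```python
import base64


def to_data_uri(payload: bytes, mime_type: str = "application/octet-stream") -> str:
    """Embed binary data in a text document (HTML, CSS, ...) as a data URI."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def to_url_token(payload: bytes) -> str:
    """Encode binary data for use in a URL.

    The URL-safe variant replaces '+' with '-' and '/' with '_'.
    """
    return base64.urlsafe_b64encode(payload).decode("ascii")


if __name__ == "__main__":
    raw = bytes([0xFB, 0xFF])  # chosen so that the standard output contains + and /
    print(base64.b64encode(raw).decode("ascii"))
    print(to_url_token(raw))
    print(to_data_uri(raw))
```

For the two bytes above, the standard encoding is +/8= and the URL-safe one is -_8=.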

How does it work?

Let's look at the process of encoding binary data using Base64.

[Figure: the process of encoding binary data using Base64]

  1. First, divide the 8-bit binary data into groups of 6 bits each.
  2. Then, encode each 6-bit group with a symbol according to the index table. If the length of the data is a multiple of 3 bytes, this is sufficient.
  3. Otherwise, note that every 3 bytes (24 bits) are encoded with 4 groups of 6 bits. If the end of the sequence is shorter than 3 bytes (as in the picture), fill the empty places of the last group that contains some data bits with zeros, and replace each remaining empty 6-bit group with the padding symbol =.

As a result, the binary data 10100111010010101101011101010110 is encoded as p0rXVg==.
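The steps above can be sketched in a few lines of Python. This is a minimal illustration working on a string of bits, not how production libraries implement it. The function name encode_base64 is hypothetical, and the result is compared with the standard library's base64.b64encode.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_base64(data: bytes) -> str:
    """Encode bytes as Base64 by following the three steps literally."""
    # Write the input as one long string of bits, 8 per byte.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Fill the last, partly filled 6-bit group with zeros.
    bits += "0" * (-len(bits) % 6)
    # Steps 1 and 2: split into 6-bit groups and look each one up in the table.
    symbols = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Step 3: add '=' until the number of symbols is a multiple of 4.
    padding = "=" * (-len(symbols) % 4)
    return "".join(symbols) + padding


if __name__ == "__main__":
    # The 32-bit example from the text, converted to 4 bytes.
    data = int("10100111010010101101011101010110", 2).to_bytes(4, "big")
    print(encode_base64(data))                     # p0rXVg==
    print(base64.b64encode(data).decode("ascii"))  # p0rXVg==
```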

Did you notice that groups of 6-bit binary data have 64 different combinations ($2^6 = 64$)? That's why the scheme uses 64 symbols and has "64" in its name.

Other base*

Other binary-to-text encoding schemes differ mainly in the number of bits per group into which the binary data is divided. Each has its own pros and cons. Here are some other encoding schemes:

  • Base16 divides data into 4-bit groups and uses 16 symbols to encode them. Conveniently, each byte maps to exactly 2 symbols. However, since each symbol itself occupies one byte, it takes two bytes to encode one byte of data, so the space efficiency is only 50%.
  • Base32 divides data into 5-bit groups and uses 32 symbols to encode them. Its advantage is that it uses letters of a single case and omits the digits 1 and 0, which look similar to the letters I and O. Although it is more space-efficient than Base16, its encoded output still takes more space than Base64.
  • Base85 encodes 4 bytes with 5 symbols, treating the 4 bytes as a single 32-bit number (4 bytes × 8 bits). This encoding scheme is more space-efficient than Base64, but its alphabet contains symbols such as " ' / < that have special meaning in many programming languages.
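To compare the space efficiency of these schemes, here is a small sketch using the encoders in Python's standard base64 module. It assumes the Ascii85 flavor of Base85 (base64.a85encode). The helper name encoded_sizes and the 60-byte sample are illustrative choices. A length of 60 is convenient because it splits evenly into the 3-, 4-, and 5-byte blocks used by the schemes, so no padding distorts the comparison.

```python
import base64


def encoded_sizes(data: bytes) -> dict:
    """Return the number of output symbols each scheme produces for data."""
    return {
        "base16": len(base64.b16encode(data)),
        "base32": len(base64.b32encode(data)),
        "base64": len(base64.b64encode(data)),
        "base85": len(base64.a85encode(data)),
    }


if __name__ == "__main__":
    data = bytes(range(60))  # 60 sample bytes: 0, 1, 2, ..., 59
    for name, size in encoded_sizes(data).items():
        ratio = len(data) / size
        print(f"{name}: {size} symbols for {len(data)} bytes (efficiency {ratio:.3f})")
```

For this input the encoders produce 120, 96, 80, and 75 symbols respectively, which corresponds to efficiencies of 50%, 62.5%, 75%, and 80%.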

Conclusion

Base16, Base32, Base64, and Base85 are encoding schemes for converting binary data to text. Base64 is widely used where only text can be sent safely, for example for email attachments, and for embedding binary files such as images in text documents. Encoding in Base64 involves dividing 8-bit data into 6-bit groups and mapping each group to an ASCII symbol according to the Base64 index table. The other schemes work similarly, but they divide the binary data into groups with a different number of bits and map them to symbols from their own index tables.
