
Hamming code


Are you tired of dealing with data transmission errors that corrupt your valuable information? Even if you don't work with data transmission on a daily basis, you've likely experienced errors in downloaded files or while copying data from one device to another. Data transmission errors are a common issue in computer networks, communication systems, and data storage. They can occur due to noise, interference, or other factors, which can lead to incorrect data being received or stored.

[Figure: an error in binary data]

Fortunately, there is a solution to this problem: the Hamming code, a technique that is widely used for error detection and correction. In this topic, we will learn how to compute the Hamming code, and in the following topic, we will use this code to detect errors and fix them. Let's get down to business!

Compute Hamming code

Let's take a closer look at the algorithm itself. The idea is to add a set of new bits, called parity bits, to the data so that errors can be detected and corrected. Each parity bit covers a specific set of bit positions, numbered from $1$. The $i$-th parity bit covers positions in the following way (denote $k = 2^{i-1}$):

1. Start from position $k$.
2. Include $k$ consecutive bits.
3. Skip the next $k$ consecutive bits.
4. Repeat steps 2 and 3 until the end of the string is reached.

The picture below illustrates the process for the first four parity bits. Pretty intuitive and simple, isn't it?

[Figure: covering sets of the first four parity bits]
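The start/include/skip rule can also be written out directly. The following is a minimal Python sketch, assuming 1-based positions as in the text; the function name `covered_positions` and the word length of 11 are illustrative choices, not part of the original description.

```python
def covered_positions(i: int, length: int) -> list:
    """1-based positions covered by the i-th parity bit in a word of `length` bits."""
    k = 2 ** (i - 1)
    positions = []
    start = k                      # step 1: start from position k
    while start <= length:
        # step 2: include k consecutive positions (clipped at the end of the word)
        positions.extend(range(start, min(start + k, length + 1)))
        start += 2 * k             # step 3: skip the next k positions
    return positions


for i in range(1, 5):
    print(i, covered_positions(i, 11))
```

For an 11-bit word this prints the sets 1, 3, 5, 7, 9, 11 for the first parity bit, 2, 3, 6, 7, 10, 11 for the second, 4 to 7 for the third, and 8 to 11 for the fourth.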

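Formally, this says that a bit position $m$ belongs to the set of the $i$-th parity bit if and only if $\text{binary}(m)\; \& \; \text{binary}(2^{i-1}) = \text{binary}(2^{i-1})$, where $\text{binary}(n)$ returns the binary representation of $n$ and $\&$ is the bitwise AND. In other words, the binary representation of $m$ has a $1$ in the same place as the binary representation of $k = 2^{i-1}$. Feel free to think about it in your free time...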
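The bitwise condition is easy to try out. Here is a small Python sketch, assuming 1-based positions and the $i$-th parity bit sitting at position $k = 2^{i-1}$; the helper name `is_covered` is hypothetical.

```python
def is_covered(m: int, i: int) -> bool:
    """True if 1-based position m is covered by the i-th parity bit (i starts at 1)."""
    mask = 1 << (i - 1)            # 2^(i-1), the position of the i-th parity bit
    return (m & mask) == mask


# Positions 1..11 covered by the third parity bit (mask = 4 = 0b100).
print([m for m in range(1, 12) if is_covered(m, 3)])   # [4, 5, 6, 7]
```

Integers are already stored in binary, so no explicit conversion is needed before applying `&`.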

In order to create a Hamming Code of the data given as a binary string, the algorithm follows these steps:

1. Identify the number of parity bits required. This is done by finding the smallest number $r$ such that $2^r \geq n + r + 1$, where $n$ is the number of data bits.
2. Assign each data bit to a position in the code word. The data bits are placed in positions that are not powers of $2$, in the same order as in the original word, so the code word has the form $p_1p_2d_1p_3d_2d_3d_4p_4d_5\dots$ The remaining positions are reserved for the parity bits and initialized to zero.
3. Calculate the value of each parity bit. Count the number of $1$s (ones) in the set covered by that parity bit: if it is odd, set the parity bit to $1$, otherwise set it to $0$.
4. Insert the parity bits into their positions in the code word to create the final message. The resulting code word, which is known as the Hamming code, is transmitted or stored.
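The first two steps can be sketched in a few lines of Python. This is only an illustration under the assumptions above (1-based positions, zero placeholders); the names `parity_bits_needed` and `layout` and the sample input `1011` are made up for the example.

```python
def parity_bits_needed(n: int) -> int:
    """Smallest r with 2**r >= n + r + 1, where n is the number of data bits."""
    r = 1
    while 2 ** r < n + r + 1:
        r += 1
    return r


def layout(data: str) -> str:
    """Place data bits at non-power-of-two positions; parity slots hold '0'."""
    r = parity_bits_needed(len(data))
    bits = iter(data)
    # A position p >= 1 is a power of two exactly when (p & (p - 1)) == 0.
    return "".join("0" if (p & (p - 1)) == 0 else next(bits)
                   for p in range(1, len(data) + r + 1))


print(parity_bits_needed(4))   # 3
print(layout("1011"))          # 0010011
```

A simple linear search for $r$ is used here because $r$ stays small even for long inputs.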

Later on, the receiver can use the parity bits to check whether the code word contains any errors. How is this done? Be patient until the next topic.

Example: 1100101

Too abstract so far? Let's get our hands dirty and encode a string ourselves. A string of length seven will do, so let's encode the string $data = 1100101$. It should be easy if we follow the steps above:

First, we have to find the number of parity bits needed. In our case, we have a string of length $7$, hence $n=7$, and we need to find the smallest $r$ such that $2^r \geq r + 8$. Some simple calculations show that $r = 1, 2, 3$ won't do (try it!), while for $r=4$ we get $2^4 = 16 \geq 4 + 8 = 12$.
Next, we position the data bits and put placeholders equal to zero for the parity bits. With positions numbered from $1$ to $11$, the parity bits sit at positions $1, 2, 4, 8$, and the word becomes $00101000101$.
It's time to calculate the parity bit values. We will perform the calculations in detail for $p_1$ and $p_4$, while you are advised to check $p_2$ and $p_3$ yourself. Recall that $p_i$ denotes the $i$-th parity bit.
1. The first parity bit covers positions $1, 3, 5, 7, 9, 11$. There are $4$ ones in that set (at positions $3, 5, 9, 11$), which is an even number. Consequently, we set the first parity bit to $0$: $p_1 = 0$.
2. Similarly, $p_2 = 0$ and $p_3 = 1$.
3. Last but not least, the fourth parity bit covers positions $8, 9, 10, 11$. There are $2$ ones in that set (at positions $9$ and $11$), which is an even number. Hence, $p_4 = 0$.
Finally, we have all our parity values, and we can write the Hamming code for our string $1100101$ in the format $p_1p_2d_1p_3d_2d_3d_4p_4d_5d_6d_7$. We have computed all the values, so the encoded string is $00111000101$.
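The parity values from this example can be double-checked with a short Python sketch. It assumes the placeholder layout `00101000101` (the data `1100101` with zeros at positions 1, 2, 4 and 8); the helper name `parity` is hypothetical.

```python
code = "00101000101"   # data 1100101 with zero placeholders at positions 1, 2, 4, 8


def parity(code: str, p_pos: int) -> int:
    """Parity (0 or 1) of the ones at the 1-based positions covered by the bit at p_pos."""
    ones = sum(1 for m in range(1, len(code) + 1)
               if m & p_pos and code[m - 1] == "1")
    return ones % 2


parities = {p: parity(code, p) for p in (1, 2, 4, 8)}
print(parities)   # {1: 0, 2: 0, 4: 1, 8: 0}
```

The keys are the positions of $p_1, p_2, p_3, p_4$, so the output matches $p_1 = 0$, $p_2 = 0$, $p_3 = 1$, $p_4 = 0$.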

Time complexity and pseudocode

Some of us find it difficult to turn words into code. To help with this, here is pseudocode for the Hamming encoding process:

// data = the binary string to be encoded
// strings are indexed from 1, so code[i] is the bit at position i

function compute_parity_bit(code, p_idx):
    count = 0
    for i in [p_idx, len(code)]:
        if i & p_idx == p_idx then             // position i is covered by the parity bit at p_idx
            if code[i] == '1' then             // count the number of 1's covered by the parity bit
                count = count + 1
    return '1' if count mod 2 == 1 else '0'    // parity bit = 1 if the number of ones is odd, 0 otherwise


function hamming_encode(data):
    n = len(data)                              // number of data bits
    r = min r s.t. 2^r >= n + r + 1            // number of parity bits
    code = ''                                  // the code word, built position by position
    j = 1                                      // index of current data bit
    // placing bits
    for i in [1, n + r]:
        if i is a power of 2 then
            code = code + '0'                  // add a placeholder for the parity bit
        else:
            code = code + data[j]              // add the next data bit
            j = j + 1
    // computing parity bits
    for i in [0, r - 1]:
        p_idx = 2^i                            // calculate the position of the parity bit
        p_i = compute_parity_bit(code, p_idx)  // calculate the parity bit value
        code[p_idx] = p_i                      // replace the placeholder with the parity bit value
    return code
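For readers who prefer something they can run, here is one possible Python rendering of the pseudocode. It is a sketch rather than a reference implementation: it assumes the input is a string of `'0'` and `'1'` characters, and it keeps an unused slot at index 0 so that list indices match the 1-based positions used in the text.

```python
def hamming_encode(data: str) -> str:
    """Encode a binary string with the even-parity Hamming code (positions are 1-based)."""
    n = len(data)
    r = 1
    while 2 ** r < n + r + 1:          # smallest r with 2^r >= n + r + 1
        r += 1

    code = [""] + ["0"] * (n + r)      # index 0 is unused so indices match positions
    bits = iter(data)
    for pos in range(1, n + r + 1):
        if (pos & (pos - 1)) != 0:     # not a power of two: a data position
            code[pos] = next(bits)

    for i in range(r):
        p_pos = 2 ** i                 # position of the (i + 1)-th parity bit
        ones = sum(code[m] == "1" for m in range(1, n + r + 1) if m & p_pos)
        code[p_pos] = "1" if ones % 2 else "0"

    return "".join(code[1:])


print(hamming_encode("1100101"))   # 00111000101
```

The printed value agrees with the worked example for the string $1100101$.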

Now, let's study the time complexity of this process, analyzing one step at a time:

1. We can find the smallest number $r$ satisfying the condition by using binary search, for example. Hmmm, this is a good moment to refresh and test your knowledge of binary search. The time complexity of this step is $O(\log n)$, since $r$ can't be greater than $n$ when $n \geq 3$. Indeed, setting $r = n$ in the condition gives $2^n \geq 2n+1$, which holds for every $n \geq 3$.
2. Simply positioning the data bits requires linear time.
3. Computing the value of one parity bit requires linear time since we scan all the characters in the string. However, there are $r$ parity bits whose values we have to calculate. In total, we get $O(n\cdot r)$.
4. The last step consists of positioning the parity bits, which is linear as well.

Putting it all together, we see that the third step dominates the other steps, hence the time complexity of the encoding algorithm is $O(n\cdot r)$.

It is worth noting that $r$ is approximately $\log n$, hence this algorithm's time complexity is $O(n\log n)$.
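To see how closely $r$ follows $\log n$, one can tabulate it for a few input lengths. The sketch below is only an illustration; the helper name `parity_bits_needed` and the chosen values of $n$ are assumptions made for the example. It compares $r$ with `n.bit_length()`, which equals $\lfloor \log_2 n \rfloor + 1$.

```python
def parity_bits_needed(n: int) -> int:
    """Smallest r with 2**r >= n + r + 1."""
    r = 1
    while 2 ** r < n + r + 1:
        r += 1
    return r


# Columns: n, r, floor(log2(n)) + 1
for n in (4, 11, 26, 57, 120, 247, 1000):
    print(n, parity_bits_needed(n), n.bit_length())
```

For these inputs the two columns coincide, and in general $r$ exceeds $\lfloor \log_2 n \rfloor + 1$ by at most one.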

Conclusion

Now it's time to recap the main information on this topic. Hamming coding is a technique that is widely used for error detection and correction. It consists of encoding, which we covered in this topic, and decoding, which will be discussed in the next topic. The encoding process consists of 4 main steps:

1. Identify the number of parity bits required.
2. Assign each data bit to a position in the code word.
3. Calculate the value of each parity bit.
4. Insert the parity bits into their positions in the code word to create the final message.

The time complexity of this algorithm is estimated to be $O(n\log n)$. Now that you have a clear idea of how the algorithm works, let's practice with some tasks.
