
Hamming code

Are you tired of dealing with data transmission errors that corrupt your valuable information? Even if you don't work with data transmission on a daily basis, you've likely experienced errors in downloaded files or while copying data from one device to another. Data transmission errors are a common issue in computer networks, communication systems, and data storage. They can occur due to noise, interference, or other factors, which can lead to incorrect data being received or stored.

[Figure: error in binary data]

Fortunately, there is a solution to this problem: the Hamming code, a technique that is widely used for error detection and correction. In this topic, we will learn how to compute the Hamming code, and in the following topic, this code will be used to detect errors and fix them. Let's get down to business!

Compute Hamming code

Let's take a closer look at the algorithm itself. The idea is to add a set of new bits, also called parity bits, to detect and correct errors in the data. Each parity bit covers a specific set of data bits. The $i$-th parity bit covers bit positions in the following way (denote $k = 2^{i-1}$):

  1. Start from position $k$.
  2. Include $k$ consecutive bits.
  3. Skip the next $k$ consecutive bits.
  4. Repeat steps 2 and 3 until the end of the string is reached.

The picture below illustrates the process for the first four parity bits. Pretty intuitive and simple, isn't it?

[Figure: covering sets]

Formally, a bit position $m$ belongs to the set of the $i$-th parity bit if and only if $\text{binary}(m) \;\&\; \text{binary}(2^{i-1}) = \text{binary}(2^{i-1})$, where $\&$ is the bitwise AND and $\text{binary}(n)$ returns the binary representation of $n$. In other words, the $i$-th bit of $m$, counting from the least significant one, must be set. Feel free to think about it in your free time...
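To see that the take-and-skip rule and the bitwise test describe the same sets, here is a minimal Python sketch. The function names `covered_by_skipping` and `covered_by_mask` are made up for this illustration, positions are assumed to be numbered from 1, and the mask used for the $i$-th parity bit is $k = 2^{i-1}$, the position where that parity bit sits.

```python
def covered_by_skipping(i: int, length: int) -> list[int]:
    """Positions (1-based) covered by the i-th parity bit via the take-k/skip-k rule."""
    k = 2 ** (i - 1)
    positions = []
    start = k
    while start <= length:
        # take k consecutive positions (cut off at the end of the string) ...
        positions.extend(range(start, min(start + k, length + 1)))
        # ... then skip the next k positions
        start += 2 * k
    return positions


def covered_by_mask(i: int, length: int) -> list[int]:
    """The same set, found with the bitwise test m & k == k."""
    k = 2 ** (i - 1)
    return [m for m in range(1, length + 1) if m & k == k]


# Compare both descriptions for the first four parity bits of an 11-bit code word.
for i in range(1, 5):
    assert covered_by_skipping(i, 11) == covered_by_mask(i, 11)
    print(i, covered_by_mask(i, 11))
```

For an 11-bit code word, the first parity bit covers the odd positions 1, 3, 5, 7, 9, 11, and the fourth one covers positions 8 to 11.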

In order to create a Hamming Code of the data given as a binary string, the algorithm follows these steps:

  1. Identify the number of parity bits required. This is done by finding the smallest number $r$ such that $2^r \geq n + r + 1$, where $n$ is the number of data bits.

  2. Assign each data bit to a position in the code word. The data bits are placed in positions that are not powers of $2$, in the same order they are in the original word, just like in the picture below. The remaining positions are reserved for the parity bits and initialized to zero.

    [Figure: positioning bits]

  3. Calculate the value of each parity bit. Count the number of $1$s (ones) in the set covered by that parity bit: if it is odd, set the value of this parity bit to $1$, otherwise set it to $0$.

  4. Insert the parity bits into their positions in the code word to create the final message. The resulting code word, which is known as Hamming code, is transmitted or stored.

Later on, when this code word is received or read back from storage, it can be checked for errors. How is this done? Be patient until the next topic.
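The first two steps can be sketched in Python as follows. This is only an illustration under a few assumptions: the helper names `parity_bit_count` and `place_data_bits` are hypothetical, the data is given as a string of `'0'` and `'1'` characters, and $r$ is found by simply trying $r = 0, 1, 2, \dots$ in turn.

```python
def parity_bit_count(n: int) -> int:
    """Smallest r such that 2^r >= n + r + 1 (found by trying r = 0, 1, 2, ...)."""
    r = 0
    while 2 ** r < n + r + 1:
        r += 1
    return r


def place_data_bits(data: str) -> str:
    """Lay out the data bits, leaving '0' placeholders at power-of-two positions."""
    n = len(data)
    r = parity_bit_count(n)
    bits = iter(data)
    code = []
    for pos in range(1, n + r + 1):          # positions are numbered from 1
        is_power_of_two = pos & (pos - 1) == 0
        code.append("0" if is_power_of_two else next(bits))
    return "".join(code)


print(parity_bit_count(7))          # 4
print(place_data_bits("1100101"))   # 00101000101
```

The check `pos & (pos - 1) == 0` is a common bit trick for testing whether a positive integer is a power of two.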

Example: 1100101

Too abstract so far? Let's get our hands dirty and encode a string ourselves. A string of length seven will do, so let's encode the string $data = 1100101$. It should be easy if we follow the steps above:

  1. First, we have to find the number of parity bits needed. In our case, we have a string of length $7$, hence $n = 7$, and we need to find the smallest $r$ such that $2^r \geq r + 8$. Some simple calculations show that $r = 1, 2, 3$ won't do (try it!), while for $r = 4$ we get $16 = 2^4 \geq 4 + 8 = 12$.
  2. Next, we position the data bits and put placeholders equal to zero for parity bits, as shown below:

    [Figure: bits positioned]

  3. It's time to calculate the parity bit values. We will perform the calculations in detail for $p_1$ and $p_4$, while you are advised to check $p_2$ and $p_3$ yourself. Here $p_i$ denotes the $i$-th parity bit.
    1. The first parity bit covers the set shown as red cells in the picture below. It is easy to count that there are $4$ ones in that set, i.e. an even number. Consequently, we set the first parity bit to $0$: $p_1 = 0$.

      [Figure: computing first parity bit]

    2. Similarly, $p_2 = 0$ and $p_3 = 1$.
    3. Last but not least, we have the situation below for the fourth parity bit. We can see that there are $2$ ones in the red zone, which is an even number. Hence, $p_4 = 0$.

      [Figure: computing fourth parity bit]

  4. Finally, we have all our parity values, and we can write the Hamming code for our string $1100101$ in the format $p_1 p_2 d_1 p_3 d_2 d_3 d_4 p_4 d_5 d_6 d_7$. Substituting the computed values, the encoded string is $00111000101$.
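If you want to double-check the counts for all four parity bits, the short Python sketch below traces them. The function `trace_parities` is a hypothetical helper written for this example. It assumes the data bits of $1100101$ are already placed, with zero placeholders at positions 1, 2, 4 and 8.

```python
def trace_parities(placed: str, r: int) -> list[tuple[int, list[int], int, int]]:
    """For each parity bit i = 1..r, return (i, covered positions, number of ones, parity)."""
    rows = []
    for i in range(1, r + 1):
        k = 2 ** (i - 1)                                   # position of the i-th parity bit
        covered = [m for m in range(1, len(placed) + 1) if m & k == k]
        ones = sum(placed[m - 1] == "1" for m in covered)  # string index is position - 1
        rows.append((i, covered, ones, ones % 2))
    return rows


# Data 1100101 with zero placeholders at positions 1, 2, 4 and 8.
for i, covered, ones, parity in trace_parities("00101000101", 4):
    print(f"p{i}: positions {covered}, ones = {ones}, value = {parity}")
```

The trace gives 4, 2, 1 and 2 ones for $p_1, \dots, p_4$, which matches the values $0, 0, 1, 0$ obtained by hand.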

Time complexity and pseudocode

Some of us find it difficult to turn words into code. To help with this, here is pseudocode for the Hamming encoding process:

// data = the binary string to be encoded
// strings are indexed from 1; a range [a, b] includes both ends

function compute_parity_bit(code, p_idx):
    count = 0
    for i in [p_idx, len(code)]:
        if i & p_idx == p_idx then             // position i is covered by the parity bit at p_idx
            if code[i] == '1' then             // count the number of 1's covered by the parity bit
                count = count + 1
    return '1' if count mod 2 == 1 else '0'    // the parity bit is 1 if the number of ones is odd, 0 otherwise


function hamming_encode(data):
    n = len(data)                              // number of data bits
    r = min r s.t. 2^r >= n + r + 1            // number of parity bits
    code = ''                                  // the code word under construction
    j = 1                                      // index of the current data bit
    // placing bits
    for i in [1, n + r]:
        if i is a power of 2 then
            code = code + '0'                  // add a placeholder for the parity bit
        else
            code = code + data[j]              // add the next data bit
            j = j + 1
    // computing parity bits
    for i in [0, r - 1]:
        p_idx = 2^i                            // position of the (i+1)-th parity bit
        p_i = compute_parity_bit(code, p_idx)  // calculate the parity bit value
        code[p_idx] = p_i                      // replace the placeholder with the parity bit value
    return code
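One possible translation of this pseudocode into runnable Python is sketched below. It is not the only way to write it. The list `code` keeps an unused slot at index 0 so that list indices match the 1-based positions used in the text, and the input is assumed to contain only the characters `'0'` and `'1'`.

```python
def hamming_encode(data: str) -> str:
    """Encode a binary string using the steps described above (even parity)."""
    n = len(data)

    # Step 1: smallest r with 2^r >= n + r + 1.
    r = 0
    while 2 ** r < n + r + 1:
        r += 1

    # Step 2: place data bits at non-power-of-two positions.
    # code[0] is unused so that list indices equal 1-based positions.
    code = [0] * (n + r + 1)
    bits = iter(data)
    for pos in range(1, n + r + 1):
        if pos & (pos - 1) != 0:             # not a power of two: a data position
            code[pos] = int(next(bits))

    # Steps 3 and 4: compute each parity bit and write it over its placeholder.
    for i in range(r):
        p = 2 ** i                           # position of the parity bit
        ones = sum(code[m] for m in range(p, n + r + 1) if m & p == p)
        code[p] = ones % 2                   # 1 if the count of ones is odd, else 0

    return "".join(map(str, code[1:]))


print(hamming_encode("1100101"))  # 00111000101
```

The placeholder at position `p` is still zero when the ones are counted, so including it in the covered set does not change the result.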

Now, let's study the time complexity of this process, analyzing one step at a time:

  1. We can find the smallest $r$ satisfying the condition by using binary search, for example. Hmmm, this is a good moment to refresh and test your knowledge of binary search. The time complexity of this step is $O(\log n)$, since $r$ can't be greater than $n$: indeed, $2^n \geq 2n + 1$ for all $n \geq 3$.
  2. Simply positioning the data bits requires linear time.
  3. Computing the value of one parity bit requires linear time, since we scan all the characters in the string. However, we have $r$ parity bits whose values we have to calculate. In total, we get $O(n \cdot r)$.
  4. The last step consists of positioning the parity bits, which is linear as well.
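As an illustration of the first step, here is a minimal binary-search sketch in Python. The function name `parity_bit_count_bsearch` and the search bounds are assumptions of this example: it requires $n \geq 1$ and uses $\max(n, 3)$ as the upper bound, because $r \leq n$ holds for $n \geq 3$ and $r \leq 3$ holds for smaller $n$. Binary search applies because $2^r - r - 1$ grows with $r$, so once the condition holds, it holds for every larger $r$.

```python
def parity_bit_count_bsearch(n: int) -> int:
    """Smallest r >= 1 with 2^r >= n + r + 1, found by binary search (assumes n >= 1)."""
    lo, hi = 1, max(n, 3)            # the answer is known to lie in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if (1 << mid) >= n + mid + 1:
            hi = mid                 # mid works, so the answer is mid or smaller
        else:
            lo = mid + 1             # mid is too small
    return lo


print(parity_bit_count_bsearch(7))   # 4
```

In practice, $r$ is so small that trying $r = 1, 2, 3, \dots$ in turn is just as reasonable.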

Gathering everything together, we see that the third step outweighs the other steps, hence the time complexity of the encoding algorithm is $O(n \cdot r)$.

It is worth noting that $r$ is approximately $\log n$, hence this algorithm's time complexity is $O(n \log n)$.
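To get a feeling for how close $r$ is to $\log n$, the sketch below compares $r$ with $\lceil \log_2(n + 1) \rceil$ for a few values of $n$. The helper names are hypothetical, and `n.bit_length()` is used as an integer-only way to compute $\lceil \log_2(n + 1) \rceil$ for $n \geq 1$.

```python
def parity_bit_count(n: int) -> int:
    """Smallest r such that 2^r >= n + r + 1."""
    r = 0
    while 2 ** r < n + r + 1:
        r += 1
    return r


def gap_to_log(n: int) -> int:
    """Difference between r and ceil(log2(n + 1)) for n >= 1."""
    # n.bit_length() equals ceil(log2(n + 1)) for every n >= 1
    return parity_bit_count(n) - n.bit_length()


# Print n, the number of parity bits r, and ceil(log2(n + 1)) side by side.
for n in (4, 7, 11, 26, 57, 1000, 10**6):
    print(n, parity_bit_count(n), n.bit_length())
```

For every $n \geq 1$ the difference is either 0 or 1, so $r$ grows logarithmically with $n$.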

Conclusion

Now it's time to recap the main information on this topic. Hamming coding is a technique that is widely used for error detection and correction. It consists of encoding, which we covered in this topic, and decoding, which is to be discussed in the next topic. The encoding process consists of 4 main steps:

  • Identify the number of parity bits required.
  • Assign each data bit to a position in the code word.
  • Calculate the value of each parity bit.
  • Insert the parity bits into their positions in the code word to create the final message.

The time complexity of such an algorithm is estimated to be $O(n \log n)$. Now that you have a clear idea of how the algorithm works, let's practice with some tasks.
