Computer scienceAlgorithms and Data StructuresAlgorithmsPrinciples and techniquesHashing

Cryptographic hash functions

8 minutes read

In an increasingly digital world where data breaches and cyberattacks pose significant threats, it is crucial to protect the integrity and security of sensitive information. Cryptographic hash functions play a vital role in this regard, serving as the guardians of data integrity and providing a robust layer of protection. But why aren't hash functions enough?

Encoding using hashes

Hash functions and cryptographic hash functions are closely related concepts in the realm of computer science and cryptography. While both serve the purpose of transforming data into fixed-size representations, there are crucial differences between the two. This article aims to explain the differences between hash functions and cryptographic hash functions, highlighting the unique characteristics and uses of the latter ones.

Cryptographic hash functions

Cryptographic hashes are made to work with any input of any length and type by considering it as a sequence of bits: 11's and 00's. They also output a sequence of bits but of fixed length. The computer can work quite well with it, but for us, it's quite hard to "see" if we don't format it clearly. Typically, there are a few hundred bits in the output, so what we do is consider it as a large number in base 2, convert it to base 16, and write it as a string.

Formally, cryptographic hash functions are a specific subset of hash functions designed for secure cryptographic applications. They possess all the properties of traditional hash functions and incorporate additional security features. The distinctive properties of cryptographic hash functions include:

  • Pre-image resistance: it is hard to find a message with a given hash value.
  • Second pre-image resistance or weak collision resistance: it is difficult to find a message with the same hash value as a given message.
  • Collision resistance or strong collision resistance: it is hard to find two different messages with the same hash value.

Pre-image resistance

Imagine a company that stores a table of username-password pairs for a website you use. If that table gets leaked, then anyone can see your password. So, what they do is save the hash of your password instead. Whenever you send them your password, they compute the hash and check if it is the same as the one saved. Now, if the table is leaked, an attacker must find a password that hashes to the same value as your password to be able to log in as you.

Finding a key to open the lock

So, the hash function must make it very difficult for an attacker to find such a password. Roughly speaking, even if someone has the lock (the hash value), it would be extremely difficult or practically impossible for them to figure out the specific key (the original input) that would open the lock. Such a function is called pre-image resistant: given a hash value hh, it is hard to find a message mm with hash(m)=hhash(m) = h.

Second pre-image resistance

One of the ways we can make sure a message was not changed is by sending the hash of the message together with the message itself. Suppose that an attacker wants to change the message, but cannot change the hash. Then their changed message should hash to the same value as the original one. Even if the new message might not make sense, it can affect communication in some ways.

To prevent such situations, we can use a second pre-image-resistant or weak collision-resistant hash function. Imagine a person holding a unique key and a lock representing the hash function. The person and their key represent the original input. The lock represents the hash value of that input. Second pre-image resistance means that even if someone examines the key and the lock, it would be extremely difficult for them to find a different key that fits the lock and produces the same hash value. Formally, this means that, given a message m1m_1, it will be hard to find a different message m2m_2 with hash(m1)=hash(m2)hash(m_1) = hash(m_2). The difference between pre-image and second pre-image resistances is that in the first case we are given a hash value, while here, we are given a message.

Finding a second key to open the same lock

Collision resistance

In some very specific use cases, finding any pair of messages that have the same hash might result in problems. A hash function that makes sure it is hard to find any messages m1,m2m_1, m_2 such that hash(m1)=hash(m2)hash(m_1) = hash(m_2) is called collision-resistant or strong collision-resistant. In simple words, collision resistance ensures that it is extremely difficult to find two keys that can open the same safe or produce the same hash value. Unlike the second pre-image resistance, here we are given nothing; we simply want to make it difficult to find pairs of distinct messages with equal hash values.

No two keys open the same lock

The purpose of collision resistance is to prevent malicious actors from intentionally finding different inputs that produce the same hash value. This property is crucial for ensuring the security of cryptographic protocols, digital signatures, and other applications that rely on the uniqueness and integrity of hash values. By making it computationally impractical to find collisions, collision resistance strengthens the overall security of cryptographic systems.

When we say it's hard to find some value, that means that finding a value with the needed properties would take years, even with the most formidable supercomputers. If you think that it's hard to get those properties, you're right! Not everyone can create such a function. Fortunately, there are a few standard ones that are widely used today. Their implementations are complicated, so we will not go into detail.

Examples

Finally, it is worth bringing up a couple of real examples of cryptographic hash functions and a bit of information on their history.

First, let's look at MD5. It was created in 1992 as a better alternative to its predecessor, MD4, which turned out to be non-pre-image resistant due to its linear structure (2008). MD5 takes any input and outputs a 128-bit hash value. Initially, it was believed to be collision-resistant, but in 2004 it was proved that it wasn't the case. It took 12 years and a lot of research to find this, so it's better to stick with existing hashes than try to create new ones!

Another cryptographic hash function, a more secure one, is SHA256, also known as "SHA-2" or "SHA-256 bit". Its output is 256-bit long and is used in many places, one of them being the Bitcoin proof of work. Let's look at some examples and see what small changes in the input do to the hash value:

SHA256("Hashes are fun!") = c128f0e84f60397828b11eaa3cbb67262b99f4d11f3ad630139ffa36cc600278
SHA256("hashes are fun!") = 9659f1fcdf143e3e1f66a922d71500d86c3b7d8f3a5ef03e1d9676547632317e
SHA256("Hashes are fun.") = b2cde78b638667158fb91d0c665d1d20cc247925b45d9eccb7d25c08721fea12
SHA256("Hashes are fnu!") = c75bb5fed45a21695e0c607376cc1c925939a02c38fac8e7d5488b8820487bca

Even small changes produce huge differences! This is the consequence of collision resistance and pre-image resistance.

It is worth noting that due to quick advancements in computing power and new cryptographic techniques, it should not be unexpected if some of the functions we utilize nowadays cease to be used in the near future.

Conclusion

In summary, cryptographic hash functions are essential for protecting data integrity and security. There is a simple difference between them and hash functions: they are in fact a subset of hash functions that in addition satisfy three important properties:

  • Pre-image resistance
  • Second pre-image resistance
  • Collision resistance.

These properties ensure that it is difficult to find original inputs given their hash values, find different inputs with the same hash value, or find two distinct inputs with the same hash value. By incorporating these properties, cryptographic hash functions provide robust protection against tampering and unauthorized access to data.

One could talk for hours on cryptographic hash functions, their perspectives, and limitations, however, there is a lot more to learn for you in the next topics. So, consider moving on with the tasks and read more on hash functions in your free time: the Internet is packed with information.

24 learners liked this piece of theory. 0 didn't like it. What about you?
Report a typo