Hashing and encryption, are they the same?
Hashing is a one-way function that produces a fixed-length output from which the original input cannot feasibly be recovered. Encryption is a reversible process that scrambles data so that unauthorised users can’t read it. So, if you’re looking for a way to keep your passwords safe and secure, look no further than hashing! There is a slight catch, though, and it relates to salting. In this blog we will dig into these terms, walk through examples, and explain the differences between hashing, encryption and salting.
Hashing and encryption are two terms that are widely used interchangeably in the IT world. That is incorrect: encryption and hashing are two very different, though related, concepts. One cannot be substituted for the other because they achieve different goals. In a nutshell, hashing usually deals with data integrity, whereas encryption focuses on data confidentiality.
This article will discuss what hashing and encryption are, their common usage, and how they differ from each other.
What is hashing in cybersecurity?
Hashing uses a mathematical algorithm, called a hash function, that calculates a fixed-size string (called a hash) from the input supplied to it. The input can be of any size, but the resulting hash will always be of a fixed size. Hashes can therefore be considered a condensed summary of the input string.
A good hashing algorithm displays an avalanche effect; even if a single bit or character of the input changed, the resultant hash would change completely. If a hash function does not exhibit this characteristic, it is said to have inadequate randomisation and can be exploited by attackers.
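The avalanche effect can be observed with a few lines of Python. This is a minimal sketch using SHA-256 from the standard library; the two sample strings and the helper names are made up for illustration and differ in only one character.

```python
import hashlib


def sha256_as_int(text: str) -> int:
    """Return the SHA-256 digest of text as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def differing_bits(a: str, b: str) -> int:
    """Count how many of the 256 output bits differ between two inputs."""
    return bin(sha256_as_int(a) ^ sha256_as_int(b)).count("1")


# Two hypothetical inputs that differ in a single character.
first, second = "hello world", "hello worle"

print(hashlib.sha256(first.encode("utf-8")).hexdigest())
print(hashlib.sha256(second.encode("utf-8")).hexdigest())
print(differing_bits(first, second), "of 256 bits differ")
```

For a well-behaved hash function, roughly half of the output bits would be expected to flip, even though the inputs are almost identical.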
Hashes are typically only one-way or unidirectional. If an attacker has the hashed value of a password, he cannot recreate the original password using the hash value. We will further explain this in the later sections of this article.
In cyber security, hashing is used to store sensitive data, help with authentication mechanisms, and check data integrity (i.e., check if data has changed or not). For example, if a company’s databases are breached and the data was stored in a readable format (clear-text), the information falls into the wrong hands for further misuse. But if the data was stored in a hashed form, the values would be of far less use to the attackers.
Features of hash functions
Features of a hash function include:
- Fixed-length output
  - Hash functions take an input of arbitrary length and always return a hash value whose length is fixed by the algorithm.
  - The hash can be much smaller than the input data, which is why hash functions are sometimes called compression functions.
- Efficiency
  - For any hash function H with input X, computing H(X) is a fast operation.
- Uni-directional or one-way
  - Hashes are irreversible.
  - Once a hash value is generated, converting it back to the original data is computationally infeasible.
- Collision-resistant
  - Because the output has a fixed length, different inputs that share a hash value must exist in theory. A good hash function makes it computationally infeasible to find two such inputs, so in practice different input strings produce different hash values.
- Deterministic
  - Whenever a hash function is run against the same input value, it must always produce the same result.
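Two of these features, fixed-length output and repeatable results, are easy to check directly. The snippet below is a small sketch using SHA-256; the sample inputs are arbitrary.

```python
import hashlib

# Inputs of very different sizes: empty, one byte, and roughly 27 KB.
inputs = [b"", b"a", b"a much longer input string " * 1000]

for data in inputs:
    digest = hashlib.sha256(data).hexdigest()
    print(len(data), "bytes in ->", len(digest), "hex characters out")

# Hashing the same input twice gives the same result.
assert hashlib.sha256(b"same input").digest() == hashlib.sha256(b"same input").digest()
```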
Types of hashing
There are many different algorithms used for hashing, but the most frequently used are as follows:
MD5
- MD5 stands for Message Digest algorithm 5.
- MD5 generates a 128-bit hash of any input string, usually written as 32 hexadecimal characters.
- It is often used as a checksum to verify data integrity.
- MD5 has known collisions, so it is no longer recommended for security purposes.
SHA-2
- SHA stands for Secure Hash Algorithm. SHA-2 was developed by the National Security Agency (NSA).
- SHA-2 is the most widely used algorithm out of the SHA family.
- SHA-2 consists of six hash functions that generate hashes of 224, 256, 384 or 512 bits. The six hash functions are:
- SHA-224
- SHA-256
- SHA-384
- SHA-512
- SHA-512/224
- SHA-512/256.
- To date, no collisions for SHA-2 have been found.
CRC32
- CRC stands for Cyclic Redundancy Check.
- This is an error-detecting code. It is designed to catch accidental corruption rather than deliberate tampering.
- CRC is mainly used for file integrity checks and is commonly used on Zip files and FTP servers.
- CRC32 generates a 32-bit value, usually written as eight hexadecimal characters.
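The output sizes listed above can be compared side by side. This sketch uses Python’s standard `hashlib` and `zlib` modules with an arbitrary sample input. The SHA-512/224 and SHA-512/256 variants are left out because their availability depends on how Python was built, and MD5 may be disabled on some hardened systems.

```python
import hashlib
import zlib

data = b"123456789"  # arbitrary sample input

digests = {
    "MD5": hashlib.md5(data).hexdigest(),
    "SHA-224": hashlib.sha224(data).hexdigest(),
    "SHA-256": hashlib.sha256(data).hexdigest(),
    "SHA-384": hashlib.sha384(data).hexdigest(),
    "SHA-512": hashlib.sha512(data).hexdigest(),
    # zlib.crc32 returns an integer, formatted here as eight hex characters.
    "CRC32": format(zlib.crc32(data), "08x"),
}

for name, value in digests.items():
    # Each hexadecimal character encodes four bits.
    print(f"{name:8} {len(value) * 4:4} bits  {value}")
```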
What is a collision in hashing?
Being unidirectional is one of the requirements of hashing in cybersecurity. It is necessary so that an attacker cannot recreate the original data if hashes are leaked in a data breach. A second requirement is that two different input strings should not generate an identical hash; when they do, it is known as a collision.
Let’s take the example of detecting a collision. The images below of a ship and a plane are visually different and must generate two unique hashes.
When using MD5 to generate the hash, you can see in the below screenshot that the value of these hashes for the two different images is the same. Hence a collision exists.
However, using SHA-256 to generate the hash, the below screenshot shows that the values are entirely different; hence there is no collision.
Applications of hash functions
Hashing is helpful in a number of scenarios. For example, it can verify the integrity of a message, and hash functions are commonly used in digital signatures. Hash functions also support encryption: for example, a key for AES-256 can be derived from a password by means of PBKDF2, a key derivation function built on top of a cryptographic hash function. Finally, hashes are used to index data in hash tables or to detect duplicate files. Here are some of the more detailed use cases of hash functions:
Data Retrieval
Hashes use algorithms to create fixed-size strings (hash) from the input supplied; this input can be a file or any other data object. Due to this, hashes can be used to optimise searches.
Hash tables store data in the form of key-value pairs. The key is the identifier of the data object, and the value is the data object itself. The hash of the key determines the slot where the value is stored, so instead of searching through every stored object, a lookup hashes the key and goes directly to that slot.
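To make the idea concrete, here is a deliberately small hash table sketch. The class name `TinyHashTable` and its methods are hypothetical. SHA-256 is used only to stay consistent with the rest of this article; real hash tables normally use a much faster non-cryptographic hash.

```python
import hashlib


class TinyHashTable:
    """A toy hash table that keeps key-value pairs in a fixed number of buckets."""

    def __init__(self, bucket_count: int = 8) -> None:
        self.buckets = [[] for _ in range(bucket_count)]

    def _index(self, key: str) -> int:
        # Hash the key, then reduce the hash to a bucket number.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.buckets)

    def put(self, key: str, value: object) -> None:
        bucket = self.buckets[self._index(key)]
        for position, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[position] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key: str) -> object:
        # Only one bucket is searched, not the whole table.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = TinyHashTable()
table.put("invoice-42", {"amount": 100})
print(table.get("invoice-42"))
```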
Digital Signatures
Hashing is used together with digital signatures so that message senders and receivers can determine a message’s integrity.
Before sending the message, the sender generates a hash of it, i.e. H1, and signs that hash with their private key; the message is then transferred across the network along with the signed hash. On the receiver’s end, the signature is verified with the sender’s public key to confirm H1, and a second hash, i.e. H2, is generated from the received message using the same algorithm. H1 and H2 are compared; if the hashes are equal, the message was not tampered with during transfer.
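The H1/H2 comparison can be sketched as follows. This is only the hashing half of the process: the function names and the sample messages are made up for illustration, and the signing step is deliberately left out. A real digital signature also protects H1 with the sender’s private key, because a bare hash could be replaced by anyone able to alter the message.

```python
import hashlib
import hmac


def digest(message: bytes) -> bytes:
    """Hash a message with SHA-256."""
    return hashlib.sha256(message).digest()


def is_untampered(received_message: bytes, h1: bytes) -> bool:
    """Recompute the hash on the receiver's side (H2) and compare it with H1."""
    h2 = digest(received_message)
    return hmac.compare_digest(h1, h2)


message = b"Transfer 100 to Bob"
h1 = digest(message)  # computed by the sender before transfer

print(is_untampered(message, h1))                 # message arrived unchanged
print(is_untampered(b"Transfer 900 to Bob", h1))  # message altered in transit
```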
Password Storage
Hash functions are widely used to store passwords securely in databases. Doing so can significantly reduce the impact of a breach if an attacker gains access to the database, because the attacker cannot directly revert the hashes to the original passwords and therefore cannot easily gain access to the user accounts.
Other than this, many application login processes hash the user’s input when logging in a user. Where the input is hashed before it is sent to the server, this may also reduce the scope for injection attacks, although it does not replace input validation on the server.
File equality
Hashing is also useful when there is a need to compare two or more files. Traditionally, a person would have to open both files and compare them word by word to see if any changes had been made, but with hashing any change can be identified quickly.
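A file comparison of this kind might look like the sketch below. The helper names `file_sha256` and `files_match` are hypothetical, and the three temporary files exist only to make the example self-contained.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def files_match(first: Path, second: Path) -> bool:
    """Two files with the same hash are treated as identical."""
    return file_sha256(first) == file_sha256(second)


with tempfile.TemporaryDirectory() as folder:
    original = Path(folder) / "original.txt"
    copy = Path(folder) / "copy.txt"
    edited = Path(folder) / "edited.txt"
    original.write_text("quarterly report, final version")
    copy.write_text("quarterly report, final version")
    edited.write_text("quarterly report, final version!")

    same = files_match(original, copy)
    changed = files_match(original, edited)

print(same, changed)
```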
Data Integrity Checks
Checking that data has not been tampered with, i.e. that its integrity remains intact, is the most common application of hash functions, which are used to generate checksums on files. Using hash functions in this way assures the user and the application that the data is correct.
Integrity checks are used in File monitoring systems, where the changes made to sensitive files are monitored.
The process of how to ensure data integrity is described in the illustration below:
Hashing and passwords
The process for authenticating a user is almost the same for many applications. The user creates a new account by choosing a username and password; the application stores this information in a database. When the user wants to authenticate into the application later, he enters the username and password; these values are compared to the value from the database. If they match, the user is granted access to the application.
There is a security risk here, though. Any employee who has access to the database or an attacker who compromises the system can read and use all the credentials stored and possibly log in and access the users’ accounts.
However, if the database stores passwords as hashes, these are of far less use to an attacker. The attacker would see a value such as 5f4dcc3b5aa765d61d8327deb882cf99 in the database, but entering this value in the login screen will not work, because the application hashes whatever is typed before comparing it. This protects the original passwords.
When implementing hash functions, the developers must keep in mind not to implement obsolete functions in which collisions exist.
For more robust protection, hash functions can be applied in multiple iterations. The original input is hashed (h1), then that hash becomes the input and is re-hashed (h2), and this can be repeated many times. The number of iterations is known as the work factor, and a higher work factor makes cracking hashes more expensive.
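The iteration idea can be sketched in two ways. The first function, `iterated_hash`, is a hand-rolled loop written only to show the h1, h2, ... chain. The second call uses PBKDF2 from Python’s standard library, which applies the same principle in a vetted form. The password, the fixed salt and the iteration count are illustrative assumptions, not recommendations.

```python
import hashlib


def iterated_hash(password: str, rounds: int) -> str:
    """Hash the password, then keep re-hashing the previous result."""
    value = password.encode("utf-8")
    for _ in range(rounds):
        value = hashlib.sha256(value).digest()  # h1, h2, h3, ...
    return value.hex()


print(iterated_hash("DontHackMe", 1))
print(iterated_hash("DontHackMe", 10_000))

# A standard key-derivation function built on the same idea.
# The salt is fixed here for illustration only; it should be random per user.
derived = hashlib.pbkdf2_hmac("sha256", b"DontHackMe", b"example-salt", 100_000)
print(derived.hex())
```

Raising the number of rounds slows down every guess an attacker makes, at the cost of a slightly slower login for legitimate users.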
Can a hash be decrypted?
As discussed earlier, hashes are only a one-way process meaning that the original data cannot be reconstructed from a hash value using mathematical algorithms. However, it is possible to crack hashes using the rainbow table attack technique.
For simplicity, a rainbow table is a database that contains two columns; the first column contains plaintext values, and the second column contains their hash values. Attackers use this database to find the plaintext value; it can also be called a reverse lookup.
Since collisions are extremely unlikely for a good hash function, a plaintext string whose hash matches is almost certainly the original input, and that is the cracked hash.
Many websites host millions of entries of common plaintext strings along with their hash values, so attackers just have to search for the hash value and read the corresponding plaintext string. Many automated tools can perform these attacks; John the Ripper and Hashcat are some of the most famous.
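The reverse lookup described above can be sketched with a dictionary. The tiny word list is made up for illustration, and the leaked value is the example hash quoted earlier in this article, which is the MD5 hash of a very common password. Note that this is a plain lookup table; real rainbow tables use chains of hashes to save storage space, which is not shown here.

```python
import hashlib

# Hypothetical list of common passwords; real lists contain millions of entries.
common_passwords = ["123456", "password", "qwerty", "letmein"]

# Pre-compute the table once: hash value -> plaintext.
lookup = {hashlib.md5(word.encode("utf-8")).hexdigest(): word for word in common_passwords}

leaked_hash = "5f4dcc3b5aa765d61d8327deb882cf99"
print(lookup.get(leaked_hash, "not found"))
```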
What is salt in cybersecurity?
Salt is a random string of data added to the original string before it is hashed. It is typically used to strengthen password protection against precomputed attacks such as rainbow tables. Since two or more users may choose the same password, unsalted hashing results in identical hashes being stored. Salts prevent this, because a different random string for each user generates a different hash each time.
If an attacker has access to breached databases and he sees duplicate hashes stored, a probable reason for that will be that there is no salting being used or a weak algorithm is being used, and collisions are taking place.
What does salting a password mean?
Passwords can be stored in databases as hashed values. The benefits and risks of doing so were discussed earlier. But by salting a password, an extra layer of security is added to the password protection. Salting a password means that the application code appends or prepends a random string to the original password and then creates a hash of this salted password.
For example, if the password is “DontHackMe”, the SHA-2 hash of this password stored in the database will be: 446d2b4e925b732ae6917062bcfaf6f07223d95fb7bcdba78c341c32d85e5333.
By adding a random salt, the password becomes “DontHackMe885eef” and the generated hash is then: a45856b0757c7cec9eda5382bcfd47bb52d2ff2125b9595cbbf8aad345ec074c.
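The append-then-hash step can be sketched as follows. SHA-256 is assumed here as the member of the SHA-2 family, and `salted_hash` is a hypothetical helper. The salt is random, so the second hash printed will differ on every run, and the six-character salt only mirrors the short example above; real salts should be longer.

```python
import hashlib
import secrets


def salted_hash(password: str, salt: str) -> str:
    """Append the salt to the password and hash the combined string."""
    return hashlib.sha256((password + salt).encode("utf-8")).hexdigest()


salt = secrets.token_hex(3)  # six random hexadecimal characters

print("unsalted:", salted_hash("DontHackMe", ""))
print("salt:    ", salt)
print("salted:  ", salted_hash("DontHackMe", salt))
```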
What is the use of salting in hashing?
Salting is used in hashing to:
- Increase password protection.
- Ensure that identical passwords do not produce identical stored hashes.
- Increase the complexity so that it is harder for attackers to crack the hashes.
- Mitigate the risks of rainbow table attacks.
Attacking unsalted passwords
There are typically two ways an attacker can go about attacking unsalted passwords, either by password guessing attacks (brute-forcing / dictionary attacks) or by using hash tables or rainbow tables.
Password Guessing Attacks
Many people use dictionary words as their passwords. In such cases, an attacker can take a publicly available list of dictionary words, compute their hashes, and compare the victim’s password hash against the list. Other than words from the dictionary, many word lists are available on the Internet, containing leaked or common passwords. These lists can also be used similarly for password cracking.
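A password guessing attack against an unsalted hash can be sketched in a few lines. The function name, the short word list and the "stolen" hash are all hypothetical; real attacks use lists with millions of entries.

```python
import hashlib
from typing import Iterable, Optional


def dictionary_attack(target_hash: str, wordlist: Iterable[str]) -> Optional[str]:
    """Hash each candidate word and return the one whose hash matches, if any."""
    for candidate in wordlist:
        if hashlib.sha256(candidate.encode("utf-8")).hexdigest() == target_hash:
            return candidate
    return None


wordlist = ["sunshine", "dragon", "monkey", "football"]

# Pretend this unsalted hash was taken from a breached database.
stolen_hash = hashlib.sha256(b"dragon").hexdigest()

print(dictionary_attack(stolen_hash, wordlist))
```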
Hash and Rainbow Tables
Unlike password guessing attacks, hash and rainbow tables provide pre-computed hashes of millions of password entries. This is a faster approach because an attacker can simply do a reverse lookup of the password hashes from the available databases and find the corresponding plaintext password.
How does cryptographic salt improve password management security?
Mitigating password attacks with salts
The technique for salting passwords is widely used to mitigate attacks such as hash tables or dictionary attacks. As described previously, a salt is a random string either appended or prepended to the existing password. The salt is stored alongside the hash so that the same computation can be repeated at login. The hash function itself remains deterministic, but because each user’s salt is different, duplicate passwords no longer produce duplicate hashes and cannot be identified. Let’s consider an example:
Both users, Alice and Carol, are using the same password, “iM$ecuR3”. Without salting, the hashes calculated for both these passwords are the same. But if random salts are appended to each password, then the resulting salted hashes are unique, thus mitigating the security risks.
Different users who have the same password but different salts produce different hashes. Suppose a data breach were to happen in this scenario. An attacker could not tell that Alice and Carol are using the same password, and looking up a salted hash in a rainbow table would not return the correct plaintext password.
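The Alice and Carol scenario can be sketched as below. The record layout and the function names `create_record` and `verify` are assumptions made for illustration. The salt is prepended here, and a single SHA-256 pass is used to keep the example short; a production system would more likely use a slow key-derivation function.

```python
import hashlib
import hmac
import secrets


def create_record(password: str) -> dict:
    """Generate a fresh random salt and store it next to the salted hash."""
    salt = secrets.token_bytes(32)
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return {"salt": salt.hex(), "hash": digest}


def verify(password: str, record: dict) -> bool:
    """Repeat the computation with the stored salt and compare the results."""
    salt = bytes.fromhex(record["salt"])
    candidate = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, record["hash"])


users = {
    "alice": create_record("iM$ecuR3"),
    "carol": create_record("iM$ecuR3"),
}

# Same password, different salts, different stored hashes.
print(users["alice"]["hash"])
print(users["carol"]["hash"])
print(verify("iM$ecuR3", users["alice"]), verify("wrong-guess", users["alice"]))
```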
How to make the most of hashing using a salt
- Unique salts should be used for every user and every password.
- The length of the salt should ideally be equal to the length of the hash generated. For example, if SHA-256 is used, the generated hash will be 256 bits long, so the added salt value should also be at least 256 bits in length.
- Avoid using usernames as the salt values. Usernames are easily guessable and often publicly available, so these will not be secure compared to using a completely random value for the salt.
- Use a cryptographically secure pseudo-random number generator to create salt values.
What is encryption in cyber security?
Encryption means using a certain key to scramble useful information into gibberish so that only the person with the corresponding key can unscramble and read it. Encryption is a bidirectional or two-way process that means that if a string is encrypted, it can also be decrypted back to its original form.
Encryption is mostly used to hide information or preserve the confidentiality of data. One of the most common uses of encryption is in web applications that combine HTTP with SSL (Secure Sockets Layer) or its successor TLS (Transport Layer Security) to create HTTPS web applications. If any malicious user intercepts the encrypted data, it will be incomprehensible, and data confidentiality will remain intact.
Encryption is also achieved by using mathematical algorithms called ciphers. These are a sequence of well-defined steps to encrypt or decrypt information.
For a more straightforward approach, consider Alice has a safety deposit box in a bank to store her valuables. Anyone who looks at the deposit box will not be able to see the contents inside the box. Only when Alice shares the key with someone can they open and access the valuables. The same concepts apply when talking about encryption.
How does encryption work?
To explain how encryption works on a basic level, let’s take an example:
Alice and Bob want to send each other a message containing some sensitive information. They are concerned that if they send the data in plaintext, a malicious person, Eve, will intercept and read the sensitive information.
Alice uses encryption to solve this problem. She uses Bob’s public key (which is known to everyone) to encrypt her message and sends it across the network. The encryption ensures that no one on the network can read the communication, even if they intercept it. On the other side, Bob uses his private key (which is known only to Bob) to decrypt Alice’s message.
Eve, who was listening on the network during the entire communication, captured the encrypted message. Since she does not have Bob’s private key, she cannot decrypt the message and is left with unreadable and incomprehensible gibberish data.
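The exchange between Alice and Bob can be imitated with a toy version of RSA. This sketch is for illustration only and is not secure: the primes are tiny, each byte is encrypted separately, and no padding is used. The key values and function names are assumptions chosen to keep the arithmetic visible, and `pow(e, -1, phi)` needs Python 3.8 or later.

```python
from typing import List, Tuple

# Toy key pair for Bob built from two tiny primes.
p, q = 61, 53
n = p * q                    # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse of e)

bob_public: Tuple[int, int] = (e, n)    # shared with everyone
bob_private: Tuple[int, int] = (d, n)   # known only to Bob


def encrypt(message: bytes, public_key: Tuple[int, int]) -> List[int]:
    """Alice encrypts each byte with Bob's public key."""
    exponent, modulus = public_key
    return [pow(byte, exponent, modulus) for byte in message]


def decrypt(ciphertext: List[int], private_key: Tuple[int, int]) -> bytes:
    """Bob recovers each byte with his private key."""
    exponent, modulus = private_key
    return bytes(pow(value, exponent, modulus) for value in ciphertext)


ciphertext = encrypt(b"hi Bob", bob_public)
print(ciphertext)                        # what Eve sees on the network
print(decrypt(ciphertext, bob_private))  # what Bob reads
```

Real applications would rely on an audited cryptography library rather than arithmetic like this.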
Types of encryption
Asymmetric encryption
This type of encryption can also be called public-key encryption. It consists of two keys; one key, called the public key, is used for encryption, and the second key, called the private key, is used for decryption.
So basically, Bob’s public key will be available to everyone, and anyone can send him encrypted messages using his public key. But only Bob will be able to decrypt and read the message.
Symmetric encryption
In symmetric encryption, a single key is used for both encryption and decryption. This can also be called Private Key Cryptography. The key is shared or exchanged between both parties to establish an encrypted communication channel.
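The defining property of symmetric encryption, one key for both directions, can be shown with a one-time pad built from XOR. This is a minimal sketch with hypothetical names, not how AES or other modern ciphers work internally, and the key must be as long as the message and never reused.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the matching byte of the key."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"Meet at noon"

# The single secret key that both parties must hold.
shared_key = secrets.token_bytes(len(message))

ciphertext = xor_bytes(message, shared_key)    # encryption
recovered = xor_bytes(ciphertext, shared_key)  # decryption uses the same key

print(ciphertext.hex())
print(recovered)
```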
In earlier decades, encryption was entirely based upon Private Key Cryptography, in which the private keys had to be physically shared or exchanged between the two parties. If, for any reason, the private key was disclosed, all the communication was compromised.
Today, however, Public Key Cryptography is being used in which sharing of private keys is not needed. As in the example above, the encryption is performed using the public keys and decryption using the user’s private key.
Common encryption algorithms
AES – Advanced Encryption Standard
AES is the most commonly used symmetric block cipher and is trusted as a standard by many organisations. No practical brute-force attack against AES-256 is known.
RSA – Rivest-Shamir-Adleman
RSA is a public key or asymmetric encryption algorithm.
ECC – Elliptic Curve Cryptography
ECC is based on the algebraic structure of an elliptic curve over finite fields.
3DES – Triple Data Encryption Standard
Originally based on DES, 3DES uses three individual 56-bit keys to perform encryption.
Is hashing a form of encryption?
Hashing and encryption both use mathematical functions to generate hash values and ciphertext, respectively. However, hash functions are unidirectional and non-reversible, which means that a hash cannot be converted back to its original value. In contrast, encryption is a two-way or bi-directional technique in which the original message can be retrieved by using the decryption key.
Thus hash functions are not the same as encryption.
What is the difference between hashing and encryption?
The lists below highlight the differences between encryption and hashing:
Encryption
- Encryption is a reversible, bi-directional function.
- The original message can be retrieved using a decryption key.
- The resultant encrypted string is of variable length.
- The length of the encrypted string depends on the length of the input string.
- The purpose of encryption is to ensure data confidentiality.
- Encryption is used to keep data secret from others.
- Encryption is accomplished with the use of keys.
- In encryption, messages are scrambled so that only the authorised receiver can view the contents.
- Examples of encryption algorithms include AES, DES, RSA, ECC etc.
Hashing
- Hashing is irreversible and unidirectional.
- The original message can not be retrieved.
- The resultant hash is of fixed length.
- The hash length is fixed and does not depend on the size of the input string.
- The purpose of hashing is to ensure data integrity.
- Hashing is used for indexing, data retrieval and storing passwords.
- There is no use of keys in hashing.
- Hashing is a process of condensing input strings into a fixed length. They can be used as checksums.
- Examples of hashing algorithms include SHA-1, SHA-2, MD5, CRC etc.
Is hashing more secure than encryption?
In cases where the original data is not needed, hashing is a better and more secure approach than encryption. Because, as mentioned earlier, hashes are irreversible, data stored as hash values, especially if it is salted, cannot be directly read by any unauthorised user.
Nevertheless, in scenarios where it is necessary to view the data as an understandable and human-readable text, such as after an SSL handshake, it is required by design to use encryption.
Is hashing better than encryption?
Whether to use hashing or encryption all comes down to the required objective and goal. If one needs to maintain data integrity, then hashing should be used, and if data is supposed to be kept secret but still recoverable, encryption should be used.
Although both techniques transform data from one form to another, user requirements need to be considered when choosing one. Do you need the original text back? Do you need to ensure no one changes the input? Do you need to keep data private from everyone except one party? These, along with many other questions, need to be considered when choosing the appropriate technique.
Which is more secure, Hashing or Encryption?
Hashing is used to validate integrity, whereas encryption scrambles data to ensure confidentiality. If we look at the techniques overall, hashing would be a more secure option. Even if an adversary were to capture the hashes, cracking them to retrieve the actual data would be very difficult, especially if salting is properly implemented. And even if an adversary could crack one hash, it does not mean that he will be able to crack them all. On the other hand, if encrypted data were leaked and the private key were also revealed, the entire encrypted data would be readable and compromised.
The decision of when to use which technique depends solely on the requirements. Hashing and encryption cannot be used as a substitute for one another, so the choice should be driven by the end goal.