Hashing functions represent the third type of cryptographic technology to consider. They are typically used as a building block of digital signatures: the message is hashed, and the resulting digest is what gets signed. They are also considered "one-way", meaning that inverting them is computationally infeasible. Recovering the original data from a hash would require a brute-force search over every possible input. The key attributes of a hashing function include:
- Always generates the same hash from the same input
- Quick to compute but not free (see proof of work)
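- It is non-reversible: the original message cannot be regenerated from the hash value
- A small change to the input results in a large, unpredictable change in the output (the avalanche effect)
- It is computationally infeasible to find two different messages with the same hash value (collisions must exist because the output has a fixed length, but they should be impractical to find)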
Input: Boise Idaho
SHA1 Hash Output: 375941d3fb91836fb7c76e811d527d6c0a251ed4
Input: Boise Idaho
SHA1 Hash Output: 82b6109838f8f40dc1d1530e5535908853e3fd5f
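The determinism and avalanche properties can be checked with a minimal sketch using Python's standard hashlib module. The helper names sha1_hex and differing_bits are illustrative, and the trailing space added to the second input is an assumed variation chosen only for demonstration; it is not necessarily the change used in the example above.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of a UTF-8 string as 40 hex characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many of the digest bits differ between two hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


first = sha1_hex("Boise Idaho")
# Assumed variation for illustration: one extra trailing space.
second = sha1_hex("Boise Idaho ")

print("same input, same hash:", first == sha1_hex("Boise Idaho"))
print("digest 1:", first)
print("digest 2:", second)
print("bits that differ out of 160:", differing_bits(first, second))
```

For a well-behaved hash, roughly half of the 160 output bits would be expected to differ even though the inputs differ by a single character.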
SHA algorithms are used extensively in applications such as:
- Git repositories
- TLS certificate signing for web browsing (HTTPS)
- Validating file or disk image content authenticity
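As a rough illustration of the last use case, the following sketch computes a file's digest in chunks and compares it with a published value. The function names and the chunk size are assumptions made for this example, and SHA-256 is used here rather than SHA-1.

```python
import hashlib
import hmac


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare the computed digest with a published hex digest."""
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(file_sha256(path), expected)
```

A caller would pass the path of a downloaded image and the digest string published by the vendor; a False result means the content does not match what was published.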
Many widely used hash functions, including MD5, SHA-1, and SHA-2, are built upon the Merkle-Damgård construction. Here, the padded input is split into equally sized blocks that are processed serially: a compression function combines each block with the output of the previous compression. An Initialization Vector (IV) is used to seed the process. If the compression function is collision resistant, the resulting hash is collision resistant as well. SHA-1 is built upon this Merkle-Damgård construction.
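The chaining structure can be sketched as a toy in Python. This is not SHA-1: the all-zero IV and the toy_compress function (a truncated SHA-256 call standing in for a real compression function) are assumptions made purely to show how padding, blocks, and chaining fit together. The padding follows the common scheme of a 0x80 byte, zero bytes, and a 64-bit message length.

```python
import hashlib

BLOCK_BYTES = 64        # 512-bit blocks, as in SHA-1
STATE_BYTES = 20        # 160-bit chaining value, as in SHA-1
TOY_IV = bytes(STATE_BYTES)  # assumed all-zero IV; real algorithms fix specific constants


def md_pad(message: bytes) -> bytes:
    """Pad to a multiple of 64 bytes: 0x80, zero bytes, then the 64-bit bit length."""
    bit_length = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % BLOCK_BYTES) % BLOCK_BYTES)
    return padded + bit_length.to_bytes(8, "big")


def toy_compress(state: bytes, block: bytes) -> bytes:
    """Stand-in compression function (NOT the one SHA-1 uses)."""
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]


def toy_md_hash(message: bytes) -> bytes:
    """Chain the compression function over each block, starting from the IV."""
    state = TOY_IV
    padded = md_pad(message)
    for offset in range(0, len(padded), BLOCK_BYTES):
        state = toy_compress(state, padded[offset:offset + BLOCK_BYTES])
    return state


print(toy_md_hash(b"Boise Idaho").hex())
```

Each block's output becomes the input state for the next block, so the final state depends on every block of the message and on the length encoded in the padding.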
For SHA-1, the input message must be less than 2^64 bits in length. The message is processed sequentially in 512-bit blocks. SHA-1 has since been superseded by stronger algorithms such as SHA-256 and SHA-3, because SHA-1 was found to be vulnerable to collisions. Although finding a collision takes approximately 2^51 to 2^57 operations, that amounts to only a few thousand dollars of rented GPU time. Thus, the recommendation is to move to the stronger SHA variants.
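In Python, moving to a stronger algorithm is often little more than a change of algorithm name, as this small sketch with the standard hashlib module suggests. The digest_hex helper is illustrative, and real migrations also need to account for stored digests and for protocols that fix the digest length.

```python
import hashlib


def digest_hex(algorithm: str, data: bytes) -> str:
    """Hash data with a named algorithm that hashlib supports."""
    return hashlib.new(algorithm, data).hexdigest()


message = b"Boise Idaho"
for name in ("sha1", "sha256", "sha3_256"):
    hex_digest = digest_hex(name, message)
    # Each hex character encodes 4 bits of the digest.
    print(f"{name:>8}: {len(hex_digest) * 4} bits  {hex_digest}")
```

SHA-1 produces a 160-bit digest, while SHA-256 and SHA3-256 both produce 256-bit digests.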