Cryptography is a necessary component in many parts of a secure architecture. However, just adding cryptography to the code does not make it more secure; care must be given to such topics as secrets generation, secret storage, and plain-text management. Properly designing secure software is complicated, especially when cryptography is involved.
Designing for security is beyond the scope here. This chapter only teaches Python’s basic tools for cryptography and how to use them.
8.1 Fernet
The cryptography module supports the Fernet cryptography standard. It is named after an Italian, not French, bitter liqueur; the t is pronounced. A good approximation for the pronunciation is fair-net.
Fernet works for symmetric cryptography. It does not support partial or streaming decryption. It expects to read in the whole ciphertext and return the whole plain text. This makes it suitable for names, text documents, or even pictures. However, videos and disk images are a poor fit for Fernet.
The cryptographic parameters for Fernet were chosen by domain experts who researched available encryption methods and the known best attacks against them. One advantage of using Fernet is that it avoids the need to become an expert yourself. However, for completeness, note that the Fernet standard uses AES-128 in CBC mode with PKCS7 padding, and HMAC with SHA256 for authentication.
The key is a short string of bytes. Securely managing the key is important; cryptography is only as good as its keys. If it is kept in a file, for example, the file should have minimal permissions and ideally be hosted on an encrypted file system.
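As a minimal sketch of the key handling described above, the following generates a key and stores it in a file that only the owner can read. The file name and the temporary directory are hypothetical stand-ins; a real deployment would pick its own location, ideally on an encrypted file system.

```python
import os
import tempfile

from cryptography.fernet import Fernet

# Generate a fresh random key (32 random bytes, URL-safe base64-encoded).
key = Fernet.generate_key()

# Hypothetical location, used only so the sketch can run anywhere.
key_path = os.path.join(tempfile.mkdtemp(), "fernet.key")

# Create the file with mode 0o600: readable and writable by the owner only.
# O_EXCL makes the call fail rather than overwrite an existing key.
fd = os.open(key_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
with os.fdopen(fd, "wb") as key_file:
    key_file.write(key)
```

Creating the file with restrictive permissions from the start avoids a window in which the key is readable by other users.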
Encryption is simple. It takes a string of bytes and returns an encrypted string. Note that the encrypted string is longer than the source string. It is also signed with the secret key, which means that tampering with the encrypted string is detectable, and the Fernet API handles that by refusing to decrypt the string. The value returned by decryption is therefore trustworthy: it was indeed encrypted by someone who had access to the secret key.
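The following sketch illustrates encryption and tamper detection with the Fernet class. The plain text and the way the token is tampered with are made up for illustration.

```python
from cryptography.fernet import Fernet, InvalidToken

fernet = Fernet(Fernet.generate_key())

plaintext = b"attack at dawn"
token = fernet.encrypt(plaintext)

# The token also carries a version, a timestamp, an IV, and an HMAC,
# so it is longer than the plain text.
size_increase = len(token) - len(plaintext)

# Simulate tampering by replacing one character in the middle of the token.
middle = len(token) // 2
replacement = b"A" if token[middle:middle + 1] != b"A" else b"B"
tampered = token[:middle] + replacement + token[middle + 1:]

try:
    fernet.decrypt(tampered)
    tamper_detected = False
except InvalidToken:
    # Fernet refuses to return any plain text for a modified token.
    tamper_detected = True
```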
Decryption is done in the same way as encryption. Fernet does contain a version marker, so if vulnerabilities in AES or HMAC are found, it is possible to move the standard to a different encryption and hashing system.
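A short sketch of decryption follows. It also peeks at the version marker, which is the first byte of the decoded token, and finishes with a decryption call that passes the optional ttl argument (a maximum age in seconds).

```python
import base64

from cryptography.fernet import Fernet

fernet = Fernet(Fernet.generate_key())
token = fernet.encrypt(b"hello world")

# Decryption returns the original bytes.
recovered = fernet.decrypt(token)

# The first byte of the decoded token is the version marker.
version = base64.urlsafe_b64decode(token)[0]

# Decrypt again, this time accepting only tokens at most five seconds old.
recent = fernet.decrypt(token, ttl=5)
```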
Decryption also accepts an optional time-to-live limit in seconds (the ttl argument). With a limit of five seconds, for example, decryption fails if the encrypted information (sometimes referred to as the token) is older than five seconds. This is useful to prevent replay attacks, in which a previously captured token is replayed instead of a new valid token. For example, if the encrypted token has a list of usernames that are allowed some access, and is retrieved using a subvertible medium, a user who is no longer allowed in can substitute the older token.
Ensuring token freshness would mean that no such list would be decoded, and everybody would be denied, which is no worse than if the medium was tampered with without having a previously valid token.
This can also be used to ensure good secret rotation hygiene. By refusing to decrypt anything older than, say, a week, you make sure that if the secret rotation infrastructure breaks, decryption fails loudly instead of succeeding silently, which prompts you to fix it.
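The freshness check can be sketched as follows. To avoid waiting a week, the example uses encrypt_at_time to create a token with a backdated timestamp; the helper name decrypt_if_fresh and the token contents are made up for illustration.

```python
import time

from cryptography.fernet import Fernet, InvalidToken

ONE_WEEK = 7 * 24 * 60 * 60  # seconds

fernet = Fernet(Fernet.generate_key())

fresh_token = fernet.encrypt(b"allowed: alice, bob")

# Simulate an old captured token by backdating its timestamp two weeks.
stale_token = fernet.encrypt_at_time(
    b"allowed: alice, bob, mallory",
    int(time.time()) - 2 * ONE_WEEK,
)


def decrypt_if_fresh(token, max_age=ONE_WEEK):
    """Return the plain text, or None if the token is too old or invalid."""
    try:
        return fernet.decrypt(token, ttl=max_age)
    except InvalidToken:
        return None


fresh_result = decrypt_if_fresh(fresh_token)
stale_result = decrypt_if_fresh(stale_token)
```

Returning None here stands in for whatever loud failure the application prefers, such as logging and denying access.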
The Fernet module also has a MultiFernet class to support seamless key rotation. MultiFernet takes a list of Fernet instances, one per secret. It encrypts with the first secret but tries decrypting with every secret in turn.
If you add a new key to the end, it is not used for encryption at first. After the addition has been synchronized to every instance, you can remove the first key. Now all encryptions are done with the second key, and even instances that have not yet synchronized the removal have the decryption key available.
This two-step process is designed to have zero invalid decryption errors while still allowing key rotation, which is important as a precautionary measure. A well-tested rotation procedure means that if keys are leaked, the rotation procedure can minimize the harm they do.
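The two-step rotation can be sketched with three MultiFernet configurations, representing the state before rotation, after the new key is added, and after the old key is removed. The variable names are illustrative only.

```python
from cryptography.fernet import Fernet, MultiFernet

old = Fernet(Fernet.generate_key())
new = Fernet(Fernet.generate_key())

# Step one: the new key is appended. Encryption still uses the old key,
# but tokens made with either key can be decrypted.
step_one = MultiFernet([old, new])
token_during = step_one.encrypt(b"payload")

# Step two: the old key is removed. Encryption now uses the new key.
step_two = MultiFernet([new])
token_after = step_two.encrypt(b"payload")

# An instance still at step one can read tokens from an instance at step two.
read_by_lagging_instance = step_one.decrypt(token_after)

# Tokens made during step one were encrypted with the old key.
read_by_old_key = old.decrypt(token_during)
```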
8.2 PyNaCl
PyNaCl is a library wrapping the libsodium C library, which is a fork of Daniel J. Bernstein’s libnacl. This is why PyNaCl is named the way it is. (NaCl, or sodium chloride, is the chemical formula for salt. The fork took the name of the first element.)
PyNaCl supports both symmetric and asymmetric encryption. However, since cryptography supports symmetric encryption with Fernet, the main use of PyNaCl is for asymmetric encryption.
The idea of asymmetric encryption is that there is a private and a public key. The public key can easily be calculated from the private key, but not vice versa; that is the asymmetry it refers to. The public key is published, while the private key must remain a secret.
There are, in general, two basic operations supported with public-key cryptography. You can encrypt with the public key in a way that can only be decrypted with the private key. You can also sign with the private key in a way that can be verified with the public key.
As discussed earlier, modern cryptographic practice places as much value on authentication as it does on secrecy. This is because if the media the secret is transmitted on is vulnerable to eavesdropping, it is often vulnerable to modification. Secret modification attacks have had enough impact on the field that a cryptographic system is not considered complete if it does not guarantee both authenticity and secrecy.
Because of that, libsodium, and by extension PyNaCl, do not support encryption without signing or decryption without signature verification.
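A private key can be generated and then encoded as a byte stream. You can generate a private key from the byte stream, and it is identical to the original. This means you can again keep the private key in a way you decide is secure enough; a secret manager, for example.
To encrypt, build a box from the source private key and the target public key. This signs with the source private key and encrypts using the target public key.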
The decryption box decrypts with the target private key and verifies the signature using the source public key. If the information has been tampered with, the decryption operation automatically fails. This means that it is impossible to access plain-text information that is not correctly signed.
Another piece of functionality that is useful inside of PyNaCl is cryptographic signing. It is sometimes useful to sign without encryption; for example, you can make sure to only use approved binary files by signing them. This allows the permissions for storing the binary file to be loose if you trust that the permissions on keeping the signing key secure are strong enough.
Signing also involves asymmetric cryptography. The private key is used to sign, and the public key is used to verify the signatures. This means that you can, for example, check the public key into source control and avoid needing any further configuration of the verification part.
The signature can also be separated from the signed message and verified on its own. This is useful if you want to save the signature in a separate place. For example, if the original is in object storage, mutating it might be undesirable. In those cases, you can keep the signatures on the side. Another reason is to maintain different signatures for different purposes or allow key rotation.
8.3 Passlib
Secure storage of passwords is a delicate matter. It is subtle because it must deal with people who do not follow password best practices. If all passwords were strong and people never reused passwords from site to site, password storage would be straightforward.
However, people usually choose passwords with little entropy (123456 is still unreasonably popular, as is password), and they often have a standard password that they use for all websites. They are also often vulnerable to phishing and social engineering attacks in which they divulge the password to an unauthorized third party.
Not all threats can be stopped by correctly storing passwords, but many can. At the very least, they can be mitigated.
The Passlib library is written by people who are well versed in software security. It tries to eliminate the most obvious mistakes when saving passwords. Passwords are never saved in plain text; they are always hashed.
Note that hashing algorithms for passwords are optimized for different use cases than hashing algorithms used for other reasons; for example, one of the things they try to deny is brute-force attacks that map hashes back to their source passwords.
Passlib hashes passwords with the latest vetted algorithms optimized for password storage and designed to resist side-channel attacks. In addition, a salt is always used when hashing the passwords.
Although Passlib can be used without understanding these things, it is worthwhile to understand them to avoid mistakes while using Passlib.
Hashing means taking the users’ passwords and running them through a function that is reasonably easy to compute but hard to invert. This means that even if an attacker gets access to the password database, they cannot recover users’ passwords and pretend to be them.
One way that the attacker can attempt to get the original passwords is to try all combinations of passwords they can come up with, hash them, and see if any result equals a stored hash. To avoid this, special algorithms are used that are deliberately expensive to compute. This means that an attacker would have to use a lot of resources to try many passwords; even if, say, only a few million passwords are tried, the comparison would take a long time.
Finally, attackers can use rainbow tables to pre-compute many hashes of common passwords and compare them all at once against a password database. To avoid that, passwords are salted before they are hashed: a random prefix (the salt) is added, the password is hashed, and the salt is prefixed to the hash value. When the user enters a password, the salt is retrieved from the beginning of the stored hash value and used to hash the entered password for comparison.
Doing all of this from scratch is hard, and getting it right is even harder. Getting it right does not just mean having users log in but being resilient to the password database being stolen. Since there is no feedback about that aspect, it is best to use a well-tested library.
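To make the salting scheme concrete, here is a sketch using only the standard library. It is an illustration of the idea, not how Passlib is implemented and not a recommendation to write this by hand; the function names, salt size, and iteration count are assumptions chosen for the example.

```python
import hashlib
import hmac
import os

SALT_SIZE = 16
ITERATIONS = 100_000  # assumed work factor, for illustration only


def hash_password(password):
    """Return salt + digest, with the random salt stored as a prefix."""
    salt = os.urandom(SALT_SIZE)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt + digest


def check_password(password, stored):
    """Re-hash the candidate with the stored salt and compare."""
    salt, expected = stored[:SALT_SIZE], stored[SALT_SIZE:]
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, expected)


stored = hash_password("good password")
```

Because each call draws a new salt, hashing the same password twice gives different stored values, which is what defeats pre-computed tables.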
The library is storage agnostic. It does not care where the passwords are being stored. However, it does care that it is possible to update the hashed passwords. This way, hashed passwords can get updated to newer hashing schemes as the need arises. While Passlib does support various low-level interfaces, it is best to use the high-level interface of the CryptContext. The name is misleading since it does no encryption. It refers to vaguely similar (and largely deprecated) functionality built into Unix.
The first thing to do is decide on a list of supported hashes. Not all of them have to be good hashes; if you have supported bad hashes in the past, they still have to be on the list. In this example, you choose argon2 as the preferred hash but allow a few more options.
To use the argon2 hash, an extra dependency needs to be installed. Use pip install argon2_cffi to install it.
It is possible to configure other details, such as the number of rounds. This is almost always unnecessary, as the defaults should be good enough.
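When saving the string, note that it contains special characters such as $; this might impact where it can be saved. If needed, it is always possible to convert it to base64.
When verifying a password, the context can also report that the stored hash uses a deprecated scheme. It then returns a pair whose second element is a replacement hash made with the preferred scheme. In that case, you would need to store the second element in the password hash storage.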
8.4 TLS Certificates
Transport Layer Security (TLS) is a cryptographic way to protect data in transit. Since man-in-the-middle attacks are a potential threat, it is important to be able to verify that the endpoints are correct. For this reason, certificate authorities sign the public keys. Sometimes, it is useful to have a local certificate authority.
One case where that can be useful is in micro-service architectures, where verifying each service is the right one allows a more secure installation. Another useful case is for putting together an internal test environment, where using real certificate authorities is sometimes not worth the effort. It is easy enough to install the local certificate authority as locally trusted and sign the relevant certificates with it.
Another place this can be useful is in running tests. You want to set up a realistic integration environment when running integration tests. Ideally, some tests would check that TLS is used rather than plain text. This is impossible to test if you downgrade to plain-text communication for testing purposes. Indeed, the root cause of many production security breaches is that plain-text communication code inserted for testing was accidentally (or maliciously) enabled. Furthermore, it was impossible to test that such bugs did not exist because the testing environment did have plain-text communication.
For the same reason, allowing TLS connections without verification in the testing environment is dangerous. This means that the code has a non-verification flow, which can be turned on in production, accidentally or maliciously, and which testing cannot rule out.
Manually creating a certificate requires access to the hazmat layer in cryptography. The layer is so named because it is dangerous. You must judiciously choose encryption algorithms and parameters, and the wrong choices can lead to insecure modes.
The first step is to generate a private key and derive the public key from it. Deriving the public key is important since the certificate only refers to the public key. Since the private key is never shared, it is not worthwhile and actively dangerous to make any assertions about it.
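One way to sketch the key generation and a self-signed CA certificate with the cryptography library is shown below. The CA name, the 2048-bit RSA key, and the 30-day validity period are assumptions chosen for a short-lived test CA.

```python
import datetime

from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Self-signed: the subject and the issuer are the same name.
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "Simple Test CA")])
now = datetime.datetime.now(datetime.timezone.utc)

builder = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)
    .public_key(public_key)
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(days=30))
    # This extension is what marks the certificate as a CA.
    .add_extension(
        x509.BasicConstraints(ca=True, path_length=None), critical=True
    )
)

# Only the signature involves the private key; the certificate itself
# contains the public key alone.
certificate = builder.sign(private_key=private_key, algorithm=hashes.SHA256())
```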
That’s it! You now have a private key and a self-signed certificate that claims to be a CA. However, you need to store them in files.
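Storing the two objects can be sketched as follows. The certificate file name ca.crt matches the text; the key file name ca.pem, the helper name save_ca, and the temporary directory are assumptions. The stand-in key and certificate exist only so the sketch runs on its own.

```python
import datetime
import pathlib
import tempfile

from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID


def save_ca(private_key, certificate, directory):
    """Write the key to ca.pem and the certificate to ca.crt, both as PEM."""
    directory = pathlib.Path(directory)
    key_bytes = private_key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.TraditionalOpenSSL,
        # Stored unencrypted here; the file itself must be protected.
        encryption_algorithm=serialization.NoEncryption(),
    )
    (directory / "ca.pem").write_bytes(key_bytes)
    cert_bytes = certificate.public_bytes(serialization.Encoding.PEM)
    (directory / "ca.crt").write_bytes(cert_bytes)


# Stand-in CA key and self-signed certificate.
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "Simple Test CA")])
now = datetime.datetime.now(datetime.timezone.utc)
cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(days=30))
    .add_extension(
        x509.BasicConstraints(ca=True, path_length=None), critical=True
    )
    .sign(private_key=key, algorithm=hashes.SHA256())
)

output_dir = tempfile.mkdtemp()
save_ca(key, cert, output_dir)
```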
This gives you the capability to now be a CA.
For real certificate authorities, you generally need to generate a certificate signing request (CSR) to prove that the owner of the private key wants that certificate. However, since you are the certificate authority, you can just create the certificate directly.
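Issuing a service certificate directly from the CA can be sketched like this. The helper names, the 30-day validity, the subject alternative name extension, and the layout of service.pem (private key followed by certificate) are assumptions; the host name comes from the text. A stand-in CA is created first so the sketch runs on its own.

```python
import datetime
import pathlib
import tempfile

from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.x509.oid import NameOID


def make_name(common_name):
    return x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, common_name)])


def issue(subject, issuer, public_key, signing_key, extension, critical):
    """Build a 30-day certificate for public_key, signed by signing_key."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return (
        x509.CertificateBuilder()
        .subject_name(subject)
        .issuer_name(issuer)
        .public_key(public_key)
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + datetime.timedelta(days=30))
        .add_extension(extension, critical=critical)
        .sign(private_key=signing_key, algorithm=hashes.SHA256())
    )


# Stand-in CA: self-signed, so subject and issuer are the same.
ca_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
ca_name = make_name("Simple Test CA")
ca_cert = issue(
    ca_name, ca_name, ca_key.public_key(), ca_key,
    x509.BasicConstraints(ca=True, path_length=None), True,
)

# The service gets its own key; the CA key only signs the certificate.
hostname = "service.test.local"
service_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
service_cert = issue(
    make_name(hostname), ca_cert.subject, service_key.public_key(), ca_key,
    x509.SubjectAlternativeName([x509.DNSName(hostname)]), False,
)

# Raises InvalidSignature unless the CA key really signed the certificate.
ca_key.public_key().verify(
    service_cert.signature,
    service_cert.tbs_certificate_bytes,
    padding.PKCS1v15(),
    service_cert.signature_hash_algorithm,
)

service_pem = service_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.TraditionalOpenSSL,
    encryption_algorithm=serialization.NoEncryption(),
) + service_cert.public_bytes(serialization.Encoding.PEM)
service_path = pathlib.Path(tempfile.mkdtemp()) / "service.pem"
service_path.write_bytes(service_pem)
```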
The service.pem file is in a format that the most popular web servers can use: Apache, Nginx, HAProxy, and more. It can also be used directly by the Twisted web server through the txsni extension.
If you add the ca.crt file to the trusted root, and run, say, an Nginx server on an IP that the client would resolve from service.test.local, then when you connect clients to https://service.test.local, they verify that the certificate is valid.
8.5 Summary
Cryptography is a powerful tool, but one which is easy to misuse. Using well-understood high-level functions reduces many of the risks in using cryptography. While this does not substitute for proper risk analysis and modeling, it does make this exercise somewhat easier.
Python has several third-party libraries with well-vetted code, and it is a good idea to use them.