© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
M. Zadka, DevOps in Python, https://doi.org/10.1007/978-1-4842-7996-0_7

7. HTTPX

Moshe Zadka
Belmont, CA, USA

Many systems expose a web-based API. The httpx library is useful for automating web-based APIs. It is designed to be easy to use while still exposing many powerful features.

Note that httpx does not support Python 2. If this is a concern, there are alternatives. It is important to remember, though, that Python 2 no longer receives security updates, so using it with a library designed to connect to websites is dangerous.

Using httpx is almost always better than using Python's standard library HTTP client facilities. It supports flexible authentication, serializes and deserializes JSON internally, and supports both synchronous and asynchronous operation.

Note that httpx is largely compatible with the popular requests library. Unless special features of requests are used, like exotic certificate validation, converting code from requests to httpx mostly amounts to changing the import statements.

7.1 Clients

It is better to work with explicit clients in httpx. It is important to remember that there is no such thing as working without a client in httpx. When working with the module-level functions, httpx uses an implicit client object behind the scenes.

This is problematic for several reasons. For one, this is exactly the kind of global mutable shared state that can cause hard-to-diagnose bugs. For example, when connecting to a website that uses cookies, another user of httpx connecting to the same website could override the cookies. This leads to subtle interactions between potentially far-apart pieces of code.

It is also problematic because this makes code non-trivial to unit test. The httpx.get/httpx.post functions must be explicitly mocked. In contrast, httpx allows some interesting ways to fake explicit clients, as the sketch below shows.
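For example, here is a minimal sketch using httpx.MockTransport, so tests never touch the network; the handler and the example.com URL are invented for illustration.
import httpx
def handler(request):
    # Every request gets a canned JSON response; no network is involved.
    return httpx.Response(200, json={"hello": "world"})
fake_client = httpx.Client(transport=httpx.MockTransport(handler))
print(fake_client.get("https://example.com/api").json())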

Last but not least, some functionality is only accessible when using an explicit Client object. If the requirement to use it comes later, for example, because you want to add a tracing header or a custom user-agent to all requests, refactoring all code to use explicit clients can be non-trivial.

It is much better for any code that expects to be long-lived to use an explicit client object. For similar reasons, it is even better to make most of this code not construct its own Client object but rather get it as an argument.

This allows initializing the client elsewhere, closer to the main code. It is useful because decisions about which proxies to use and when can happen closer to the end-user requirements rather than in abstract library code.

A client object is constructed with httpx.Client(). After that, the only interaction should be with the client object. It has methods for all the common HTTP verbs: .get(), .put(), .post(), .patch(), .delete(), and .options().

Clients can be used as context managers.
with httpx.Client() as c:
    c.get(...)

At the end of the context, all pending connections are cleaned up, which is important, especially if a web server has strict usage limits that you cannot afford to exceed.

Note that counting on Python's reference counting to close the connections can be dangerous. Not only is that not guaranteed by the language (it is not true, for example, on PyPy), but small things can easily prevent it. For example, the client can be captured as a local variable in a traceback, and that traceback can be part of a circular data structure. This means that the connections are not closed for a potentially long time: not until Python runs a garbage collection cycle that collects circular references.

The client supports a few constructor parameters that you can set to send all requests in a specific way. The most common one to use is auth=. httpx's authentication capabilities are discussed further later in this chapter.

Another parameter that is useful to set is headers=. These are default headers sent with every request, which is sometimes useful for setting the User-Agent header.

When using httpx for testing your web APIs, it is useful to have an identifying string in the agent. This allows you to check the server logs and distinguish which requests came from tests instead of real users.
client = httpx.Client(
    headers={'User-Agent': 'Python/MySoftware ' + __version__ }
)

This lets you check which version of the test code caused a problem, especially if the test code crashes the server and you want to disable it, which can be invaluable in diagnosis.

The client also holds a CookieJar in the .cookies.jar member. The cookie jar can be set explicitly using the Client(cookies=cookie_jar) constructor parameter.

You can use it to persist cookies to disk and recover them if you want restartable HTTP sessions, as in the following sketch.
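A minimal sketch of doing so, assuming a writable cookies.txt path; LWPCookieJar is the standard library's file-backed cookie jar, and ignore_discard=True is needed to keep session cookies across runs.
from http.cookiejar import LWPCookieJar
import httpx
jar = LWPCookieJar("cookies.txt")
try:
    jar.load(ignore_discard=True)  # Recover cookies from a previous run.
except FileNotFoundError:
    pass  # First run; start with an empty jar.
client = httpx.Client(cookies=jar)
# ...make requests; Set-Cookie responses update the jar...
jar.save(ignore_discard=True)  # Persist cookies for the next run.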

Finally, the client can have a client-side certificate in situations where this kind of authentication is desired. This can either be a pem file (the key and the certificate concatenated) or a tuple with the paths to the certificate and key file.

7.2 REST

REST stands for REpresentational State Transfer. It is a loose, and loosely applied, standard for representing information on the web. It is often used to map a row-oriented database structure almost directly to the web. When used this way, it is often called the CRUD model (create, retrieve, update, and delete). When using REST for CRUD, the following HTTP methods are typically used.

The create operation maps to POST, accessed via the .post() method on httpx.Client. In some sense, although first on the list, it is the least RESTful of the four operations because its semantics are not replay safe. This means that if the .post() call raises a network-level error, for example, socket.error, it is not obvious how to proceed. Was the object created? If one of the fields in the object must be unique, for example, an email address for a user, then replaying is safe; the replay fails if the creation already succeeded.

However, this depends on application semantics, which makes it impossible to replay generically.

Luckily, the HTTP methods typically used for the other operations are replay safe. This property is also known as idempotency, inspired by (though not identical with) the mathematical notion of idempotent functions. If a network failure occurs, sending the operation again is safe.

If the server follows correct HTTP semantics, all the operations described next are replay safe.

The update operation is usually implemented with PUT (for a whole-object update) or PATCH (when changing specific fields).

The delete operation is implemented with HTTP DELETE. The replay safety here is subtle; whether a replay succeeds or fails with an object not found, in the end, you are left in a known state.

The retrieve operation, implemented with HTTP GET, is almost always read-only, and so it is replay safe, or safe to retry after a network failure.
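As a hedged sketch of what replay safety buys, the following retries an idempotent GET after a network-level failure; the attempt count is arbitrary, and a real implementation would likely add backoff.
import httpx
def get_with_retry(client, url, attempts=3):
    # Retrying is safe only because GET is idempotent; a failed
    # attempt cannot leave the server in an unknown state.
    for attempt in range(attempts):
        try:
            return client.get(url)
        except httpx.TransportError:
            if attempt == attempts - 1:
                raise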

Most REST services, nowadays, use JSON as the state representation. The httpx library has special support for JSON.
>>> from pprint import pprint
>>> pprint(c.get("https://httpbin.org/json").json())
{'slideshow': {'author': 'Yours Truly',
               'date': 'date of publication',
               'slides': [{'title': 'Wake up to WonderWidgets!', 'type': 'all'},
                          {'items': ['Why <em>WonderWidgets</em> are great',
                                     'Who <em>buys</em> WonderWidgets'],
                           'title': 'Overview',
                           'type': 'all'}],
               'title': 'Sample Slide Show'}}

The pprint module from the standard library prints with indentation and line breaks that are easier to read than plain print. The name stands for pretty-print.

The return value from a request, Response, has a .json() method, which assumes the return value is JSON and parses it. While this only saves one step, it is a useful step to save in a multi-stage process where you get some JSON-encoded response only to use it in a further request.

It is also possible to auto-encode the request body as JSON.
>>> resp = c.put("https://httpbin.org/put", json=dict(hello=5, world=2))
>>> resp.json()['json']
{'hello': 5, 'world': 2}
The combination of those two, with a multi-step process, is often useful.
>>> res = c.get("https://api.github.com/repos/python/cpython/pulls")
>>> commits_url = res.json()[0]['commits_url']
>>> commits = c.get(commits_url).json()
>>> commits[0]['commit']['message'][:40]
'bpo-46104: Fix example broken by GH-3014'

This example of getting a commit message from the first pull request on the CPython project is a typical use of a good REST API. A good REST API includes URLs as resource identifiers. You can pass those URLs to a further request to get more information.

7.3 Security

The HTTPS security model relies on certificate authorities, often shortened to CAs. Certificate authorities cryptographically sign public keys as belonging to a specific domain (or, less commonly, an IP). To enable key rotation and revocation, certificate authorities do not sign the public key with their root key (the one trusted by the browser). Rather, they sign a signing key, which signs the public key. These chains, where each key signs the next one until the final key is the one the server uses, can get long; chains three or four levels deep are common.

Since certificates are tied to domains, and domains are often co-hosted on the same IP, the protocol that requests the certificate includes Server Name Indication, or SNI. SNI sends the server name, unencrypted, that the client wants to connect to. The server then responds with the appropriate certificate and cryptographically proves that it owns the private key corresponding to the signed public key.

Finally, the client can optionally engage in a cryptographic proof of its own identity, which is done through the slightly misnamed client-side certificates. The client has to be initialized with a certificate and a private key. It then sends the certificate, and if the server trusts the certifying authority, the client proves that it owns the corresponding private key.

Client-side certificates are seldom used in browsers but can sometimes be used by programs. For a program, they are usually easier secrets to deploy. Most clients, httpx included, already support reading them out of files. This makes it possible to deploy them using systems that make secrets available via files, like Kubernetes secrets or Vault. It also means it is possible to manage permissions on them via normal Unix file permissions.

Usually, client-side certificates are not signed by a public CA. Rather, the server owner operates a local CA, which, through some locally determined procedure, signs certificates for clients. It can be anything from an IT person signing manually to a single sign-on (SSO) portal that auto-signs certificates.

To authenticate server-side certificates and establish secure connections, httpx needs a source of trusted root CAs. Depending on the subtleties of the ssl build process, it might or might not have access to the system certificate store.

The best way to have a good set of root CAs is to install the certifi package, which packages the Mozilla-curated list of root certificates. Since httpx depends on certifi, it is already installed.

This is useful when making connections to the Internet; almost all sites are tested to work with Firefox and therefore have a compatible certificate chain. If the certificate fails to validate, an error mentioning CERTIFICATE_VERIFY_FAILED is raised.

There is a lot of unfortunate advice on the Internet, including in the httpx documentation, suggesting the verify=False flag as a solution. This is usually not good advice. Using it violates the core assumptions of TLS: that the connection is encrypted and tamper-proof. For example, with verify=False on a request, any cookies or authentication credentials can be intercepted by anyone who can modify in-stream packets, which is unfortunately common; ISPs and open access points often have operators with nefarious motivations.

Sometimes this makes sense when locally testing servers that run self-signed certificates. However, even in those cases, there is a risk that code with verify=False is reused or repurposed for a different case without careful vetting.

A better alternative is to make sure that the correct certificates exist on the file system and pass the path to the verify argument via verify='/full/path/cert.pem'. At the very least, this allows you a form of trust on first use; manually get the certificate from the service, and bake it into the code. It is even better to attempt some out-of-band verification, for example, by asking someone to log in to the server and verify the certificate.
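A minimal sketch of this approach; the certificate path is the placeholder from the text.
import httpx
# Trust only the locally stored certificate, obtained and verified
# out of band (trust on first use).
client = httpx.Client(verify='/full/path/cert.pem')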

Choosing which TLS versions or ciphers to allow is slightly more subtle. Again, there are few reasons to do it: httpx is set up with good, secure defaults. However, sometimes there are overriding concerns, for example, avoiding a specific TLS version for regulatory reasons or testing purposes.

This can be done by customizing the SSL context.
import httpx
import ssl
ssl_context = ssl.create_default_context()  # Start from Python's secure defaults.
ssl_context.options |= ssl.OP_NO_TLSv1_3  # Disallow TLS 1.3 specifically.
client = httpx.Client(verify=ssl_context)

Since the client accepts an ssl.SSLContext object, you can customize it in any way the Python ssl module allows. For example, ssl_context.set_ciphers("RSA+AES") allows only that family of ciphers.
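A minimal sketch of the cipher restriction, reusing only the ssl module calls mentioned above.
import ssl
import httpx
ssl_context = ssl.create_default_context()
ssl_context.set_ciphers("RSA+AES")  # Allow only RSA key exchange with AES.
client = httpx.Client(verify=ssl_context)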

HTTPX also supports client-side certificates. Seldom used for user-to-service communication but sometimes used in microservice architectures, client-side certificates identify the client using the same mechanism servers use to identify themselves: cryptographically signed proofs. The client needs a private key and a corresponding certificate. These certificates are often signed by a private CA that is part of the local infrastructure.

The certificate and the key can be concatenated into the same file, often called a PEM file. In that case, initializing the client to identify with it is done via
client = httpx.Client(cert="/path/to/pem/file.pem")
If the certificate and the private key are in separate files, they are given as a tuple.
client = httpx.Client(
    cert=("/path/to/client.cert", "/path/to/client.key")
)

Such key files must be carefully managed; anyone who has read access to them can pretend to be the client.

7.4 Authentication

The auth= keyword parameter to httpx.Client() configures the default authentication. Alternatively, auth= can be sent when making a request, using .get(), .post(), or one of the other request methods.

The most commonly used authentication is basic auth. The auth= argument can be a tuple (username, password). However, a better practice is to use an httpx.BasicAuth instance. This documents intent better and makes it easier to switch to other authentication forms.
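A minimal sketch; the credentials are placeholders.
import httpx
auth = httpx.BasicAuth(username="user", password="hunter2")
client = httpx.Client(auth=auth)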

It is also possible to implement your own authentication flow, which is done by subclassing httpx.Auth.

For simpler cases, it is enough to override auth_flow(), which is usually implemented as a generator. It accepts one parameter.
class MyAuth(httpx.Auth):
    def auth_flow(self, request: httpx.Request):
        ...

This method often modifies the request and then yields it, which is enough for authentication that can be done by setting headers.
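For example, the following is a minimal sketch of header-based authentication; the X-Token header name and the token value are invented for illustration.
import httpx
class TokenAuth(httpx.Auth):
    def __init__(self, token):
        self.token = token
    def auth_flow(self, request):
        # Attach the token header, then send the request unchanged.
        request.headers["X-Token"] = self.token
        yield request
client = httpx.Client(auth=TokenAuth("placeholder-token"))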

However, in more complicated flows, there may be more back-and-forth before authenticating to the server. For example, suppose a server requires a user/password request against a login URL, and the response returns a cookie to be used with subsequent calls. In that case, auth_flow() can yield a request to the login URL, capture the cookie from the response, and then modify the original request before yielding it.
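A hedged sketch of such a two-step flow; the example.com/login endpoint and its Set-Cookie behavior are invented for illustration. Yielding inside auth_flow() hands the intermediate response back to the generator.
import httpx
class LoginAuth(httpx.Auth):
    def __init__(self, username, password):
        self.username = username
        self.password = password
    def auth_flow(self, request):
        login = httpx.Request(
            "POST",
            "https://example.com/login",
            data={"user": self.username, "password": self.password},
        )
        response = yield login  # The login response arrives here.
        request.headers["Cookie"] = response.headers["Set-Cookie"]
        yield request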

As a concrete example for a simple flow, the following is useful as an object that signs AWS requests with the V4 signing protocol.

The first thing you do is make the URL canonical. Canonicalization is the first step in many signing protocols. Since often higher levels of the software have already parsed the content by the time the signature checker looks at it, you convert the signed data into a standard form that uniquely corresponds to the parsed version.

The most subtle part is the query string. You parse it and re-encode it using the built-in urllib.parse library.
from urllib.parse import parse_qs, urlencode, urlparse
def canonical_query_string(query):
    if not query:
        return ""
    parsed = parse_qs(query, keep_blank_values=True)
    return "?" + urlencode(parsed, doseq=True)
You use this function in the URL canonicalization function.
def to_canonical_url(raw_url):
    url = urlparse(raw_url)
    path = url.path or "/"
    query = canonical_query_string(url.query)
    return (
        url.scheme +
        "://" +
        url.netloc +
        path +
        query
    )
The path is also made canonical: an empty path is translated to /. Next, here is a class that uses botocore, the AWS Python SDK, to sign the request.
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
import httpx
class AWSv4Auth(httpx.Auth):
    def __init__(self, aws_session, region, service):
        self.aws_session = aws_session
        self.region = region
        self.service = service
    def auth_flow(self, request):
        # Fake a botocore request with the canonical URL and the same body.
        aws_request = AWSRequest(
            method=request.method.upper(),
            url=to_canonical_url(str(request.url)),
            data=request.content,
        )
        credentials = self.aws_session.get_credentials()
        # Sign the fake request, then copy the signature headers over.
        SigV4Auth(credentials, self.service, self.region).add_auth(aws_request)
        request.headers.update(dict(aws_request.headers.items()))
        yield request

The class fakes an AWSRequest object with the canonical URL and the same data, asks botocore for a signature, and copies the resulting headers onto the original request.

Use this as follows.
import botocore.session
client = httpx.Client(
    auth=AWSv4Auth(
        aws_session=botocore.session.get_session(),
        region='us-east-1',
        service='es',
    ),
)

In this case, the region and the service are part of the auth object. A more sophisticated approach would be to infer the region and service from the request URL and use that, which is beyond the scope of this example.

This should give a good idea about how custom authentication schemes work; write code that modifies the request to have the right authentication headers and then pass it as the auth parameter in the client.

7.5 Async client

HTTP calls can sometimes be slow. This might be because of network latency or server latency. Regardless, this means that doing several calls might take a long time.

As a stark example, httpbin.org has an endpoint, /delay, which waits a given number of seconds before responding. In this example, the code accesses it twice with different parameters.
import httpx, datetime
sync_client = httpx.Client()
before = datetime.datetime.now()
r1 = sync_client.get("https://httpbin.org/delay/3?param=sync-first")
r2 = sync_client.get("https://httpbin.org/delay/3?param=sync-second")
delta = datetime.datetime.now() - before
print(delta // datetime.timedelta(seconds=1))
results1 = r1.json()
results2 = r2.json()
print(results1["args"]["param"], results2["args"]["param"])
The endpoint, in this case, echoes the parameter back. In a real web API, the results would be more interesting. In this case, the interesting part is the time.
6
sync-first sync-second

This took six seconds since each call took three seconds.

One way to improve this is to use asynchronous network calls. The topic of asynchronicity in Python, in general, is beyond the scope of this chapter. It is important to mention that httpx supports async with a parallel API to the classic (synchronous) API.
import httpx, datetime
import asyncio
async def async_calls():
    before = datetime.datetime.now()
    async with httpx.AsyncClient() as async_client:
        fut1 = async_client.get("https://httpbin.org/delay/3?param=async-first")
        fut2 = async_client.get("https://httpbin.org/delay/3?param=async-second")
        responses = await asyncio.gather(fut1, fut2)
        delta = datetime.datetime.now() - before
    r1, r2 = responses
    results1 = r1.json()
    results2 = r2.json()
    print(delta // datetime.timedelta(seconds=1))
    print(results1["args"]["param"], results2["args"]["param"])
asyncio.run(async_calls())
The results are equivalent but faster.
3
async-first async-second
Note that just using await on each line would have still taken six seconds. To take advantage of async, the code separated launching the calls and waiting for the responses.
fut1 = async_client.get("https://httpbin.org/delay/3?param=async-first")
fut2 = async_client.get("https://httpbin.org/delay/3?param=async-second")
responses = await asyncio.gather(fut1, fut2)

Using asyncio.gather() on both awaitable results allows both calls to start without delay.

7.6 Summary

Saying HTTP is popular feels like an understatement. It is everywhere—from user-accessible services through web-facing APIs and even internally in many microservice architectures.

httpx helps with all of these. It can be part of monitoring a user-accessible service for health, it can access APIs in programs to analyze the data, and it can be used to debug internal services to understand their state.

It is a powerful library with many ways to fine-tune it to send exactly the right requests and get the needed functionality.
