Chapter 3. Headers

So far, we’ve seen various presentations of the HTTP format, and examined the idea that there is a lot more information being transferred in web requests and responses than what appears in the body of the response. The body is certainly the most important bit, and often is the meatiest, but the headers provide key pieces of information for both requests and responses, which allow the client and the server to communicate effectively. If you think of the body of the request as a birthday card with a check inside it, then the headers are the address, postmark, and perhaps the “do not open until…” instruction on the outside (see Figure 3-1).

Figure 3-1. Envelope with stamp, address, and postmark

This additional information gets the body data to where it needs to go and instructs the target on what to do with it when it gets there.

Request and Response Headers

Many of the headers you see in HTTP make sense in both requests and responses. Others might be specific to either a request or a response. Here’s a sample set of real request and response headers from when I request my own site from a browser (I’m using Chrome).

Request headers:

GET / HTTP/1.1
Host: www.lornajane.net
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3

Response headers:

HTTP/1.1 200 OK
Server: Apache/2.4.7 (Ubuntu)
X-Powered-By: PHP/5.5.9-1ubuntu4.6
X-Pingback: http://www.lornajane.net/xmlrpc.php
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Type: text/html; charset=UTF-8
Content-Length: 7897
Accept-Ranges: bytes
Date: Sat, 11 Jul 2015 08:22:57 GMT
X-Varnish: 600223060
Age: 0
Via: 1.1 varnish
Connection: keep-alive

Headers can be related to the request, the response, or the “entity,” which is the body of either a request or a response. Some examples might be:

Request Headers: User-Agent, Accept, Authorization, and Cookie
Response Headers: Set-Cookie
Entity Headers: Content-Type and Content-Length

This chapter looks in more detail at the headers you are likely to see when working with web services.

Identify Clients with User-Agent

The User-Agent header gives information about the client making the HTTP request and usually includes information about the software client. Take a look at the header here:

User-Agent: Mozilla/5.0 (Linux; U; Android 2.3.4; en-gb; SonyEricssonSK17i Build/4.0.2.A.0.62) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1

What device do you think made this request? You would probably guess that it was my Sony Ericsson Android phone…and perhaps you would be right. Or perhaps I used a curl command:

curl -H "User-Agent: Mozilla/5.0 (Linux; U; Android 2.3.4; en-gb; SonyEricssonSK17i Build/4.0.2.A.0.62) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1" http://requestb.in/example

We simply have no way of knowing, when a request is received with a User-Agent like this, if it really came from an Android phone, or if it came from something else pretending to be an Android phone. This information can be used to customize the response we send; after all, if someone wants to pretend to be a tiny Android phone, then it is reasonable to respond with the content that would normally be sent to this phone. It does mean, however, that the User-Agent header cannot be relied upon for anything more important, such as agreeing on a special User-Agent value and treating it as a means of authenticating users. Just like any other incoming data, it is wide open to abuse and must be treated with suspicion.

In PHP, it is possible both to parse and to send the User-Agent header, as suits the task at hand. Here’s an example of sending the header using streams:

<?php

$url = 'http://localhost/book/user-agent.php';
$options = array(
    "http" => array(
        "header"  => "User-Agent: Advanced HTTP Magic Client"
    )
);

$page = file_get_contents($url, false , stream_context_create($options));
echo $page;

We can set any arbitrary headers we desire when making requests, all using the same approach. Reading incoming headers in PHP is just as consistent: the data of interest can all be found in $_SERVER, and in this case it is possible to inspect $_SERVER["HTTP_USER_AGENT"] to see what the User-Agent header was set to.

To illustrate, here’s a simple script:

<?php

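// FILTER_SANITIZE_STRING is deprecated as of PHP 8.1, so escape the value
// for output instead, and fall back to a default if the header is absent
echo "This request made by: "
    . htmlspecialchars($_SERVER['HTTP_USER_AGENT'] ?? 'unknown', ENT_QUOTES, 'UTF-8');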
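
The $_SERVER keys follow a predictable pattern, which can be sketched as a small helper. This is an illustration rather than anything built into PHP: the function names serverKeyForHeader() and readHeader() are made up for this example, and the special handling of Content-Type and Content-Length reflects how PHP usually exposes those two headers, which may vary with the server setup.

```php
<?php

// Hypothetical helper: map an HTTP header name to the key PHP
// usually uses for it in $_SERVER.
function serverKeyForHeader(string $name): string
{
    $key = strtoupper(str_replace('-', '_', $name));
    // Content-Type and Content-Length are normally exposed without
    // the HTTP_ prefix.
    if ($key === 'CONTENT_TYPE' || $key === 'CONTENT_LENGTH') {
        return $key;
    }
    return 'HTTP_' . $key;
}

// Hypothetical helper: read a header from a $_SERVER-style array,
// returning a default when the client did not send it.
function readHeader(array $server, string $name, ?string $default = null): ?string
{
    $key = serverKeyForHeader($name);
    return isset($server[$key]) ? (string) $server[$key] : $default;
}
```

In a real script, readHeader($_SERVER, 'User-Agent', 'unknown') would return the same value as the lookup in the previous example, without raising a notice when the header is missing.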

It’s common when developing content for the mobile web to use headers such as User-Agent in combination with WURFL to detect what capabilities the consuming device has, and adapt the content accordingly. With APIs, however, it is better to expect the clients to use other headers so they can take responsibility for requesting the correct content types, rather than allowing the decision to be made centrally.

Headers for Content Negotiation

Commonly, the Content-Type header is used to describe what format the data being delivered in the body of a request or a response is in; this allows the target to understand how to decode this content. Its sister header, Accept, allows the client to indicate what kind of content is acceptable, which is another way of allowing the client to specify what kind of content it actually knows how to handle. As seen in the earlier example showing headers, here’s the Accept header Google Chrome usually sends:

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8

To read an Accept header, consider each of the comma-separated values as an individual entity. This client has stated a preference for (in order):

  • text/html

  • application/xhtml+xml

  • image/webp

  • application/xml

  • */*

This means that if any of these formats are supplied, the client will understand our meaning. There are two entries in the list that include some additional information: the q value. This is an indication of how much a particular option is preferred, where the default value is q=1.

Here, Chrome claims to be able to handle a content type of */*. The asterisks are wildcards, meaning it thinks it can handle any format that could possibly exist—which seems unlikely. If an imaginary format is implemented that both our client and server understand, for example, Chrome won’t know how to parse it, so */* is misleading.

Using the Accept and Content-Type headers together to describe what can be understood by the client, and what was actually sent, is called content negotiation. Using the headers to negotiate the usable formats means that meta-information is not tangled up with actual data as it would be when sending both kinds of parameters with the body or URL of the request. Including the headers is generally a better approach.

We can negotiate more than just content, too. The earlier example contained these lines:

Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3

These headers show other kinds of negotiation, such as declaring what encoding the client supports, which languages are preferred, and which character sets can be used. This enables decisions to be made about how to format the response in various ways, and how to determine which formats are appropriate for the consuming device.
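
As a rough illustration of language negotiation, here is a minimal sketch that picks a language from an Accept-Language value. The function name chooseLanguage() and the list of supported languages are assumptions made for this example, and it only performs exact (case-insensitive) matches; it does not fall back from en-GB to en, which a fuller implementation might do.

```php
<?php

// Hypothetical helper: pick the best language from an Accept-Language
// header value, given the languages this application can serve.
function chooseLanguage(string $header, array $supported, string $default): string
{
    $prefs = [];
    foreach (explode(',', $header) as $pos => $part) {
        $pieces = array_map('trim', explode(';', $part));
        $tag = strtolower($pieces[0]);
        if ($tag === '') {
            continue;
        }
        $q = 1.0; // an entry without a q value counts as q=1
        foreach (array_slice($pieces, 1) as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        $prefs[] = ['tag' => $tag, 'q' => $q, 'pos' => $pos];
    }

    // Highest q first; ties go to whichever entry was listed first.
    usort($prefs, function ($a, $b) {
        if ($a['q'] === $b['q']) {
            return $a['pos'] <=> $b['pos'];
        }
        return $b['q'] <=> $a['q'];
    });

    // Index the supported languages by lowercase tag for matching.
    $lookup = [];
    foreach ($supported as $lang) {
        $lookup[strtolower($lang)] = $lang;
    }
    foreach ($prefs as $pref) {
        if ($pref['q'] > 0 && isset($lookup[$pref['tag']])) {
            return $lookup[$pref['tag']];
        }
    }
    return $default;
}

// Using the Accept-Language header from the earlier example
echo chooseLanguage('en-GB,en-US;q=0.8,en;q=0.6', ['fr', 'en', 'en-US'], 'en'), PHP_EOL;
// en-US
```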

Parsing an Accept Header

Let’s start by looking at how to parse an Accept header correctly. All Accept headers have a comma-separated list of values, and some include a q value that indicates their level of preference. If the q value isn’t included for an entry, it can be assumed that q=1 for that entry. Using the Accept header from my browser again, I can parse it by taking all the segments, working out their preferences, and then sorting them appropriately. Here’s an example function that returns an array of supported formats in order of preference:

<?php

function parseAcceptHeader() {
    $hdr = $_SERVER['HTTP_ACCEPT'];
    $accept = array();
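    // split on commas, ignoring any surrounding whitespace
    foreach (preg_split('/\s*,\s*/', $hdr) as $i => $term) {
        $o = new stdclass;
        $o->pos = $i;
        // capture the media type and, if present, its q (or level) value
        if (preg_match(",^(\S+)\s*;\s*(?:q|level)=([0-9\.]+),i", $term, $M)) {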
            $o->type = $M[1];
            $o->q = (double)$M[2];
        } else {
            $o->type = $term;
            $o->q = 1;
        }
        $accept[] = $o;
    }
    usort($accept, function ($a, $b) {
        /* first tier: highest q factor wins */
        $diff = $b->q - $a->q;
        if ($diff > 0) {
            $diff = 1;
        } else if ($diff < 0) {
            $diff = -1;
        } else {
            /* tie-breaker: first listed item wins */
            $diff = $a->pos - $b->pos;
        }
        return $diff;
    });
    $accept_data = array();
    foreach ($accept as $a) {
        $accept_data[$a->type] = $a->type;
    }
    return $accept_data;
}

Note

The headers sent by your browser may differ slightly and result in different output when you try the previous code snippet.

When using the Accept header sent by my browser, I see the following output:

Array
(
    [text/html] => text/html
    [application/xhtml+xml] => application/xhtml+xml
    [image/webp] => image/webp
    [application/xml] => application/xml
    [*/*] => */*
)

We can use this information to work out which format it would be best to send the data back in. For example, here’s a simple script that calls the parseAcceptHeader() function, then works through the formats to determine which it can support, and sends that information:

<?php

require "accept.php";

$data = ["greeting" => "hello", "name" => "Lorna"];

$accepted_formats = parseAcceptHeader();
$supported_formats = ["application/json", "text/html"];
foreach($accepted_formats as $format) {
    if(in_array($format, $supported_formats)) {
        // yay, use this format
        break;
    }
}

switch($format) {
    case "application/json":
        header("Content-Type: application/json");
        $output = json_encode($data);
        break;
    case "text/html":
    default:
        $output = "<p>" . implode(', ', $data) . "</p>";
        break;
}

echo $output;

There are many, many ways to parse the Accept header (and the same techniques apply to the Accept-Language, Accept-Encoding, and Accept-Charset headers), but it is vital to do so correctly. The importance of Accept header parsing can be seen in Chris Shiflett’s blog post, The Accept Header; the parseAcceptHeader() example shown previously came mostly from the comments on this post. You might use this approach, an existing library such as the PHP mimeparse port, a solution you build yourself, or one offered by your framework. Whichever you choose, make sure that it parses these headers correctly, rather than using a string match or something similar.
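
One detail the simple in_array() check shown earlier does not cover is wildcards such as */* or text/*. The following is a minimal sketch of wildcard-aware matching; pickFormat() and mediaRangeMatches() are hypothetical names, and the input is assumed to be a list of media types already sorted by preference, such as the values returned by parseAcceptHeader().

```php
<?php

// Hypothetical helper: return the first supported format that satisfies
// the client's preferences, which must already be in preference order.
function pickFormat(array $accepted, array $supported): ?string
{
    foreach ($accepted as $range) {
        foreach ($supported as $format) {
            if (mediaRangeMatches($range, $format)) {
                return $format;
            }
        }
    }
    // Nothing acceptable; a 406 Not Acceptable response may be appropriate
    return null;
}

// Hypothetical helper: does a media range from the Accept header
// (which may contain wildcards) cover a concrete format?
function mediaRangeMatches(string $range, string $format): bool
{
    $range = strtolower(trim($range));
    $format = strtolower(trim($format));
    if ($range === '*/*' || $range === $format) {
        return true;
    }
    // "type/*" matches any subtype of that type
    if (substr($range, -2) === '/*') {
        $prefix = substr($range, 0, -1); // keep the trailing slash
        return strncmp($format, $prefix, strlen($prefix)) === 0;
    }
    return false;
}
```

With this approach, a client that sends only image/webp and */* would receive the first format in the server's supported list, rather than falling through to a default by accident.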

Demonstrating Accept Headers with cURL

Using cURL from the command line, here are some examples of how to call exactly the same URL by setting different Accept headers and seeing different responses:

curl http://localhost/book/hello.php
hello, Lorna
curl -H "Accept: application/json" http://localhost/book/hello.php
{"greeting":"hello","name":"Lorna"}
curl -H "Accept: text/html;q=0.5,application/json" \
    http://localhost/book/hello.php
{"greeting":"hello","name":"Lorna"}

To make these requests from PHP rather than from cURL, it is possible to simply set the desired headers as the request is made. Here’s an example that uses PHP’s cURL extension to make the same request as the previous example:

<?php

$url = "http://localhost/book/hello.php";

$ch = curl_init($url);
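// CURLOPT_HTTPHEADER sets request headers; CURLOPT_HEADER only toggles
// whether response headers are included in the output
curl_setopt($ch, CURLOPT_HTTPHEADER, array(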
    "Accept: text/html;q=0.5,application/json",
));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
echo $response;
curl_close($ch);

The number of headers you need to support in your application will vary. It is common and recommended to offer various content types such as JSON, XML, or even plain text. The selection of supported encodings, languages, and character sets will depend entirely on your application and users’ needs. If you do introduce support for variable content types, however, this is the best way to do it.

Securing Requests with the Authorization Header

Headers can provide information that allows an application to identify users. Again, keeping this type of information separate from the application data makes things simpler and, often, more secure. The key thing to remember when working on user security for APIs is that everything you already know about how to secure a website applies to web services. A common header that has been seen earlier in this book is the Authorization header. This can be used with a variety of different techniques for authenticating users, all of which will be familiar to web developers.

Rather than the Authorization header, some applications may have alternative approaches including using cookies and sessions to record a user’s information after he has supplied credentials to a login endpoint, for example. Others will implement solutions of their own making, and many of these will use a simple API key approach. In this approach, the user acquires a key, often via a web interface or other means, that she can use when accessing the API. A major advantage of this approach is that the keys can be deleted by either party, or can expire, which limits how long a leaked key can be used with malicious intent. This is nicer than passing actual user credentials, as a key can be replaced without changing the user’s password. Sometimes API keys will be passed simply as a query parameter, but the Authorization header would also be an appropriate place for such information.

HTTP Basic Authentication

The simplest approach to authentication is HTTP Basic authentication (for more details, see the RFC), which requires the user to supply a username and password to identify himself. Since this approach is so widespread, it is well supported in most platforms, both client and server. Do beware, though, that the credentials are only encoded, not encrypted, so they can easily be inspected and reused maliciously; this approach is appropriate only on trusted networks or over SSL.

When the user tries to access a protected resource using basic authentication, he will receive a 401 status code in response, which includes a WWW-Authenticate header with the value Basic followed by a realm for which to authenticate. As users, we see an unstyled pop-up asking for a username and password in our browser; this is basic authentication. When we supply the credentials, the client will combine them in the format username:password and Base64 encode the result before including it in the Authorization header of the request it makes.

The mechanism of the basic authentication is this:

  1. Arrange the username and password into the format username:password.

  2. Base64 encode the result.

  3. Send it in the header, like this: Authorization: Basic base64-encoded string.

  4. Since Base64 is an encoding rather than encryption, the credentials are effectively sent in plain text, so HTTPS should be used throughout.

We can either follow the steps here and manually create the correct header to send, or we can use the built-in features of our toolchain. Here’s PHP’s cURL extension making a request to a page protected by basic authentication:

<?php

$url = "http://localhost/book/basic-auth.php";

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_USERPWD, "user:pass");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
echo $response;
curl_close($ch);
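
For comparison, the manual route can be sketched as follows. The function name basicAuthHeader() is made up for this example; it simply combines the credentials, Base64 encodes them, and formats the header so that it can be sent in the same way as any other header, here using a stream context.

```php
<?php

// Build the Authorization header for HTTP Basic authentication by hand:
// combine as username:password, then Base64 encode the result.
function basicAuthHeader(string $username, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($username . ':' . $password);
}

// The header can then be sent like any other, for example with streams
$options = [
    'http' => [
        'header' => basicAuthHeader('user', 'pass'),
    ],
];
$context = stream_context_create($options);
// Uncomment to make the request against the example URL:
// $page = file_get_contents('http://localhost/book/basic-auth.php', false, $context);

echo basicAuthHeader('user', 'pass'), PHP_EOL;
// Authorization: Basic dXNlcjpwYXNz
```

Seeing the encoded value makes it clear why this scheme needs a secure connection: anyone who can read the header can decode it with a single base64_decode() call.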

In PHP, these details can be found on the $_SERVER superglobal. When basic authentication is in use, the username and password supplied by the user can be found in $_SERVER["PHP_AUTH_USER"] and $_SERVER["PHP_AUTH_PW"], respectively. When a request is made without credentials, or with invalid credentials, a 401 Unauthorized status code can be sent to tell the client why the server is not sending him the content requested.
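
On the server side, a check along these lines might look like the following sketch. The function name basicAuthUser(), the shape of the $users array (usernames mapped to hashes created with password_hash()), and the realm name in the usage comment are all assumptions for illustration; a real application would look users up in its own data store.

```php
<?php

// Hypothetical check: return the authenticated username, or null when the
// credentials are missing or wrong. $server is a $_SERVER-style array and
// $users maps usernames to hashes created with password_hash().
function basicAuthUser(array $server, array $users): ?string
{
    $user = $server['PHP_AUTH_USER'] ?? null;
    $pass = $server['PHP_AUTH_PW'] ?? null;
    if ($user === null || $pass === null || !isset($users[$user])) {
        return null;
    }
    return password_verify($pass, $users[$user]) ? $user : null;
}

// Example usage in a protected script (the realm name is made up):
//
// if (basicAuthUser($_SERVER, $users) === null) {
//     header('WWW-Authenticate: Basic realm="Example API"');
//     http_response_code(401);
//     exit;
// }
```

Taking the $_SERVER data as an argument, rather than reading the superglobal inside the function, keeps the check easy to test without a web server.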

HTTP Digest Authentication

Similar to basic authentication, but rather more secure, is HTTP Digest authentication (the Wikipedia page includes a great explanation with examples). This process combines the username and password with the realm, a client nonce (a nonce is a cryptographic term meaning “number used once”), a server nonce, and other information, and hashes them before sending. It may sound complicated to implement, but this standard is well understood and widely implemented by both clients and servers.

Very little changes when working with digest authentication when compared to the example of basic authentication just shown; the main things to look out for are:

  • The CURLOPT_HTTPAUTH option should be set to CURLAUTH_DIGEST.

  • On the receiving end, you can find the user data in $_SERVER["PHP_AUTH_DIGEST"], which will need decoding according to the type of digest authentication you are using.

Digest authentication is preferred over basic authentication unless the connection is over SSL. If you want to work with digest auth then there’s a good resource on Sitepoint.
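
To give a feel for the decoding involved, here is a minimal sketch of verifying a digest on the server. It assumes the common case of the MD5 algorithm with qop=auth, and that the server can obtain the user's password (or a stored hash of username:realm:password) to recompute the hash. The function names are made up for this example, and a production implementation would also need to check that the nonce is one it issued and has not been reused.

```php
<?php

// Parse the key=value pairs of a digest string, such as the one PHP
// exposes in $_SERVER['PHP_AUTH_DIGEST'].
function parseDigest(string $digest): array
{
    $parts = [];
    preg_match_all('/(\w+)\s*=\s*(?:"([^"]*)"|([^\s,]+))/', $digest, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // Quoted values land in group 2, unquoted values in group 3
        $parts[$m[1]] = ($m[2] !== '') ? $m[2] : ($m[3] ?? '');
    }
    return $parts;
}

// Recompute the response hash the client should have sent
// (MD5 with qop=auth).
function expectedDigestResponse(array $d, string $password, string $method): string
{
    $ha1 = md5($d['username'] . ':' . $d['realm'] . ':' . $password);
    $ha2 = md5($method . ':' . $d['uri']);
    return md5(implode(':', [$ha1, $d['nonce'], $d['nc'], $d['cnonce'], $d['qop'], $ha2]));
}

// Hypothetical check: true when the digest matches the known password.
function digestIsValid(string $digest, string $password, string $method): bool
{
    $d = parseDigest($digest);
    $required = ['username', 'realm', 'nonce', 'uri', 'response', 'qop', 'nc', 'cnonce'];
    foreach ($required as $field) {
        if (!isset($d[$field])) {
            return false;
        }
    }
    // hash_equals() compares in constant time
    return hash_equals(expectedDigestResponse($d, $password, $method), $d['response']);
}
```

In a protected script this might be called as digestIsValid($_SERVER['PHP_AUTH_DIGEST'], $password, $_SERVER['REQUEST_METHOD']), once the password for the claimed username has been looked up.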

OAuth

An even better solution has emerged in the last few years: OAuth (version 2 is much better than version 1). OAuth arises as a solution to a very specific and common problem: how do we allow a third party (such as an external application on a mobile device) to have secure access to a user’s data? This is solved by establishing a three-way relationship, so that requests coming to the providing API from the third-party consumer have access to the user’s data, but do not impersonate that user. For every combination of application and user, the external application will send the user to the providing API to confirm that she wants access to be granted. Once the relationship is established, the user can, at any time, visit the providing API (with which she originally had the relationship of trust) to revoke that access. Newer versions of OAuth are simple to implement but, again, should always be used over SSL.

In OAuth terminology, we name the client the “consumer” and the server the “provider.” The consumer could be an app on your smartphone, for example, and the provider would then be the system where you already have an account, such as GitHub. Features such as “Sign in with GitHub” use this approach.

The basic process looks something like this:

  1. The user chooses to sign in with GitHub, or link their GitHub account to a third-party client.

  2. The client forwards the user to the provider’s page to sign in and give permission for this client to access the user’s data.

  3. The user signs in, confirms, and arrives back in the app.

  4. The client can then get an access token from the provider.

Once we have the access token, we send this in the Authorization header for every request, something like:

Authorization: Bearer 852990de317

This approach is elegant in two ways:

  • The identity information is not sent as part of the body of the request. By sending this information in the header, we separate the two concerns.

  • By using an access token rather than the user’s actual credentials, we give the ability for that access token to expire or be revoked in the future. This allows users to safely grant access to even unknown applications and know that they can always remove that access in the future, even if that application doesn’t offer the option to remove creds (or if the user doesn’t trust it to), without needing to change the user’s credentials and therefore break all of the integrations that use this account.
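
Working with the access token is then just a matter of building and reading one header. The sketch below shows both sides; bearerHeader() and bearerTokenFrom() are hypothetical helper names, api.example.com is a placeholder, and where the incoming Authorization header appears in $_SERVER (often HTTP_AUTHORIZATION) depends on the server configuration.

```php
<?php

// Build the header to send with each request once we hold an access token.
function bearerHeader(string $accessToken): string
{
    return 'Authorization: Bearer ' . $accessToken;
}

// On the receiving side: extract the token from an Authorization header
// value, or return null if it is missing or uses a different scheme.
function bearerTokenFrom(?string $authorization): ?string
{
    if ($authorization !== null
        && preg_match('/^Bearer\s+(\S+)$/i', trim($authorization), $m)) {
        return $m[1];
    }
    return null;
}

// Sending the token with PHP's cURL extension (placeholder URL):
//
// $ch = curl_init('https://api.example.com/user');
// curl_setopt($ch, CURLOPT_HTTPHEADER, [bearerHeader('852990de317')]);
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// $response = curl_exec($ch);

echo bearerHeader('852990de317'), PHP_EOL;
// Authorization: Bearer 852990de317
```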

This solution is very widely used in APIs and is recommended if you need to authenticate users in your own applications.

Hopefully this serves to cover the overall concept of OAuth and how to use an access token in your own application. For a more complete explanation, the book Getting Started with OAuth 2.0 (O’Reilly) is an excellent reference.

Caching Headers

Just like for other web requests, getting caching right can help enormously when an API server needs to handle a lot of traffic. Requests that perform actions cannot be cached, as they must be processed by the server each time, but GET requests certainly can be, in the right situation. Caching can either be done by the server, which makes a decision about whether to serve a previous version of a resource, or by clients storing the result of previous requests and allowing us to compare versions.

Giving version information along with a resource is a key ingredient in client-side caching, and also links with the nonatomic update procedures in REST as we mention in “Update a Resource with PUT”. When returning a resource, either an ETag (usually a hash of the representation itself) or a Last-Modified (the date this record last changed) is included with the response. Clients that understand these systems can then store these responses locally, and when making the same request again at a later point, they can tell us which version of a resource they already have. This is very similar to the way that web browsers cache assets such as stylesheets and images.

When a resource is served with an ETag header, this header will contain an identifier for that version of the resource, perhaps a hash of the resource or a combination of file size and timestamp. When requesting the resource at a later date, the client can send an If-None-Match header with the value of the ETag in it. If the current version of the resource has a nonmatching ETag, then the new resource will be returned with its ETag header. However, if the ETag values do match, the server can simply respond with a 304 “Not Modified” status code and an empty body, indicating to the client that it can use the version it already has without transferring the new version. This can help reduce server load and network bandwidth.
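
The ETag exchange can be sketched as a small decision function. This is an illustration only: etagResponse() is a made-up name, the ETag is simply an MD5 hash of the body, and weak ETags (those prefixed with W/) are not handled.

```php
<?php

// Hypothetical helper: choose between a full response and a 304, based on
// the ETag of the current representation and the client's If-None-Match.
function etagResponse(string $body, ?string $ifNoneMatch): array
{
    // ETags are quoted strings; a hash of the body is one simple choice
    $etag = '"' . md5($body) . '"';

    if ($ifNoneMatch !== null) {
        // The client may send several ETags, separated by commas
        $candidates = array_map('trim', explode(',', $ifNoneMatch));
        if (in_array('*', $candidates, true) || in_array($etag, $candidates, true)) {
            return ['status' => 304, 'headers' => ['ETag' => $etag], 'body' => ''];
        }
    }
    return ['status' => 200, 'headers' => ['ETag' => $etag], 'body' => $body];
}

// In a real script the second argument would come from the request,
// for example $_SERVER['HTTP_IF_NONE_MATCH'] ?? null
$first = etagResponse('{"greeting":"hello"}', null);
$repeat = etagResponse('{"greeting":"hello"}', $first['headers']['ETag']);
echo $first['status'], ' then ', $repeat['status'], PHP_EOL;
// 200 then 304
```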

In exactly the same way, a resource that is sent with a Last-Modified header can be stored with that header information by the client. A subsequent request would then have an If-Modified-Since header, with the stored Last-Modified value in it. The server compares the timestamp it receives with the last update to the resource, and again either serves the resource with new metadata, or sends the much smaller 304 response.
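
The timestamp comparison works along the same lines. In this sketch, lastModifiedResponse() is a hypothetical name and the last-changed time is assumed to be available as a Unix timestamp, perhaps from a database column or filemtime().

```php
<?php

// Hypothetical helper: compare the client's If-Modified-Since value with
// the time the resource last changed (a Unix timestamp).
function lastModifiedResponse(int $lastModified, ?string $ifModifiedSince): array
{
    // HTTP dates are always expressed in GMT
    $headers = ['Last-Modified' => gmdate('D, d M Y H:i:s', $lastModified) . ' GMT'];

    $since = ($ifModifiedSince !== null) ? strtotime($ifModifiedSince) : false;
    if ($since !== false && $lastModified <= $since) {
        // Unchanged since the client's copy: no body needed
        return ['status' => 304, 'headers' => $headers];
    }
    return ['status' => 200, 'headers' => $headers];
}

$changed = gmmktime(8, 22, 57, 7, 11, 2015);
$response = lastModifiedResponse($changed, 'Sat, 11 Jul 2015 08:22:57 GMT');
echo $response['status'], PHP_EOL;
// 304
```

An unparseable or missing If-Modified-Since value is treated as though the client has no stored copy, so the full resource is served.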

Custom Headers

As with almost every aspect of HTTP, the headers that can be used aren’t set in stone. It is possible to invent new headers if there’s more information to convey for which there isn’t a header. Headers that aren’t “official” can always be used (sometimes they are prefixed with X- but they don’t have to be), so you can make use of this in your own applications if you wish.

A good example, often seen on the Web, is when a tool such as Varnish has been involved in serving a response, and it adds its own headers. I have Varnish installed in front of my own site, and when I request it, I see:

HTTP/1.1 200 OK
Server: Apache/2.4.7 (Ubuntu)
X-Powered-By: PHP/5.5.9-1ubuntu4.6
X-Pingback: http://www.lornajane.net/xmlrpc.php
Content-Type: text/html; charset=UTF-8
Date: Sat, 11 Jul 2015 08:57:32 GMT
X-Varnish: 600227065 600227033
Age: 43
Via: 1.1 varnish
Connection: keep-alive

That additional X-Varnish header shows me that Varnish served the request. It isn’t an official header, but these X-* headers are used to denote all kinds of things in APIs and on the Web. A great example comes from GitHub. Here’s what happens when I make a request to fetch a list of the repositories associated with my user account:

HTTP/1.1 200 OK
Server: GitHub.com
Date: Sat, 11 Jul 2015 08:59:01 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 157631
Status: 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 59
ETag: "8976d7fc7aa861a8581108e59ae76506"
X-GitHub-Media-Type: github.v3
X-GitHub-Request-Id: 5EC19EE1:61C0:10B4CDB:55A0DAD5
X-Content-Type-Options: nosniff
X-Served-By: 13d09b732ebe76f892093130dc088652

There are a few custom headers in this example but the X-RateLimit-* headers are particularly worth noting; they report how many requests the client is allowed to make and how many remain, so the client can tell when it is approaching the limit. Using custom headers like these, any additional data can be transferred between client and server that isn’t part of the body data, which means all parties can stay “on the same page” with the data exchange.
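
A client can act on headers like these once it has parsed them. The following is a minimal sketch: parseHeaderLines() is a made-up helper, the raw header text is a cut-down copy of the response above, and repeated headers simply overwrite one another, which is fine for this purpose but not for headers such as Set-Cookie.

```php
<?php

// Hypothetical helper: turn raw response header lines into an array keyed
// by lowercased header name (header names are case-insensitive).
function parseHeaderLines(string $raw): array
{
    $headers = [];
    foreach (preg_split('/\r?\n/', trim($raw)) as $line) {
        // Skip the status line, and anything without a "Name: value" shape
        if (strncmp($line, 'HTTP/', 5) === 0) {
            continue;
        }
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue;
        }
        $name = strtolower(trim(substr($line, 0, $pos)));
        $headers[$name] = trim(substr($line, $pos + 1));
    }
    return $headers;
}

$raw = "HTTP/1.1 200 OK\r\n"
    . "Content-Type: application/json; charset=utf-8\r\n"
    . "X-RateLimit-Limit: 60\r\n"
    . "X-RateLimit-Remaining: 59\r\n";

$headers = parseHeaderLines($raw);
$remaining = (int) ($headers['x-ratelimit-remaining'] ?? 0);

if ($remaining === 0) {
    echo "Rate limit reached; wait before retrying", PHP_EOL;
} else {
    echo $remaining, " requests remaining", PHP_EOL;
}
// 59 requests remaining
```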

Headers are particularly important when working with APIs as there is often separation between the data and the metadata. Not all APIs are designed that way, but look out for some examples, particularly in Chapter 8.
