Chapter 13. Image Delivery

Optimizing image delivery is just as important as using the right capabilities of each format and following the browser's best practices. In this chapter we will explore the practical aspects of putting these best practices to work and their impact on operations.

Image Dimensions

As we have now discussed multiple times in Chapters 9, 11, and 12, reducing image dimensions can improve not only network performance but also memory performance. Serving small images to small devices on slow networks or with low memory is better than using one large image for all situations—desktop and mobile alike.

In the section “Selecting the Right Image Width” we discussed allocating buckets for different viewports. Looking at a sample of 1 million JPEG images, we can examine the impact of image dimensions on file size. Figures 13-1 and 13-2 compare images at different breakpoints (assuming at least a 2:1 ratio) by file size, broken out at the 25th, 50th, and 75th percentiles. Of course, every image has its own distribution, so this should be used for illustration only.

Figure 13-1. Image size to image breakpoints (150–800)
Figure 13-2. Image size to image breakpoints (1,000–4,000)

This makes sense: the larger the dimensions, the larger the file, and the longer the image takes to download. On slower links this will also impact the performance of the page. For the best performance with responsive images or Client Hints, we should make many different dimensions available for each image.

The proposed breakpoints are a good rule of thumb and a good place to start as a default. Of course, every image might call for a different variation based on its complexity. Jason Grigsby has proposed applying a performance budget to image delivery: set a budget of 16 packets (~24 KB) between breakpoints, and add a new breakpoint only when the file size changes by another full budget. In this way you can reduce the number of breakpoints per image and better optimize your cache footprint.
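
As an illustration only, the following sketch walks a hypothetical master image down from its largest width and records a new breakpoint each time the encoded output shrinks by another full budget. It assumes the PHP Imagick extension and an example source file; your resize filter, step size, and budget may differ.

<?php
// Sketch: derive image-specific breakpoints from a ~24 KB (16-packet) budget.
$budget = 24 * 1024;                        // bytes allowed between breakpoints
$master = new Imagick('fido_in_dc.jpg');    // example master image

$breakpoints = array();
$lastSize    = null;

for ($width = 1400; $width >= 100; $width -= 50) {
    $img = clone $master;
    $img->resizeImage($width, 0, Imagick::FILTER_LANCZOS, 1);
    $size = strlen($img->getImageBlob());
    // Record a breakpoint whenever another full budget of bytes has been saved.
    if ($lastSize === null || ($lastSize - $size) >= $budget) {
        $breakpoints[$width] = $size;
        $lastSize = $size;
    }
    $img->destroy();
}
print_r($breakpoints);
?>

The resulting widths can then feed directly into the srcset markup shown later in this chapter.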

Of course, every image could have its own set of breakpoints. This technique is most suited to entry pages, campaign sites, and other parts of your app or website that you can scrutinize closely.

Image Format Selection: Accept, WebP, JPEG 2000, and JPEG XR

As we have already discussed, there are many competing image formats available. Generally for lossless compression, we can make our selection based on the features desired and be comfortable knowing that 99% of all clients have support for GIF or PNG.

The problem is lossy formats: JPEG is virtually ubiquitous. In contrast, the advanced formats—WebP, JPEG 2000, JPEG XR—are fragmented in support across platforms. One solution is to utilize responsive images’ <picture> element and duplicate your HTML to specify the same image resolutions but with different formats. It is like buying one of every size of light bulb, and bringing them all home, just to figure out which size fits your particular lamp. This is not a scalable solution.

<picture>
    <source type="image/webp"
            srcset="/fido_in_dc_100.webp 100w,
                    /fido_in_dc_400.webp 400w,
                    /fido_in_dc_800.webp 800w,
                    /fido_in_dc_1000.webp 1000w,
                    /fido_in_dc_1200.webp 1200w,
                    /fido_in_dc_1400.webp 1400w" />
    <source type="image/vnd.ms-photo"
            srcset="/fido_in_dc_100.jxr 100w,
                    /fido_in_dc_400.jxr 400w,
                    /fido_in_dc_800.jxr 800w,
                    /fido_in_dc_1000.jxr 1000w,
                    /fido_in_dc_1200.jxr 1200w,
                    /fido_in_dc_1400.jxr 1400w" />
    <source type="image/jp2"
            srcset="/fido_in_dc_100.jp2 100w,
                    /fido_in_dc_400.jp2 400w,
                    /fido_in_dc_800.jp2 800w,
                    /fido_in_dc_1000.jp2 1000w,
                    /fido_in_dc_1200.jp2 1200w,
                    /fido_in_dc_1400.jp2 1400w" />
    <img src="/fido_in_dc_100.jpg"
         srcset="/fido_in_dc_100.jpg 100w,
                 /fido_in_dc_400.jpg 400w,
                 /fido_in_dc_800.jpg 800w,
                 /fido_in_dc_1000.jpg 1000w,
                 /fido_in_dc_1200.jpg 1200w,
                 /fido_in_dc_1400.jpg 1400w"
         sizes="(min-width: 500px) 33.3vw, 100vw"
    />
</picture>

In the same vein as Client Hints, we can negotiate and detect the formats supported by the browser—at least, we should be able to by using the Accept request header.

In the early days of HTTP/1.1 the Accept header was introduced as a mechanism for content negotiation. It was envisioned as a way to tell the server what kinds of media and MIME types the browser would accept. It was intended to complement the other Accept headers, such as Accept-Language, Accept-Charset, and Accept-Encoding, which focused on negotiating human languages, character encodings, and compression, respectively.

While the latter three Accept headers are still important for proper interpretation of the page, unfortunately the Accept header has become largely irrelevant with most modern browsers. Most now simply transmit Accept: */* to avoid misinterpretations by servers. Also, the sheer sophistication of a modern browser means it is capable of handling a very long list of media types. To avoid ridiculously long and verbose Accept lines, most servers all but ignore the Accept header and browsers have simplified it to the generic wildcard.

In an odd way, */* does make sense. If there is consensus about what every browser can render, then there is very little need to send a more specific Accept value. The irrelevance of the Accept header has created an opportunity for browsers to communicate new enhancements. Chrome uses Accept precisely where capabilities are diverse: Android and Chrome will send Accept: image/webp, */*, indicating that in addition to the standard content types, this device can also render WebP images.

Implementing the detection is then pretty straightforward. For example, the following snippet (with an example source image) serves a WebP instead of a JPEG if the requesting client indicates support:

<?php
// Serve WebP when the client advertises support in the Accept header.
$img = new Imagick('fido_in_dc.jpg');   // example source image
if (strstr($_SERVER['HTTP_ACCEPT'], 'image/webp') !== false) {
    # transform image to webp
    $img->setImageFormat('webp');
}
header('Content-Type: ' . $img->getImageMimeType());
header('Vary: Accept');   // let caches key on the Accept header
echo $img->getImageBlob();
?>

As we cautioned in Table 5-1, Accept: image/webp can be used as shorthand to mean WebP extended or WebP animated. However, there is a small user base (Android 4.0–4.2) where only WebP standard is supported. Likewise, you should be concerned about specific Chrome versions that support animated WebP (Chrome 32+). If in doubt, consult your own user logs to determine how much traffic is from older Android and Chrome browsers and would be impacted if you delivered unsupported WebP advanced or animation formats.
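
If you need to distinguish animated WebP support specifically, one hedged approach is to combine the Accept check with a simple User-Agent version test. This is a sketch only; naive string matching is shown, and a device database will be more robust:

<?php
// Sketch: gate animated WebP on Chrome 32+ in addition to the Accept header.
$accept = isset($_SERVER['HTTP_ACCEPT'])     ? $_SERVER['HTTP_ACCEPT']     : '';
$ua     = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

$supportsWebP         = strstr($accept, 'image/webp') !== false;
$supportsAnimatedWebP = false;
if ($supportsWebP && preg_match('/Chrome\/(\d+)/', $ua, $m)) {
    $supportsAnimatedWebP = ((int) $m[1]) >= 32;   // animated WebP: Chrome 32+
}
?>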

You can detect JPEG XR in very much the same way by looking for Accept: image/jxr. This applies for IE 8+ and Microsoft Edge:

<?php
// Serve JPEG XR when the client advertises support in the Accept header.
$img = new Imagick('fido_in_dc.jpg');   // example source image
if (strstr($_SERVER['HTTP_ACCEPT'], 'image/jxr') !== false) {
    # transform image to jpeg xr
    $img->setImageFormat('jxr');
}
header('Content-Type: ' . $img->getImageMimeType());
header('Vary: Accept');
echo $img->getImageBlob();
?>

What about JPEG 2000? Alas, Safari doesn't advertise its support in the Accept header. This means we have to resort to device characteristics in order to select the best format for the client:

<?php
    // Safari/iOS does not announce JPEG 2000 support, so fall back to
    // device detection (WURFL in this example).
    $browser     = $wurflObj->capabilities['mobile_browser'];
    $browser_ver = $wurflObj->capabilities['mobile_browser_version'];
    if ((strstr($browser, 'Safari') !== false) && ($browser_ver >= 6)) {
        $picture->setImageFormat('jp2');
    }
?>

If you do decide to use device detection to leverage specific image formats (or specific features in other formats), you can refer to Appendix A for a list of supported operating systems and browsers for each format.

Finally, device detection can be accomplished with client-side JavaScript, such as Modernizr, which detects WebP (lossy, lossless, alpha, and animated variants), JPEG 2000, and JPEG XR. This is a great option, especially if the images on your site are loaded lazily, or use another JavaScript harness to load images. The downside is that this creates a race condition and the detection only happens after the JavaScript has loaded. The result is that either your images are loaded after the JavaScript execution, or the first collection of images is downloaded as JPEG (or another unoptimized format) until the libraries are loaded.

Image Quality

So far we have explored opportunities to select the right image size and the right image format. The last dimension we can leverage to optimize the delivery of an image is quality. This is a tricky subject because the very term quality is used as a pejorative in the creative process. To reduce the quality of an image is to make it inferior. We must resist this association. By increasing the lossy compression (decreasing quality) we can improve user performance in many situations. The tradeoff is balancing the comprehensive user experience (are users able to interact with the page and accomplish their goals?) with the localized image experience.

Quality and Image Byte Size

There is a general understanding that adjusting the quality level in lossy formats (JPEG, JPEG 2000, JPEG XR, WebP) results in a commensurate reduction in bytes. The more compression applied, the fewer bytes there will be. There is also a point of diminishing returns: setting the quality index to 100 doesn't result in a pristine lossless image, just a large one.
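
To see where the point of diminishing returns falls for one of your own images, a quick sketch (again assuming the Imagick extension and an example source file) can record the byte size at each quality index:

<?php
// Sketch: byte size of a single image re-encoded at each JPEG quality index.
$master = new Imagick('product_detail.jpg');   // example sample image
for ($q = 100; $q >= 10; $q -= 10) {
    $img = clone $master;
    $img->setImageFormat('jpeg');
    $img->setImageCompressionQuality($q);
    printf("quality %3d => %d bytes\n", $q, strlen($img->getImageBlob()));
    $img->destroy();
}
?>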

Figure 13-4 shows the quality graph for the different image formats compared to the relative byte savings. This does not compare the relative sizes between formats but rather the change in bytes within the format. Changing between formats will yield additional relative byte savings.

This quality graph is based on a sample set of 1,000 product detail images and is fairly representative of a typical quality scale. It also highlights the variance between different libraries and emphasizes that quality does not mean percentage: it is tempting, but wrong, to conflate the quality index with image quality or file size.

Figure 13-4. Quality graph comparing image formats to byte savings

As you can see, regardless of format, each encoding library impacts the byte size of an image differently. Specifically, there is a rapid reduction in byte size until we hit an index of around 35. We can also see how the highest index values distort the scale. If we reset the scale and focus on indexes 90 through 35, we can adjust our expectations (see Figure 13-5).

Figure 13-5. Quality graph comparing image formats to byte savings with index between 90 and 35

Nearly universally we can see that reducing the quality index can quickly reduce file sizes. Most follow a similar curve shape, but there are still noticeable differences. What we can conclude from this is that we should expect an additional 20% byte savings by moving from quality 90 to quality 80, and another 20% by moving to quality 70. From there the gains become smaller but are still impressive.

Quality Index and SSIM

But does the quality index of one encoder equal the quality index of another? Can we assume that selecting index 80 in one library is the same as quality 80 in another, or across formats? Using Structural Similarity (SSIM) calculations, we can compare the different encoding libraries and their effects.

Using the same dataset, we can compare the SSIM values at each index value. Using the 90th percentile value (conservative) we arrive at the curves shown in Figure 13-6. It’s important to emphasize that this is a conservative view, and an individual image could well get a lower SSIM value when run through the different quality indexes. The purpose of this illustration is to provide general guidance and conclusions, so a 90th percentile was selected.
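
If you want to reproduce this kind of comparison on your own images, one option is to shell out to ImageMagick's compare tool. The sketch below assumes an ImageMagick 7 build whose compare command supports the DSSIM metric (the metric is written to stderr, hence the redirection), and the filenames are hypothetical:

<?php
// Sketch: measure DSSIM between an original and two candidate encodes.
function dssim($original, $candidate) {
    $cmd = sprintf('magick compare -metric DSSIM %s %s null: 2>&1',
                   escapeshellarg($original), escapeshellarg($candidate));
    return (float) shell_exec($cmd);
}

echo dssim('original.png', 'libjpeg_turbo_q80.jpg') . "\n";
echo dssim('original.png', 'mozjpeg_q80.jpg') . "\n";
?>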

Figure 13-6. Quality graph comparing image formats to SSIM values

Clearly quality is not a consistent metric across different libraries. Each encoding library impacts visual perception differently at the same quality index. If you set libjpeg-turbo to quality 80, you should expect roughly the same SSIM as MozJPEG set at quality 65.

Just as before, the top and bottom indexes heavily skew the graph. Zooming in on index 90 through 40 yields the charts shown in Figures 13-7 and 13-8.

Figure 13-7. Quality graph comparing image formats to SSIM values with index between 90 and 40
Figure 13-8. Quality index graph comparing JPEGs to other formats

One thing this does not take into account is DPR. There is anecdotal evidence that suggests that the perception of higher SSIM values goes down based on pixel density as well as form factor; that is, humans can accept a higher SSIM value when it is on a smartphone versus on a desktop. Research is early and inconclusive on the impact of visual perception based on display form factor.

How do we select a quality index and apply it across the different encoders and expect the same results? Fortunately for you, I have run the regressions and derived the charts in Figures 13-9 and 13-10 to help with our conversions.

Figure 13-9. Quality index: JPEG (libjpegturbo) versus other formats
Figure 13-10. Quality index: JPEG (libjpegturbo) versus other formats

Is this a chapter on image quality or image delivery? Well, both. They are tightly linked. In order for us to select the best image quality to reduce bytes, we should also keep in mind the effective equivalent in the other formats.

So, we’ve reached two major conclusions about how to better deliver images:

  • Focus on the desired quality index for your images to maximize SSIM and reduce file sizes.

  • Layer image format and responsive images after the quality index adjustment.

Selecting SSIM and Quality Use Cases

You can take this one step further and create use cases for quality:

High: 0.01 SSIM
Medium: 0.03 SSIM
Low:  0.05 SSIM

In this way you could intentionally distort the image to maximize the user experience. For example, you could use the network Client Hints to inform the quality use case. Alternatively, you could look at the HTTP socket performance (packet RTT) or instrument latency detection with service workers. By doing so, you could adjust the user experience based on the hostility of the network conditions.
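
A minimal sketch of that idea, assuming the client sends the Save-Data and Downlink hints and using hypothetical thresholds for the tiers:

<?php
// Sketch: map network Client Hints to a quality use case (hypothetical cutoffs).
$saveData = isset($_SERVER['HTTP_SAVE_DATA'])
            && stripos($_SERVER['HTTP_SAVE_DATA'], 'on') !== false;
$downlink = isset($_SERVER['HTTP_DOWNLINK'])
            ? (float) $_SERVER['HTTP_DOWNLINK'] : null;   // Mbps

if ($saveData || ($downlink !== null && $downlink < 0.5)) {
    $tier = 'low';      // target ~0.05 SSIM
} elseif ($downlink !== null && $downlink < 2.0) {
    $tier = 'medium';   // target ~0.03 SSIM
} else {
    $tier = 'high';     // target ~0.01 SSIM
}
header('Vary: Downlink, Save-Data');
?>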

This is very similar to what was suggested in Chapter 12. In fact, both can be done at the same time for maximal benefit: adjust both the image dimensions and then adjust the quality. There are many possibilities.

Of course, there is always a point of diminishing returns. Applying these use cases to an 800-byte image has little value to the user experience. However, if the image is 100 KB, then you would certainly want to apply this algorithm. Remember: every packet counts—especially in poor network conditions.

The bottom line is this: if you can gain a full packet in savings it is probably worth adjusting image quality. Accordingly, we might augment the previous chart as follows:

High: 0.01 SSIM
Medium: 0.03 SSIM and >4,500 Byte savings (~3 packets)
Low:  0.05 SSIM and >12,000 Byte savings (~8 packets)

Creating Consensus on Quality Index

One final word about quality. As I mentioned, this topic is often very emotive—especially among those in your organization who are the custodians of brand (i.e., your marketing teams). Their job is to ensure that the public’s opinion of your brand is positive. You, in contrast, are responsible for ensuring that the site or app works for the highest number of people. These are two sides of the same coin.

In order to bring marketing and creative teams onboard with adjusting the quality index of your images, it is useful to show instead of explain. For example, you can gain consensus by selecting a set of images and running them through the different quality indexes, as shown in Figure 13-11.

Figure 13-11. Building consensus on image quality

Make sure you are consistent: if you are using MozJPEG as your JPEG engine, then use this to initiate the conversation—don’t use libjpeg-turbo.

Consider the Contractual Obligations of Branding When Reducing Image Quality

Also recognize that there are likely situations where you don’t want to reduce the quality index because of marketing or legal obligations.

Quality Index Conclusion

When applying changes to the quality index, follow these best practices (see Figure 13-12):

  • Reduce the quality index based on SSIM values instead of a fixed setting.

  • Apply the equivalent quality index to other formats.

  • Add network awareness to select a lower-quality index.

  • Use the Client Hint Save-Data: on to select a lower-quality index.

Figure 13-12. Workflow for selecting the right image quality

Achieving Cache Offload: Vary and Cache-Control

Selecting different images based on server-side logic solves one problem, but can introduce new problems to downstream systems. Ultimately, we need to ensure that both the client and any middle boxes—such as transparent proxies, surrogate caches, and Content Delivery Networks (CDNs)—follow the same selection logic. Failing to account for the ecosystem can result in clients re-downloading the same image multiple times, or worse: the user getting the wrong image. We also need to ensure that search-engine bots understand the logic so that SEO isn't impacted by a "cloaking" penalty. If we aren't careful, changing delivery logic can have many unintended side effects downstream.

Fortunately, the authors of the HTTP spec considered this situation. The Vary header is intended to express how the content would vary from one request to another. There is also an enhancement specification proposed to help provide increased resolution with the Key header. The challenge, of course, is to ensure that all the current consumers (clients and middle boxes) also respect these headers.

Informing the Client with Vary

The first objective is to inform the end consumer how the content may change with different requests. For example, if the request were made by a mobile versus a desktop user, would the content change? If the user changed the orientation of the display to have a different Viewport-Width, would the image change?

To answer these questions, we would use the Vary header. Its value is not the value of the input itself, but the name of the HTTP request header that was used as an input. Some of the headers you could reference include Accept-Encoding (when gzip is used), User-Agent, and Viewport-Width. We will discuss the implications of highly variable inputs such as User-Agent in the next section. For SEO and browsers, the Vary header properly informs the client that the content could change if different inputs are used.

If we used DPR: to select a different image, we would expect Vary: DPR in the response:

GET /broccoli.jpg
DPR: 1.5

...

HTTP/1.1 200 OK
Content-Type: image/jpeg
Vary: DPR

For changes in image dimensions using Client Hints we could use the following values: Viewport-Width, Width, DPR, Downlink, or Save-Data. These can also be combined; for example, if you are using both DPR and Width in your calculation you would emit:

Vary: Width, DPR
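
A sketch of that selection, assuming the Width hint (expressed in physical pixels) is present and using a hypothetical set of breakpoints:

<?php
// Sketch: bucket the Width client hint into the nearest available breakpoint.
$breakpoints = array(100, 400, 800, 1000, 1200, 1400);   // hypothetical widths
$width = isset($_SERVER['HTTP_WIDTH']) ? (int) $_SERVER['HTTP_WIDTH'] : 400;

$selected = max($breakpoints);
foreach ($breakpoints as $bp) {
    if ($bp >= $width) { $selected = $bp; break; }
}
// Tell downstream caches which request header drove the selection
// (add DPR as well if it also factors into your calculation).
header('Vary: Width');
?>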

Changes in format are a bit more complex. For WebP and JPEG XR it is sufficient to use Vary: Accept. However, for JPEG 2000 (Safari/iOS) we have to use device detection and therefore we should send Vary: User-Agent.

Internet Explorer (all versions) adds an unfortunate wrinkle: Vary will cause a revalidation on every request instead of caching. This is because IE does not cache the requesting headers and so cannot use them to compute the internal cache key. As a result, each load of the image will, at the very least, prompt a new request with an If-Modified-Since (or If-None-Match) to revalidate. The workaround for IE is to drop the Vary header and mark the content as private with a Cache-Control header.

For Internet Explorer users only:

GET /broccoli.jpg
User-Agent: ...

HTTP/1.1 200 OK
Content-Type: image/jpeg
Cache-Control: private
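
A sketch of that workaround, assuming simple User-Agent matching is acceptable for identifying Internet Explorer:

<?php
// Sketch: drop Vary for Internet Explorer and mark the response private instead.
$ua   = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isIE = (strpos($ua, 'MSIE') !== false) || (strpos($ua, 'Trident/') !== false);

if ($isIE) {
    header('Cache-Control: private');
} else {
    header('Vary: Accept');
}
?>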

Changes based on network conditions are likewise a challenge, since the variation is not driven by an HTTP request header. If we have access to the Downlink Client Hint, including it in Vary would work well. Otherwise, we should treat the variation much like we do for Internet Explorer and use Cache-Control: private to ensure that middle boxes don't give the wrong experience to the client.

Middle Boxes, Proxies with Cache-Control (and TLS)

There are many middle boxes deployed throughout the Internet—in hotels, coffee shops, ISPs, and mobile operators. Their goal is to provide an additional layer of caching. Of course, these automatic middle boxes have to be conservative. They will only cache content that is marked as cacheable just as an end user would.

However, it would be problematic if they were to cache a WebP response and send it to a Safari user, or a smartphone response and send it to a desktop. It would be nice to assume that proxies and middle boxes all honor the Vary header as the browser does. Unfortunately, they don't.

Worse yet, many middle boxes controlled by network operators often try to apply their own image optimizations outside of your control. This can be problematic if they do things like strip the color profile or apply an even lower quality index.

There is a clear risk-versus-benefit tradeoff with these middle boxes. If you are applying any logic in delivery selection, you can confuse them and inadvertently deliver an inferior user experience despite your best intentions.

To work around this problem you can do two things:

  • Use Transport Layer Security (TLS) as the transport for your images. These middle boxes cannot intercept TLS connections because doing so would cause the client to distrust the re-signed response (a man-in-the-middle attack).

  • Mark the response as private with Cache-Control: private. This will ensure that these proxies don’t accidentally cache the content and serve it to the wrong person.

Even if you are not doing selection in resolution or format, it is still good to account for these middle boxes impacting the delivery of your images. To control your destiny, it is good to also mark the response with Cache-Control: no-transform. This will indicate that middle boxes shouldn’t further mutate the response and possibly delay the delivery of your images. Again, using TLS will also accomplish the same goal.

CDNs and Vary and Cache-Control

It is useful to remember that the CDN acts on your behalf in the delivery solution and is under your control. While you cannot control the cache and life cycle of images sent to the end client (or intercepted by ISP proxies), you can control the CDN just as you can your own infrastructure.

There are two ways to invoke a CDN when delivering your images: passively or actively. In a passive setup, the CDN honors the Vary and Cache-Control headers in the same way that the client would. Unlike a transparent proxy, a CDN can often also serve TLS traffic on your behalf with a valid certificate. This makes it all the more important to ensure that you decorate your response with a properly formed Vary header.

The problem with CDNs in a passive mode is that while the possible values for Vary: DPR might be somewhat limited, the possible values of Vary: User-Agent or Vary: Accept result in a very fragmented cache. This is effectively an infinite set of permutations and will yield very low or no cache offload. Some CDNs, like Akamai, will treat any value of Vary other than Vary: Accept-Encoding as equivalent to no-store. Be sure to configure the CDN to ignore the Vary header but pass it along to the end user.

To reiterate: using Vary has value for the end client but will have minimal to no value at the CDN. The client may see only a few possibilities for Vary: Viewport-Width, but the CDN will see thousands upon thousands. Think about how many mobile devices have different screen dimensions. If you use Vary: Viewport-Width, the CDN would have to cache each possible value from 320 all the way up to 2,000 pixels. Similarly, with Vary: User-Agent, there are literally millions of permutations that would each need to be cached independently.

Active CDN configurations extend the decision logic from your origin into the CDN. In this way you can use device characteristics in the CDN to form the cache key. You should also be able to collapse ranges of values into buckets to keep the cache key succinct.

For example, you could bucket values of Width into 0–100, 100–200, 200–300 to a rationalized cache key with Width = 100, 200, and 300, respectively. This creates 3 cached versions instead of 300 possible variations.
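
As a sketch, the normalization is simply rounding the hint up to the nearest bucket boundary before it is used in the cache key:

<?php
// Sketch: normalize a Width hint into 100-pixel buckets for the cache key.
function bucketWidth($width, $bucketSize = 100) {
    return (int) (ceil($width / $bucketSize) * $bucketSize);
}

echo bucketWidth(42) . "\n";    // 100
echo bucketWidth(150) . "\n";   // 200
echo bucketWidth(300) . "\n";   // 300
?>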

With an active CDN configuration you will need to ensure that your server-side logic matches the CDN (see Figure 13-13).

Figure 13-13. User (Vary: User-Agent) ← CDN (add isJpeg2000 to cache key) ← Origin (select JPEG 2000)

In advanced solutions you can move the image selection to the domain of the CDN. This way, the CDN not only reflects the cache key but also is responsible for making the image selection and subsequently picking up the correct files from the origin or passing to an image transformation solution (see Figure 13-14).

Figure 13-14. CDN selects origin file

Near Future: Key

There is a proposed standard that in the near future may help CDNs and the browser better understand the cache key partitioning of a response. The IETF httpbis working group has proposed the use of the Key HTTP response header to describe the secondary cache key. Key would complement the Vary header by providing the ranges of values that would result in the same response.

For example, using Key in a Client Hints–informed response could help describe the various breakpoints for an image, like so:

HTTP/1.1 200 OK
Vary: DPR, Width
Key: DPR;partition=1.5:2.5:4.0
Key: Width;div=320

Single URL Versus Multiple URLs

There are many metaphysical debates on whether an application should utilize one canonical URL for many derivative images or manifest each combination and permutation as a uniquely accessible URL. There are philosophical arguments to be made as well as pragmatic ones.

The single URL camp usually starts with a discussion about “the forms” and quotes Socrates and Plato nine times before breakfast. The argument is to keep a canonical single URL representation exposed in order to ensure simplicity and agility. If you have one URL that has many derivations from the original, then you can partition or collapse the responsive image buckets at will without worrying about stale caches or link rot. A single URL allows regular iterations of optimization to find the best performance for the highest number of users.

/images/broccoli.jpg

On the other hand, the advocates for many URLs would argue that using one URL per permutation avoids the complications of addressing caching and proxies. (They also would likely claim Socrates was just a hack and scared of shadows.) Each derivative for responsive images, formats, and quality should likewise be manifested as a unique URL. This is in addition to the various use cases, such as “search results,” “product detail,” or “banner ad.”

/images/broccoli-search-400-80.jpg
/images/broccoli-search-400-80.webp
/images/broccoli-search-400-80.jp2
/images/broccoli-search-400-80.jxr
/images/broccoli-search-800-80.jpg
/images/broccoli-search-800-80.webp
/images/broccoli-search-800-80.jp2
/images/broccoli-search-800-80.jxr
...

Clearly, there is no single answer. There is a need for both approaches. As user demographics change, so too will the effectiveness of image breakpoints, image formats, and quality. For this reason it is good to remain flexible. Yet at the same time there are classes of content that should be exposed independently. Generally, the image use cases are best served as a unique URL. This is practical for your content creators and will likely have positive SEO impact as well.

Regardless of the approach, all the derivative images will need to be produced at one of the layers in your architecture. Whether the images are generated and stored in a filesystem at the origin or through a cloud-based transformation service, all of the variations must be stored somewhere. The key question is what makes the simplest operational sense and what has the least impact on your catalog of images.

File Storage, Backup, and Disaster Recovery

One of the often-overlooked aspects of image delivery is the performance (and cost) of storage, backup, and disaster recovery. Content creators and web developers often forget the cost of infrastructure. Modern storage infrastructure is fast and abundant. However, this doesn't eliminate the operational complexity of dealing with a large volume of images—especially small images (in bytes) that are optimized for delivery.

This section is not intended to be exhaustive, as there are just as many variables in efficient storage and backup as there are in image delivery. A good delivery experience also requires balancing the infrastructure requirements. Millions of small images may not pose a problem in a steady state, but in a disaster recovery scenario they can create a significant bottleneck that impacts the operation of your business and becomes the root cause of a mean time to recovery of 8 hours instead of 30 minutes.

Infrastructure planning for image delivery should always consider business continuity. While we always hope that a datacenter will be resilient, we know that nature has a way of throwing a spanner into the works. The question, then, is how quickly we can recover.

Images transferred over the Web are predominantly small—at least in comparison to databases, videos, and other key assets an organization needs to preserve for business continuity. Using the median byte size for various breakpoints (see Figure 13-1) we can attempt to estimate the impact of these derivative images.

Let’s do the math:

100,000 base images
x 4 use cases (search, product details, hero ad ...)
x 8 widths
x 4 image formats (JPEG, WebP, JPEG XR, JPEG 2000)
x 3 quality indexes
= 38.4 million images

Focusing on just the 300x breakpoint and assuming 30% savings for each format and an additional 20% for each quality:

100,000 base images
x 12.1 KB for JPEG (~8.4 KB each for WebP, JPEG XR, and JPEG 2000)
= 1,200 MB + 840 MB + 840 MB + 840 MB    (highest quality)
x 3 quality steps (JPEG: 12.1/9.6/7.2 KB; other formats: 8.4/6.7/5.0 KB)
= 960 MB + 672 MB + 672 MB + 672 MB      (middle quality)
= 720 MB + 504 MB + 504 MB + 504 MB      (lowest quality)
= 8.93 GB per use case, per breakpoint!

38.4 million images doesn’t sound like much, nor does 9 GB. But let’s look at the two factors that matter: size on disk and the cost of metadata.

Size on Disk

Most modern filesystems, from ext4 to NTFS, use a block size of 4 KB. This ensures that the block size lines up with the physical attributes of the disk. Alignment to the physical disk matters more for a spinning disk than for solid state. There is always some inefficiency because the last block of a file is rarely filled completely; the assumption is that there will be more completely filled blocks than partially filled ones.

In the previous example, rounding to the nearest block size adds an extra 25% to the total storage; that is, the 9 GB actually uses 12 GB of storage. Fortunately, as file sizes increase, the impact of size on disk decreases.
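
A quick sketch of that rounding, assuming a 4 KB block size:

<?php
// Sketch: size on disk rounds each file up to a whole number of 4 KB blocks.
function sizeOnDisk($bytes, $blockSize = 4096) {
    return (int) (ceil($bytes / $blockSize) * $blockSize);
}

echo sizeOnDisk(7200);   // a ~7 KB derivative occupies 8192 bytes on disk
?>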

Cost of Metadata

The second issue is the cost of metadata. Every filesystem has some form of metadata to track the location on disk for a file and the block association for this file. This metadata is usually the root cause for any limits on the number of files per directory. For example, in ext4, the limit is 64,000 files. Generally speaking, each file and directory on a filesystem includes metadata (in ext4 it is an inode) to track the size of the file and the location on disk as well as its location in the hierarchy.

Different filesystems use different allocations, but 2–3.2% of a disk's total volume can be dedicated to metadata. Even if you are storing the files in a database, the database itself will have to track their location with metadata. What can be tuned is how much metadata is required.

When ext4 was released, a number of tests were conducted by Linux Magazine based on different file sizes and directory depths (see Figure 13-15). The key here is that every file written must also have metadata recorded. It is not just one write for the file, but multiple writes. These tests showed the impact of creating small images and large images with shallow or deep directory structures.

Figure 13-15. Ext4 metadata writes reduce disk performance: creating many small files is slower

As you can see, the impact of file size and metadata can be very large. The bottleneck is no longer the speed of the drives but the filesystem metadata itself.

The cost of metadata is the bottleneck for disaster recovery. If we took the same scenario we used for the 300x breakpoint and applied it to the eight other breakpoints, we would have 2.5 TB of total storage. Even at 80 MB/s, the expected time to recovery would be over 8 hours. In this scenario, your business would be out of commission for a full workday while the images are recovered.

Consider the impact of your design decisions on your infrastructure. Bottom line: you may be making decisions that your CFO might not be comfortable with in a disaster recovery event.

Note

To address this specific problem of many small images and the cost of metadata, Facebook has purpose-built an optimized object storage system called Haystack. Haystack uses an in-memory index designed for single write and many reads while minimizing the overhead cost of metadata. Replication, election, clustering, and other distributed or backup functions are outside the scope of the storage system and handled by other system logic.

Domain Sharding and HTTP/2

As we discussed in Chapter 7, browsers limit the number of simultaneous connections they will open. To overcome this, and to improve the throughput for downloading images (and other small content), many websites use domain sharding. The objective of domain sharding is to work around TCP slow start, congestion window scaling, and head-of-line blocking. Normally, by opening parallel TCP connections, up to six per host, you can effectively saturate the network connection. Domain sharding takes this a step further by utilizing multiple hostnames that point to the same infrastructure. In this way you can trick the browser into sending even more parallel requests by opening more sockets.

In Figures 13-16 and 13-17, you can see how the browser opens additional socket connections with each new domain shard. The impact is a faster completed download and page render, because the network is more fully utilized. (This example uses a 3 Mbps connection and 200 ms of latency to emphasize the impact.)

Figure 13-16. One resource domain on an HTTP/1.1 connection
Figure 13-17. Two resource domains on an HTTP/1.1 connection

Even over TLS, with its handshake tax, sharding can still provide some benefit. For example, Figures 13-18 and 13-19 show the same website as before, using one and two shards, respectively.

Figure 13-18. One resource domain on an HTTP/1.1 + TLS connection
Figure 13-19. Two resource domains on an HTTP/1.1 + TLS connection

Typically you'd implement this approach by adding a different prefix, or even a whole domain, to the resource request. Requesting http://www.example.com/i-love-broccoli.jpg now becomes http://images1.example.com/i-love-broccoli.jpg. These different hostnames are usually just aliases for the same content: the subdomains resolve in DNS to the same IP and depend on the virtual host mapping on the application server to serve the same content.

Using domain shards is straightforward but does have a few implementation considerations.

How Do I Avoid Cache Busting and Redownloading?

We have two objectives when using sharding: maximize the browser cache, and avoid downloading the same resource twice. However we implement domain sharding, we must ensure that i-love-broccoli.jpg doesn't show up under images1.example.com on the first page but images2.example.com on the second. This would effectively void the browser cache and force redownloading of the content.

To avoid this, you should partition your images into groups of content. However, avoid using a counter to switch between shards. Also, it is tempting to use one shard for CSS and another for JPEG. You should avoid this temptation because you don’t want all the critical resources to be bunched up on a single request queue. Instead, use a hash or an index to equally distribute filenames between available shards.
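
A sketch of hash-based shard assignment, with hypothetical hostnames, that keeps each filename on a stable shard:

<?php
// Sketch: deterministically map a path to one of N shard hostnames.
function shardHost($path, array $shards) {
    $index = abs(crc32($path)) % count($shards);
    return $shards[$index];
}

$shards = array('images1.example.com', 'images2.example.com');
echo shardHost('/i-love-broccoli.jpg', $shards);
// The same path always maps to the same hostname, so the browser cache
// is never busted by a resource moving between shards.
?>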

How Many Shards Should I Use?

Selecting the right number of shards is not as clear cut as you would expect. Early research suggested two to four shards per page, but this was a best practice from 2007 when browsers only made two connections per hostname. Steve Souders has provided the most recent guidance, suggesting ~20 resources per domain to provide a good balance of sharding for performance.

This remains the best general guidance. However, there are other questions, such as: what is the impact on congestion control and TCP scaling? If each socket is attempting to maximize its congestion window while competing with the others, this can result in packet loss and thus decrease overall performance. The size of the resources also impacts the effectiveness of sharding. Sharding works because many small resources don't take more than a few packets to send or receive. (We discussed this more in Chapter 10.) The value diminishes with larger content, many more resources in parallel, or low bandwidth.

What Should I Do for HTTP/2?

Is domain sharding an anti-pattern for HTTP/2? The short answer is: no. The longer answer is: it could be, if you don’t consider HTTP/2 in your implementation.

HTTP/2 has many advantages, one of which is the ability to have multiple parallel requests on a single socket connection. In this way we can avoid the HTTP/1.1 head-of-line blocking problem. Domain sharding is not necessary to saturate the network connection. By using a single socket you can also scale the congestion window more quickly and avoid packet loss and retransmission.

However, there are a number of barriers to HTTP/2 adoption. Aside from the consideration of adopting TLS (because browsers do not implement HTTP/2 without TLS), there is also the user adoption curve. HTTP/2 requires recent versions of modern browsers. For native apps it also requires modern OSes (or client libraries) that can likewise use HTTP/2. Beyond user adoption there is also the challenge of corporate (and home) content filters that intentionally decrypt and re-sign TLS traffic. In some situations, Akamai has observed the interception of TLS requests to be as high as 17% in a region or demographic. There are many causes, most likely web filters or local antivirus software. The problem is the same: these content filter proxies likely do not use HTTP/2, even if the browser behind the proxy supports it.

As with many other web technologies, we should expect the organic adoption of HTTP/2 to take many years. Consider that between 1998 and 2016, only 10% of users moved onto IPv6-reachable networks. Likewise, SNI (Server Name Indication) support in TLS has been a standard since 2003, but it wasn't until 2016 that 95% of TLS web traffic supported SNI (mostly as a result of the end of life of Windows XP). As of 2016, HTTP/2 adoption is between 50% and 75%, depending on the demographic or segmentation. We should expect the long tail of HTTP/2 adoption to take three to five years before we come close to 100%.

So what should we do in this interim?

Option 1 is to dynamically generate the domain sharding. If the user is connected over HTTP/2, disable domain sharding; if the user has HTTP/1.1, utilize multiple shards as before. This approach, of course, requires that your local caching infrastructure and your CDN be aware of the different rendered outputs and add the HTTP/2 connection as part of the cache key. Unfortunately, there isn't a corresponding Vary: header that can properly describe a variation based on the protocol. The best solution is to use Vary: User-Agent to communicate the variation (as you would with a RESS design).
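
A sketch of Option 1, assuming the web server exposes the negotiated protocol in $_SERVER['SERVER_PROTOCOL']:

<?php
// Sketch: only shard image hostnames for HTTP/1.x clients.
$protocol = isset($_SERVER['SERVER_PROTOCOL']) ? $_SERVER['SERVER_PROTOCOL'] : '';
$isHttp2  = (strpos($protocol, 'HTTP/2') === 0);

$imageHosts = $isHttp2
    ? array('www.example.com')                                // no sharding
    : array('images1.example.com', 'images2.example.com');    // shard as before

// Remember: cached HTML that embeds sharded URLs must not be served to
// HTTP/2 clients, so key the cache (e.g., Vary: User-Agent) accordingly.
header('Vary: User-Agent');
?>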

Option 2 is to simply ignore the problem. Fortunately, most (if not all) HTTP/2 implementations have an optimization to address domain sharding. Specifically, if the sharded domain resolves by DNS to the same IP and the shard is on the same certificate (the hostname is in the Subject Alternative Name list), then the HTTP/2 connection will consolidate the sockets. In this way, multiple hosts will use the same HTTP/2 connection and therefore avoid any penalty of sharding. It still remains a single connection; the only penalty is the DNS request. See Figure 13-20.

Figure 13-20. HTTP/2 with two resource domains

This is the easiest option because it simply means less work. It also allows you to continue to use sharding for the laggard adopters. And it is these laggard adopters that likely need any performance bump you can give them.

Best Practices

You should continue to use multiple domain shards for your website to maximize connection throughput. This also helps prevent images from blocking more critical resources like CSS and JS.

In preparation for HTTP/2 make sure that you:

  • Use the same DNS for all of your shards and primary hostname.

  • Use the same certificate—add the shard hostnames to the SAN (Subject Alternative Name) field of your TLS certificate.

Secure Image Delivery

Security is everyone’s responsibility. Throughout this chapter we have focused on how to deliver images to users. Just as important to your brand is the security of your images. What if your images were tampered with? How could your brand be tarnished if a nefarious agent accessed them?

Secure Transport of Images

Up until recently the majority of the Web has been delivered unencrypted. As we have all experienced, there are many places where content can be hijacked in an unencrypted flow. Public WiFi does this intentionally to force you through a captive portal before granting you access to the Internet. ISPs, with good intentions, have notoriously applied higher compression, distorting the visual quality of your brand. Using Cache-Control: no-transform prevents some, but not all, of these well-intentioned image transformations (see Figure 13-21). But there are also not-so-well-intentioned transparent proxies that hijack image requests and replace the content with different advertisements or placeholders.

Figure 13-21. Use Cache-Control: no-transform to prevent degraded quality by ISP proxies

Securing the transport for images is straightforward. Using TLS you can ensure that the communication from user to server is trusted and that there aren’t any middle boxes interfering with or mutating your content. Moving to HTTP/2 also requires the use of TLS.

Be Careful of Content Hijacking on Untrusted WiFi

There have been increasing reports of free WiFi hotspots found at hotels, coffee shops, and restaurants replacing web content with alternate advertising. Putting the ethical argument aside—whether service providers can generate ad revenue from offering free Wifi—there are branding implications for your own web content. Hijacking content like this is only possible with unencrypted pages and images. Moving to TLS prevents man-in-the-middle interception.

Secure Transformation of Images

Securing image delivery is about more than just the transport layer. We should also be concerned about the attack surface of our transformation engines. Whether you are using an on-premise image transformation engine or an off-premise one, there are many possible vulnerabilities. Third-party and open source libraries are extremely useful but can also introduce risk to the enterprise if not properly isolated.

An index of Common Vulnerabilities and Exposures (CVE) is maintained by Mitre (see Figure 13-22). It is critical to keep up-to-date with the latest known exploits on the libraries and tools used in your image transformation workflow. Isolating and patching should be part of your regular team practices.

Figure 13-22. CVEs reported for ImageMagick and common image libraries

The main concern for image transformation engines is a contaminated image entering the pipeline and, through the decode or mutation process, exploiting a vulnerability: a byte alteration that triggers a logic edge case, a checksum collision, or outright remote code execution. Consider that the famous Jailbreakme exploit that allowed jailbreaking on iOS 3 used a flaw in the TIFF decoder in iOS. This single flaw allowed rooting of the entire operating system. Imagine the potential impact on your images. Such a vulnerability could impact subsequent images, possibly tagging them with brand-damaging messages. Just because the bytes of the image have left the processor doesn't mean that there isn't residual code running on the thread. The last thing you want is all of your product images graffitied with "EAT BROCCOLI" without your realizing it (see Figure 13-23).

Figure 13-23. We want to avoid one image affecting other images on the platform

How could a contaminated image enter your workflow?

  • User-generated images, compromised at source

  • Vendor-supplied product images, compromised at source

  • In-house photography, compromised by malware on the artist’s laptop

It is easy to imagine how a compromised image could enter your workflow. So how can you ensure that a compromised image doesn’t impact your ecosystem? How do you isolate the impact to just that compromised image? How can you minimize risk and exposure to your image transformation service?

Secure Transformation: Architecture

Whether your image transformation is on premise or with a cloud-based SaaS provider, you should evaluate the architectural security of the transformation engine. Ideally, there should be isolation at every level of processing. You want to ensure that no single compromised image can affect other parallel threads/processes/systems that are also transforming other images. You also need to ensure that there isn’t any residual code that may impact the next image processed by this specific thread.

A well-secured transformation architecture should consider three major areas for isolation (see Figure 13-24):

  • TCP connection pools (retrieving and storing)

  • Transformation engine (e.g., ImageMagick)

  • Encoding and decoding shared objects

Figure 13-24. A model for secure image transformation architecture

For example:

  • We need to ensure that there is no way that images being sent or received via TCP (or disk) can impact another thread or process. The initiating worker should only have access to the stream of bytes for this job.

  • The transformation engine, such as ImageMagick, must not be able to store, execute, or preserve any state between image processing jobs. The worker threads must each be isolated to exclusive scratch areas and restricted to accessing only certain system libraries. For example, the transformation engine should not be able to open new TCP sockets or leave temporary files or memory state between jobs.

  • The various encoding and decoding shared objects (e.g., libjpeg-turbo) also need to be isolated. Memory state should not be allowed to persist or be accessed by parallel threads or other jobs.

This is not an exhaustive list of ways to isolate and segment the architecture. Your local security team should be able to help you ensure that a maliciously tampered image has no way to affect the rest of your ecosystem of valuable assets. If you are using a cloud solution, you should ensure that the same level of scrutiny is applied.

Summary

Downloading an image is no longer simple. There are many variables to consider to ensure the best performance. In order to deliver the best image we want to:

Adjust image dimensions

Provide a set of breakpoints available for an image to reduce memory usage on the device and improve delivery performance. Use a general rule of 16 packets (24 KB) per breakpoint.

Use advanced image formats

Newer formats support additional compression as well as more features. For mobile environments, use WebP and JPEG 2000 for Android and iOS users, respectively.

Apply different quality

Reducing the quality index for a format can reduce byte size. Use DSSIM to find the lowest quality index an image can tolerate. Use three quality steps to adapt to slower network conditions.

In addition to considering the matrix of image delivery options, you must account for impacts to infrastructure, operations, and security. Transforming images will increase your storage footprint and can impact disaster recovery. Finally, the security of transforming and transformed images is an important and oft-overlooked aspect of delivery. Delivery requires balance between the user’s situation, operational complexity, and security.
