Chapter 3. Lossless Image Formats

In the previous chapter, you learned about the difference between lossy and lossless image formats. Lossy image formats lose image information during their compression process—typically taking advantage of the way we perceive images to shave away unnecessary bytes. Lossless image formats, however, do not have that benefit. Lossless image formats incur no loss of image information as part of their compression process.

In this chapter, we’ll dig deeper into GIF and PNG, the two primary lossless image formats on the Web. We’ll talk about how they’re constructed and compressed, and what to do to ensure we keep them as lightweight as possible.

GIF (It’s Pronounced “GIF”)

When it comes to image formats on the Web, the Graphics Interchange Format (GIF) may no longer be the king of the castle, but it certainly is its oldest resident. Originally created in 1987 by CompuServe, the GIF image format was one of the first portable, nonproprietary image formats. This gave it a distinct advantage over the many proprietary, platform-specific image formats when it came to gaining support and adoption, first on Usenet and then on the World Wide Web.

The GIF format was established at a time of very limited networks and computing power, and many of the decisions about how to structure the format reflect this. Unfortunately, as we’ll see, this limits its ability both to portray rich imagery and to compress it.

Block by Block

The building blocks of the GIF format are…well, they’re blocks. A GIF file is composed of a sequence of data blocks, each communicating different types of information. These blocks can be either optional or required.

The first two blocks of every GIF file are required, and have a fixed length and format.

Header block

First up is the header block. The header takes up 6 bytes and communicates both an identifier of the format and a version number. If you were to look at the header block for any given GIF, you would almost certainly see one of the following sequences:

47 49 46 38 39 61
47 49 46 38 37 61

The first three bytes (47, 49, 46) are the GIF’s signature and will always equate to “GIF.” The last three bytes indicate the version of the GIF specification used—either “89a” (38, 39, 61) or “87a” (38, 37, 61). Generally speaking, image encoders will use the older 87a for compatibility reasons unless the image is specifically taking advantage of some features from the 89a specification (such as animation).
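Checking those six bytes takes only a few lines. The following is a minimal Python sketch; the function name read_gif_header is our own and is not part of any library.

```python
def read_gif_header(data: bytes) -> str:
    """Return the GIF version ("87a" or "89a") found in the first 6 bytes."""
    if len(data) < 6:
        raise ValueError("too short to contain a GIF header")
    signature, version = data[:3], data[3:6]
    if signature != b"GIF":  # bytes 47 49 46
        raise ValueError("signature bytes do not spell GIF")
    if version not in (b"87a", b"89a"):
        raise ValueError(f"unknown GIF version: {version!r}")
    return version.decode("ascii")


header = bytes.fromhex("47 49 46 38 39 61")
print(read_gif_header(header))  # 89a
```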

Logical screen descriptor

Immediately following the header block is the logical screen descriptor, which is 7 bytes long and tells the decoding application how much space the image will occupy.

The canvas width and canvas height can be found in the first two pairs of bytes. These are legacy values that stem from the belief that image viewers might render multiple images from a single GIF on the same canvas. Most viewers today ignore these values altogether. In practice, the only time a GIF contains multiple images is when it is animated, and in that case each image is a frame of the animation. (We’ll explore this in more detail later in the chapter.)

The next byte is actually four fields of data crammed together. By converting the byte to a binary number, you get a series of boolean switches to indicate four distinct pieces of data.

The first bit is the global color table flag. If the bit is 0, there is no global color table being used in the image. If the bit is 1, then a global color table will be included right after the logical screen descriptor.

GIF is a palette-based image format; that is, the colors that the image uses have their RGB values stored in a palette (or color) table. These tables contain the colors in the image, as well as a corresponding index value starting at zero. So, if the first pixel of an image is the color green, then in the color table, the color green will have a corresponding index value of 0. Now, whenever the image is being processed and encoded, any time that color is discovered, it can be represented by the number 0.

In the case of the GIF format, each table can hold up to 256 entries. This 256-color limit made a great deal of sense when the GIF format was established—hardware was far less capable than it is today—but it severely limits GIF’s ability to display images that contain much detail.
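To make the palette idea concrete, here is a small Python sketch that turns a list of RGB pixels into a color table plus a list of indexes. The function name build_palette and the first-seen ordering are assumptions made for illustration; real encoders usually pick and order colors more carefully.

```python
def build_palette(pixels, max_entries=256):
    """Map RGB tuples to palette indexes in first-seen order."""
    palette = []  # index -> RGB tuple
    lookup = {}   # RGB tuple -> index
    indexes = []
    for rgb in pixels:
        if rgb not in lookup:
            if len(palette) == max_entries:
                raise ValueError("image needs more colors than the table can hold")
            lookup[rgb] = len(palette)
            palette.append(rgb)
        indexes.append(lookup[rgb])
    return palette, indexes


green, white = (0, 128, 0), (255, 255, 255)
palette, indexes = build_palette([green, green, white, green])
print(palette)  # [(0, 128, 0), (255, 255, 255)]
print(indexes)  # [0, 0, 1, 0]
```

Each pixel now costs a small index instead of three full color bytes, which is where the savings come from.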

GIFs can feature both a global color table and a number of local color tables if multiple images are being used (typically in animation). While the global color table is not required, it is almost always included in the image, so this bit is typically the number 1.

The next three bits are the color resolution, which records the number of bits per primary color that were available to the original image, minus one.

Following the color resolution is the sort flag: another single bit that indicates whether the colors in the global color table are sorted, typically by how frequently they occur in the image.

The final three bits of the byte give the size of the global color table. The number of entries in the table is 2^(N+1), where N is the value stored in these bits. For example, if the value is 0, the color table has 2 entries (2^(0+1)). If the value is 3, the color table has 16 entries (2^(3+1)).

The last two bytes, the background color index and pixel aspect ratio, are mostly unused. The background color index only comes into play if you’re trying to composite several subimages—something that was anticipated when the GIF format was created, but never really utilized.

The pixel aspect ratio is also mostly ignored, and the GIF standard isn’t exactly forthcoming on what their thinking was for including it in the first place.
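The seven bytes of the logical screen descriptor can be unpacked as shown in this Python sketch. It follows the bit layout in the GIF89a specification, in which the low three bits of the packed byte carry the table-size value N used in the 2^(N+1) formula. The function name and the dictionary keys are our own.

```python
import struct


def parse_logical_screen_descriptor(block: bytes) -> dict:
    """Decode the 7-byte block that follows the GIF header."""
    if len(block) != 7:
        raise ValueError("the logical screen descriptor is exactly 7 bytes")
    # Two little-endian 16-bit sizes, then three single bytes.
    width, height, packed, background_index, aspect = struct.unpack("<HHBBB", block)
    size_value = packed & 0b0000_0111
    return {
        "width": width,
        "height": height,
        "global_color_table": bool(packed & 0b1000_0000),
        "color_resolution": (packed >> 4) & 0b111,
        "sorted": bool(packed & 0b0000_1000),
        "table_entries": 2 ** (size_value + 1),
        "background_index": background_index,
        "pixel_aspect_ratio": aspect,
    }


# A 10x10 canvas whose packed byte is 0x91 (binary 1001 0001).
sample = bytes([0x0A, 0x00, 0x0A, 0x00, 0x91, 0x00, 0x00])
info = parse_logical_screen_descriptor(sample)
print(info["width"], info["height"], info["table_entries"])  # 10 10 4
```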

But wait, there’s more!

Following the header and logical screen descriptor blocks are a number of optional blocks, as well as the actual image data. These blocks range from the optional global and local color tables, to extensions that enable transparency, animation, and even comments for including human-readable metadata within the image. We’ll touch on some of the extensions when we look at animation.

The very last block in any GIF is always the trailer, which is a single byte with the value of 3B, providing decoders with an indicator that they’ve reached the end of the file.


Animation

Animated GIFs are everywhere on the Web today. Thanks to social media, they’ve seen a resurgence in popularity. It makes good sense, too: they’re pretty easy to make and because they’re GIF files, they work in every major browser without any need for additional development work.

Animation was built into the GIF89a file format. The way it works is that each frame of the animation is a separate image, stored within the file. A series of extension blocks are used to tell decoders how to transition between these images.

The application extension block is used to indicate how many times the animation should loop. Immediately following the application extension block, you’ll find a graphic-control extension, which provides information about how the file should move between the different images (or frames) in the file and how long the delay should be.
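For the curious, both blocks are small enough to decode by hand. The Python sketch below assumes the widely used NETSCAPE2.0 application extension for looping and the graphic control extension layout from the GIF89a specification; the function names and sample bytes are our own.

```python
import struct


def loop_count(app_ext: bytes) -> int:
    """Read the loop count from a NETSCAPE2.0 application extension."""
    # 21 FF = extension introducer + application label, 0B = 11 identifier bytes.
    if app_ext[:3] != b"\x21\xff\x0b" or app_ext[3:14] != b"NETSCAPE2.0":
        raise ValueError("not a NETSCAPE2.0 application extension")
    # Sub-block: size (3), sub-block id (1), 16-bit little-endian loop count.
    sub_size, sub_id, loops = struct.unpack("<BBH", app_ext[14:18])
    if (sub_size, sub_id) != (3, 1):
        raise ValueError("unexpected sub-block layout")
    return loops  # 0 means loop forever


def frame_delay_ms(gce: bytes) -> int:
    """Read the frame delay from a graphic control extension."""
    # 21 F9 = extension introducer + graphic control label, 04 = block size.
    if gce[:3] != b"\x21\xf9\x04":
        raise ValueError("not a graphic control extension")
    packed, delay_cs, transparent_index = struct.unpack("<BHB", gce[3:7])
    return delay_cs * 10  # the delay is stored in hundredths of a second


app_ext = b"\x21\xff\x0b" + b"NETSCAPE2.0" + b"\x03\x01\x00\x00\x00"
gce = b"\x21\xf9\x04\x00\x0a\x00\x00\x00"
print(loop_count(app_ext), frame_delay_ms(gce))  # 0 100
```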

While the portability and broad support of the format may seem appealing, there are much better options.

Remember: the GIF format is showing its age. It has a limited color palette that makes it poorly equipped to show detailed images, and it was formalized without incredibly useful compression techniques such as chroma subsampling (which you’ll learn about in Chapter 4) or techniques targeted specifically at video data.

With every frame of the animation requiring another image to be stored inside the GIF file, and with the format itself lacking many of these advanced compression techniques, animated GIFs very quickly become frighteningly heavy and bloated.

An increasingly popular (and more bandwidth friendly) alternative is to use a video instead—something like the MP4 format. Much like image formats have specific compression steps built in that are tailored for compressing image data, video formats have additional compression techniques available that are tailored for compressing video. As a result, using an MP4 can significantly reduce the size of the file.

Consider the image of Eadweard Muybridge’s famous “The Horse in Motion” in Figure 3-1. Sadly, you’ll have to take our word that it’s an animated GIF. Just squint and use your imagination to pretend the horse is galloping.

Figure 3-1. An animated GIF of “The Horse in Motion”

The animated GIF contains 15 frames—which means there are actually 15 images bundled up in this file. Compressed, it weighs in at a whopping 568 KB.

We can use the open source tool FFmpeg to convert the GIF to an MP4 file (Example 3-1).

Example 3-1. Using FFmpeg to convert GIF to an MP4 file
ffmpeg -f gif -i horse.gif horse.mp4

The resulting MP4 is only 76 KB, around 13% of the size of the GIF itself. Not bad for a one-line command.

You can now use the <video> element to include your new, bandwidth-friendly animation (Example 3-2).

Example 3-2. Using video element to include animation
<video autoplay loop>
	<source src="horse.mp4" type="video/mp4" />
</video>

Transparency with GIF

The GIF format also allows for transparency. Just as with animation, transparency support is indicated within a graphic control extension block, this time using a transparency color flag that is either set to “0” if there is no transparency in the file, or “1” if we would like the image to have a transparent component.

In Chapter 2, you learned that transparency can be handled using an alpha channel—an additional byte of information for each color indicating the level of transparency. The GIF format, however, takes a different approach.

In GIF images, we accomplish transparency by signifying that one color in the palette should be treated as transparent. For example, we may indicate that any white pixels should be transparent. This has the advantage of not requiring nearly as much additional data as having a full-blown alpha channel, but it also brings a significant limitation. Having a single color represent a transparent pixel removes any ability for partial transparency: it’s all or nothing. As a result, transparency in GIFs frequently looks very jagged and low resolution.
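The all-or-nothing behavior is easy to see in code. In this Python sketch, the transparent color flag is read from the lowest bit of the graphic control extension’s packed byte (per the GIF89a specification), and any pixel whose index matches the transparent index simply shows the backdrop. The function names and the sample colors are our own.

```python
def has_transparency(packed: int) -> bool:
    """The lowest bit of the packed byte is the transparent color flag."""
    return bool(packed & 0b0000_0001)


def composite(indexes, palette, transparent_index, backdrop):
    """Show the backdrop wherever a pixel uses the transparent index."""
    return [backdrop if i == transparent_index else palette[i] for i in indexes]


palette = [(255, 255, 255), (0, 0, 0)]  # index 0 is white, index 1 is black
indexes = [0, 1, 1, 0]
blue = (0, 0, 255)

if has_transparency(0b0000_0001):
    # Treat white (index 0) as the transparent color.
    print(composite(indexes, palette, 0, blue))
    # [(0, 0, 255), (0, 0, 0), (0, 0, 0), (0, 0, 255)]
```

There is no way to express a half-transparent white here: a pixel either uses the transparent index or it doesn’t.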

LZW, or the Rise and Fall of the GIF

The GIF format boasted a lossless compression algorithm known as Lempel-Ziv-Welch, or more commonly, LZW. This algorithm allowed GIF to improve compression significantly over other lossless formats of the time, while maintaining similar compression and decompression times. This file savings, paired with GIF’s interlace option, which allowed a rough version of an image to be displayed before the full image had been transmitted, made GIF a perfect fit for the limited networks and hardware of the Web’s early days.
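The core idea of LZW is to build a dictionary of byte sequences while reading the data, and to emit one code per longest known sequence. The Python sketch below is a simplified textbook version, not GIF’s exact variant: it leaves out the variable-width codes, clear codes, and sub-block packaging that a real GIF encoder needs.

```python
def lzw_encode(data: bytes) -> list:
    """Return a list of integer codes for the given bytes."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate  # keep growing the match
        else:
            codes.append(table[current])
            table[candidate] = next_code  # remember the new sequence
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes) -> bytes:
    """Rebuild the original bytes, recreating the same table on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:  # sequence defined by the step being decoded
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


data = b"ABABABABABAB"
codes = lzw_encode(data)
print(len(data), "bytes ->", len(codes), "codes")
```

Repetitive input is where this shines: the longer the run of repeated patterns, the longer the sequences each single code can stand for.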

Unfortunately, the same compression algorithm that made it such a great format for the Web also directly led to GIF’s fall from grace. As it turns out, the algorithm had been patented by Unisys. In December 1994, Unisys and CompuServe announced that developers of GIF-based software (e.g., compression tools) would be required to pay licensing fees. As you might imagine, this didn’t sit well with developers and the community at large.

There were many repercussions of this announcement, but none more notable than the creation of the PNG image format in early 1995.

The PNG File Format

Depending on who you ask, PNG stands for either Portable Network Graphics or, as a bit of recursive humor, PNG’s Not GIF (we programmers have a very finely tuned sense of humor). The PNG format was the community’s response to the licensing issues that arose around GIF.

The early goal of creating the format was pretty straightforward: create an open alternative to GIF to avoid all the licensing fees. It didn’t take long for everyone involved to realize that they wouldn’t be able to do this and maintain backward compatibility in any way. While everyone loves a seamless fallback, the advantage was that this meant the folks creating the PNG format could be more ambitious in their aims—if they weren’t going to be able to maintain backward compatibility, why not make PNG better in every possible way? For the most part, it would seem, they succeeded.

Understanding the Mechanics of the PNG Format

PNGs are composed of a PNG signature followed by some number of chunks.

PNG Signature

The PNG signature is an 8-byte identifier that is identical for every single PNG image. This identifier also works as a clever way to verify that the PNG file has not been corrupted during transfer (whether over the network or from operating system to operating system). If the signature is altered in any way, then the file has been corrupted somewhere along the line.

For example, the first value in the PNG signature is “137,” a non-ASCII, 8-bit character. Because it is a non-ASCII character, it helps to reduce the risk of a PNG file being mistakenly identified as a text file, and vice versa. Since it is 8 bits, it also provides verification that the file was not passed over a 7-bit channel. If it was, the 8th bit would be dropped and the PNG signature would be altered.

The full list of bytes of the PNG signature is presented in Table 3-1.

Table 3-1. PNG signature bytes

Decimal value   Interpretation
137             8-bit, non-ASCII character
80              P
78              N
71              G
13              Carriage-return (CR) character
10              Line-feed (LF) character
26              Ctrl-Z
10              Line-feed (LF) character
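A short Python sketch shows how the signature catches the kinds of corruption described above. It uses the byte values from the PNG specification (137 80 78 71 13 10 26 10); the function name and the two simulated failures are our own illustrations.

```python
PNG_SIGNATURE = bytes([137, 80, 78, 71, 13, 10, 26, 10])


def has_png_signature(data: bytes) -> bool:
    """True when the first 8 bytes match the PNG signature exactly."""
    return data[:8] == PNG_SIGNATURE


# A 7-bit channel drops the top bit of every byte, so 137 becomes 9.
seven_bit = bytes(b & 0x7F for b in PNG_SIGNATURE)

# A text-mode transfer that rewrites CR LF as LF damages bytes 5 and 6.
newline_converted = PNG_SIGNATURE.replace(b"\r\n", b"\n")

print(has_png_signature(PNG_SIGNATURE))      # True
print(has_png_signature(seven_bit))          # False
print(has_png_signature(newline_converted))  # False
```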

Chunks

Other than the first 8 bytes that the PNG signature occupies, a PNG file is made entirely of chunks—the building blocks of the PNG format.

Each chunk is composed of the same set of four components:

Length field

The length field takes up 4 bytes and refers to the length of the chunk’s data field.

Type field

The type field takes up 4 bytes and indicates to the decoder what type of data the chunk contains.

Chunk data

The chunk data contains the bytes of data that the chunk is trying to pass along. This can range anywhere from 0 bytes to 2 GB in size.

Cyclic Redundancy Code (CRC)

The CRC is a 4-byte check value. The decoder calculates the CRC based on the chunk data and chunk type—the length field is not used in the calculation. If the calculated CRC value matches the 4-byte CRC field included in the chunk, the data has not been corrupted.

Cyclic Redundancy Code Algorithm

The actual algorithm used to calculate the CRC makes for pretty dry reading (says the guy writing about the nuances of PNG compression), but if that’s your cup of tea, you can find the exact algorithm online.
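You rarely need to implement the CRC yourself. As a rough sketch, Python’s zlib.crc32 computes the same CRC-32 that PNG uses, so building and verifying a chunk looks like the following. The helper names make_chunk and read_chunk are our own.

```python
import struct
import zlib


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Assemble length + type + data + CRC (big-endian 32-bit fields)."""
    crc = zlib.crc32(chunk_type + data)  # the length field is not included
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def read_chunk(buf: bytes, offset: int = 0):
    """Return (type, data, next_offset), raising if the CRC does not match."""
    (length,) = struct.unpack_from(">I", buf, offset)
    chunk_type = buf[offset + 4:offset + 8]
    data = buf[offset + 8:offset + 8 + length]
    (stored_crc,) = struct.unpack_from(">I", buf, offset + 8 + length)
    if zlib.crc32(chunk_type + data) != stored_crc:
        raise ValueError("CRC mismatch: chunk is corrupted")
    return chunk_type, data, offset + 12 + length


chunk = make_chunk(b"tEXt", b"Comment\x00hello")
print(read_chunk(chunk)[:2])  # (b'tEXt', b'Comment\x00hello')

# Flip one bit in the data and the check fails.
damaged = bytearray(chunk)
damaged[10] ^= 0x01
try:
    read_chunk(bytes(damaged))
except ValueError as error:
    print(error)
```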

Ancillary and critical chunks

The type field communicates a decent amount of information about the chunk within its 4 little bytes. Each byte has a designated purpose. In addition, each byte has a simple boolean value of information that is turned on and off by the capitalization of the character occupying that byte.

The first byte is the ancillary bit. Just as with blocks in the GIF format, not all chunks are essential to successfully display an image. Each chunk can either be critical (uppercase) or ancillary (lowercase). A critical chunk is one that is necessary to successfully display the PNG file. An ancillary chunk is one that is not; instead, its purpose is to provide supporting information.

The second byte is the private bit. The private bit informs the decoder if the chunk is public (uppercase) or private (lowercase). Typically private chunks are used for application-specific information a company may wish to encode.

The third byte is a reserved bit. Currently this bit doesn’t inform the decoder of anything other than conformance to the current version of PNG, which requires an uppercase value here.

The fourth byte is the safe-to-copy bit. This bit is intended for image editors and tells the editor whether it can safely copy an unknown ancillary chunk into a new file (lowercase) or not (uppercase). For example, an ancillary chunk may depend on the image data in some way. If so, it can’t be copied over to a new file in case any of the critical chunks have been modified, or reordered, or new critical chunks have been added.

The capitalization means that two chunk types that look nearly identical can be very different. Consider iDAT and IDAT. While they appear similar, the first byte makes them distinct chunk types. iDAT is an ancillary chunk type—it’s not essential to properly display the image. IDAT, on the other hand, starts with the first character capitalized, indicating that it is a critical chunk type and therefore if it is missing, any decoder should throw an error since it won’t be able to display the image.
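Reading those four flags is just a matter of checking the case of each letter. A minimal Python sketch, with a function name and dictionary keys of our own choosing:

```python
def chunk_properties(chunk_type: str) -> dict:
    """Decode the four case-based flags of a PNG chunk type."""
    if len(chunk_type) != 4 or not (chunk_type.isascii() and chunk_type.isalpha()):
        raise ValueError("a chunk type is exactly four ASCII letters")
    first, second, third, fourth = chunk_type
    return {
        "critical": first.isupper(),       # lowercase means ancillary
        "public": second.isupper(),        # lowercase means private
        "reserved_ok": third.isupper(),    # must be uppercase today
        "safe_to_copy": fourth.islower(),  # uppercase means unsafe to copy
    }


for name in ("IDAT", "iDAT", "tEXt", "tRNS"):
    print(name, chunk_properties(name))
```

Running it shows IDAT as critical and iDAT as ancillary, even though only the case of one letter differs.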

The PNG specification defines four critical chunk types (see Table 3-2), three of which are required for a PNG file to be valid.

Table 3-2. Critical chunks

Chunk type   Name            Required
IHDR         Image header    Yes
PLTE         Palette         No
IDAT         Image data      Yes
IEND         Image trailer   Yes

The IHDR chunk is the first chunk in any PNG image and provides details about the type of image (more on that in a bit), the height and width of the image, the pixel depth, the compression and filtering methods, the interlacing method, whether the image has an alpha channel (transparency), and whether the image is true color, grayscale, or color-mapped.

The IDAT chunk contains the compressed pixel data for the given image. Technically, the IDAT chunk can contain up to 2 GB of compressed data. In practice, however, IDAT chunks rarely reach that size.

The final required chunk is the IEND chunk. IEND is as simple as you can possibly get when it comes to chunks—it contains no data at all. Its entire purpose is to indicate that there are no more chunks in the image.

Pairing these three required chunks—IHDR, IDAT, IEND—with a PNG signature gives you the simplest PNG file possible. In fact, these three chunks are all you need to build a true color or grayscale PNG file.
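To see just how little is needed, the Python sketch below assembles a 1×1 grayscale PNG from nothing but the signature and the three required chunks. The IHDR field order and the leading filter byte on the scanline come from the PNG specification; the helper names are our own, and zlib.compress stands in for the compression step.

```python
import struct
import zlib

SIGNATURE = bytes([137, 80, 78, 71, 13, 10, 26, 10])


def chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Length + type + data + CRC, as every PNG chunk is laid out."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def minimal_png(gray: int = 128) -> bytes:
    """Build a valid 1x1, 8-bit grayscale PNG."""
    # IHDR: width, height, bit depth, color type (0 = grayscale),
    # compression method, filter method, interlace method.
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    # One scanline: a filter-type byte (0 = None) followed by one sample.
    scanline = bytes([0, gray])
    idat = zlib.compress(scanline)
    return SIGNATURE + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat) + chunk(b"IEND", b"")


png = minimal_png()
print(len(png), "bytes")
# To inspect the result in an image viewer, write the bytes to disk:
# open("pixel.png", "wb").write(png)
```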

However, like its predecessor GIF, PNG can also take advantage of color palettes. If a color palette is being used, then the PNG file also needs to include the PLTE (palette) chunk. The PLTE chunk houses a series of RGB values that may be included in the image.

While these four chunks—IHDR, IDAT, IEND, and PLTE—are the primary ones, and the only critical chunks specified, that doesn’t mean they’re the only chunks in your files. Image editors tend to create all sorts of ancillary chunks containing everything from histogram-related data to the oh-so-very-helpful chunk Photoshop adds that tells you that the image was made in Photoshop. Removing any chunks that have no influence over the visual appearance of your image is an essential first step in reducing PNG bloat, and any PNG optimization tool worth its salt will take care of this step for you.
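As a rough illustration of what such a tool does, the Python sketch below walks the chunks of a PNG and drops a handful of metadata chunk types. The drop list is an assumption made for this example: it covers only text and timestamp chunks, because other ancillary chunks (gamma or color profile information, for instance) can change how the image looks. The sample file and its “Example Editor” text are made up.

```python
import struct
import zlib

SIGNATURE = bytes([137, 80, 78, 71, 13, 10, 26, 10])
# Hypothetical drop list: textual metadata and the last-modified time.
DROP = {b"tEXt", b"zTXt", b"iTXt", b"tIME"}


def chunk(chunk_type: bytes, data: bytes) -> bytes:
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def strip_chunks(png: bytes, drop=DROP) -> bytes:
    """Copy every chunk except those whose type is in the drop list."""
    if png[:8] != SIGNATURE:
        raise ValueError("not a PNG file")
    out = bytearray(SIGNATURE)
    offset = 8
    while offset < len(png):
        (length,) = struct.unpack_from(">I", png, offset)
        end = offset + 12 + length  # length + type + data + CRC
        if png[offset + 4:offset + 8] not in drop:
            out += png[offset:end]
        offset = end
    return bytes(out)


sample = (
    SIGNATURE
    + chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    + chunk(b"tEXt", b"Software\x00Example Editor")
    + chunk(b"IDAT", zlib.compress(bytes([0, 128])))
    + chunk(b"IEND", b"")
)
stripped = strip_chunks(sample)
print(len(sample), "->", len(stripped), "bytes")
```

In practice you would reach for an established optimization tool rather than a script like this, but the principle is the same.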

Filters

The not-so-secret secret of compression is that the more repetitive data an object contains, the easier it will be to compress. Image data by itself is typically not very repetitive, so the PNG format introduces a precompression step called filtering. The goal of the filtering process is to take the image data and try to make it easier to compress.

The PNG filtering process uses what is called delta encoding; that is, it compares each pixel to the pixels surrounding it, replacing the value with the difference to those pixels.

Clear as mud? Here’s an example.

Let’s say we had a set of numbers:

1 2 3 4 5 6 7

Every value in the set is unique, so when a compression algorithm comes through, it’s not going to have any luck reducing the size.

But what if we had a filter run through and replace each number with the difference between its value, and the value of the number preceding it? Then our set of numbers would look like:

1 1 1 1 1 1 1

Now this is much more promising! A compression algorithm can whittle this down to almost nothing, giving us a huge savings. That’s delta encoding (albeit a very idealistic example), and that’s what the PNG filters set out to accomplish.
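The same example, sketched in Python. The first value is compared against an implied 0, which is why it stays 1. The function names are our own.

```python
def delta_encode(values):
    """Replace each value with its difference from the previous value."""
    previous = 0
    deltas = []
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


def delta_decode(deltas):
    """Undo delta_encode by keeping a running total."""
    total = 0
    values = []
    for delta in deltas:
        total += delta
        values.append(total)
    return values


numbers = [1, 2, 3, 4, 5, 6, 7]
encoded = delta_encode(numbers)
print(encoded)                # [1, 1, 1, 1, 1, 1, 1]
print(delta_decode(encoded))  # [1, 2, 3, 4, 5, 6, 7]
```

Nothing is lost along the way: decoding gives back exactly the numbers we started with, which is what keeps the format lossless.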

The PNG format has five filters that may be applied line by line:

None

Each byte is left unchanged.

Sub

Each byte is replaced with the difference between it and the value of the byte just to the left of it.

Up

Each byte is replaced with the difference between it and the value of the byte just above it (in the preceding row).

Average

Each byte is replaced with the difference between it and the average of the bytes just to the left and just above it.

Paeth

Each byte is replaced with the difference between it and the Paeth predictor (a function of the bytes above, to the left, and to the upper left).

These filters can vary from line to line of the image, based on the content of that line and what filter would have the greatest impact.
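The five filters can be sketched in a few lines of Python. This follows the definitions in the PNG specification: differences wrap around modulo 256, missing neighbors count as 0, and the Paeth predictor picks whichever of the left, above, or upper-left bytes is closest to left + above − upper-left. The function names and the bpp parameter (bytes per pixel, assumed to be 1 in the example) are our own.

```python
def paeth(left: int, above: int, upper_left: int) -> int:
    """Pick the neighbor closest to left + above - upper_left."""
    estimate = left + above - upper_left
    d_left = abs(estimate - left)
    d_above = abs(estimate - above)
    d_upper_left = abs(estimate - upper_left)
    if d_left <= d_above and d_left <= d_upper_left:
        return left
    if d_above <= d_upper_left:
        return above
    return upper_left


def filter_scanline(kind: str, line: bytes, prior: bytes, bpp: int = 1) -> bytes:
    """Apply one PNG filter to a scanline, given the scanline above it."""
    out = bytearray()
    for i, value in enumerate(line):
        left = line[i - bpp] if i >= bpp else 0
        above = prior[i]
        upper_left = prior[i - bpp] if i >= bpp else 0
        predictor = {
            "None": 0,
            "Sub": left,
            "Up": above,
            "Average": (left + above) // 2,
            "Paeth": paeth(left, above, upper_left),
        }[kind]
        out.append((value - predictor) % 256)
    return bytes(out)


prior = bytes([10, 20, 30, 40])
line = bytes([10, 20, 30, 40])
for kind in ("None", "Sub", "Up", "Average", "Paeth"):
    print(kind, list(filter_scanline(kind, line, prior)))
```

For this pair of identical rows, Up and Paeth both reduce the line to zeros, which is exactly the kind of repetition the compression step thrives on.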

Interlacing

Both GIFs and PNGs have an interlacing feature that, similar to the progressive JPEG feature you’ll learn about in the next chapter, enables an image to be rendered quickly as a low-resolution version, and then be progressively filled in at each successive pass. This interlacing approach allows the browser to give the user some sense of the makeup of the image earlier than the typical top-down approach to image rendering.

The GIF approach to interlacing is a one-dimensional scheme; that is, the interlacing is based on horizontal values only, focusing on a single row at a time. GIF’s approach to interlacing has four passes. First, every eighth row is displayed. Then, every eighth row is displayed again—this time offset by four rows from the first pass. For example, in an image composed of eight rows of pixels, pass one would display row one and pass two would display row five.

The third pass displays every fourth row, offset by two rows from the top. So in our 8-px by 8-px example, pass three would fill in the third and seventh rows. The fourth and final pass displays every other row. You can see how each row of an image is displayed using GIF interlacing in Figure 3-2.

Figure 3-2. How rows of an image are displayed using GIF interlacing
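The four passes boil down to a starting row and a step size for each pass. A small Python sketch, using zero-based row numbers (so “row one” in the text is row 0 here); the function name is our own:

```python
def gif_interlace_order(height: int):
    """Return the rows drawn in each of GIF's four interlace passes."""
    passes = [(0, 8), (4, 8), (2, 4), (1, 2)]  # (first row, step)
    return [list(range(first, height, step)) for first, step in passes]


print(gif_interlace_order(8))  # [[0], [4], [2, 6], [1, 3, 5, 7]]
```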

In contrast, PNG’s interlacing method is a two-dimensional scheme. Instead of analyzing a single row at a time, PNG’s interlacing method involves looking at the individual pixels within the row.

The first pass involves filling in every eighth pixel—both horizontally and vertically. The second pass fills in every eighth pixel (again horizontally and vertically) but with an offset of four pixels to the right. So given an image eight pixels wide and eight pixels high, pass one would fill in the first pixel in the first row, and pass two would fill in the fifth pixel on the first row.

The third pass fills in the pixels that are four rows below the pixels filled in by the first two passes. Using the same 8×8-pixel image, pass three would fill in the first pixel on row five as well as the fifth pixel on row five.

The fourth pass displays the pixels that are offset by two columns to the right of the first four pixels, and the fifth pass fills in the pixels that fall two rows below each of the prior displayed pixels.

Pass six fills in all remaining pixels on the odd rows, and the seventh and final pass fills in all pixels on the even rows.

That’s a lot of numbers, and is quite possibly as clear as mud at this point. For those more visually minded, Figure 3-3 shows which pixels are filled in for each pass.

Figure 3-3. A visual of which pixels are filled in per pass
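The same pattern can be generated in Python. Each of the seven passes is described by a starting column, a starting row, and a step in each direction; these values come from the PNG specification, where the scheme is known as Adam7. The function name and the grid representation are our own.

```python
# (first column, first row, column step, row step) for passes 1 through 7
ADAM7 = [
    (0, 0, 8, 8),
    (4, 0, 8, 8),
    (0, 4, 4, 8),
    (2, 0, 4, 4),
    (0, 2, 2, 4),
    (1, 0, 2, 2),
    (0, 1, 1, 2),
]


def pass_map(width: int = 8, height: int = 8):
    """Return a grid where each cell holds the pass that fills that pixel."""
    grid = [[0] * width for _ in range(height)]
    for number, (x0, y0, dx, dy) in enumerate(ADAM7, start=1):
        for y in range(y0, height, dy):
            for x in range(x0, width, dx):
                grid[y][x] = number
    return grid


for row in pass_map():
    print(" ".join(str(n) for n in row))
```

The first printed row reads 1 6 4 6 2 6 4 6: pass one fills the first pixel, pass two the fifth, and passes four and six fill in around them.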

While the PNG method of interlacing involves more passes, if you were to assume the same network conditions and compression levels, an interlaced PNG image would be on pass four by the time the GIF image had completed its first pass. Why? Because the first pass of GIF interlacing involves 1/8 of the data of the GIF image itself (1 in every 8 rows), whereas the first pass of PNG interlacing involves only 1/64 of the data (1 pixel in every 64 pixels: 8 pixels horizontally multiplied by 8 pixels vertically). The impact is particularly noticeable on any images with text, as the text becomes readable much more quickly using the PNG approach to interlacing.
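The arithmetic behind that comparison can be checked with a few lines of Python. The per-pass counts below are for a single 8×8 block of pixels and follow from the pass descriptions above; like the text, the sketch ignores compression and assumes data arrives at the same rate for both formats.

```python
from fractions import Fraction
from itertools import accumulate

png_pixels_per_pass = [1, 1, 2, 4, 8, 16, 32]  # out of 64 pixels
gif_rows_per_pass = [1, 1, 2, 4]               # out of 8 rows

png_progress = [Fraction(n, 64) for n in accumulate(png_pixels_per_pass)]
gif_progress = [Fraction(n, 8) for n in accumulate(gif_rows_per_pass)]

print("PNG:", [str(f) for f in png_progress])
print("GIF:", [str(f) for f in gif_progress])
```

After four PNG passes, 1/8 of the pixels have arrived, which is the same amount of data as GIF’s first pass alone.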

Progressive loading, higher fidelity much earlier than the GIF counterpart—PNG interlacing sounds great, right? Unfortunately, it’s not all sunshine and roses. The consequence of PNG’s approach to interlacing is that it can dramatically increase the file size because of its negative impact on compression.

Remember all those filters we talked about? Because each pass in the PNG interlacing process has different widths, it’s far simpler to treat each pass as a completely separate image for filtering. The consequence is that the filtering process has less data to work with, making compression less effective. On top of that, the benefits of progressively loading images have been debated with no definitive conclusion. When you combine the severe reduction in compression with the questionable value of interlacing in the first place, PNG interlacing starts to make a lot less sense. Typically, you’re better off ignoring interlacing on both PNGs and GIFs altogether.

Image Formats

PNG images can be saved in five different formats: indexed, grayscale, grayscale plus alpha, truecolor, and truecolor plus alpha.

The difference is in how many bytes are needed to describe each pixel’s color and, optionally, transparency.

Indexed PNGs use a palette to list all the colors included in the image. These palette-based PNGs are commonly grouped together as PNG-8, which is technically short for 8-bit PNGs. Somewhat confusingly, that doesn’t necessarily mean that each pixel value is actually 8 bits deep. You can actually have 1-, 2-, and 4-bit pixels as well. Each just means you have potentially smaller color tables. For example, an 8-bit PNG can support 256 colors, while a 2-bit PNG can only support up to 4 colors.

Grayscale PNGs are similar in that each pixel holds a single value, but that value is a gray level stored directly rather than an index into a palette. They also add support for 16-bit pixels for highly detailed grayscale-based imagery (think things like medical imagery).

Truecolor PNGs are what are referred to when you hear about PNG-24s, and they do not use a palette at all. Instead of referring to a palette table, each pixel directly specifies a color using the RGB format. This provides the ability for PNG-24s to cover the full color spectrum (hence “truecolor”), but is also the reason why PNG-24s are much heavier than their 8-bit counterparts.
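A quick back-of-the-envelope calculation in Python shows how the formats compare before any compression happens. The channel counts follow the PNG specification, and each row carries one extra byte for its filter type. The function name is our own, and the sketch leaves out the palette itself and all chunk overhead.

```python
CHANNELS = {
    "indexed": 1,
    "grayscale": 1,
    "grayscale+alpha": 2,
    "truecolor": 3,
    "truecolor+alpha": 4,
}


def raw_size(width: int, height: int, fmt: str, bit_depth: int = 8) -> int:
    """Bytes of pixel data handed to the compressor, before compression."""
    bits_per_pixel = CHANNELS[fmt] * bit_depth
    row_bytes = (width * bits_per_pixel + 7) // 8  # round up to whole bytes
    return height * (row_bytes + 1)                # +1 filter byte per row


for fmt in CHANNELS:
    print(f"{fmt:16} {raw_size(100, 100, fmt):6} bytes")
print(f"{'indexed, 2-bit':16} {raw_size(100, 100, 'indexed', 2):6} bytes")
```

For a 100×100 image, the truecolor data is roughly three times the size of the 8-bit indexed data, and adding an alpha channel makes it roughly four times.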

Transparency with PNG

Like the GIF format, PNGs support transparency. The PNG format, however, is much more powerful and flexible in its support.

If you recall, we accomplish transparency in GIFs by specifying an individual color in the color table to be displayed as transparent. It’s entirely binary, leaving no room for partial transparency.

On the other hand, PNG support for transparency takes a few different flavors.

For palette-based images, we handle transparency by adding one or more entries of alpha information to the tRNS chunk. Truecolor and grayscale images can also use the tRNS chunk, but only to define a single value as transparent. Because we’re referring to specific entries in the palette table, you don’t have the ability to make, say, two different white pixels in your image display with two different alpha values.

If you want more flexibility and access to partial transparency, the truecolor and grayscale formats also allow for the addition of an alpha channel—each pixel now receiving its own alpha value. This is much more costly than the indexed approach to transparency, as we’re not adding merely a table, but a full channel of information to every single pixel.

Knowing this, as well as the golden rule of image compression (the more similar colors, the better), we can take a few steps to ensure that our transparent PNGs are as light as possible.

As we just discussed, if you’re using a truecolor PNG and making a large number of those pixels fully transparent, that alpha value is added to each individual pixel. But what about all the RGB data? That doesn’t go away. Even if we marked those pixels as fully transparent so that they will not display, there may still be dozens and dozens of unique RGB data values associated with them.

To maximize compression, we can use an image editor to convert all of those fully transparent pixels to be the same color—red, for example. Now, when we make them fully transparent, their full RGBA values will be identical and PNG’s compression algorithms will happily gobble up all that unnecessary data, leaving us with a much smaller file size.
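Here is a rough Python demonstration of the effect. It builds a made-up set of RGBA pixels, half of them fully transparent with random leftover colors, and compares how well the raw bytes compress before and after the transparent pixels are set to a single color. It uses zlib directly and skips PNG filtering, so treat the numbers as illustrative rather than as real PNG file sizes.

```python
import random
import zlib

random.seed(0)  # fixed seed so the example is repeatable

# 2,048 invisible pixels with random RGB values, then 2,048 opaque gray ones.
transparent = [
    (random.randrange(256), random.randrange(256), random.randrange(256), 0)
    for _ in range(2048)
]
opaque = [(64, 64, 64, 255)] * 2048
pixels = transparent + opaque


def normalize(pixels, fill=(255, 0, 0)):
    """Give every fully transparent pixel the same RGB value."""
    return [(*fill, 0) if a == 0 else (r, g, b, a) for r, g, b, a in pixels]


def compressed_size(pixels) -> int:
    raw = bytes(value for pixel in pixels for value in pixel)
    return len(zlib.compress(raw, 9))


before = compressed_size(pixels)
after = compressed_size(normalize(pixels))
print(before, "->", after, "bytes")
```

The image looks identical either way, since the changed pixels were never visible to begin with.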

There Can Be Only One!

So given all the preceding information, here’s the ultimate question: when do you use a GIF and when do you use a PNG? The answer is to favor PNGs for all except the smallest of images. The one exception is animation: if you want to use animation at all, GIF is the way to go (though as we’ve seen, you could argue MP4s are even better).

Basically, while the GIF format helped pave the way for formats like PNG, its time has come and gone. If you are ever considering putting a GIF in a page, take a step back and consider if another alternative would work better.

Summary

In this chapter we looked at the two most popular and widely supported lossless image formats on the Web, GIFs and PNGs. We looked at how each format is encoded and compressed, as well as what tweaks we can make to maximize those savings. Now that you know all about lossless formats, not only can you impress your friends with your in-depth knowledge of filtering and compression algorithms, but you can also start to save precious bytes with every image you produce.

In the next chapter, we’ll dig into JPEGs—the Web’s favorite lossy image format—and learn how to optimize them as much as possible.
