Chapter 16. PDF streams

This chapter covers

  • Image and font streams
  • Adding and extracting file attachments
  • Creating portable collections
  • Integrating rich media

We’ve arrived at the final chapter of part 4. In this part, we’re turning PDF files inside out. In chapter 13, we explored the file structure and discussed the different objects. We focused on the content stream of pages in chapters 14 and 15.

In this chapter, we’ll continue working with streams: we’ll look at image and font streams, and you’ll find out how to add streams containing other files as attachments, and how to organize these files in a portable collection. We’ll finish this chapter with some really cool examples of adding multimedia annotations to a document and integrating a Flash application into a PDF document.

16.1. Finding and replacing image and font streams

When you create an image using the Image class, or a font using the Font or BaseFont class, you don’t have to worry about the way these objects are stored in the finished document. For example, when you use a standard Type 1 font, iText will add a font dictionary to the PDF file. When you use a font that is embedded, the font dictionary will also refer to a stream with a full or partial font program that is copied into the PDF file.

In this section, we’ll look at advanced techniques that address the lowest level of PDF creation and manipulation with iText. The examples that follow were inspired by questions that were posted to the mailing list (see appendix B for more info about the list).

16.1.1. Adding a special ID to an Image

In the previous chapter, you learned how to extract all the images from a page, but what if you want to pick one specific image programmatically?

An image is stored in a stream object. Each stream consists of a dictionary followed by zero or more bytes bracketed between the keywords stream and endstream (see table 13.2). The entries of the stream dictionary are filled in by iText. In the case of images, you’ll have at least entries for the width and the height of the image, and a value defining the compression filter, but there’s no reference to the original filename. The original bits and bytes of the image may have been changed completely.

One of the mailing-list subscribers wanted to solve the problem of retrieving specific images by adding an extra entry to the image stream dictionary. Listing 16.1 was written in answer to his question.

Listing 16.1. SpecialId.java

  1. You create an instance of the high-level Image object, and set some properties, as described in chapter 2.
  2. You use this Image object to create a low-level PdfImage object. This object extends the PdfStream class. With the second parameter, you can pass a name for the image; the third parameter can be used for the reference to a mask image.
  3. PdfStream extends PdfDictionary. Just like with plain dictionaries, you can add key-value pairs. In this case, you choose a name for the key using the prefix reserved for iText (ITXT): ITXT_SpecialId. The value of the entry is also a name of your choice, in this case /123456789.
  4. You add the stream object to the body of the file that is written by the PdfWriter object. The addToBody() method returns a PdfIndirectObject. Because it’s the first element that’s added to the writer in this example, the reference of this object will be 1 0 R.
  5. You tell the Image object that it has already been added to the writer with the method setDirectReference().
  6. Finally, you add the image to the document. The image bytes have already been written to the OutputStream in step 4. This final call writes the Do operator and its operands to the content stream of the page, and adds the correct reference to the image bytes to the page dictionary. This example unveils the mechanism that’s used by iText internally to add streams.

You’ll use the PDF file that was created by listing 16.1 in the next example. You’ll search for an image with the special ID /123456789, and you’ll replace it with another image that has a lower resolution.

16.1.2. Resizing an image in an existing document

Here’s another question that is often posted to the mailing list: “How do I reduce the size of an existing PDF containing lots of images?” There are many different answers to this question, depending on the nature of the PDF file. Maybe the same image is added multiple times, in which case passing the PDF through PdfSmartCopy could result in a serious file size reduction. Maybe the PDF wasn’t compressed, or maybe there are plenty of unused objects. You could try to see if the PdfReader’s removeUnusedObjects() method has any effect.

It’s more likely that the PDF contains high-resolution images, in which case the original question should be rephrased as, “How do I reduce the resolution of the images inside my PDF?” To achieve this, you should extract the image from the PDF, downsample it, then put it back into the PDF, replacing the high-resolution image.

The next example uses brute force instead of the PdfReaderContentParser to find images. With the getXrefSize() method, you get the highest object number in the PDF document, and you loop over every object, searching for a stream that has the special ID you’re looking for.
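The brute-force search can be pictured without iText at all. The sketch below is only a model under stated assumptions: every PDF object is reduced to a map of dictionary keys and values (null for anything that isn’t a dictionary or stream), which is not how PdfReader represents objects. The names SpecialIdSearch and findBySpecialId are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class SpecialIdSearch {
    /**
     * Returns the index of the first dictionary whose ITXT_SpecialId entry
     * equals the wanted name, or -1 if nothing matches.
     * Index i in the list stands in for object number i (an assumption of this model).
     */
    static int findBySpecialId(List<Map<String, String>> objects, String wanted) {
        for (int i = 0; i < objects.size(); i++) {
            Map<String, String> dict = objects.get(i);
            if (dict == null) {
                continue; // not a dictionary or stream: nothing to inspect
            }
            if (wanted.equals(dict.get("ITXT_SpecialId"))) {
                return i;
            }
        }
        return -1;
    }
}
```

The loop visits every entry, relevant or not, which is exactly what makes this approach brute force: it’s simple, but its cost grows with the number of objects in the file.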

Listing 16.2. ResizeImage.java

Once you’ve found the stream you need, you create a PdfImageObject that will create a java.awt.image.BufferedImage named bi; you’ll create a second BufferedImage named img whose dimensions are those of bi multiplied by FACTOR. In this example, the value of FACTOR is 0.5. You draw the image bi to the Graphics2D object of the image img using an affine transformation that scales the image down by FACTOR.

You write the image as a JPEG to a ByteArrayOutputStream, and use the bytes from this OutputStream as the new data for the stream object you’ve retrieved from PdfReader. You reset all the entries in the image dictionary and add all the keys that are necessary for a PDF viewer to interpret the image bytes correctly. After changing the PRStream object in the reader, you use PdfStamper to write the altered file to a FileOutputStream. Again, you get a look at the way iText works internally. When you add a JPEG to a document the normal way, iText selects all the entries for the image dictionary for you.
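The downsampling step itself needs nothing but the JDK. The following is a minimal sketch of that step in isolation, not the code of listing 16.2: the class and method names (ImageDownsampler, scale, toJpeg) are hypothetical, and the output image type TYPE_INT_RGB is an assumption that suits JPEG encoding.

```java
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

final class ImageDownsampler {
    /** Draws bi into a new, smaller image using a scaling affine transformation. */
    static BufferedImage scale(BufferedImage bi, float factor) {
        int width = Math.max(1, (int) (bi.getWidth() * factor));
        int height = Math.max(1, (int) (bi.getHeight() * factor));
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        AffineTransform at = AffineTransform.getScaleInstance(factor, factor);
        Graphics2D g = img.createGraphics();
        try {
            g.drawRenderedImage(bi, at);
        } finally {
            g.dispose(); // release the graphics context
        }
        return img;
    }

    /** Encodes the image as JPEG and returns the bytes that would replace the stream data. */
    static byte[] toJpeg(BufferedImage img) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(img, "JPG", out)) {
                throw new IllegalStateException("No JPEG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

The byte array returned by toJpeg() plays the role of the new stream data; the width and height of the scaled image are the values you’d have to put back into the image dictionary.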

Working at the lowest level is fun and gives you a lot of power, but you really have to know what you’re doing, or you can seriously damage a PDF file. Because of the high complexity, some requirements are close to impossible. For instance, it’s very hard to replace a font. Let’s start by finding a way to list the fonts that are used in a PDF document.

16.1.3. Listing the fonts used

In listing 11.1, you created a PDF document demonstrating different font types. You can now use listing 16.3 to inspect this document and create a set containing all the fonts that were used. This time you won’t loop over every object in the PDF, relevant or not, as you did in the previous listing. Instead, you’ll process the resources of every page in the document.

Listing 16.3. ListUsedFonts.java

In this listing, you check for a series of keys in the font descriptor dictionary to determine the font type. Table 16.1 explains which key corresponds with which font type.

Table 16.1. Stream references in the font descriptor

Key Description

FONTFILE The value for this key (if present) is a stream containing a Type 1 font program.
FONTFILE2 The value for this key (if present) is a stream containing a TrueType font program.
FONTFILE3 The value for this key (if present) is a stream containing a font program whose format is specified by the /Subtype entry in the stream dictionary. It can be /Type1C, /CIDFontType0C, or /OpenType.
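The decision described in table 16.1 boils down to a chain of key lookups. Here is a minimal sketch of that logic, assuming the font descriptor has been reduced to the set of its key names as they appear in the PDF (/FontFile, /FontFile2, /FontFile3, written here without the slash). The class FontFileKeys and the returned labels are hypothetical and aren’t taken from listing 16.3.

```java
import java.util.Set;

final class FontFileKeys {
    /**
     * Maps the keys found in a font descriptor to a short description.
     * Pass null when the font dictionary has no font descriptor at all.
     */
    static String describe(Set<String> descriptorKeys) {
        if (descriptorKeys == null) {
            return "nofontdescriptor";
        }
        if (descriptorKeys.contains("FontFile")) {
            return "Type 1 font program";
        }
        if (descriptorKeys.contains("FontFile2")) {
            return "TrueType font program";
        }
        if (descriptorKeys.contains("FontFile3")) {
            // The exact format depends on the /Subtype entry of the stream dictionary.
            return "font program typed by /Subtype";
        }
        return "not embedded";
    }
}
```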

If you try this example on the file created in chapter 11, you’ll get the following result:

Arial-BlackItalic subset (IAEZOI)
ArialMT subset (WTBBZY)
ArialMT subset (XKYIQK)
CMR10 (Type 1) embedded
Helvetica nofontdescriptor
KozMinPro-Regular-UniJIS-UCS2-H nofontdescriptor
MS-Gothic subset (ZGXOUP)
Puritan2 (Type1) embedded

The standard Helvetica Type 1 font isn’t embedded, and there’s no font descriptor. The same goes for the KozMinPro-Regular CJK font. Embedded Type 1 fonts are always fully embedded by iText. TrueType and OpenType fonts are subsetted unless you changed the default behavior with the setSubset() method. This was explained in chapter 11.

Observe that there are two entries of ArialMT. This is caused by the use of two variations of the Arial font: one using WinAnsi encoding and one using Identity-H. You can’t store both types of the font in the same font dictionary and stream; two different font objects with different names will be created. In this case, the font names are WTBBZY+ArialMT and XKYIQK+ArialMT. The six-letter code is chosen at random and will change every time you execute the example.
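Recognizing a subsetted font by its name is a matter of checking for six uppercase letters followed by a plus sign. The sketch below shows one way to split such a name and to format it like the output above; the class SubsetFontName and its methods are hypothetical helpers, not part of iText.

```java
final class SubsetFontName {
    /** True if the name starts with a six-letter uppercase tag followed by '+'. */
    static boolean isSubset(String fontName) {
        if (fontName == null || fontName.length() < 8 || fontName.charAt(6) != '+') {
            return false;
        }
        for (int i = 0; i < 6; i++) {
            char c = fontName.charAt(i);
            if (c < 'A' || c > 'Z') {
                return false;
            }
        }
        return true;
    }

    /** Formats a name such as WTBBZY+ArialMT as "ArialMT subset (WTBBZY)". */
    static String describe(String fontName) {
        if (!isSubset(fontName)) {
            return fontName;
        }
        String tag = fontName.substring(0, 6);
        String baseName = fontName.substring(7);
        return baseName + " subset (" + tag + ")";
    }
}
```

Because the tag differs per subset, two names with the same base name (as with ArialMT here) still identify two distinct font objects.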

 

FAQ

Can I combine different subsetted fonts into one font? The easy answer is “no.” The not-so-easy answer is that merging subsets is really hard. It may require the page content of all the pages to be rewritten.

 

In the next example, you’ll replace a font that isn’t embedded with a fully embedded font. This will give you an idea of the difficulties you can expect if you ever try to combine different subsetted fonts into one.

16.1.4. Replacing a font

Figure 16.1 shows two PDF files that were created in the very same way, except for one difference: in the upper PDF, the font (Walt Disney Script v4.1) wasn’t embedded. It’s a font I downloaded from a site with plenty of free fonts. The font isn’t installed on my OS, so Adobe Reader doesn’t find it, and the words “iText in Action” are shown in Adobe Sans MM, which is quite different from the font shown in the PDF that has the font embedded.

Figure 16.1. Non-embedded versus embedded fonts

Suppose you have the upper PDF as well as the font file for the Walt Disney Script font. You could use this listing to embed that font after the fact.

Listing 16.4. EmbedFontPostFacto.java

In this listing, you’re adding the complete font file. You add the reference to the stream using the FONTFILE2 key because you know in advance that the font has TrueType outlines. That’s not the only assumption you make. You also assume that the metrics of the font that is used in the PDF correspond to the metrics of the new font you’re embedding.

When we talked about parsing PDFs, I explained that we could only make a fair attempt, but that the functionality could fail for PDFs using exotic encodings. Several warnings that were mentioned in section 15.3.1 also apply here. In real-world examples, replacing one font with another can be very difficult.

Now that you know what a PDF looks like on the inside, these examples complement your knowledge about images (discussed in chapter 10) and fonts (chapter 11). In the sections that follow, we’ll take a close look at annotations (chapter 7) that are associated with a PDF stream.

16.2. Embedding files into a PDF

You’ve already created a document with file attachment annotations in section 7.3.3. You can embed different files of any type—images, Word documents, XML files, other PDF files—into a PDF document as an annotation, but there’s also an alternative way to do this.

In this section, we’ll briefly return to file attachment annotations, and you’ll learn about document-level attachments and create actions that open these attachments. We’ll also discuss the concept of portable collections.

16.2.1. File attachment annotations

Figure 16.2 shows a list of Kubrick movies available in video stores. There’s a pushpin next to every movie title, and if you click the pushpin, the movie poster is shown. All the file attachments are also listed in the file attachments panel at the bottom.

Figure 16.2. File attachment annotations

The next listing demonstrates how you can extract the attached files by looping over all the pages of the document, inspecting the /Annots array.

Listing 16.5. KubrickDvds.java

If you don’t want to add an attachment using a visible annotation, you can attach files at the document level.

16.2.2. Document-level attachments

In the next listing, you’ll create a page listing the movies that are discussed in the documentary, Stanley Kubrick: A Life in Pictures. While adding the movies to a List, you also build an XML file. You then add this XML file to the document as an attachment with the addFileAttachment() method.

Listing 16.6. KubrickDocumentary.java

Suppose you’ve created a report based on different spreadsheets; you could add the original spreadsheets to your document as attachments. This is also an ideal way to combine any presentation for human consumption with the data for automated consumption.

The following listing shows how you can extract the XML from the PDF document you created in the previous one.

Listing 16.7. KubrickDocumentary.java (continued)

The references to the file specifications of document-level attachments can be found through the /EmbeddedFiles entry in the catalog’s name dictionary (/Names). These references are in turn part of a name tree. In section 13.3.3, you learned that a name tree is an array with ordered pairs of strings and values. In this case, you ignore the names; you only want the values, which are the file specifications of the embedded files.
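Picking the values out of such an array of pairs is simple index arithmetic. The following sketch assumes the array has already been flattened into a java.util.List in which names sit at even positions and values at odd positions; NameTreeValues is a hypothetical helper and doesn’t use any iText classes.

```java
import java.util.ArrayList;
import java.util.List;

final class NameTreeValues {
    /**
     * Given [name0, value0, name1, value1, ...], returns [value0, value1, ...].
     * A trailing name without a value is ignored.
     */
    static List<Object> values(List<Object> namesArray) {
        List<Object> result = new ArrayList<>();
        for (int i = 1; i < namesArray.size(); i += 2) {
            result.add(namesArray.get(i)); // odd positions hold the values
        }
        return result;
    }
}
```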

16.2.3. Go to embedded file action

Embedded files—be they added as annotations, or at the document level—are listed in the attachments panel where the end user can select and open them. If you want to provide a better way for end users to find an attachment, you can create goto actions to switch to an embedded file, or to the parent of an embedded file.

The document in figure 16.3 shows a PDF listing the DVDs that are packaged in the Kubrick box: eight Kubrick movies and a documentary. The PDF has nine attachments in the PDF format, one per movie. When you click “see info,” one of these attached files will open. There’s a “Go to original document” link in each of these files to return you to the original document. This is done with a /GoToE action specifying a destination in an embedded or embedding file (an attachment or a parent of an attachment).

Figure 16.3. Go to embedded files

This example shows how such a /GoToE action in the parent document is created.

Listing 16.8. KubrickBox.java

You do something similar in the PDF files that are attached to the parent document:

PdfDestination dest = new PdfDestination(PdfDestination.FIT);
dest.addFirst(new PdfNumber(1));
PdfTargetDictionary target = new PdfTargetDictionary(false);
Chunk chunk = new Chunk("Go to original document");
PdfAction action = PdfAction.gotoEmbedded(null, target, dest, false);
chunk.setAction(action);
document.add(chunk);

How does this work? The gotoEmbedded() method expects four parameters:

  • A filename— The name of the PDF file that has attachments. This parameter can be null if you want to go to an attachment in the current document.
  • A target— An instance of the class PdfTargetDictionary. We’ll discuss this dictionary in a moment.
  • A destination— A PdfString or a PdfName if you want to jump to a named destination (see section 7.1.1); a PdfDestination if you want to go to an explicit destination (see section 7.1.2).
  • A Boolean value— If true, the destination document should be opened in a new window.

When you create a PdfTargetDictionary, you specify whether you are targeting a child document (true) or a parent document (false). If you want to jump to a child document, you have two options:

  • If you want to go to a file that is attached at the document level, you need to specify the name of this file with the setEmbeddedFileName() method.
  • If you’re targeting a file that was added as a file attachment annotation, you need to use setFileAttachmentPage() or setFileAttachmentPagename() to specify to which page the attachment belongs. The former method expects a page number; the latter expects a named destination. A page can contain more than one file attachment, so you also have to pass the index (0-based) of the attachment with setFileAttachmentIndex(), or its name with setFileAttachmentName()—the name is the value corresponding with the /NM key in the annotation dictionary.

It’s also possible to nest target dictionaries. For instance, you might want to go to a child document of a child document, to the parent of a parent document, or to a sibling. This is done with the setAdditionalPath() method. We’ll use this method in a more complex example involving portable collections.
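To get a feeling for what a nested target looks like inside the PDF, here’s a small model that serializes a chain of targets to PDF dictionary syntax. It’s a sketch under assumptions: TargetSketch is a hypothetical class (not iText’s PdfTargetDictionary), it covers only the /R, /N, and /T entries of the target dictionary as I read them in ISO-32000-1, and it doesn’t escape special characters in the file name.

```java
final class TargetSketch {
    private final boolean child;      // true: /R /C (child), false: /R /P (parent)
    private String embeddedFileName;  // written as /N when set
    private TargetSketch next;        // written as /T when set

    TargetSketch(boolean child) {
        this.child = child;
    }

    TargetSketch embeddedFileName(String name) {
        this.embeddedFileName = name;
        return this;
    }

    TargetSketch additionalPath(TargetSketch nested) {
        this.next = nested;
        return this;
    }

    /** Serializes this target, including nested targets, as a PDF dictionary. */
    String toPdf() {
        StringBuilder sb = new StringBuilder("<< /R ").append(child ? "/C" : "/P");
        if (embeddedFileName != null) {
            // No escaping of parentheses or backslashes: good enough for a sketch.
            sb.append(" /N (").append(embeddedFileName).append(")");
        }
        if (next != null) {
            sb.append(" /T ").append(next.toPdf());
        }
        return sb.append(" >>").toString();
    }
}
```

A target for “the parent of my parent” is then new TargetSketch(false).additionalPath(new TargetSketch(false)); each level of nesting adds one step to the path the viewer has to follow.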

16.2.4. PDF packages, portable collections, or portfolios

Suppose that you want to bundle a set of documents that belong together into one PDF, and organize them in a way that the attachment panel can’t accommodate. Suppose you want to add your own keys, and to allow the end user to sort the entries in the collection of documents based on those custom keys.

This functionality was introduced in PDF 1.7, and it’s known under different names. People working with it on the lowest level will talk about portable collections, because that’s the name that is used in the PDF reference and in ISO-32000-1. People who work on a higher level using Adobe Acrobat or Adobe Reader will say that a PDF document as shown in figure 16.4 is a portfolio. And if you ever hear people talk about PDF packages, that’s the original name of this functionality.

Figure 16.4. A portable collection containing PDF files

Figure 16.4 shows a collection of PDF files with information about the movies of Stanley Kubrick. The end user gets an overview with the year the movie was made, the movie title, the run length, and the file size. The user can also sort the entries based on these fields. Clicking one of the lines in the overview opens the file.

The fields in this UI are defined in a collection schema dictionary. This dictionary consists of a variable number of individual collection field dictionaries. The next listing shows how to create these dictionaries.

Listing 16.9. KubrickMovies.java
private static PdfCollectionSchema getCollectionSchema() {
    PdfCollectionSchema schema = new PdfCollectionSchema();
    // File size: fourth position in the UI
    PdfCollectionField size
        = new PdfCollectionField("File size", PdfCollectionField.SIZE);
    size.setOrder(4);
    schema.addField("SIZE", size);
    // File name: present in the schema, but hidden in the UI
    PdfCollectionField filename
        = new PdfCollectionField("File name", PdfCollectionField.FILENAME);
    filename.setVisible(false);
    schema.addField("FILE", filename);
    // Custom text field for the movie title
    PdfCollectionField title
        = new PdfCollectionField("Movie title", PdfCollectionField.TEXT);
    title.setOrder(1);
    schema.addField("TITLE", title);
    // Custom number field for the run length
    PdfCollectionField duration
        = new PdfCollectionField("Duration", PdfCollectionField.NUMBER);
    duration.setOrder(2);
    schema.addField("DURATION", duration);
    // Custom number field for the year; shown first
    PdfCollectionField year
        = new PdfCollectionField("Year", PdfCollectionField.NUMBER);
    year.setOrder(0);
    schema.addField("YEAR", year);
    return schema;
}

In listing 16.9, you create five PdfCollectionField objects. The constructor of this class accepts a name that will be used as the caption of a column in the detail view of the collection. It also expects a field type, which must be one of the values listed in table 16.2.

Table 16.2. Collection field types

Parameter Name Description

TEXT /S The field value will contain text; iText will use the object PdfString internally.
DATE /D The field value will contain a date; iText will use the object PdfDate internally.
NUMBER /N The field value will contain a number; iText will use the object PdfNumber internally.
FILENAME /F The value will be obtained from the /UF entry in the file specification.
DESC /Desc The value will be obtained from the /Desc entry in the file specification.
MODDATE /ModDate The value will be obtained from the /ModDate entry in the file specification.
CREATIONDATE /CreationDate The value will be obtained from the /CreationDate entry in the file specification.
SIZE /Size The size of the embedded file as identified by the /Size entry in the /Params dictionary of the stream dictionary of the embedded file.

You can set the order of the fields in the UI with the setOrder() method. Observe that in listing 16.9 you make one field invisible with setVisible(false). As a result, there’s no filename column in figure 16.4. The default is true, so all other fields are visible. Finally, you can make a field editable with the setEditable() method. By default, fields are not editable.

 

Note

If the collection schema is absent, the Reader will choose useful defaults taken from the file specification dictionary, such as the filename and the file size.

 

The collection schema is used in the collection dictionary of the PDF document. You construct a PdfCollection dictionary with one of the following preferences as a parameter:

  • DETAIL The collection view is presented in detail mode, with all information in the schema dictionary presented in a multicolumn format. This mode provides the most information to the user. See figure 16.4.
  • TILE The collection view is presented in tile mode, with each file in the collection denoted by a small icon and a subset of information from the schema dictionary. This mode provides top-level information about the file attachments to the user. See figure 16.5.
  • HIDDEN The collection view is initially hidden, without preventing the user from obtaining a file list via explicit actions.
  • CUSTOM The collection view is presented by a custom navigator. This option isn’t described in ISO-32000-1, but in Adobe’s extensions to ISO-32000-1 (level 3).

The end user can always switch from the initial view to another view.

The files presented in the UI can be sorted in different ways, and you can define the sort order using a PdfCollectionSort object. You construct this object by passing the name of a field that has to be used to sort the items as a parameter. With the setSortOrder() method, you can sort the items in ascending (true) or descending (false) order. If you want to involve multiple fields, you have to pass an array of field names as a parameter of the PdfCollectionSort constructor as well as a corresponding array of Boolean values for the sort order.

Each collection has a cover page. In listing 16.10, the cover page has the text, “This document contains a collection of PDFs, one per Stanley Kubrick movie.” But when you open the document, you’ll see a different page because you’ve used the setInitialDocument() method to choose one of the embedded files as the initial page.

Once you’ve completed setting all the parameters of the PdfCollection dictionary, you can use setCollection() as is done here.

Listing 16.10. KubrickMovies.java (continued)

As soon as there are fields of type TEXT, DATE, or NUMBER in the collection schema, you need to create a PdfCollectionItem for each file specification. This class comes with a plethora of addItem() methods that allow you to set the values of the different fields present in the collection schema.

 

Note

If you sorted the collection shown in figure 16.4 alphabetically in ascending order based on the titles, you’d want the movie A Clockwork Orange to follow Barry Lyndon, and not the other way around. To achieve this, you need to pass the string “Clockwork Orange” with the addItem() method and the article “A” with the setPrefix() method. The title would be shown as A Clockwork Orange, but the sorting order wouldn’t be affected by the article “A”.
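The effect of separating the prefix from the sortable part of a title can be shown with plain Java. This is a model of the idea only: PrefixedTitle is a hypothetical class, and it assumes the prefix carries its own trailing space ("A ") and is simply concatenated in front of the title for display.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class PrefixedTitle {
    final String prefix; // shown to the user, ignored when sorting; may be null
    final String title;  // the part that determines the sort order

    PrefixedTitle(String prefix, String title) {
        this.prefix = prefix;
        this.title = title;
    }

    String display() {
        return prefix == null ? title : prefix + title;
    }

    /** Sorts on the title only, then returns the titles as they would be displayed. */
    static List<String> sortedDisplay(List<PrefixedTitle> items) {
        List<PrefixedTitle> copy = new ArrayList<>(items);
        copy.sort(Comparator.comparing((PrefixedTitle t) -> t.title));
        List<String> result = new ArrayList<>();
        for (PrefixedTitle t : copy) {
            result.add(t.display());
        }
        return result;
    }
}
```

Sorting new PrefixedTitle("A ", "Clockwork Orange") together with new PrefixedTitle(null, "Barry Lyndon") puts Barry Lyndon first, because only “Clockwork Orange” takes part in the comparison.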

 

You’ve created your first portable collection. If you open it in Adobe Reader, there will be an extra entry named Portfolio in the View menu. You can use it to switch to another UI, such as from a detailed view to a tiled view, or to return to the cover page.

Figure 16.5 shows a second portable collection opened in tiled view. As you can see, some of the PDFs created in this section have been bundled along with a JPEG and a plain text file. The image was added to the collection using the following listing.

Figure 16.5. A portable collection containing different file types

Listing 16.11. KubrickCollection.java
PdfCollectionItem collectionitem = new PdfCollectionItem(schema);
PdfFileSpecification fs;
fs = PdfFileSpecification
    .fileEmbedded(writer, IMG_KUBRICK, "kubrick.jpg", null);
fs.addDescription("Stanley Kubrick", false);
collectionitem.addItem(TYPE_FIELD, "JPEG");
fs.addCollectionItem(collectionitem);
writer.addFileAttachment(fs);

If the file type is supported by the viewer, the end user will be able to view the file directly. This is the case for the JPEG and the plain text file in figure 16.5. You can choose to open these files in an external application too. That’s also an option for file types that can’t be opened in the viewer, unless special permissions are set to avoid security hazards.

This second portfolio example, named KubrickCollection, was written to demonstrate nested /GoToE actions. The file kubrick_movies.pdf shown in figure 16.5 is the collection you created with the KubrickMovies example. The following listing adds links from the cover page of the collection to the files embedded in a file that is part of the collection.

Listing 16.12. KubrickCollection.java (continued)

The final target is a movie page that is the child of an intermediate target, namely the first attachment on page 2, which is the page with index 1. The next bit of code shows how this attachment was added.

Listing 16.13. KubrickCollection.java (continued)
PdfPCell cell = new PdfPCell(new Phrase("All movies by Kubrick"));
cell.setBorder(PdfPCell.NO_BORDER);
fs = PdfFileSpecification.fileEmbedded(writer, null,
    KubrickMovies.FILENAME, new KubrickMovies().createPdf());
collectionitem.addItem(TYPE_FIELD, "PDF");
fs.addCollectionItem(collectionitem);
target = new PdfTargetDictionary(true);
target.setFileAttachmentPagename("movies");
target.setFileAttachmentName("The movies of Stanley Kubrick");
cell.setCellEvent(new PdfActionEvent(writer,
    PdfAction.gotoEmbedded(null, target, dest, true)));
cell.setCellEvent(new FileAttachmentEvent(writer, fs,
    "The movies of Stanley Kubrick"));
cell.setCellEvent(new LocalDestinationEvent(writer, "movies"));
table.addCell(cell);
writer.addFileAttachment(fs);

In this code snippet, we have another example of a /GoToE action, demonstrating the use of the setFileAttachmentPagename() and setFileAttachmentName() methods as alternatives for setFileAttachmentPage() and setFileAttachmentIndex(). But the main reason to look at this snippet is the final line: writer.addFileAttachment(fs);.

The kubrick_movies.pdf file is added as an attachment annotation. Internally, this annotation will appear in the /Annots array of the page dictionary. These file attachment annotations do not appear in the list of embedded files and are therefore not a part of the portable collection, unless you also add them as document-level attachments.

Don’t worry, the bits and bytes of the file will only be present once inside the PDF file. The file specification will be referenced from two places: from a file attachment annotation on the page level, and from the /EmbeddedFiles name tree at the document level.

If you’ve experimented with the examples while reading this book, you’ve probably noticed that the files with the movie information that were embedded in the PDF named kubrick_movies.pdf contain a “Go to original document” link that doesn’t work. This link is created with this listing:

Listing 16.14. KubrickMovies.java (continued)
PdfTargetDictionary target = new PdfTargetDictionary(false);
target.setAdditionalPath(new PdfTargetDictionary(false));
Chunk chunk = new Chunk("Go to original document");
PdfAction action
    = PdfAction.gotoEmbedded(null, target, new PdfString("movies"), false);

This creates a link to the parent of a parent. It’s normal that this link doesn’t work in the context of the standalone kubrick_movies.pdf file, because there’s no grandparent. This link will only work when the file with the movie information is opened in the context of the kubrick_collection.pdf file in which the kubrick_movies.pdf file is embedded. While it’s fun to make constructions like this, you shouldn’t confuse the end user by making the family structure of embedded files and embedded goto actions too complex.

Let’s move on and look at special types of annotations that allow you to add movies, sound, and other multimedia formats as part of a document.

16.3. Integrating rich media

ISO-32000-1 has a complete chapter about multimedia, explaining how to embed movies, sound, and even 3D images into pages. The supplement to ISO-32000-1 (extension level 3) also adds the concept of rich media. If you look for the term “Rich Media” on Wikipedia, you’ll be forwarded to a page about “Interactive media”:

Interactive media normally refers to products and services on digital computer-based systems which respond to the user’s actions by presenting content such as text, graphics, animation, video, audio.

“Interactive media,” Wikipedia

Let’s start with the more traditional multimedia, such as movies, then have a look at a 3D example, and finish this chapter with a rich media annotation that embeds a Flash application into a PDF document.

16.3.1. Movie annotations

In chapter 10, you created a document containing the different frames of an animated GIF showing a fox jumping over a dog. You learned that animated GIFs aren’t supported in PDF, but if you want to add a movie with a fox jumping over a dog, you can create an annotation using the media types shown in table 16.3.

Table 16.3. Multimedia files supported in PDF

Extension MIME-type Description

.aiff audio/aiff Audio Interchange File Format
.au audio/basic NeXT/Sun Audio Format
.avi video/avi AVI (Audio/Video Interleaved)
.mid audio/midi MIDI (Musical Instrument Digital Interface)
.mov video/quicktime QuickTime
.mp3 audio/x-mp3 MPEG Audio Layer-3
.mp4 audio/mp4 MPEG-4 Audio
.mp4 video/mp4 MPEG-4 Video
.mpeg video/mpeg MPEG-2 Video
.smil application/smil Synchronized Multimedia Integration Language
.swf application/x-shockwave-flash Macromedia Flash

Depending on the viewer, other types of multimedia may be supported too, but these are the ones listed in appendix H of the PDF specification.

Adding a movie with iText is done with a screen annotation. You can use the createScreen() method to add an annotation that refers to an external file, or you can embed the file as is done next.

Listing 16.15. MovieAnnotation.java
PdfFileSpecification fs = PdfFileSpecification
    .fileEmbedded(writer, RESOURCE, "foxdog.mpg", null);
writer.addAnnotation(PdfAnnotation.createScreen(writer,
    new Rectangle(200f, 700f, 400f, 800f), "Fox and Dog", fs,
    "video/mpeg", true));

The constant value RESOURCE contains the path to the file that needs to be embedded; foxdog.mpg is the name that will be used inside the PDF.

 

Note

The viewer will warn you about possible security hazards before you can play a movie or any other multimedia file, because one never knows if the file contains a Trojan horse. (I’m not referring to a wooden construction concealing Brad Pitt.)

 

You can also add sound with a sound annotation, but currently there are no convenience methods in iText to do this. If you need to embed an .au file, you’ll have to create PdfDictionary objects describing a sound object, a sound annotation, and possibly a sound action. The same goes for 3D annotations.

In the next section, we’ll learn how to create specific objects that are described in ISO-32000-1, but for which there are no convenience classes or methods.

16.3.2. 3D annotations

The 3D Industry Forum is a special consortium that brought together a diverse group of companies and organizations, including Adobe Systems, HP, and Intel. They’ve developed a format named the Universal 3D (U3D) file format, a compressed file format for 3D computer graphics. The format was standardized by Ecma International in 2005.

This format is natively supported by the PDF format. 3D objects in U3D format can be inserted into PDF documents and interactively visualized by Adobe Reader 7.0 and higher. This is done with a 3D annotation that provides a virtual camera through which the artwork is viewed. Figure 16.6 shows a 3D image of a teapot. You can change the view of the object in the PDF by using the mouse and the 3D controls in the bar on top of the annotation.

Figure 16.6. Document with a 3D annotation

To produce a PDF like the one shown in figure 16.6, you need to use basic PdfObject classes to create a 3D stream object, a 3D view dictionary, and a 3D annotation, as is done in listing 16.16.

Listing 16.16. Pdf3D.java

In listing 16.16, you first create a PdfStream using a FileInputStream that allows iText to read a U3D file. You add the keys /Type and /Subtype to the stream dictionary to indicate that you’re creating a /3D stream of type /U3D; you do this because it’s described that way in ISO-32000-1. You then compress the stream and add the compressed stream to the body of the PDF file with the addToBody() method.

 

Note

If you create a PdfStream by passing an array of bytes, the stream object can immediately determine the length of the stream. This length will change when you invoke flateCompress(). In this example, you’re creating the stream using a FileInputStream, and iText doesn’t know the length of the stream until after the stream has been written to the body, so iText will create an indirect reference for the value of the /Length key. It’s up to you to write the object with the actual value to the body once the length is known. This is done with the writeLength() method.

 

When you add the stream to the body, you obtain the indirect reference streamObject.
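To see why the value of /Length can only be written once compression has finished, consider the following minimal sketch. It uses only the JDK's java.util.zip.Deflater, not iText, and the class and method names are made up for this illustration. The sample content is an arbitrary repeated operator sequence.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class StreamLengthSketch {

    /** Compresses bytes with zlib/deflate, the algorithm behind the /FlateDecode filter. */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Arbitrary, highly repetitive sample content (hypothetical data).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("0 0 0 rg ");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.US_ASCII);
        byte[] compressed = deflate(raw);
        // The number that belongs in /Length is the compressed size,
        // which isn't known until all the bytes have been processed.
        System.out.println("Uncompressed length: " + raw.length);
        System.out.println("Compressed length:   " + compressed.length);
    }
}
```

When the bytes come from an InputStream that is consumed while the stream is being written, the compressed size is only available at the very end, which is why an indirect reference is a convenient placeholder for the /Length value.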

You also need a 3D view dictionary. In this dictionary, you can specify parameters for the virtual camera associated with a 3D annotation: the orientation and position of the camera, details regarding the projection of camera coordinates, and so on.

In listing 16.16, you define an external (/XN) and an internal (/IN) name. The matrix system (/MS) indicates that you’ll specify a matrix (/M) using a Camera to World entry (/C2W). This is a 12-element 3D transformation matrix that specifies a position and orientation of the camera in world coordinates. The /CO value is a non-negative number indicating a distance in the camera coordinate system along the Z axis to the center of orbit for this view. For the complete description of these values, and of the other options that are available, please read section 13.6 of ISO-32000-1.
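The following sketch may help you picture what the /C2W and /CO values mean. It assumes the row-vector convention for 3D transformation matrices, in which a point [x y z 1] is multiplied by the matrix [a b c d e f g h i tx ty tz]; please verify this against section 13.6 of ISO-32000-1 before relying on it. The matrix values, the class name, and the method name are illustrative only and aren't taken from listing 16.16.

```java
import java.util.Arrays;

public class CameraToWorldSketch {

    /**
     * Applies a 12-element matrix [a b c d e f g h i tx ty tz] to a point,
     * assuming the row-vector convention [x y z 1] x M.
     */
    static double[] transform(double[] m, double x, double y, double z) {
        if (m.length != 12) {
            throw new IllegalArgumentException("A 3D matrix needs 12 elements");
        }
        return new double[] {
            x * m[0] + y * m[3] + z * m[6] + m[9],
            x * m[1] + y * m[4] + z * m[7] + m[10],
            x * m[2] + y * m[5] + z * m[8] + m[11]
        };
    }

    public static void main(String[] args) {
        // Hypothetical camera-to-world matrix and center-of-orbit distance.
        double[] c2w = { 1, 0, 0, 0, 0, -1, 0, 1, 0, 3, -235, 28 };
        double co = 235;
        // The camera sits at the origin of its own coordinate system.
        double[] cameraPosition = transform(c2w, 0, 0, 0);
        // The center of orbit lies at distance co along the camera's Z axis.
        double[] centerOfOrbit = transform(c2w, 0, 0, co);
        System.out.println(Arrays.toString(cameraPosition)); // [3.0, -235.0, 28.0]
        System.out.println(Arrays.toString(centerOfOrbit));  // [3.0, 0.0, 28.0]
    }
}
```

With these sample values, the last three elements of the matrix are the position of the camera in world coordinates, and the /CO distance places the point the camera orbits around at (3, 0, 28).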

You add the 3D view dictionary to the body with the addToBody() method, just as you did with the 3D stream. You obtain the indirect reference dictObject. Finally, you create a 3D annotation as you did before in listing 7.20. You can consult ISO-32000-1 to find out which keys are required in the annotation dictionary, and you add the annotation to the PDF document using the addAnnotation() method.

3D is hot, and it will probably become even hotter, because Adobe’s supplement to ISO-32000-1 and Acrobat 9 came with plenty of new features for 3D. If you need 3D functionality, please check itextpdf.com to find out if 3D classes have been added to iText before you add 3D streams the hard way—as described in listing 16.16.

We’ll conclude this chapter with an example of brand new functionality described in the supplement to ISO-32000-1 by Adobe (extension level 3).

16.3.3. Embedding Flash into a PDF

You can embed a Flash application (a .swf file) into a PDF document using a movie annotation as described in section 16.3.1. This works well for a Flash movie, but if you want to embed a Flash application, you’ll discover that the interactive features are rather limited. If you want to take advantage of all the functionality of a Flash application, you’ll need to embed the .swf file as a rich media annotation. This was the case for the PDF shown in figure 16.7.

Figure 16.7. Integrating a Flash application in a PDF document

The combo box with dates, the button for selecting a day, and the table listing screenings on a particular day are all part of a Flash application written in Flex.

Writing a Flex Application

This listing shows the source code of the Flex application.

Listing 16.17. FestivalCalendar1.mxml

This listing is easy to understand even if you’ve never written a Flex application. There are two methods written in ActionScript inside the Script tag. The object defined with the HTTPService tag will be responsible for making a connection to a site and retrieving data about screenings in the form of an XML file (resultFormat="e4x").

The layout of the UI is defined using a Grid containing two GridRows. The first row has two GridItems: a ComboBox with days ranging from 2011-10-12 to 2011-10-19, and a Button with the label "Select day". The item in the second row has colspan 2, and contains a DataGrid with five DataGridColumns: Time, Location, Duration, Title, and Year. The data provider for this data grid is the last result of the HTTPService with id screeningsService.

This .mxml file was compiled into an .swf file using Flex Builder. This .swf file can be embedded into an HTML file, but you’re going to integrate it into a PDF document.

Fetching XML Data From a Server

This example shows the XML file that is fetched by this service for October 12.

Listing 16.18. http://flex.itextpdf.org/fff/day_2011-10-12.xml
<day date="2011-10-12">
  <screening>
    <location>GP.3</location>
    <time>09:30:00</time>
    <duration>98</duration>
    <title>The Counterfeiters</title>
    <year>2007</year>
  </screening>
  <screening>
    <location>GP.3</location>
    <time>11:30:00</time>
    <duration>120</duration>
    <title>Give It All</title>
    <year>1998</year>
  </screening>
  ...
</day>

You’ve indicated that you’re looking for screening nodes in the dataProvider of the DataGrid. As a result, the data grid will have a line for every screening tag in the XML, containing the contents of the dataField defined in the DataGridColumn.
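If you aren't familiar with Flex, the mapping from screening tags to rows can be sketched in plain Java. This isn't how the DataGrid is implemented; it's only an illustration using the JDK's DOM parser, with hypothetical class and method names, and it assumes that every screening element contains all five fields.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ScreeningRowsSketch {

    /** Returns one row per screening element, with cells in the order of the grid columns. */
    static List<String[]> toRows(String xml) {
        String[] fields = { "time", "location", "duration", "title", "year" };
        List<String[]> result = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList screenings = doc.getElementsByTagName("screening");
            for (int i = 0; i < screenings.getLength(); i++) {
                Element screening = (Element) screenings.item(i);
                String[] row = new String[fields.length];
                for (int j = 0; j < fields.length; j++) {
                    // Assumes each field occurs exactly once per screening.
                    row[j] = screening.getElementsByTagName(fields[j])
                        .item(0).getTextContent();
                }
                result.add(row);
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read the XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<day date=\"2011-10-12\"><screening>"
            + "<location>GP.3</location><time>09:30:00</time>"
            + "<duration>98</duration><title>The Counterfeiters</title>"
            + "<year>2007</year></screening></day>";
        for (String[] row : toRows(xml)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```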

To make this work, you need to put XML files for every date in the combo box in the appropriate place on your web server, but this may not be sufficient. This will work for HTML and .swf files that are hosted on the same domain as the data files, but it won’t work in a PDF that is opened on somebody’s local machine. The Flash player that runs the Flash application, whether in a browser or in a PDF viewer, operates in a secure sandbox. This sandbox will prevent the application from accessing the user’s filesystem, and from fetching data from a remote website.

In this case, the Flex application won’t be allowed to access the XML files outside the domain to which the application is deployed, unless the owner of the site where the XML files reside allows it. If you open a PDF containing this Flex application locally, you are not on the http://flex.itextpdf.org/ domain, and Adobe Reader will open a dialog box with the following security warning:

The document is trying to connect to http://flex.itextpdf.org/crossdomain.xml. If you trust the site, choose Allow. If you do not trust the site, choose Block.

The next bit of code shows the contents of the crossdomain.xml file that I had to put at the root of the flex.itextpdf.org domain in order to grant access to any Flex application from any domain.

Listing 16.19. crossdomain.xml
<?xml version="1.0"?>
<!DOCTYPE cross-domain-policy
  SYSTEM "http://www.adobe.com/xml/dtds/cross-domain-policy.dtd">
<cross-domain-policy>
  <site-control permitted-cross-domain-policies="all"/>
  <allow-access-from domain="*" />
  <allow-http-request-headers-from domain="*" headers="*"/>
</cross-domain-policy>

If such a file isn’t there, or if it doesn’t allow everyone access, the Flex application won’t be able to retrieve the data.
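The decision that is based on the allow-access-from entries can be approximated as follows. This is a simplified sketch, not the actual logic of the Flash player: it only looks at the domain attribute, it supports the * wildcard and a *. prefix, and it ignores the other elements in the policy. The class and method names are hypothetical, and the sample policy omits the DOCTYPE so that the parser doesn't try to fetch the DTD.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class CrossDomainPolicySketch {

    /** Returns true if any allow-access-from entry in the policy matches the domain. */
    static boolean allows(String policyXml, String requestingDomain) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(policyXml.getBytes(StandardCharsets.UTF_8)));
            NodeList entries = doc.getElementsByTagName("allow-access-from");
            for (int i = 0; i < entries.getLength(); i++) {
                String pattern = ((Element) entries.item(i)).getAttribute("domain");
                if (matches(pattern, requestingDomain)) {
                    return true;
                }
            }
            return false;
        } catch (Exception e) {
            // An unreadable policy is treated like a missing one: no access.
            return false;
        }
    }

    /** Simplified matching: "*" matches everything, "*.x" matches subdomains of x. */
    static boolean matches(String pattern, String domain) {
        if (pattern.equals("*")) {
            return true;
        }
        if (pattern.startsWith("*.")) {
            return domain.endsWith(pattern.substring(1));
        }
        return pattern.equalsIgnoreCase(domain);
    }

    public static void main(String[] args) {
        String policy = "<cross-domain-policy>"
            + "<allow-access-from domain=\"*\" />"
            + "</cross-domain-policy>";
        System.out.println(allows(policy, "localhost")); // true
    }
}
```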

Even with the crossdomain.xml file in place, Adobe Reader will show a security warning every time an XML file (for instance, http://flex.itextpdf.org/fff/day_2011-10-12.xml) is fetched, unless you check the Remember My Action for This Site check box.

 

Note

Most SWF files that can be found on the market are written to be embedded in HTML files. In theory, you can embed all these files in a PDF document. However, if the SWF files were created using Flex Builder, you may experience problems when zooming in and out, or when printing a page that has a rich media annotation. These problems are caused by the default scale mode. To avoid them, you need to change the scale mode as is done in listing 16.17: stage.scaleMode = StageScaleMode.EXACT_FIT. This is important if you buy a Flash component that was written using Flex Builder and that was intended for use in HTML. You need to make sure the vendor has taken this into account if you want to use the .swf in a PDF document.

 

Now that you know how the Flex application was written, let’s look at how you can integrate it into a PDF document.

Rich Media Annotations

Rich media annotations aren’t part of ISO-32000-1. Support for these annotations was added by Adobe in PDF 1.7 extension level 3. In this case, it isn’t sufficient to change the version number to 1.7 with setPdfVersion(); you also have to set the extension level with the addDeveloperExtension() method. You can do this more than once if you’re using extensions from different companies.

The method expects an instance of the PdfDeveloperExtension class. In listing 16.20, you use the static final object ADOBE_1_7_EXTENSIONLEVEL3. This value was created like this:

new PdfDeveloperExtension(PdfName.ADBE, PdfWriter.PDF_VERSION_1_7, 3)

The first parameter refers to the developing company. The second parameter indicates for which PDF version the extension was written. Finally, you pass in the number of the extension level as an int.

Listing 16.20. FestivalCalendar1.java

The RichMediaAnnotation class isn’t a subclass of PdfAnnotation, but it can create such an object using the method createAnnotation(). The rich media annotation dictionary contains two important entries: a /RichMediaContent dictionary and a /RichMediaSettings dictionary. These dictionaries are created internally by iText.

The RichMediaContent dictionary consists of the assets, the configuration, and the views:

  • Assets— These are stored as a name tree with embedded file specifications. You can use the addAsset() method to add entries to this name tree.
  • Configuration— This is an array of RichMediaConfiguration objects. Such an object contains an array of RichMediaInstance objects.
  • Views— This is an array of 3D view dictionaries, in case the rich media annotation contains a 3D stream. See listing 16.16.

The RichMediaConfiguration dictionary describes a set of instances that are loaded for a given scene configuration. In this example, you use a rich media annotation to embed a Flash application, but you can also use such an annotation for 3D, sound, or video objects.

The constructors of the RichMediaConfiguration and RichMediaInstance classes accept the following parameters:

  • PdfName._3D: for 3D objects
  • PdfName.FLASH: for Flash objects
  • PdfName.SOUND: for sound objects
  • PdfName.VIDEO: for video objects

A RichMediaInstance dictionary describes a single instance of an asset with settings to populate the artwork of an annotation. In this example, you only have one Flash instance, for which you define /FlashVars: &day=2011-10-13. The day variable is retrieved in listing 16.17 with Application.application.parameters.day.
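The /FlashVars value is a string of name-value pairs separated by ampersands. The following plain-Java sketch shows how such a string breaks down into parameters; it isn't the parser used by the Flash player, it doesn't handle URL-encoded characters, and the class and method names are made up for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlashVarsSketch {

    /** Splits a FlashVars string such as "&day=2011-10-13" into name-value pairs. */
    static Map<String, String> parse(String flashVars) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : flashVars.split("&")) {
            if (pair.isEmpty()) {
                continue; // skips the empty piece before a leading ampersand
            }
            int eq = pair.indexOf('=');
            if (eq < 0) {
                params.put(pair, "");
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = parse("&day=2011-10-13");
        System.out.println(params.get("day")); // 2011-10-13
    }
}
```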

 

Note

If you want to reuse the RichMediaContent dictionary in more than one rich media annotation, you have to create the first RichMediaAnnotation as is done in listing 16.20. You can then get a reference to the RichMediaContent dictionary with the getRichMediaContentReference() method, and use this reference as an extra parameter for the RichMediaAnnotation constructor.

 

Rich media annotations can be active or inactive. The RichMediaSettings dictionary stores conditions and responses that determine when the annotation should be activated and deactivated. iText creates this dictionary automatically, just like the RichMediaContent dictionary. It can contain a RichMediaActivation dictionary that is set with the method setActivation(), and a RichMediaDeactivation dictionary set with setDeactivation(). Listing 16.20 uses the default activation and deactivation conditions.

The possible conditions for activation—set with setCondition()—are:

  • PdfName.XA: The annotation is explicitly activated by a user action or script; this is the default.
  • PdfName.PO: The annotation is activated as soon as the page that contains the annotation receives focus as the current page.
  • PdfName.PV: The annotation is activated as soon as any part of the page that contains the annotation becomes visible.

These are the possible conditions for deactivation—also set with setCondition():

  • PdfName.XD: The annotation is explicitly deactivated by a user action or script; this is the default.
  • PdfName.PC: The annotation is deactivated as soon as the page that contains the annotation loses focus as the current page.
  • PdfName.PI: The annotation is deactivated as soon as the entire page that contains the annotation is no longer visible.

In the RichMediaActivation dictionary, you can also add keys to specify the animation, the view, presentation, and scripts.

This first “Flash in PDF” example is cool because you have a PDF document that presents data to the end user that isn’t part of the PDF document. I know from experience that the schedule of screenings at a film festival can change at any moment, because the film stock didn’t arrive on time or some other reason. By using this Flash application written in Flex, the document can always show the most recent information fetched from the official film festival website.

You could use the same technique to get the most recent items and prices to complete an order form in PDF. To achieve this, you’d need to establish communication between the embedded Flex application and the PDF document.

16.3.4. Establishing communication between Flex and PDF

Figure 16.8 shows a series of widget annotations (PDF), and one rich media annotation (Flash). The sentence, “This is the festival program for 2011-10-14”, is shown using a read-only text field. The text is updated using a JavaScript method that is triggered by the rich media annotation. The buttons with the different dates call a RichMediaExecuteAction that executes an ActionScript method in the Flash application.

Figure 16.8. Communication between PDF and Flash

The Flex application, in the next listing, is different from the previous example.

Listing 16.21. FestivalCalendar2.mxml

You’ll recognize the HTTP service that will get the XML files from http://flex.itextpdf.org/ and the data grid that visualizes the XML data. You no longer need the combo box and the button, because you’re going to change the data from outside the Flex application.

You import the flash.external.* package, because you’re going to use the ExternalInterface object. With the addCallback() method, you make the ActionScript method getDateInfo() in the Flex application available for external applications. In this method, you use the call() method to invoke the JavaScript showDate() method that is supposed to be present in the PDF document.

The showDate() JavaScript method in the PDF is very simple:

function showDate(txt) {
  this.getField("date").value
    = "This is the festival program for " + txt;
}

It gets the text field with the name date, and it changes the value of this field so that it corresponds with the date for which the screenings are shown.

The first part of this next listing should look familiar.

Listing 16.22. FestivalCalendar2.java

In the second half of listing 16.22, you add a series of buttons to the document, creating a RichMediaExecuteAction for each button. The action will be triggered on the rich media annotation for which you pass the indirect reference. You also pass a RichMediaCommand.

The name of the action is a PDF string that corresponds to the string you used as the parameter of the ExternalInterface.addCallback() method in the Flex application. The argument can be a PdfString, PdfNumber, PdfBoolean, or PdfArray containing those objects.

You also add a text field named date. When you click one of the PDF buttons, the getDateInfo() method will be called, an XML file containing screenings will be fetched from the internet, filling the data grid, and the Flex application will trigger the showDate() JavaScript method to change the value of the date field.

Although this is a very simple example, the techniques that are used can apply to many different types of applications. You could use these techniques to integrate fancy Flash buttons that trigger functions in a PDF file, or you could embed a Flex application to establish client-server communication to retrieve the most recent data. But don’t forget that this functionality is very new: it only works with the most recent versions of Adobe Reader!

16.4. Summary

In this final chapter, we’ve looked at different kinds of PDF streams. We started with streams that hold an image or a font and looked at the way iText creates low-level objects that are responsible for writing such a stream to the OutputStream. You also learned how to replace such a stream.

Then we moved on to a type of annotation we encountered in chapter 7: a file attachment annotation. You discovered that there’s a difference between file attachments that are added as annotations, and file attachments that are stored at the document level as embedded files. This difference matters if you want to extract files from a PDF document. Files that are embedded at the document level can be organized into a portable collection, aka a portfolio.

Finally, we discussed multimedia files. You added annotations containing a movie file and a 3D stream, and you also used a very new type of annotation that isn’t part of ISO-32000 yet.

With a rich media annotation, you were able to integrate a Flash application into a PDF document, and establish communication between the ActionScript in the Flash application and the JavaScript in the PDF document. Those were the last examples of this book.

What You’ve Learned From This Book

Let’s return to the first image in chapter 1 and quickly review everything you’ve learned. Figure 16.9 shows three main areas.

Figure 16.9. Overview of the PDF functionality that was covered

Creating PDFs

Chapter 1 provided a short introduction. In chapters 2 and 4 you learned to create PDF documents from scratch using high-level objects. You did the same using low-level functionality in chapters 3 and 5. These first five chapters formed part one of the book.

Essential skills concerning PDF creation were explained in part three. In chapter 9, you learned to create documents on the fly from a web application. We focused on color and images in chapter 10, and on fonts in chapter 11. Chapter 12 was about encrypting and signing documents.

For advanced users, there’s also part four, which explains the inner workings of iText and PDF. Chapter 14 will remain especially interesting as a reference for developers who frequently need to add content to a document using low-level methods.

Most of the PDF files in this book were generated using data from a database, but in some cases you converted an XML or an HTML file to PDF. For instance in chapter 9, we talked about using the HTMLWorker class to convert HTML snippets; in chapter 11 you converted an XML file containing the word “peace” in many different languages into a PDF document.

On occasion, we looked at creating a PDF document manually, using Open Office rather than using iText, such as in chapter 6. In chapter 8, you used Adobe Acrobat and LiveCycle Designer. These files were created for the purpose of updating them, and that takes us from the Read block in figure 16.9 to the Update block.

Updating PDFs

Part two of this book was titled “Manipulating existing PDF documents.” Chapter 6 presented an overview of all the PDF manipulation classes available in iText. You always needed a PdfReader instance to access an existing document. You learned how to split and merge PDF documents with PdfCopy, PdfSmartCopy, PdfCopyFields, and even using PdfImportedPage objects, but the class you used the most was PdfStamper, which was initially written to stamp extra information on an existing document.

In chapter 7, you used PdfStamper to add different types of annotations. This functionality is also useful when creating a document from scratch; for instance, to add links that allow the end user to navigate from one page to another, or from one document to another. Along the way, we talked about bookmarks, actions, and destinations.

Chapter 8 was dedicated entirely to forms: you worked with forms built using the AcroForm technology, and with XFA forms. iText has almost complete support for AcroForms, but as soon as you have a form involving the XML Forms Architecture, the possibilities are limited. For instance, iText can’t flatten an XFA form (yet).

Signing and encrypting existing PDF documents was discussed in chapter 12. Converting a PDF document to another format turned out to be very difficult, but in some cases, you can extract an XML version of the complete document, or extract plain text from a page.

Reading PDFs

iText isn’t a PDF viewer, nor can iText be used to print a PDF, but the PdfReader class can give you access to the objects that form a PDF document. The different types of objects that are defined in the PDF specification were listed in chapter 13, where you also had a closer look at the root object of a PDF document.

Chapter 14 dealt with the imaging system and the way the content of a page in a PDF document is organized. We continued studying the content stream of a page in chapter 15, looking for ways to add structure. You found out that a PDF can be read out loud if marked content has been added to improve the accessibility of the document. You also learned how to convert a PDF to XML if the PDF was tagged and contains a structure tree. At the end of the chapter, we made a fair attempt at parsing the content of a page to plain text.

Finally, in chapter 16 you learned more about streams. You even wrote a Flex application for use in a PDF document.

This isn’t the first book I’ve written, and based on my previous experience, I know that one can never have enough documentation. But with this book, you have a comprehensive overview of what is possible with PDF in general (the different topics listed in figure 16.9) and with iText in particular (the topics marked with the iText logo).

Sure, writing a book is a lot of work, but I also had a lot of fun writing new material for this second edition: creating the movie database, making my first dynamic PDF using LiveCycle Designer, learning Flex for the sole purpose of creating a PDF containing a rich media annotation, and inventing many other new examples that weren’t in the first edition.

I hope you’ve enjoyed reading this book as much as I enjoyed writing it.

May the source be with you!
