Chapter 12. Protecting your PDF

This chapter covers

Providing metadata
Compressing and decompressing PDFs
Encrypting documents
Adding digital signatures

You have created many different documents containing data, such as movies, directors, and movie screenings taken from a database, but you haven’t added any information about the owner of this data. You could make sure that people find out who created the document by adding metadata.

You’ve also peeked inside some of the PDF files you created, and you’ve seen that the content of a document is compressed by default. You could use iText to decompress content streams to read the PDF syntax that makes up a page or a form XObject.

For confidential documents, you’ll want to protect the document. To achieve this, we’re going to discuss how to encrypt content streams. You can do this using a password, or you can encrypt a PDF using a public key. Only the person who owns the corresponding private key will be able to open the document.

Digital signatures work the other way around: you sign a document using your private key, and whoever reads your document can use your public key (or the root certificate of a certificate authority) to make sure the document wasn’t forged by somebody else.

But let’s start at the beginning, by adding metadata.

12.1. Adding metadata

There are two ways to store metadata inside a PDF document. The original way was to store a limited number of keys and values in a special dictionary; a newer way is to embed the data as an XML stream inside the PDF. Let’s discuss both to find out the difference.

12.1.1. The info dictionary

In figure 12.1, the document properties from the Hello World example you made in chapter 1 are compared to a new Hello World example with metadata added.

Figure 12.1. Metadata in PDF files

The metadata shown in the window to the right was added using this code:

Listing 12.1. MetadataPdf.java

document.addTitle("Hello World example");
document.addAuthor("Bruno Lowagie");
document.addSubject("This example shows how to add metadata");
document.addKeywords("Metadata, iText, PDF");
document.addCreator("My program using iText");

This code snippet adds the title of the document, its author, the subject, some keywords, and the application that was used to create the PDF as metadata. If you look inside the PDF, you see that this information is stored in a dictionary, named the info dictionary, along with the creation date, modification date, and PDF producer. This is the limited set of metadata key-value pairs that is supported in PDF.

Three metadata entries are filled in automatically by iText (and you can’t change them). If you create a PDF from scratch, iText will use the time on the clock of your local computer as the creation and modification date. If you manipulate a PDF with PdfStamper, only the modification date will be changed. The same goes for the producer name.

Listing 12.2. MetadataPdf.java

With the getInfo() method, you can retrieve the keys and values as Strings. You can add, remove, or replace entries in the HashMap, and put the altered metadata in the PDF using setMoreInfo().

FAQ

Can I change the producer info? The value for the PDF producer tells you which version of iText was used to create the document. It’s also a way to tell the end users of the document that iText was used to create it. You can’t change this without breaking the software license that allows you to use iText for free.

A dictionary is a PDF object, and the values that are stored in this dictionary are also PDF objects. PDF viewers such as Adobe Reader don’t have any problem interpreting these objects, but applications that aren’t PDF-aware can’t find or read this meta-information. The Extensible Metadata Platform (XMP) was introduced to solve this problem.

12.1.2. The Extensible Metadata Platform (XMP)

The Extensible Metadata Platform provides a standard format for the creation, processing, and interchange of metadata. An XMP stream can be embedded in a number of popular file formats (TIFF, JPEG, PNG, GIF, PDF, HTML, and so on) without breaking their readability by non-XMP-aware applications.

The XMP specification defines a model that can be used with any defined set of metadata items. It also defines particular schemas; for instance, the Dublin Core schema provides a set of commonly used properties such as the title of the document, a description, and so on. For PDF files, there’s a PDF schema with information about the keywords, the PDF version, and the PDF producer. This way, an application that can’t interpret PDF syntax can still extract the metadata from the file by detecting and parsing the XML that is embedded inside the PDF. What follows is an example of such an XMP metadata stream.

Listing 12.3. xmp.xml

<?xpacket begin="﻿" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description
      rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:format>application/pdf</dc:format>
      <dc:description><rdf:Alt>
        <rdf:li>This example shows how to add metadata</rdf:li>
      </rdf:Alt></dc:description>
      <dc:subject><rdf:Bag>
        <rdf:li>This example shows how to add metadata</rdf:li>
      </rdf:Bag></dc:subject>
      <dc:title><rdf:Alt>
        <rdf:li>Hello World example</rdf:li>
      </rdf:Alt></dc:title>
      <dc:creator><rdf:Seq>
        <rdf:li>Bruno Lowagie</rdf:li>
      </rdf:Seq></dc:creator>
    </rdf:Description>
    <rdf:Description rdf:about="" xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
      <pdf:Producer>iText 5.0.1 (c) 1T3XT BVBA</pdf:Producer>
      <pdf:keywords>Metadata, iText, PDF</pdf:keywords>
    </rdf:Description>
    <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <xmp:CreateDate>2010-01-22T16:31:00+01:00</xmp:CreateDate>
      <xmp:ModifyDate>2010-01-22T16:31:01+01:00</xmp:ModifyDate>
      <xmp:CreatorTool>My program using iText</xmp:CreatorTool>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>

This stream was created with iText using the XmpWriter class. The following bit of code shows how to add an XMP stream as metadata.

Listing 12.4. MetadataXmp.java

You pass the byte[] created with XmpWriter to the setXmpMetadata() method to add the stream to the PdfWriter. This XMP stream covers the complete document. It’s also possible to define an XMP stream for individual pages. In that case you need to use the setPageXmpMetadata() method.

You can delegate the creation of the XMP stream to iText. Just create the metadata as done in listing 12.1, and add the following line:

writer.createXmpMetadata();

Suppose you have a PDF file that only contains metadata in an info dictionary. In that case, you can use the following to add an XMP stream.

Listing 12.5. MetadataXmp.java

Extracting the XMP metadata from an existing PDF is done using the getMetadata() method on a PdfReader instance.

Tools or applications that aren’t PDF-aware will search through the file for an xpacket with the id shown in listing 12.3, so it’s important that the stream containing the XMP metadata is never compressed.
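Here is a minimal sketch of what such a non-PDF-aware tool might do, using only the JDK. It scans the raw bytes of a file for the packet id from listing 12.3, cuts out the xpacket, and reads the dc:title with a standard XML parser. The class and method names (XmpPacketScanner, findPacket, firstTitle) are hypothetical and not part of iText. A real tool would have to deal with more cases, such as files containing several packets.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not an iText class.
final class XmpPacketScanner {
    // The packet id shown in listing 12.3.
    static final String PACKET_ID = "W5M0MpCehiHzreSzNTczkc9d";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Returns the first xpacket (header through trailer) in the raw bytes, or null. */
    static String findPacket(byte[] fileBytes) {
        // ISO-8859-1 maps one byte to one char, so string offsets equal byte offsets.
        String raw = new String(fileBytes, StandardCharsets.ISO_8859_1);
        int idPos = raw.indexOf(PACKET_ID);
        if (idPos < 0) return null;
        int start = raw.lastIndexOf("<?xpacket begin=", idPos);
        int trailer = raw.indexOf("<?xpacket end=", idPos);
        int close = trailer < 0 ? -1 : raw.indexOf("?>", trailer);
        if (start < 0 || close < 0) return null;
        // Decode only the packet itself as UTF-8 (assumed encoding for this sketch).
        return new String(Arrays.copyOfRange(fileBytes, start, close + 2),
                StandardCharsets.UTF_8);
    }

    /** Returns the first rdf:li inside dc:title, or null if there is none. */
    static String firstTitle(String packet) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(packet)));
            NodeList titles = doc.getElementsByTagNameNS(DC_NS, "title");
            if (titles.getLength() == 0) return null;
            NodeList items = ((Element) titles.item(0)).getElementsByTagNameNS(RDF_NS, "li");
            return items.getLength() == 0 ? null : items.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XMP packet is not well-formed XML", e);
        }
    }

    /** A shortened version of listing 12.3, used as test data. */
    static String samplePacket() {
        return "<?xpacket begin=\"\uFEFF\" id=\"" + PACKET_ID + "\"?>\n"
                + "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
                + "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\">"
                + "<rdf:Description rdf:about=\"\" xmlns:dc=\"" + DC_NS + "\">"
                + "<dc:title><rdf:Alt><rdf:li>Hello World example</rdf:li></rdf:Alt></dc:title>"
                + "</rdf:Description></rdf:RDF></x:xmpmeta>\n"
                + "<?xpacket end=\"w\"?>";
    }

    public static void main(String[] args) {
        // Stand-in for a real file: some PDF-like bytes around an uncompressed packet.
        byte[] file = ("%PDF-1.4\n" + samplePacket() + "\n%%EOF")
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(firstTitle(findPacket(file)));
    }
}
```

The scan only works because the packet is stored as plain text. If the stream were compressed or encrypted, the id would never show up in the raw bytes.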

12.2. PDF and compression

iText will never compress an XMP metadata stream; all other content streams are compressed by default. You’ve already used the setCompressionLevel() method for the Image and BaseFont classes; you can also use it for PdfWriter to set the compression level for the other stream objects that are written to the OutputStream.

12.2.1. Compression levels

The next example uses different techniques to change the compression settings of a newly created PDF document.

Listing 12.6. HelloWorldCompression.java

The Document class has a static member variable, compress, that can be set to false if you want to avoid having iText compress the content streams of pages and form XObjects. Use this for debugging purposes only! It changes the behavior of iText for the whole JVM, and that’s not a good idea because it will also affect PDF documents created in other processes using the same JVM.

One option in listing 12.6 uses the method setFullCompression(). In the resulting PDF file, content streams will be compressed, but so will some other objects, such as the cross-reference table. This is only possible since PDF version 1.5. This is an example where iText will change the version number in the PDF header automatically from PDF-1.4 to PDF-1.5.

Table 12.1 compares the file sizes of the PDFs produced with listing 12.6.

Table 12.1. PDF and compression

Option	File size	Percentage
Without any compression (Document.compress = false)	43,237 bytes	99.23%
Compression level 0	43,567 bytes	100.00%
Default compression	12,066 bytes	27.70%
Compression level 9	11,943 bytes	27.41%
Full compression	9,836 bytes	22.58%

As you can see, compressing as many objects as possible is the most effective option in this example, but be aware that the compression percentage largely depends on the type of content in the document.
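To get a feel for the first four rows of table 12.1 without iText, you can run the same kind of comparison on a single stream with the JDK's java.util.zip.Deflater, which implements the Deflate algorithm that Flate-compressed PDF streams are based on. This is only a sketch: the class FlateLevels and its sample content are made up for illustration, and the numbers it prints won't match the table.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical demo class, not part of iText.
final class FlateLevels {
    /** Deflates the data at the given level and returns the compressed size in bytes. */
    static int compressedSize(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    /** Repetitive text that resembles the operators in a page content stream. */
    static byte[] sampleContent() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("BT /F1 12 Tf 36 ").append(806 - i)
              .append(" Td (Hello World) Tj ET\n");
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] data = sampleContent();
        System.out.println("Uncompressed: " + data.length);
        System.out.println("Level 0:      " + compressedSize(data, Deflater.NO_COMPRESSION));
        System.out.println("Default:      " + compressedSize(data, Deflater.DEFAULT_COMPRESSION));
        System.out.println("Level 9:      " + compressedSize(data, Deflater.BEST_COMPRESSION));
    }
}
```

Level 0 wraps the data in the compressed format without shrinking it, so the result is slightly larger than the input. That matches the table, where compression level 0 produces a bigger file than no compression at all.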

12.2.2. Compressing and decompressing existing files

Next you’ll see how to compress the content streams of the pages in an existing file.

Listing 12.7. HelloWorldCompression.java

PdfReader reader = new PdfReader(src);
// ask for a PDF-1.5 header up front; full compression requires it
PdfStamper stamper = new PdfStamper(
  reader, new FileOutputStream(dest), PdfWriter.VERSION_1_5);
stamper.getWriter().setCompressionLevel(9);
int total = reader.getNumberOfPages() + 1;
for (int i = 1; i < total; i++) {
  // putting the content back makes iText treat the stream as changed
  reader.setPageContent(i, reader.getPageContent(i));
}
stamper.setFullCompression();
stamper.close();

You want the PdfStamper class to create a file with header PDF-1.5, because you’re using the setFullCompression() method after the header has been written to the OutputStream. That’s why you add the PDF version number to the parameter list of the PdfStamper constructor.

Decompressing can be done exactly the same way by setting the compression level to zero, or by using the following code.

Listing 12.8. HelloWorldCompression.java

PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
// Document.compress affects the whole JVM, so always restore it
Document.compress = false;
try {
  int total = reader.getNumberOfPages() + 1;
  for (int i = 1; i < total; i++) {
    // putting the content back makes iText treat the stream as changed
    reader.setPageContent(i, reader.getPageContent(i));
  }
  stamper.close();
} finally {
  Document.compress = true;
}

The result is a document whose PDF syntax can be seen in the content streams of each page when opened in a text editor. This can be handy when you need to debug a PDF document. We’ll take a closer look inside the content stream of a PDF in part 4.
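If you only want to see what decompression does to a single stream, the following sketch shows the round trip with the JDK's Deflater and Inflater classes. It assumes the stream uses plain Flate compression without extra filters or predictors. The class FlateRoundTrip is hypothetical, and the content string is a hand-written stand-in for a page content stream.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class, not part of iText.
final class FlateRoundTrip {
    /** Compresses the data the way a Flate-encoded stream is stored. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Restores the readable bytes from Flate-compressed data. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No output and no way to continue means the input is incomplete.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported Flate data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid Flate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] content = "BT /F1 12 Tf 36 806 Td (Hello World) Tj ET"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] stored = deflate(content);
        // After inflating, the PDF syntax is readable again.
        System.out.println(new String(inflate(stored), StandardCharsets.US_ASCII));
    }
}
```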

Note

PdfStamper keeps existing stream objects intact when it manipulates a document, which means the compression level won’t be changed. As a workaround, you can use the getPageContent() method to get the content stream of a page, and the setPageContent() method to put it back. When you do so, iText thinks the stream has changed, and it will use the compression level that was defined for PdfStamper’s writer object.

Suppose your PDF contains confidential information that should only be seen by a limited number of people. Or you want to impose access restrictions on the people who download the PDF; for instance, they can view it, but they are not allowed to print it. In that case, you’ll want these content streams to be encrypted.

12.3. Encrypting a PDF document

The PDF standard security handler allows access permissions and up to two passwords to be specified for a document: a user password (sometimes referred to as the open password) and an owner password (sometimes referred to as the permissions password).

In this section, we’ll start with encrypting and decrypting PDFs using passwords, and we’ll move on to public-key encryption, which is a much more robust way to protect your documents.

Warning

The examples in the remainder of the chapter involve encryption or digital signing. If you want them to work, you’ll need extra encryption JARs in your classpath. For iText 5.0.x, the Bouncy Castle JARs are required (see section B.3.1). Later versions of iText can use different libraries. Check itextpdf.com for the most current information.

12.3.1. Creating a password-encrypted PDF

Listing 12.9 shows how to create a PDF document that is protected with two passwords. The maximum password length is 32 characters. You can enter longer passwords, but only the first 32 characters will be taken into account. One or both of the passwords can be null.

Listing 12.9. EncryptionPdf.java

public static byte[] USER = "Hello".getBytes();
public static byte[] OWNER = "World".getBytes();
public void createPdf(String filename)
  throws IOException, DocumentException {
  Document document = new Document();
  PdfWriter writer
    = PdfWriter.getInstance(document, new FileOutputStream(filename));
  writer.setEncryption(USER, OWNER,
    PdfWriter.ALLOW_PRINTING, PdfWriter.STANDARD_ENCRYPTION_128);
  writer.createXmpMetadata();
  document.open();
  document.add(new Paragraph("Hello World"));
  document.close();
}

A user who wants to open the resulting PDF file has to enter the password—“Hello” in this example—and will be able to perform only the actions that were specified in the permissions parameter. In this case, the user will be allowed to print the document but won’t be able to copy and paste content.

The document will also open if the user enters the password “World”. Using this owner password in Adobe Acrobat (not Reader) allows the user to change the permissions to whatever they want. If you don’t specify a user password, all users will be able to open the document without being prompted for a password, but the permissions and restrictions (if any) will remain in place.

Note that iText will create a random password if the owner password isn’t specified. In that case, you’ll never know which password to use if you ever want to change the access permissions.

Access Permissions

Table 12.2 shows an overview of the permissions that are available. If you pass 0 as a parameter for the permissions in the setEncryption() method, the end user can only view the document. By composing the values of table 12.2 in an or (|) sequence (such as PdfWriter.ALLOW_PRINTING | PdfWriter.ALLOW_COPY), you can grant the end user permissions (for instance to print the document and to extract text).

Table 12.2. Overview of the permissions parameters

Static final in PdfWriter	Description
ALLOW_PRINTING	The user is permitted to print the document.
ALLOW_DEGRADED_PRINTING	The user is permitted to print the document, but not with the quality offered by ALLOW_PRINTING (for 128-bit encryption only).
ALLOW_MODIFY_CONTENTS	The user is permitted to modify the contents—for example, to change the content of a page, or insert or remove a page.
ALLOW_ASSEMBLY	The user is permitted to insert, remove, and rotate pages and add bookmarks. The content of a page can’t be changed unless the permission ALLOW_MODIFY_CONTENTS is granted too (for 128-bit encryption only).
ALLOW_COPY	The user is permitted to copy or otherwise extract text and graphics from the document, including using assistive technologies such as screen readers or other accessibility devices.
ALLOW_SCREENREADERS	The user is permitted to extract text and graphics for use by accessibility devices (for 128-bit encryption only).
ALLOW_MODIFY_ANNOTATIONS	The user is permitted to add or modify text annotations and interactive form fields.
ALLOW_FILL_IN	The user is permitted to fill form fields (for 128-bit encryption only).

Half of these permissions, the ones marked “for 128-bit encryption only” in table 12.2, can only be revoked when you use one of the available encryption algorithms with a 128-bit key.
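The way permissions are combined and read back is plain bit arithmetic. The sketch below uses stand-in constants in a hypothetical class, PermissionBits. Their values follow the permission bit positions described in the PDF reference and are an assumption here, so use the constants from PdfWriter in real code rather than these numbers.

```java
// Hypothetical demo class; in real code use the constants defined in PdfWriter.
final class PermissionBits {
    // Assumed values, based on the permission bit positions in the PDF reference.
    static final int ALLOW_DEGRADED_PRINTING = 4;
    static final int ALLOW_PRINTING = 4 + 2048;
    static final int ALLOW_MODIFY_CONTENTS = 8;
    static final int ALLOW_COPY = 16;
    static final int ALLOW_MODIFY_ANNOTATIONS = 32;
    static final int ALLOW_FILL_IN = 256;
    static final int ALLOW_SCREENREADERS = 512;
    static final int ALLOW_ASSEMBLY = 1024;

    /** True if every bit of the flag is set in the permissions value. */
    static boolean isAllowed(int permissions, int flag) {
        return (permissions & flag) == flag;
    }

    public static void main(String[] args) {
        // Grant printing and text extraction, nothing else.
        int permissions = ALLOW_PRINTING | ALLOW_COPY;
        System.out.println("print:  " + isAllowed(permissions, ALLOW_PRINTING));
        System.out.println("copy:   " + isAllowed(permissions, ALLOW_COPY));
        System.out.println("modify: " + isAllowed(permissions, ALLOW_MODIFY_CONTENTS));
    }
}
```

Testing with a mask instead of comparing whole numbers matters when you read a permissions value back from an existing file, because such a value may have other, reserved bits set as well.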

Encryption Algorithms

The standard encryption used in PDF documents is a proprietary algorithm known as RC4. RC4 was initially a trade secret, but in September 1994 a description of it was posted anonymously on the Cypherpunks mailing list. This algorithm is often referred to as ARC4 or ARCFOUR (the Alleged RC4). iText uses this unofficial implementation.
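The publicly described algorithm is short enough to write down. The following is a textbook sketch of ARC4, not iText's own code, and the class name Arc4 is made up. Note that a PDF security handler doesn't use the password directly as the key; the key is derived from the password and other document data, a step this sketch leaves out.

```java
import java.nio.charset.StandardCharsets;

// Textbook ARC4, for illustration only; not the implementation used by iText.
final class Arc4 {
    /** Encrypts or decrypts: ARC4 is symmetric, so the same call does both. */
    static byte[] apply(byte[] key, byte[] data) {
        if (key.length == 0 || key.length > 256) {
            throw new IllegalArgumentException("key must be 1 to 256 bytes long");
        }
        // Key scheduling: start with the identity permutation, then mix in the key.
        int[] s = new int[256];
        for (int k = 0; k < 256; k++) {
            s[k] = k;
        }
        for (int k = 0, m = 0; k < 256; k++) {
            m = (m + s[k] + (key[k % key.length] & 0xFF)) & 0xFF;
            int tmp = s[k]; s[k] = s[m]; s[m] = tmp;
        }
        // Generate the key stream byte by byte and XOR it with the data.
        byte[] out = new byte[data.length];
        int i = 0;
        int j = 0;
        for (int n = 0; n < data.length; n++) {
            i = (i + 1) & 0xFF;
            j = (j + s[i]) & 0xFF;
            int tmp = s[i]; s[i] = s[j]; s[j] = tmp;
            out[n] = (byte) (data[n] ^ s[(s[i] + s[j]) & 0xFF]);
        }
        return out;
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] key = "Key".getBytes(StandardCharsets.US_ASCII);
        byte[] encrypted = apply(key, "Plaintext".getBytes(StandardCharsets.US_ASCII));
        System.out.println(hex(encrypted));
        // Applying the same key again restores the original text.
        System.out.println(new String(apply(key, encrypted), StandardCharsets.US_ASCII));
    }
}
```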

Beginning with PDF 1.6, you can also use the Advanced Encryption Standard (AES). iText supports the algorithms listed in table 12.3.

Table 12.3. Overview of the encryption algorithms

Static final in PdfWriter	Description
STANDARD_ENCRYPTION_40	40-bit ARC4 encryption
STANDARD_ENCRYPTION_128	128-bit ARC4 encryption
ENCRYPTION_AES_128	128-bit AES encryption

There’s one major problem with listing 12.9. You’re adding XMP metadata, but this metadata won’t be readable by a non-PDF-aware application, because the XMP stream will be encrypted too. To avoid this, you need to add DO_NOT_ENCRYPT_METADATA to the encryption parameter; for instance, use ENCRYPTION_AES_128 | DO_NOT_ENCRYPT_METADATA as the encryptionType parameter.

FAQ

How do I revoke the permission to save a PDF file? It isn’t possible to restrict someone from saving or copying a PDF file. You can’t disable the Save (or Save As) option in Adobe Reader. And even if you could, people would always be able to retrieve and copy the file with another tool. If you really need this kind of protection, you must look for a Digital Rights Management (DRM) solution. DRM tools give you fine-grained control over documents.

If you want to use an encrypted PDF document with PdfReader, for instance, to fill out fields, add annotations, or even decrypt it, you always need the owner password, regardless of the permissions that were set.

Decrypting and Encrypting an Existing PDF Document

Decrypting or encrypting an existing document is easily done with PdfStamper.

Listing 12.10. EncryptionPdf.java

public void decryptPdf(String src, String dest)
  throws IOException, DocumentException {
  PdfReader reader = new PdfReader(src, OWNER);
  PdfStamper stamper
    = new PdfStamper(reader, new FileOutputStream(dest));
  stamper.close();
}
public void encryptPdf(String src, String dest)
  throws IOException, DocumentException {
  PdfReader reader = new PdfReader(src);
  PdfStamper stamper
    = new PdfStamper(reader, new FileOutputStream(dest));
  stamper.setEncryption(USER, OWNER, PdfWriter.ALLOW_PRINTING,
    PdfWriter.ENCRYPTION_AES_128 | PdfWriter.DO_NOT_ENCRYPT_METADATA);
  stamper.close();
}

You can also combine both methods from listing 12.10 to change the permissions of an already encrypted PDF document. PdfReader has a getPermissions() method that returns an integer value that can be interpreted as a bit-set containing the values listed in table 12.2.

FAQ

I have an encrypted PDF document with permissions that allow me to fill in a form, but iText throws a BadPasswordException. Why? The decryption process in iText isn’t fine-grained. As soon as you start manipulating a document, iText will decrypt it, and this always requires the owner password. Note that if you’ve created the PDF using iText, passing null as the password, you won’t be able to change the document because you don’t know the randomly created password.

Encrypting a PDF document using passwords isn’t a watertight solution. Password protection has to be seen as a psychological and legal barrier. If the document is encrypted, the author intends to protect the document against abuse. If you remove that protection without permission (that is, without the passwords), you’re deliberately doing something you’re not supposed to do. Extra restrictions were added to iText to prevent the use of the API for password removal.

If you need better protection for your documents, you can use public-key encryption.

12.3.2. Public-key encryption

With symmetric key algorithms, a single secret key has to be shared between the creator and the consumer of a document. The same key is used to encrypt and decrypt the content.

Public-key cryptography uses asymmetric key algorithms, where the key used to encrypt a message is not the same as the key used to decrypt it. Each user has a pair of cryptographic keys: one that is kept secret—the private key—and one that is publicly distributed—the public key. In the next example, you’ll use a public key to encrypt a PDF document. This way, only the person who owns the corresponding private key will be able to open the document in Adobe Reader.
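The principle can be tried out with nothing but the JDK. The sketch below generates an RSA key pair in memory, encrypts a few bytes with the public key, and decrypts them with the private key. It only illustrates the asymmetric idea: the class PublicKeyDemo, the key size, and the OAEP padding are choices made for this example, and the sketch doesn't show how a PDF stores data for its recipients.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.Cipher;

// Hypothetical demo class, not part of iText.
final class PublicKeyDemo {
    // Padding scheme chosen for this sketch only.
    static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Anybody who has the public key can encrypt. */
    static byte[] encrypt(PublicKey publicKey, byte[] plain) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, publicKey);
            return cipher.doFinal(plain);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Only the private key holder can decrypt; returns null if decryption fails. */
    static byte[] decrypt(PrivateKey privateKey, byte[] encrypted) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, privateKey);
            return cipher.doFinal(encrypted);
        } catch (GeneralSecurityException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        KeyPair recipient = newKeyPair();
        byte[] secret = "Hello World".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = encrypt(recipient.getPublic(), secret);
        byte[] decrypted = decrypt(recipient.getPrivate(), encrypted);
        System.out.println(new String(decrypted, StandardCharsets.UTF_8));
        // Somebody with a different private key can't recover the secret.
        System.out.println(decrypt(newKeyPair().getPrivate(), encrypted) == null);
    }
}
```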

But before you can do this, you need to find out how to create a public-private key pair.

Creating a Public-Private Key Pair with Keytool

You’re developing in Java, so you can use the keytool application that comes with the JDK. Let’s use the -genkey option to create a key store for somebody called Bruno Specimen:

$ keytool -genkey -alias foobar -keyalg RSA -keystore .keystore
Enter keystore password: f00b4r
What is your first and last name?
  [Unknown]:  Bruno Specimen
What is the name of your organizational unit?
  [Unknown]:  ICT
What is the name of your organization?
  [Unknown]:  Foobar Film Festival
What is the name of your City or Locality?
  [Unknown]:  Foobar
What is the name of your State or Province?
  [Unknown]:
What is the two-letter country code for this unit?
  [Unknown]:  BE
Is CN=Bruno Specimen, OU=ICT, O=Foobar Film Festival, L=Foobar,
 ST=Unknown, C=BE correct?
  [no]:  yes

Enter key password for <foobar>
        (RETURN if same as keystore password):  f1lmf3st

This file, .keystore, is protected with the password f00b4r; the private key stored in this file is protected with the password f1lmf3st. Do not share this key store or these passwords with anyone, but extract a public certificate with the -export option:

$ keytool -export -alias foobar -file foobar.cer -keystore .keystore
Enter keystore password: f00b4r
Certificate stored in file <foobar.cer>

You can now share the file foobar.cer, which contains your public key, with the world. People can use this file to encrypt a PDF document that can be read by nobody else but you, the owner of the corresponding private key.

Creating a Public-Key Encrypted PDF

In the next listing, you’ll encrypt a document using two public keys. The first one is the public key you’ve created for testing purposes (Bruno Specimen); the second one is my own public key (Bruno Lowagie).

Listing 12.11. EncryptWithCertificate.java

Note the different permissions defined for the different certificates. Bruno Specimen will only be able to print the document; I won’t be able to print it. I’ll only be able to extract text, for instance with copy/paste, provided that my private key is registered on my operating system.

Listing 12.11 will only work if the unlimited strength jurisdiction policy files are installed in your runtime environment.

FAQ

When I try to encrypt a document using public-key encryption, an InvalidKeyException is thrown, saying the key size is invalid. Why? Due to import control restrictions by the governments of a few countries, the encryption libraries shipped by default with the JDK restrict the length, and as a result the strength, of encryption keys. If you want these examples to work, you need to replace the default JARs with the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files. These JARs are available for download from http://java.sun.com/ in eligible countries.

The document was encrypted for Bruno Specimen and for myself. If somebody else tries to open the document, they will get an Acrobat Security error, saying that “A digital ID was used to encrypt this document but no digital ID is present to decrypt it. Make sure your digital ID is properly installed or contact the document author.” See figure 12.2.

Figure 12.2. A protected public-key encrypted PDF document

Suppose that the document was created for you. In that case, you should use the keytool utility to export the private key from your key store to a .p12 file. To install your private key on the Windows OS, you need to double-click this file (for instance, private.p12) and follow the instructions.

Note

The path to my personal key store and certificate, along with the corresponding passwords that are used in the examples, are stored in a properties file on my OS. For obvious reasons, this file is not distributed with the examples.

When you open the document that was created using listing 12.11 and your public key, the PDF will be shown, as in figure 12.3. When you open the Document Properties window, you can check the permissions and the security method that was used.

Figure 12.3. An opened public-key-encrypted PDF document

While this is a safer way to protect your document than using user and owner passwords, it’s hard to enforce the permissions. Private key holders can always use a PDF library to decrypt the content that was encrypted with their own public key.

Decrypting and Encrypting Existing PDFs

In the previous section, you used PdfStamper to decrypt existing password-protected PDF files using the owner password, or to encrypt an unprotected PDF file by adding a user and an owner password. In the next listing, you’ll do the same with public-key encryption.

Listing 12.12. EncryptWithCertificate.java

Apart from the fact that the access permissions got lost in the decryption process, there’s another problem that is inherent to the way Bruno Specimen’s key was created. What if Bruno Specimen actually exists? You could distribute the public key you created for him, and you could pretend to be him. He wouldn’t like that.

Anybody can generate a private key and a self-signed certificate. To solve this problem, Bruno Specimen can call in a third party that is beyond suspicion: a certificate authority (CA). He could create a certificate signing request (CSR) like this:

$ keytool -certreq -keystore .keystore -alias foobar -file foobar.csr
Enter keystore password: f00b4r
Enter key password for <foobar>: f1lmf3st

A file, foobar.csr, is generated. Bruno Specimen can send this file to a CA, and this third party will check if Bruno Specimen is really who he says he is. If his identity can be verified, he’ll receive a Privacy Enhanced Mail (PEM) file, which will contain his public certificate signed by the CA using the CA’s private key. The CA’s signature on this certificate can be verified with the CA’s public key, which comes in the form of a Distinguished Encoding Rules (DER) file.
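PEM and DER are two ways of storing the same certificate data: DER is the binary encoding, and a PEM file holds those bytes as Base64 text between a header and a footer line. The following sketch converts between the two with the JDK's Base64 class. The class PemDer is hypothetical, and the bytes in the example are placeholders, not a real certificate.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of iText.
final class PemDer {
    static final String HEADER = "-----BEGIN CERTIFICATE-----";
    static final String FOOTER = "-----END CERTIFICATE-----";

    /** Wraps binary DER bytes in the textual PEM format (64 characters per line). */
    static String derToPem(byte[] der) {
        byte[] lineSeparator = "\n".getBytes(StandardCharsets.US_ASCII);
        String body = Base64.getMimeEncoder(64, lineSeparator).encodeToString(der);
        return HEADER + "\n" + body + "\n" + FOOTER + "\n";
    }

    /** Strips the header, footer, and line breaks, and decodes the Base64 body. */
    static byte[] pemToDer(String pem) {
        String body = pem.replace(HEADER, "").replace(FOOTER, "").replaceAll("\\s", "");
        return Base64.getDecoder().decode(body);
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for a DER-encoded certificate.
        byte[] der = new byte[100];
        for (int i = 0; i < der.length; i++) {
            der[i] = (byte) i;
        }
        System.out.print(derToPem(der));
    }
}
```

In practice you rarely convert by hand: the JDK's CertificateFactory for X.509 certificates reads certificates in either form.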

Many applications ship with a number of root certificates from CAs. This is necessary to check the validity of digital signatures.

12.4. Digital signatures, OCSP, and timestamping

Digital signatures in PDF also involve asymmetric cryptography. Suppose that you receive an official PDF document from Bruno Specimen. How do you make sure that this document was originally created by Bruno and not by somebody else? Also, how do you make sure that nobody changed the document after Bruno created it and before you received it?

This is only possible if the document was digitally signed by Bruno. The signing application will make a digest of the document’s content, and encrypt it using Bruno’s private key. This encrypted digest will be stored in a signature field. When you open the signed PDF, the viewer application will decrypt the encrypted digest using the author’s public key, and compare it with a newly created digest of the content. If there’s a match, the document wasn’t tampered with; if there’s a difference, somebody else has tried to forge the author’s signature, or the document was changed after it was signed.
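The mechanism described above can be tried out with the JDK's Signature class, which takes care of creating the digest and applying the key in one step. This is a sketch of the principle only, with a hypothetical class DigestSignatureDemo, a key pair generated in memory, and SHA256withRSA chosen as the algorithm. It doesn't create a PDF signature field.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.security.SignatureException;

// Hypothetical demo class, not part of iText.
final class DigestSignatureDemo {
    // Algorithm chosen for this sketch only.
    static final String ALGORITHM = "SHA256withRSA";

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** The signer digests the content and signs the digest with the private key. */
    static byte[] sign(PrivateKey privateKey, byte[] content) {
        try {
            Signature signature = Signature.getInstance(ALGORITHM);
            signature.initSign(privateKey);
            signature.update(content);
            return signature.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** The reader recomputes the digest and checks it against the signature. */
    static boolean verify(PublicKey publicKey, byte[] content, byte[] signed) {
        try {
            Signature signature = Signature.getInstance(ALGORITHM);
            signature.initVerify(publicKey);
            signature.update(content);
            return signature.verify(signed);
        } catch (SignatureException e) {
            return false; // a malformed signature counts as invalid
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair bruno = newKeyPair();
        byte[] content = "Official document".getBytes(StandardCharsets.UTF_8);
        byte[] signed = sign(bruno.getPrivate(), content);
        System.out.println("untouched: " + verify(bruno.getPublic(), content, signed));
        byte[] changed = "Official document!".getBytes(StandardCharsets.UTF_8);
        System.out.println("changed:   " + verify(bruno.getPublic(), changed, signed));
    }
}
```

Verification fails both when the content has changed and when the signature was made with a different private key. Those are the two cases a PDF viewer has to detect.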

Let’s start by creating a document that has a signature field.

12.4.1. Creating an unsigned signature field

When you created AcroForms in chapter 8, we discussed button (/Btn), text (/Tx), and choice (/Ch) fields, but we skipped signature fields (/Sig). Figure 12.4 shows two PDF files. The one to the left has a signature field without a digital signature. This file was signed using my own private key. The resulting PDF is shown on the right.

Figure 12.4. PDFs with signature fields

This code shows how to add a signature field without a signature.

Listing 12.13. SignatureField.java

Normally, you won’t have to use the code in listing 12.13. The signature field will either be present because it was added by another application (such as Adobe Acrobat), or you’ll be presented with a document that has no signature field. In that case, you can add the field and sign it at the same time.

12.4.2. Signing a PDF

Listing 12.14 adds a signature to the field created in listing 12.13. There are two options: with the parameter certified, you can choose whether or not to use a certification signature. In figure 12.5 there’s a bar with the text “Certified by Bruno Lowagie <[email protected]>, certificate issued by CA Cert Signing Authority.” This is different from what was displayed in figure 12.4, where it only said “Signed and all signatures are valid.”

Figure 12.5. PDF with a certifying signature

There’s also a graphic parameter to define whether or not to use a graphical object instead of a text message. In figure 12.5, the 1T3XT logo was used to visualize the signature on the page.

Listing 12.14. SignatureField.java

This listing no longer uses the constructor to create an instance of PdfStamper, but the method createSignature(). You create a PdfSignatureAppearance and define it as a visible signature. In this example, the signature field uses the name “mySig”.

FAQ

How can I sign a document if it doesn’t have a signature field? If there’s no signature field present, you can make a small change to the code from listing 12.14 to add a signature that will show up in the signature panel (see the left side of figure 12.5). If you omit the setVisibleSignature() method, the signature won’t show up on any page. This is called an invisible signature. Or you can use the setVisibleSignature() method with a Rectangle object, a page number, and a field name as parameters. This will create a new signature field.

The name of the person who signs the document is retrieved from the private key. You can add a reason for signing and a location with the setReason() and setLocation() methods. This information can be used for the appearance in the signature field (see figure 12.4) and it’s also shown in the signature panel (see figure 12.5).

You pass the PrivateKey object and the Certificate chain obtained from the key store to the setCrypto() method. With the third parameter, you can pass certificate revocation lists (CRLs). We’ll discuss certificate revocation in section 12.4.6. With the final parameter, you choose a security handler. The corresponding cryptographic filters that are supported in iText are listed in table 12.4.

Table 12.4. Security handlers

iText constant	Filter name	Description
SELF_SIGNED	Adobe.PPKLite	This uses a self-signed security handler.
VERISIGN_SIGNED	VeriSign.PPKVS	To sign documents with the VeriSign CA, you need a key that is certified with VeriSign. You can acquire a 60-day trial key or buy a permanent key at verisign.com.
WINCER_SIGNED	Adobe.PPKMS	The Microsoft Windows Certificate Security works with any trusted certificate. For instance, I’m using a public-private key pair obtained from CACert (http://cacert.org).

The signature shown in figure 12.4 is an ordinary signature, aka an approval or a recipient signature. A document can be signed for approval by one or more recipients.

The signature shown in figure 12.5 is a certification signature, aka an author signature. There can only be one certification signature in a document. In iText, you create a certification signature by using the setCertificationLevel() method with one of the following values:

CERTIFIED_NO_CHANGES_ALLOWED— No changes are allowed.
CERTIFIED_FORM_FILLING— The document is certified, but other people can still fill out form fields without invalidating the signature.
CERTIFIED_FORM_FILLING_AND_ANNOTATIONS— The document is certified, but other people can still fill out form fields and add annotations without invalidating the signature.

If you use NOT_CERTIFIED as parameter, an approval signature will be added.

Just like other form fields, a signature field has an appearance. Only the normal appearance is supported; the rollover and down attributes aren’t used. There are two approaches to generating those appearances. In listing 12.14, you use the setAcro6Layers() method and pass the 1T3XT logo as the signature graphic with the setSignatureGraphic() method, because that listing uses the GRAPHIC option for the rendering mode. The following options are available for the setRenderingMode() method:

DESCRIPTION— The rendering mode is just the description.
NAME_AND_DESCRIPTION— The rendering mode is the name of the signer and the description.
GRAPHIC_AND_DESCRIPTION— The rendering mode is an image and the description.
GRAPHIC— The rendering mode is just an image.

The setAcro6Layers() method refers to Acrobat 6. In earlier versions of Acrobat, the signature appearance consisted of five different layers that are drawn on top of each other:

n0— Background layer.
n1— Validity layer, used for the unknown and valid state; contains, for instance, a yellow question mark.
n2— Signature appearance, containing information about the signature. This can be text or an XObject that represents the handwritten signature.
n3— Validity layer, containing a graphic that represents the validity of the signature when the signature is invalid.
n4— Text layer, for a text presentation of the state of the signature.

If you omit setAcro6Layers(), iText will create a default appearance for these layers, or you can use the method getLayer() with a number ranging from 0 to 4 to get a PdfTemplate that allows you to create a custom appearance. You can also use the methods setLayer2Text() and setLayer4Text() to add a custom text for the signature appearance and the text layer. Note that the use of layers n1, n3, and n4 is no longer recommended since Acrobat 6.

In the next example, you’ll add more than one signature.

12.4.3. Adding multiple signatures

Figure 12.6 shows another Hello World document, but now it has been signed twice. Once by myself with a signature that could be validated, and once by Bruno Specimen, who isn’t trusted because “None of the parent certificates are trusted identities.” This is normal: the certificate was self-signed; there was no CA such as VeriSign involved.

Figure 12.6. Document with two signatures, one of which has “validity unknown”

If you know and trust Bruno Specimen, you can add his public certificate to the list of trusted identities in Adobe Reader. You can import the file foobar.cer through Document > Manage Trusted Identities and edit the trust as a “trusted root.” If you do, the second signature can also be verified (figure 12.7).

Figure 12.7. Document with two valid signatures

The original Hello World example of the document shown in figures 12.6 and 12.7 didn’t have a signature field. Here is how the first signature was added.

Listing 12.15. Signatures.java

PdfReader reader = new PdfReader(src);
FileOutputStream os = new FileOutputStream(dest);
// '\0' keeps the PDF version of the original document
PdfStamper stamper
  = PdfStamper.createSignature(reader, os, '\0');
PdfSignatureAppearance appearance
  = stamper.getSignatureAppearance();
appearance.setCrypto(key, chain, null,
  PdfSignatureAppearance.WINCER_SIGNED);
// The image is drawn in the background of layer 2
appearance.setImage(Image.getInstance(RESOURCE));
appearance.setReason("I've written this.");
appearance.setLocation("Foobar");
// The signature field "first" is created on page 1
appearance.setVisibleSignature(
  new Rectangle(72, 732, 144, 780), 1, "first");
stamper.close();

You don’t have to create a PdfFormField explicitly as in listing 12.13. The field is created by iText using the parameters of the setVisibleSignature() method. Note that this time you add an Image that will be added in the background of layer 2. Compare listing 12.15 with this one to find out how to add a second approval signature.

Listing 12.16. Signatures.java

PdfReader reader = new PdfReader(src);
FileOutputStream os = new FileOutputStream(dest);
// null: no temporary file; true: work in append mode
PdfStamper stamper
  = PdfStamper.createSignature(reader, os, '\0', null, true);
PdfSignatureAppearance appearance
  = stamper.getSignatureAppearance();
appearance.setCrypto(key, chain, null,
  PdfSignatureAppearance.WINCER_SIGNED);
appearance.setReason("I'm approving this.");
appearance.setLocation("Foobar");
appearance.setVisibleSignature(
  new Rectangle(160, 732, 232, 780), 1, "second");
stamper.close();

You have to add two extra parameters to the createSignature() method if you want to add a second signature. One parameter can be used to store the resulting PDF as a temporary file. If you pass a File object that’s a directory, a temporary file will be created there; if it’s a file, it will be used directly. The file will be deleted on exit unless the os output stream is null. In that case, the document can be retrieved directly from the temporary file. This is a way to keep the memory use low. In this example, you’re signing a simple Hello World file. You don’t need a temporary file; the signing will be done in memory.

The fifth parameter of createSignature() indicates whether or not the file has to be manipulated in append mode. Working in append mode means that the original file will be kept intact; the new content will be appended after the %%EOF marker.
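One rough way to see the effect of append mode on the raw bytes is to count the end-of-file markers: each appended update ends with its own %%EOF. The following is a minimal sketch that uses only the JDK; the class name EofMarkerCounter and the stand-in input are invented for this illustration and aren’t part of iText. Treat the count as a heuristic: a linearized PDF contains an extra marker, and the byte sequence could in principle occur inside a stream.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: counts "%%EOF" markers as a rough revision estimate. */
final class EofMarkerCounter {

    private static final byte[] MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Returns how many times the byte sequence "%%EOF" occurs in the data. */
    static int countEofMarkers(byte[] pdf) {
        int count = 0;
        for (int i = 0; i + MARKER.length <= pdf.length; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length; j++) {
                if (pdf[i + j] != MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                count++;
                i += MARKER.length - 1; // continue scanning after this marker
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in bytes, not a valid PDF: an original body plus one appended update.
        String fake = "%PDF-1.4\n...original...\n%%EOF\n...appended update...\n%%EOF\n";
        System.out.println(countEofMarkers(fake.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

For the stand-in input above, the program prints 2: one marker for the original content and one for the appended update.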

Note

You can also use append mode if you want the file to keep the complete history of the changes made to the document. We’ll look at the implications of using append mode outside the context of digital signatures in chapter 13.

Using append mode is mandatory if you want to add content to a document that has been signed. If you set the append value to false, the original signature will be invalidated, as shown in figure 12.8.

Figure 12.8. Document with one valid and one invalid signature

When a document is signed multiple times, you get a PDF file with multiple revisions (see the signature panel in figures 12.6 and 12.7). For one revision, the signature name is “first”; for the other it’s “second”. In figure 12.6, you can see a link in the signature panel saying “Click to view this version.” This allows you to manually retrieve the original files for each signature. This listing shows how to extract such a revision programmatically.

Listing 12.17. Signatures.java

public void extractFirstRevision() throws IOException {
  PdfReader reader = new PdfReader(SIGNED2);
  AcroFields af = reader.getAcroFields();
  byte[] bb = new byte[8192];
  // try-with-resources closes both streams, even if copying fails
  try (InputStream ip = af.extractRevision("first");
       FileOutputStream os = new FileOutputStream(REVISION)) {
    int n;
    while ((n = ip.read(bb)) > 0)
      os.write(bb, 0, n);
  }
}

With this code snippet, you can extract the first revision, the one that only has the signature field named “first”.

We’ve now checked for the validity of a signature using Adobe Reader, but you can also automate the process.

12.4.4. Verifying the signatures in a document

The root certificates of CAs that are trusted by the distributor of the Java Runtime you use are stored in a file named cacerts. You can find this key store in the lib/security directory of your Java installation (jre/lib/security in older JDKs). Depending on the use case, you may need a different collection of CA certificates, including some that aren’t in that file.

If a root certificate isn’t present, you can import it with the keytool utility. This key store can be loaded into a KeyStore object like this:

KeyStore ks = PdfPKCS7.loadCacertsKeyStore();

The next bit of code uses the getSignatureNames() method to get all the names of the signature fields in the document. Then you can use the root certificates in a KeyStore to verify each signature.

Listing 12.18. Signatures.java

PdfReader reader = new PdfReader(SIGNED2);
AcroFields af = reader.getAcroFields();
ArrayList<String> names = af.getSignatureNames();
for (String name : names) {
  out.println("Signature name: " + name);
  out.println("Signature covers whole document: "
    + af.signatureCoversWholeDocument(name));
  out.println("Document revision: "
    + af.getRevision(name) + " of " + af.getTotalRevisions());
  PdfPKCS7 pk = af.verifySignature(name);
  Calendar cal = pk.getSignDate();
  Certificate[] pkc = pk.getCertificates();
  out.println("Subject: "
    + PdfPKCS7.getSubjectFields(pk.getSigningCertificate()));
  out.println("Revision modified: " + !pk.verify());
  Object fails[] = PdfPKCS7.verifyCertificates(pkc, ks, null, cal);
  if (fails == null)
    out.println("Certificates verified against the KeyStore");
  else
    out.println("Certificate failed: " + fails[1]);
}

You can check whether the signature covers the whole document by using the signatureCoversWholeDocument() method. This is true for the second signature, but the first signature only covers revision 1 of 2, and that’s not the complete document.

You can get the revision number for the signature with the getRevision() method, and the total number of revisions with getTotalRevisions(). The verification of the signature is done with the PdfPKCS7 object. This object can give you the public certificate of the signer and its parent certificates, as well as the signing date. You can use the verify() method to find out if the document was tampered with, and the verifyCertificates() method to check the certificates in the PDF against the certificates in the cacerts key store. In this example, you didn’t pass any CRLs, so the third parameter of the method is null.
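To make the idea behind verify() more concrete, here is a minimal sketch that uses only the JDK, without iText or a PDF file. It signs a few bytes with a throwaway RSA key pair and shows that changing a single bit makes the check fail. The class and method names (TamperCheckDemo, isUnmodified) are invented for this illustration, and SHA256withRSA is our choice here, not something taken from the listings.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.Signature;

/** JDK-only illustration of tamper detection; hypothetical, not part of iText. */
final class TamperCheckDemo {

    /** Generates a throwaway RSA key pair (a real signer would use a key store). */
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Signs the data with the private key of the pair. */
    static byte[] sign(KeyPair pair, byte[] data) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initSign(pair.getPrivate());
            s.update(data);
            return s.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true only if the signature matches exactly these bytes. */
    static boolean isUnmodified(PublicKey key, byte[] data, byte[] signature) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(key);
            s.update(data);
            return s.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair pair = newKeyPair();
        byte[] doc = "Hello World".getBytes(StandardCharsets.UTF_8);
        byte[] sig = sign(pair, doc);
        System.out.println("Original verifies: " + isUnmodified(pair.getPublic(), doc, sig));

        byte[] tampered = doc.clone();
        tampered[0] ^= 1; // flip one bit of the signed content
        System.out.println("Tampered verifies: " + isUnmodified(pair.getPublic(), tampered, sig));
    }
}
```

Note that this only covers the integrity check. Deciding whether the signer’s certificate can be trusted is a separate step, which is what verifyCertificates() does in listing 12.18.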

12.4.5. Creating the digest and signing externally

In the previous examples, we’ve let iText make the digest, and we’ve let iText decide how to sign it using the PrivateKey object. But it isn’t always possible to create a PrivateKey object. If the private key is put on a token or a smart card, you can’t retrieve it programmatically. In this case, making and signing the digest has to be done on external hardware, such as a smart-card reader.

FAQ

How do I get a private key that is on my smart card? There would be a serious security problem if you could extract a private key from a smart card. Your private key is secret, and the smart card should be designed to keep this secret safe. You don’t want an external application to use your private key. Instead, you send a hash to the card, and the card returns a signature or a PKCS#7 message. PKCS refers to a group of Public Key Cryptography Standards, and PKCS#7 defines the Cryptographic Message Syntax Standard.
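As a rough illustration of that exchange, the following JDK-only sketch models the card as an object that holds a private key and exposes only a signHash() method. The names ExternalSigningDemo and SoftwareCard are hypothetical, and the in-memory key pair merely stands in for real hardware; an actual card is reached through vendor middleware. The sketch uses SHA-256 where listing 12.20 uses SHA-1, but it follows the same pattern of passing the hash to a Signature object.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

/** Hypothetical sketch: the application hashes, the "card" signs. */
final class ExternalSigningDemo {

    /** Stand-in for a smart card: signs hashes but never hands out its key. */
    static final class SoftwareCard {
        private final PrivateKey privateKey; // never leaves this object
        private final PublicKey publicKey;

        SoftwareCard() {
            try {
                KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
                gen.initialize(2048);
                KeyPair pair = gen.generateKeyPair();
                privateKey = pair.getPrivate();
                publicKey = pair.getPublic();
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }

        PublicKey getPublicKey() {
            return publicKey;
        }

        /** Signs the hash it receives; the document itself is never sent. */
        byte[] signHash(byte[] hash) {
            try {
                Signature s = Signature.getInstance("SHA256withRSA");
                s.initSign(privateKey);
                s.update(hash);
                return s.sign();
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** Computed by the application, outside the card. */
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Anyone holding the public key can check the signature over the hash. */
    static boolean verify(PublicKey key, byte[] hash, byte[] signature) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(key);
            s.update(hash);
            return s.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SoftwareCard card = new SoftwareCard();
        byte[] hash = sha256("Hello World".getBytes(StandardCharsets.UTF_8));
        byte[] signature = card.signHash(hash);
        System.out.println("Signature verifies: "
            + verify(card.getPublicKey(), hash, signature));
    }
}
```

The point of the design is that only the hash crosses the boundary in one direction and only the signature in the other; the application never sees the private key.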

Signing a PDF document using a smart-card reader involves middleware, and the code will depend on the type of smart-card reader you’re using. To get an idea of what needs to be done, we’ll look at some examples where the digest is made or signed externally.

The first part of this listing looks similar to what you’ve done before.

Listing 12.19. Signatures.java

Let’s pretend you don’t have access to the private key, so you pass null to the setCrypto() method. You use the setExternalDigest() method to reserve space in the signature dictionary for keys whose content isn’t known yet. You don’t close the PdfStamper, but you preClose() the signature appearance. Then you create a Signature object using the private key; this is something that could happen outside of your program. You pass the document bytes obtained with getRangeStream() to the Signature, and you create the /Contents (the signed digest) of the signature field. When you close the appearance, the signature will be added.
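Reserving space first and filling it in later works because, in a signed PDF, the bytes that are signed exclude the placeholder that will hold the signature. The sketch below shows that principle with the JDK alone: it hashes everything in a byte array except a reserved gap, so the digest stays the same when the gap is filled afterward. The class name, the buffer, and the offsets are invented for this illustration; they don’t reflect how iText lays out a file.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical sketch: digest a buffer while skipping a reserved placeholder. */
final class ReservedGapDigestDemo {

    /** Hashes data[0, gapStart) followed by data[gapEnd, data.length). */
    static byte[] digestAroundGap(byte[] data, int gapStart, int gapEnd) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(data, 0, gapStart);
            md.update(data, gapEnd, data.length - gapEnd);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in "file": 64 content bytes with a 20-byte reserved gap.
        byte[] file = new byte[64];
        Arrays.fill(file, (byte) 'x');
        int gapStart = 20;
        int gapEnd = 40;
        Arrays.fill(file, gapStart, gapEnd, (byte) '0'); // placeholder

        byte[] before = digestAroundGap(file, gapStart, gapEnd);

        // Later, the signature bytes are written into the gap.
        Arrays.fill(file, gapStart, gapEnd, (byte) 'S');
        byte[] after = digestAroundGap(file, gapStart, gapEnd);
        System.out.println("Digest unchanged by filling the gap: "
            + Arrays.equals(before, after));

        // A change outside the gap does alter the digest.
        file[5] ^= 1;
        byte[] modified = digestAroundGap(file, gapStart, gapEnd);
        System.out.println("Digest unchanged by editing content: "
            + Arrays.equals(before, modified));
    }
}
```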

The following listing shows a variation where you create a digest using the Secure Hash Algorithm 1 (SHA-1), and if sign is true, you sign it with the RSA algorithm.

Listing 12.20. Signatures.java

appearance.setCrypto(
  key, chain, null, PdfSignatureAppearance.WINCER_SIGNED);
appearance.setExternalDigest(null, new byte[20], null);
appearance.preClose();
MessageDigest messageDigest = MessageDigest.getInstance("SHA1");
byte buf[] = new byte[8192];
int n;
InputStream inp = appearance.getRangeStream();
while ((n = inp.read(buf)) > 0) {
  messageDigest.update(buf, 0, n);
}
byte hash[] = messageDigest.digest();
PdfSigGenericPKCS sg = appearance.getSigStandard();
PdfLiteral slit = (PdfLiteral)sg.get(PdfName.CONTENTS);
byte[] outc = new byte[(slit.getPosLength() - 2) / 2];
PdfPKCS7 sig = sg.getSigner();
if (sign) {
  Signature signature = Signature.getInstance("SHA1withRSA");
  signature.initSign(key);
  signature.update(hash);
  sig.setExternalDigest(signature.sign(), hash, "RSA");
}
else
  sig.setExternalDigest(null, hash, null);
PdfDictionary dic = new PdfDictionary();
byte[] ssig = sig.getEncodedPKCS7();
System.arraycopy(ssig, 0, outc, 0, ssig.length);
dic.put(PdfName.CONTENTS, new PdfString(outc).setHexWriting(true));
appearance.close(dic);

If you look at the resources that come with this book, you’ll also find an example that explains how to sign a PDF document using an external library.

Note

Signing can become even more generic. There may be situations in which you don’t know the certificate chain before the signature is generated. Or you may have to split the signing process into parts, in which case you can’t keep the PdfStamper open all the time. It would lead us too far afield to discuss all the possible workarounds for each of these situations. More examples, including examples involving smart-card readers, can be found on SourceForge and on the official iText site (see section B.1 for the URLs).

Now let’s discuss some technologies that provide extra security features.

12.4.6. CRLs, OCSP, and timestamping

Suppose you receive a contract from person X who works at company Y. The contract is signed with a valid digital signature, corresponding to the e-mail address of person X at company Y. You can safely assume that the document is genuine, unless ... person X was fired, but he still owns a copy of the private key of company Y. Such a contract probably wouldn’t be legal. Surely there must be a way for company Y to revoke the certificate for employee X so that he can no longer act on behalf of his former company.

Certificate Revocation List

Every certificate authority keeps lists of certificates that are no longer valid, whether because the owner thinks the private key was compromised, or the token containing the private key was lost or stolen, or the original owner of the key is no longer entitled to use it. Such a list is called a certificate revocation list (CRL), and it is made public at one or more URLs provided by the CA that signed the certificate.

You can create a CRL object like this:

CertificateFactory cf = CertificateFactory.getInstance("X.509");
CRL crl;
// Close the download stream once the CRL has been parsed
try (InputStream is = new URL(url_of_crl).openStream()) {
  crl = cf.generateCRL(is);
}

An array of CRL objects can be passed as a parameter to the setCrypto() method. However, CRLs are generally large, and this technique is considered to be “old technology.”

It might be a better idea to use the Online Certificate Status Protocol (OCSP).

Online Certificate Status Protocol

OCSP is an internet protocol for obtaining the revocation status of a certificate online. You can post a request to check the status of a certificate over HTTP, and the CA’s OCSP server will send you a response. You no longer need to parse and embed long CRLs. An OCSP response is small and constant in size, and can easily be included in the PKCS#7 object.

Note

Revocation information in a PDF document is a signed attribute, which means that the signing software must capture the revocation information before signing. A similar requirement in this use case applies to the chain of certificates. The signing software must capture and validate the certificate’s chain before signing. CRLs will lead to bigger PDF documents, and using OCSP will not take as much space. But the OCSP connection to check the status can take time, whereas CRLs can easily be cached on the filesystem. It’s always a tradeoff.

Now let’s look at another problem that might arise. Suppose somebody sends you a signed contract. He has used a private key that is still valid, and you’re sure that the document you’ve received is genuine. However, at some point the author of the document regrets what he’s written. By resetting the clock on his computer, he could create a new document with a new digital signature that is as valid as the first one. This way, you could end up with two documents signed with the same private key at almost the same time, but with slightly different content. How can anybody know which document is more genuine?

Timestamping

This problem can be solved by involving a third party: a timestamping authority (TSA). The TSA will take the hash of the document and concatenate a timestamp to it. This is done on a timestamp server that is contacted during the signing process. The timestamp server will return a hash that is signed using the private key of the TSA.
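The principle can be sketched with the JDK alone. In the toy example below, a timestamping authority receives a hash, appends the time, and signs the combination with its own private key. The class ToyTimestampAuthority and its methods are invented for this illustration; a real TSA exchanges standardized request and response messages (RFC 3161) rather than this simplified byte layout, and the time would come from the TSA’s own clock instead of a parameter.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.Signature;

/** Toy model of a timestamping authority; hypothetical, not a real TSA client. */
final class ToyTimestampAuthority {

    private final KeyPair keyPair; // the TSA's own key pair

    ToyTimestampAuthority() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            keyPair = gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    PublicKey getPublicKey() {
        return keyPair.getPublic();
    }

    /** Concatenates the document hash and the time (milliseconds since the epoch). */
    static byte[] concat(byte[] documentHash, long epochMillis) {
        return ByteBuffer.allocate(documentHash.length + Long.BYTES)
            .put(documentHash)
            .putLong(epochMillis)
            .array();
    }

    /** Signs (hash || time) with the TSA's private key and returns the token. */
    byte[] stamp(byte[] documentHash, long epochMillis) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initSign(keyPair.getPrivate());
            s.update(concat(documentHash, epochMillis));
            return s.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Checks that the token binds exactly this hash to exactly this time. */
    static boolean verify(PublicKey tsaKey, byte[] documentHash,
                          long epochMillis, byte[] token) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(tsaKey);
            s.update(concat(documentHash, epochMillis));
            return s.verify(token);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ToyTimestampAuthority tsa = new ToyTimestampAuthority();
        byte[] hash = new byte[32]; // stand-in for a real document hash
        long now = System.currentTimeMillis();
        byte[] token = tsa.stamp(hash, now);
        System.out.println("Claimed time accepted: "
            + verify(tsa.getPublicKey(), hash, now, token));
        System.out.println("Antedated time accepted: "
            + verify(tsa.getPublicKey(), hash, now - 86_400_000L, token));
    }
}
```

Because the time is covered by the TSA’s signature, the signer can’t claim a different moment for the same document without the check failing.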

Figure 12.9 shows a PDF with a timestamped signature. In previous examples and screen shots, the signature panel informed you that the “Signature date/time are from the clock on the signer’s computer.” Now it says: “Signature is timestamped.” You can also check the certificate of the TSA in the signature properties. That solves the potential problem of antedated documents.

Figure 12.9. A signed PDF with a timestamp

The next listing can be used to add a timestamp (if withTS is true) and to check the revocation status of the certificate with OCSP (if withOCSP is true).

Listing 12.21. Signatures.java

In this example, you use the PdfSignature dictionary to create a detached PKCS#7 signature, as opposed to PKCS#7 signatures where the data is encapsulated in the digest. Before you preclose the appearance, you also need to estimate the length of the signature’s content.

Note that you need an account with a TSA (with a TSA_LOGIN and TSA_PASSWORD) to create a TSA client object. An account with a trustworthy TSA isn’t free, but you can probably find some free timestamp server for testing purposes.

The URL of the OCSP server that is needed to create an OCSP client object is available in the public certificate. That is, if the CA that signed the certificate supports OCSP. You retrieve it with the getOCSPURL() method.

If you set withTS and withOCSP to false in listing 12.21, you’ll get an example that shows how to create a detached signature with authenticated attributes. By combining the code snippets in this chapter, we could make many more examples, experimenting with almost every option that is described in ISO-32000-1.

We’ll finish this chapter by introducing a set of restrictions and extensions to the PDF standard developed by the European Telecommunications Standards Institute (ETSI) regarding PDF Advanced Electronic Signatures (PAdES) profiles.

12.4.7. PDF Advanced Electronic Signatures (PAdES) profiles

ETSI is a European standardization organization in the telecommunications industry. This institute issues technical specifications such as TS 101 733 (first published in 2000), “Cryptographic Message Syntax (CMS) Advanced Electronic Signatures (CAdES),” and TS 101 903 (first published in 2002), “XML Advanced Electronic Signatures (XAdES).” More recently, in 2009, ETSI brought the same capabilities pioneered in CAdES and XAdES to PDF, resulting in a five-part specification describing PDF Advanced Electronic Signatures profiles:

Part 1— This is an overview of support for signatures in PDF documents, and it lists the features of the PDF profiles in the other documents.
Part 2— PAdES Basic is based on ISO-32000-1. If you want to know more about digital signatures in PDF, you should read this specification before starting to dig into the PDF reference. Everything mentioned in PAdES part 2 is supported in iText.
Part 3— PAdES Enhanced describes profiles that are based on CAdES: PAdES Basic Electronic Signature (BES) and Explicit Policy Electronic Signature (EPES). If you want to implement PAdES part 3 using iText, you need to switch to creating a detached CMS signature and use ETSI.CAdES.detached as the /SubFilter.
Part 4— PAdES Long-Term Validation (LTV) is about protecting data beyond the expiry of the user signing certificate. This mechanism requires a Document Security Store (DSS), and this mechanism isn’t available in ISO-32000-1. PAdES part 4 isn’t supported in iText yet.
Part 5— PAdES for XML content describes profiles for XAdES signatures. For instance, after filling an XFA form, which is XML content embedded in a PDF file, a user may sign selected parts of the form. This isn’t supported in iText yet.

At the time this book was written, neither Adobe Acrobat nor iText supported parts 3, 4, or 5. PAdES will solve one major issue that hasn’t been discussed in this chapter: certificates have an expiration date. A document that is signed and verified today may be difficult to verify in seven years when the certificate has expired, or when it has been revoked (the validation data may not be available in the future).

The idea is to add new validation data and a new document timestamp to a Document Security Store in the PDF before the last document timestamp expires. This can be repeated multiple times, always before the expiration of the last document timestamp. This way, PAdES LTV makes it possible to extend the lifetime of protection for the document.

Note that the DSS isn’t part of ISO-32000-1 and it’s not available in iText yet; it will be introduced in ISO-32000-2. We’ll find out more about ISO-32000-2 in the next part of this book, but first let’s summarize what we’ve learned in this chapter.

12.5. Summary

With this chapter, we close part 3 of this book. You’ve discovered that you can add different types of metadata to the documents created in parts 1 and 2. We discussed the compression of content streams, and we’ll use the decompression methods in the next part to inspect the PDF syntax that’s used to describe the content of a page.

In the sections about encryption and digital signatures, we talked about the protection of PDF documents. You used public-key cryptography to encrypt and decrypt a PDF document, and to digitally sign a PDF document. You’ve worked with key stores and certificates, signing documents in different ways. You’ve also learned about certificate and timestamp authorities, about certificate revocation lists, and the Online Certificate Status Protocol.

This chapter completes the overview of essential iText skills you may need when creating or manipulating PDF documents. In the next part, we’ll dive into the PDF specification, and look at PDF at a much lower level. While doing this, you’ll learn about different types of PDFs such as PDF/X and PDF/A. We’ll work with PDF-specific functionality, such as optional content and marked content, and we’ll inspect different types of streams.
