Chapter 12. Programming for the Web

When you think about the Web, you probably think of web-based applications and services. If you are asked to go deeper, you may consider tools such as web browsers and web servers that support those applications and move data around the network. But it’s important to note that standards and protocols, not the applications and tools themselves, have enabled the Web’s growth. Since the earliest days of the Internet, there have been ways to move files from here to there, and document formats that were just as powerful as HTML, but there was not a unifying model for how to identify, retrieve, and display information, nor was there a universal way for applications to interact with that data over the network.

Since the web explosion began, HTML has reigned supreme as a common format for documents, and most developers have at least some familiarity with it. In this chapter, we’re going to talk a bit about its cousin, HTTP, the protocol that handles communications between web clients and servers, and URLs (Uniform Resource Locators), which provide a standard for naming and addressing objects on the Web.

Java provides a very simple API for working with URLs to address objects on the Web. In this chapter, we’ll discuss how to write web clients that can interact with the servers using the HTTP GET and POST methods and also say a bit about web services, which are the next step up the evolutionary chain. In “Java Web Applications”, we’ll jump over to the server side and take a look at servlets and web services, which are Java programs that run on web servers and implement the other side of these conversations.

Uniform Resource Locators

A URL points to an object on the Internet. It’s a text string that identifies an item, tells you where to find it, and specifies a method for communicating with it or retrieving it from its source. A URL can refer to any kind of information source. It might point to static data, such as a file on a local filesystem, a web server, or an FTP site; or it can point to a more dynamic object such as an RSS news feed or a record in a database. URLs can even refer to more dynamic resources such as communication sessions and email addresses.

Because there are many different ways to locate an item on the Net and different mediums and transports require different kinds of information, URLs can have many forms. The most common form has four components: a network host or server, the name of the item, its location on that host, and a protocol by which the host should communicate:

 protocol://hostname/path/item-name

protocol (also called the “scheme”) is an identifier such as http or ftp; hostname is usually an Internet host and domain name; and the path and item components form a unique path that identifies the object on that host. Variants of this form allow extra information to be packed into the URL, specifying, for example, port numbers for the communications protocol and fragment identifiers that reference sections inside documents. Other, more specialized types of URLs such as “mailto” URLs for email addresses or URLs for addressing things like database components may not follow this format precisely, but do conform to the general notion of a protocol followed by a unique identifier. (Some of these would more properly be called URIs—Uniform Resource Identifiers. URIs can specify the name or the location of a resource. URLs are a subset of URIs.)

Because most URLs have the notion of a hierarchy or path, we sometimes speak of a URL that is relative to another URL, called a base URL. In that case, we are using the base URL as a starting point and supplying additional information to target an object relative to that URL. For example, the base URL might point to a directory on a web server and a relative URL might name a particular file in that directory or in a subdirectory.
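In Java, the base/relative relationship can be sketched with the two-argument constructor of the java.net.URL class, which resolves a relative specification against a base URL. The host and paths here are placeholders, the class and method names are our own, and recent Java releases deprecate the URL constructors in favor of going through java.net.URI, so treat this as an illustration rather than a recommendation:

```java
import java.net.MalformedURLException;
import java.net.URL;

class RelativeUrlSketch {

    // Resolves a relative URL string against a base URL and returns the result as text.
    static String resolve(String base, String relative) {
        try {
            URL baseUrl = new URL(base);
            return new URL(baseUrl, relative).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad URL: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // Placeholder base URL; the trailing slash marks it as a directory.
        String base = "http://foo.bar.com/documents/";
        System.out.println(resolve(base, "homepage.html"));    // a file in that directory
        System.out.println(resolve(base, "images/logo.gif"));  // a file in a subdirectory
        System.out.println(resolve(base, "../index.html"));    // one level up
    }
}
```

No network access is involved; resolution is purely a matter of parsing and combining the two strings.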

The URL Class

Bringing this down to a more concrete level is the Java URL class. The URL class represents a URL address and provides a simple API for accessing web resources, such as documents and applications on servers. It can use an extensible set of protocol and content handlers to perform the necessary communication and, in theory, even data conversion. With the URL class, an application can open a connection to a server on the network and retrieve content with just a few lines of code. As new types of servers and new formats for content evolve, additional URL handlers can be supplied to retrieve and interpret the data without modifying your applications.

A URL is represented by an instance of the java.net.URL class. A URL object manages all the component information within a URL string and provides methods for retrieving the object it identifies. We can construct a URL object from a URL string or from its component parts:

try {
    URL aDoc =
      new URL( "http://foo.bar.com/documents/homepage.html" );
    URL sameDoc =
      new URL("http","foo.bar.com","/documents/homepage.html");
} catch ( MalformedURLException e ) { ... }

These two URL objects point to the same network resource, the homepage.html document on the server foo.bar.com. Whether the resource actually exists and is available isn’t known until we try to access it. When initially constructed, the URL object contains only data about the object’s location and how to access it. No connection to the server has been made. We can examine the various parts of the URL with the getProtocol(), getHost(), and getFile() methods. We can also compare it to another URL with the sameFile() method (an unfortunate name for something that may not point to a file), which determines whether two URLs point to the same resource. It’s not foolproof, but sameFile() does more than compare the URL strings for equality; it takes into account the possibility that one server may have several names as well as other factors. It doesn’t go as far as to fetch the resources and compare them, however.
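As a small sketch of those accessor methods, the following class (its name and the describe() helper are ours, and the URLs are placeholders) prints the pieces that the URL class parsed out of a string. It also calls getPort() and getRef(), which are not discussed above, to show where port numbers and fragment identifiers end up:

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlPartsSketch {

    // Returns a one-line summary of the components the URL class parsed out.
    static String describe(String spec) {
        try {
            URL url = new URL(spec);   // parsing only; no connection is made
            return "protocol=" + url.getProtocol()
                + " host=" + url.getHost()
                + " port=" + url.getPort()   // -1 when the URL names no port
                + " file=" + url.getFile()   // path plus any query string
                + " ref=" + url.getRef();    // fragment identifier, or null
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("http://foo.bar.com/documents/homepage.html"));
        System.out.println(describe("http://foo.bar.com:8080/documents/homepage.html#intro"));
    }
}
```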

When a URL is created, its specification is parsed to identify just the protocol component. If the protocol doesn’t make sense, or if Java can’t find a protocol handler for it, the URL constructor throws a MalformedURLException. A protocol handler is a Java class that implements the communications protocol for accessing the URL resource. For example, given an http URL, Java prepares to use the HTTP protocol handler to retrieve documents from the specified web server.

As of Java 7, URL protocol handlers are guaranteed to be provided for http, https (secure HTTP), and ftp, as well as local file URLs and jar URLs that refer to files inside JAR archives. Outside of that, it gets a little dicey. We’ll talk more about the issues surrounding content and protocol handlers a bit later in this chapter.
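One way to see the protocol check in action is simply to try constructing URLs and note which ones are rejected. This is a minimal sketch: the hasHandler() helper is our own invention, and "madeup" is a deliberately fictitious scheme that we assume no runtime supplies a handler for.

```java
import java.net.MalformedURLException;
import java.net.URL;

class ProtocolCheckSketch {

    // True if this runtime can construct a URL for the given string,
    // which requires that it find a protocol handler for the scheme.
    static boolean hasHandler(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://foo.bar.com/index.html",
            "https://foo.bar.com/index.html",
            "file:///tmp/notes.txt",
            "madeup://foo.bar.com/thing"   // hypothetical scheme with no handler
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + hasHandler(sample));
        }
    }
}
```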

Stream Data

The lowest-level and most general way to get data back from a URL is to ask for an InputStream from the URL by calling openStream(). Getting the data as a stream may also be useful if you want to receive continuous updates from a dynamic information source. The drawback is that you have to parse the contents of the byte stream yourself. Working in this mode is basically the same as working with a byte stream from socket communications, but the URL protocol handler has already dealt with all of the server communications and is providing you with just the content portion of the transaction. Not all types of URLs support the openStream() method because not all types of URLs refer to concrete data; you’ll get an UnknownServiceException if the URL doesn’t.

The following code (a simplification of the Read.java file available in the examples folder for this chapter) prints the contents of an HTML file from a web server:

try {
    URL url = new URL("http://server/index.html");

    // try-with-resources closes the reader even if reading fails
    try ( BufferedReader bin = new BufferedReader (
            new InputStreamReader( url.openStream() )) ) {
        String line;
        while ( (line = bin.readLine()) != null ) {
            System.out.println( line );
        }
    }
} catch (Exception e) {
    System.err.println( "Could not read URL: " + e );
}

We ask for an InputStream with openStream() and wrap it in a BufferedReader to read the lines of text. Because we specify the http protocol in the URL, we enlist the services of an HTTP protocol handler. Note that we haven’t talked about content handlers yet. In this case, because we’re reading directly from the input stream, no content handler (no transformation of the content data) is involved.
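Because the protocol handler hides the transport, the same reading code should work for any URL type that supports openStream(). The sketch below makes that point without needing a web server: it writes a temporary file and reads it back through a file: URL. The class name, helper methods, and temporary filename prefix are all our own choices for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileUrlReadSketch {

    // Reads all lines from any URL whose protocol handler supports openStream().
    static String readAll(URL url) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Writes a small temporary file and reads it back through a file: URL.
    static String roundTrip(String content) {
        try {
            Path temp = Files.createTempFile("url-demo", ".html");
            try {
                Files.write(temp, content.getBytes(StandardCharsets.UTF_8));
                return readAll(temp.toUri().toURL());
            } finally {
                Files.deleteIfExists(temp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(roundTrip("<html>\n<body>Hello</body>\n</html>\n"));
    }
}
```

Passing an http URL to readAll() instead would exercise the HTTP protocol handler with no change to the reading loop.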

Getting the Content as an Object

As we said previously, reading raw content from a stream is the most general mechanism for accessing data over the Web. openStream() leaves the parsing of data up to you. The URL class, however, was intended to support a more sophisticated, pluggable, content-handling mechanism. We’ll discuss this now, but be aware that it is not widely used because of lack of standardization and limitations in how you can deploy new handlers. Although the Java community made some progress in recent years in standardizing a small set of protocol handlers, no such effort was made to standardize content handlers. This means that although this part of the discussion is interesting, its usefulness is limited.

If Java knows the type of content being retrieved from a URL and a proper content handler is available, you can retrieve the URL content as an appropriate Java object by calling the URL’s getContent() method. In this mode of operation, getContent() initiates a connection to the host, fetches the data for you, determines the type of data, and then invokes a content handler to turn the bytes into a Java object. Java will try to determine the type of the content by looking at its MIME type, its file extension, or even by examining the bytes directly.

For example, given the URL http://foo.bar.com/index.html, a call to getContent() uses the HTTP protocol handler to retrieve data and might use an HTML content handler to turn the data into an appropriate document object. Similarly, a GIF file might be turned into an AWT ImageProducer object using a GIF content handler. If we access the GIF file using an FTP URL, Java would use the same content handler but a different protocol handler to receive the data.

Since the content handler must be able to return any type of object, the return type of getContent() is Object. This might leave us wondering what kind of object we got. In a moment, we’ll describe how we could ask the protocol handler about the object’s MIME type. Based on this, and whatever other knowledge we have about the kind of object we are expecting, we can cast the Object to its appropriate, more specific type. For example, if we expect an image, we might cast the result of getContent() to ImageProducer:

try  {
    ImageProducer ip = (ImageProducer)myURL.getContent();
} catch ( ClassCastException e ) { ... }

Various kinds of errors can occur when trying to retrieve the data. For example, getContent() can throw an IOException if there is a communications error. Other kinds of errors can occur at the application level: some knowledge of how the application-specific content and protocol handlers deal with errors is necessary. One problem that could arise is that a content handler for the data’s MIME type wouldn’t be available. In this case, getContent() invokes a special “unknown type” handler that returns the data as a raw InputStream (back to square one).
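Given that uncertainty, a more defensive alternative to casting blindly is to test the returned object with instanceof and fall back to stream handling. The following is only a sketch: the class and helper names are ours, and which branch is taken depends on the content handlers a particular Java runtime happens to bundle. The demo uses a temporary text file, for which we expect a stream to come back.

```java
import java.awt.image.ImageProducer;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ContentKindSketch {

    // Classifies whatever getContent() hands back instead of casting blindly.
    static String kindOf(URL url) {
        try {
            Object content = url.getContent();
            if (content instanceof ImageProducer) {
                return "image";
            } else if (content instanceof InputStream) {
                ((InputStream) content).close();   // we only wanted to know the type
                return "stream";
            } else {
                return "other: " + String.valueOf(content);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: writes a small temporary text file and classifies its content.
    static String kindOfTempTextFile() {
        try {
            Path temp = Files.createTempFile("content-demo", ".txt");
            try {
                Files.write(temp, "plain text".getBytes(StandardCharsets.UTF_8));
                return kindOf(temp.toUri().toURL());
            } finally {
                Files.deleteIfExists(temp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(kindOfTempTextFile());
    }
}
```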

In some situations, we may also need knowledge of the protocol handler. For example, consider a URL that refers to a nonexistent file on an HTTP server. When requested, the server returns the familiar “404 Not Found” message. To deal with protocol-specific operations like this, we may need to talk to the protocol handler, which we’ll discuss next.

Managing Connections

Upon calling openStream() or getContent() on a URL, the protocol handler is consulted and a connection is made to the remote server or location. Connections are represented by a URLConnection object, subtypes of which manage different protocol-specific communications and offer additional metadata about the source. The HttpURLConnection class, for example, handles basic web requests and also adds some HTTP-specific capabilities such as interpreting “404 Not Found” messages and other web server errors. We’ll talk more about HttpURLConnection later in this chapter.

We can get a URLConnection from our URL directly with the openConnection() method. One of the things we can do with the URLConnection is ask for the object’s content type before reading data. For example:

URLConnection connection = myURL.openConnection();
String mimeType = connection.getContentType();
InputStream in = connection.getInputStream();

Despite its name, a URLConnection object is initially created in a raw, unconnected state. In this example, the network connection was not actually initiated until we called the getContentType() method. The URLConnection does not talk to the source until data is requested or its connect() method is explicitly invoked. Prior to connection, network parameters and protocol-specific features can be set up. For example, we can set timeouts on the initial connection to the server and on reads:

URLConnection connection = myURL.openConnection();
connection.setConnectTimeout( 10000 ); // milliseconds
connection.setReadTimeout( 10000 ); // milliseconds
InputStream in = connection.getInputStream();

As we’ll see in “Using the POST Method”, we can get at the protocol-specific information by casting the URLConnection to its specific subtype.
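The deferred-connection behavior can be observed without a server by pointing a URLConnection at a local file. In this sketch (the class name, helper methods, and temporary file are our own, and the MIME type reported for a file: URL is assumed to come from the runtime's filename-to-type mapping), the connection is configured first and only touches the file when metadata is requested:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ConnectionMetadataSketch {

    // Opens a connection, configures it while unconnected, then asks for metadata.
    static String typeAndLength(URL url) {
        try {
            URLConnection connection = url.openConnection();  // nothing is contacted yet
            connection.setConnectTimeout(10000);               // milliseconds
            connection.setReadTimeout(10000);                  // milliseconds
            String mimeType = connection.getContentType();     // first call that touches the source
            long length = connection.getContentLengthLong();
            connection.getInputStream().close();               // release the underlying resource
            return mimeType + " " + length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: writes the given markup to a temporary .html file and inspects it.
    static String forTempHtml(String html) {
        try {
            Path temp = Files.createTempFile("connection-demo", ".html");
            try {
                Files.write(temp, html.getBytes(StandardCharsets.UTF_8));
                return typeAndLength(temp.toUri().toURL());
            } finally {
                Files.deleteIfExists(temp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(forTempHtml("<html><body>Hi</body></html>"));
    }
}
```

The timeouts have no practical effect on a local file; they are included only to show that such settings belong before the first call that requests data.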

Handlers in Practice

The content- and protocol-handler mechanisms we’ve described are very flexible; to handle new types of URLs, you need only add the appropriate handler classes. One interesting application of this would be Java-based web browsers that could handle new and specialized kinds of URLs by downloading the necessary handlers over the Net. The idea for this was touted in the earliest days of Java. Unfortunately, it never came to fruition. There is no API for dynamically downloading new content and protocol handlers. In fact, there is no standard API for determining what content and protocol handlers exist on a given platform.

Java currently mandates protocol handlers for HTTP, HTTPS, FTP, FILE, and JAR. While in practice you will generally find these basic protocol handlers with all versions of Java, that’s not entirely comforting, and the story for content handlers is even less clear. The standard Java classes don’t, for example, include content handlers for HTML, GIF, PNG, JPEG, or other common data types. Furthermore, although content and protocol handlers are part of the Java API and an intrinsic part of the mechanism for working with URLs, specific content and protocol handlers aren’t defined. Even those protocol handlers that have been bundled in Java are still packaged as part of the Sun implementation classes and are not truly part of the core API for all to see.

In summary, the Java content- and protocol-handler mechanism was a forward-thinking approach that never quite materialized. The promise of web browsers that dynamically extend themselves for new types of protocols and new content is, like flying cars, always just a few years away. Although the basic mechanics of the protocol-handler mechanism are useful (especially now with some standardization), for decoding content in your own applications you should probably turn to other, newer frameworks that have a bit more specificity.

Useful Handler Frameworks

The idea of dynamically downloadable handlers could also be applied to other kinds of handler-like components. For example, the Java XML community is fond of referring to XML as a way to apply semantics (meaning) to documents and to Java as a portable way to supply the behavior that goes along with those semantics. It’s possible that an XML viewer could be built with downloadable handlers for displaying XML tags.

Fortunately, for working with URL streams of images, music, and video, very mature APIs are available. The Java Advanced Imaging API (JAI) includes a well-defined, extensible set of handlers for most image types, and the Java Media Framework (JMF) can play most common music and video types found online.

Talking to Web Applications

Web browsers are the universal clients for web applications. They retrieve documents for display and serve as a user interface, primarily through the use of HTML, JavaScript, and linked documents. In this section, we’ll show how to write client-side Java code that uses HTTP through the URL class to work with web applications directly using GET and POST operations to retrieve and send data.

There are many reasons an application might want to communicate via HTTP. For example, compatibility with another browser-based application might be important, or you might need to gain access to a server through a firewall where direct socket connections (and RMI) are problematic. HTTP is the lingua franca of the Net, and despite its limitations (or more likely because of its simplicity), it has rapidly become one of the most widely supported protocols in the world. As for using Java on the client side, all the other reasons you would write a client-side GUI or non-GUI application (as opposed to a pure web/HTML-based application) also present themselves. A client-side GUI can perform sophisticated presentation and validation while, with the techniques presented here, still using web-enabled services over the network.

The primary task we discuss here is sending data to the server, specifically HTML form-encoded data. In a web browser, the name/value pairs of HTML form fields are encoded in a special format and sent to the server using one of two methods. The first method, using the HTTP GET command, encodes the user’s input into the URL and requests the corresponding document. The server recognizes that the first part of the URL refers to a program and invokes it, passing along the information encoded in the URL as a parameter. The second method uses the HTTP POST command to ask the server to accept the encoded data and pass it to a web application as a stream. In Java, we can create a URL that refers to a server-side program and request or send it data using the GET and POST methods. In “Java Web Applications” below, we’ll see how to build web applications that implement the other side of this conversation.

Using the GET Method

Using the GET method of encoding data in a URL is pretty easy. All we have to do is create a URL pointing to a server program and use a simple convention to tack on the encoded name/value pairs that make up our data. For example, the following code snippet opens a URL to an old-school CGI program called login.cgi on the server myhost and passes it two name/value pairs. It then prints whatever text the CGI sends back:

URL url = new URL(
    // this string should be URL-encoded
    "http://myhost/cgi-bin/login.cgi?Name=Pat&Password=foobar");

BufferedReader bin = new BufferedReader (
  new InputStreamReader( url.openStream() ));

String line;
while ( (line = bin.readLine()) != null ) {
    System.out.println( line );
}

To form the URL with parameters, we start with the base URL of login.cgi; we add a question mark (?), which marks the beginning of the parameter data, followed by the first name/value pair. We can add as many pairs as we want, separated by ampersand (&) characters. The rest of our code simply opens the stream and reads back the response from the server. Remember that creating a URL doesn’t actually open the connection. In this case, the URL connection was made implicitly when we called openStream(). Although we are assuming here that our server sends back text, it could send anything.

It’s important to point out that we have skipped a step here. This example works because our name/value pairs happen to be simple text. If any “nonprintable” or special characters (including ? or &) are in the pairs, they must be encoded first. The java.net.URLEncoder class provides a utility for encoding the data. We’ll show how to use it in the next example in “Using the POST Method”.
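For a GET request, the encoding step might look something like the following sketch. The class and method names are ours, the host and program name are the placeholders used above, and the sample values are chosen to contain characters that would otherwise break the URL:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryStringSketch {

    // Encodes each name and value, joining the pairs with '&'.
    static String encodePairs(Map<String, String> pairs) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> pair : pairs.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(pair.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(pair.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be present on every Java platform
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("Name", "Pat & Co");
        pairs.put("Password", "foo?bar=baz");
        // The '?' separating the base URL from the parameters is added by hand
        System.out.println("http://myhost/cgi-bin/login.cgi?" + encodePairs(pairs));
    }
}
```

Spaces become plus signs and reserved characters become percent escapes, so the ampersand and equals sign that remain in the result are only the ones that separate the pairs.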

Another important thing is that although this small example sends a password field, you should never send sensitive data using this simplistic approach. The data in this example is sent in clear text across the network (it is not encrypted). And in this case, the password field would appear anywhere the URL is printed as well (e.g., server logs, browser history, and bookmarks). We’ll talk about secure web communications later in this chapter when we discuss writing web applications using servlets.

Using the POST Method

For larger amounts of input data or for sensitive content, you’ll likely use the POST option. Here’s a small application that acts like an HTML form. It gathers data from two text fields—name and password—and posts the data to a specified URL using the HTTP POST method. This Swing-based client application works with a server-side web-based application, just like a web browser.

Here’s the code:

//file: ch12/Post.java
package ch12;

import java.net.*;
import java.io.*;
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;

/**
 * A small graphical application that demonstrates use of the
 * HTTP POST mechanism. Provide a POST-able URL to the command line
 * and use the "Post" button to send sample name and password
 * data to the URL.
 *
 * See the servlet section of this chapter for the ShowParameters
 * example that can serve (ha!) as the receiving (server) side.
 */
public class Post extends JPanel implements ActionListener {
  JTextField nameField;
  JPasswordField passwordField;
  String postURL;

  GridBagConstraints constraints = new GridBagConstraints(  );

  void addGB( Component component, int x, int y ) {
    constraints.gridx = x;  constraints.gridy = y;
    add ( component, constraints );
  }

  public Post( String postURL ) {

    this.postURL = postURL;

    setBorder(BorderFactory.createEmptyBorder(5, 10, 5, 5));
    JButton postButton = new JButton("Post");
    postButton.addActionListener( this );
    setLayout( new GridBagLayout(  ) );
    constraints.fill = GridBagConstraints.HORIZONTAL;
    addGB( new JLabel("Name ", JLabel.TRAILING), 0, 0 );
    addGB( nameField = new JTextField(20), 1, 0 );
    addGB( new JLabel("Password ", JLabel.TRAILING), 0, 1 );
    addGB( passwordField = new JPasswordField(20), 1, 1 );
    constraints.fill = GridBagConstraints.NONE;
    constraints.gridwidth = 2;
    constraints.anchor = GridBagConstraints.EAST;
    addGB( postButton, 1, 2 );
  }

  public void actionPerformed(ActionEvent e) {
    postData(  );
  }

  protected void postData(  ) {
    StringBuilder sb = new StringBuilder();
    String pw = new String(passwordField.getPassword());
    try {
      sb.append( URLEncoder.encode("Name", "UTF-8") + "=" );
      sb.append( URLEncoder.encode(nameField.getText(), "UTF-8") );
      sb.append( "&" + URLEncoder.encode("Password", "UTF-8") + "=" );
      sb.append( URLEncoder.encode(pw, "UTF-8") );
    } catch (UnsupportedEncodingException uee) {
      System.out.println(uee);
    }
    String formData = sb.toString(  );

    try {
      URL url = new URL( postURL );
      HttpURLConnection urlcon =
          (HttpURLConnection) url.openConnection(  );
      urlcon.setRequestMethod("POST");
      urlcon.setRequestProperty("Content-type",
          "application/x-www-form-urlencoded");
      urlcon.setDoOutput(true);
      urlcon.setDoInput(true);
      PrintWriter pout = new PrintWriter( new OutputStreamWriter(
          urlcon.getOutputStream(  ), "8859_1"), true );
      pout.print( formData );
      pout.flush(  );

      // Did the post succeed?
      if ( urlcon.getResponseCode() == HttpURLConnection.HTTP_OK )
        System.out.println("Posted ok!");
      else {
        System.out.println("Bad post...");
        return;
      }
      // Hooray! Go ahead and read the results...
      //InputStream in = urlcon.getInputStream(  );
      // ...

    } catch (MalformedURLException e) {
      System.out.println(e);     // bad postURL
    } catch (IOException e2) {
      System.out.println(e2);    // I/O error
    }
  }

  public static void main( String [] args ) {
    if (args.length != 1) {
      System.err.println("Must specify URL on command line. Exiting.");
      System.exit(1);
    }
    JFrame frame = new JFrame("SimplePost");
    frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
    frame.add( new Post(args[0]), "Center" );
    frame.pack();
    frame.setVisible(true);
  }
}

When you run this application, you must specify the URL of the server program on the command line. For example:

% java ch12.Post http://www.myserver.example/cgi-bin/login.cgi

The beginning of the application creates the form using Swing elements like we did in Chapter 10. All the magic happens in the protected postData() method. First, we create a StringBuilder (a non-synchronized version of StringBuffer) and load it with name/value pairs, separated by ampersands. (We don’t need the initial question mark when we’re using the POST method because we’re not appending to a URL string.) Each pair is first encoded using the static URLEncoder.encode() method. We run the name fields through the encoder as well as the value fields, even though we know that in this case they contain no special characters.

Next, we set up the connection to the server program. In our previous example, we weren’t required to do anything special to send the data because the request was made by the simple act of opening the URL on the server. Here, we have to carry some of the weight of talking to the remote web server. Fortunately, the HttpURLConnection object does most of the work for us; we just have to tell it that we want to do a POST to the URL and the type of data we are sending. We ask for the URLConnection object by using the URL’s openConnection() method. We know that we are using the HTTP protocol so we should be able to cast it to an HttpURLConnection type, which has the support we need. Because HTTP is one of the guaranteed protocols, we can safely make this assumption. (Speaking of safely, we use HTTP here only for demonstration purposes. So much data these days is considered sensitive. Industry guidelines have settled on defaulting to HTTPS; more on that soon in “SSL and Secure Web Communications”.)

We then use setRequestMethod() to tell the connection we want to do a POST operation. We also use setRequestProperty() to set the Content-Type field of our HTTP request to the appropriate type—in this case, the proper MIME type for encoded form data. (This is necessary to tell the server what kind of data we’re sending.) Finally, we use the setDoOutput() and setDoInput() methods to tell the connection that we want to both send and receive stream data. The URL connection infers from this combination that we are going to do a POST operation and expects a response. Next, we get an output stream from the connection with getOutputStream() and create a PrintWriter so that we can easily write our encoded data.

After we post the data, our application calls getResponseCode() to see whether the HTTP response code from the server indicates that the POST was successful. Other response codes (defined as constants in HttpURLConnection) indicate various failures. At the end of our example, we indicate where we could have read back the text of the response. For this application, we’ll assume that simply knowing that the post was successful is sufficient.

Although form-encoded data (as indicated by the MIME type we specified for the Content-Type field) is the most common, other types of communications are possible. We could have used the input and output streams to exchange arbitrary data types with the server program. The POST operation could send any kind of data; the server application simply has to know how to handle it. One final note: if you are writing an application that needs to decode form data, you can use the java.net.URLDecoder class to undo the operation of the URLEncoder. Be sure to specify “UTF-8” when calling decode(), matching the encoding used when the data was encoded.
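As a rough sketch of that decoding step, the following class splits a form-encoded string into its pairs and runs each name and value through URLDecoder. The class and method names are ours, and a real server-side framework would normally do this parsing for you:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormDecodeSketch {

    // Splits form-encoded text on '&' and '=', decoding each name and value.
    static Map<String, String> decode(String formData) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (formData.isEmpty()) {
            return fields;
        }
        try {
            for (String pair : formData.split("&")) {
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                fields.put(URLDecoder.decode(name, "UTF-8"),
                           URLDecoder.decode(value, "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be present on every Java platform
            throw new IllegalStateException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Sample data in the same format that the Post example sends
        System.out.println(decode("Name=Pat+%26+Co&Password=foo%3Dbar"));
    }
}
```

Splitting before decoding matters: an encoded ampersand or equals sign inside a value must not be mistaken for a separator.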

The HttpURLConnection

Other information about the response is available from the HttpURLConnection as well. We could use getContentType() and getContentEncoding() to determine the MIME type and encoding of the response. We could also interrogate the HTTP response headers by using getHeaderField(). (HTTP response headers are metadata name/value pairs carried with the response.) The convenience methods getHeaderFieldInt() and getHeaderFieldDate() fetch integer and date-formatted header fields, returning an int and a long, respectively. The content length and last modification date are provided through getContentLength() and getLastModified().
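To try these methods without depending on an outside server, the sketch below starts a throwaway HTTP server on the loopback interface using the com.sun.net.httpserver package that ships with the JDK, then queries it with HttpURLConnection. Everything about the server is an assumption made for the demo: the /hello path, the X-Demo and X-Count headers, and the response bodies are invented, as are the class and method names.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class ResponseInfoSketch {

    // Summarizes the status code and a few headers of a GET response.
    static String summarize(String address) {
        try {
            HttpURLConnection con =
                (HttpURLConnection) new URL(address).openConnection();
            con.setConnectTimeout(5000);
            con.setReadTimeout(5000);
            try {
                return con.getResponseCode()
                    + " type=" + con.getContentType()
                    + " length=" + con.getContentLength()
                    + " x-demo=" + con.getHeaderField("X-Demo")
                    + " x-count=" + con.getHeaderFieldInt("X-Count", -1);
            } finally {
                con.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Starts a demo server on an unused local port: "/hello" exists, nothing else does.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            boolean found = "/hello".equals(exchange.getRequestURI().getPath());
            byte[] body = (found ? "hello" : "missing").getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain");
            exchange.getResponseHeaders().set("X-Demo", "yes");
            exchange.getResponseHeaders().set("X-Count", "3");
            exchange.sendResponseHeaders(found ? 200 : 404, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Runs one request that should succeed and one that should not.
    static String[] demo() {
        try {
            HttpServer server = startServer();
            try {
                String base = "http://127.0.0.1:" + server.getAddress().getPort();
                return new String[] {
                    summarize(base + "/hello"),
                    summarize(base + "/missing")
                };
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

Note that getResponseCode() reports a 404 as an ordinary return value rather than throwing an exception, which is what makes it convenient for checking the outcome before deciding whether to read the body.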

SSL and Secure Web Communications

The previous examples sent a field called Password to the server. However, standard HTTP doesn’t provide encryption to hide our data. Fortunately, adding security for GET and POST operations like this is easy (trivial in fact, for the client-side developer). Where available, you simply have to use a secure form of the HTTP protocol—HTTPS:

https://www.myserver.example/cgi-bin/login.cgi

HTTPS is a version of the standard HTTP protocol run over Secure Sockets Layer (SSL), which uses public-key encryption techniques to encrypt the browser-to-server communications. Most web browsers and servers currently come with built-in support for HTTPS (or raw SSL sockets). Therefore, if your web server supports HTTPS and has it configured, you can use a browser to send and receive secure data simply by specifying the https protocol in your URLs. There is much more to learn about SSL and related aspects of security such as authenticating whom you are actually talking to, but as far as basic data encryption goes, this is all you have to do. It is not something your code has to deal with directly. The Java JRE standard edition ships with SSL and HTTPS support, and beginning with Java 5.0, all Java implementations must support HTTPS as well as HTTP for URL connections.
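From the client's point of view, the only visible difference is the type of connection object that comes back. The following sketch (class and method names are ours; the URLs are the placeholders used above) checks that type without contacting any server, since openConnection() by itself performs no network I/O:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import javax.net.ssl.HttpsURLConnection;

class HttpsTypeSketch {

    // Reports whether the (still unconnected) connection object is the HTTPS variant.
    static boolean usesHttps(String spec) {
        try {
            URLConnection connection = new URL(spec).openConnection();
            return connection instanceof HttpsURLConnection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(usesHttps("https://www.myserver.example/cgi-bin/login.cgi"));
        System.out.println(usesHttps("http://www.myserver.example/cgi-bin/login.cgi"));
    }
}
```

Because HttpsURLConnection extends HttpURLConnection, code such as the Post example that casts to HttpURLConnection should work unchanged when given an https URL.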

Java Web Applications

During Java’s early years, web-based applications followed the same basic paradigm: the browser makes a request to a particular URL; the server generates a page of HTML in response; and actions by the user drive the browser to the next page. In this exchange, most or all of the work is done on the server side, which is seemingly logical given that that’s where data and services often reside. The problem with this application model is that it is inherently limited by the loss of responsiveness, continuity, and state experienced by the user when loading new “pages” in the browser. It’s difficult to make a web-based application as seamless as a desktop application when the user must jump through a series of discrete pages, and it is technically more challenging to maintain application data across those pages. After all, web browsers were not designed to host applications; they were designed to host documents.

But a lot has changed in web application development in recent years. Standards for HTML and JavaScript have matured to the point where it is practical, indeed common, to write applications in which most of the user interface and logic reside on the client side and background calls are made to the server for data and services. In this paradigm, the server effectively returns just a single “page” of HTML that references the bulk of the JavaScript, CSS, and other resources used to render the application interface. JavaScript then takes over, manipulating elements on the page or creating new ones dynamically using advanced HTML DOM features to produce the UI. JavaScript also makes asynchronous (background) calls to the server to fetch data and invoke services. In early years, the results were returned as XML, leading to the term Asynchronous JavaScript and XML (AJAX) for this style of interaction. You still hear that term, although these days the JavaScript Object Notation (JSON) format is more popular than XML and an explosion of asynchronous JavaScript libraries has taken over. Since all of the libraries have the “asynchronous JavaScript” part in common, you mostly hear developers (and hiring managers) talk about the particular library or framework they use, such as React or Angular.

This new model simplifies and empowers web development in many ways. No longer must the client work in a single-page, request-response regime where views and requests are ping-ponged back and forth. The client is now more equivalent to a desktop application in that it can respond to user input fluidly and manage remote data and services without interrupting the user.

So far we’ve used the term web application generically, referring to any kind of browser-based application that is located on a web server, whether it is a single page or a collection of many pages. Now we are going to be more precise with that term. In the context of the Java Servlet API, a web application is a collection of servlets and Java web services together with their supporting Java classes, content such as HTML, JavaServer Pages (JSP), images or other media, and configuration information. For deployment (installation on a web server), a web application is bundled into a WAR file. We’ll discuss WAR files in detail later, but suffice it to say that they are really just JAR archives that contain all the application files along with some deployment information. The important thing is that the standardization of WAR files means not only that the Java code is portable, but also that the process of deploying the application to a server is standardized.

Most WAR archives have at their core a web.xml file. This is an XML configuration file that describes which servlets are to be deployed, their names and URL paths, their initialization parameters, and a host of other information, including security and authentication requirements. In recent years, however, the web.xml file has become optional for many applications due to the introduction of Java annotations that take the place of the XML configuration. In most cases, you can now deploy your servlets and Java web services simply by annotating the classes with the necessary information and packaging them into the WAR file, or using a combination of the two. We’ll discuss this in detail later in the chapter.

Web applications, or web apps, also have a well-defined runtime environment. Each web app has its own “root” path on the web server, meaning that all the URLs addressing its servlets and files start with a common unique prefix (e.g., http://www.oreilly.com/someapplication/). The web app’s servlets are also isolated from those of other web applications. Web apps cannot directly access each other’s files (although they may be allowed to do so through the web server, of course). Each web app also has its own servlet context. We’ll discuss the servlet context in more detail, but in brief, it is a common area for servlets within an application to share information and get resources from the environment. The high degree of isolation between web applications is intended to support the dynamic deployment and updating of applications required by modern business systems and to address security and reliability concerns. Web apps are intended to be coarse-grained, relatively complete applications—not to be tightly coupled with other web apps. Although there’s no reason you can’t make web apps cooperate at a high level, for sharing logic across applications you might want to consider web services, which we’ll discuss later in this chapter.

The Servlet Lifecycle

Let’s jump now to the Servlet API and get started building servlets. We’ll fill in the gaps later when we discuss various parts of the APIs and WAR file structure in more detail. The Servlet API is very simple. The base Servlet interface has three lifecycle methods (init(), service(), and destroy()) along with some methods for getting configuration parameters and servlet resources. However, these methods are not often used directly by developers. Generally, developers implement the doGet() and doPost() methods of the HttpServlet class and access shared resources through the servlet context, as we’ll discuss shortly.

Generally, only one instance of each deployed servlet class is instantiated per container. More precisely, it is one instance per servlet entry in the web.xml file, but we’ll talk more about servlet deployment in “Servlet Containers”. In the past, there was an exception to that rule when using the special SingleThreadModel type of servlet. As of Servlet API 2.4, single-threaded servlets have been deprecated.

By default, servlets are expected to handle requests in a multithreaded way; that is, the servlet’s service methods may be invoked by many threads at the same time. This means that you should not store per-request or per-client data in instance variables of your servlet object. (Of course, you can store general data related to the servlet’s operation, as long as it does not change on a per-request basis.) Per-client state information can be stored in a client session object on the server or in a client-side cookie, which persists across client requests. We’ll talk about client state later as well.
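The rule about instance variables can be sketched without a servlet container at all. The following is only an illustration under our own assumptions: SharedHandler is a hypothetical stand-in for a servlet (it is not part of the Servlet API), with one shared instance called from several threads. Data that is general to the handler lives in a thread-safe field, while per-request data stays in parameters and local variables.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a servlet: one shared instance, many threads.
class SharedHandler {
    // General, shared data: safe to keep in a field because AtomicLong is thread-safe.
    private final AtomicLong requestCount = new AtomicLong();

    // Per-request data lives only in parameters and local variables,
    // so concurrent calls cannot overwrite each other's values.
    String handle(String clientName) {
        long n = requestCount.incrementAndGet();
        String greeting = "Hello, " + clientName; // local: private to this call
        return greeting + " (request #" + n + ")";
    }

    long totalRequests() {
        return requestCount.get();
    }

    public static void main(String[] args) throws InterruptedException {
        SharedHandler handler = new SharedHandler();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            String client = "client-" + i;
            threads[i] = new Thread(() -> System.out.println(handler.handle(client)));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("Total: " + handler.totalRequests());
    }
}
```

Had greeting been an instance field instead of a local variable, two overlapping calls could have seen each other's client names. That is exactly the mistake to avoid in a real servlet.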

The service() method of a servlet accepts two parameters: a servlet “request” object and a servlet “response” object. These provide tools for reading the client request and generating output; we’ll talk about them (or rather their HttpServlet versions) in detail in the examples below.

Servlets

The package of primary interest to us here is javax.servlet.http, which contains APIs specific to servlets that handle HTTP requests for web servers. In theory, you can write servlets for other protocols, but nobody really does that and we are going to discuss servlets as if all servlets were HTTP-related.

Notice the javax package prefix similar to what we saw with the Swing packages. The servlet API is certainly an important part of Java, but it is not included with the base developer kit. You need to download a separate library, servlet-api.jar, from a third-party provider. Apache provides the reference implementation of the servlet API. Details on downloading this library and using it on the command line or with the IntelliJ IDEA IDE can be found in “Grabbing the Web Code Examples”.

The primary tool provided by the javax.servlet.http package is the HttpServlet base class. This is an abstract servlet that provides some basic implementation details related to handling an HTTP request. In particular, it overrides the generic servlet service() method and breaks it out into several HTTP-related methods, including doGet(), doPost(), doPut(), and doDelete(). The default service() method examines the request to determine what kind it is and dispatches it to one of these methods, so you can override one or more of them to implement the specific protocol behavior you need.

doGet() and doPost() correspond to the standard HTTP GET and POST operations. GET is the standard request for retrieving a file or document at a specified URL. POST is the method by which a client sends an arbitrary amount of data to the server. HTML forms utilize POST to send data as do most web services.

To round these out, HttpServlet provides the doPut() and doDelete() methods. These methods correspond to a part of the HTTP protocol popular with web applications using a REST (REpresentational State Transfer) API style. They provide a way to upload and remove files or other entities such as database records. doPut() is supposed to be like POST but with slightly different semantics (a PUT is supposed to logically replace the item identified by the URL, whereas POST presents new data to it); doDelete() would be its opposite.

HttpServlet also implements three other HTTP-related methods for you: doHead(), doTrace(), and doOptions(). You don’t normally need to override these methods. doHead() implements the HTTP HEAD request, which asks for the headers of a GET request without the body. HttpServlet implements this by default in the trivial way, by performing the GET method and then sending only the headers. You may wish to override doHead() with a more efficient implementation if you can provide one as an optimization. doTrace() and doOptions() implement other features of HTTP that allow for debugging and simple client/server capabilities negotiation. You shouldn’t normally need to override these.

Along with HttpServlet, javax.servlet.http also includes HttpServletRequest and HttpServletResponse, the HTTP-specific subtypes of ServletRequest and ServletResponse. These provide, respectively, the input and output streams needed to read and write client data. They also provide the APIs for getting or setting HTTP header information and, as we’ll see, client session information. Rather than document these dryly, we’ll show them in the context of some examples. As usual, we’ll start with the simplest possible example.

The HelloClient Servlet

Here’s our servlet version of “Hello, World,” HelloClient:

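import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.*;

@WebServlet(urlPatterns={"/hello"})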
public class HelloClient extends HttpServlet
{
    public void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException
    {
        response.setContentType("text/html"); // must come first
        PrintWriter out = response.getWriter();
        out.println(
            "<html><head><title>Hello Client!</title></head><body>"
            + "<h1>Hello Client!</h1>"
            + "</body></html>" );
    }
}

If you want to try this servlet right away, skip ahead to “Servlet Containers”, where we walk through the process of deploying this servlet. Because we’ve included the WebServlet annotation in our class, this servlet does not need a web.xml file for deployment. All you have to do is bundle the class file into a particular folder within a WAR archive (a fancy ZIP file) and drop it into a directory monitored by the Tomcat server. For now, we’re going to focus on just the servlet example code itself, which is pretty simple in this case. The code examples for this portion of the book are available in a second repository on GitHub. Details for downloading and setting up IntelliJ IDEA to use the appropriate servlet library can be found in “Grabbing the Web Code Examples”.

Let’s have a look at the example. HelloClient extends the base HttpServlet class and overrides the doGet() method to handle simple requests. In this case, we want to respond to any GET request by sending back a one-line HTML document that says “Hello Client!” First, we tell the container what kind of response we are going to generate, using the setContentType() method of the HttpServletResponse object. We specify the MIME type “text/html” for our HTML response. Then, we get the output stream using the getWriter() method and print the message to it. It is not necessary for us to explicitly close the stream. We’ll talk more about managing the output stream throughout this chapter.

ServletExceptions

The doGet() method of our example servlet declares that it can throw a ServletException. All of the service methods of the Servlet API may throw a ServletException to indicate that a request has failed. A ServletException can be constructed with a string message and an optional Throwable parameter that can carry any corresponding exception representing the root cause of the problem:

    throw new ServletException("utter failure", someException );

By default, the web server determines exactly what is shown to the user whenever a ServletException is thrown; often there is a “development mode” where the exception and its stack trace are displayed. Most servlet containers (like Tomcat) allow you to designate custom error pages, but that’s beyond the scope of this chapter.

Alternatively, a servlet may throw an UnavailableException, a subclass of ServletException, to indicate that it cannot handle requests. This exception can be thrown to indicate that the condition is permanent or that it should last for a specified period of seconds.

Content type

Before fetching the output stream and writing to it, we must specify the kind of output we are sending by calling the response parameter’s setContentType() method. In this case, we set the content type to text/html, which is the proper MIME type for an HTML document. In general, though, it’s possible for a servlet to generate any kind of data, including audio, video, or some other kind of text or binary document. If we were writing a generic FileServlet to serve files like a regular web server, we might inspect the filename extension and determine the MIME type from that or from direct inspection of the data. (This is a good use for the java.nio.file.Files probeContentType() method!) For writing binary data, you can use the getOutputStream() method to get an OutputStream as opposed to a Writer.
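As a rough sketch of how a file-serving servlet might pick a MIME type, the helper below asks Files.probeContentType() first and falls back to URLConnection.guessContentTypeFromName(). The ContentTypes class, its guess() method, and the choice of application/octet-stream as a last resort are our own assumptions for illustration. The results of probeContentType() depend on the platform and may be null, so we do not show expected output.

```java
import java.io.IOException;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: guesses a MIME type from a file name.
class ContentTypes {
    // Our own fallback for unknown types (an assumption, not mandated by any API).
    static final String DEFAULT_TYPE = "application/octet-stream";

    static String guess(String fileName) {
        String type = null;
        try {
            Path path = Paths.get(fileName);
            // May return null; the answer depends on the platform's type detectors.
            type = Files.probeContentType(path);
        } catch (IOException e) {
            // Treat a failed probe the same as "unknown" and try the next source.
        }
        if (type == null) {
            // Second opinion based on the JDK's built-in table of file extensions.
            type = URLConnection.guessContentTypeFromName(fileName);
        }
        return (type != null) ? type : DEFAULT_TYPE;
    }

    public static void main(String[] args) {
        for (String name : new String[] { "index.html", "happybunny.gif", "archive.zip" }) {
            System.out.println(name + " -> " + guess(name));
        }
    }
}
```

A servlet could pass the result straight to setContentType() before opening its output stream.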

The content type is used in the Content-Type: header of the server’s HTTP response, which tells the client what to expect even before it starts reading the result. This allows your web browser to prompt you with the “Save File” dialog when you click on a ZIP archive or executable program. When the content-type string is used in its full form to specify the character encoding (for example, text/html; charset=UTF-8), the information is also used by the servlet engine to set the character encoding of the PrintWriter output stream. As a result, you should always call the setContentType() method before fetching the writer with the getWriter() method. The character encoding can also be set separately via the servlet response setCharacterEncoding() method.
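To see why the encoding has to be settled before the writer exists, it may help to look at plain java.io, where a writer's character encoding is fixed at the moment it is constructed. This is a sketch of the general principle only, not of what any particular servlet container does internally. The WriterEncodingDemo class and its encodedLength() method are hypothetical names of our own.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same text produces different bytes under different encodings.
class WriterEncodingDemo {
    // Writes text through a PrintWriter bound to the given charset and
    // returns the number of bytes that reached the underlying stream.
    static int encodedLength(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // The charset is chosen here, when the writer is created, and cannot change later.
        PrintWriter out = new PrintWriter(new OutputStreamWriter(bytes, charset));
        out.print(text);
        out.flush();
        return bytes.size();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an acute accent on the e
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));      // prints 5
        System.out.println(encodedLength(text, StandardCharsets.ISO_8859_1)); // prints 4
    }
}
```

The accented character takes two bytes in UTF-8 and one in ISO-8859-1, so a writer created with the wrong encoding sends bytes that do not match what the Content-Type header promises.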

The Servlet Response

In addition to providing the output stream for writing content to the client, the HttpServletResponse object provides methods for controlling other aspects of the HTTP response, including headers, error result codes, redirects, and servlet container buffering.

HTTP headers are metadata name/value pairs sent with the response. You can add headers (standard or custom) to the response with the setHeader() and addHeader() methods (headers may have multiple values). There are also convenience methods for setting headers with integer and date values:

    response.setIntHeader("MagicNumber", 42);
    response.setDateHeader("CurrentTime", System.currentTimeMillis() );

When you write data to the client, the servlet container automatically sets the HTTP response code to a value of 200, which means OK. Using the sendError() method, you can generate other HTTP response codes. HttpServletResponse contains predefined constants for all of the standard codes. Here are a few common ones:

    HttpServletResponse.SC_OK
    HttpServletResponse.SC_BAD_REQUEST
    HttpServletResponse.SC_FORBIDDEN
    HttpServletResponse.SC_NOT_FOUND
    HttpServletResponse.SC_INTERNAL_SERVER_ERROR
    HttpServletResponse.SC_NOT_IMPLEMENTED
    HttpServletResponse.SC_SERVICE_UNAVAILABLE

When you generate an error with sendError(), the response is over and you can’t write any actual content to the client. You can specify a short error message, however, which may be shown to the client. (See the section “The Servlet Lifecycle”.)

An HTTP redirect is a special kind of response that tells the client web browser to go to a different URL. Normally this happens quickly and without any interaction from the user. You can send a redirect with the sendRedirect() method:

    response.sendRedirect("http://www.oreilly.com/");

While we’re talking about the response, we should say a few words about buffering. Most responses are buffered internally by the servlet container until the servlet service method has exited or a preset maximum size has been reached. This allows the container to set the HTTP content-length header automatically, telling the client how much data to expect. You can control the size of this buffer with the setBufferSize() method, specifying a size in bytes. You can even clear it and start over if no data has been written to the client. To clear the buffer, use isCommitted() to test whether any data has been sent, then use resetBuffer() to dump the data if none has been sent. If you are sending a lot of data, you may wish to set the content length explicitly with the setContentLength() method.

Servlet Parameters

Our first example showed how to accept a basic request. Of course, to do anything really useful, we’ll need to get some information from the client. Fortunately, the servlet engine handles this for us, interpreting both GET and POST form-encoded data from the client and providing it to us through the simple getParameter() method of the servlet request.

GET, POST, and “extra path”

There are two common ways to pass information from your web browser to a servlet or CGI program. The most general is to “post” it, meaning that your client encodes the information and sends it as a stream to the program, which decodes it. Posting can be used to upload large amounts of form data or other data, including files. The other way to pass information is to somehow encode the information in the URL of your client’s request. The primary way to do this is to use GET-style encoding of parameters in the URL string. In this case, the web browser encodes the parameters and appends them to the end of the URL string. The server decodes them and passes them to the application.

As we described earlier, GET-style encoding takes the parameters and appends them to the URL in a name/value fashion, with the first parameter preceded by a question mark (?) and the rest separated by ampersands (&). The entire string is expected to be URL-encoded: any special characters (such as spaces, ?, and & in the string) are specially encoded.
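Here is a small sketch of that encoding done by hand with java.net.URLEncoder, which produces the same form-style encoding a browser uses (spaces become plus signs). The QueryStrings class, its withQuery() method, and the sample URL are hypothetical, and the Charset overload of encode() assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: appends GET-style parameters to a base URL.
class QueryStrings {
    static String withQuery(String baseUrl, Map<String, String> params) {
        // Name/value pairs are separated by ampersands.
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Names and values are each URL-encoded, so special characters survive.
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        // The first parameter is preceded by a question mark.
        return params.isEmpty() ? baseUrl : baseUrl + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("Field 1", "fish & chips");
        params.put("Field 2", "what?");
        System.out.println(
            withQuery("http://www.myserver.example/showParameter", params));
        // prints http://www.myserver.example/showParameter?Field+1=fish+%26+chips&Field+2=what%3F
    }
}
```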

Another way to pass data in the URL is called extra path. This simply means that when the server has located your servlet or CGI program as the target of a URL, it takes any remaining path components of the URL string and hands them over as an extra part of the URL. For example, consider these URLs:

    http://www.myserver.example/servlets/MyServlet
    http://www.myserver.example/servlets/MyServlet/foo/bar

Suppose the server maps the first URL to the servlet called MyServlet. When given the second URL, the server also invokes MyServlet, but considers /foo/bar to be “extra path” that can be retrieved through the servlet request getPathInfo() method. This technique is useful for making more human-readable and meaningful URL pathnames, especially for document-centric content.
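Conceptually, the extra path is just whatever remains of the request path once the part that identifies the servlet has been removed. The sketch below shows that idea with plain strings. The ExtraPath class and its extraPath() method are hypothetical, and a real container does this work for you, with more rules than we show here.

```java
// Hypothetical illustration of how "extra path" relates to the servlet's own path.
class ExtraPath {
    // Returns the part of requestPath that follows servletPath,
    // or null if nothing remains.
    static String extraPath(String requestPath, String servletPath) {
        if (!requestPath.startsWith(servletPath)) {
            throw new IllegalArgumentException(requestPath + " is not under " + servletPath);
        }
        String rest = requestPath.substring(servletPath.length());
        if (rest.isEmpty()) {
            return null; // the URL named the servlet exactly
        }
        if (!rest.startsWith("/")) {
            // e.g. /servlets/MyServletX is a different resource, not extra path
            throw new IllegalArgumentException(requestPath + " is not under " + servletPath);
        }
        return rest;
    }

    public static void main(String[] args) {
        String servlet = "/servlets/MyServlet";
        System.out.println(extraPath("/servlets/MyServlet", servlet));         // prints null
        System.out.println(extraPath("/servlets/MyServlet/foo/bar", servlet)); // prints /foo/bar
    }
}
```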

Both GET and POST encoding can be used with HTML forms on the client by specifying get or post in the method attribute of the form tag. The browser handles the encoding; on the server side, the servlet engine handles the decoding.

The content type used by a client to post form data to a servlet is: “application/x-www-form-urlencoded”. The Servlet API automatically parses this kind of data and makes it available through the getParameter() method. However, if you do not call the getParameter() method, the data remains available, unparsed, in the input stream and can be read by the servlet directly.
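If you ever do read the raw data yourself, decoding it looks roughly like the following. This is a minimal sketch of the idea, not the servlet engine's actual parser. FormData and parse() are hypothetical names, and the Charset overload of URLDecoder.decode() assumes Java 10 or later. Repeated names are collected into a list, which is why a parameter can carry several values.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for application/x-www-form-urlencoded data.
class FormData {
    static Map<String, List<String>> parse(String encoded) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (encoded == null || encoded.isEmpty()) {
            return params;
        }
        for (String pair : encoded.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate stray ampersands
            }
            int eq = pair.indexOf('=');
            String rawName = (eq >= 0) ? pair.substring(0, eq) : pair;
            String rawValue = (eq >= 0) ? pair.substring(eq + 1) : "";
            // Decode after splitting, so an encoded & or = inside a value is preserved.
            String name = URLDecoder.decode(rawName, StandardCharsets.UTF_8);
            String value = URLDecoder.decode(rawValue, StandardCharsets.UTF_8);
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parse("Field+1=fish+%26+chips&color=red&color=blue"));
        // prints {Field 1=[fish & chips], color=[red, blue]}
    }
}
```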

GET or POST: Which one to use?

To users, the primary difference between GET and POST is that they can see the GET information in the encoded URL shown in their web browser. This can be useful because the user can cut and paste that URL (the result of a search, for example) and mail it to a friend or bookmark it for future reference. POST information is not visible to the user and ceases to exist after it’s sent to the server. This behavior goes along with the protocol’s intent that GET and POST are to have different semantics. By convention, the result of a GET operation is not supposed to have any side effects; that is, it’s not supposed to cause the server to perform any persistent operations (such as making a purchase in a shopping cart). In theory, that’s the job of POST. That’s why your web browser warns you about reposting form data if you hit reload on a page that was the result of a form posting.

The extra path style would be useful for a servlet that retrieves files or handles a range of URLs in a human-readable way. Extra path information is often useful for URLs that the user must see or remember, because it looks like any other path.

The ShowParameters Servlet

Our first example didn’t do much. This next example prints the values of any parameters that were received. We’ll start by handling GET requests and then make some trivial modifications to handle POST as well. Here’s the code:

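import java.io.*;
import javax.servlet.ServletException;  // used by the doPost() method added below
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.*;
import java.util.*;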

@WebServlet(urlPatterns={"/showParameter"})
public class ShowParameters extends HttpServlet
{
    public void doGet(HttpServletRequest request, HttpServletResponse response)
      throws IOException
    {
        showRequestParameters( request, response );
    }

    void showRequestParameters(HttpServletRequest request,
        HttpServletResponse response)
        throws IOException
    {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();

        out.println(
          "<html><head><title>Show Parameters</title></head><body>"
          + "<h1>Parameters</h1><ul>");

        Map<String, String[]> params = request.getParameterMap();
        for ( String name : params.keySet() )
        {
            String [] values = params.get( name );
            out.println("<li>"+ name +" = "+ Arrays.asList(values) );
        }

        out.close(  );
    }
}

As in the first example, we override the doGet() method. We delegate the request to a helper method, showRequestParameters(), which enumerates the parameters using the request object’s getParameterMap() method (it returns a map of parameter names to values) and prints the names and values. Note that a parameter may have multiple values if it is repeated in the request from the client, which is why the map’s values are of type String[]. To make things pretty, we list each parameter in an HTML <li> tag.

As it stands, our servlet responds to any GET request sent to its URL. Let’s round it out by adding our own form to the output and also accommodating POST method requests. To accept posts, we override the doPost() method. The implementation of doPost() could simply call our showRequestParameters() method, but we can make it simpler still. The API lets us treat GET and POST requests interchangeably because the servlet engine handles the decoding of request parameters. So we simply delegate the doPost() operation to doGet().

Add the following method to the example:

    public void doPost( HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException
    {
        doGet( request, response );
    }

Now, let’s add an HTML form to the output. The form lets the user fill in some parameters and submit them to the servlet. Add this line to the showRequestParameters() method before the call to out.close():

    out.println("</ul><p><form method=\"POST\" action=\""
            + request.getRequestURI() + "\">"
      + "Field 1 <input name=\"Field 1\" size=20><br>"
      + "Field 2 <input name=\"Field 2\" size=20><br>"
      + "<br><input type=\"submit\" value=\"Submit\"></form>"
    );

The form’s action attribute is the URL of our servlet so that our servlet will get the data back. We use the getRequestURI() method to get the location of our servlet. For the method attribute, we’ve specified a POST operation, but you can try changing the operation to GET to see both styles.

So far, we haven’t done anything terribly exciting. In the next example, we’ll add some power by introducing a user session to store client data between requests.

User Session Management

One of the nicest features of the Servlet API is its simple mechanism for managing a user session. By a session, we mean that the servlet can maintain information over multiple pages and through multiple transactions as navigated by the user; this is also called maintaining state. Providing continuity through a series of web pages is important in many kinds of applications, such as handling a login process or tracking purchases in a shopping cart. In a sense, session data takes the place of instance data in your servlet object. It lets you store data between invocations of your service methods. Without such a mechanism, your servlet would have no way of knowing that two requests came from the same user.

Session tracking is supported by the servlet container; you normally don’t have to worry about the details of how it’s accomplished. It’s done in one of two ways: using client-side cookies or URL rewriting. Client-side cookies are a standard HTTP mechanism for getting the client web browser to cooperate in storing state information for you. A cookie is basically just a name/value attribute that is issued by the server, stored on the client, and returned by the client whenever it is accessing a certain group of URLs on a specified server. Cookies can track a single session or multiple user visits.

URL rewriting appends session-tracking information to the URL, using GET-style encoding or extra path information. The term rewriting applies because the server rewrites the URL before it is seen by the client and absorbs the extra information before it is passed back to the servlet. In order to support URL rewriting, a servlet must take the extra step to encode any URLs it generates in content (e.g., HTML links that may return to the page) using a special method of the HttpServletResponse object. You need to allow for URL rewriting by the server if you want your application to work with browsers that do not support cookies or have them disabled. Many sites simply choose not to work without cookies.

To the servlet programmer, state information is made available through an HttpSession object, which acts like a hashtable for storing any objects you would like to carry through the session. The objects stay on the server side; a special identifier is sent to the client through a cookie or URL rewriting. On the way back, the identifier is mapped to a session, and the session is associated with the servlet again.
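As a mental model only, you can picture the container keeping a table of sessions keyed by that identifier, with each session holding its own table of named objects. The sketch below is our own simplification: SessionRegistry and its methods are hypothetical and are not part of the Servlet API, and real containers also handle timeouts, cookies, and much more.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of server-side session storage keyed by an identifier.
class SessionRegistry {
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    // Creates an empty session and returns the identifier to hand to the client.
    String create() {
        String id = UUID.randomUUID().toString();
        sessions.put(id, new ConcurrentHashMap<>());
        return id;
    }

    // Maps an identifier sent back by the client to its server-side data,
    // or returns null if the identifier is missing or unknown.
    Map<String, Object> lookup(String id) {
        return (id == null) ? null : sessions.get(id);
    }

    // Discards the session; later lookups with the same identifier return null.
    void invalidate(String id) {
        if (id != null) {
            sessions.remove(id);
        }
    }

    public static void main(String[] args) {
        SessionRegistry registry = new SessionRegistry();
        String id = registry.create();               // first request: no identifier yet
        registry.lookup(id).put("Name", "Value");    // store an object in the session
        System.out.println(registry.lookup(id));     // prints {Name=Value}
        registry.invalidate(id);
        System.out.println(registry.lookup(id));     // prints null
    }
}
```

Only the identifier ever travels to the client; the stored objects never leave the server.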

The ShowSession Servlet

Here’s a simple servlet that shows how to store some string information to track a session:

import java.io.*;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.*;
import java.util.Enumeration;

@WebServlet(urlPatterns={"/showSession"})
public class ShowSession extends HttpServlet {

    public void doPost(
        HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException
    {
        doGet( request, response );
    }

    public void doGet(
        HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException
    {
        HttpSession session = request.getSession();
        boolean clear = request.getParameter("clear") != null;
        if ( clear )
            session.invalidate();
        else {
            String name = request.getParameter("Name");
            String value = request.getParameter("Value");
            if ( name != null && value != null )
                session.setAttribute( name, value );
        }

        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println(
          "<html><head><title>Show Session</title></head><body>");

        if ( clear )
            out.println("<h1>Session Cleared:</h1>");
        else {
            out.println("<h1>In this session:</h1><ul>");
            Enumeration names = session.getAttributeNames();
            while ( names.hasMoreElements() ) {
                String name = (String)names.nextElement();
                out.println( "<li>"+name+" = " +session.getAttribute(
                    name ) );
            }
        }

        out.println(
          "</ul><p><hr><h1>Add String</h1>"
          + "<form method=\"POST\" action=\""
          + request.getRequestURI() +"\">"
          + "Name: <input name=\"Name\" size=20><br>"
          + "Value: <input name=\"Value\" size=20><br>"
          + "<br><input type=\"submit\" value=\"Submit\">"
          + "<input type=\"submit\" name=\"clear\" value=\"Clear\"></form>"
        );
    }
}

When you invoke the servlet, you are presented with a form that prompts you to enter a name and a value. The value string is stored in a session object under the name provided. Each time the servlet is called, it outputs the list of all data items associated with the session. You will see the session grow as each item is added (in this case, until you restart your web browser or the server).

The basic mechanics are much like our ShowParameters servlet. Our doGet() method generates the form, which points back to our servlet via a POST method. We override doPost() to delegate back to our doGet() method, allowing it to handle everything. Once in doGet(), we attempt to fetch the user session object from the request object using getSession(). The HttpSession object supplied by the request functions like a hashtable. There is a setAttribute() method, which takes a string name and an Object argument, and a corresponding getAttribute() method. In our example, we use the getAttributeNames() method to enumerate the names of the attributes currently stored in the session and print each one with its value.

By default, getSession() creates a session if one does not exist. If you want to test for a session or explicitly control when one is created, you can call the overloaded version getSession(false), which does not automatically create a new session and returns null if there is no session. Alternatively, you can check to see if a session was just created with the isNew() method. To clear a session immediately, we can use the invalidate() method. After calling invalidate() on a session, we are not allowed to access it again, so we set a flag in our example and show the “Session Cleared” message. Sessions may also become invalid on their own by timing out. You can control session timeout programmatically, in the application server, or through the web.xml file (via the session-timeout element of the session-config section). In general, this appears to the application as either no session or a new session on the next request. User sessions are private to each web application and are not shared across applications.

We mentioned earlier that an extra step is required to support URL rewriting for web browsers that don’t support cookies. To do this, we must make sure that any URLs we generate in content are first passed through the HttpServletResponse encodeURL() method. This method takes a string URL and returns a modified string only if URL rewriting is necessary. Normally, when cookies are available, it returns the same string. In our previous example, we could have encoded the server form URL that was retrieved from getRequestURI() before passing it to the client if we wanted to allow for users without cookies.

Servlet Containers

It’s finally time to run all of that example code! There are many tools, known as containers, available for deploying servlets. Neither the OpenJDK nor the official Oracle JDK comes with a servlet container built in. Online services such as AWS can provide reasonably quick, reasonably cheap containers, making your servlets available to the world. For development, though, you will undoubtedly want a local environment you can control and change and restart as you learn your way around the servlet APIs. Since we have to set up this local environment ourselves, we will be installing the “reference implementation” container, Apache Tomcat. We’ll be installing version 9, but older versions still support all of the servlet basics we’ve discussed so far.

As we described earlier, a WAR file is an archive that contains all the parts of a web application: Java class files for servlets and web services, JSPs, HTML pages, images, and other resources. The WAR file is simply a JAR file (which is itself a fancy ZIP file) with specified directories for the Java code and one designated configuration file: the web.xml file, which tells the application server what to run and how to run it. WAR files always have the extension .war, but they can be created and read with the standard jar tool.

The contents of a typical WAR might look like this, as revealed by the jar tool:

    $ jar tvf shoppingcart.war

        index.html
        purchase.html
        receipt.html
        images/happybunny.gif
        WEB-INF/web.xml
        WEB-INF/classes/com/mycompany/PurchaseServlet.class
        WEB-INF/classes/com/mycompany/ReturnServlet.class
        WEB-INF/lib/thirdparty.jar
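
Because a WAR is just a JAR, the standard java.util.jar classes can write and list one too. The following is a minimal sketch rather than a packaging tool: the class name WarPeek and the two placeholder entries are hypothetical, and the archive is written to a temporary file that is deleted afterward.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

class WarPeek {
    // Writes a tiny WAR with placeholder content (hypothetical entries).
    static void writeWar(Path war) throws IOException {
        try (OutputStream out = Files.newOutputStream(war);
             JarOutputStream jar = new JarOutputStream(out)) {
            addEntry(jar, "index.html", "<html><body>Hello</body></html>");
            addEntry(jar, "WEB-INF/web.xml", "<web-app></web-app>");
        }
    }

    // Adds one text entry to the archive under the given path.
    static void addEntry(JarOutputStream jar, String name, String text)
            throws IOException {
        jar.putNextEntry(new JarEntry(name));
        jar.write(text.getBytes(StandardCharsets.UTF_8));
        jar.closeEntry();
    }

    // Lists entry names, much as the jar tool's "t" option does.
    static List<String> listEntries(Path war) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(war.toFile())) {
            jar.stream().forEach(entry -> names.add(entry.getName()));
        }
        return names;
    }

    // Round trip: write a temporary WAR, list it, then clean up.
    static List<String> demo() {
        try {
            Path war = Files.createTempFile("demo", ".war");
            try {
                writeWar(war);
                return listEntries(war);
            } finally {
                Files.deleteIfExists(war);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Running main() prints the entry paths that were written, which is the same information the jar tool reports for a real WAR.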

When deployed, the name of the WAR becomes, by default, the root path of the web application—in this case, shoppingcart. Thus, the base URL for this web app, if deployed on http://www.oreilly.com, is http://www.oreilly.com/shoppingcart/, and all references to its documents, images, and servlets start with that path. The top level of the WAR file becomes the document root (base directory) for serving files. Our index.html file appears at the base URL we just mentioned, and our happybunny.gif image is referenced as http://www.oreilly.com/shoppingcart/images/happybunny.gif.

The WEB-INF directory (all caps, hyphenated) is a special directory that contains all deployment information and application code. This directory is protected by the web server, and its contents are not visible to outside users of the application, even if you add WEB-INF to the base URL. Your application classes can load additional files from this area using getResource() on the servlet context, however, so it is a safe place to store application resources. The WEB-INF directory also contains the web.xml file, which we’ll talk more about in the next section.

The WEB-INF/classes and WEB-INF/lib directories contain Java class files and JAR libraries, respectively. The WEB-INF/classes directory is automatically added to the classpath of the web application, so any class files placed here (using the normal Java package conventions) are available to the application. After that, any JAR files located in WEB-INF/lib are appended to the web app’s classpath (the order in which they are appended is, unfortunately, not specified). You can place your classes in either location. During development, it is often easier to work with the “loose” classes directory and use the lib directory for supporting classes and third-party tools. It’s also possible to install JAR files directly in the servlet container to make them available to all web apps running on that server. This is often done for common libraries that will be used by many web apps. The location for placing the libraries, however, is not standard, and any classes that are deployed in this way cannot be automatically reloaded if changed, a feature of WAR files that we’ll discuss later. The Servlet API requires that each server provide a directory for these extension JARs and that the classes there be loaded by a single classloader and made visible to the web application.

Configuration with web.xml and Annotations

The web.xml file is an XML configuration file that lists servlets and related entities to be deployed, the relative names (URL paths) under which to deploy them, their initialization parameters, and their deployment details, including security and authorization. For most of the history of Java web applications, this was the only deployment configuration mechanism. However, as of the Servlet 3.0 API (Tomcat 7 and later), there are additional options. Most configuration can now be done using Java annotations. We saw the WebServlet annotation used in the first example, HelloClient, to declare the servlet and specify its deployment URL path. Using the annotation, we could deploy the servlet to the Tomcat server without any web.xml file. Another option with the Servlet 3.0 API is to deploy servlets procedurally, using Java code at runtime.

In this section we will describe both the XML and annotation style of configuration. For most purposes, you will find it easier to use the annotations, but there are a couple of reasons to understand the XML configuration as well. First, the web.xml can be used to override or extend the hardcoded annotation configuration. Using the XML, you can change configuration at deployment time without recompiling the classes. In general, configuration in the XML will take precedence over the annotations. It is also possible to tell the server to ignore the annotations completely, using an attribute called metadata-complete in the web.xml. Next, there may be some residual configuration, especially relating to options of the servlet container, which can only be done through XML.

We will assume that you have at least a passing familiarity with XML, but you can simply copy these examples in a cut-and-paste fashion. Let’s start with a simple web.xml file for our HelloClient servlet example. It looks like this:

    <web-app>
        <servlet>
            <servlet-name>helloclient1</servlet-name>
            <servlet-class>HelloClient</servlet-class>
        </servlet>
        <servlet-mapping>
            <servlet-name>helloclient1</servlet-name>
            <url-pattern>/hello</url-pattern>
        </servlet-mapping>
    </web-app>

The top-level element of the document is called <web-app>. Many types of entries may appear inside the <web-app>, but the most basic are <servlet> declarations and <servlet-mapping> deployment mappings. The <servlet> declaration tag is used to declare an instance of a servlet and, optionally, to give it initialization and other parameters. One instance of the servlet class is instantiated for each <servlet> tag appearing in the web.xml file.

At minimum, the <servlet> declaration requires two pieces of information: a <servlet-name>, which serves as a handle to reference the servlet elsewhere in the web.xml file, and the <servlet-class> tag, which specifies the Java class name of the servlet. Here, we named the servlet helloclient1. We named it like this to emphasize that we could declare other instances of the same servlet if we wanted to, possibly giving them different initialization parameters, etc. The class name for our servlet is, of course, HelloClient. In a real application, the servlet class would likely have a full package name, such as com.oreilly.servlets.HelloClient.

A servlet declaration may also include one or more initialization parameters, which are made available to the servlet through the ServletConfig object’s getInitParameter() method:

    <servlet>
        <servlet-name>helloclient1</servlet-name>
        <servlet-class>HelloClient</servlet-class>
        <init-param>
            <param-name>foo</param-name>
            <param-value>bar</param-value>
        </init-param>
    </servlet>

Next, we have our <servlet-mapping>, which associates the servlet instance with a path on the web server:

    <servlet-mapping>
        <servlet-name>helloclient1</servlet-name>
        <url-pattern>/hello</url-pattern>
    </servlet-mapping>

Here we mapped our servlet to the path /hello. (We could include additional url-patterns in the mapping if desired.) If we later name our WAR learningjava.war and deploy it on www.oreilly.com, the full path to this servlet would be http://www.oreilly.com/learningjava/hello. Just as we could declare more than one servlet instance with the <servlet> tag, we could declare more than one <servlet-mapping> for a given servlet instance. We could, for example, redundantly map the same helloclient1 instance to the paths /hello and /hola. The <url-pattern> tag provides some very flexible ways to specify the URLs that should match a servlet. We’ll talk about this in detail in the next section.
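
As an illustration of the redundant mapping just described, the fragment below maps the same helloclient1 instance to both /hello and /hola by listing two url-pattern elements in one mapping (allowed as of Servlet 2.5). Writing two separate <servlet-mapping> elements that name the same servlet has the same effect.

```xml
<servlet-mapping>
    <servlet-name>helloclient1</servlet-name>
    <!-- Both paths invoke the same servlet instance -->
    <url-pattern>/hello</url-pattern>
    <url-pattern>/hola</url-pattern>
</servlet-mapping>
```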

Finally, we should mention that although the web.xml example listed earlier will work on some application servers, it is technically incomplete because it is missing formal information that specifies the version of XML it is using and the version of the web.xml file standard with which it complies. To make it fully compliant with the standards, add a line such as:

    <?xml version="1.0" encoding="ISO-8859-1"?>

As of Servlet API 2.5, the web.xml version information takes advantage of XML Schemas. The additional information is inserted into the <web-app> element:

    <web-app
        xmlns="http://java.sun.com/xml/ns/javaee"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
        http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
        version="2.5">

If you leave them out, the application may still run, but it will be harder for the servlet container to detect errors in your configuration and give you clear error messages. Some smart editors also take advantage of the schema information to help with syntax highlighting, autocompletion, and other niceties.

The equivalent of the preceding servlet declaration and mapping is, as we saw earlier, our one-line annotation:

    @WebServlet(urlPatterns={"/hello", "/hola"})
    public class HelloClient extends HttpServlet {
       ...
    }

Here the WebServlet attribute urlPatterns allows us to specify one or more URL patterns that are the equivalent to the url-pattern declaration in the web.xml.

URL Pattern Mappings

The <url-pattern> specified in the previous example was a simple string, /hello. For this pattern, only an exact match of the base URL followed by /hello would invoke our servlet. The <url-pattern> tag is capable of more powerful patterns, however, including wildcards. For example, specifying a <url-pattern> of /hello/* allows our servlet to be invoked by URLs such as http://www.oreilly.com/learningjava/hello/world or …/hello/baby. (The servlet specification treats only a trailing /* as a prefix wildcard; a pattern such as /hello* is taken literally as an exact match.) You can also specify wildcards with extensions (e.g., *.html or *.foo, meaning that the servlet is invoked for any path that ends with those characters).

Using wildcards can result in more than one match. Consider the patterns /scooby/* and /scooby/doo/*. Which should be matched for a URL ending in …/scooby/doo/biedoo? What if we have a third possible match because of a wildcard suffix extension mapping? The rules for resolving these are as follows.

First, any exact match is taken. For example, /hello matches the /hello URL pattern in our example regardless of any additional /hello/*. Failing that, the container looks for the longest prefix match. So /scooby/doo/biedoo matches the second pattern, /scooby/doo/*, because it is longer and presumably more specific. Failing any matches there, the container looks at wildcard suffix mappings. A request ending in .foo matches a *.foo mapping at this point in the process. Finally, failing any matches there, the container looks for a default mapping, declared with the pattern /. A servlet mapped to / picks up anything unmatched by this point. (A servlet mapped to /* also catches everything, but as a prefix match it is chosen ahead of any suffix mappings.) If there is no default servlet mapping, the request fails with a “404 not found” message.
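
The resolution order can be sketched in a few lines of plain Java. This is an illustration, not how any particular container implements it. It assumes patterns written in the servlet specification's forms: exact paths, prefix patterns ending in /*, extension patterns beginning with *., and / for the default. The class and method names are hypothetical, and a real container also handles context paths, welcome files, and more.

```java
import java.util.List;

class UrlPatternResolver {
    // Returns the winning pattern for a path within the web app, or null.
    static String resolve(List<String> patterns, String path) {
        // 1. An exact match always wins.
        for (String p : patterns) {
            if (isExact(p) && p.equals(path)) {
                return p;
            }
        }
        // 2. Otherwise the longest matching prefix pattern ("/dir/*") wins.
        String best = null;
        for (String p : patterns) {
            if (!p.endsWith("/*")) {
                continue;
            }
            String prefix = p.substring(0, p.length() - 2); // "/a/*" -> "/a"
            boolean matches = path.equals(prefix) || path.startsWith(prefix + "/");
            if (matches && (best == null || p.length() > best.length())) {
                best = p;
            }
        }
        if (best != null) {
            return best;
        }
        // 3. Then extension mappings such as "*.foo".
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot > slash) {
            String extensionPattern = "*" + path.substring(dot);
            if (patterns.contains(extensionPattern)) {
                return extensionPattern;
            }
        }
        // 4. Finally the default mapping "/", if one was declared.
        return patterns.contains("/") ? "/" : null;
    }

    // A pattern is "exact" when it uses none of the special forms.
    static boolean isExact(String p) {
        return !p.endsWith("/*") && !p.startsWith("*.") && !p.equals("/");
    }

    public static void main(String[] args) {
        List<String> patterns =
            List.of("/hello", "/scooby/*", "/scooby/doo/*", "*.foo", "/");
        for (String path : List.of("/hello", "/scooby/doo/biedoo",
                                   "/report.foo", "/anything-else")) {
            System.out.println(path + " -> " + resolve(patterns, path));
        }
    }
}
```

With the sample patterns in main(), /scooby/doo/biedoo resolves to /scooby/doo/* rather than /scooby/*, and a path that matches nothing else falls through to the default mapping.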

Deploying HelloClient

Once you’ve deployed the HelloClient servlet, it should be easy to add examples to the WAR as you work with them in this chapter. In this section, we’ll show you how to build a WAR by hand. There are certainly a variety of tools out there to help automate and manage WARs, but the manual approach is straightforward and helps illuminate the contents.

To create the WAR by hand, we first create the WEB-INF and WEB-INF/classes directories. If you are using a web.xml file, place it into WEB-INF. (Remember that the web.xml file is not necessary if you are using the WebServlet annotation with Tomcat 7 or later.) Put the HelloClient.class into WEB-INF/classes. Use the jar command to create learningjava.war (WEB-INF at the “top” level of the archive):

    $ jar cvf learningjava.war WEB-INF

You can also include documents and other resources in the WAR by adding their names after the WEB-INF directory. This command produces the file learningjava.war. You can verify the contents using the jar command:

    $ jar tvf learningjava.war
    document1.html
    WEB-INF/web.xml
    WEB-INF/classes/HelloClient.class

Now all that is necessary is to drop the WAR into the correct location for your server. If you have not already, you should download and install Apache Tomcat. You can grab the latest version and find some useful documentation at Apache’s Tomcat site. You can also jump right to downloading version 9 here: https://tomcat.apache.org/download-90.cgi.

The location for WAR files is the webapps directory within your Tomcat installation directory. Place your WAR here, and start the server. If Tomcat is configured with the default port number, you should be able to point to the HelloClient servlet with one of two URLs: http://localhost:8080/learningjava/hello or http://<yourserver>:8080/learningjava/hello, where <yourserver> is the name or IP address of your server. If you have trouble, look in the logs directory of the Tomcat folder for errors.
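
If you would rather check the deployment from code than from a browser, a small client along the following lines can help. It is only a sketch: the class name DeployCheck is hypothetical, and it assumes Tomcat is running locally on the default port 8080 with learningjava.war deployed. If the server is not running, the connection attempt fails with an exception.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

class DeployCheck {
    // By default the WAR's name (minus ".war") becomes the app's root path.
    static String servletUrl(String host, int port, String warFileName,
                             String urlPattern) {
        String context = warFileName.endsWith(".war")
            ? warFileName.substring(0, warFileName.length() - 4)
            : warFileName;
        return "http://" + host + ":" + port + "/" + context + urlPattern;
    }

    public static void main(String[] args) throws IOException {
        // Assumed deployment details; adjust them to match your server.
        URL url = URI.create(
            servletUrl("localhost", 8080, "learningjava.war", "/hello")).toURL();
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            // A 200 status suggests the servlet is deployed and mapped.
            System.out.println(url + " -> HTTP " + conn.getResponseCode());
        } finally {
            conn.disconnect();
        }
    }
}
```

A 404 response usually means the WAR name or the URL pattern does not match what the container deployed, which is a good cue to look in Tomcat's logs directory.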

Reloading web apps

All servlet containers are supposed to provide a facility for reloading WAR files; many support reloading of individual servlet classes after they have been modified. Reloading WARs is part of the servlet specification and is especially useful during development. Support for reloading web apps varies from server to server. Normally, all that you have to do is drop a new WAR in place of the old one in the proper location (e.g., the webapps directory for Tomcat) and the container shuts down the old application and deploys the new version. This works in Tomcat when the “autoDeploy” attribute is set (it is on by default) and also in Oracle’s WebLogic application server when it is configured in development mode.

Some servers, including Tomcat, “explode” WARs by unpacking them into a directory under the webapps directory, or they allow you to explicitly configure a root directory (or “context”) for your unpacked web app through their own configuration files. In this mode, they may allow you to replace individual files, which can be especially useful for tweaking HTML or JSPs. Tomcat automatically reloads WAR files when you change them (unless configured not to), so all you have to do is drop an updated WAR over the old one and it will redeploy it as necessary. In some cases, it may be necessary to restart the server to make all changes take effect. When in doubt, shut down and restart.

The World-Wide Web is, well, wide

We have only scratched the surface of all that you can accomplish with Java and the Web. We looked at how built-in facilities in Java make accessing and interacting with online resources as simple as dealing with files. We also saw how to start putting your own Java code out into the world with servlets. As you explore servlets, you’ll undoubtedly run into other third-party libraries to add to your project just as we did with the servlet-api.jar file. Perhaps you are starting to understand just how big the Java ecosystem has become!

It is not just libraries and add-ons around Java that are expanding, either. The Java language itself continues to grow and evolve. In the next chapter, we’ll look at how to watch for new features on the horizon as well as how to work recently published features into existing code.

1 Perhaps “media type” would be a more friendly term. MIME is a bit of a historical acronym: Multipurpose Internet Mail Extensions.

2 Amazon Web Services is one of the largest providers, with everything from free trials to enterprise-level tiers. But there are many, many online Java hosting options, including Heroku and Google’s App Engine, which is not a servlet container per se but still allows you to bring your Java skills to the web.
