Chapter 9. Performance

Ultimately, none of the techniques presented in this book would be practical if they didn’t provide a solid foundation on which to build large web applications that perform quickly and efficiently. This chapter shows how to use the foundation from the previous chapters to monitor and tweak the performance of your application.

You may well get a performance boost simply by following the practices already presented in this book. For example, the semantically meaningful HTML presented in Chapter 3 can speed up page display for several reasons. Likewise, modular techniques for large-scale PHP (see Chapter 7) generally create a faster site than jumping in and out of the PHP interpreter multiple times whenever needed.

But every professional web developer devotes time to performance as an end in itself, so this chapter shows how performance optimization interacts with the techniques in this book. To guide our discussion, we’ll explore some of the recommendations presented in High Performance Web Sites (O’Reilly). This book, based on research conducted by Steve Souders at Yahoo!, suggests that for most websites, backend performance accounts for only 10 to 20 percent of the overall time required for a page to load; the remaining 80 to 90 percent is spent downloading components for the user interface. By following a set of 14 rules, many web applications can be made 20 to 25 percent faster.

These statistics emphasize the importance of paying close attention to how your HTML, CSS, JavaScript, and PHP work together to affect performance. By using techniques for developing large web applications like the ones in this book, you can manage performance with relative ease and in a centralized manner.

Tenet 9: Large-scale HTML, JavaScript, CSS, and PHP provide a good foundation on which to build large web applications that perform well. They also facilitate a good environment for capturing site metrics and testing.

We begin this chapter by looking at how the techniques for developing large web applications discussed in this book can help us manage opportunities for caching. Next, we’ll explore some performance improvements that apply specifically to JavaScript. We then cover performance improvements related to ways we can distribute the various assets for an application across multiple servers. Finally, we’ll look at techniques that facilitate capturing site metrics and performing testing.

Caching Opportunities

One of the biggest opportunities for improving performance is caching. Caching is the preservation and management of a collection of data that replicates original data computed earlier or stored in another location. The idea is to avoid retrieving the original data repeatedly and thus to avoid the high performance cost of retrieval. Some examples of resources that you can cache in a user interface are CSS files, JavaScript files, images, and even the entire contents of modules and pages. Whenever you encounter something that doesn’t change very often (as is the case with CSS and JavaScript files especially), there is probably a good opportunity for caching.

Caching CSS and JavaScript

Whenever you can, you should place CSS and JavaScript in separate files that you can link on the pages that require them, as shown in Example 9-1. Not only does this allow you to share the contents of those files across multiple pages, it allows a browser to retrieve the files once over the wire, and then use them many times from the local cache.

Certainly, a browser can cache an HTML file that contains embedded CSS or JavaScript. However, the HTML is likely to change much more often than the CSS or JavaScript, so the browser may only cache it for a few moments. In contrast, you might go for months or even years without changing your CSS or JavaScript for a page. Separating the CSS and JavaScript into dedicated files therefore lets the browser store the CSS and JavaScript for repeated use, and just download the new HTML when needed.

In Chapter 7, you saw that modules and pages specify the CSS and JavaScript files they require through the get_css_linked and get_js_linked methods in their interfaces. Because these methods result in links on the final page, rather than CSS or JavaScript embedded in the same page as the HTML, you get the benefits of caching.

Example 9-1. Linking JavaScript files for the benefits of caching
class PictureSlider extends Module
{
   ...

   public function get_js_linked()
   {
      // Specify the JavaScript files that must be included on the page.
      // This module needs YUI libraries for managing the DOM and doing
      // animation. The module's JavaScript is a part of sitewide.js.
      return array
      (
         "yahoo-dom-event.js",
         "animation.js",
         "sitewide.js"
      );
   }

   ...
}

Versioning CSS and JavaScript files

Anytime a browser caches a CSS or JavaScript file, it’s important to ensure that the browser knows when a copy of the cached file is no longer up to date with changes you’ve made. Without this, your application is likely to be styled incorrectly or contain JavaScript errors as your HTML gets out of sync with your CSS and JavaScript. A simple way to ensure the browser knows when to fetch a new version of a file is to give each file a version ID. Whenever you change the file, simply advance the version ID. As a result, the browser does not find the new version in its cache and subsequently fetches it. A good method for constructing version IDs is to append the date to the name of the file or use the version number from your source control system. For example, you could have the following:

sitewide_20090710.js

If you need to update the file multiple times on a single day, you can append a sequence number or letter after the date:

sitewide_20090710a.js
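Because the naming scheme is mechanical, a build script can generate it. Here is a minimal sketch in JavaScript (the book's server-side examples are PHP). versionedName is a hypothetical helper, not part of the classes in this chapter, and it assumes version IDs are based on the UTC date.

```javascript
// Hypothetical helper: builds a versioned filename such as
// "sitewide_20090710.js" or, with a sequence letter, "sitewide_20090710a.js".
function versionedName(base, ext, date, seq = "") {
  // Use UTC so the version ID does not depend on the server's time zone.
  const yyyy = String(date.getUTCFullYear());
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0"); // months are 0-based
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `${base}_${yyyy}${mm}${dd}${seq}.${ext}`;
}

const day = new Date(Date.UTC(2009, 6, 10)); // July 10, 2009
console.log(versionedName("sitewide", "js", day));      // sitewide_20090710.js
console.log(versionedName("sitewide", "js", day, "a")); // sitewide_20090710a.js
```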

Of course, you’ll need to update references to the files wherever you link to them. Example 9-2 shows how easy this is to control in a centralized way using the register_links method presented in Chapter 7. It registers a JavaScript file with a version ID and assumes that all pages in the web application have SitePage somewhere in their class hierarchy. The get_js_linked method for pages and modules returns an array of keys. As files are linked for the page, these keys are used to look up the real paths defined in register_links. Each time you need to update the version ID for a file, you adjust it in one place, such as the SitePage class shown here. The process for CSS files is similar.

Example 9-2. Registering a JavaScript file with a version ID
class SitePage extends Page
{
   ...

   public function register_links()
   {
      ...

      $this->js_linked_info = array
      (
         "sitewide.js" => array
         (
            "aka_path" => $this->aka_path."/sitewide_20090710.js",
            "loc_path" => $this->loc_path."/sitewide_20090710.js"
         ),

         ...
      );

      ...
   }

   ...
}
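One way to keep the aka_path and loc_path entries from drifting apart is to derive both from a single table of version IDs. The following JavaScript sketch assumes a hypothetical buildLinkedInfo helper and placeholder hosts. It only mirrors the shape of js_linked_info and is not how SitePage is actually written.

```javascript
// Hypothetical version table: the only place a version ID is edited.
const versions = { "sitewide.js": "20090710", "newcars.js": "20090630" };

// Build entries shaped like js_linked_info in Example 9-2. Both paths are
// derived from the same versioned filename, so they cannot disagree.
function buildLinkedInfo(versions, akaPath, locPath) {
  const info = {};
  for (const [key, version] of Object.entries(versions)) {
    const dot = key.lastIndexOf(".");
    const file = `${key.slice(0, dot)}_${version}${key.slice(dot)}`;
    info[key] = { aka_path: `${akaPath}/${file}`, loc_path: `${locPath}/${file}` };
  }
  return info;
}

const info = buildLinkedInfo(versions, "http://static.example.com/js", "http://localhost/js");
console.log(info["sitewide.js"].aka_path);
// http://static.example.com/js/sitewide_20090710.js
```

Pages and modules keep referring to the stable key ("sitewide.js"); bumping a version means editing one entry in the table.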

Ideally, changes to a CSS or JavaScript file would apply wherever the file is accessed. But what if a dependency on one page prevents it from using the new version? Again, the register_links method provides an easy way to manage such fine-grained distinctions. The page class for the page containing the dependency defines a more specific version of register_links that first calls upon register_links in the parent to set up all the links as normal, then overwrites the name of the file for which the page requires the earlier version, as shown in Example 9-3.

Example 9-3. Overriding a version ID for just one page
class NewCarSearchResultsPage extends SitePage
{
   ...

   public function register_links()
   {
      // Call upon the parent class to set up all the links as normal.
      parent::register_links();

      // Alter the link for which this page needs a different version.
      $this->js_linked_info["sitewide.js"] = array
      (
         "aka_path" => $this->aka_path."/sitewide_20090709.js",
         "loc_path" => $this->loc_path."/sitewide_20090709.js"
      );
   }

   ...
}

Combining CSS and JavaScript files

One of the issues when placing CSS and JavaScript in dedicated files is determining a good way to divide the CSS (or JavaScript). On the one hand, if you place all your CSS within a single, large file, your application will become monolithic, lack modularity, and end up more difficult to maintain. On the other hand, if you place the CSS for each module within its own individual file, you’ll end up with a large number of links on every page.

The section Minimizing HTTP Requests discusses a good middle ground for dividing your CSS and JavaScript across a set of files to minimize HTTP requests. Once you have a good division of files, you can minimize the number of requests made for CSS or JavaScript files even further by combining multiple requests into one. To do this, you need to implement a server that understands combined requests. Such a request for CSS files might look like the following using a link tag:

<link href="http://.../?sitewide_20090710.css&newcars_20090630.css"
type="text/css" rel="stylesheet" media="all" />

A request for JavaScript files looks similar, but it occurs in a script tag:

<script src="http://.../?ext/yahoo-dom-event_2.7.0.js&ext/yahoo-animation_2.7.0.js&sitewide_20090710.js" type="text/javascript">
</script>

Once the server receives the request, it concatenates the files in the specified order and returns the concatenated file to the browser. It also caches a copy of the concatenated file on the server to use the next time a request with the same combination of files is made (for example, the next time the same page is displayed to any visitor). The browser receives the single, concatenated file for all the CSS (or JavaScript) via a single HTTP request. Furthermore, the next time a request is made from the same browser for the same set of files, the browser will already have the concatenated version cached and can avoid the request altogether.
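The combining server itself is not covered in this chapter. As a rough illustration only, the following JavaScript sketch concatenates files in the requested order and caches the result by query string. The files map and handleCombo function are hypothetical. A real server would read from disk, and it must reject any name that is not a known file; the lookup in the map does that here.

```javascript
// Hypothetical in-memory stand-in for the files on the combining server.
const files = {
  "sitewide_20090710.css": "body { margin: 0; }\n",
  "newcars_20090630.css": ".results { padding: 4px; }\n",
};

// Concatenated responses, keyed by the exact query string, so each
// combination of files is concatenated only once.
const comboCache = new Map();

function handleCombo(query) {
  if (comboCache.has(query)) return comboCache.get(query);

  // "?a.css&b.css" -> ["a.css", "b.css"], preserving the requested order.
  const names = query.replace(/^\?/, "").split("&").filter((n) => n !== "");

  // Accept only known files; this also rules out path tricks like "../".
  for (const name of names) {
    if (!Object.prototype.hasOwnProperty.call(files, name)) {
      throw new Error(`Unknown file in combined request: ${name}`);
    }
  }

  const body = names.map((name) => files[name]).join("");
  comboCache.set(query, body);
  return body;
}
```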

To combine CSS and JavaScript files, you need to write some scripts on a server to do the combining and some code to assemble the requests for combining files as you generate pages. In this book, we won’t examine the code to place on the server that does the combining, but the implementation is relatively straightforward. To build the requests for combining files, you need only make a few modifications to the Page class presented in Chapter 7. The modifications for combining JavaScript files are shown in Example 9-4. Combining CSS files is similar.

For CSS, just remember that you can only combine links that share the same media type (e.g., all, print), since all the concatenated files will form one file with one media type. Since media types other than all generally don’t require multiple CSS files, a simple but effective approach is to ignore requests to combine CSS files that have a media type other than all.
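The media-type rule amounts to partitioning a page's CSS links before building the combined request. Here is a minimal JavaScript sketch, assuming each link is described by a hypothetical { key, media } object:

```javascript
// Split CSS links into those that can be combined (media "all") and
// those that must be linked individually. A combined file can carry only
// one media type, so only "all" files are concatenated.
function partitionCss(links) {
  const combinable = [];
  const separate = [];
  for (const link of links) {
    (link.media === "all" ? combinable : separate).push(link.key);
  }
  return { combinable, separate };
}

const parts = partitionCss([
  { key: "sitewide.css", media: "all" },
  { key: "print.css", media: "print" },
  { key: "newcars.css", media: "all" },
]);
console.log(parts.combinable.join(", ")); // sitewide.css, newcars.css
console.log(parts.separate.join(", "));   // print.css
```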

Example 9-4. The Page class with support for combining JavaScript files
class Page
{
   protected $js_is_combined;

   ...

   public function __construct()
   {
      ...

      // Default combining JavaScript to true; however, you can always
      // disable it in a derived page or by calling the setter method.
      $this->js_is_combined = true;
   }

   ...

   public function set_js_combined($flag)
   {
      // Offer a way to enable or disable handling combined JavaScript.
      $this->js_is_combined = $flag;
   }

   ...

   private function create_js_combined_part($k)
   {
      // Candidates for combining need to be from one server. Set that
      // here as a prefix to check. We'll log errors for other paths.
      $prefix = "...";

      // Look up the actual path for the file identified by the key k.
      $path = $this->js_linked_info[$k]["aka_path"];

      // Return a query part only if combining is supported for the path.
      $pos = strpos($path, $prefix);

      if ($pos === 0)
         return str_replace($prefix, "", $path);
      else
         return "";
   }

   private function create_js_combined_query()
   {
      $combined_query = "";

      // We're making the assumption that local files are never combined
      // since normally alternative servers are used for the combining.
      if ($this->js_is_combined && !$this->js_is_local)
      {
         // Build an array of all the JavaScript keys in the order that
         // they were added by the page or modules created for the page.
         $all = array_merge
         (
            $this->js_common,
            $this->js_page_linked,
            $this->js_module_linked
         );

         $i = 0;

         // Build the combined query by appending each part one by one.
         foreach ($all as $k)
         {
            $part = $this->create_js_combined_part($k);

            if (empty($part))
            {
               // An empty part indicates that the path for the file is
               // not a path that supports combining. Log this issue, and
               // abandon combining so that no file is silently left out.
               ...

               $combined_query = "";
               break;
            }

            $sep = ($i++ == 0) ? "?" : "&";
            $combined_query .= $sep.$part;
         }
      }

      return $combined_query;
   }

   ...
}
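The query-building logic in create_js_combined_query can be sketched compactly in JavaScript. The COMBO_PREFIX host and buildCombinedQuery function below are hypothetical. As a design choice, this sketch abandons combining entirely when any file is not hosted on the combining server, so that no file is silently left out of the request; the caller would then link the files individually.

```javascript
// Hypothetical host of the server that understands combined requests.
const COMBO_PREFIX = "http://combo.example.com/";

// Build "?a.js&b.js" from keys, in the order given, using the aka_path
// of each entry in a js_linked_info-shaped object.
function buildCombinedQuery(keys, linkedInfo, prefix = COMBO_PREFIX) {
  const parts = [];
  for (const key of keys) {
    const entry = linkedInfo[key];
    const path = entry ? entry.aka_path : undefined;
    // Give up on combining if any file lives on another server, rather
    // than leaving that file out of the combined request.
    if (!path || !path.startsWith(prefix)) return "";
    parts.push(path.slice(prefix.length));
  }
  return parts.length > 0 ? "?" + parts.join("&") : "";
}
```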

Caching Modules

Another opportunity for caching occurs each time you generate the CSS, JavaScript, and content for a module on the server. Caching for a module is especially useful when the module’s content, styles, and behaviors require a fair amount of CPU work to generate and you don’t expect them to change very often. A good approach to implementing cacheable modules is to provide the capabilities required by all cacheable modules within a base class called CacheableModule, derived from the Module class in Chapter 7. To make your own module cacheable, simply derive it from CacheableModule. Example 9-5 illustrates an implementation for the CacheableModule class.

Example 9-5. The implementation of a base class for cacheable modules
class CacheableModule extends Module
{
   protected $cache_ttl;
   protected $cache_clr;

   public function __construct($page)
   {
      parent::__construct($page);

      // The default time-to-live for entries in the cache is one hour.
      $this->cache_ttl = 3600;

      // The default is to check the cache first, but you can clear it.
      $this->cache_clr = false;
   }

   public function create()
   {
      // Check whether data exists in the cache for the module at all.
      $cache_key = $this->get_cache_key();
      $cache_val = apc_fetch($cache_key);

      // Set the hash for the variables on which the new data is based.
      $hash = $this->get_cache_hash($this->get_cache_vars());

      if (!$this->cache_clr && $cache_val && $cache_val["hash"]==$hash)
      {
         // Whenever we can use the cached module, access the cache.
         $content = $this->fetch_from_cache($cache_val["data"]);
      }
      else
      {
         // Otherwise, generate the module as normal and cache a copy.
         $content = $this->store_into_cache($cache_key, $hash);
      }

      return $content;
   }

   public function set_cache_ttl($ttl)
   {
      // Set the time-to-live to the specified value, in seconds.
      $this->cache_ttl = $ttl;
   }

   public function set_cache_clr()
   {
      // Force the cacheable module to bust any cached copy immediately.
      $this->cache_clr = true;
   }

   protected function get_cache_vars()
   {
      // Modules derived from this class should implement this method
      // to return a string that changes whenever the cache should be
      // discarded (the current microtime busts the cache by default).
      return microtime();
   }

   protected function fetch_from_cache($data)
   {
      // Add cached CSS styles to the page on which the module resides.
      $this->page->add_to_css_linked($data["css_linked"]);
      $this->page->add_to_css($data["css"]);

      // Add cached JavaScript to the page on which the module resides.
      $this->page->add_to_js_linked($data["js_linked"]);
      $this->page->add_to_js($data["js"]);

      // Return the cached content for the module.
      return $data["content"];
   }

   protected function store_into_cache($cache_key, $hash)
   {
      $css_linked = $this->get_css_linked();
      $css = $this->get_css();

      $js_linked = $this->get_js_linked();
      $js = $this->get_js();

      $content = $this->get_content();

      // Set up the data structure for the data to place in the cache.
      $cache_val = array
      (
         "hash" => $hash,
         "data" => array
         (
            "css_linked" => $css_linked,
            "css" => $css,
            "js_linked" => $js_linked,
            "js" => $js,
            "content" => $content
         )
      );

      // Store the new copy into the cache and apply the time-to-live.
      apc_store($cache_key, $cache_val, $this->cache_ttl);

      // Add module CSS styles to the page on which the module resides.
      $this->page->add_to_css_linked($css_linked);
      $this->page->add_to_css($css);

      // Add module JavaScript to the page on which the module resides.
      $this->page->add_to_js_linked($js_linked);
      $this->page->add_to_js($js);

      // Return the content that was just generated using get_content.
      return $content;
   }

   protected function get_cache_hash($var)
   {
      // Hash the string used to determine when to use the cached copy.
      return md5($var);
   }

   protected function get_cache_key()
   {
      // This must be unique per module, so use the derived class name.
      return get_class($this);
   }
}

The CacheableModule class uses APC (Alternative PHP Cache) to implement caching between instantiations of the module. The class provides a good example of overriding the create method provided by Module (see Chapter 7). Instead of the default implementation of create, the implementation here inspects the APC cache before generating the module. If the module can use the cache, it fetches its CSS, JavaScript, and content instead of generating them from scratch. If the module cannot use the cache, it generates itself as normal and caches its CSS, JavaScript, and content for the next time. To be clear, there are four conditions under which the module is generated from scratch:

  • There is no copy in the cache at all.

  • The variables from which the cached copy is derived have changed.

  • The time-to-live has expired.

  • The $cache_clr member is set.
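The decision logic behind those four conditions can be sketched in JavaScript. TtlCache is a hypothetical in-memory stand-in for APC (its time-to-live is in seconds, as with apc_store), and createCached is a hypothetical analogue of CacheableModule::create; neither is part of APC's API. The clock is injectable so that expiry can be tested.

```javascript
const crypto = require("crypto");

// Hypothetical stand-in for APC: each entry holds a value and an expiry time.
class TtlCache {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map();
  }
  fetch(key) {
    const e = this.entries.get(key);
    if (!e || e.expires <= this.now()) return undefined; // missing or expired
    return e.value;
  }
  store(key, value, ttlSeconds) {
    this.entries.set(key, { value, expires: this.now() + ttlSeconds * 1000 });
  }
}

// Return the cached data unless it is missing, expired, based on different
// variables, or a clear was requested; otherwise generate and store it.
function createCached(cache, key, cacheVars, generate, { ttl = 3600, clear = false } = {}) {
  const hash = crypto.createHash("md5").update(cacheVars).digest("hex");
  const cached = cache.fetch(key);
  if (!clear && cached && cached.hash === hash) return cached.data;
  const data = generate(); // build from scratch
  cache.store(key, { hash, data }, ttl);
  return data;
}
```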

One of the nice things about the implementation in Example 9-5 is that using a cacheable module is very similar to using a module that is not cacheable. For example, suppose NewCarSearchResults were a module derived from CacheableModule. The code to instantiate and create this module looks like what was presented in Chapter 7. The call to set_cache_ttl is optional; here it sets a time-to-live of 1,800 seconds (30 minutes) instead of the default. You can also call the public method set_cache_clr whenever you want to ensure that a fresh copy of the module is generated.

$mod = new NewCarSearchResults
(
   $this,
   $this->data["new_car_listings"]
);

$mod->set_cache_ttl(1800);
$results = $mod->create();

The main thing to remember when using a cacheable module is that your class derived from CacheableModule needs to implement get_cache_vars for how you want caching to occur. This method should return a string that changes whenever you no longer want to use the cached copy of the module. This string is typically a concatenation of the variables and values on which the cached module depends.

Notice that the default implementation of get_cache_vars in the base class returns the value of microtime(), the current time including microseconds. This ensures that, by default, the cached copy is never used, since the value is different every time you generate the module. That remains the case until you override get_cache_vars in your derived class with more informed logic about when the cached copy is valid.
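In practice, the string returned by get_cache_vars is often built from the inputs the module depends on. The following JavaScript sketch shows one way to make that string deterministic before hashing it with MD5, as get_cache_hash does. cacheVars and cacheHash are hypothetical names, and only top-level keys are sorted.

```javascript
const crypto = require("crypto");

// Build a stable string from the inputs a module's output depends on.
// Sorting the top-level keys keeps the string identical regardless of
// the order in which the inputs were assembled.
function cacheVars(inputs) {
  return Object.keys(inputs)
    .sort()
    .map((k) => `${k}=${JSON.stringify(inputs[k])}`)
    .join("&");
}

// Hash the string, as get_cache_hash does with md5 in Example 9-5.
function cacheHash(vars) {
  return crypto.createHash("md5").update(vars).digest("hex");
}

console.log(cacheVars({ zip: "94085", make: "Honda" }));
// make="Honda"&zip="94085"
```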

Caching for Pages

Just as you can cache the contents of individual modules that you don’t expect to change frequently, you also can cache the contents of entire pages. The process for implementing this is similar to that for modules. You create a CacheablePage class and override the default implementations for the create and get_page methods. The start of create is a logical place to insert the code for generating the hash and searching the cache. At this point, you can inspect parameters for generating the page even before taking the time to load data for the page. If the page can use the cache, fetch the completely assembled page instead of generating it from scratch in get_page. If the page cannot use the cache, generate the page in the traditional manner (during which some caching may still be utilized by modules, remember) and cache the completely assembled page at the end of get_page for the next time.

A further opportunity for caching, of course, occurs when the data for the page is loaded. This type of caching is best performed by the backend, since it has visibility into how the data is stored, and ideally these details should be abstracted from the user interface. Therefore, we’re not going to look at an example of it in this book, although it plays an important part in most large web applications.

Whenever you expect to do a lot of caching, keep in mind that caching can cause its own performance problems when memory fills up. A system in this state may begin to thrash, spending more time swapping virtual memory pages in and out than doing useful work. On Unix systems, you can keep an eye on this by running top and watching memory and swap usage, along with the activity of the process responsible for swapping.

Caching with Ajax

Ajax provides another opportunity for caching. In Chapter 8, we discussed the usefulness of the MVC design pattern in managing the separation between data, presentation, and control in an Ajax application. Here, we revisit Example 8-15 with caching in the model. The model in this example manages an accordion list of additional trims for one car in a list of cars with good green ratings. When the model is updated, a view that subscribes to changes in the model updates itself to show an expanded list of cars that are trims related to the main entry. Because many of the cars will never have their lists expanded, loading the lists of trims on demand via Ajax is a good approach. Because the list of trims doesn’t change frequently, caching the list in the model once retrieved also makes a lot of sense. Example 9-6 illustrates caching trims in the model that we discussed in Chapter 8.

Example 9-6. Caching with Ajax added to Example 8-15
GreenSearchResultsModel = function(id)
{
   MVC.Model.call(this);

   this.carID = id;
};

GreenSearchResultsModel.prototype = new MVC.Model();

GreenSearchResultsModel.prototype.setCache = function()
{
   // This implements a caching layer in the browser. If other cars
   // under the main entry were fetched before, we don't refetch them.
   if (this.state.cars)
   {
      // Other cars under the main car were fetched before, so just
      // send a notification to each of the views to update themselves.
      this.notify();
   }
   else
   {
      // Cars under the main entry are not cached, so set the state of
      // the model by specifying the URL through which to make the Ajax
      // request. The setState method is responsible for notifying views.
      this.setState("GET", "...?carid=" + this.carID);
   }
};

GreenSearchResultsModel.prototype.recover = function()
{
   alert("Could not retrieve the cars you are trying to view.");
};

GreenSearchResultsModel.prototype.abandon = function()
{
   alert("Timed out fetching the cars you are trying to view.");
};

GreenSearchResultsView = function(i)
{
   MVC.View.call(this);

   // The position of the view is helpful when performing DOM updates.
   this.pos = i;
};

GreenSearchResultsView.prototype = new MVC.View();

GreenSearchResultsView.prototype.update = function()
{
   var cars = this.model.state.cars;
   ...

   // There is no need to update the view or show a button for one car.
   if (this.total == 1)
      return;

   if (!cars)
   {
      // When no cars are loaded, we're rendering for the first time.
      // In this case, we likely need to do different things in the DOM.
      ...
   }
   else
   {
      // When there are cars loaded, update the view by working with
      // the DOM to show the cars that are related to the main car.
      ...
   }
};

GreenSearchResultsView.prototype.show = function()
{
   // When we show the view, check whether we can use the cache or not.
   this.model.setCache();
};

GreenSearchResultsView.prototype.hide = function()
{
   // When we hide the view, modify the DOM to make the view disappear.
   ...
};

To implement caching in the model, Example 9-6 adds the setCache method to GreenSearchResultsModel. The event handler for showing the expanded list of cars invokes the show method of GreenSearchResultsView. This, in turn, calls setCache for the model. If the model already contains a list of cars, the cached list is used and no server request occurs. If the model does not contain the list, it makes a request back to the server via the setState method of Model and caches the returned list for the next time. To request the proper list of cars, the request uses the carID member of the model as a parameter. This is set in the constructor to identify the car for which we want additional trims. After the appropriate action is taken based on the state of the cache, the model calls notify (either directly or within setState), and notify calls update for each view subscribed to the model, which, in the list of search results, is just one view for each car.
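For readers without the MVC library from Chapter 8 at hand, here is a self-contained JavaScript sketch of the same caching idea. TrimsModel and the injected fetchTrims function are hypothetical stand-ins for GreenSearchResultsModel and its Ajax request. One difference from Example 9-6: the sketch caches the pending request itself, so repeated clicks made before the response arrives reuse a single request.

```javascript
// A stand-in for the model in Example 9-6: it fetches the trims for one
// car at most once, caches them in its state, and notifies every
// subscribed view whenever it is asked to show them.
class TrimsModel {
  constructor(carID, fetchTrims) {
    this.carID = carID;
    this.fetchTrims = fetchTrims; // hypothetical: returns a Promise of cars
    this.pending = null;          // the one request made for this car
    this.state = {};
    this.views = [];
  }

  subscribe(view) {
    view.model = this;
    this.views.push(view);
  }

  notify() {
    for (const view of this.views) view.update();
  }

  async setCache() {
    // Reuse the earlier request (finished or still in flight) if there is one.
    if (!this.pending) this.pending = this.fetchTrims(this.carID);
    try {
      this.state.cars = await this.pending;
    } catch (err) {
      this.pending = null; // allow a later attempt to retry after a failure
      throw err;
    }
    this.notify();
  }
}
```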

Using Expires Headers

Another way to control caching for Ajax applications is to set an Expires header on the server. This header informs the browser of the date after which the result is to be considered stale. It’s particularly important to set this to 0 when your data is highly dynamic. In PHP, set the Expires header to 0 by doing the following before you echo anything for the page:

header("Expires: 0");

If you have a specific time in the future at which you’d like a cached result to expire, you can use an HTTP date string:

header("Expires: Fri, 17 Jul 2009 16:00:00 GMT");
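Building the date string by hand is error-prone. In JavaScript, Date.prototype.toUTCString produces this format directly (in PHP, gmdate serves the same purpose). Here is a minimal sketch; expiresIn is a hypothetical helper for computing an Expires value relative to the current time.

```javascript
// Format a Date as an HTTP date, e.g. "Fri, 17 Jul 2009 16:00:00 GMT".
function httpDate(date) {
  return date.toUTCString();
}

// Hypothetical helper: an Expires value the given number of seconds
// after `now` (which defaults to the current time).
function expiresIn(seconds, now = new Date()) {
  return httpDate(new Date(now.getTime() + seconds * 1000));
}

console.log(httpDate(new Date(Date.UTC(2009, 6, 17, 16, 0, 0))));
// Fri, 17 Jul 2009 16:00:00 GMT
```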

Managing JavaScript

We’ve already discussed some aspects of managing JavaScript performance in the various topics presented for caching. In this section, we look at other ideas for managing JavaScript, including its placement within the overall structure of a page, the use of JavaScript minification, and an approach for ensuring that you never end up with duplicates of the same JavaScript file on a single page.

JavaScript Placement

Whenever possible, you should place JavaScript at the bottom of the page. The main reason is that rendering pauses while a JavaScript file loads (presumably because the JavaScript being loaded could alter the DOM that is still being built). As a result, large files or network latency can cause significant delays as a page loads. If you’ve ever seen a page hang while waiting for an ad to load, you have likely suffered through this problem. If you place your JavaScript at the end of the page, the rest of the page will already have rendered by the time the browser reaches the JavaScript.

The Page class in Chapter 7 addresses this issue simply by defaulting all JavaScript to the bottom of the page where the page is assembled within the get_page method. That said, there are times when you may find it necessary to place your JavaScript at the top. One example is when the main call to action on a page requires JavaScript, such as a selection list or other user interface component that appears near the top and commands the user’s attention. For these situations, Page provides the set_js_top method, which you can call after the page is instantiated to indicate that the JavaScript should be placed at the top of the page.

To preserve modularity, a module should not rely on a particular placement for its JavaScript beyond the order of the dependencies specified in its own get_js_linked method. So, for example, you shouldn’t assume that the DOM will be ready to use when your JavaScript starts to run, even if you have placed the JavaScript at the bottom of the page. Here, it’s better to rely on the YUI library’s onDOMReady method for registering a callback to execute as soon as the DOM is stable.
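The behavior you rely on from onDOMReady is essentially a queue: callbacks registered before the DOM is ready wait, and callbacks registered afterward run right away. The following JavaScript sketch, built around a hypothetical createReadyQueue function, illustrates that contract. It is not YUI's implementation; in a browser, fire would be triggered by the DOMContentLoaded event.

```javascript
// Callbacks registered before fire() are queued; callbacks registered
// afterward run immediately. fire() has an effect only the first time.
function createReadyQueue() {
  let ready = false;
  const queue = [];
  return {
    onReady(fn) {
      if (ready) fn();
      else queue.push(fn);
    },
    fire() {
      if (ready) return;
      ready = true;
      // Run in registration order, preserving module dependencies.
      while (queue.length > 0) queue.shift()();
    },
  };
}
```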

JavaScript Minification

Minification removes whitespace, comments, and the like, and performs other innocuous modifications that reduce the overall size of a JavaScript file. This is not to be confused with obfuscation. Obfuscation can result in even smaller JavaScript files and code that is very difficult to read (thereby providing some rudimentary protection from reverse engineering), but its alteration of variable names and other references requires coordination across files that often makes the rewards not worth the risks. Minification, on the other hand, comes with very little risk and offers an easy way to reduce download times for JavaScript files.

To minify a JavaScript file, use Douglas Crockford’s JSMin utility, which is available at http://www.crockford.com/javascript/jsmin.html, or you can use YUICompressor, available at http://developer.yahoo.com/yui/compressor, which minifies CSS, too. In addition, be sure that your web server is configured to gzip not only HTML, but CSS and JavaScript as well.

A common concern among developers who minify JavaScript is how to move gracefully between human-readable JavaScript in development environments and minified JavaScript on production systems. The register_links method of the Page class from Chapter 7 offers a good solution. As we’ve seen, register_links lets you define two locations for each file: one referenced using $aka_path (an “also-known-as” path intended for files on production servers), and the other referenced using $loc_path (a “local path” intended for files on development systems). Set the $js_is_local flag to select between them. Example 9-7 shows how to manage minification this way.

Example 9-7. Managing minified and development JavaScript in a page class
class SitePage extends Page
{
   ...

   public function register_links()
   {
      $this->aka_path = "http://...";
      $this->loc_path = "http://...";

      $this->js_linked_info = array
      (
         "sitewide.js" => array
         (
            "aka_path" => $this->aka_path."/sitewide_20090710-min.js",
            "loc_path" => $this->loc_path."/sitewide_20090710.js"
         ),

         ...
      );

      // Access the minified JavaScript files on the production servers.
      $this->js_is_local = false;
   }

   ...
}
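Selecting between the two paths is a simple lookup on the key. Here is a JavaScript sketch with placeholder hosts; resolveJs is a hypothetical function that mirrors what the $js_is_local flag controls.

```javascript
// Entries shaped like js_linked_info in Example 9-7 (hosts are placeholders).
const jsLinkedInfo = {
  "sitewide.js": {
    aka_path: "http://static.example.com/js/sitewide_20090710-min.js",
    loc_path: "http://dev.example.com/js/sitewide_20090710.js",
  },
};

// Resolve a key to the minified production file or the readable local one.
function resolveJs(key, isLocal, info = jsLinkedInfo) {
  const entry = info[key];
  if (!entry) throw new Error(`Unknown JavaScript key: ${key}`);
  return isLocal ? entry.loc_path : entry.aka_path;
}

console.log(resolveJs("sitewide.js", false));
// http://static.example.com/js/sitewide_20090710-min.js
```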

Removing Duplicates

Modular development intrinsically raises the risk of including the same file more than once. Duplicate JavaScript files may seem easy to avoid, but as the number of scripts in a large web application and the number of developers working together grow, duplicates are likely to appear if there’s no procedure for managing file inclusion. Fortunately, identifying JavaScript files by keys, as discussed earlier, prevents duplicates automatically. In fact, in a truly modular system, every module is expected to specify in its own get_js_linked method precisely the JavaScript files it requires, without concern for which other modules might or might not need the same files. The page excludes the duplicates and links the files in the proper order.

Example 9-8 shows how the Page class prevents duplicate JavaScript files from being linked within its manage_js_linked method. Managing duplicate CSS files is similar.

Example 9-8. Preventing duplicate JavaScript files from being linked
class Page
{
   ...

   private function manage_js_linked($keys)
   {
      $js = "";

      if (empty($keys))
         return "";

      // Normalize so that we can pass keys individually or as an array.
      if (!is_array($keys))
         $keys = array($keys);

      foreach ($keys as $k)
      {
         // Log an error for unknown keys when there is no link to add.
         if (!array_key_exists($k, $this->js_linked_info))
         {
            error_log("Page::manage_js_linked: Key \"".$k."\" missing");
            continue;
         }

         // Add the link only if it hasn't been added to the page before.
         if (array_search($k, $this->js_linked_used) === false)
         {
            $this->js_linked_used[] = $k;
            $js .= $this->create_js_linked($k);
         }
      }

      return $js;
   }

   ...
}
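Example 9-8 uses array_search, which scans the whole list of used keys on every lookup. If a page links many files, one alternative is to record used keys as array keys so each check is a hash lookup. The standalone sketch below assumes that approach; the function link_once is hypothetical, and it returns registered file names in place of the markup that create_js_linked would build.

```php
<?php
// Hypothetical sketch of key-based de-duplication. $used maps each key
// already linked to true, so isset() checks are constant time.
function link_once(array $keys, array $known, array &$used): array
{
    $links = array();

    foreach ($keys as $k)
    {
        // Skip (and log) keys with no registered file, as Example 9-8 does.
        if (!array_key_exists($k, $known))
        {
            error_log("link_once: Key \"".$k."\" missing");
            continue;
        }

        // Link each key at most once, in the order it was first requested.
        if (!isset($used[$k]))
        {
            $used[$k] = true;
            $links[] = $known[$k];
        }
    }

    return $links;
}

$known = array("sitewide.js" => "sitewide.js", "newcars.js" => "newcars.js");
$used = array();

// Two modules each request what they need; the repeat is dropped.
$first = link_once(array("sitewide.js", "newcars.js"), $known, $used);
$second = link_once(array("sitewide.js"), $known, $used);
```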

Distribution of Assets

Another method for improving the performance of a large web application is to distribute your assets across a number of servers. Whereas only very large web applications may be able to rely on virtual IP addresses and load balancers to distribute traffic among application servers, any site can distribute its assets to some extent simply by serving CSS files, JavaScript files, and images from separate servers. This section describes a few approaches for managing this.

Content Delivery Networks

Content delivery networks, such as those operated by Akamai and a few other companies, are typically available only to very large web applications. These networks use sophisticated caching algorithms to spread content throughout a highly distributed network so that it eventually reaches servers that are geographically close to any visitor who might request it. Amazon.com’s CloudFront, an extension to its S3 storage service, is an interesting recent twist on this industry that may bring this high-performance technology within the reach of more sites.

If you work for a company that has access to a content delivery network and you develop pages using classes like those in Chapter 7, you can store the path to the network’s servers in the $aka_path member used when defining CSS and JavaScript links in the SitePage class. Recall that $aka_path is intended to reference production servers and is used when $js_is_local is false.

Minimizing DNS Lookups

As you distribute assets across different servers, it’s important to balance that distribution against the number of Domain Name System (DNS) lookups that a page must perform. Looking up the IP address associated with a hostname is another type of request that affects how fast your page loads. Furthermore, even after a name has been resolved, how long the result can be reused varies based on a number of factors, including the time-to-live value returned in the DNS record itself, settings in the operating system, settings in the browser, and the Keep-Alive feature of the HTTP protocol. As a result, it’s important to pay attention to how many DNS lookups your page ends up generating.
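One rough way to see how many DNS lookups a page’s assets could trigger is to count the distinct hostnames in their URLs. The sketch below uses PHP’s built-in parse_url; the function name and the example.com URLs are hypothetical, and the count is only an approximation of new lookups, since browsers and operating systems may already have some names cached.

```php
<?php
// Hypothetical helper: count distinct hostnames among asset URLs.
// Each distinct hostname may require its own DNS lookup.
function count_asset_hosts(array $urls): int
{
    $hosts = array();

    foreach ($urls as $url)
    {
        $host = parse_url($url, PHP_URL_HOST);

        // Relative URLs have no host; they reuse the page's own hostname.
        if (is_string($host))
            $hosts[strtolower($host)] = true;
    }

    return count($hosts);
}

$urls = array(
    "http://static1.example.com/sitewide_20090710-min.js",
    "http://static1.example.com/newcars_20090731.css",
    "http://static2.example.com/navbar_20090712.jpg",
    "/images/logo.gif",
);

echo count_asset_hosts($urls), "\n"; // prints 2
```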

A simple way to manage this number is to define the paths (including hostnames) for the assets you plan to use across your large web application in a central place. The class hierarchy we discussed for pages in Chapter 7 provides some insight into where to place the members that define these paths.

Recall that a logical set of classes to derive from Page includes a sitewide page class, a page class for each section of the site, and a page class for each specific page. Considering this, the sitewide page class, SitePage, makes an excellent place to define paths that affect the number of DNS lookups. By defining the paths here, all parts of your large web application can access the paths as needed and you’ll have a single, centrally located place where you can manage the number of DNS requests that your assets require. High Performance Web Sites suggests dividing your assets across at least two hosts, but not more than four. Many web applications use one set of hosts for static assets like CSS, JavaScript, and image files, and another set for server-side code.
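As a minimal sketch of defining hosts centrally, the class below keeps a short list of static-asset hostnames in one place and assigns each file to a host by hashing its name. A given file then always comes from the same host (so it stays cacheable), while the set of hosts, and therefore the number of DNS lookups, stays fixed. The class name AssetHosts, the hostnames, and the crc32-based assignment are assumptions for illustration, not part of the SitePage class from Chapter 7.

```php
<?php
// Hypothetical central list of static-asset hosts. In the book's design
// such members would belong to SitePage; a small class stands in here.
class AssetHosts
{
    // Keep this list to two to four hosts to limit DNS lookups.
    private $hosts;

    public function __construct(array $hosts)
    {
        if (count($hosts) === 0)
            throw new InvalidArgumentException("AssetHosts: no hosts given");

        $this->hosts = array_values($hosts);
    }

    // Map a file to a host deterministically: the same file name always
    // yields the same host, so browsers keep reusing their cached copy.
    // crc32() returns a non-negative integer on 64-bit PHP builds.
    public function url_for(string $file): string
    {
        $index = crc32($file) % count($this->hosts);
        return "http://" . $this->hosts[$index] . "/" . $file;
    }
}

$assets = new AssetHosts(array("static1.example.com", "static2.example.com"));

echo $assets->url_for("sitewide_20090710-min.js"), "\n";
echo $assets->url_for("navbar_20090712.jpg"), "\n";
```

Adding a host later changes the modulus and remaps many files to new URLs, which forces browsers to download them again; that is one reason to settle on the host list early.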

Minimizing HTTP Requests

As we saw earlier in this chapter, the first step to minimizing HTTP requests is to take advantage of caching and combine multiple requests for CSS and JavaScript files into single requests for each. This section presents additional opportunities for minimizing the number of HTTP requests for a page. For the most part, this means carefully managing requests for CSS, JavaScript, and images.

Guidelines for CSS files

Just as we discussed with caching, one of the issues with minimizing HTTP requests for CSS files is determining a good division of files. A good starting point for managing the number of CSS files in a large web application is to define one CSS file as a common file linked by all pages across the site, one CSS file for each section of the site, and as few other CSS files as possible. However, you are likely to find other organization schemes specific to your web application. Naturally, as you start to need additional CSS files on a single page (to support different CSS media types, for example), you’ll find that the number of CSS files that you may want to link can grow quickly. Therefore, whenever possible, employ the technique presented earlier for combining multiple CSS files into a single request.
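One way to keep media-specific styles from adding requests is to fold them into a single file with @media blocks, which CSS permits inside one stylesheet. The sketch below assumes each source file’s content is already loaded into a string; the function name combine_css and the sample rules are hypothetical. A production version would also need to handle sources that contain @import rules, which must appear before other rules and so cannot be nested this way.

```php
<?php
// Hypothetical sketch: combine several CSS sources into one response.
// Each element of $sources is array(media type, CSS text). Styles for
// "all" are emitted as-is; others are wrapped in an @media block so a
// single file can serve screen and print without extra requests.
function combine_css(array $sources): string
{
    $out = "";

    foreach ($sources as $source)
    {
        list($media, $css) = $source;

        if ($media === "all")
            $out .= $css . "\n";
        else
            $out .= "@media " . $media . "\n{\n" . $css . "\n}\n";
    }

    return $out;
}

$combined = combine_css(array(
    array("all", "#navbar { width: 300px; }"),
    array("print", "#navbar { display: none; }"),
));

echo $combined;
```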

Guidelines for JavaScript files

As with CSS files, a good starting point for managing the number of JavaScript files in a large web application is to use one common file containing JavaScript applicable to most parts of the site, a set of sectional files each specific to one section of the site, and as few other JavaScript files as possible. Of course, if you use a lot of external libraries, those will increase the number of files that you need to link; however, you can always take the approach of joining files together on your own servers in ways that make sense for your application.

This has the added benefit of placing the files directly under your control rather than on someone else’s servers, and it reduces DNS lookups. In addition, many libraries provide files that contain groupings of library components that are most frequently used together. One example is yahoo-dom-event.js in the YUI library. Again, employ the techniques presented earlier for combining multiple JavaScript files into a single request.
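When joining JavaScript files on your own servers, a common precaution is to separate the pieces with a semicolon and a newline, so that a file ending without a semicolon cannot run into the next one (for example, when the next file begins with a parenthesis). The sketch below reads the files with file_get_contents and returns the combined text; the function name join_js is hypothetical, and a real implementation would also need caching headers and a version ID in the combined file’s name, like the other files in this chapter.

```php
<?php
// Hypothetical sketch: join several JavaScript files into one response.
// The ";\n" separator terminates a last statement that lacks a
// semicolon and starts the next file on a fresh line.
function join_js(array $paths): string
{
    $parts = array();

    foreach ($paths as $path)
    {
        $code = file_get_contents($path);

        if ($code === false)
            throw new RuntimeException("join_js: cannot read " . $path);

        $parts[] = $code;
    }

    return implode(";\n", $parts);
}

// Usage with two temporary files; the first lacks a trailing semicolon.
$a = tempnam(sys_get_temp_dir(), "js");
$b = tempnam(sys_get_temp_dir(), "js");
file_put_contents($a, "var a = 1");
file_put_contents($b, "var b = 2;");

echo join_js(array($a, $b)), "\n";
```

A doubled semicolon (when a file already ends with one) is just an empty statement, so the separator is harmless in that case.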

Guidelines for image files

Surprisingly, you can also combine image files, although in a more complicated way than CSS and JavaScript files, using a technique called spriting. Spriting is the process of creating a single larger image that contains many smaller originals of the same type (e.g., GIF or JPEG) at known offsets in one file. You can use these offsets to position the larger image as the background of an HTML element so that only the desired portion is visible. Spriting is a good way to reduce HTTP requests, but there are some practical considerations that limit how images can be combined.
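For a sprite laid out as a grid of equally sized cells, each cell’s offset follows directly from its row and column: the background must shift left and up by the cell’s distance from the top-left corner, so both coordinates are negative or zero. The helper below is a hypothetical sketch of that arithmetic; with 50-pixel cells in a single row, columns 0 through 3 give the positions used by the first four rules in Example 9-9.

```php
<?php
// Hypothetical helper: CSS background-position for one cell of a sprite
// laid out as a grid of equally sized cells. Offsets are negative because
// the background image is shifted up and to the left inside the element.
function sprite_position(int $row, int $col, int $cell_w, int $cell_h): string
{
    $x = -$col * $cell_w;
    $y = -$row * $cell_h;

    return $x . "px " . $y . "px";
}

echo sprite_position(0, 1, 50, 50), "\n"; // -50px 0px
```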

One practical limitation occurs with images that will be used for repeating backgrounds. Only those images to be repeated in the same direction (i.e., the x or y direction) and with the same size in that direction can be combined. Otherwise, for all images smaller than the largest one, you’ll see space between repeated images.

Another practical consideration is that sprites can change rather frequently because changes or additions for any individual image require the sprite file to change. When this happens, browsers that have an earlier version cached need to know to get the new version on the next request. Fortunately, using a version ID like we did for CSS and JavaScript files provides a good solution. That said, the management of version IDs for sprites is a little more problematic for two reasons: first, sprites are often referenced from CSS files, which usually are not run through the PHP interpreter (to manage the version IDs dynamically); and second, changes to images may require you to update offsets within the CSS as well as version IDs for files.
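One way around both problems is to generate the sprite rules with PHP so that the version ID and the offsets are each defined in one place. This is a hypothetical sketch, not a technique from the earlier chapters: the function name sprite_css, the icon list, and the layout (one horizontal row of square cells with a selected and an unselected state per icon, as in Example 9-9) are assumptions, and the sketch uses a relative image URL because the host in Example 9-9 is elided.

```php
<?php
// Hypothetical sketch: emit sprite CSS rules for a module. The version ID
// and cell size appear once, so updating the sprite means changing one
// value, and offsets are recomputed from each icon's position.
function sprite_css(string $module, string $version, array $icons, int $size): string
{
    $url = $module . "_" . $version . ".jpg";
    $css = "";
    $col = 0;

    foreach ($icons as $icon)
    {
        // Each icon has a selected cell followed by an unselected cell.
        foreach (array("selected", "noselect") as $state)
        {
            $x = -$col * $size;
            $css .= "#" . $module . " ." . $icon . " ." . $state . "\n{\n"
                . "   width: " . $size . "px;\n"
                . "   height: " . $size . "px;\n"
                . "   background: url(" . $url . ") " . $x . "px 0px no-repeat;\n"
                . "}\n";
            $col++;
        }
    }

    return $css;
}

echo sprite_css("navbar", "20090712", array("ichome", "icrevs"), 50);
```

The generated rules could be written to a static CSS file when the sprite is rebuilt, so the stylesheet still need not pass through the PHP interpreter on every request.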

Considering these practical limitations, a good approach is to look for opportunities for spriting within scopes that are easy to manage. For example, if we create a sprite file with just the images for a specific module, it’s easy to keep the module in sync with version ID and offset changes that take place as the sprite file changes. Example 9-9 illustrates spriting within a module for a navigation bar. The module uses five icons with two states (selected and unselected), which reside in one sprite file. The sprite is named after the module, which is a good practice for the purposes of documentation.

Example 9-9. Spriting for icons in a navigation bar
#navbar .ichome .selected
{
   width: 50px;
   height: 50px;
   background: url(http://.../navbar_20090712.jpg) 0px 0px no-repeat;
}
#navbar .ichome .noselect
{
   width: 50px;
   height: 50px;
   background: url(http://.../navbar_20090712.jpg) -50px 0px no-repeat;
}
#navbar .icrevs .selected
{
   width: 50px;
   height: 50px;
   background: url(http://.../navbar_20090712.jpg) -100px 0px no-repeat;
}
#navbar .icrevs .noselect
{
   width: 50px;
   height: 50px;
   background: url(http://.../navbar_20090712.jpg) -150px 0px no-repeat;
}

...

#navbar .icabout .selected
{
   width: 50px;
   height: 50px;
   background: url(http://.../navbar_20090712.jpg) ... 0px no-repeat;
}
#navbar .icabout .noselect
{
   width: 50px;
   height: 50px;
   background: url(http://.../navbar_20090712.jpg) ... 0px no-repeat;
}

Figure 9-1 illustrates positioning the sprite image for the first two icons in Example 9-9 (at positions 0px, 0px and -50px, 0px).

Figure 9-1. Positioning the sprite for the first two icons in Example 9-9

An additional approach for managing sprite files is to create one per CSS file. This way, whenever the sprite changes, you know that only its associated CSS file needs to be updated to reflect the new sprite file and changes to offsets. Again, a naming convention can help document this approach. For example, for the CSS file newcars_20090731.css (containing the CSS for one section of the application), you can create a sprite file called newcars_20090731.jpg that contains all the JPEG images for just that section.

Control Over Site Metrics

Although it is not directly related to how fast a page loads, the ability to capture metrics about how visitors use your web application tells you a great deal about how well your application performs in terms of the overall user experience. In this section, we’ll look at an easy approach for adding Google Analytics to a large web application.

Google Analytics is a free service that provides great tools for analyzing how visitors are using your web application. Once you register for the service, enabling metrics for your site is simply a matter of adding a snippet of code to the right place on all pages that you want to track. As we have discussed several times throughout this chapter, the SitePage class that you define for use by all pages across your application offers a logical place to manage this code. Example 9-10 illustrates this.

Example 9-10. Adding Google Analytics across an entire web application
class SitePage extends Page
{
   protected $google_site_id;
   protected $google_site_nm;

   ...

   public function __construct()
   {
      // You get the ID for your site once you've signed up with Google.
      $this->google_site_id = "...";
      $this->google_site_nm = "...";
   }

   ...

   public function get_all_js()
   {
      // First, get all the JavaScript that was assembled for the page.
      $js = parent::get_all_js();

      // This is the snippet of Google Analytics code from registering.
      $analytics = <<<EOD
<!-- Google Analytics -->
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol)
   ? "https://ssl."
   : "http://www.");

document.write(unescape("%3Cscript src='" + gaJsHost +
   "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
// Use a second script element: ga.js, written in above, must finish
// loading before _gat is defined. PHP interpolates the quoted values.
var pageTracker = _gat._getTracker("{$this->google_site_id}");
pageTracker._setDomainName("{$this->google_site_nm}");
pageTracker._trackPageview();
</script>
EOD;

      // Append Google Analytics to the JavaScript that was assembled
      // otherwise for the page.
      return <<<EOD
$js
$analytics

EOD;
   }

   ...
}

Of course, your metrics won’t be accurate if you include this code in environments where you shouldn’t be tracking pages. For example, you’ll probably want to exclude development environments, testing environments, staging environments, and the like. To do this, define a member that you set to the type of environment in which the page is being displayed, as shown in Example 9-11. The PHP code checks that it’s generating a page within the production environment before including the Google Analytics code.

Example 9-11. Adding Google Analytics just for tracking on production systems
class SitePage extends Page
{
   protected $google_site_id;
   protected $google_site_nm;
   protected $op_environment;

   ...

   public function __construct()
   {
      $this->google_site_id = "...";
      $this->google_site_nm = "...";

      // Set this from a server config that indicates the environment.
      $this->op_environment = "...";
   }

   ...

   public function get_all_js()
   {
      // First, get all the JavaScript that was assembled for the page.
      $js = parent::get_all_js();
      $analytics = "";

      if ($this->op_environment == "production")
      {
         // Add the Google Analytics code here for production tracking.
         $analytics = <<<EOD
         ...

EOD;
      }

      // Google Analytics is not appended unless running in production.
      return <<<EOD
$js
$analytics

EOD;
   }

   ...
}
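Example 9-11 leaves open how op_environment gets its value. One minimal sketch, assuming the server configuration exports an environment variable (the name SITE_ENVIRONMENT and the function detect_environment are hypothetical), reads it with getenv and treats anything other than a recognized name as development. A missing or mistyped setting then disables tracking instead of polluting the production metrics.

```php
<?php
// Hypothetical helper: determine the operating environment from an
// environment variable set by the server configuration. Unknown or
// missing values fall back to "development" so they are never tracked.
function detect_environment(string $var = "SITE_ENVIRONMENT"): string
{
    $value = getenv($var);

    if ($value === false)
        return "development";

    $value = strtolower(trim($value));

    $known = array("development", "testing", "staging", "production");

    return in_array($value, $known, true) ? $value : "development";
}

putenv("SITE_ENVIRONMENT=production");
echo detect_environment(), "\n";
```

In the SitePage constructor, the placeholder assignment would then become $this->op_environment = detect_environment();.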

Modular Testing

Because the ability to test a large web application is closely related to performance, this section discusses how to use some of the techniques presented in this book to create pages that can easily utilize test data.

Using Test Data

Modularity makes adding and removing components easier. This is important for test data, too. The data for most web applications comes from databases or other backend systems. But while the backend is under development, you might have to test your modules without real data, or even without the programming logic to retrieve it. Therefore, you need a clean and simple way to inject invented data into your modules. Because data managers (see Chapter 6) define the interface for data exchange between the user interface and backend, they offer a good point at which to define hardcoded data that precisely matches the structure of the real data that you expect to exchange with the backend later. When you’re ready to use the real data, it’s easy to remove the test data manager and replace it with the real one.

To use a test data manager, require its include file in place of the include file for the real one, but use the same name for the data manager class. The only difference is something in the name of the include file to distinguish it, such as a _test suffix. Example 9-12 illustrates the key goal: using the test data looks exactly like using the real data later, except for the name of the include file.
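If you would rather switch between the two without editing the require_once line itself, one hypothetical option is to build the include path from a single flag. The sketch below only computes the path; the function name datamgr_include, the flag, and the directory layout are assumptions, and the _test suffix follows the convention just described.

```php
<?php
// Hypothetical helper: choose the test or real include file for a data
// manager. Both files define the same class name, so only one of them
// may be included in a given request.
function datamgr_include(string $dir, string $name, bool $use_test): string
{
    $suffix = $use_test ? "_test" : "";
    return $dir . "/" . $name . $suffix . ".inc";
}

// A page might then call: require_once(datamgr_include($dir, "nwclistings", true));
echo datamgr_include("datamgr", "nwclistings", true), "\n";
```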

Example 9-12. Using a test data manager
<?php
require_once(".../common/sitepage.inc");
require_once(".../common/navbar.inc");
require_once(".../common/subnav.inc");
require_once(".../common/nwcresults.inc");
...

require_once(".../layout/resultslayout.inc");
...

// Include the test data manager until the real data manager is ready.
require_once(".../datamgr/nwclistings_test.inc");
...

class NewCarSearchResultsPage extends SitePage
{
   ...

   public function load_data()
   {
      // This appears exactly like it will with the real data manager.
      $dm = new NewCarListingsDataManager();

      // The data members for loading are provided by the base class.
      // Populate them as needed by the data manager and call get_data.
      ...

      $dm->get_data
      (
         $this->load_args["new_car_listings"],
         $this->load_data["new_car_listings"],
         $this->load_stat["new_car_listings"]
      );

      // Check the status member and handle any errors, which often
      // require a redirect to another page using the header function.
      if ($this->load_stat["new_car_listings"] != 0)
      {
         header("Location: ...");
         exit;
      }

      ...
   }

   public function get_content()
   {
      ...

      // This appears exactly like it will with the real data manager.
      $mod = new NewCarResults
      (
         $this,
         $this->load_data["new_car_listings"]
      );

      $results = $mod->create();

      ...
   }

   ...
}
?>

Creating Test Data

An important aspect of test data managers is that they can serve as a contract of sorts between the user interface and the backend for how the two will exchange real data. Ideally, once the real data manager is ready, you should be able to remove the test data manager, replace it with the real one, and have a working system after relatively minor tweaks.

Example 9-13 illustrates a simple test data manager, which populates a data structure with some hardcoded test values. This defines the data structure that will be used for the real data later.

Example 9-13. Defining a test data manager
class NewCarReviewsDataManager extends DataManager
{
   ...

   public function __construct()
   {
      parent::__construct();

      ...
   }

   public function get_data($load_args, &$load_data, &$load_stat)
   {
      // Populate the data structure explicitly with data for testing.
      // This also defines the contract between the backend and user
      // interface for how the real data eventually should be handled.
      $load_data = array
      (
         "0" => array
         (
            "name"  => "2009 Honda Accord",
            "price" => "21905",
            "link"  => "http://.../reviews/00001/"
         ),
         "1" => array
         (
            "name"  => "2009 Toyota Prius",
            "price" => "22000",
            "link"  => "http://.../reviews/00002/"
         ),
         "2" => array
         (
            "name"  => "2009 Nissan Altima",
            "price" => "19900",
            "link"  => "http://.../reviews/00003/"
         )
      );

      // Report success so callers that check $load_stat see no error.
      $load_stat = 0;
   }
}
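Because the test data defines the contract, it can also be checked mechanically when the real data manager arrives. The function below is a hypothetical sketch: it confirms that every row returned has each field that the test data established (name, price, and link in Example 9-13), without checking the values themselves.

```php
<?php
// Hypothetical helper: verify that loaded rows honor the contract set by
// the test data manager, i.e., every row is an array with each field.
function meets_contract(array $rows, array $fields): bool
{
    foreach ($rows as $row)
    {
        if (!is_array($row))
            return false;

        foreach ($fields as $field)
        {
            if (!array_key_exists($field, $row))
                return false;
        }
    }

    return true;
}

$fields = array("name", "price", "link");

$good = array(array("name" => "2009 Honda Accord", "price" => "21905", "link" => "http://.../reviews/00001/"));
$bad = array(array("name" => "2009 Toyota Prius", "price" => "22000"));

var_dump(meets_contract($good, $fields)); // bool(true)
var_dump(meets_contract($bad, $fields));  // bool(false)
```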