Ultimately, none of the techniques presented in this book would be practical if they didn’t provide a solid foundation on which to build large web applications that perform quickly and efficiently. This chapter shows how to use the foundation from the previous chapters to monitor and tweak the performance of your application.
You may well get a performance boost simply by following the practices already presented in this book. For example, the semantically meaningful HTML presented in Chapter 3 can speed up page display for several reasons. Likewise, modular techniques for large-scale PHP (see Chapter 7) generally create a faster site than jumping in and out of the PHP interpreter multiple times whenever needed.
But every professional web developer devotes time to performance as an end in itself, so this chapter shows how performance optimization interacts with the techniques in this book. To guide our discussion, we’ll explore some of the recommendations presented in High Performance Web Sites (O’Reilly). This book, based on research conducted by Steve Souders at Yahoo!, suggests that for most websites, backend performance accounts for only 10 to 20 percent of the overall time required for a page to load; the remaining 80 to 90 percent is spent downloading components for the user interface. By following a set of 14 rules, many web applications can be made 20 to 25 percent faster.
These statistics emphasize the importance of paying close attention to the performance of how your HTML, CSS, JavaScript, and PHP work together. By utilizing a set of techniques for developing large web applications like the ones in this book, you can manage performance with relative ease and in a centralized manner.
Tenet 9: Large-scale HTML, JavaScript, CSS, and PHP provide a good foundation on which to build large web applications that perform well. They also facilitate a good environment for capturing site metrics and testing.
We begin this chapter by looking at how the techniques for developing large web applications discussed in this book can help us manage opportunities for caching. Next, we’ll explore some performance improvements that apply specifically to JavaScript. We then cover performance improvements related to ways we can distribute the various assets for an application across multiple servers. Finally, we’ll look at techniques that facilitate capturing site metrics and performing testing.
One of the biggest opportunities for improving performance is caching. Caching is the preservation and management of a collection of data that replicates original data computed earlier or stored in another location. The idea is to avoid retrieving the original data repeatedly and thus to avoid the high performance cost of retrieval. Some examples of resources that you can cache in a user interface are CSS files, JavaScript files, images, and even the entire contents of modules and pages. Whenever you encounter something that doesn’t change very often (as is the case with CSS and JavaScript files especially), there is probably a good opportunity for caching.
Whenever you can, you should place CSS and JavaScript in separate files that you can link on the pages that require them, as shown in Example 9-1. Not only does this allow you to share the contents of those files across multiple pages, it allows a browser to retrieve the files once over the wire, and then use them many times from the local cache.
Certainly, a browser can cache an HTML file that contains embedded CSS or JavaScript. However, the HTML is likely to change much more often than the CSS or JavaScript, so the browser may only cache it for a few moments. In contrast, you might go for months or even years without changing your CSS or JavaScript for a page. Separating the CSS and JavaScript into dedicated files therefore lets the browser store the CSS and JavaScript for repeated use, and just download the new HTML when needed.
In Chapter 7, you saw that modules and pages both define similar methods in their interfaces to specify the CSS and JavaScript files they require using get_css_linked and get_js_linked, respectively. Because each method results in links on the final page, as opposed to embedding CSS or JavaScript in the same page as the HTML, you get the benefits of caching.
    class PictureSlider extends Module
    {
        ...
        public function get_js_linked()
        {
            // Specify the JavaScript files that must be included on the page.
            // This module needs YUI libraries for managing the DOM and doing
            // animation. The module's JavaScript is a part of sitewide.js.
            return array
            (
                "yahoo-dom-event.js",
                "animation.js",
                "sitewide.js"
            );
        }
        ...
    }
Anytime a browser caches a CSS or JavaScript file, it’s important to ensure that the browser knows when a copy of the cached file is no longer up to date with changes you’ve made. Without this, your application is likely to be styled incorrectly or contain JavaScript errors as your HTML gets out of sync with your CSS and JavaScript. A simple way to ensure the browser knows when to fetch a new version of a file is to give each file a version ID. Whenever you change the file, simply advance the version ID. As a result, the browser does not find the new version in its cache and subsequently fetches it. A good method for constructing version IDs is to append the date to the name of the file or use the version number from your source control system. For example, you could have the following:
sitewide_20090710.js
If you need to update the file multiple times on a single day, you can append a sequence number or letter after the date:
sitewide_20090710a.js
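As a minimal sketch of the naming scheme just described, a small helper can assemble the versioned name from its parts. The function name versioned_name and its parameters are hypothetical and not part of the classes from Chapter 7; the sketch also assumes the file name has an extension.

```php
<?php
// Hypothetical helper: build a versioned file name such as
// "sitewide_20090710.js" or "sitewide_20090710a.js".
function versioned_name(string $file, string $date, string $seq = ""): string
{
    // Split "sitewide.js" into its base name ("sitewide") and extension ("js").
    $info = pathinfo($file);
    return $info["filename"] . "_" . $date . $seq . "." . $info["extension"];
}

echo versioned_name("sitewide.js", "20090710"), "\n";      // sitewide_20090710.js
echo versioned_name("sitewide.js", "20090710", "a"), "\n"; // sitewide_20090710a.js
```

The date could equally be replaced with a revision number from your source control system; the only requirement is that the name changes whenever the file does.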
Of course, you’ll need to update references to the files wherever you link to them. Example 9-2 illustrates how easy this is to control in a centralized way using the register_links method presented in Chapter 7. The example registers a JavaScript file with a version ID and assumes that all pages in the web application have SitePage at some point in their class hierarchy. The get_js_linked method for pages and modules returns an array of keys. As files are linked for the page, these keys are used to look up the real path that was defined in register_links. Each time you need to update the version ID for a file, you adjust it in one place, such as the SitePage class shown here.
The process for CSS files is similar.
    class SitePage extends Page
    {
        ...
        public function register_links()
        {
            ...
            $this->js_linked_info = array
            (
                "sitewide.js" => array
                (
                    "aka_path" => $this->aka_path."/sitewide_20090710.js",
                    "loc_path" => $this->loc_path."/sitewide_20090710.js"
                ),
                ...
            );
            ...
        }
        ...
    }
Ideally, changes to a CSS or JavaScript file would apply wherever the file is accessed. But what if a dependency on one page prevents it from using the new version? Again, the register_links method provides an easy way to manage such fine-grained distinctions. The page class for the page containing the dependency defines a more specific version of register_links that first calls upon register_links in the parent to set up all the links as normal, then overwrites the name of the file for which the page requires the earlier version, as shown in Example 9-3.
    class NewCarSearchResultsPage extends SitePage
    {
        ...
        public function register_links()
        {
            // Call upon the parent class to set up all the links as normal.
            parent::register_links();

            // Alter the link for which this page needs a different version.
            $this->js_linked_info["sitewide.js"] = array
            (
                "aka_path" => $this->aka_path."/sitewide_20090709.js",
                "loc_path" => $this->loc_path."/sitewide_20090709.js"
            );
        }
        ...
    }
One of the issues when placing CSS and JavaScript in dedicated files is determining a good way to divide the CSS (or JavaScript). On the one hand, if you place all your CSS within a single, large file, your application will become monolithic, lack modularity, and end up more difficult to maintain. On the other hand, if you place the CSS for each module within its own individual file, you’ll end up with a large number of links on every page.
The section Minimizing HTTP Requests discusses a good middle ground for dividing your CSS and JavaScript across a set of files to minimize HTTP requests. Once you have a good division of files, you can minimize the number of requests made for CSS or JavaScript files even further by combining multiple requests into one. To do this, you need to implement a server that understands combined requests. Such a request for CSS files might look like the following using a link tag:
<link href="http://.../?sitewide_20090710.css&newcars_20090630.css" type="text/css" rel="stylesheet" media="all" />
Such a request for JavaScript files looks similar, but occurs in a script tag. A request for JavaScript files might look like the following:
<script src="http://.../?ext/yahoo-dom-event_2.7.0.js&ext/yahoo-animation_2.7.0.js&sitewide_20090710.js" type="text/javascript"></script>
Once the server receives the request, it concatenates the files in the specified order and returns the concatenated file to the browser. It also caches a copy of the concatenated file on the server to use the next time a request with the same combination of files is made (for example, the next time the same page is displayed to any visitor). The browser receives the single, concatenated file for all the CSS (or JavaScript) via a single HTTP request. Furthermore, the next time a request is made from the same browser for the same set of files, the browser will already have the concatenated version cached and can avoid the request altogether.
To combine CSS and JavaScript files, you need to write some scripts on a server to do the combining and some code to assemble the requests for combining files as you generate pages. In this book, we won’t examine the code to place on the server that does the combining, but the implementation is relatively straightforward. To build the requests for combining files, you need only make a few modifications to the Page class presented in Chapter 7. The modifications for combining JavaScript files are shown in Example 9-4. Combining CSS files is similar.
For CSS, just remember that you can only combine links that share the same media type (e.g., all, print), since all the concatenated files will form one file with one media type. Since media types other than all generally don’t require multiple CSS files, a simple but effective approach is to ignore requests to combine CSS files that have a media type other than all.
    class Page
    {
        protected $js_is_combined;
        ...
        public function __construct()
        {
            ...
            // Default combining JavaScript to true; however, you can always
            // disable it in a derived page or by calling the setter method.
            $this->js_is_combined = true;
        }
        ...
        public function set_js_combined($flag)
        {
            // Offer a way to enable or disable handling combined JavaScript.
            $this->js_is_combined = $flag;
        }
        ...
        private function create_js_combined_part($k)
        {
            // Candidates for combining need to be from one server. Set that
            // here as a prefix to check. We'll log errors for other paths.
            $prefix = "...";

            // Look up the actual path for the file identified by the key k.
            $path = $this->js_linked_info[$k]["aka_path"];

            // Return a query part only if combining is supported for the path.
            // Strip only the leading prefix, not later occurrences of it.
            if (strpos($path, $prefix) === 0) {
                return substr($path, strlen($prefix));
            } else {
                return "";
            }
        }

        private function create_js_combined_query()
        {
            $combined_query = "";

            // We're making the assumption that local files are never combined
            // since normally alternative servers are used for the combining.
            if ($this->js_is_combined && !$this->js_is_local) {
                // Build an array of all the JavaScript keys in the order that
                // they were added by the page or modules created for the page.
                $all = array_merge
                (
                    $this->js_common,
                    $this->js_page_linked,
                    $this->js_module_linked
                );

                $i = 0;

                // Build the combined query by appending each part one by one.
                foreach ($all as $k) {
                    $part = $this->create_js_combined_part($k);

                    if (empty($part)) {
                        // An empty part indicates that the path for the file is
                        // not a path that supports combining. Log this issue.
                        ...
                        break;
                    }

                    $sep = ($i++ == 0) ? "?" : "&";
                    $combined_query .= $sep.$part;
                }
            }

            return $combined_query;
        }
        ...
    }
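The server side of combining, which the text describes but does not show, might be sketched roughly as follows. This is only an illustration under stated assumptions: the function name combine_files, the directory parameters, and the on-disk cache layout are all hypothetical, and a real service would also need to send appropriate Content-Type and caching headers.

```php
<?php
// Hypothetical helper: combine the files named in a query string such as
// "a.js&b.js" into one response body, caching the result on disk.
function combine_files(string $query, string $base_dir, string $cache_dir): string
{
    // Reuse a previously combined copy when one exists for this exact query.
    $cache_file = $cache_dir . "/" . md5($query) . ".cache";
    if (is_file($cache_file)) {
        return file_get_contents($cache_file);
    }

    $combined = "";
    foreach (explode("&", $query) as $part) {
        // Accept only simple relative paths to .css or .js files, and
        // reject any attempt to climb out of the base directory.
        $ok = preg_match('#^[A-Za-z0-9_./-]+\.(css|js)$#', $part) === 1;
        if (!$ok || strpos($part, "..") !== false) {
            throw new InvalidArgumentException("Bad file name: " . $part);
        }
        $path = $base_dir . "/" . $part;
        if (!is_file($path)) {
            throw new RuntimeException("Missing file: " . $part);
        }
        // Concatenate in the order requested; the newline guards against
        // a file whose last line is an unterminated single-line comment.
        $combined .= file_get_contents($path) . "\n";
    }

    file_put_contents($cache_file, $combined);
    return $combined;
}
```

A front controller could pass $_SERVER["QUERY_STRING"] as the first argument. Because the file names already carry version IDs, a changed file produces a different query and therefore a different cache entry, so this sketch does not attempt any other cache invalidation.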
Another opportunity for caching occurs each time you generate the CSS, JavaScript, and content for a module on the server. Caching for a module is especially useful when the module’s content, styles, and behaviors require a fair amount of CPU work to generate and you don’t expect them to change very often. A good approach to implementing cacheable modules is to provide the capabilities required by all cacheable modules within a base class called CacheableModule, derived from the Module class in Chapter 7. To make your own module cacheable, simply derive it from CacheableModule. Example 9-5 illustrates an implementation for the CacheableModule class.
    class CacheableModule extends Module
    {
        protected $cache_ttl;
        protected $cache_clr;

        public function __construct($page)
        {
            parent::__construct($page);

            // The default time-to-live for entries in the cache is one hour.
            $this->cache_ttl = 3600;

            // The default is to check the cache first, but you can clear it.
            $this->cache_clr = false;
        }

        public function create()
        {
            // Check whether data exists in the cache for the module at all.
            $cache_key = $this->get_cache_key();
            $cache_val = apc_fetch($cache_key);

            // Set the hash for the variables on which the new data is based.
            $hash = $this->get_cache_hash($this->get_cache_vars());

            if (!$this->cache_clr && $cache_val && $cache_val["hash"] === $hash) {
                // Whenever we can use the cached module, access the cache.
                $content = $this->fetch_from_cache($cache_val["data"]);
            } else {
                // Otherwise, generate the module as normal and cache a copy.
                $content = $this->store_into_cache($cache_key, $hash);
            }

            return $content;
        }

        public function set_cache_ttl($ttl)
        {
            // Set the time-to-live to the specified value, in seconds.
            $this->cache_ttl = $ttl;
        }

        public function set_cache_clr()
        {
            // Force the cacheable module to bust any cached copy immediately.
            $this->cache_clr = true;
        }

        protected function get_cache_vars()
        {
            // Modules derived from this class should implement this method
            // to return a string that changes whenever the cache should be
            // discarded (the current microtime busts the cache by default).
            return microtime();
        }

        protected function fetch_from_cache($data)
        {
            // Add cached CSS styles to the page on which the module resides.
            $this->page->add_to_css_linked($data["css_linked"]);
            $this->page->add_to_css($data["css"]);

            // Add cached JavaScript to the page on which the module resides.
            $this->page->add_to_js_linked($data["js_linked"]);
            $this->page->add_to_js($data["js"]);

            // Return the cached content for the module.
            return $data["content"];
        }

        protected function store_into_cache($cache_key, $hash)
        {
            $css_linked = $this->get_css_linked();
            $css = $this->get_css();
            $js_linked = $this->get_js_linked();
            $js = $this->get_js();
            $content = $this->get_content();

            // Set up the data structure for the data to place in the cache.
            $cache_val = array
            (
                "hash" => $hash,
                "data" => array
                (
                    "css_linked" => $css_linked,
                    "css"        => $css,
                    "js_linked"  => $js_linked,
                    "js"         => $js,
                    "content"    => $content
                )
            );

            // Store the new copy into the cache and apply the time-to-live.
            apc_store($cache_key, $cache_val, $this->cache_ttl);

            // Add module CSS styles to the page on which the module resides.
            $this->page->add_to_css_linked($css_linked);
            $this->page->add_to_css($css);

            // Add module JavaScript to the page on which the module resides.
            $this->page->add_to_js_linked($js_linked);
            $this->page->add_to_js($js);

            // Return the content that was just generated using get_content.
            return $content;
        }

        protected function get_cache_hash($var)
        {
            // Hash the string used to determine when to use the cached copy.
            return md5($var);
        }

        protected function get_cache_key()
        {
            // This must be unique per module, so use the derived class name.
            return get_class($this);
        }
    }
The CacheableModule class uses APC (Alternative PHP Cache) to implement caching between instantiations of the module. The class provides a good example of overriding the create method provided by Module (see Chapter 7). Instead of the default implementation of create, the implementation here inspects the APC cache before generating the module. If the module can use the cache, it fetches its CSS, JavaScript, and content instead of generating them from scratch. If the module cannot use the cache, it generates itself as normal and caches its CSS, JavaScript, and content for the next time. To be clear, there are four conditions under which the module will be generated from scratch:
There is no copy in the cache at all.
The variables from which the cached copy is derived have changed.
The time-to-live has expired.
The $cache_clr member is set.
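The four conditions can be seen in isolation in the following sketch. It is not the code from Example 9-5: to stay runnable without APC, it substitutes a plain array for the cache and takes the current time as a parameter, and the function name fetch_or_generate and all of its parameters are hypothetical. With APC itself, expiry is handled by the time-to-live passed to apc_store rather than by an explicit comparison.

```php
<?php
// Illustrative stand-in for the APC-backed logic: $cache is a plain array
// and $now is passed in so that expiry can be demonstrated deterministically.
function fetch_or_generate(array &$cache, string $key, string $hash,
                           int $ttl, bool $clear, int $now,
                           callable $generate): string
{
    $entry = $cache[$key] ?? null;

    $usable = !$clear                     // the clear flag is not set
        && $entry !== null                // a copy exists in the cache
        && $entry["hash"] === $hash       // its inputs have not changed
        && $entry["expires"] > $now;      // its time-to-live has not expired

    if ($usable) {
        return $entry["content"];
    }

    // Any failed condition means generating from scratch and caching a copy.
    $content = $generate();
    $cache[$key] = array(
        "hash"    => $hash,
        "expires" => $now + $ttl,
        "content" => $content
    );
    return $content;
}
```

Passing the generator as a callable keeps the caching decision separate from the work of building the content, which mirrors how create delegates to fetch_from_cache and store_into_cache.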
One of the nice things about the implementation in Example 9-5 is that using a cacheable module is very similar to using a module that is not cacheable. For example, suppose NewCarSearchResults were a module derived from CacheableModule. The code to instantiate and create this module looks like what was presented in Chapter 7. The call to set_cache_ttl is optional, just to set a different time-to-live than the default for the cache. You can also call the public method set_cache_clr whenever you want to ensure that a fresh copy of the module is generated.
    $mod = new NewCarSearchResults
    (
        $this,
        $this->data["new_car_listings"]
    );

    $mod->set_cache_ttl(1800);
    $results = $mod->create();
The main thing to remember when using a cacheable module is that your class derived from CacheableModule needs to implement get_cache_vars for how you want caching to occur. This method should return a string that changes whenever you no longer want to use the cached copy of the module. This string is typically a concatenation of the variables and values on which the cached module depends.
Notice that the default implementation for get_cache_vars in the base class returns the current time in microseconds. This value ensures the default behavior is never to use the cached copy, since the time in microseconds is different whenever you generate the module. This will be the case until you provide more informed logic about when the cache should be considered valid by overriding get_cache_vars within your own implementation in the derived class.
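One way to build such a string is sketched below. The helper build_cache_vars and the input names ("make", "page") are hypothetical; a derived module could call something like it from its own get_cache_vars, passing whatever values its output actually depends on.

```php
<?php
// Hypothetical helper a derived module might call from get_cache_vars:
// turn the inputs the module depends on into one stable string.
function build_cache_vars(array $inputs): string
{
    // Sort by key so the same inputs always produce the same string,
    // regardless of the order in which they were supplied.
    ksort($inputs);
    return http_build_query($inputs);
}

$a = build_cache_vars(array("make" => "honda", "page" => 2));
$b = build_cache_vars(array("page" => 2, "make" => "honda"));
$c = build_cache_vars(array("make" => "honda", "page" => 3));

// $a and $b are identical strings, so they hash to the same value and the
// cached copy is reused; $c differs, so the module is regenerated.
```

Sorting the keys first is a design choice that avoids needless cache misses when the same inputs arrive in a different order.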
Just as you can cache the contents of individual modules that you don’t expect to change frequently, you also can cache the contents of entire pages. The process for implementing this is similar to that for modules. You create a CacheablePage class and override the default implementations for the create and get_page methods. The start of create is a logical place to insert the code for generating the hash and searching the cache. At this point, you can inspect parameters for generating the page even before taking the time to load data for the page. If the page can use the cache, fetch the completely assembled page instead of generating it from scratch in get_page. If the page cannot use the cache, generate the page in the traditional manner (during which some caching may still be utilized by modules, remember) and cache the completely assembled page at the end of get_page for the next time.
A further opportunity for caching, of course, occurs when the data for the page is loaded. This type of caching is performed best by the backend since it has the visibility into how the data is stored, and ideally these details should be abstracted from the user interface. Therefore, we’re not going to look at an example of this in this book, although it clearly plays an important part in most large web applications.
Whenever you expect to do a lot of caching, keep in mind that caching can cause its own performance issues as memory becomes too full. In this case, a system may begin to thrash as it begins to spend more time swapping virtual pages in and out of memory than doing other work. You can keep an eye on this by running top on Unix systems and monitoring the process in charge of swapping for your system.
Ajax provides another opportunity for caching. In Chapter 8, we discussed the usefulness of the MVC design pattern in managing the separation between data, presentation, and control in an Ajax application. Here, we revisit Example 8-15 with caching in the model. The model in this example manages an accordion list of additional trims for one car in a list of cars with good green ratings. When the model is updated, a view that subscribes to changes in the model updates itself to show an expanded list of cars that are trims related to the main entry. Because many of the cars will never have their lists expanded, loading the lists of trims on demand via Ajax is a good approach. Because the list of trims doesn’t change frequently, caching the list in the model once retrieved also makes a lot of sense. Example 9-6 illustrates caching trims in the model that we discussed in Chapter 8.
    GreenSearchResultsModel = function(id)
    {
        MVC.Model.call(this);
        this.carID = id;
    };

    GreenSearchResultsModel.prototype = new MVC.Model();

    GreenSearchResultsModel.prototype.setCache = function()
    {
        // This implements a caching layer in the browser. If other cars
        // under the main entry were fetched before, we don't refetch them.
        if (this.state.cars)
        {
            // Other cars under the main car were fetched before, so just
            // send a notification to each of the views to update themselves.
            this.notify();
        }
        else
        {
            // Cars under the main entry are not cached, so set the state of
            // the model by specifying the URL through which to make the Ajax
            // request. The setState method is responsible for notifying views.
            this.setState("GET", "...?carid=" + this.carID);
        }
    };

    GreenSearchResultsModel.prototype.recover = function()
    {
        alert("Could not retrieve the cars you are trying to view.");
    };

    GreenSearchResultsModel.prototype.abandon = function()
    {
        alert("Timed out fetching the cars you are trying to view.");
    };

    GreenSearchResultsView = function(i)
    {
        MVC.View.call(this);

        // The position of the view is helpful when performing DOM updates.
        this.pos = i;
    };

    GreenSearchResultsView.prototype = new MVC.View();

    GreenSearchResultsView.prototype.update = function()
    {
        var cars = this.model.state.cars;
        ...
        // There is no need to update the view or show a button for one car.
        if (this.total == 1)
            return;

        if (!cars)
        {
            // When no cars are loaded, we're rendering for the first time.
            // In this case, we likely need to do different things in the DOM.
            ...
        }
        else
        {
            // When there are cars loaded, update the view by working with
            // the DOM to show the cars that are related to the main car.
            ...
        }
    };

    GreenSearchResultsView.prototype.show = function()
    {
        // When we show the view, check whether we can use the cache or not.
        this.model.setCache();
    };

    GreenSearchResultsView.prototype.hide = function()
    {
        // When we hide the view, modify the DOM to make the view disappear.
        ...
    };
To implement caching in the model, Example 9-6 adds the setCache method to GreenSearchResultsModel. The event handler for showing the expanded list of cars invokes the show method of GreenSearchResultsView. This, in turn, calls setCache for the model. If the model already contains a list of cars, the cached list is used and no server request occurs. If the model does not contain the list, it makes a request back to the server via the setState method of Model and caches the returned list for the next time. To request the proper list of cars, the request uses the carID member of the model as a parameter. This is set in the constructor to identify the car for which we want additional trims. After the appropriate action is taken based on the state of the cache, the model calls notify (either directly or within setState), and notify calls update for each view subscribed to the model, which, in the list of search results, is just one view for each car.
Another way to control caching for Ajax applications is to set an Expires header on the server. This header informs the browser of the date after which the result is to be considered stale. It’s particularly important to set this to 0 when your data is highly dynamic. In PHP, set the Expires header to 0 by doing the following before you echo anything for the page:
header("Expires: 0");
If you have a specific time in the future at which you’d like a cached result to expire, you can use an HTTP date string:
header("Expires: Fri, 17 Jul 2009 16:00:00 GMT");
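Rather than typing the date string by hand, you could compute it from a timestamp. The following is a small sketch; the function name expires_value is hypothetical, while gmdate is PHP's standard function for formatting a time in GMT.

```php
<?php
// Format a Unix timestamp as an HTTP date in GMT for an Expires header.
function expires_value(int $timestamp): string
{
    return gmdate("D, d M Y H:i:s", $timestamp) . " GMT";
}

// For example, to expire a response one hour from now (as with any
// header, this must be sent before any output for the page):
// header("Expires: " . expires_value(time() + 3600));
```

Using gmdate rather than date keeps the result independent of the server's configured time zone, which matters because HTTP dates are always expressed in GMT.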
We’ve already discussed some aspects of managing JavaScript performance in the various topics presented for caching. In this section, we look at other ideas for managing JavaScript, including its placement within the overall structure of a page, the use of JavaScript minification, and an approach for ensuring that you never end up with duplicates of the same JavaScript file on a single page.
Whenever possible, you should place JavaScript at the bottom of the page. The main reason is that the rendering of a page pauses while a JavaScript file loads (presumably because the JavaScript being loaded could alter the DOM already in the process of being created). Given this, large files or network latency can cause significant delays as a page loads. If you’ve ever seen a page hang while waiting for an ad to load, you have likely suffered through this problem caused by JavaScript loading. If you place your JavaScript at the end of the page, the page will finish rendering by the time it encounters the JavaScript.
The Page class in Chapter 7 addresses this issue simply by defaulting all JavaScript to the bottom of the page where the page is assembled within the get_page method. That said, there are times when you may find it necessary to place your JavaScript at the top. One example is when the main call to action on a page requires JavaScript, such as a selection list or other user interface component that appears near the top and commands the user’s attention. For these situations, Page provides the set_js_top method, which you can call after the page is instantiated to indicate that the JavaScript should be placed at the top of the page.
To preserve modularity, a module should not rely on a particular placement for its JavaScript beyond the order of the dependencies specified in its own get_js_linked method. So, for example, you shouldn’t assume that the DOM will be ready to use when your JavaScript starts to run, even if you have placed the JavaScript at the bottom of the page. Here, it’s better to rely on the YUI library’s onDOMReady method for registering a callback to execute as soon as the DOM is stable.
Minification removes whitespace, comments, and the like, and performs other innocuous modifications that reduce the overall size of a JavaScript file. This is not to be confused with obfuscation. Obfuscation can result in even smaller JavaScript files and code that is very difficult to read (thereby providing some rudimentary protection from reverse engineering), but its alteration of variable names and other references requires coordination across files that often makes the rewards not worth the risks. Minification, on the other hand, comes with very little risk and offers an easy way to reduce download times for JavaScript files.
To minify a JavaScript file, use Douglas Crockford’s JSMin utility, which is available at http://www.crockford.com/javascript/jsmin.html, or you can use YUICompressor, available at http://developer.yahoo.com/yui/compressor, which minifies CSS, too. In addition, be sure that your web server is configured to gzip not only HTML, but CSS and JavaScript as well.
A common question among developers about minified JavaScript is how to transition gracefully between human-readable JavaScript within development environments and minified JavaScript for production systems. The register_links method of the Page class from Chapter 7 offers a good solution. As we’ve seen, register_links lets you define two locations for each file: one referenced using $aka_path (an “also-known-as” path intended for files on production servers), and the other referenced using $loc_path (a “local path” intended for files on development systems). Set the $js_is_local flag to select between them. Example 9-7 provides an example of managing minification.
    class SitePage extends Page
    {
        ...
        public function register_links()
        {
            $this->aka_path = "http://...";
            $this->loc_path = "http://...";

            $this->js_linked_info = array
            (
                "sitewide.js" => array
                (
                    "aka_path" => $this->aka_path."/sitewide_20090710-min.js",
                    "loc_path" => $this->loc_path."/sitewide_20090710.js"
                ),
                ...
            );

            // Access the minified JavaScript files on the production servers.
            $this->js_is_local = false;
        }
        ...
    }
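How the flag selects between the two paths is internal to the Page class from Chapter 7 and is not shown here. As a rough illustration only, the selection could look like the following; the function select_js_path and the example.com hostnames are hypothetical.

```php
<?php
// Hypothetical stand-in for the lookup a page performs when it links a
// file: use the local path in development, the aka path in production.
function select_js_path(array $js_linked_info, string $key, bool $js_is_local): string
{
    if (!array_key_exists($key, $js_linked_info)) {
        throw new InvalidArgumentException("Unknown JavaScript key: " . $key);
    }
    $which = $js_is_local ? "loc_path" : "aka_path";
    return $js_linked_info[$key][$which];
}

// Example registration data using placeholder hostnames.
$info = array(
    "sitewide.js" => array(
        "aka_path" => "http://static.example.com/sitewide_20090710-min.js",
        "loc_path" => "http://dev.example.com/sitewide_20090710.js"
    )
);
```

Because modules refer to files only by key, switching every page between readable and minified JavaScript comes down to changing one flag.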
Modular development intrinsically raises the risk of including the same file more than once. Duplicating JavaScript files may seem like an easy thing to avoid, but as the number of scripts added to a large web application and the number of developers working together increase, there’s a good chance that duplications will occur if there’s no procedure for managing file inclusion. Fortunately, the use of keys for JavaScript files, which we’ve already discussed, prevents the duplication of JavaScript files intrinsically. In fact, for a truly modular system, every module is expected to specify in its own get_js_linked method precisely the JavaScript files that it requires without concerns about which other modules might or might not need the files. The page will exclude the duplicates and link files in the proper order.
Example 9-8 shows how the Page class prevents duplicate JavaScript files from being linked within its manage_js_linked method. Managing duplicate CSS files is similar.
    class Page
    {
        ...
        private function manage_js_linked($keys)
        {
            $js = "";

            if (empty($keys)) {
                return "";
            }

            // Normalize so that we can pass keys individually or as an array.
            if (!is_array($keys)) {
                $keys = array($keys);
            }

            foreach ($keys as $k) {
                // Log an error for unknown keys when there is no link to add.
                if (!array_key_exists($k, $this->js_linked_info)) {
                    error_log("Page::manage_js_linked: Key \"".$k."\" missing");
                    continue;
                }

                // Add the link only if it hasn't been added to the page before.
                if (array_search($k, $this->js_linked_used) === false) {
                    $this->js_linked_used[] = $k;
                    $js .= $this->create_js_linked($k);
                }
            }

            return $js;
        }
        ...
    }
Another method for improving the performance of a large web application is to distribute your assets across a number of servers. Whereas only very large web applications may be able to rely on virtual IP addresses and load balancers to distribute traffic among application servers, anyone can accomplish a distribution of assets to some extent simply by distributing CSS files, JavaScript files, and images. This section describes a few approaches for managing this.
Content delivery networks are networks like those of Akamai and a few other companies that are typically available only to very large web applications. These networks use sophisticated caching algorithms to spread content throughout a highly distributed network so that it eventually reaches servers that are geographically close to any visitor that might request it. Amazon.com’s CloudFront, an extension to its S3 storage service, presents an interesting recent twist on this industry that may bring this high-performance technology within the reach of more sites.
If you work for a company that has access to a content delivery network and you employ an approach to developing pages using classes like those in Chapter 7, you can store the path to its servers within the $aka_path member used when defining CSS and JavaScript links in the SitePage class. Recall that $aka_path is intended to reference production servers when $js_is_local is false.
As you distribute assets across different servers, it’s important to strike a balance with the number of Domain Name Service (DNS) lookups that a page must perform. Looking up the IP address associated with a hostname is another type of request that affects how fast your page loads. Furthermore, even after a name has been resolved, the amount of time a name remains valid varies based on a number of factors, including the time-to-live value returned in the DNS record itself, settings in the operating system, settings in the browser, and the Keep-Alive feature of the HTTP protocol. As a result, it’s important to pay attention to how many DNS requests your page ends up generating.
A simple way to manage this number is to define the paths (including hostnames) for the assets you plan to use across your large web application in a central place. The class hierarchy we discussed for pages in Chapter 7 provides some insight into where to place the members that define these paths.
Recall that a logical set of classes to derive from Page includes a sitewide page class, a page class for each section of the site, and a page class for each specific page. Considering this, the sitewide page class, SitePage, makes an excellent place to define paths that affect the number of DNS lookups. By defining the paths here, all parts of your large web application can access the paths as needed and you’ll have a single, centrally located place where you can manage the number of DNS requests that your assets require. High Performance Web Sites suggests dividing your assets across at least two hosts, but not more than four. Many web applications use one set of hosts for static assets like CSS, JavaScript, and image files, and another set for server-side code.
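One way to keep an eye on this number, sketched here under the assumption that you can gather the asset URLs for a page into an array, is to count the distinct hostnames among them. The function name count_asset_hosts and the example hostnames are hypothetical.

```php
<?php
// Hypothetical helper: count the distinct hostnames among a page's asset URLs.
function count_asset_hosts(array $urls)
{
    $hosts = array();
    foreach ($urls as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        // Relative URLs have no host of their own; they resolve against the
        // host of the page itself, so they add no extra lookup here.
        if ($host !== null && $host !== false) {
            $hosts[strtolower($host)] = true;
        }
    }
    return count($hosts);
}

// The hostnames below are invented for illustration.
$assets = array(
    "http://static1.example.com/css/sitewide.css",
    "http://static1.example.com/js/sitewide.js",
    "http://static2.example.com/img/navbar.jpg",
    "/img/logo.gif",
);
echo count_asset_hosts($assets), "\n";
```

A check like this could run in a test environment to flag pages whose assets drift beyond the number of hosts you intend to use.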
As we saw earlier in this chapter, the first step to minimizing HTTP requests is to take advantage of caching and combine multiple requests for CSS and JavaScript files into single requests for each. This section presents additional opportunities for minimizing the number of HTTP requests for a page. For the most part, this means carefully managing requests for CSS, JavaScript, and images.
Just as we discussed with caching, one of the issues with minimizing HTTP requests for CSS files is determining a good division of files. A good starting point for managing the number of CSS files in a large web application is to define one CSS file as a common file linked by all pages across the site, one CSS file for each section of the site, and as few other CSS files as possible. However, you are likely to find other organization schemes specific to your web application. Naturally, as you start to need additional CSS files on a single page (to support different CSS media types, for example), you’ll find that the number of CSS files that you may want to link can grow quickly. Therefore, whenever possible, employ the technique presented earlier for combining multiple CSS files into a single request.
As with CSS files, a good starting point for managing the number of JavaScript files in a large web application is to use one common file containing JavaScript applicable to most parts of the site, a set of sectional files each specific to one section of the site, and as few other JavaScript files as possible. Of course, if you use a lot of external libraries, those will increase the number of files that you need to link; however, you can always take the approach of joining files together on your own servers in ways that make sense for your application.
This has the added benefit of placing the files directly under your control rather than on someone else’s servers, and it reduces DNS lookups. In addition, many libraries provide files that contain groupings of library components that are most frequently used together. One example is yahoo-dom-event.js in the YUI library. Again, employ the techniques presented earlier for combining multiple JavaScript files into a single request.
Surprisingly, you can also combine image files, although in a more complicated way than CSS and JavaScript files, using a technique called spriting. Spriting is the process of creating a single larger image that contains many smaller originals of the same type (GIF or JPEG, for example) at known offsets in one file. You can use these offsets to position the larger image within an HTML element so that just its desired portion is visible. Spriting is a good way to reduce HTTP requests, but there are some practical considerations that limit how images can be combined.
One practical limitation occurs with images that will be used for repeating backgrounds. Only those images to be repeated in the same direction (i.e., the x or y direction) and with the same size in that direction can be combined. Otherwise, for all images smaller than the largest one, you’ll see space between repeated images.
Another practical consideration is that sprites can change rather frequently because changes or additions for any individual image require the sprite file to change. When this happens, browsers that have an earlier version cached need to know to get the new version on the next request. Fortunately, using a version ID as we did for CSS and JavaScript files provides a good solution. That said, the management of version IDs for sprites is a little more problematic for two reasons: first, sprites are often referenced from CSS files, which usually are not run through the PHP interpreter (to manage the version IDs dynamically); and second, changes to images may require you to update offsets within the CSS as well as version IDs for files.
Considering these practical limitations, a good approach is to look for opportunities for spriting within scopes that are easy to manage. For example, if we create a sprite file with just the images for a specific module, it’s easy to keep the module in sync with version ID and offset changes that take place as the sprite file changes. Example 9-9 illustrates spriting within a module for a navigation bar. The module uses five icons with two states (selected and unselected), which reside in one sprite file. The sprite is named after the module, which is a good practice for the purposes of documentation.
#navbar .ichome .selected
{
   width: 50px;
   height: 50px;
   background: url(http://.../navbar_20090712.jpg) 0px 0px no-repeat;
}

#navbar .ichome .noselect
{
   width: 50px;
   height: 50px;
   background: url(http://.../navbar_20090712.jpg) -50px 0px no-repeat;
}

#navbar .icrevs .selected
{
   width: 50px;
   height: 50px;
   background: url(http://.../navbar_20090712.jpg) -100px 0px no-repeat;
}

#navbar .icrevs .noselect
{
   width: 50px;
   height: 50px;
   background: url(http://.../navbar_20090712.jpg) -150px 0px no-repeat;
}

...

#navbar .icabout .selected
{
   width: 50px;
   height: 50px;
   background: url(http://.../navbar_20090712.jpg) ... 0px no-repeat;
}

#navbar .icabout .noselect
{
   width: 50px;
   height: 50px;
   background: url(http://.../navbar_20090712.jpg) ... 0px no-repeat;
}
Figure 9-1 illustrates positioning the sprite image for the first two icons in Example 9-9 (at positions 0px, 0px and -50px, 0px).
An additional approach for managing sprite files is to create one per CSS file. This way, whenever the sprite changes, you know that only its associated CSS file needs to be updated to reflect the new sprite file and changes to offsets. Again, a naming convention can help document this approach. For example, for the CSS file newcars_20090731.css (containing the CSS for one section of the application), you can create a sprite file called newcars_20090731.jpg that contains all the JPEG images for just that section.
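Because the version ID and the offsets must change together, one option is to generate the sprite rules from a single list rather than editing them by hand. The following is only a sketch of that idea, perhaps run as a build step. It assumes a horizontal sprite in which every image has the same width and height, as in Example 9-9. The function name sprite_rules and the example hostname are hypothetical.

```php
<?php
// Hypothetical generator: emit one CSS rule per selector for a horizontal
// sprite in which every image has the same width and height.
function sprite_rules($sprite_url, array $selectors, $width, $height)
{
    $css = "";
    foreach (array_values($selectors) as $i => $selector) {
        // Each image sits one image-width further along the sprite, so the
        // background shifts left by that amount.
        $x = -1 * $i * $width;
        $css .= "$selector\n{\n"
              . "   width: {$width}px;\n"
              . "   height: {$height}px;\n"
              . "   background: url($sprite_url) {$x}px 0px no-repeat;\n"
              . "}\n";
    }
    return $css;
}

// The hostname below is invented for illustration.
echo sprite_rules(
    "http://static.example.com/img/navbar_20090712.jpg",
    array("#navbar .ichome .selected", "#navbar .ichome .noselect"),
    50,
    50
);
```

With this arrangement, adding an icon or changing the version ID means editing the list or the URL in one place and regenerating the rules.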
Although capturing metrics about how visitors use your web application is not directly related to how fast a page loads, those metrics tell you a great deal about how your application is performing with regard to the overall user experience. In this section, we’ll look at an easy approach for adding Google Analytics to a large web application.
Google Analytics is a free service that provides great tools for analyzing how visitors are using your web application. Once you register for the service, enabling metrics for your site is simply a matter of adding a snippet of code to the right place on all pages that you want to track. As we have discussed several times throughout this chapter, the SitePage class that you define for use by all pages across your application offers a logical place to manage this code. Example 9-10 illustrates this.
class SitePage extends Page
{
   protected $google_site_id;
   protected $google_site_nm;

   ...

   public function __construct()
   {
      // You get the ID for your site once you've signed up with Google.
      $this->google_site_id = "...";
      $this->google_site_nm = "...";
   }

   ...

   public function get_all_js()
   {
      // First, get all the JavaScript that was assembled for the page.
      $js = parent::get_all_js();

      // This is the snippet of Google Analytics code from registering.
      // The tracker calls go in a second script block so that ga.js has
      // loaded before _gat is used, and the ID and domain name are quoted
      // so that they reach the browser as JavaScript strings.
      $analytics = <<<EOD
<!-- Google Analytics -->
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
var pageTracker = _gat._getTracker("{$this->google_site_id}");
pageTracker._setDomainName("{$this->google_site_nm}");
pageTracker._trackPageview();
</script>
EOD;

      // Append Google Analytics to the JavaScript that was assembled
      // otherwise for the page.
      return <<<EOD
$js
$analytics
EOD;
   }

   ...
}
Of course, your metrics won’t be accurate if you include this code in environments for which you shouldn’t be tracking pages. For example, you’ll probably want to exclude development environments, testing environments, staging environments, and the like. To do this, simply define a flag that you set to the type of environment in which the page is being displayed, as shown in Example 9-11. The PHP code checks to make sure that it’s generating a page within the production environment before including the Google Analytics code.
class SitePage extends Page
{
   protected $google_site_id;
   protected $google_site_nm;
   protected $op_environment;

   ...

   public function __construct()
   {
      $this->google_site_id = "...";
      $this->google_site_nm = "...";

      // Set this from a server config that indicates the environment.
      $this->op_environment = "...";
   }

   ...

   public function get_all_js()
   {
      // First, get all the JavaScript that was assembled for the page.
      $js = parent::get_all_js();

      $analytics = "";

      if ($this->op_environment == "production")
      {
         // Add the Google Analytics code here for production tracking.
         $analytics = <<<EOD
...
EOD;
      }

      // Google Analytics is not appended unless running in production.
      return <<<EOD
$js
$analytics
EOD;
   }

   ...
}
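How the environment flag gets its value depends on your servers. As one possible sketch, assuming that each server exposes an environment variable for this purpose, a small helper could read it and fall back to a non-production value. The function name get_op_environment and the variable name OP_ENVIRONMENT are hypothetical.

```php
<?php
// Hypothetical helper: read the operating environment from a server-level
// setting. The variable name OP_ENVIRONMENT is an assumption.
function get_op_environment()
{
    $env = getenv("OP_ENVIRONMENT");

    // Fall back to a non-production value when the variable is unset or
    // empty so that tracking is never enabled by accident.
    if ($env === false || $env === "") {
        return "development";
    }
    return strtolower($env);
}

echo get_op_environment(), "\n";
```

Defaulting to a non-production value means that a misconfigured server leaves tracking off rather than polluting your metrics.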
Because the ability to test a large web application is closely related to performance, this section discusses how to use some of the techniques presented in this book to create pages that can easily utilize test data.
Modularity makes adding and removing components easier. This is important for test data, too. The data for most web applications comes from databases or other backend systems. But while the backend is under development, you might have to test your modules in the absence of real data, or at least programming logic to retrieve the data. Therefore, you need a clean and simple way to inject invented data into your modules. Because data managers (see Chapter 6) define the interface for data exchange between the user interface and backend, they offer a good point at which to define hardcoded data that precisely matches the structure of the real data that you expect to exchange with the backend later. When you’re ready to use the real data, it’s easy to remove the test data manager and replace it with the real one.
To use a test data manager, require its include file in place of the include file for the real one, but use the same name for the data manager class. The only difference is something in the name of the include file to distinguish it, such as a _test suffix. Example 9-12 illustrates the key goal: using the test data looks exactly like using the real data later, except for the name of the include file.
<?php
require_once(".../common/sitepage.inc");
require_once(".../common/navbar.inc");
require_once(".../common/subnav.inc");
require_once(".../common/nwcresults.inc");
...
require_once(".../layout/resultslayout.inc");
...

// Include the test data manager until the real data manager is ready.
require_once(".../datamgr/nwclistings_test.inc");
...

class NewCarSearchResultsPage extends SitePage
{
   ...

   public function load_data()
   {
      // This appears exactly like it will with the real data manager.
      $dm = new NewCarListingsDataManager();

      // The data members for loading are provided by the base class.
      // Populate them as needed by the data manager and call get_data.
      ...

      $dm->get_data
      (
         $this->load_args["new_car_listings"],
         $this->load_data["new_car_listings"],
         $this->load_stat["new_car_listings"]
      );

      // Check the status member and handle any errors, which often
      // require a redirect to another page using the header function.
      // The status is stored per data manager, so check that entry, and
      // stop processing once the redirect header has been sent.
      if ($this->load_stat["new_car_listings"] != 0)
      {
         header("Location: ...");
         exit;
      }

      ...
   }

   public function get_content()
   {
      ...

      // This appears exactly like it will with the real data manager.
      $mod = new NewCarResults
      (
         $this,
         $this->load_data["new_car_listings"]
      );

      $results = $mod->create();

      ...
   }

   ...
}
?>
An important aspect of creating test data managers is that they can actually serve as a contract of sorts between the user interface and backend for how the two will exchange data for real. Ideally, once the data manager is ready, you should be able to remove the test data manager, replace it with the real one, and have a working system with relatively minor tweaks.
Example 9-13 illustrates a simple test data manager, which populates a data structure with some hardcoded test values. This defines the data structure that will be used for the real data later.
class NewCarReviewsDataManager extends DataManager
{
   ...

   public function __construct()
   {
      parent::__construct();
      ...
   }

   public function get_data($load_args, &$load_data, &$load_stat)
   {
      // Populate the data structure explicitly with data for testing.
      // This also defines the contract between the backend and user
      // interface for how the real data eventually should be handled.
      $load_data = array
      (
         "0" => array
         (
            "name"  => "2009 Honda Accord",
            "price" => "21905",
            "link"  => "http://.../reviews/00001/"
         ),
         "1" => array
         (
            "name"  => "2009 Toyota Prius",
            "price" => "22000",
            "link"  => "http://.../reviews/00002/"
         ),
         "2" => array
         (
            "name"  => "2009 Nissan Altima",
            "price" => "19900",
            "link"  => "http://.../reviews/00003/"
         )
      );

      // Report success (assuming 0 means no error, as the status check
      // in Example 9-12 implies).
      $load_stat = 0;
   }
}
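Since the test data doubles as a contract, it can be useful to check data from the real data manager against the same structure once it becomes available. The following sketch assumes that each record must carry the name, price, and link keys used in the test data. The function name check_listing_contract is hypothetical.

```php
<?php
// Hypothetical contract check: confirm that every record has the keys the
// user interface expects. The key names come from the test data manager.
function check_listing_contract(array $records)
{
    $required = array("name", "price", "link");

    foreach ($records as $index => $record) {
        foreach ($required as $key) {
            if (!is_array($record) || !array_key_exists($key, $record)) {
                return "record $index is missing \"$key\"";
            }
        }
    }

    return "";   // An empty string means every record matched.
}

$records = array(
    "0" => array("name" => "2009 Honda Accord", "price" => "21905", "link" => "..."),
    "1" => array("name" => "2009 Toyota Prius", "price" => "22000"),
);
echo check_listing_contract($records), "\n";
```

Running the same check against both the test data manager and the real one gives an early warning when the two drift apart.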