Thursday, September 11, 2008

Cache it! Solve PHP Performance Problems

In the good old days when building web sites was as easy as knocking up a few HTML pages, the delivery of a web page to a browser was a simple matter of having the web server fetch a file. A site's visitors would see its small, text-only pages almost immediately, unless they were using particularly slow modems. Once the page was downloaded, the browser would cache it somewhere on the local computer so that, should the page be requested again, after performing a quick check with the server to ensure the page hadn't been updated, the browser could display the locally cached version. Pages were served as quickly and efficiently as possible, and everyone was happy.

Then dynamic web pages came along and spoiled the party by introducing two problems:


When a request for a dynamic web page is received by the server, some intermediate processing must be completed, such as the execution of scripts by the PHP engine. This processing introduces a delay before the web server begins to deliver the output to the browser. This may not be a significant delay where simple PHP scripts are concerned, but for a more complex application, the PHP engine may have a lot of work to do before the page is finally ready for delivery. This extra work results in a noticeable time lag between the user's requests and the actual display of pages in the browser.
A typical web server, such as Apache, uses the time of file modification to inform a web browser of a requested page's age, allowing the browser to take appropriate caching action. With dynamic web pages, the actual PHP script may change only occasionally; meanwhile, the content it displays, which is often fetched from a database, will change frequently. The web server has no way of discerning updates to the database, so it doesn't send a last modified date. If the client (that is, the user's browser) has no indication of how long the data will remain valid, it will take a guess. This is problematic if the browser decides to use a locally cached version of the page which is now out of date, or if the browser decides to request from the server a fresh copy of the page, which actually has no new content, making the request redundant. The web server will always respond with a freshly constructed version of the page, regardless of whether or not the data in the database has actually changed.

To avoid the possibility of a web site visitor viewing out-of-date content, most web developers use a meta tag or HTTP headers to tell the browser never to use a cached version of the page. However, this negates the web browser's natural ability to cache web pages, and entails some serious disadvantages. For example, the content delivered by a dynamic page may only change once a day, so there's certainly a benefit to be gained by having the browser cache a page--even if only for 24 hours.

If you're working with a small PHP application, it's usually possible to live with both issues. But as your site increases in complexity--and attracts more traffic--you'll begin to run into performance problems. Both these issues can be solved, however: the first with server-side caching; the second, by taking control of client-side caching from within your application. The exact approach you use to solve these problems will depend on your application, but in this chapter, we'll consider both PHP and a number of class libraries from PEAR as possible panaceas for your web page woes.

Note that in this chapter's discussions of caching, we'll look at only those solutions that can be implemented in PHP. For a more general introduction, the definitive discussion of web caching is represented by Mark Nottingham's tutorial.

Furthermore, the solutions in this chapter should not be confused with some of the script caching solutions that work on the basis of optimizing and caching compiled PHP scripts, such as Zend Accelerator and ionCube PHP Accelerator.

This chapter is excerpted from The PHP Anthology: 101 Essential Tips, Tricks & Hacks, 2nd Edition. Download this chapter plus two others, covering PDO and Databases, and Access Control, in PDF format to read offline.

How do I prevent web browsers from caching a page?
If timely information is crucial to your web site and you wish to prevent out-of-date content from ever being visible, you need to understand how to prevent web browsers--and proxy servers--from caching pages in the first place.

Solutions

There are two possible approaches we could take to solving this problem: using HTML meta tags, and using HTTP headers.

Using HTML Meta Tags

The most basic approach to the prevention of page caching is one that utilizes HTML meta tags:




The insertion of a date that's already passed into the Expires meta tag tells the browser that the cached copy of the page is always out of date. Upon encountering this tag, the browser usually won't cache the page. Although the Pragma: no-cache meta tag isn't guaranteed, it's a fairly well-supported convention that most web browsers follow. However, the two issues associated with this approach, which we'll discuss below, may prompt you to look at the alternative solution.

Using HTTP Headers

A better approach is to use the HTTP protocol itself, with the help of PHP's header function, to produce the equivalent of the two HTML meta tags above:

header('Expires: Mon, 26 Jul 1997 05:00:00 GMT');
header('Pragma: no-cache');
?>

We can go one step further than this, using the Cache-Control header that's supported by HTTP 1.1-capable browsers:

header('Expires: Mon, 26 Jul 1997 05:00:00 GMT');
header('Cache-Control: no-store, no-cache, must-revalidate');
header('Cache-Control: post-check=0, pre-check=0', FALSE);
header('Pragma: no-cache');
?>

For a precise description of HTTP 1.1 Cache-Control headers, have a look at the W3C's HTTP 1.1 RFC. Another great source of information about HTTP headers, which can be applied readily to PHP, is mod_perl's documentation on issuing correct headers.

Discussion

Using the Expires meta tag sounds like a good approach, but two problems are associated with it:


The browser first has to download the page in order to read the meta tags. If a tag wasn't present when the page was first requested by a browser, the browser will remain blissfully ignorant and keep its cached copy of the original.
Proxy servers that cache web pages, such as those common to ISPs, generally won't read the HTML documents themselves. A web browser might know that it shouldn't cache the page, but the proxy server between the browser and the web server probably doesn't--it will continue to deliver the same out-of-date page to the client.

On the other hand, using the HTTP protocol to prevent page caching essentially guarantees that no web browser or intervening proxy server will cache the page, so visitors will always receive the latest content. In fact, the first header should accomplish this on its own; this is the best way to ensure a page is not cached. The Cache-Control and Pragma headers are added for some degree of insurance. Although they don't work on all browsers or proxies, the Cache-Control and Pragma headers will catch some cases in which the Expires header doesn't work as intended--if the client computer's date is set incorrectly, for example.

Of course, to disallow caching entirely introduces the problems we discussed at the start of this chapter: it negates the web browser's natural ability to cache pages, and can create unnecessary overhead, as new versions of pages are always requested, even though those pages may not have been updated since the browser's last request. We'll look at the solution to these issues in just a moment.

How do I control client-side caching?
We addressed the task of disabling client-side caching in "How do I prevent web browsers from caching a page?", but disabling the cache is rarely the only (or best) option.

Here we'll look at a mechanism that allows us to take advantage of client-side caches in a way that can be controlled from within a PHP script.

Apache Required!
This approach will only work if you're running PHP as an Apache web server module, because it requires use of the function getallheaders--which only works with Apache--to fetch the HTTP headers sent by a web browser.

Solutions

In controlling client-side caching you have two alternatives. You can set a date on which the page will expire, or respond to the browser's request headers. Let's see how each of these tactics is executed.

Setting a Page Expiry Header

The header that's easiest to implement is the Expires header--we use it to set a date on which the page will expire, and until that time, web browsers are allowed to use a cached version of the page. Here's an example of this header at work:

expires.php (excerpt)

function setExpires($expires) {
header(
'Expires: '.gmdate('D, d M Y H:i:s', time()+$expires).'GMT');
}
setExpires(10);
echo ( 'This page will self destruct in 10 seconds
' );
echo ( 'The GMT is now '.gmdate('H:i:s').'
' );
echo ( 'View Again
' );
?>

In this example, we created a custom function called setExpires that sets the HTTP Expires header to a point in the future, defined in seconds. The output of the above example shows the current time in GMT, and provides a link that allows us to view the page again. If we follow this link, we'll notice the time updates only once every ten seconds. If you like, you can also experiment by using your browser's Refresh button to tell the browser to refresh the cache, and watching what happens to the displayed date.

0 comments: