Saturday, June 9, 2012

HTTP 304

The Bing homepage is a lot prettier looking than Google's. It has a slick, high-resolution background image that changes every week or so. But an image like this takes time to download and it increases the load time of the page. What if you're doing a research project and you have to visit Bing several times a day? Does your browser have to download this image over and over again?

The answer is no. When your browser downloads an image for the first time, it saves it to a cache on the hard drive. An If-Modified-Since header is then added to subsequent requests for the image. It contains the time that the browser last downloaded the image (in practice, browsers usually send back the value of the Last-Modified header that the server returned with the image). The server compares this time with the time that the image was last modified on the server. If the image hasn't been modified since then, the server returns an HTTP 304 response with an empty body (the image data is left out). The browser sees this status code and knows that it's OK to use the cached version of the image. This means that the image does not have to be downloaded again and the page loads more quickly. If the image has been modified since the browser last downloaded it, then a normal HTTP 200 response is returned containing the image data.
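To make the server's side of this concrete, here is a minimal Python sketch of the comparison described above. It is not how Bing or any particular server implements it; the function name conditional_status and the variable image_changed are made up for illustration, and real servers also handle other validators (such as ETag) that are ignored here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the client's copy is still current, otherwise 200.

    if_modified_since: raw header value from the request, or None if absent.
    last_modified: timezone-aware datetime of the resource's last change.
    """
    if if_modified_since is None:
        return 200  # unconditional request: always send the full body
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if client_time.tzinfo is None:
        # Assumption: treat a date without a usable zone as GMT
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision
    if last_modified.replace(microsecond=0) <= client_time:
        return 304
    return 200


# Hypothetical resource that last changed on the morning of June 8, 2012
image_changed = datetime(2012, 6, 8, 9, 37, 14, tzinfo=timezone.utc)
print(conditional_status(None, image_changed))                             # 200
print(conditional_status("Sat, 09 Jun 2012 09:37:14 GMT", image_changed))  # 304
print(conditional_status("Thu, 07 Jun 2012 09:37:14 GMT", image_changed))  # 200
```

The function returns only a status code. A real server would follow a 200 with the image data and a 304 with no body at all.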


Using the Bing.com background image mentioned above as an example, let's try using curl to test this out. First, we'll send a request without the If-Modified-Since header. This should return a normal HTTP 200 response with the image in the response body.

Note: curl sends the response body to stdout. Because we're not interested in the actual image data, we'll redirect it to /dev/null to throw it away. The --verbose argument displays the request and response headers.

curl --verbose "http://www.bing.com/az/hprichbg?p=rb%2fTimothyGrassPollen_EN-US8441009544_1366x768.jpg" > /dev/null

Request headers:

GET /az/hprichbg?p=rb%2fTimothyGrassPollen_EN-US8441009544_1366x768.jpg HTTP/1.1
User-Agent: curl/7.21.6 (i686-pc-linux-gnu)
Host: www.bing.com

Response headers:

HTTP/1.1 200 OK
Content-Type: image/jpeg
Last-Modified: Fri, 08 Jun 2012 09:37:14 GMT
Content-Length: 155974

As expected, an HTTP 200 response was returned containing a JPEG image that's about 156 KB in size (155,974 bytes). The Last-Modified header shows the date that the image was last modified on the server.


Now let's try sending a request that will cause an HTTP 304 response to be returned. As shown in the Last-Modified header from the response above, the image was last modified on the morning of June 8. Let's pretend that we last downloaded the image on June 9. Because the image hasn't changed since we downloaded it, we have the most recent copy, so an HTTP 304 response should be returned.

curl --verbose --header "If-Modified-Since: Sat, 09 Jun 2012 09:37:14 GMT" "http://www.bing.com/az/hprichbg?p=rb%2fTimothyGrassPollen_EN-US8441009544_1366x768.jpg" > /dev/null

Request headers:

GET /az/hprichbg?p=rb%2fTimothyGrassPollen_EN-US8441009544_1366x768.jpg HTTP/1.1
User-Agent: curl/7.21.6 (i686-pc-linux-gnu)
Host: www.bing.com
If-Modified-Since: Sat, 09 Jun 2012 09:37:14 GMT

Response headers:

HTTP/1.1 304 Not Modified
Content-Type: image/jpeg
Last-Modified: Fri, 08 Jun 2012 09:37:14 GMT

As expected, an HTTP 304 response was returned. A 304 response never includes a body, so no image data was transferred (note that there is no Content-Length header this time).
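The date in the If-Modified-Since header has to be in the same format that the server uses in Last-Modified. If you'd rather not type it by hand, here is a small Python sketch that derives the "pretend" June 9 date from the Last-Modified value shown earlier. The variable names are my own, and the one-day offset is just the scenario used in this post.

```python
from datetime import timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Value copied from the Last-Modified response header
last_modified = parsedate_to_datetime("Fri, 08 Jun 2012 09:37:14 GMT")

# Pretend we downloaded the image exactly one day after it changed
pretend_download = (last_modified + timedelta(days=1)).astimezone(timezone.utc)

# usegmt=True produces the "GMT" suffix that HTTP dates use
header_value = format_datetime(pretend_download, usegmt=True)
print(header_value)  # Sat, 09 Jun 2012 09:37:14 GMT
```

The printed string can be pasted straight into curl's --header argument.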


Let's play pretend one more time and say that we last downloaded the image on June 7. The image was last updated on June 8, so this means that we have an outdated copy of the image and we need to download a fresh copy.

curl --verbose --header "If-Modified-Since: Thu, 07 Jun 2012 09:37:14 GMT" "http://www.bing.com/az/hprichbg?p=rb%2fTimothyGrassPollen_EN-US8441009544_1366x768.jpg" > /dev/null

Request headers:

GET /az/hprichbg?p=rb%2fTimothyGrassPollen_EN-US8441009544_1366x768.jpg HTTP/1.1
User-Agent: curl/7.21.6 (i686-pc-linux-gnu)
Host: www.bing.com
If-Modified-Since: Thu, 07 Jun 2012 09:37:14 GMT

Response headers:

HTTP/1.1 200 OK
Content-Type: image/jpeg
Last-Modified: Fri, 08 Jun 2012 09:37:14 GMT
Content-Length: 155974

As shown in the response, the server detected that our copy was out of date and sent us an HTTP 200 response with the image data in it.
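The curl experiments above can also be looked at from the client's side. Here is a rough Python sketch of what a cache-aware client might do, using the standard urllib module. The function fetch_with_cache and its parameters are hypothetical names, and the opener parameter exists only so the network call can be swapped out. One detail worth knowing: urllib reports a 304 by raising HTTPError instead of returning a response.

```python
import urllib.request
from urllib.error import HTTPError


def fetch_with_cache(url, cached_body, cached_last_modified,
                     opener=urllib.request.urlopen):
    """Return (body, last_modified), reusing cached_body on a 304."""
    request = urllib.request.Request(url)
    if cached_body is not None and cached_last_modified:
        # Ask the server to send the image only if it has changed
        request.add_header("If-Modified-Since", cached_last_modified)
    try:
        with opener(request) as response:
            # 200 OK: the server sent a fresh copy
            return response.read(), response.headers.get("Last-Modified")
    except HTTPError as error:
        if error.code == 304:
            # Not Modified: the cached copy is still current
            return cached_body, cached_last_modified
        raise
```

On the first call you would pass None for both cached values and store whatever comes back. On later calls you pass the stored body and Last-Modified value, and the function only downloads the image again if the server says it has changed.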


So as you can see, without this caching mechanism, the web would be much slower. Your browser would have to download everything from scratch every time a page is loaded. But with caching, your browser can pull images from the cache without having to download them again.
