Understanding Content Caching

with WordPress: what it is and how it works

Mark Montague

mark@catseye.org

View this presentation at
http://www-personal.umich.edu/~markmont/wpcc/

Source files for this presentation can be downloaded from
http://www-personal.umich.edu/~markmont/wpcc.zip

Disclaimer

The description of this talk on the WP Ann Arbor web site said that this presentation would not cover how to set up caching for your WordPress site. If you want that, please refer to the presentation done at WordCamp Ann Arbor earlier this month by Topher DeRosia:

Site Caching, From Nothing to Everything

or see the documentation for the W3 Total Cache or WP Super Cache plugins.

Instead, this presentation focuses on an in-depth examination of what caching is and how it works behind the scenes.

What is caching?

Picture of an open cardboard box, symbolizing a cache

Content caching is saving a copy of content that a visitor asks for in a special, high-speed location called a cache, so that it can be sent again very quickly to the next visitor who asks for it. This avoids the need to generate the content again for each visitor.

Types of caching

While content caching is saving a copy of the content that a visitor asks for so that it can be sent again to other visitors, there are also other types of caching that a WordPress site can do:

Database query caching
Object caching / transient caching
Opcode caching (PHP script caching)

All of these save expensive-to-generate intermediate results so that they can be reused to speed up generating responses to future requests for articles.

This presentation is only concerned with content caching, that is, caching copies of HTML pages, CSS files, JavaScript files, image files, font files, and so on. By caching and re-using content, we don't care as much about speeding up generating the content in the first place (although, it's best to speed things up there, too).

Why do we care about caching?

Content caching is the single biggest way to speed up WordPress. This is critical because WordPress is notoriously slow.
Google ranks fast sites higher than slow sites, all other things being equal, resulting in Google sending the site 15% more traffic.
Studies have shown that site speed has a major effect on number of visitors, engagement, retention, and conversions:

A site taking 0.5 seconds more to load results in 20% fewer page hits.
40% of visitors will abandon a web page if it takes more than three seconds to load.
Amazon found that for every additional 0.1 seconds that a page takes to load, sales decrease by another 1%.

Caching speedup example

https://dev.catseye.org/ (development site, not publicly accessible).
On a VPS in a datacenter in Atlanta, Georgia.
Using Apache HTTP Server mod_disk_cache and the web browser cache.
Even more importantly for being able to handle a lot of visitors, web server logs show that the amount of time that the web server spent generating the main page decreased from 0.138745 seconds to 0.001535 seconds, roughly a 90-fold reduction.

This is a fairly simple site with a small number of plugins and several other performance optimizations enabled (such as gzip compression).

The speedup will be larger with a more out-of-the-box WordPress configuration.

Something to keep in mind

The W3 Total Cache and WP Super Cache plugins do

CDN Support	Content Compression (gzip)	HTML Minification
CSS Minification	Content Caching	JavaScript Minification
CSS / JavaScript Inlining	Database object caching	PHP Opcode Caching

...so it is more accurate to call them "performance optimization plugins" rather than caching plugins.

There are also other speedups that you can and should do that are not part of these plugins: minimizing number of plugins, tuning the database, and much more.

In this presentation we will look ONLY at web content caching, even though the other speedups are important, too.

When NOT to cache

More caching is not always better.
Caching the wrong things at the wrong time for the wrong visitors can break your site. Potential problems include:

Someone leaves a comment on an article, or the author updates the article, but visitors still get a previously cached copy that doesn't have the new comments or changes.
Someone who does a search gets someone else's search results for completely different keywords.
The entire admin interface / dashboard breaks.

Solutions:

Carefully specify what should and should not be cached, by whom, and for how long.
When possible, tell caches to throw away old copies of content that are no longer valid.

It is common to turn off all caching for logged-in users, including authors and administrators, so that they always get the latest version of content and so that nobody else gets content that was custom generated for specific users.

Telling caches to throw away the things they have cached is called "invalidating" the cache or "purging" it. This can be done for only a single thing in the cache, or for everything in the cache.

Caches can only be purged or invalidated when they are under control of WordPress. In particular, WordPress is not able to purge visitor web browser caches or ISP/corporate caches.

Note that turning off caching for all logged-in users slows things down considerably for them and puts extra load on the web server, but is a necessary trade-off.

Where does content caching occur?

Diagram showing that in the default configuration, caching only occurs in the user's web browser.

By default, caching occurs only in the user's web browser.

The WordPress web server usually needs additional configuration in order for the web browser cache to be fully effective.

Where does content caching occur?

Diagram showing the WordPress web server configured with a WordPress plugin for caching.

The combination of WordPress caching and web browser caching is a very common configuration.

Where does content caching occur?

Diagram showing a dedicated caching server sitting in between the WordPress web server and the Internet.

A specialized, dedicated caching server between WordPress and the Internet can greatly speed up WordPress and handle heavy visitor loads. This usually requires special control by the WordPress server and so is usually used in conjunction with a WordPress caching plugin.

Caching servers include Varnish, Squid, and mod_cache.

Where does content caching occur?

Diagram showing Content Distribution Network (CDN) caches inside the Internet 'cloud'.

A Content Distribution Network (CDN) such as CloudFlare or MaxCDN is usually used in conjunction with a WordPress caching plugin but without a caching server. (A caching server doesn't add any benefit when behind a CDN).

The CDN will have caches (nodes) in different geographical regions, and each user will use the cache that is closest to them. This both reduces the number of network hops for traffic and spreads traffic out across multiple caches for performance and fault tolerance.

Where does content caching occur?

Diagram showing the position of an ISP or corporate cache (or proxy) between the visitor and the Internet.

If a user is accessing a WordPress site from work, their employer may put a cache or proxy in between their computer and the Internet. Or, a consumer's ISP may put a cache in between them and the Internet.

This is often done to save on bandwidth costs, or to deal with high-latency networks, such as when a visitor is using a satellite uplink to access the Internet. However, it can also be done to control, monitor, or alter the content the user is accessing, either with or without caching the content.

All of the other forms of caching we have looked at affect all visitors to the site, but this form will only affect some visitors to the site, usually a very small minority, and so problems caused by this form of caching can be difficult to diagnose, especially if the ISP or corporate cache chooses not to adhere to Internet standards (that is, when it doesn't "play nicely").

Where does content caching occur?

Diagram showing that caching can occur in the user's web browser, at their ISP, on the Internet (for example, in a Content Distribution Network), on a special caching server (such as Varnish) in front of the WordPress web server, or on the WordPress web server / within WordPress itself.

Here are all of the places, together, that caching can occur. While it is possible for all types of cache to be present simultaneously, it would be unusual.

KEY POINT: Each of these open boxes could contain a different saved copy of each of your articles, images, and other content.

Browser caching

Diagram showing where the web browser cache is on the network

The web browser cache is on the visitor's computer.
WordPress can ask that the browser store (or not store) something in its cache, but cannot control if the browser actually does so or not.
Once something is in the browser's cache, there is no way for WordPress to force the browser to get it out. (But there is a way around this that we'll discuss later).
The user can clear the cache whenever they want (but most users never do).

Browser caching

Locations of browser caches under MacOS X:

Apple Safari:
~/Library/Caches/com.apple.Safari/fsCachedData
Google Chrome:
~/Library/Caches/Google/Chrome/Default/Cache
Mozilla Firefox:
~/Library/Caches/Firefox/Profiles/NAME OF PROFILE/cache2/entries

Clear the browser cache and then see what happens in these directories when you access various URLs.

There are similar locations under Microsoft Windows, Linux, and on smartphones. Microsoft Internet Explorer and Opera also have web browser caches. Do a web search if you want to find where these are.

Note that both Chrome and Firefox can have multiple user profiles, each with their own cache. Chrome names the default profile "Default", but Firefox will choose a random name for each profile.

Browsers may group their cache as a part of their history or private data. When you clear one of these things, you may be asked what time period to clear; make sure you specify "from the beginning of time" or "everything", or the cache will only be partially cleared. I also recommend checking the checkboxes to clear all types of data.

You may want to use a different browser from the one you use day-to-day when doing this so that you don't lose important data or configuration.

Caching - general

Caching is actually fairly tricky. If something is cached for too long, visitors will get out of date content. On the other hand, if something is not cached at all, or not cached as long as it could be, then visitors won't get the performance and scalability that are the reasons we are doing caching in the first place.

Modern web browsers and modern web servers currently exchange information via version 1.1 of the Hypertext Transfer Protocol (HTTP), which is documented in a series of six RFCs (RFC 7230 - 7235).

RFC 7234 describes how caching works. The "meat" of this document is only 32 pages, so consider looking at it.

Rather than explaining the standard, let's take a look at a real-world example.

RFC stands for "Request for Comments"; some are just proposals, others are informative memoranda, and yet others (including the HTTP/1.1 RFCs) are designated by the Internet Engineering Task Force and the Internet Society as official standards.

HTTP/2, which is the next version of HTTP, will hopefully be finalized as a standard soon (December 2014 - February 2015).

Caching - general

Clear the browser cache and then look at the HTTP request for a JPEG file (https://dev.catseye.org/wpannarbor.jpg):

GET /wpannarbor.jpg HTTP/1.1
Host: dev.catseye.org
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:32.0) Gecko/20100101 Firefox/32.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive

This is a standard HTTP request; there's nothing special about it.

Caching - general

The HTTP response for the image with cache-related lines highlighted:

HTTP/1.1 200 OK
Date: Mon, 27 Oct 2014 19:43:55 GMT
Server: Apache/2.4
Last-Modified: Mon, 27 Oct 2014 16:58:26 GMT
Etag: "2472-5066a706efac9"
Accept-Ranges: bytes
Content-Length: 9330
Cache-Control: max-age=604800
content-security-policy: default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline'; img-src 'self' data: ; font-src 'self' data: ; report-uri /csp-report.php
Age: 300
X-Cache: HIT from dev.catseye.org
X-Cache-Detail: "cache hit" from dev.catseye.org
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: image/jpeg

The X-Cache and X-Cache-Detail lines explain that the JPEG came from the web server's cache rather than being retrieved by the web server itself. You shouldn't encounter these lines on publicly accessible sites.

This response is from a WordPress web server that has its own cache (Apache HTTP Server mod_disk_cache). Since HTTPS is being used, we know that there are no other caches between the WordPress web server cache and the browser cache.

Note that nearly half of the lines affect caching!

The two X-Cache lines are actually debugging information that I've had the web server add to make it more obvious what is going on; they are not a part of the HTTP/1.1 standard.

Caching - general

Removing the debugging information and reordering things gives us:

Date: Mon, 27 Oct 2014 19:43:55 GMT
Age: 300

Last-Modified: Mon, 27 Oct 2014 16:58:26 GMT
Etag: "2472-5066a706efac9"

Cache-Control: max-age=604800

Date is the date and time that the web server originally produced the JPEG file, before any caching. It's used to determine the age of what the browser will be storing in the cache, if the Age header is not present.
Age is only present if the response came from a cache — in this case, from the web server cache sitting in front of WordPress. It's the number of seconds since the response was originally generated (that is, since the JPEG file was originally put in the web server cache).

If both Age and Date are present, Age will be used. In this case, both are present because the JPEG file came from the web server cache rather than from the web server itself; if it had come from the web server itself, only the Date header would be present.

Caching - general

Date: Mon, 27 Oct 2014 19:43:55 GMT
Age: 300

Last-Modified: Mon, 27 Oct 2014 16:58:26 GMT
Etag: "2472-5066a706efac9"

Cache-Control: max-age=604800

Last-Modified is the date and time that the response (the JPEG file) was last changed.
ETag, which is short for "Entity Tag", uniquely identifies the response.

The web browser will use these things later on when asking the web server cache and web server if they have a newer version of the image.

Caching - general

Date: Mon, 27 Oct 2014 19:43:55 GMT
Age: 300

Last-Modified: Mon, 27 Oct 2014 16:58:26 GMT
Etag: "2472-5066a706efac9"

Cache-Control: max-age=604800

Cache-Control contains instructions from WordPress, the web server, and any other caches about whether the web browser should store the response in its own cache. Multiple directives can be present after the colon, separated by commas.

max-age=604800 says that cached copies of this JPEG file should only be considered "fresh" for a maximum of 604,800 seconds (1 week). After this, the JPEG file will be considered "stale" and if the web browser needs it again past this point, it will try to get a new copy rather than using the copy in its cache.

Note that if the web browser tries and fails to get a fresh copy of the JPEG file after it becomes stale, it will still use the stale copy that it has in its cache.
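The freshness check just described can be sketched in a few lines of Python. This is a simplified illustration rather than a full RFC 7234 implementation: the function names are made up for this example, the age calculation follows the simplified rule from these slides (use Age if present, otherwise fall back to Date), and the time at which the browser received the response is an assumption (Date plus 300 seconds).

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Cache-related headers from the example response on the slides.
EXAMPLE_HEADERS = {
    "Date": "Mon, 27 Oct 2014 19:43:55 GMT",
    "Age": "300",
    "Cache-Control": "max-age=604800",
}

# ASSUMPTION: the browser received the response 300 seconds after Date.
RECEIVED_AT = datetime(2014, 10, 27, 19, 48, 55, tzinfo=timezone.utc)


def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None."""
    match = re.search(r"(?:^|,)\s*max-age=(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def age_at_receipt(headers, received_at):
    """Age of the response (in seconds) at the moment the browser stored it."""
    if "Age" in headers:
        return int(headers["Age"])
    response_date = parsedate_to_datetime(headers["Date"])
    return max(0, int((received_at - response_date).total_seconds()))


def is_fresh(headers, received_at, now):
    """True while the cached copy is younger than its max-age."""
    max_age = max_age_seconds(headers.get("Cache-Control", ""))
    if max_age is None:
        return False  # no explicit lifetime; this sketch treats that as stale
    time_in_cache = int((now - received_at).total_seconds())
    return age_at_receipt(headers, received_at) + time_in_cache < max_age
```

Note that because the response was already 300 seconds old when it arrived, the browser's copy goes stale 300 seconds before a full week has passed in the browser cache.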

Caching - general

Date: Mon, 27 Oct 2014 19:43:55 GMT
Age: 300

Last-Modified: Mon, 27 Oct 2014 16:58:26 GMT
Etag: "2472-5066a706efac9"

Cache-Control: max-age=604800

What will the browser do? It will store the response in its cache IF

There is a Cache-Control header with a max-age or public directive, or an Expires header.
There is NOT a Cache-Control header with a no-store, no-cache, or private directive, or an Authorization header.

Otherwise, the response will not be stored in the cache, and the web browser will instead request it again the next time it is needed.
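As a rough sketch, the two rules above can be written as a small Python function. It mirrors the simplified rule on this slide only; real browsers are more nuanced. The function names are invented for this example, and treating Authorization as a request header is an assumption about what the slide intends.

```python
def directive_names(cache_control):
    """Split a Cache-Control value into a set of lowercase directive names."""
    names = set()
    for part in cache_control.split(","):
        name = part.strip().split("=", 1)[0].lower()
        if name:
            names.add(name)
    return names


def browser_may_store(response_headers, request_headers=None):
    """Apply the slide's simplified rule for storing a response."""
    request_headers = request_headers or {}
    names = directive_names(response_headers.get("Cache-Control", ""))

    # Rule 1: something must explicitly allow caching.
    allowed = bool(names & {"max-age", "public"}) or "Expires" in response_headers

    # Rule 2: nothing may forbid it.
    # ASSUMPTION: Authorization is checked on the request.
    forbidden = (
        bool(names & {"no-store", "no-cache", "private"})
        or "Authorization" in request_headers
    )
    return allowed and not forbidden
```

For the JPEG response in the example, `browser_may_store({"Cache-Control": "max-age=604800"})` returns `True`.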

The Expires header is an older version of Cache-Control: max-age. Instead of giving the number of seconds the resource remains valid, it gives the date at which the resource becomes invalid.

The Pragma: no-cache header is an older version of Cache-Control: no-cache.

WordPress can prevent web browser — or other — caches from storing things by specifying Cache-Control: no-store. (Note that private will only prevent something from being cached if the cache is shared by multiple clients.)

The Authorization header is used by HTTP Basic Authentication and HTTP Digest Authentication, neither of which is used by WordPress. It indicates that the visitor is logged in and hence that the response probably should not be cached. (Note that since WordPress does not use these forms of authentication, it will set Cache-Control: private for its logged-in users, instead.)

Caching - general

Now let's go back to the web browser tab in which we loaded the JPEG image, and, still capturing HTTP requests and responses, request the image again by clicking in the web browser location bar and pressing Return.

We don't see any HTTP traffic, because the web browser retrieved the JPEG image from its cache.

Next, click the web browser Reload button. This will — at least in Firefox — ask the web server to check to see if a new version of the image is available on the web server, and, if so, to get it.

(Note that if you hold the Shift key while clicking Reload, instead of checking if a new version is available, Firefox will just go get the resource again, just like it did the first time it loaded it, completely disregarding anything in its cache.)

Caching - general

HTTP request sent when we click the Reload button:

GET /wpannarbor.jpg HTTP/1.1
Host: dev.catseye.org
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:32.0) Gecko/20100101 Firefox/32.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive
If-Modified-Since: Mon, 27 Oct 2014 16:58:26 GMT
If-None-Match: "2472-5066a706efac9"
Cache-Control: max-age=0

The Cache-Control: max-age=0 header, when used in a request, tells any caches that may see this request to only respond if what they have is at most 0 seconds old — that is, never, thus letting this request all the way through to the WordPress web server.

Caching - general

GET /wpannarbor.jpg HTTP/1.1
If-Modified-Since: Mon, 27 Oct 2014 16:58:26 GMT
If-None-Match: "2472-5066a706efac9"

This tells the web server to only respond with a new copy of the JPEG file if the copy the web server has...

is newer than the date in the If-Modified-Since header. The web browser sets this date based on what the web server originally sent as the Last-Modified header in the copy of the file that the web browser has in its cache.
or has an ETag that is different than the one given in the If-None-Match header. (The value the web browser is sending is the same as the one it received for the copy of the file it has cached; if the file has changed, its ETag will be different.)

Note that doing a GET with If- headers is much more efficient than doing a HEAD, interpreting the response, and doing a regular GET if a new version is available.

A HEAD should only be used if the client needs to check to see if a new version of a resource is available but does not need to retrieve it.

Caching - general

And here is the web server's response to the reload request:

HTTP/1.1 304 Not Modified
Date: Mon, 27 Oct 2014 20:12:34 GMT
Server: Apache/2.4
Connection: Keep-Alive
Keep-Alive: timeout=5, max=100
Etag: "2472-5066a706efac9"
Cache-Control: max-age=604800

The web server responded with a 304 and did not send the JPEG again (note the absence of Content-Type and Content-Length headers). This means that no newer version was available.
If a newer version were available, the web server would have responded with a 200 and sent the JPEG again.
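To make the server side of this exchange concrete, here is a minimal Python sketch of how a server might choose between 304 and 200. It is an illustration, not Apache's actual code: the function name is invented, only exact ETag matches are handled, and it follows the HTTP/1.1 rule that If-None-Match, when present, takes precedence over If-Modified-Since.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(request_headers, current_etag, last_modified):
    """Return 304 if the client's cached copy is still current, else 200.

    last_modified must be a timezone-aware datetime for the server's copy.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When an ETag is supplied, the date check is skipped entirely.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        unchanged = current_etag in candidates or "*" in candidates
        return 304 if unchanged else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        return 304 if last_modified <= since else 200

    return 200  # plain GET: always send the full response


# Values taken from the reload request shown earlier.
RELOAD_REQUEST = {
    "If-Modified-Since": "Mon, 27 Oct 2014 16:58:26 GMT",
    "If-None-Match": '"2472-5066a706efac9"',
}
FILE_MODIFIED = datetime(2014, 10, 27, 16, 58, 26, tzinfo=timezone.utc)
```

With these values, `conditional_status(RELOAD_REQUEST, '"2472-5066a706efac9"', FILE_MODIFIED)` gives 304, matching the response above. If the file's ETag had changed, the result would be 200.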

Caching - shared

Diagram showing the location of all caches on the network

A shared cache is one that is used by multiple visitors. Most caches are shared. The exceptions to this are web browser caches and private proxying caches, which are private to a single user.

A WordPress cache, a web server cache, a CDN, and an ISP or corporate cache are all examples of shared caches.

WordPress can use the Cache-Control: s-maxage=XXXXXX header to control how long something is stored by shared caches. s-maxage works exactly the same as max-age except that private caches will ignore s-maxage.

In the diagram, all of the caches are shared caches except for the web browser cache.

Caching - shared

Diagram showing the location of the web server cache on the network

In the following example, the resource (page) will be cached by the web server cache for 4 hours because s-maxage=14400 is set, but it will not be cached by the web browser cache because max-age=0 is set.

This configuration can be used if you know you only have a single shared cache between you and the web browser (for example, because you are using HTTPS) which is under control of WordPress. WordPress can then purge the page from the shared cache if it changes; otherwise, the cache will avoid unnecessarily regenerating the page.

Cache-Control: max-age=0, s-maxage=14400

In addition to s-maxage=0, the Cache-Control: private directive can be used to prevent something from being stored in a shared cache.
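The difference between how a shared cache and a private cache read the same header can be sketched in Python as follows. This is a simplified illustration with invented function names; it only looks at no-store, private, s-maxage, and max-age, and it returns 0 when no explicit lifetime is given (real caches may apply heuristics in that case).

```python
def parse_cache_control(value):
    """Parse a Cache-Control value into a dict of directive -> argument."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, argument = part.partition("=")
        directives[name.strip().lower()] = argument.strip().strip('"') or None
    return directives


def freshness_lifetime(cache_control, shared):
    """Seconds a cache may reuse the response; 0 means do not reuse it."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives:
        return 0
    if shared and "private" in directives:
        return 0
    # Shared caches prefer s-maxage; private caches ignore it.
    if shared and directives.get("s-maxage") is not None:
        return int(directives["s-maxage"])
    if directives.get("max-age") is not None:
        return int(directives["max-age"])
    return 0
```

For the header above, `freshness_lifetime("max-age=0, s-maxage=14400", shared=True)` is 14400 (4 hours), while the same call with `shared=False` is 0.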

The Vary header

WordPress or the web server can set the Vary header in HTTP responses to indicate that the response that was generated is specific to one or more HTTP request headers.

For example, the response header

Vary: Accept-Encoding

means that the response is either compressed or not compressed, depending on what the client said that it could accept. Compressed content must not be sent to browsers that are not able to deal with it, and if uncompressed content gets sent to a browser that can deal with compressed content, then performance will be slower than it needs to be.

Another common example would be for WordPress to set Vary: User-Agent in responses to indicate that it is serving different versions of the site to traditional versus mobile web browsers.

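Note that the Vary header matters most for shared caches, which receive requests from many different clients; a private cache serves a single browser, whose request headers rarely change between requests.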

The Vary header

If the Vary header is set, caches will store a different version of the content for each different value of the headers named by the Vary header. A cache will only serve a piece of content to a client if the client's request headers exactly match the request headers specified by the Vary header that applied when the content was cached.

For example, if four web browsers all request a site's main page:

Client 1: Accept-Encoding: gzip, deflate
Client 2: Accept-Encoding: gzip
Client 3: (does not specify any Accept-Encoding header)
Client 4: Accept-Encoding: gzip

Then the main page will be generated and stored in the cache three separate times (there will be three copies), once for each of clients 1, 2, and 3. Client 4 will then get the cached copy that was generated for client 2.

Note that this only applies if WordPress or the web server sets the Vary: Accept-Encoding header in its responses. If it doesn't, then the Accept-Encoding header in requests is ignored, and the main page will only be generated and cached once, for client 1. Clients 2, 3, and 4 will then get the copy that was cached when client 1 requested it.
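The four-client example can be simulated with a toy cache in Python. This is only a sketch: the class and function names are invented, the cache holds a single URL, and header values are compared as exact strings (a real cache may normalize them first).

```python
class ToyCache:
    """Toy shared cache for one URL that keys copies on the Vary headers."""

    def __init__(self, vary):
        self.vary = [name.lower() for name in vary]
        self.copies = {}

    def fetch(self, client, request_headers):
        lowered = {k.lower(): v for k, v in request_headers.items()}
        # The cache key is the tuple of values for each header named by Vary.
        key = tuple(lowered.get(name) for name in self.vary)
        if key not in self.copies:
            # Cache miss: the page is generated and stored under this key.
            self.copies[key] = f"copy generated for client {client}"
        return self.copies[key]


CLIENTS = {
    1: {"Accept-Encoding": "gzip, deflate"},
    2: {"Accept-Encoding": "gzip"},
    3: {},
    4: {"Accept-Encoding": "gzip"},
}


def simulate(vary):
    """Run all four clients through a fresh cache; return (served, copies)."""
    cache = ToyCache(vary)
    served = {c: cache.fetch(c, headers) for c, headers in CLIENTS.items()}
    return served, len(cache.copies)
```

Calling `simulate(["Accept-Encoding"])` stores three copies and serves client 4 the copy generated for client 2. Calling `simulate([])`, which models a response with no Vary header, stores one copy that all four clients receive.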

Static vs dynamic assets

There are two types of things that can be cached:

Static assets: things that are the same for all visitors and generally do not change over time, including CSS, JavaScript, images, and web fonts.
- Static assets can generally be safely cached; the main problem is getting rid of any cached copies during upgrades.
Dynamic assets: things that generally change over time and/or things which different visitors get different versions of at the same time. Includes articles (especially those with comments), pages, feeds, and so on.
- Some dynamic assets (e.g., the WordPress dashboard or edit screens) should never be cached.
- Whenever a change is made, the very next request should regenerate the page rather than using the one in the cache, but it turns out that this is tricky.

For these reasons, the easiest and safest thing to do is to only cache static assets, and always generate dynamic assets for every request. But most static assets are quick to serve up, and the big performance improvements are all from correctly caching dynamic assets.

Also for these reasons, many CDNs have a low-end or "starter" mode where they only cache static assets.

Static assets

Generally, WordPress and the web server will set very long lifetimes for cached versions of static assets, often around 1 year. This minimizes the number of times clients check back for new versions, which slightly increases performance and reduces network data charges.
However, on the rare occasions when static assets change — for example, some CSS files may change when upgrading WordPress to a newer version — we want clients to get the new version immediately, not when the cached versions expire, which could be hours, days, or even a year later.
To achieve this, we do something clever: we have WordPress (either WordPress itself, or the caching plugin) append a query string with a version number or an opaque version-dependent identifier to the file name. Since the file is static, it can't actually take query string parameters, and so these will be ignored by the web server. But caches will consider the query string to be a part of the URI and will cache each version of the static asset separately.

Static assets

For example, if there is a file named

my-buttons.css

and it is accessed via the URI path

/wp-content/plugins/my-plugin/my-buttons.css

then make sure all of your PHP, CSS, JavaScript, and HTML files always access it as something like

/wp-content/plugins/my-plugin/my-buttons.css?ver=1.2

The ?ver=1.2 will be ignored by the web server but treated as a part of the file name by caches.

Static assets

When it's time to upgrade the file, change all of the places that reference it from

/wp-content/plugins/my-plugin/my-buttons.css?ver=1.2

to

/wp-content/plugins/my-plugin/my-buttons.css?ver=1.3

Then, the next time a file referencing my-buttons.css is served by the web server, the browser will request version 1.3. Since only the copy with ?ver=1.2 is in the caches, the request for my-buttons.css will fall through to the web server. Version 1.3 will get cached; version 1.2 won't be served from the cache again and will eventually reach its maximum lifetime and be dropped from the caches.

Note that this depends on files referencing the static asset being served freshly by the web server rather than from a cache. Ultimately, the file that references all other files will be a PHP script, a dynamic asset. We'll take a look at how dynamic assets are handled on the next few slides.
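The version-string trick is easy to get wrong when done by hand, so it is usually done by a helper. The Python function below is a hypothetical illustration of the idea (WordPress itself does this in PHP, where wp_enqueue_style() and wp_enqueue_script() accept a version argument); the name versioned is made up for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def versioned(path, version):
    """Return path with a ver=<version> query parameter, replacing any old one."""
    parts = urlsplit(path)
    # Keep every existing query parameter except a previous "ver".
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "ver"
    ]
    query.append(("ver", version))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `versioned("/wp-content/plugins/my-plugin/my-buttons.css", "1.2")` yields the ?ver=1.2 URI shown earlier, and passing that result back in with "1.3" yields the ?ver=1.3 URI. Because caches treat the two as different URIs, bumping the version in one place is enough to bypass every cached copy of the old file.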

Dynamic assets

For dynamically generated pages that should never be cached (such as the WordPress dashboard and other pages generated for logged-in users), WordPress generates the following HTTP header:

Cache-Control: no-cache, must-revalidate, max-age=0

However, we want as many pages as possible to be cached so that they do not have to be generated by WordPress each time any visitor requests them, while still ensuring that no cached page is ever served if it would be regenerated differently.

To do this, we ensure that these dynamic assets only get put into caches that can be directly controlled by WordPress. Then, when WordPress changes something, it can reach into the cache via an API and tell the cache to purge (invalidate) affected assets.

In particular, for dynamic assets, we must make sure that they are cached by some cache that is controlled directly by WordPress (such as the web server cache) while at the same time making sure that they are not cached by the web browser cache. This can be tricky, but the most common solution involves having the web server cache modify the caching headers before passing them further along.

Up until this point, everything has happened solely via the HTTP/1.1 protocol. The problem with this is that HTTP only allows things to happen in response to requests for specific resources for a single client. To provide the level of control we need to have our cake and eat it too in regards to dynamic assets, WordPress and the cache need some additional way to communicate.

Dynamic assets

WordPress is able to directly control the following types of caches, as WordPress and the cache know about each other and the cache trusts the WordPress site:

WordPress caching plugin cache
Web server cache
CDN cache (usually)

However, WordPress is NOT able to control the following caches via means other than HTTP, since they are set up by and run by people who don't necessarily trust the WordPress site:

ISP / corporate cache
Web browser cache

Cache invalidation

How WordPress tells a cache to invalidate (purge, remove) something that it has cached depends on the specific caching software that is being used:

Local cache that is a part of the WordPress caching plugin: Since the WordPress plugin is the cache, it simply manages itself, most often by deleting the cached file from disk.
Varnish: by connecting to the cache via HTTP and sending a PURGE request for the URI in question. Varnish will only accept PURGE requests from trusted IP addresses (typically, the IP address of the WordPress web server).
Squid: same mechanism as Varnish.
Apache HTTP Server mod_disk_cache: by running the external program htcacheclean, giving it the full URL of the cached resource to delete.
CDN: The exact details depend on the CDN, but most provide an interface that is similar to that of Varnish and Squid.

Some local caching WordPress plugins have the ability to use memcached, which is a memory-based cache rather than a disk-based cache. In this case, the plugin sends a delete command to memcached.
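As an illustration of the PURGE mechanism, here is a small Python sketch that builds such a request. The function name, the cache address, and the site host name are all hypothetical. The cache must also be configured to accept PURGE from the sender; how that is done depends on the caching software and is not shown here.

```python
import urllib.request


def build_purge_request(cache_base_url, site_host, path):
    """Build (but do not send) an HTTP PURGE request for one cached path.

    cache_base_url: where the cache listens, e.g. "http://127.0.0.1:6081"
                    (hypothetical address).
    site_host:      the Host header visitors use, so that the cache looks
                    up the same entry it stored for them.
    """
    url = cache_base_url.rstrip("/") + "/" + path.lstrip("/")
    request = urllib.request.Request(url, method="PURGE")
    request.add_header("Host", site_host)
    return request
```

To actually send it, pass the result to `urllib.request.urlopen()`. A plugin would typically issue one such request for each URI path affected by a change.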

Cache invalidation

As an example, if a visitor adds a new comment to an article with WordPress post ID 123, the following URI paths need to be invalidated so that each of them will be regenerated the next time it is requested:

/(permalink to post 123)
/(permalink to post 123)/(page number)    multi-page posts
/(permalink to post 123)/comments-page-(page number)
/                                         site main page (may show comment count)
/page/(page number)                       additional main-page content
/feed/...                                 RDF, RSS, ATOM feeds
/comments/feed
/tag/(each tag of post 123)
/category/(each category of post 123)
/author/(author of post 123)
/(year of post 123)
/(year of post 123)/(month of post 123)
/(year of post 123)/(month of post 123)/(day of post 123)
/author/(author's username)

In addition, most WordPress caching plugins will provide a dashboard button that will purge the entire cache; this is useful during upgrades and other site maintenance.
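A caching plugin has to compute this list whenever something changes. The Python sketch below shows the idea for the non-paginated paths only. It is a hypothetical helper, not code from any real plugin: the function name and parameters are invented, and the feed and paginated variants are left out because they depend on the site's configuration.

```python
def paths_to_purge(permalink, tags, categories, author, year, month, day):
    """List the base URI paths to invalidate when a post changes."""
    paths = [
        permalink,         # the post itself
        "/",               # site main page (may show comment count)
        "/comments/feed",  # site-wide comments feed
    ]
    paths += [f"/tag/{tag}" for tag in tags]
    paths += [f"/category/{category}" for category in categories]
    paths.append(f"/author/{author}")
    # Date-based archive pages for the year, month, and day of the post.
    paths += [
        f"/{year}",
        f"/{year}/{month:02d}",
        f"/{year}/{month:02d}/{day:02d}",
    ]
    return paths
```

Even for a post with two tags and one category this produces ten paths, which is one reason invalidation is easy to get subtly wrong.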

Non-HTTP caching

Up until now we've talked mostly about caching at the HTTP protocol level. HTTP caching completely avoids running WordPress and its plugins unless a WordPress page needs to be regenerated. Also, the HTTP request is intercepted by the cache as early as possible, so that most of the work that the web server does for each HTTP request is avoided, too. For these reasons, HTTP caching is the fastest caching option available.

The downside to HTTP caching is that you need to set up some software to do it. This adds considerable complexity to a WordPress installation, and is not always possible. (We're ignoring web browser caching here, since we can't safely cache dynamic assets such as articles in the web browser cache, which loses a lot of performance.)

Non-HTTP caching

A WordPress caching plugin can save a copy of every page that is generated, however, and when new requests come in to WordPress itself, it can check to see if there is a saved copy that can be used instead of generating a new copy.

This avoids all of the complexity of HTTP caching, but is substantially slower:

1. The web server has to do the full work of handling both the HTTP request and response.
2. WordPress actually starts up to handle the request and does a fair portion of its initialization (although the caching plugin will avoid as much of the initialization as possible unless it turns out to be necessary to regenerate the page).
3. The checks to see if a previously saved copy can be served are done in PHP, which is relatively slow.

However, #2 and #3 can be completely avoided if you are able to set up mod_rewrite rules in the WordPress .htaccess file to check for and serve cached content.
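The save-and-reuse logic of a plugin-level page cache can be sketched as follows. Real plugins are written in PHP; this Python version only illustrates the idea, and the class name, the file naming scheme, and the generate callback are all invented for this example.

```python
import hashlib
from pathlib import Path


class PageCache:
    """Minimal disk-based page cache: one file per URI path."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _file_for(self, uri_path):
        # Hash the path so that any URI maps to a safe file name.
        digest = hashlib.sha256(uri_path.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.html"

    def fetch(self, uri_path, generate):
        """Serve the saved copy if there is one, else generate and save it."""
        cached = self._file_for(uri_path)
        if cached.exists():
            return cached.read_text(encoding="utf-8")
        html = generate(uri_path)  # the expensive step we want to avoid
        cached.write_text(html, encoding="utf-8")
        return html

    def purge(self, uri_path):
        """Invalidate one page by deleting its cached file from disk."""
        self._file_for(uri_path).unlink(missing_ok=True)
```

The first `fetch()` for a path calls `generate`; later calls read the saved file until `purge()` removes it, after which the next request regenerates the page.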

Troubleshooting caching problems

Caching problems can be very non-obvious and difficult to troubleshoot since so much is hidden. The most common problems are:

The cache is either misconfigured or is not completely configured.
The cache is configured correctly, but is being too aggressive, breaking things.
The cache is configured correctly, but is being too conservative, giving all the downsides of caching without really speeding anything up.
The cache has old or corrupted data and needs to be purged.
A plugin or theme assumes that the only cache is the web browser cache, and fails to add necessary caching headers to its output.
A plugin or theme assumes that a particular caching setup is being used and hence adds the wrong caching headers to its output.

If the problem is identified to be a particular plugin or theme, options include replacing it with a similar plugin or theme that does not have the problem, or modifying (customizing) the plugin or theme to fix the problem.

Troubleshooting caching problems

Check the web server logs to see what is being served. If something does not show up when it should, it is probably being inappropriately cached.
If you have access to the web server cache or CDN logs, check them. Enable debugging or diagnostic messages, if possible, to get more details.
Use Live HTTP Headers or a similar web browser plugin to examine the caching headers. Check the dates and ages in particular to determine if something is being cached that should not be cached.
Clear (purge) all caches and try again. If the problem goes away, it's probably a caching problem that needs to be investigated and fixed. If the problem doesn't go away, it could still be a caching problem, but look at non-caching related causes, too.
As a last resort, temporarily disable caching and also clear the web browser cache. If the problem still exists at this point, it's definitely not a caching problem.

References:

RFC 7234: Hypertext Transfer Protocol (HTTP/1.1): Caching.
Google's HTTP caching guide.
A Beginner's Guide to HTTP Cache Headers.
W3 Total Cache documentation.
WP Super Cache documentation.

Questions?

Diagram showing most of the places caching can occur.