Caching is a key performance factor and a source of issues when it’s not set up correctly. This article will give an overview and useful, practical details on how to adjust caching for a local development environment and where the cached files are stored.
So what is the cache in AEM, and how does it work?
The cache in AEM is a component that transparently stores data so that future requests for that data can be served faster. When a cacheable document is requested, the Dispatcher checks whether that document exists in the web server’s file system:
- If the document is cached, the Dispatcher returns the file.
- If the document is not cached, the Dispatcher requests it from the AEM instance.
By default, resources will be cached only if all of the following conditions are met:
- The HTTP request uses the GET method;
- The requested URL has an extension (such as .html or .xml);
- The requested URL has no query string (there are no parameters after the extension);
- The request has no “Authorization” header (unless /allowAuthorized is set to 1).
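The conditions above can be sketched as a small Python check. This is only an illustration of the listed rules, not the Dispatcher's actual implementation; the function name `is_cacheable` and its parameters are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def is_cacheable(method, url, headers=None, allow_authorized=False):
    """Approximate the default cacheability rules listed in the article.

    Hypothetical helper: the real Dispatcher applies further rules
    (for example the /rules section of /cache) that are not modeled here.
    """
    # Header names are case-insensitive, so normalize them first.
    header_names = {name.lower() for name in (headers or {})}
    parts = urlsplit(url)

    # Rule 1: only GET requests are cached.
    if method.upper() != "GET":
        return False
    # Rule 2: the last path segment must have an extension.
    _, extension = posixpath.splitext(parts.path)
    if not extension:
        return False
    # Rule 3: no query string.
    if parts.query:
        return False
    # Rule 4: no Authorization header, unless explicitly allowed.
    if "authorization" in header_names and not allow_authorized:
        return False
    return True
```

For example, `is_cacheable("GET", "/content/geometrixx/en.html")` passes all four checks, while the same URL with `?a=b` appended does not.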
The settings are defined in the dispatcher configuration file. In our case, it’s a conf/dispatcher.any file. The configuration file contains a series of single or multi-valued properties that control the dispatcher behavior:
- property names are prefixed with a forward slash (“/”);
- multi-valued properties enclose child items using braces (“{}”);
- comments begin with the ‘#’ symbol.
Renders
First, you must specify renders for the dispatcher. Renders are the AEM instances from which the dispatcher retrieves the content that it can cache. They are the first thing we define in our configuration file. If you define more than one render, the Dispatcher automatically balances the load among these AEM instances. In our case we set only one render: the publish AEM instance.
/renders
{
    /rend01
    {
        /hostname "localhost"
        /port "4503"
    }
}
You can now restart the httpd server and check whether the dispatcher can request resources from the publish AEM instance.
For example, if you have a working http://localhost:4503/content/geometrixx/en.html page, then the dispatcher version of this page should be available at http://localhost/content/geometrixx/en.html. Note that we don’t set the port in the URL because the dispatcher (more precisely, httpd) listens on port 80, which is the default HTTP port that browsers use.
Filters
The /filter section specifies HTTP requests that the dispatcher can accept. All other requests are sent back to the web server with a 404 error code (page not found). Let’s allow access to all the resources for our demonstration case.
/filter
{
    /0001 { /type "allow" /glob "*" }
}
A filter’s type is either “allow” or “deny”.
Globs will be compared against the entire request line, e.g.:
/0001 { /type "allow" /glob "* /index.html *" }
This glob matches the request “GET /index.html HTTP/1.1” but not “GET /index.html?a=b HTTP/1.1”.
Instead of “globs”, you may define a filter with separate “method”, “url”, and “protocol” properties. For finer control than “url”, you may match individual parts of the URL with “path”, “selectors”, “extension”, and “suffix”.
When a request matches multiple filter patterns, only the last matching pattern is applied.
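The last-match-wins behaviour can be illustrated with a short Python sketch. It uses the standard `fnmatch` module as a stand-in for the dispatcher's glob matching, which is an approximation; the function `is_allowed` and the `/system/` deny rule are hypothetical and exist only for this example.

```python
from fnmatch import fnmatchcase


def is_allowed(request_line, filters):
    """Evaluate (type, glob) filters in order; the last matching rule wins.

    Assumption: a request that matches no rule is rejected.
    """
    decision = "deny"
    for rule_type, glob in filters:
        # Globs are compared against the entire request line.
        if fnmatchcase(request_line, glob):
            decision = rule_type
    return decision == "allow"


# Hypothetical rule set: allow everything, then deny one path prefix.
example_filters = [
    ("allow", "*"),
    ("deny", "* /system/*"),
]
```

With these rules, `GET /content/geometrixx/en.html HTTP/1.1` is allowed because only the first rule matches, while `GET /system/console HTTP/1.1` matches both rules and is denied by the later one.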
After defining your filters, you may restart httpd and check whether the dispatcher has access to all resources of the publish AEM instance. In a real production environment, of course, you should deny access to some resources for security reasons.
Cache
The caching section determines resources that the dispatcher will cache. This section has many rules similar to the filter rules, with a few additional settings.
For example, /docroot determines the directory where cached files are stored. The value must be exactly the same path as the document root of the web server so that the dispatcher and the web server handle the same files.
For our demonstration, let’s set docroot and allow caching of all resources received from our render (publish instance):
/cache
{
    /docroot "/Apache22/htdocs"
    /rules
    {
        /0000
        {
            /glob "*"
            /type "allow"
        }
    }
}
Once you’ve applied these changes, restart the httpd server and open a new private browser window (Chrome: Ctrl+Shift+N, Firefox: Ctrl+Shift+P) so that the request is sent without the “Authorization” header. Then go to: http://localhost/content/geometrixx/en/products.html.
Cached resources should appear inside the htdocs directory. Resources have a URL-like hierarchy: directories form paths, and static HTML files contain rendered content.
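To make the URL-like hierarchy concrete, here is a minimal Python sketch that maps a request URL to the file one would expect under the docroot. The helper name `cache_file_path` is hypothetical, and the mapping simply follows the description above (docroot plus URL path).

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def cache_file_path(docroot, url):
    """Return the expected location of the cached file for a URL.

    Hypothetical helper: it joins the docroot with the URL path and
    ignores the host, port, and any query string.
    """
    url_path = urlsplit(url).path
    # Strip the leading slash so the URL path is appended to the docroot
    # instead of replacing it.
    return PurePosixPath(docroot) / url_path.lstrip("/")
```

Called with the docroot `/Apache22/htdocs` and the products page URL used above, it returns `/Apache22/htdocs/content/geometrixx/en/products.html`.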
Headers
As you have seen, cached HTML files contain only HTML content. But what should we do if we also want to cache the response headers received from renders?
For example, if responses from renders carry a “Content-Type” header that defines the encoding, the HTML content may not be displayed correctly without that header. That’s what the /headers block inside the /cache section is designed for. Let’s cache some common and useful headers:
/cache
{
    /headers
    {
        "Cache-Control"
        "Content-Disposition"
        "Content-Type"
        "Expires"
        "Last-Modified"
        "X-Content-Type-Options"
    }
}
After changing the headers in your dispatcher configuration file, delete all cached files from the htdocs directory and restart httpd. Then open http://localhost/content/geometrixx/en/products.html.
In the htdocs/content/geometrixx/en directory you will now find not only the cached HTML file products.html but also a products.html.h file next to it. This *.h file contains the headers for the cached HTML file.
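When experimenting locally, it can be handy to list cached files that have no *.h companion, for example files cached before the /headers block was added. The following Python sketch walks a docroot and reports them. The function name `missing_header_files` is hypothetical, and the script assumes every file in the directory is a cache entry, which may not hold if the docroot also contains other files.

```python
import os


def missing_header_files(docroot):
    """List files under docroot that have no '<name>.h' file beside them.

    Assumption: any file ending in '.h' is a stored-headers file and is
    therefore skipped rather than checked itself.
    """
    missing = []
    for dirpath, _dirnames, filenames in os.walk(docroot):
        names = set(filenames)
        for name in filenames:
            if name.endswith(".h"):
                continue
            if name + ".h" not in names:
                missing.append(os.path.join(dirpath, name))
    return sorted(missing)
```

Running `missing_header_files("/Apache22/htdocs")` after clearing the cache and reloading a page should return an empty list if headers are stored for every cached file.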
Summary
You can quickly activate caching using the following initial settings in the dispatcher configuration file:
- set renders;
- set filters;
- set docroot and rules in the cache section;
- set headers for storing HTTP headers.
FAQ
How does AEM dispatcher cache work?
When a cacheable document is requested, the Dispatcher checks whether that document exists in the web server’s file system:
- If the document is cached, the Dispatcher returns the file.
- If the document is not cached, the Dispatcher requests it from the AEM instance.
What is cache in AEM?
The cache in AEM is a component that transparently stores data so that future requests for that data can be served faster.
What is permission-sensitive caching in AEM?
Permission-sensitive caching in AEM enables you to cache secured pages. The dispatcher checks the user’s access permissions for a page before delivering the cached page.