Improving headers in a Drupal site using Dries' HTTP Header Analyzer

Wednesday, September 25, 2024

In a previous article I wrote about the importance of HTTP headers for web security while avoiding the technical details. In this article I want to get into all the dirty technical details.

Some time ago we came across the HTTP Header Analyzer by Dries Buytaert. This tool, as the name suggests, analyses the headers of the HTTP response from a website. There are other header analysers out there, but this one, published by the creator of Drupal, also takes into account Drupal-specific headers. The tool displays a report of all headers found, along with an explanation of the purpose of the header and notes on the values of the header, sometimes including recommendations for better values. It also displays information about missing headers that should be present.

With all this information, the tool gives a score based on the missing headers, warnings and notices detected during the analysis. On the first runs on our website, I have to admit that the score was not bad, but not good: 6/10. Now I am happy to say that we get a score of 10/10 on the home page. Unfortunately, not all pages can get the highest score, as I explain below.

Image: a 10/10 score for the Metadrop home page (https://www.metadrop.net/en), with just one notice.

This tool helped us to better understand HTTP headers, which is not easy at all. Let me talk about how we got this score and explain some of the headers involved.
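To follow along, the response headers of any page can be inspected with a few lines of Python. The sketch below is only an illustration and not how Dries' tool works: the `EXPECTED` list is an assumption made here from the headers covered in this article, and `fetch_headers` and `missing_headers` are hypothetical helper names.

```python
from urllib.request import urlopen

# Headers discussed in this article. This selection is an assumption made
# for the sketch; it is not the list used by Dries' tool.
EXPECTED = [
    "Strict-Transport-Security",
    "Permissions-Policy",
    "Referrer-Policy",
    "Content-Security-Policy",
]


def fetch_headers(url: str) -> dict[str, str]:
    """Return the response headers of a URL as a plain dict."""
    with urlopen(url) as response:
        return dict(response.headers.items())


def missing_headers(headers: dict[str, str]) -> list[str]:
    """List the expected headers that are absent (names are case-insensitive)."""
    present = {name.lower() for name in headers}
    return [name for name in EXPECTED if name.lower() not in present]
```

Calling `missing_headers(fetch_headers("https://www.metadrop.net/en"))` would list whichever of these headers the page does not send.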

I have grouped the headers according to the functionality they provide or are related to. The classification is not strict; if you want a proper classification, please check the Mozilla documentation about HTTP headers.

Cache headers

Headers related to caching and the behavior of requests depending on the change state of the resources.

Expires header

The Expires HTTP header was introduced in the first HTTP specification, HTTP/1.0, and controls the expiration date of a piece of content (what a surprise!) by providing an exact date. While this header is still used, it is recommended to use the Cache-Control header instead, as it provides more control over caching.

Drupal provides an Expires header with the birth date of Dries, which is obviously in the past. Why is this? It is a trick to deal with older devices. HTTP/1.0 devices do not support the Vary header. This can be critical because Drupal can send different responses for the same URL and relies on the Vary header to avoid caching problems. This means that an HTTP/1.0 device may cache a response regardless of the Vary header. To avoid this, the Expires header is set: an HTTP/1.0 device sees an Expires header in the past and therefore does not cache the response, while a modern device ignores the Expires header because it processes the Cache-Control header, whose max-age takes precedence over Expires when both are present.
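The trick can be illustrated with a simplified freshness check. This is a sketch under stated assumptions, not the logic of any real cache: `is_fresh` is a hypothetical function, the HTTP/1.0 cache is modelled as one that simply ignores Cache-Control, and the header values in the usage note are examples.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime


def is_fresh(headers: dict[str, str], age_seconds: int,
             now: datetime, http10: bool) -> bool:
    """Decide whether a stored response may still be served from cache."""
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match and not http10:
        # Modern caches: max-age takes precedence over Expires.
        return age_seconds < int(match.group(1))
    expires = headers.get("Expires")
    if expires:
        # Assumed HTTP/1.0 behaviour: only the Expires date is considered.
        return now < parsedate_to_datetime(expires)
    return False
```

With `Cache-Control: max-age=900, public` and `Expires: Sun, 19 Nov 1978 05:00:00 GMT`, a one-minute-old response is fresh for the modern cache and already expired for the HTTP/1.0 one.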

Funnily enough, the Dries tool itself considers the Expires header to be something to be removed. As it is hardcoded into the Drupal core, we used the web server configuration to remove the Expires header.

Last-Modified header

The Last-Modified header provides the date when the resource at a URL was last modified. Clients use it to determine whether the resource they have stored (if they already have a copy of that URL's resource) is the same as the current one. This header is related to the If-Modified-Since and If-Unmodified-Since request headers because it should be included in responses to requests with those headers. These headers make requests conditional. For example, a request with an If-Modified-Since header will be answered with a status of 200 OK if the resource has been modified after the provided date, or with 304 Not Modified if it has not.

However, this header is considered less accurate than the ETag header, which provides a unique identifier for each specific version of a resource at a URL. Clients don't need to know when the resource was changed, they just need to know if it's different or not, and the ETag provides this information much more easily as there's no need to parse dates.
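The difference between the two validators can be seen in a minimal sketch of how a server might answer a conditional request. The function `conditional_status` is hypothetical, and the ETag comparison is simplified (weak `W/` prefixes are not handled). ETag validation uses the If-None-Match request header, the counterpart of If-Modified-Since.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers: dict[str, str],
                       etag: str, last_modified: str) -> int:
    """Return 304 if the client's stored copy is still current, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # ETag validation: plain string comparison, no date parsing needed.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in candidates or "*" in candidates else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        # Date validation: both dates have to be parsed and compared.
        since = parsedate_to_datetime(if_modified_since)
        modified = parsedate_to_datetime(last_modified)
        return 304 if modified <= since else 200
    return 200  # unconditional request: always send the full response
```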

That's why Dries' tool suggests removing it if an ETag is present, which is the case on every Drupal site because the ETag is provided by core. Again, this means removing it using an external layer such as the web server or proxy layer.

X-Drupal-Cache header

This header is specific to Drupal and just indicates whether the response was obtained from the cache managed by the Internal Page Cache module of Drupal core. While the tool only reports whether the response was cached or not, I think it is interesting to know. A HIT means the response was retrieved from the cache, a MISS means it was not found there.

This module does the same job as an external cache layer such as Varnish or a CDN. If you have such a layer you can disable the module, and you should: otherwise you burden your cache backend with data that is never used because the external cache layer already provides it.

X-Drupal-Dynamic-Cache header

Like the previous header, this one is related to Drupal caching. In this case, caching is handled by the Dynamic Page Cache module of the Drupal core. This module caches the parts of the response (you could say fragments of the HTML served) that are the same for all users. It is automatic and its magic comes from the powerful caching mechanism that Drupal uses, based on Cache Tags and Cache Contexts.

Again, the tool does not expect a specific value, but it is interesting to know. An UNCACHEABLE value means that no fragment was cacheable and you should probably review the page. Usually, this happens when certain functionalities like CAPTCHA forbid caching because they don't work with cached pages.

The issue "Improve X-Drupal-Cache and X-Drupal-Dynamic-Cache headers, even for responses that are not cacheable" adds more information to this header. It was merged in October 2024 and will be released with Drupal 10.4. It will then be easier to know why a page is uncacheable, because the header can have three different values: UNCACHEABLE (poor cacheability), UNCACHEABLE (no cacheability) and UNCACHEABLE (policy).
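For automated checks, the reason can be extracted from the header value with a small helper. This is only a sketch: `dynamic_cache_reason` is a hypothetical name, and it assumes the value format is exactly as quoted above.

```python
import re
from typing import Optional


def dynamic_cache_reason(value: str) -> Optional[str]:
    """Return why a response was uncacheable, or None if it was cacheable."""
    match = re.fullmatch(r"UNCACHEABLE(?: \((.+)\))?", value.strip())
    if match is None:
        return None  # HIT, MISS or any other value
    # Older Drupal versions send a bare UNCACHEABLE without a reason.
    return match.group(1) or "no reason given"
```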

Security headers

Security headers are the tricky part because a misconfiguration can break the functionality of the site or increase the attack surface.

These headers are the mechanism used by sites to make the user's visit more secure. Some might think, what is the point? If a site is legitimate there shouldn't be a problem, and dodgy sites won't use these headers because, well, they're dodgy, right? As I explained in an earlier article about safe browsing and security headers, when a user visits a site, a lot of third-party resources are loaded that are outside the control of the original site. The site has to delegate to others, for example to load advertising, external media or audience measurement tools, and this means that at some point a malicious script could be loaded. Or the site could be compromised and forced to serve a bad script to visitors. In these cases the security headers help to limit the damage, and they make sense precisely because this plethora of third parties is being loaded into the browser.

Three key concepts to know are the meaning of cross-origin, same-origin and same-site, as they are used when defining the behaviour of many security headers. Since an origin is the combination of scheme, domain and port, a same-origin request is a request from a given origin to the same origin. Anything else is cross-origin. For example, a request to https://example.com/foo from https://example.com is same-origin, while a request to https://subdomain.example.com/foo from https://example.com or https://anotherdomain.com is cross-origin.

Same-site, on the other hand, is a bit more complex. It depends on the top level domain and certain rules, but it is broader than same-origin in the sense that two different domains can be considered the same site: https://blog.example.com and https://example.com are considered the same site.
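These definitions can be turned into a short sketch. The `same_site` function is a deliberate simplification: it treats the last two labels of the host name as the site, whereas browsers rely on the Public Suffix List, and it also compares the scheme. All function names are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple[str, str, int]:
    """An origin is the combination of scheme, host and port."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return (parts.scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


def same_site(a: str, b: str) -> bool:
    """Naive check: same scheme and same last two host labels.

    Real browsers use the Public Suffix List, so this is wrong for
    suffixes such as co.uk. It is only meant to show the idea.
    """
    def site(url: str) -> tuple[str, str]:
        scheme, host, _ = origin(url)
        return (scheme, ".".join(host.split(".")[-2:]))

    return site(a) == site(b)
```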

Strict-Transport-Security

The Strict-Transport-Security header, sometimes abbreviated as HSTS, is a response header that tells the browser to always use HTTPS when connecting to the site. The interesting part is that the browser remembers this setting, so it will use HTTPS in the future for sites that include this header. This prevents downgrade attacks (forcing the browser to use HTTP instead of HTTPS) and cookie hijacking (for example, stealing a session cookie that could be used to impersonate the user).

Given the current ubiquity of HTTPS connections, the use of this header is highly recommended. The header includes a max-age that specifies how long the browser should remember to use HTTPS.

This header can be set in the web server or proxy layer, or using the Drupal Security Kit module.

In our case we enabled the header and set a max-age of one year because the web has been working on HTTPS for years and we don't intend to remove it in the future, so it's OK to force browsers to connect using HTTPS.
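The max-age is expressed in seconds, so one year is 31536000. A minimal sketch of building the header value follows; `hsts_header` is a hypothetical helper, and `includeSubDomains` is an optional directive of the header that is not necessarily part of our configuration.

```python
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31536000


def hsts_header(max_age: int = ONE_YEAR_SECONDS,
                include_subdomains: bool = False) -> str:
    """Build the value of the Strict-Transport-Security header."""
    value = f"max-age={max_age}"
    if include_subdomains:
        # Optional directive: apply the rule to all subdomains as well.
        value += "; includeSubDomains"
    return value
```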

Permissions-Policy

The Permissions-Policy header allows sites to inform the browser of features that may be required or that should be disabled. A clear example is disabling the use of microphone and webcam. If a site is sure that it does not offer video calling or other features that require a microphone and webcam, it can send this response header indicating that they are not required. If a malicious script manages to load and tries to use an unauthorised feature, the browser will deny access no matter what.

And what functionality can be allowed or denied with this header? Well, quite a long list, but the ones that are stable are:

Camera: access the camera device
Fullscreen: enter into full screen mode
Geolocation: provide user's geolocation
Microphone: access the microphone device
Credentials: access to the Web Authentication API, used to log in to sites without a password thanks to different methods such as biometrics or hardware authenticators.
Wake lock: prevents the screen from turning off or dimming.
Web share: controls the use of the WebShare API, which provides a mechanism for sharing text, links, files and other content.

There are many other functionalities such as access to Bluetooth, accelerometers, light sensors, payments or USB that can be managed by this header, but they are experimental.

As our site doesn't use any of these features, we've blocked most of them:

permissions-policy: "accelerometer=(), camera=(), geolocation=(), microphone=()"

Let's say we want to allow voice messages in addition to the contact form. In this case we would need to allow microphone use, using something like

permissions-policy: "accelerometer=(), camera=(), geolocation=(), microphone=self"

That would allow the microphone to be used by resources from the same origin as our site. An overly permissive configuration would be:

permissions-policy: "accelerometer=(), camera=(), geolocation=(), microphone=*"

This would allow any source to request to use the microphone, and could be a potential security risk.
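If the header is generated by code rather than written by hand, a small builder helps to keep the value consistent. This is a sketch, not the API of any Drupal module: `permissions_policy` is a hypothetical function, and it writes allowlists in the parenthesised form, such as `(self)`, shown in the MDN documentation.

```python
def permissions_policy(features: dict[str, list[str]]) -> str:
    """Build a Permissions-Policy value from feature -> allowlist pairs."""
    parts = []
    for feature, allowlist in features.items():
        if allowlist == ["*"]:
            # Any origin may use the feature (usually too permissive).
            parts.append(f"{feature}=*")
        else:
            # An empty list yields "()", which disables the feature.
            items = " ".join(
                entry if entry == "self" else f'"{entry}"'
                for entry in allowlist
            )
            parts.append(f"{feature}=({items})")
    return ", ".join(parts)
```

For example, `permissions_policy({"camera": [], "microphone": ["self"]})` returns `camera=(), microphone=(self)`.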

To set this header, you can use Drupal's Permissions Policy module, or set it at the web server or proxy layer.

Referrer-Policy

The Referrer-Policy header controls how much referrer information is sent in requests. This is important to protect user privacy and prevent data leakage if there is sensitive information in the query parameters of your site's URLs.

The different options range from always sending all referrer information (origin, path, query parameters) to never sending any referrer information. You can adjust the behaviour depending on the destination (same or different origin) and when the connection is downgraded (for example, when connecting from HTTPS to an HTTP resource).

The default value, strict-origin-when-cross-origin, instructs browsers to send the full URL (origin, path and query parameters) in same-origin requests, only the origin in cross-origin requests with the same security level (for example, when requesting an HTTPS resource from an HTTPS URL), and nothing at all when the connection is downgraded to HTTP. This is a reasonable value that works for most sites. Although Dries' tool does not seem to lower the score when the header is missing, it does show a warning. For this reason, we set the header explicitly on our site, using the default value.
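As an illustration of strict-origin-when-cross-origin, the sketch below computes what a browser would send as the referrer: the full URL for same-origin requests, only the origin for cross-origin requests at the same security level, and nothing on a downgrade from HTTPS to HTTP. It is a simplification (`referrer_for` is a hypothetical function and the origin comparison is naive), not a browser implementation.

```python
from typing import Optional
from urllib.parse import urlsplit


def referrer_for(source_url: str, target_url: str) -> Optional[str]:
    """Referrer sent under strict-origin-when-cross-origin (simplified)."""
    src, dst = urlsplit(source_url), urlsplit(target_url)
    if src.scheme == "https" and dst.scheme != "https":
        return None  # downgrade: no referrer at all
    src_origin = f"{src.scheme}://{src.netloc}"
    if (src.scheme, src.netloc) == (dst.scheme, dst.netloc):
        # Same origin: full URL with path and query, without the fragment.
        full = src_origin + src.path
        if src.query:
            full += "?" + src.query
        return full
    # Cross-origin at the same security level: origin only.
    return src_origin + "/"
```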

If you want to set this header, the Security Kit module provides the configuration for it. Or, as before, use your favorite layer in front of Drupal, such as the web server or proxy.

Content-Security-Policy

The Content-Security-Policy (CSP) header helps to prevent certain types of attacks such as Cross-Site Scripting (XSS) or data injection. It works by telling the browser which domains are valid sources of resources. For example, it can list the few domains from which scripts may be loaded; scripts from any other domain won't be executed. It can also enforce HTTPS for the resources a page loads, for example through the upgrade-insecure-requests directive. Marking cookies with the Secure attribute (so that they are sent only over HTTPS connections) complements CSP, but it is configured where the cookies are set, not in this header.

CSP is a fairly complex header that can potentially break your site (for example, if you forbid loading a critical JavaScript resource). Luckily, the CSP policies can be deployed in report-only mode. In this mode the browser will detect any violation but won't block it, although it will report the violation for a given URL. This allows you to test the CSP policies without breaking the site, and to be aware of any problems that clients may have simply by checking the CSP notifications.
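Switching between the two modes only changes the header name: Content-Security-Policy-Report-Only for testing, Content-Security-Policy for enforcement. The sketch below shows this. `csp_header` is a hypothetical helper, and the CDN domain and report path in the usage note are placeholders, not our real policy.

```python
def csp_header(directives: dict[str, list[str]],
               report_only: bool = True) -> tuple[str, str]:
    """Return the (header name, header value) pair for a CSP policy."""
    name = (
        "Content-Security-Policy-Report-Only"
        if report_only
        else "Content-Security-Policy"
    )
    # Each directive is "name source source ..."; directives are joined by ";".
    value = "; ".join(
        f"{directive} {' '.join(sources)}"
        for directive, sources in directives.items()
    )
    return name, value
```

For instance, `csp_header({"default-src": ["'self'"], "script-src": ["'self'", "https://cdn.example.com"], "report-uri": ["/csp-report"]})` produces a report-only policy that allows scripts only from the site itself and one CDN.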

The Content-Security-Policy Drupal module provides a way to manage this header in a Drupal site, including handling the report mode and receiving notifications.

By the way, during the climb to the 10/10 score I found a bug in the header tool: it complained about using the deprecated report-uri clause and asked to use report-to instead. However, report-to is not widely supported, and using both clauses caused the tool to complain anyway. I emailed Dries himself and he provided a fix the next day. I really appreciate the quick response, thanks!

Cross-Origin-Resource-Policy (CORP), Cross-Origin-Embedder-Policy (COEP) and Cross-Origin-Opener-Policy (COOP)

This is where things get very complex. A few years ago the Spectre and Meltdown vulnerabilities were discovered. These issues could allow a malicious script to access unauthorized data by exploiting optimisation techniques built into the processor itself. This means that they were not easy to fix, because the problem originated in the hardware. The issue can be mitigated by creating isolated contexts for different processes, as processes in the same context could read data from each other. These headers help to create or manage context isolation, but explaining the details is beyond the scope of this article. Probably not many people fully understand these vulnerabilities and how to use the above headers, so I won't pretend to completely understand them, although I would like to.

While setting these headers I ran into what I think is a bug in the YouTube website, a lot of trial and error and some loss of sanity. It was a tough ride. Because of the YouTube bug we can't set a proper value for the Cross-Origin-Embedder-Policy header on all URLs: YouTube videos won't render when it is present. Thus, we use the HTTP Response Headers module to set the COEP header only on the front page.

Another HTTP analyzer

After getting the highest score with Dries' tool, I tried the Security Headers tool. The result was good, very good:

That's why I find Dries' tool very interesting, because if done right, other tools will agree on the high score, but also because it's tailored for Drupal. I would love to see this tool available as an API or released as standalone software to include it in our automated testing stack. Until then, we can flood the tool. Sorry Dries, I hope Acquia gives you a good price for hosting, the traffic may increase 😇

HTTP Header Analyzer for metadrop.net.

Ricardo Sanz

CTO
