Browsing the Internet safely and the birthday party analogy

Wednesday, September 18, 2024

When people browse the Internet, they usually assume that only two parties are involved: the browsing user and the site they visit. However, the reality is far more complex and involves many security concerns that most people overlook simply because they are unaware of them.

In fact, when you browse the Internet, numerous actors can potentially reach your computer through the browser. Browsers do a good job of protecting users from malicious activity, but this is no easy task. To enhance security, browsers and website developers can collaborate to make sites safer. A website can tell the browser which capabilities it will actually need, so the browser can switch off any potentially harmful functionality the site does not require, closing possible attack vectors. One way to achieve this is with HTTP security headers.

But let's set aside the technical jargon and explore this concept with a practical example.

The Birthday Party

Your birthday is coming up, and you want to throw an amazing party at home. By amazing, I mean a truly great event: food and drinks for everyone, live music, some sports activities, a DJ, special decorations, and even a magician.

Since you're not going to learn to play the guitar to entertain guests, become a DJ, or prepare hundreds of sandwiches yourself, you decide to delegate these tasks to professionals who are experts in their fields.

First, you hire a Birthday Party Planner. They know people who can handle the party decorations, find a good karaoke band (trust me, a karaoke band is a fantastic move), and prepare and serve the food and drinks. So, all you have to do is relax (or not) and enjoy the party.

The party will be held in the garden, with some adjacent rooms used for the food and drinks. When the day comes, your planner has everything under control. The catering staff arrives early to set everything up. They prepare the rooms, organize the outside tables, and start cooking some fresh dishes. They need access to water and toilets because, well, they are human. The DJ arrives later and requires access to a power supply. The karaoke band comes when the party is in full swing and needs a discreet room to change clothes because their act is a total show. Lastly, the magician wants a quiet space to prepare for the performance, stretching and warming up their hands for an hour beforehand.

A birthday party in a garden with robots helping guests.

A weird birthday party image with robots, made by a robot (an AI).

That is a lot of requests, and a lot of people around your home. However, you don't need to worry about anything because you live in the future, where your house is served by robots: a cooking robot for all kitchen tasks, other robots to assist with decoration or gardening, and more robots inside the house to find an electric socket for the DJ. In short, a helpful team of automated minions.

Additionally, the planner is there to handle any requests from the different people making your party a huge success, commanding the robots to solve any issues that arise.

But… are all these people trustworthy? What if someone asks the robots to let them inside a private room "to add some decoration"? Or requests a valuable object because "the owner wants it in the garden"? Or what if someone peeks at your personal information in a desk drawer or snoops around your home? Sure, you trust the Birthday Party Planner, and you trust your robots, but… can they ensure that everyone else is doing only what they should for the party's success? Maybe someone from a team has a different idea. Or maybe someone who is not staff at all managed to pose as staff. How can you ensure that your home remains safe and secure while allowing legitimate personnel to do their jobs?

Well, this is exactly what happens when you browse the Internet. Your home is like your computer with all the data it accesses, including emails, personal files, pictures, social network messages, and banking information. The robots are akin to your browser, the Birthday Party Planner is the site you're visiting, and the party staff are the different third parties involved in presenting a website (ads, YouTube videos from Google, audience measurement tools, external libraries used to display carousels, etc.). You can trust your browser, you can usually trust the site (at least most of the sites people visit), and you can more or less trust those external tools. Even so, every time you visit a website, a wide variety of parties are doing things inside your computer.

The Web Page Request Party

When you request a web page, your browser connects to a server and asks for the page content. The browser is like the robots at your home: it reads the website's contents and follows the instructions there (display a certain title, place this big picture on top, put some text below, and so on). These instructions come from the site, but like a Birthday Party Planner, most sites rely on others to complete the task. For example, ads are managed by external organizations, which need to load their scripts onto the site to handle the content presented to the user. Audience tracking software is also loaded by the browser following the site's instructions. This external code is often loaded without the site fully knowing or auditing what it contains. Like the Birthday Party Planner, the site delegates to others.

All these third-party resources are being loaded into your browser, and the browser carries out their instructions.

You might think this doesn't sound very scary. What's the worst-case scenario? What could possibly go wrong? Well, the main problem is that your browser is capable of doing all sorts of things. It can access your computer disks, often knows your passwords, has access to your personal sites with lots of personal information that could be used against you by malicious actors, and can even access your microphone and webcam, because sites offering video calls need these features.

The browser is your ally and websites can help

Let's not panic. Just as your robots at the party follow instructions to ensure everything runs smoothly, your browser is designed not only to display web pages according to the site's instructions but also to enforce security measures. This is why, for example, your browser prompts you for permission before using the microphone. Thankfully, you must authorize a site before it can record any sound.

However, a malicious actor might trick you into granting microphone access. This is where the website itself can step in to help. A website can inform the browser about its needs and block any functionalities that are not required for its operation. For instance, a site can declare to the browser that it will never need to use the microphone. The browser records this information, and if later, any third-party scripts included in the website attempt to enable the microphone, the browser will deny the request. It doesn’t even bother asking you because the site has already stated that microphones are not needed at all—that's final.
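As a minimal sketch of what such a declaration can look like, the snippet below builds a value for the Permissions-Policy HTTP header that switches off the microphone and the camera for everyone, including third-party scripts. The helper name `permissions_policy` is hypothetical and only serves the illustration; how the header is actually attached to responses depends on the server or framework in use.

```python
def permissions_policy(disabled_features):
    """Build a Permissions-Policy header value that disables the given features.

    An empty allowlist, written "()", tells the browser that no origin
    (not even the site itself) may use the feature.
    """
    return ", ".join(f"{feature}=()" for feature in disabled_features)


# Hypothetical example: a site that never needs audio or video capture.
header_value = permissions_policy(["microphone", "camera"])
print("Permissions-Policy:", header_value)
```

With this header in place, a script asking for the microphone is refused by the browser without any prompt being shown to the user.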

Returning to the birthday party analogy, this is like your Birthday Party Planner telling the robots that no one will ever need to go to the upper floor. Any malicious staff member could come up with detailed excuses, but your robots won't let them go upstairs. Period.

How websites inform the browser

Websites can restrict the actions of the browser to increase the security of their visitors by using HTTP headers. Essentially, along with the content to display, websites send instructions such as "do not use the microphone," "do not allow third-party scripts to use your credentials on this site," or "always use secure connections."

This is not an easy task, as the website owners (or most likely, the site developers) must analyze the site and use these HTTP headers to remove all functionalities that the website doesn't require.

They need to determine whether the site must load resources from external providers, such as YouTube videos or more specialized content like financial graphs or reports. For example, if a site displays videos from an external source, it must allow the execution of scripts from that third-party site, or the videos won't play.

Another consideration is whether the site can be embedded in other websites. This is what happens when a site displays messages from social networks, like embedding Instagram or Twitter content. While this may not seem like a big deal, consider a bank's website. If the bank's site can be embedded in another website, attackers could trick users into believing they are using the legitimate bank application when they are not. They could actually be on a hostile site that captures the sensitive information they enter.
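A rough sketch of how a site might forbid being embedded: the `frame-ancestors` directive of the Content-Security-Policy header lists who may frame the site, and the older X-Frame-Options header covers browsers that predate it. The function name and the partner URL in the usage note are assumptions made for this example.

```python
def anti_framing_headers(allowed_parents=()):
    """Return headers that control which sites may embed this one in a frame.

    With no allowed parents, embedding is forbidden everywhere.
    """
    if allowed_parents:
        sources = " ".join(allowed_parents)
    else:
        sources = "'none'"
    headers = {"Content-Security-Policy": f"frame-ancestors {sources}"}
    if not allowed_parents:
        # Legacy header; it has no reliable way to express a list of
        # allowed parents, so it is only added for the "nobody" case.
        headers["X-Frame-Options"] = "DENY"
    return headers


# A bank-like site that must never appear inside another page.
print(anti_framing_headers())
```

Calling `anti_framing_headers(["https://partner.example"])` would instead allow framing only from that one (made-up) partner site.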

It’s not just the users who are at risk. Content editors of a website—the people who create and edit the different pages visitors see—can be targeted to steal their credentials or alter the website content in subtle ways that are not easily noticed, such as adding invisible links to disreputable sites to benefit from the site's SEO score (and damaging that score in the process).

All these security measures can be implemented in a straightforward and somewhat insecure manner or in a more secure and precise way. For example, a site using YouTube videos could simply allow all third-party scripts, and the videos would work. However, this would also permit any malicious script from any third party to run. In this scenario, the safest approach would be to authorize only scripts from YouTube and nothing else. This is done using the Content-Security-Policy header, which allows sites to declare from which other sites JavaScript is allowed to load.
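To make the difference concrete, here is a small sketch that assembles a Content-Security-Policy value allowing scripts only from the site itself and from YouTube. The helper `content_security_policy` is hypothetical, and the exact hosts a YouTube embed needs are an assumption here; a real policy should be checked against the browser console of the actual site.

```python
def content_security_policy(directives):
    """Join CSP directives into a single header value.

    `directives` maps a directive name to the list of sources it allows.
    """
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )


# Precise policy: own resources by default, plus YouTube scripts and frames.
policy = content_security_policy({
    "default-src": ["'self'"],
    "script-src": ["'self'", "https://www.youtube.com"],
    "frame-src": ["https://www.youtube.com"],
})
print("Content-Security-Policy:", policy)
```

The permissive alternative would put a wildcard in `script-src`, which makes the videos work but also lets a script from any other origin run.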

Visualizing third-party resources loaded by a site

It is interesting to analyze all the third-party resources loaded by a site. The online tool Request Map does exactly that: it displays a graph showing all the third parties involved in rendering a web page. It is often surprising how many actors are involved in a simple request. For example, here we have the third-party graph from weather.com:

A graph that displays a central node that is weather.com surrounded and connected to several bubbles representing the third parties.

Graph of all the third-party requests made by the weather.com home page.

The blue circle represents the weather.com site. Apart from some data served by subdomains of weather.com, all other circles represent third-party resources. The red circles are related to advertising, the purple ones are unknown (meaning the tool could not determine the exact purpose of those resources), the green circles serve scripts from Adobe, and the yellow one is a JavaScript library.

Here is the graph from the Spanish Wikipedia, and as you can see, there are no third parties involved because all the circles are from Wikipedia itself:

A graph displaying a central circle and several other related circles, all owned by Wikipedia.

Wikipedia has no third parties involved in its website.

Unfortunately, it is almost impossible to serve a site as lean as Wikipedia. Most sites require certain tools related to marketing, advertising, and audience tracking, and they often embed external media or other data.

These, for example, are the graphs from four Metadrop clients:

Four graphs displaying the third party resources used by websites from Metadrop's clients

Third party resources of websites from Metadrop clients.

In these cases, our work is to ensure that those sites inform the browser about the exact requirements of the client's website and eliminate any additional unused capabilities.

Checking HTTP headers in sites

There are several tools available to check if a site's HTTP headers are set correctly and how permissive they are. The first tool that comes to mind is the HTTP Headers Analyzer by Dries Buytaert, the creator of Drupal. It's simple without being too simplistic; it provides detailed information on each point it checks, it's fast, and it gets straight to the point. Using this tool, I've learned a lot by working to push the score to the maximum—10 points. I plan to release an article with the technical details, but for now, let’s keep this discussion non-technical.

Another good tool is the HTTP Observatory Report provided by Mozilla. It seems stricter than Dries' analyzer, and a perfect score is probably impossible for many sites, as it requires some configurations that are simply unfeasible for them to implement.

Lastly, I would like to mention Security Headers, another online tool that analyzes a site's HTTP headers. It is very clear and provides information on the score and the specific aspects it checks.
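For readers curious about what such a check involves, here is a very rough sketch that reports which commonly recommended security headers are missing from a response. The list of headers is my own selection for illustration and is not the scoring logic of any of the tools above; `fetch_headers` and `missing_security_headers` are hypothetical names.

```python
from urllib.request import urlopen

# Illustrative selection only, not the rules of any specific analyzer.
RECOMMENDED_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "Referrer-Policy",
    "Permissions-Policy",
)


def fetch_headers(url):
    """Request a page and return its response headers as a dict."""
    with urlopen(url, timeout=10) as response:
        return dict(response.headers.items())


def missing_security_headers(response_headers):
    """Return the recommended headers absent from a response.

    Header names are compared case-insensitively, as HTTP requires.
    """
    present = {name.lower() for name in response_headers}
    return [name for name in RECOMMENDED_HEADERS if name.lower() not in present]


# Hard-coded sample so the example runs without network access.
sample = {"content-security-policy": "default-src 'self'",
          "Strict-Transport-Security": "max-age=31536000"}
print(missing_security_headers(sample))
```

Against a live site, the same check would be `missing_security_headers(fetch_headers(url))`. The online tools go much further, since they also judge how permissive each header value is.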

Bonus: The GDPR works!

I would like to conclude with an observation I made during the preparation of this article: it appears that the GDPR (General Data Protection Regulation) is effective in protecting users. Why do I say this? If you compare the third-party resources loaded by the same site when accessed from the USA versus the European Union, the difference is quite noticeable. The following graph illustrates this comparison for weather.com:

Weather.com loads many more resources when visited from the USA

Comparison of third-party resources loaded when visiting weather.com from a European Union IP address versus an IP address from the USA.

The red square on the right side represents more or less the same set of resources as on the left. This means that when accessing weather.com from the USA, the browser is instructed to load significantly more third-party resources. While I cannot be entirely certain, I believe this is due to the GDPR, a European Union law that protects the personal data of people in the EU. Consequently, websites may apply different rules when they detect an IP address originating from Europe.

Conclusion

The Internet can be a wild place, but browsers do a very good job of protecting users. However, websites can enhance this protection by providing browsers with hints on how to increase security without compromising functionality. Malicious actors are quite clever, and minimizing the attack surface is an effective way to prevent nasty surprises that can impact a site in various ways, including SEO, reputation, costs, and overall success.

Improving security through HTTP headers significantly helps in reducing this attack surface by removing all capabilities that the site does not require and closing potential attack vectors.

At Metadrop, we view site development as a comprehensive job that covers not only the visible aspects for the owner and visitors but also hidden elements like security. This is why we have incorporated everything we have learned about HTTP headers into the testing stack of our projects. If you think your site could benefit from this knowledge, please contact us, and let’s discuss it further.
