I’ve been beating this drum for many, many years now: https://csswizardry.com/2019/05/self-host-your-static-assets...
Regarding the resource hijacking (security vulnerability) risk, one need not just imagine it anymore. It actually happened, a few years after your blog post was written: https://news.ycombinator.com/item?id=40791829
A random fact from before the “before” in the article: the cache used to be shared by all resource types. We (ab)used this to do preloading before the web platform supported it, e.g. downloading a JavaScript file without executing it:
  var js = new Image(); js.src = 'script.js';
  js.onerror = function(){/* js done: not a valid image, so onerror fires once downloaded */};
Then some browsers started having a separate image cache, and this stopped working.
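For comparison, the platform feature that made this trick unnecessary is `<link rel="preload">`. A minimal sketch of the same "download a script without running it" idea done that way; `preloadScript` is a hypothetical helper name, and the document object is passed in only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: ask the browser to download a script without executing it,
// so a later <script src="..."> for the same URL can reuse the download.
function preloadScript(doc, href) {
  const link = doc.createElement('link');
  link.rel = 'preload';
  link.as = 'script'; // tells the browser which kind of resource to expect
  link.href = href;
  doc.head.appendChild(link);
  return link;
}
```

In a page this would be called as `preloadScript(document, 'script.js')`. Unlike the `Image` hack, the `as` attribute tells the browser the resource type, so it no longer depends on every resource type sharing one cache.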
As a user, you can use an extension like LocalCDN. Instead of having your browser download common libraries and fonts, the extension intercepts the request and serves a local copy. That's even better for privacy and security.
This also dramatically increases your fingerprintability, I think. Fingerprinters probably aren't using it yet, though, because it's such a small target audience. But if it were implemented in a major browser, I'm sure they'd start using it.
To elaborate, JS running on a page can probe your cache (via timing) to discern which URLs are already in it or not. If the cache is shared cross-origin, then it can be used to leak information cross-origin.
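A minimal sketch of such a probe, assuming a browser-style `fetch` and `performance.now()`. `probeCached` and its 20 ms threshold are hypothetical; a real probe would calibrate against a URL known to be uncached. Under a partitioned cache, the probe only sees entries that the current top-level site populated itself.

```javascript
// Hypothetical cache-timing probe: time a request and guess whether it was served
// from the HTTP cache. fetchFn is injectable so the logic can be tested.
async function probeCached(url, { fetchFn = fetch, thresholdMs = 20 } = {}) {
  const start = performance.now();
  // 'no-cors' allows a cross-origin request (the response is opaque, but timing is not);
  // 'force-cache' returns any cached copy, fresh or stale, before going to the network.
  await fetchFn(url, { mode: 'no-cors', cache: 'force-cache' });
  const elapsedMs = performance.now() - start;
  // A cache hit usually completes far faster than a network round trip.
  return { url, elapsedMs, likelyCached: elapsedMs < thresholdMs };
}
```

Note that the probe is one-shot: on a miss, the request itself downloads and caches the resource, so probing the same URL again would look like a hit.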
There is also one called Decentraleyes: https://decentraleyes.org/
IIRC one is a fork of the other with the purpose of supporting more libraries
Why does this do anything? The attack described is a timing attack, so a local proxy doesn't really help.
https://github.com/arkenfox/user.js/wiki/4.1-Extensions#-don...
...at least on Firefox.
This is annoying. Can't we pretend to download the resources we already have, instead of just not downloading them?
It'd be nice, but doing it without giving away that's what's happening would probably get complicated
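A naive sketch of what "pretending to download" could look like: answer from the local copy, but only after a delay that mimics a network fetch. `paddedCacheLookup`, `estimateNetworkMs` and `fetchFromNetwork` are hypothetical names, and choosing a realistic delay is exactly the hard part: pad too little and the hit still stands out, pad too much and you give back most of the speed the local copy was meant to buy.

```javascript
// Hypothetical mitigation sketch: serve a locally held copy, but delay the answer
// so it takes about as long as a real network fetch would have.
// cache: Map from URL to body; estimateNetworkMs(url): guessed network latency in ms;
// fetchFromNetwork(url): returns a Promise of the body on a genuine miss.
function paddedCacheLookup(cache, url, estimateNetworkMs, fetchFromNetwork) {
  if (cache.has(url)) {
    const body = cache.get(url);
    // Hold the cached body back for the estimated network time.
    return new Promise((resolve) => setTimeout(() => resolve(body), estimateNetworkMs(url)));
  }
  // Genuine miss: go to the network and remember the result.
  return fetchFromNetwork(url).then((body) => {
    cache.set(url, body);
    return body;
  });
}
```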
The author recommends domain consolidation; however, this seems like bad advice in some cases because of browsers' maximum number of connections per domain, at least with HTTP/1.x. Or am I mistaken?
You are correct that this is an optimisation only available to H2+, but the H/1.x case isn't worth optimising for: if one cared about web performance that much, they wouldn’t be running H/1.x in the first place.
Most connections on the web nowadays are over H2+: https://almanac.httparchive.org/en/2024/http#http-version-ad...
> if one cared about web performance that much, they wouldn't be running H/1.x in the first place.
You may not have intended it this way, but this statement very much reads as "just use Chrome". There are lots of microbrowsers in the world generating link previews, users stuck behind proxies, web spiders, and people stuck with old browsers that don't necessarily support H2.
That doesn't mean you should over-optimize for HTTP/1.x, but decent performance in that case shouldn't be ignored. If you can make HTTP/1.x performant, H2 connections will be performant by default too.
Far too many pages download gobs of unnecessary resources just because nobody bothered with tree shaking and minification. Huge populations of web users are, at any given moment, stuck on 2G- or 3G-equivalent connections. Depending on where I am in town, my 5G phone can barely load the typical news website because of poor signal quality.
In my opinion the benefits of domain consolidation outweigh the costs.
Using a bunch of domains like a.static.com, b.static.com, etc. was really only helpful when the per-domain connection limit was around 2. Depending on the browser, those limits have been higher for a while.
With HTTP/2 it's less helpful.
But honestly, in theory there's no single right answer. Multiple domains add the fixed overhead of DNS lookups, TCP connects and TLS handshakes, but offer parallelism that doesn't suffer from head-of-line blocking.
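A back-of-the-envelope sketch of that fixed overhead, counted in round trips (RTTs) before the first request can go out on a brand-new origin. The numbers are simplifying assumptions, not measurements: one RTT for an uncached DNS lookup, one for the TCP handshake, and one for a full TLS 1.3 handshake (two for TLS 1.2), ignoring TCP slow start, session resumption and 0-RTT. `newOriginSetupMs` is a hypothetical name.

```javascript
// Hypothetical cost model: latency paid before the first request can be sent
// to an origin the browser has not connected to yet.
function newOriginSetupMs(rttMs, { dnsCached = false, tlsVersion = '1.3' } = {}) {
  const dnsRtts = dnsCached ? 0 : 1;            // resolver round trip
  const tcpRtts = 1;                            // SYN / SYN-ACK
  const tlsRtts = tlsVersion === '1.3' ? 1 : 2; // full handshake, no resumption
  return (dnsRtts + tcpRtts + tlsRtts) * rttMs;
}

// An asset on the page's own, already-connected origin pays none of this;
// one on a separate static domain pays it once before its first byte.
console.log(newOriginSetupMs(100));                        // 300
console.log(newOriginSetupMs(100, { tlsVersion: '1.2' })); // 400
```

On a 100 ms RTT link, that is roughly 300 ms before the first sharded asset can even be requested, which is the cost the extra parallelism has to win back.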
You can multiplex a bunch of requests and responses in parallel over one HTTP/2 connection... until you drop a packet and remember that HTTP/2 still runs over TCP.
UDP-based transports like QUIC, which HTTP/3 runs on, don't have this problem.
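A toy model of the difference, under heavily simplified assumptions: packets from several streams are interleaved round-robin on one connection, one packet arrives per tick, and a single lost packet arrives `retransmitTicks` late. Over TCP the receiver hands bytes to the application strictly in order, so every stream stalls behind the gap; over QUIC ordering is per stream, so only the stream that lost the packet waits. `completionTicks` is a hypothetical name, and nothing here models congestion control.

```javascript
// Toy head-of-line-blocking model. Returns, for each stream, the tick at which its
// last packet is delivered to the application.
function completionTicks(numStreams, packetsPerStream, lostIndex, retransmitTicks, transport) {
  const done = new Array(numStreams).fill(0);
  const lostStream = lostIndex % numStreams;
  const gapFilledAt = lostIndex + 1 + retransmitTicks; // when the retransmission lands
  for (let i = 0; i < numStreams * packetsPerStream; i++) {
    const stream = i % numStreams; // round-robin interleaving
    const arrival = i === lostIndex ? gapFilledAt : i + 1;
    // TCP: everything after the gap waits. QUIC: only the lost packet's stream waits.
    const blocked = i > lostIndex && (transport === 'tcp' || stream === lostStream);
    const delivered = blocked ? Math.max(arrival, gapFilledAt) : arrival;
    done[stream] = Math.max(done[stream], delivered);
  }
  return done;
}

console.log(completionTicks(3, 4, 1, 20, 'tcp'));  // [ 22, 22, 22 ]
console.log(completionTicks(3, 4, 1, 20, 'quic')); // [ 10, 22, 12 ]
```

With one loss on stream 1, TCP delays all three streams to tick 22, while QUIC lets streams 0 and 2 finish at their normal times.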
This seems to be quite a drastic change that could only have been warranted by a bigger problem than the one outlined in the article. It's also strange the way that Osmani preemptively shuts down criticism of the change.
What is the actual problem that's being solved here? Is cache-sniffing being actively used for fingerprinting or something?
What's going on here?
The privacy problems were really bad.
Want to know if your user has previously visited a specific website? Time how long it takes to load their CSS file, if it's instant then you know where they have been.
You can tell if they have an account on that site by timing the load of an asset that's usually only shown to logged-in users.
All of the browser manufacturers decided to make this change (most of them over 5 years ago now) for privacy reasons, even though it made their products worse in that they would be slower for their users.
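A simplified model of what partitioning changes, assuming the cache key goes from the URL alone to (top-level site, URL). Real browsers' keys differ in detail (Chrome's, for instance, also includes the site of the requesting frame); `HttpCacheModel` and the example domains are hypothetical.

```javascript
// Hypothetical model of an HTTP cache. Unpartitioned: keyed by URL alone, so every
// site shares entries. Partitioned: keyed by (top-level site, URL).
class HttpCacheModel {
  constructor(partitioned) {
    this.partitioned = partitioned;
    this.entries = new Set();
  }
  key(topLevelSite, url) {
    // A raw space cannot appear in a serialized URL, so it is a safe separator.
    return this.partitioned ? `${topLevelSite} ${url}` : url;
  }
  // Returns 'hit' or 'miss', storing the entry on a miss.
  load(topLevelSite, url) {
    const k = this.key(topLevelSite, url);
    if (this.entries.has(k)) return 'hit';
    this.entries.add(k);
    return 'miss';
  }
}

const css = 'https://bank.example/app.css';
for (const partitioned of [false, true]) {
  const cache = new HttpCacheModel(partitioned);
  cache.load('https://bank.example', css);                           // user visits the bank
  console.log(partitioned, cache.load('https://evil.example', css)); // another site probes
}
// false hit
// true miss
```

The same model shows the cost discussed further down the thread: a library loaded from a shared CDN by two different sites is now downloaded once per site.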
We should also be asking if the benefits of a non-partitioned cache are truly there in the first place. I think the main claim is that if (for example) the same JS library is on a CDN and used across multiple sites, your browser only has to cache it once. But how often does this really happen in practice?
In the days of jQuery… all the time.
In the days of webpack, not so much.
Well, I know it's not directly the answer to your question, but the article mentions that in practice a partitioned cache increases bandwidth use by 4%, which at the scale of billions of web browsers is actually pretty bad.
The privacy implications are discussed in the article.
You don't seem to have properly read the previous comment or the article.
The question (not addressed in the article) is: is there in fact a gigantic, widespread problem of using cache-sniffing to fingerprint users?
We have always known that this was possible, yet continued to include cross-site caching anyway because of the enormous benefits. Presumably something fairly major has happened recently to warrant removing this functionality; if so, what?
My guess, as someone who doesn’t know the answer but works closely in this field, is that the potential downside (i.e. privacy/security) was much greater than the actualised benefit (i.e. performance).
Safari (who make a big deal about being privacy-conscious) was the first to introduce cache partitioning, looking into it as far back as 2013[1]; Chrome followed in 2020[2] and Firefox in 2021[3]. One thing I know, anecdotally, to be a strong motivator among browser vendors is ‘X is doing it, why aren’t we?’
1. https://bugs.webkit.org/show_bug.cgi?id=110269
2. https://developer.chrome.com/blog/http-cache-partitioning
3. https://blog.mozilla.org/security/2021/01/26/supercookie-pro...
The privacy problems were catastrophic: the Google CDN fed directly into Google’s pervasive tracking infrastructure.
Most other CDNs similarly support tracking services.
As the article says, the actual cost of partitioning is negligible; the cost of not partitioning is that nothing can prevent cross-site tracking and invasion of privacy.
VPNs and extensions don’t work because the browser is ensuring your identity is constant. That’s why the large ad networks funded free/cheap CDNs.