I always felt like this was mostly a dream anyway due to the diversity of libraries, versions, and CDNs across the web. Everything would have to line up perfectly, within the TTL, to get the performance advantage of loading from cache. And even then it was only really an advantage on the first page load of a site visit; subsequent pageloads would hit cache anyway from the first page.
And speaking of privacy... if everyone across the web is loading resources from one CDN, that seems like an interesting stream of data for that CDN.
Absolutely. And resources like disk space and bandwidth have gotten much cheaper in the 13 years since jQuery was invented. Fewer cache hits, lower cache value, and less cost savings all point in the direction of retiring this feature.
You say this, but people who have shit internet are acutely aware of how CDNs no longer help things from the user's perspective, beyond the mere "CDNs are better at delivering some assets than Joe Website."
It doesn't help that, relative to everything else, the churn in websites is immense, making it more likely you'll have to re-download things. And "relative to everything else" is quite a statement, as churn in software generally is pervasive.
EDIT: that is, I'm just complaining, not claiming the status quo (or what was before) was better, obviously.
That’s what I found as well: every time I measured the benefits were much lower than hoped, and especially where you wanted it the most on mobile the cache sizes were small enough that they churned frequently. Way back in the day, the low per-host connection limits were a consideration but that era is firmly dead.
The other side was that people notice slow performance more than fast, and the failure modes were always worse than the savings when some fraction of connections would take, say, 2 seconds to connect to Google’s CDN even though their time to yours was much better. You don’t have an easy option for those slow clients hitting your property but you can at least reduce the number of dependencies to that one service.
@src can be locally hosted. If it's not in cache, the browser can try each @try-shared attr (without loading the resource from CDN). If no match, the browser downloads @src from your own domain.
Of course, this doesn't solve the shared-cache issue raised by the article. I suppose the only way to solve that would be adding resources to the shared cache explicitly. The most effective way (I assume) would be a header provided by the CDN of a shared resource, e.g., X-Shared-Cache: true, that a browser would recognize... Then @src/@try-shared could still get the benefits of the shared cache, and developers wouldn't have to worry about it.
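For concreteness, the proposal might look something like this in markup (the try-shared attribute is purely hypothetical, echoing the idea above; no browser implements anything like it):

```html
<!-- @src is self-hosted and is what actually gets downloaded on a miss.
     The hypothetical try-shared list names CDN copies the browser may
     satisfy from its cache, but must never fetch over the network. -->
<script src="/js/jquery-3.4.1.min.js"
        try-shared="https://cdn-a.example/jquery-3.4.1.min.js
                    https://cdn-b.example/jquery-3.4.1.min.js"></script>
```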
If your perspective is only that of a developer (which is fine, this is a developers' website), then in fact it doesn't matter at all, because the whole space shares this experience, so no single actor is at a disadvantage anymore, unless you, say, self-host on your home PC.
> I'm sad about this change from a general web performance perspective and from the perspective of someone who really likes small independent sites, but I don't see a way to get the performance benefits without the [privacy] leaks
Maybe I'm missing something, but the obvious solution to me would be more cache-control headers.
The only notable case where a shared cache is useful is resources on public CDNs hosting libraries and other common assets. These could just send a "cache-control: shared" header, or "cache-sharing: true" if adding new values to existing headers breaks too many existing implementations. This puts them in a shared cache; everything else gets a segmented cache.
I think the page that loads the resources would itself need the 'cache-sharing' header, since other websites could still perform a timing attack if it loads a CDN asset that specifies 'cache-sharing: true'. Even then, enabling cache sharing would still leave you open to a timing attack, and the effectiveness of a shared resource cache would dwindle as fewer and fewer sites share that cache.
If Google Fonts serves Roboto with cache-sharing true that is unlikely to leak any data. Sure, you can detect that I at some point visited some site that uses Roboto, but that's vague enough to be useless.
There is some potential for leakage with uncommon assets. Maybe only a handful of websites use jQuery 1.2.65 or Helvetiroma Slab in font weight 100. It's a less severe vector than just testing if someforum.example/admin.css is cached, but it's still leaking data. The CDN could mitigate that by only sending a cache-sharable header on sufficiently popular assets, but depending on others going out of their way to preserve privacy is probably a bad idea.
If a website uses 10 common assets, that's often an uncommon combination. And if you have 100 websites on your "targets list" (let's say, fetish websites, or LGBT communities) then you could get a positive match on some of them.
The ten common assets have to be uniquely uncommon for this to be a risk. Tinymodeltrains.com might have a distinct combination of ten assets, but if my browser caches two of them from my visit to reddit, three from hackernews, two more from imgur, and the last three from pornhub, your tracking data will be meaningless.
Not entirely meaningless; it's kind of like a Bloom filter. False positives exist, but false negatives are unlikely. Combined with other data in the style of the Panopticlick, one can obtain a target set to which to apply closer scrutiny.
Then maybe have the browser enforce "common assets only" by tracking how many unique first-party websites use a particular asset, and only sharing the cache if the number of such sites is sufficiently high. Though I suppose that would reduce the effectiveness of the cache.
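A sketch of that heuristic (the threshold, names, and mechanism are all invented for illustration; no browser does this):

```javascript
// Hypothetical "common assets only" gate: count distinct first-party
// sites that have loaded an asset, and only serve it from the shared
// cache once that count crosses a threshold.
function makePopularityGate(threshold) {
  const originsByAsset = new Map(); // asset URL -> Set of first-party sites

  return {
    recordUse(asset, firstPartySite) {
      if (!originsByAsset.has(asset)) originsByAsset.set(asset, new Set());
      originsByAsset.get(asset).add(firstPartySite);
    },
    mayShare(asset) {
      return (originsByAsset.get(asset)?.size ?? 0) >= threshold;
    },
  };
}

const gate = makePopularityGate(3);
gate.recordUse('https://cdn.example/roboto.woff2', 'a.example');
gate.recordUse('https://cdn.example/roboto.woff2', 'b.example');
console.log(gate.mayShare('https://cdn.example/roboto.woff2')); // false
gate.recordUse('https://cdn.example/roboto.woff2', 'c.example');
console.log(gate.mayShare('https://cdn.example/roboto.woff2')); // true
```

As the comment notes, the trade-off is that every asset below the threshold behaves exactly as it would under full cache isolation.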
The attack could still work in some cases, as described in the linked post.
Your webmail provider has a search box, and the content that is returned is styled with Roboto. If the search finds nothing, then Roboto isn't loaded. The attacker forces Roboto out of the cache with a specially formatted fetch() request, then loads an iframe of the search. Then the attacker checks if Roboto is in the cache or not. This allows the attacker to essentially read your email inbox.
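The steps above can be modeled with a toy shared cache (everything here is illustrative; a real attacker would evict via fetch(), load the search in an iframe, and probe via timing rather than inspecting a Set):

```javascript
// Toy model of the evict-then-probe attack. The shared cache is a Set of
// URLs; rendering the webmail search pulls in Roboto only when the
// search returns results.
const ROBOTO = 'https://fonts.example/roboto.woff2';
const sharedCache = new Set();

function renderSearchResults(query, inbox) {
  if (inbox.some((msg) => msg.includes(query))) sharedCache.add(ROBOTO);
}

function attackerProbe(query, inbox) {
  sharedCache.delete(ROBOTO);        // step 1: force the font out of the cache
  renderSearchResults(query, inbox); // step 2: load the search in an iframe
  return sharedCache.has(ROBOTO);    // step 3: check whether it came back
}

const inbox = ['meeting at noon', 'your invoice #1234'];
console.log(attackerProbe('invoice', inbox)); // true:  "invoice" is in the inbox
console.log(attackerProbe('lottery', inbox)); // false: it is not
```

Repeating the probe with different query strings turns the one-bit cache signal into a search over the victim's mailbox.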
I think it would be fine to let the CDN mark the common shared resource as "Caching: shared" as an opt-in, and also allow the including page to override with another header as an opt-out. If you are including shared cdn resources on a sensitive page, you are already doing it wrong. The CDN could already control its header to only send the opt-in for very commonly used resources in order to avoid fingerprinting based on less common ones.
Hypothetically, there's nothing to stop some user agents from "blessing" certain libraries as common enough to justify shipping them with the user agent and satisfying requests for them locally. That could leak details about the user agent, but none that shouldn't already be available from the HTTP headers anyway.
But I suspect this will be unnecessary; even the bandwidth-constrained use case is getting to be more bandwidth every year.
I think a lot of people are unclear on the threat model here. If I have it correct, there's no way around it: either you live with the privacy leak, or you disable the shared cache.
The threat is that when you navigate to a creepy website, it loads some library and tracks the timing. They use that to infer that you've accessed some resource from a sensitive site.
None of the workarounds with extra attributes are going to help, because they rely on the web developer to
1. know about the attack
2. know that some library or asset is a realistic candidate for the attack, and take appropriate action.
Neither one is that realistic. We developers are just too lazy to get stuff like that right, even if we know about it. Cargo culting is the rule.
As for the effects, I suspect this will have a modest effect on the average website. The sources I've encountered seem to cast doubt on the effectiveness of the shared cache (https://justinblank.com/notebooks/browsercacheeffectiveness....). I poked around the mod_pagespeed docs and project, and couldn't find any indication of how they'd measured impacts when they implemented the canonicalization feature.
I wonder if you'll see a big impact on companies like Squarespace and Wix, where there are a lot of custom domains that are all built using the same stack.
Off the top of my head I can think of several ways to compromise on this by making shared caching opt-in.
One way is for the requester to specify if the asset is shared. A new 'shared' attribute on html tags and XMLHttpRequest would do this. Browsers enforce cache isolation _unless_ the shared attribute is set, in which case it comes from a 'shared' cache.
So if the attacker requests www.forum.example/moderators/header.css from the _shared_ cache, but the forum software itself didn't specify it as shared, so it never got loaded into the shared cache, then nothing is leaked.
And as it would only make sense to opt to share stuff like jquery.js from a CDN, the forum wouldn't naturally share that css file and so on.
The other approach is for the response to specify sharing, e.g. new cache control headers. Only the big CDNs would bother to return these new headers, and most programmers wouldn't have to change anything to regain the speedup they just lost from going to isolated caches once the CDNs catch up and return the header.
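For instance (entirely hypothetical; no such Cache-Control token exists today), a CDN could respond with:

```http
HTTP/1.1 200 OK
Content-Type: application/javascript
Cache-Control: public, max-age=31536000, immutable, shared
```

A browser that understood the token would put the response in the shared cache; older browsers would simply ignore the unknown directive and fall back to the isolated cache.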
In either case, sharing can _still_ be an information channel if the shared resource is sufficiently rare, e.g. the forum admin page is pretty much the only software stuck on version x.y of lib z. The attacker can see if it's in the cache, and infer if the victim is a logged-in admin or not. Etc.
I think the trouble with both of these plans is that it shifts cognitive load to a lot of people who aren't expert in the topic. How many people would put "shared" on something because it sounds good, or is the default in a template? And even if they don't, how many brain-hours do we have to burn on people understanding the complexity of an optimization that probably doesn't make much difference to the average website?
If the enemy is the developer then you've already lost. It's not like cache sharing is how a developer chooses to unmask your anonymity when browsing between sites; they have cookies to do that in much better ways.
> If the enemy is the developer then you've already lost.
It's not that the developer is the enemy.
Pretend I create a website called "Democratic Underground: how to foster democracy under a repressive regime." I'm naive, or I want it to load quickly, or I accidentally include a framework that is either of those two -- some library versions are cached.
Now, the EvilGov includes cache-detection scripting on its "pay your taxes here" webpage. Despite my salutary goals, shared caching leaks to the government some subset of my readers.
I don’t think it does. I think it shifts the load to CDN maintainers. Which is fine because we just gave them a task to do that avoids obsolescence.
Browsers have always allowed cross-domain requests, which have been tolerated until now but require all of us to be aware of XSS and CSRF issues, or to suffer the consequences.
Removing shared cache is the beginning of the end for cross domain requests by default. The other obvious use these days is ad networks, but they also get used for integrations like SSO and shared services like Apple Pay and presumably PayPal? And other collaborations between companies.
> "early experimental results in canary/dev channels show the cache hit rate drops by about 4% but changes to first contentful paint aren’t statistically significant and the overall fraction of bytes loaded from the cache only drops from 39.1% to 37.8%."
What about exceptions for loading common JS libraries from a shared CDN? I'm looking at the Google Chrome design doc and don't see how one gets around this. Maybe I'm just missing something, but if not, it seems like they need to dig more into perf from the perspective of the slower end of the distribution, where it could make a big difference.
Good. I've been advocating for this since publishing a history-leaking attack on Chrome's shared bytecode cache, which also doesn't rely on the network (CVE-2019-13684 - see page 8 of ). Would also like to see this applied to visited link state eventually. Shared state between origins inevitably leads to information leaks.
Besides, Webpack and similar bundlers with tree-shaking abilities make it practical to load just a subset of a large library.
And last (but certainly not least) there is the security angle. Imagine if someone managed to sneak malicious code onto CDNJS or Bootstrap CDN: how many nasty things they might be able to get up to, even if everyone remembered to set crossorigin="anonymous" on their shared assets.
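For reference, pinning a CDN asset with Subresource Integrity looks like this (the hash below is made up for illustration; a real one is the base64-encoded digest of the exact file, and the browser refuses to run the script if the fetched bytes don't match):

```html
<script src="https://cdn.example/jquery-3.4.1.min.js"
        integrity="sha384-EXAMPLEnotArealDigestEXAMPLEnotArealDigestEXAMPLEnotArealDigest"
        crossorigin="anonymous"></script>
```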
It's not clear from the Chromium design document whether resources loaded via Subresource Integrity (SRI) will have a shared cache or not. It's not explicitly mentioned, so it's probably best to assume they won't until someone has tested it.
The SRI spec GitHub project has an issue for shared caching that seems to be coming to the consensus that there will not be a shared cache for SRI:
> "it seems rather unlikely that we can ever consider a shared cache"
Why doesn't the browser just record the original request time for the resource and simulate the same download speed when a different domain requests it for the first time? Maybe even randomize the delay some.
Of course you get a false delay on first load, but it still saves network bandwidth while preventing information leakage.
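A sketch of that idea (all names are invented; the clock and sleep are injected so the logic is deterministic and testable):

```javascript
// Record how long the first download of a resource took; on a later
// cross-site hit, serve from cache but stall for the recorded duration
// so the response looks uncached.
function makeReplayCache(clock, sleep, realFetch) {
  const cache = new Map(); // url -> { body, firstLoadMs }

  return function fetchWithReplay(url) {
    const hit = cache.get(url);
    if (hit) {
      sleep(hit.firstLoadMs); // fake the original network delay
      return hit.body;
    }
    const t0 = clock();
    const body = realFetch(url);
    cache.set(url, { body, firstLoadMs: clock() - t0 });
    return body;
  };
}
```

As the replies below point out, this saves bandwidth but still leaks through any observable side effect of not actually touching the network.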
That's clever, but it still sounds like a way that information could be leaked. Download the target resource, and then download it again concurrently with a known unique resource, and see if the timing changes, for example.
It's an arms race where the browser would ultimately have to simulate every consequence of actually downloading every resource over the slowest link in the network. You're making the problem (and its solution) more complex but not completely solving it.
> but still saves network bandwidth while still preventing information leakage.
If it saves network bandwidth, then you just have to measure the network bandwidth, like a speedtest page does. As Spectre and friends have shown, even the tiniest difference can be used for an information leak.
I think you'd also need to be careful about freshness of cached data. If you give stale data but delay it to give the illusion it was just loaded, an attacker might be able to infer it actually was from the cache after all by looking at the data and cross-checking that against what the current data looks like.
Consider, as an example, an HTTP resource that contains a text string representing the current time and which is updated once a minute. Its cache lifetime is set to 1 minute. A page fetches it at 9:01:30AM and gets the string "9:01AM". This goes into the cache. At 9:02:15AM (45 seconds later), an attacker loads it, you give the cached data which is still the string "9:01AM". To cross-check the data, the attacker hits another server (say, its own proxy that it runs, which fetches the resource and forwards it on), so it can tell that the data it should have gotten is "9:02AM".
In other words, if you give it stale data slowly, it might be able to detect the staleness instead of trying to detect the slow loading time.
Perhaps you could fix this by validating the freshness of the cache, using ETags or something. You'd hit the server, validate what's in your cache is fresh, and then still delay it, thus giving a more complete illusion to the attacker.
I'm not sure if HTTP allows a page to access the TTL of cached data, but if so, you might want to fake that too. If you give the real TTL numbers, then some of the time it's going to look like you just loaded it but it's about to expire.
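The revalidation step described above is just a standard conditional request (the ETag value here is illustrative):

```http
GET /resource HTTP/1.1
If-None-Match: "abc123"

HTTP/1.1 304 Not Modified
ETag: "abc123"
```

On a 304 the browser knows its cached copy is still fresh, and could then serve it after the artificial delay, closing the staleness side channel at the cost of a round trip.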
The article links to a reliable attack that doesn't need to test timing/bandwidth at all. It requests the resource from a page that has a very long URL, which causes the origin server to fail due to Referer header exceeding its header length limit. Cached = ok 200, Uncached = error 431.
In that case, the use of the shared cache doesn’t reduce latency, so you’re giving up over a big part, arguably the majority, of the benefit of caching.
Where they come apart, I think it’s common to say latency is more important than bandwidth. It certainly is to me, though if you’re on a metered connection, you could certainly view things differently.
I'm having difficulty determining how this impacts subdomains.
From what I can tell a.example.com, b.example.com, and example.com would all have their own caches, correct?
We have multiple (sub)domains a|b|c.xxx.example.com that share a template, and therefore resources (we're a .edu). If we're now looking at an initial load hit for all of them, that may impact how we've been setting up landing pages for campaigns.
I can't see us completely moving away from a CDN because of the other benefits they provide.
I wonder how the dust will eventually settle when these happy naive times of using shared caches for great performance gains are in the past, anywhere from CPUs (Meltdown, Spectre) to the web. Will we decide that the extra cost of security is not worth it in all but a few critical applications? Or will we accept it as the necessary tax?
> Unfortunately, a shared cache enables a privacy leak. Summary of the simplest version: I want to know if you're a moderator on www.forum.example. I know that only pages under www.forum.example/moderators/private/ load www.forum.example/moderators/header.css. When you visit my page I load www.forum.example/moderators/header.css and see if it came from cache.
You would expect fewer requests to www.forum.example/moderators/private/ than to, for example, www.forum.example/public. If you look at caching from the server-load angle vis-à-vis security, it could be inexpensive to not cache www.forum.example/moderators/header.css, so you would simply not allow browsers to cache this resource.
If site A thinks that allowing the user's browser to cache a certain resource puts them at a security risk, then this resource should be treated as not-public.
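Concretely, the forum could mark the sensitive stylesheet as uncacheable:

```http
HTTP/1.1 200 OK
Content-Type: text/css
Cache-Control: no-store
```

With no-store the browser never writes the response to any cache, so a cross-site probe has nothing to find.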
In this specific case, I think only marginally worse. If you're a moderator, you use the resource every day, going through many entries. The slowdown on the first request is both unlikely (already cached) and insignificant for the task. This is not an "extra 100ms will cost you a customer" situation.
I want Roboto font to load instantly for my first time visitors and knowing Roboto is cached leaks basically nothing. On a personal level, I don't want any site to be able to figure out what media sites I am reading based on which images are cached in my browser.
"allowcache" would cause developers to do something stupid like put it on all images, but "multisite-shared" may cause developers to make reasonable choices.
A change that makes small sites a bit slower to load things like fonts becomes another brick in the wall of walled gardens.
The problem is that the security risks aren't obvious, and that feature would instantly lead to a ton of posts, StackOverflow answers, etc. saying you need to put it on everything for performance reasons (no doubt billed as a huge SEO advantage). Then we'd learn that, say, a Roboto load wasn't so harmless when someone used it to detect fonts or Unicode subsets used by, say, Uighur or Rohingya sites, and it turned out that the combination of scripts, font variants, etc. was more unique than expected.
By default, all existing code would not have the allowcache property, so would load from an isolated cache. Those devs that explicitly care about speed for certain resources, where leaks are not a concern (e.g. loading a lib from a CDN) can set the allowcache property on those resources.
I think there are two categories of developers who would use that. One is smart, experienced people who have correctly evaluated both security and performance concerns and decided to turn this on for a specific narrow case where it's truly valuable to speed up first-time page loads.
The other is people who want things to go faster and flip a lot of switches that sound fast without really understanding what they do, and then not turning off the useless ones because they're not doing any real benchmarking or real-world performance profiling. This group will get little or no benefit but open up security holes.
Given the declining usefulness of shared caching (faster connections, cheaper storage, explosion of libraries and versions), I expect the second group to be one or two orders of magnitude larger than the first.
I agree with you, for now. But, I can imagine a future where library payloads will increase significantly. In those cases, shared caching will be pretty useful (I'm thinking along the lines of a ffmpeg WASM lib for web based video editing apps - sounds crazy, I know, but I think we're heading in that direction!). I could of course be totally wrong, and instead we just get fancier browser APIs with little need for WASM... I guess we wait and see!
I'd like to know what you think "where leaks are not a concern" might mean. As a web developer, I have no idea how I'd be able to know that, even if I were perfectly benevolent and competent. Loading resources from a CDN is exactly the sort of thing that a malicious website can use for a timing attack.
This sounds to me no different than a developer wanting to opt-out of memory protection, on the basis that it will be a little faster -- and my program doesn't have any bugs or viruses!
But that's a separate issue, no? The leak issue is all about someone knowing whether you've accessed a resource previously or not (i.e. checking to see if the resource comes for cache or not).
For a lib hosted on a CDN, who cares?! However, if someone wants to track if you've been to myservice.com, they could try and load myservice.com/logo.png - if it's from the cache, then bingo, you've been there. That's a leak.
Maybe I've misunderstood; could you explain your timing attack mechanism in more detail please?
It also made sense when everything was HTTP 1.1, but that’s on the way out too.
Browsers throttled the number of requests per domain because parallelism was expensive for the servers. Loading from another domain could happen simultaneously. If you had a fast internet connection you’d see a reduction in page load time. You’d also see that to a lesser extent if your connection was shared with others.
My thought exactly. JS devs have told me for years that the fact that their site uses >1MB of JS doesn't affect performance because almost all of it is already in the cache for 90% of users. Now there's a counter-argument and we'll get back to reasonable page sizes.
Browser vendors are in a good position to make this call because they can use telemetry to measure the effectiveness of shared caching. Personally I doubt shared caching is as effective as it used to be. Surely whatever effectiveness remains would be decimated by any attempt to implement an isolated-by-default policy that requires website authors to opt in. So all in all, disabling the shared cache strikes me as a reasonable option.
Browser vendors could choose to bundle some popular fonts and libraries, but that comes with its own set of problems.
Does the browser at least dedupe these files internally? For example, it goes through the motions of a real download and so forth but afterwards it just stores things in a content-addressable fs. Or will I now have 50 identical copies of React on my hard drive?
The performance benefits of using the shared CDN copy of resources versus hosting your own with HTTP Keep-Alive are vastly overstated. In-theory, if everyone were using the same version of the resources and everyone were using the same CDN, you'd see a benefit (maybe). In practice, there are too many variables and you end up cache missing most of the time, anyway.
Besides, this was only ever a concern for bad devs loading tons of tracking scripts and hacking together sites via copy-paste anyway. If you're really concerned about performance, you should be building, tree-shaking, and minifying all of your JS into a single file.
> As far as I’m aware, a TCP connection is still opened when using the cache, as well as TLS with all its overhead.
If the cache-control headers say it expires in the future, the browser will not usually make any request, just load it from disk. Hence the typical practice of setting an expiration date very far in the future, and just changing the URL when the resource is updated (thereby forcing the browser to request the new representation).
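The usual pattern: serve the asset under a content-fingerprinted URL (the filename hash here is illustrative) with a far-future, immutable lifetime, so the browser never revalidates it; deploying a new version changes the URL instead:

```http
GET /static/app.3f9a2c.js HTTP/1.1
Host: www.example.com

HTTP/1.1 200 OK
Content-Type: application/javascript
Cache-Control: public, max-age=31536000, immutable
```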
I wonder if a compromise could be caching things only if they are widely used across public sites. A browser vendor could use telemetry or crawling to aggregate information about commonly used resources across the web. The browser could cache these resources, even proactively. It's certainly more complex than the shared cache, but it could achieve a broadly similar end. Then again, maybe the vendors' telemetry is telling them that first site loads are not that common and that the shared cache doesn't move the needle much. This wouldn't be surprising to find out.
> What does this mean for developers? The main thing is that there's no longer any advantage to trying to use the same URLs as other sites. You won't get performance benefits from using a canonical URL over hosting on your own site (unless they're on a CDN and you're not) and you have no reason to use the same version as everyone else (but staying current is still a good idea).
From a security and privacy perspective there are already good reasons to self-host JS code and other external artifacts instead of sourcing them from CDNs. In some situations even without this change it's faster (if it's not already cached - because you can fetch it from the same host in the existing connection via http2).
So self-host those JS files, and also use fewer of them if possible.
This might be a crazy idea but... why is it that browsers haven't implemented something like Java Maven's package cache and proxy yet?
Basically the website says "I need com.google.angularjs:2.0.1" and browser grabs and caches the package for all future usages? It seems to work very well for Java... why hasn't there been any such initiative for the web?
Less effective, because browsers don't cache, recursive resolvers do, and they are often shared; and it may be harder to tell the difference between a cache hit and a cache miss in DNS (responses can be very fast).
Browsers could safely pull a list of very commonly requested, content-addressable resources from various CDNs and pre-cache them (independently of any request). That would even help with first-request latency, and for mobile (where bandwidth is expensive) you could do the pre-caching on Wifi.
On many web applications, static files like CSS/JS files and non-user-generated images are not served by the application server, but directly from the filesystem. This conserves CPU resources and might also improve network throughput because one application less is involved in the path.
CDNs, which usually are the use case for global caches, are also kind of critical when it comes to the GDPR and other privacy laws.
Having no global cache may kill off the usefulness of CDNs (which is somewhat doubtful given the amount of stuff available).
But you are not allowed to use them anyway unless the site is plastered with some allow-all-the-things popup.
It's eminently pluggable if we stop running hostile general-purpose code on our own machines, giving it a large, poorly-defined attack surface! That's the eventual answer here. Websites have a perfectly cromulent place to run whatever code they'd like: on their own servers. If you knew someone was trying to kill you, you wouldn't invite them into your home for a party so they could easily tamper with your medicine cabinet.
Consider the requestAnimationFrame API. It will give you a 60hz timer (even higher on high refresh rate displays) but is used for a ton of animation related tasks as well as games. That said it effectively can be used as a timer which in this case would likely be precise enough.
What do you do in the case where a ton of websites use this API for legitimate animations?
So, based on the response to Spectre and friends ("intel knowingly sacrificed security in the pursuit of performance and everyone should sue them") , is the correct response here "browser vendors knowingly sacrificed security in the pursuit of performance and everyone should sue them?"
It's not exactly the same as having bought a processor, and then having to give up a significant fraction of performance to have it be secure, while other processor manufacturers had much smaller performance penalties...