Let me clarify a couple things I've seen in a lot of the comments.
First of all, we do load balance by request, not by connection, and we do not use sticky sessions.
Secondly, we are aware that our application has underlying problems with large numbers of concurrent requests, which were exposed by using HTTP/2, and we are working on fixing them. The point of the post is not "HTTP/2 sucks because it broke our app". It is "moving to HTTP/2 isn't necessarily as easy as just flipping a switch".
Titles aren't supposed to be comprehensive, and rhetoric in general is significantly less effective if you spend all your time carving out exceptions for differing opinions. Titles are supposed to get you to read the article and give you an idea of the topic.
If you read the article, it's clearly discussing the problems that they experienced. I don't see why they need to qualify everything they said with IMO, IMX, for us, in our case, etc. At no point in the article itself does the author assert that their experience will match yours. Quite the opposite; the author goes out of their way to describe their environment and why they were impacted in particular. At some point, the reader needs to assume that the author is only talking about what they're talking about -- their specific situation -- and not asserting some omnipotent will to obliterate your opinion or disagreement because they didn't add "(For Us)" to the title.
In other words, complaining that the title isn't explicitly subjective is probably the weakest criticism you could make because, at best, it's a criticism of a rhetorical style rather than any criticism of the substance of the piece. Not only is the reader more than capable of coming to that decision, not only can they be expected to do so if they actually read it, you're not actually disagreeing with anything the article says nor are you presenting any alternative opinion and you're certainly not providing any contrary evidence. You're disagreeing, but there's no substance to your disagreement. "I had to read the article to understand what it was actually about," is not really a criticism!
Rhetoric is not just writing persuasively. Rhetoric is about communicating effectively. Saying you don't need rhetoric because you're an engineer is like saying you don't care if people think you're telling the truth because you're an engineer. It's absurd.
It doesn't matter if you're writing an opinion piece, a political piece, or a technical piece, if you want to structure your writing in a way that can be easily read, understood, and followed, you want to obey the principles of rhetoric.
Deliberately withholding details from a title to create more engagement based on a false impression of the content doesn't equal good rhetoric. If anything it hinges on clickbait. Calling it effective communication is the absurdity here.
I'd like you to subject your theory to the most popular (i.e., most frequently cited) engineering writings in the past 30 years. How do they exemplify such rhetoric? And how does the reader benefit from whatever rhetoric you find in them?
> It didn’t need to be spiced up with a provocative title to be valuable.
Yes it did. HN title-fu is the same appeal to emotion as any political blog and is just as effective at surfacing stories, thanks to people just like you.
You came to this thread and made your noise for pedantic reasons, and helped surface it higher so the correct audience could actually see it.
I think it was clear enough - it didn't say "Why Turning On HTTP/2 Is A Mistake" or "Why HTTP/2 Was A Mistake," either of which would have implied universal badness. The phrasing had me expecting a specific war story, which is what it ended up being.
Besides, I think it's quite reasonable to argue that it's a majority bad idea, even if it's not a universally bad idea. I think many people are probably in the same boat.
There's little difference between "why turning on http/2 is a mistake" and "why turning on http/2 was a mistake". "Is" is stronger, but both imply that there is/was something wrong with it, to the point of writing an article about it.
A better title would be "why turning on http/2 was a mistake (for us)." The current one implies failure due to http/2 itself, rather than architectural decisions that made the upgrade to http/2 not as easy as imagined.
Without data, one cannot say either way. What we have before us is a single anecdotal tale. And even that tale, even if you believe it rises to the level of "data," doesn’t provide clear guidance to others, because the details of their stack and capacity are unspecified.
Your argument makes sense as a generic statement, but we don’t need a statistically significant sample here. The behaviour described is a logical consequence of how the network and application layers interact and will be seen by anyone with a similar setup, not a natural event that demands more data.
I’m not denying the probability that others similarly situated could experience similar problems. But without the data, there’s no way to know whether the author’s situation would be experienced by a majority of website operators, or even a significant minority.
Sure there is—it is my experience as a website operator and as someone familiar with the field of website operation that architectures like theirs are more common than architectures unlike theirs. Professional experience is a (highly compensated) form of gathering data.
And I’ve got 20 years of such experience leading me to conclude that their architecture is less important than the fact that they simply ran out of peak capacity; and that we do not know how many sites operate this near capacity, so we cannot conclusively determine whether a majority of them are at risk.
Ideally, we’d be discussing the content, not the headlines. And ideally the headlines reflect the content, but history shows that is often not the case. Self-aggrandizement and sensational headlines are par for the course for HN nowadays.
"Was a mistake", given common English usage, implies a mistake in the individual case; that is to say, "was a mistake" means "was a mistake for us". "Would be a mistake", on the other hand, implies a mistake for at least the reader, and is commonly interpreted in the most general application available.
"Broke" is commonly used to describe an interruption in application service to end users.
If one enables HTTP/2 and production goes down, someone could quite rightly point out that "you performed action A, causing impact B, which broke the app". Determining in root cause analysis that impact B stemmed from underprovisioned peak demand compute resources in no way contradicts the usage of "broke".
That’s not usually the way the term is used internally in practice when you exhaust your capacity, in my experience. It’s more typically used when sites start crashing or returning invalid data due to bugs in the software.
In my experience (adding 25 years of anecdata to the pile) running out of capacity which leads to interruptions in service to users will quickly get you a lot of emails about things being "broke" from both users and internal teams.
Internally, no. When dealing with incident response, you would not just say "http2 broke it", but as a quick way to describe the issue, it's fair to say "http2 broke our application" as it prevented access to the service.
I'm familiar with your app, but only as a casual user. It's quite nice, btw.
I'm having some trouble picturing this. Can you add some numbers? Like, how many nodes is the load balancer spreading the load over, and how many simultaneous requests were you seeing from a browser?
Whether or not that's a fair characterization of their justification, that seems like a perfectly defensible justification to me. Most people out there are dealing with legacy code (as in "code I didn't write") and business logic they can't safely rewrite, and it's more important to deliver business value than to build an emotionally satisfying architecture. And even so, why design for concurrency when the high-level task (loading a web page in a bounded amount of time) doesn't require it? Personally, I'd say a system that scales very well but doesn't need to scale isn't even an emotionally satisfying architecture.
This is so true and is why I use "pure" functions where possible. Make it super obvious that a function can be replaced or reused safely and your future counterpart will praise your name to the code Gods.
Designing software for concurrent requests, as in using poll/epoll/kqueue/IOCP, implementing worker pools, etc., is a given, yes.
Designing infrastructure for concurrent requests is definitely not. I've worked on shared hosting systems with high concurrency requirements and it definitely was more complicated than just installing an Apache MPM—we had to think about balancing load across multiple servers, whether virtualizing bare-metal machines into multiple VMs was worthwhile (in our case it was for a very site-specific reason), how many workers to run on each VM, how much memory we should expect to use for OS caching vs. application memory, how to trade off concurrent access to the same page vs. concurrent access to different pages vs. cold start of entirely new pages, whether dependencies like auth backends or SQL databases could handle concurrency and how much we needed to cache those, etc. At the end of the day you have a finite number of CPUs, a finite network pipe, and a finite amount of RAM. You can throw more money at many of these problems (although often not a SQL database) but you generally have a finite amount of money too.
I would be surprised if most people had the infrastructure to handle significantly increased concurrency over their current load, even at the same throughput. It's not a sensible thing to invest infrastructure budget into, most of the time.
(You can, of course, solve this by developing software to actively limit concurrency. That's not a given for exactly the reasons that developing for concurrency is a given, and it sounds like Lucidchart didn't have that software and determined that switching back to HTTP/1.1 was as good as writing that software.)
Sure, but in this article the problem was lack of software concurrency. This article really does boil down to "HTTP/2 exposed a fundamental flaw in our software".
Maybe I misread? My takeaway was that their frontend web server handled concurrency just fine and was happy to dispatch requests to the backend in parallel, but the backend couldn't keep up and the frontend returned timeouts. That's exactly what you get if you put a bunch of multithreaded web servers in front of a single-writer SQL database that needs to be hit on every request.
Yes, most such cases should be rearchitected to not go through a single choke point. But my claim is that this isn't automatic merely by developing for the web, and going through a CP database system is a pretty standard choice for good reason.
I have operated services where each web server had a single thread, because that provided a better user experience than more "optimal" configurations. It had certain bizarre scaling implications for single-core performance and I wouldn't have designed it that way in today's era, but there are times when this makes sense. (For example, when the web server only serves authenticated sessions, and each session requires a bound mainframe connection for requests, and parallelism is not only impermissible to the mainframe but would exceed the capacity available.)
I feel like the description we should be using for that class of machine is "former mainframe". (I'm sort of joking)
In seriousness, though, I'm both curious and a little bit skeptical of what user experience benefit that architecture would give over a server-side request queue and a single worker against the queue. That would allow you to pay the cost of networking for the next request while the mainframe is working. You could even separate the submission of jobs from collecting the result so that a disconnected client could resume waiting for a response. Anyway, I'm not saying you needed all that to have a well-functioning system, I'm just not convinced that a single threaded architecture is ever actually good for the user unless it gives a marked reduction in overhead.
Queues are operational complexity. Given the (worst-case-ish) choice between "architecture without a queue that sometimes has HTTP-level timeouts" and "architecture with a queue that reliably renders a spinner and sometimes has human-task-level timeouts," I'd probably favor the former unless management etc. really want the spinner and I'm confident we have tooling to figure out why requests are getting stuck in the queue. Without that tooling, debugging the single-threaded architecture is much easier.
Sure! But that is trading off user experience for technical simplicity (which you do often have to do at some point). However: the argument was that this system was better for user experience than a design that could accept requests in parallel, which is what I'm resisting/not yet understanding. In reality, I'm sure that the system was fine for the use cases they had, which is what I meant to admit with "I'm not saying you needed all that". I will say that the single threaded no-queue design already carries a big risk of request A blocking request B.
My argument that this helps user experience is that, when a failure does happen, it's a lot easier to figure out why, tell the user that experienced it what happened and get them unblocked, and fix it for future users in a simpler system than a more complex one. The intended case is that failures should not happen, so if you're in the case where you expect your mainframe to process requests well within the TCP/HTTP timeouts and you can do something client-side to make the user expect more than a couple hundred ms of latency (e.g., use JS to pop up a "Please wait," or better yet, drive the API call from an XHR instead of a top-level navigation and then do an entirely client-side spinner), you may as well not introduce more places where things could fail.
If you do expect the time to process requests to be multiple minutes in some cases, then you absolutely need a queue and some API for polling a request object to see if it's done yet. If you think that a request time over 30 seconds (including waiting for previous requests in flight) is a sign that something is broken, IMO user experience is improved if you spend engineering effort on making those things more resilient than building more distributed components that could themselves fail or at least make it harder to figure out where things broke.
I have a lot of sympathy for their situation. I can imagine humming along just fine at 5 concurrent requests over 10 seconds, having busy days where you hit 10/10.
But to suddenly hit 50 for 1 second, then nothing for 9 seconds, well, that’s a tough spot to be in.
There must be some hard-to-find sequencing happening there that they were not really exposed to before.
The reverse is also quite stellar, because in some scenarios if you can lower peak burst demand per node from N to (0.1 x N) you can often reduce allocated capacity by a factor greater than 0.1. (This is more likely the case when N exceeds SOMAXCONN, for example.)
Author here. Our application can handle concurrent requests just fine. The problem was partly that our application was trying to handle too many requests in parallel instead of queueing them, and partly that later requests were timing out because our load balancers were configured to expect clients to make a request, wait for the response, then send the next request, not to send all the requests and then wait for all the responses (which means the last response takes longer to complete, measured from when its request was first sent).
This misconstrues the comment it replies to. The post author's comment says that their application was tuned for a certain level of concurrency, and that when the level of concurrency to the load balancers increased due to the HTTP/2 change, their load balancers increased the level of concurrency to the backend, causing issues.
This is an extremely common issue with Apache configurations, which often default to accepting hundreds of simultaneous requests without regard for memory capacity. If peak demand causes enough workers to spin up that Apache starts swapping, the entire server effectively goes offline.
Depending on the specific characteristics of the application, this could occur when load increases from 50 concurrent requests to 51 concurrent requests, or from 200 to 201, or from any integer A to B where A was fine but B causes the server to become unresponsive.
Saying that their A is 1 seems unnecessarily dour, given how common this problem has been over the past couple decades due to Apache's defaults alone.
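A rough back-of-the-envelope check for that failure mode, with purely illustrative numbers (measure your own per-worker footprint before trusting anything like this):

```python
# Rough sizing check: how many Apache-style workers can this box afford
# before peak demand pushes it into swap? All numbers are illustrative.
ram_mb = 4096                 # total RAM on the server
os_and_cache_mb = 1024        # reserve for the OS, page cache, etc.
per_worker_mb = 60            # resident memory of one worker (measure yours!)

safe_workers = (ram_mb - os_and_cache_mb) // per_worker_mb
print(safe_workers)           # 51 -- far below worker limits in the hundreds
```

If the configured worker limit is well above that number, a burst of concurrent requests can spin up more workers than RAM allows, and the whole box starts swapping.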
The problem, of course, is that HTTP/2 behaves like having infinite connections, so the "more threads" on the server are almost always detrimental to performance.
"Less is more" is the mantra I have unsuccessfully tried to drill in. If your API (assuming a basic REST-like service) is running at 100% CPU utilization, you've likely over-provisioned it.
Possibly, just depends on what you are doing and where you are going from/to.
For example, 200 threads with 200 connections to a single service is insane and likely causing you to be slow already. Increasing that will negatively impact performance.
I'm guessing that the distinction being drawn has to do with whether the application can handle concurrent requests for the same resource from different sessions on different computers (yes?), vs concurrent requests for every resource at once from a single session (no?, even though this is a much smaller number of requests).
I don't want to get into it but I've dealt with issues like this and I don't think you are terrible or bad at your job. I'm seeing a lot of shade going your way. Sometimes a particular configuration works well until something changes. Just wanted to offer my two cents.
I think people are reacting badly because the title can create the impression that this is HTTP/2’s fault.
The actual post seems perfectly reasonable though (essentially "you might think you can just turn on HTTP/2 as a drop-in on your load balancer, but if your server code hasn't been written to rapidly handle the quick bursts of requests that enable HTTP/2 to provide faster overall loads to the client, then this can cause issues; you should test first and make sure your server systems are able to handle HTTP/2 request patterns").
Somewhat understandable. I didn't get hung up on the title, and if anything the story is an object lesson in the need to familiarize yourself with the intricacies of inbound changes to your stack.
I appreciate when people share war stories; I like to think that wisdom is knowledge survived.
Most places I've worked, the timeouts have been calibrated for people on 1990s dial-up, and the 99th-percentile response time targets have been 2 orders of magnitude less.
I'm curious what platform your app is on, if you are willing to share?
A typical non-tuned Rails deployment, for instance, is gonna have queueing built in, with really not as much concurrency as one would want (enough to actually fully utilize the host; the opposite problem). So I'm guessing you aren't on Rails. :)
Curious what you are on, if you're willing to share, for how it affects concurrency defaults and options and affordances.
(I know full well that properly configuring/tuning this kind of concurrency is not trivial or one-size-fits all. And I am not at all surprised that http/2 changed the parameters disastrously, and appreciate your warning to pay attention to it. I think those who are thinking "it shouldn't matter" are under-experienced or misinformed.)
> I'm curious what platform your app is on, if you are willing to share?
Sure. We use the Scala Play framework (https://www.playframework.com/). And it does have some queuing built in, but we have tweaked it to meet certain application needs.
I assume your application is using a dedicated thread for each request?
Even then you would be handling more requests in parallel than the number of cores you have, but your concurrency would be limited by the cost of context switching and your memory capacity (having to allocate a sizable stack for each thread in most threading implementations).
Queueing is usually required for a stable multi-threaded server, but if you were doing async I/O you wouldn't need it. The extra memory overhead for each extra concurrent request (by means of lightweight coroutine stacks, callbacks or state machines) is not much different from the size it would take on the queue, and there is no preemptive context switching.
In most cases, you'll get the same behavior as having a queue here. Cooperative task-switching happens only on async I/O boundaries, so if you're processing a request that requires some CPU-heavy work, your application would just hog a core until it completes the request and then move to the next one.
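A tiny demo of that hogging behavior, under the assumption of a cooperative scheduler like Python's asyncio: the CPU-bound task never hits an await point, so it runs to completion before the other task ever gets the event loop.

```python
import asyncio

order = []

async def cpu_heavy() -> None:
    # No await inside: the event loop cannot switch away, so this
    # coroutine "hogs the core" exactly as described above.
    total = 0
    for i in range(1_000_000):
        total += i
    order.append("cpu_heavy done")

async def quick() -> None:
    order.append("quick done")

async def main() -> None:
    # Both are scheduled, but cooperative switching happens only at
    # await points, so cpu_heavy finishes before quick even starts.
    await asyncio.gather(cpu_heavy(), quick())

asyncio.run(main())
print(order)  # ['cpu_heavy done', 'quick done']
```

This is why async servers still need to chunk or offload CPU-heavy work; the scheduler alone doesn't give you fairness between requests.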
Sizeable stacks don’t eat into memory unless your thread actually utilizes the full stack allocation. Otherwise, the memory is available for other uses, so you can spin up more threads than you’d think.
Do you think your local pizza joint could handle every order that they normally get between 4pm and 6pm if they got them all between 5pm and 5:30pm? At the same level of service, with no late orders, and with no additional equipment or labor?
They are not getting them at 5 but at 4; they should be able to handle all the orders in the next two hours.
The problem is not making the pizzas in time but trying to get all the pizzas started at once when there is not enough table space to even roll out that much dough, and then trying to squeeze all the pizzas into the oven at once, whereby several of them got messed up.
At full-service restaurants, it is the maître d's duty to control the pace of orders so they arrive at the kitchen in a steady stream, instead of in batches of 20 tickets at once that could easily overwhelm the chefs.
The logic here is exactly the same: if the backend has no ability to queue and prioritise the requests, then the same function needs to be performed elsewhere to safeguard quality of service.
This is more like if a pizza joint that could seat 10 people at once moved into a new location that could seat 100 people but still kept the same wait and kitchen staff and is suddenly surprised that wait times have increased.
Well, with every pizza joint I’ve ordered from I can order a bunch of pizzas in just one phone call instead of having to make a separate phone call in serial for each pizza. And I certainly don’t have to wait for each pizza to be delivered before ordering another.
That's an unfair assessment. HTTP/2 fundamentally changes how requests are handled. With HTTP/1.1 there is a de facto connection pool inside the browser, and this throttling has been a feature of front-end development for 15+ years (from when Ajax became a thing), so this wasn't something on anybody's mind. HTTP/2 all of a sudden removes this constraint, and for Lucidchart it led to a number of unintended consequences. This is an important consideration because the mantra has been that HTTP/2 can simply be turned on and everything will work as before.
> With HTTP/1.1 there is a de facto connection pool inside the browser and this throttling has been a feature of front-end development for 15+ years
This is only true when you look at a single client. If you look at a larger number of clients accessing the service at the same time, you would expect similar numbers of concurrent requests on HTTP/2 as on HTTP/1.1. Clients send larger numbers of requests at the same time, but they are done sending requests earlier so there are requests from fewer clients being processed concurrently. It should average out.
If you have, say, 1,000 clients accessing your service in one minute, I doubt the number of requests per second would be very different between the two protocol versions. It would only be an issue if the service was built with a small number of concurrent users in mind.
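That averaging intuition can be stated as Little's law: average concurrency = arrival rate × average time in system, regardless of how each client batches its requests. With made-up numbers:

```python
# Little's law: average concurrency L = arrival_rate * avg_time_in_system.
# Hypothetical workload: 1000 clients/minute, 30 requests per page load.
requests_per_second = 1000 * 30 / 60     # 500 rps, independent of protocol

# If each request spends ~0.2 s in the system on average, the server sees
# about 100 requests in flight -- under HTTP/1.1 *or* HTTP/2:
avg_concurrency = requests_per_second * 0.2
print(avg_concurrency)                   # 100.0

# What HTTP/2 changes is the variance: one client's 30 requests now arrive
# in a single burst instead of trickling in 6 at a time, so short-term
# peaks can sit far above the unchanged average.
```

So the averages do match; the question is whether the backend tolerates the burstier arrival pattern.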
You may be forgetting that load balancers have been working on a per request basis, and that no two requests are the same cost (despite what load balancer companies would have you believe).
Under HTTP/1.1 requests may have been hitting the LB and then being scattered across a dozen machines. Each of those machines was in a position to respond on their own time scale. Some requests would get back quickly, others slowly, but still actively being handled.
Under HTTP/2 with multiplexing, if the LB isn't set up to handle it (and they often aren't) they can be hitting the LB and _all_ ending up on a single machine, which is trying to process them while some of those requests might be requiring more significant processor resources, dragging the response rate for all the requests down simultaneously.
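The difference between the two balancing modes can be sketched like this (backend names are hypothetical):

```python
from itertools import cycle

servers = ["s1", "s2", "s3"]

def per_connection(n_requests: int) -> list:
    # Connection-level balancing: every request multiplexed over one HTTP/2
    # connection lands on whichever backend that connection was pinned to.
    backend = next(cycle(servers))
    return [backend] * n_requests

def per_request(n_requests: int) -> list:
    # Request-level balancing spreads the same burst across all backends.
    rr = cycle(servers)
    return [next(rr) for _ in range(n_requests)]

print(per_connection(6))   # ['s1', 's1', 's1', 's1', 's1', 's1']
print(per_request(6))      # ['s1', 's2', 's3', 's1', 's2', 's3']
```

(Note the post author clarified elsewhere in the thread that their load balancers do balance per request, so this is the general failure mode, not necessarily theirs.)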
But it didn't, unless you're saying that Lucidchart made an incorrect analysis. Is that your argument?
>Clients send larger numbers of requests at the same time, but they are done sending requests earlier so there are requests from fewer clients being processed concurrently. It should average out.
Again, it didn't average out. And you assume it 'will average out' at your peril. Maybe it will, maybe it won't. Lucidchart engineers thought that too and it turns out that was wrong in a way that wasn't foreseen.
>It would only be an issue if the service was built with a small number of concurrent users in mind.
I doubt Lucidchart 'was built with a small number of concurrent users in mind'.
It literally says “we are aware that our application has underlying problems with large numbers of concurrent requests”. How much clearer than that do you want it?
If I'm reading this article correctly, they're claiming their application couldn't handle the load of a single user loading their webpage. They didn't talk about load spikes during certain times, so it certainly sounds like they just have an inadequate backend.
> all existing applications can be delivered without modification....
> The only observable differences will be improved performance and availability of new capabilities...
Lucidcharts may have an inadequate backend, but it wasn't a problem until they moved to HTTP/2, so those statements weren't true for them. For anyone else rolling out HTTP/2, that is worth bearing in mind.
There's also no indication on what kinds of requests were timing out, nor if it was possible to send fewer requests, or minimize static assets (if those were the problem).
Really, this is an issue in the library/server: the library/server needs to expose HTTP/2's controls on maximum permitted streams.
> And secondly, because with HTTP/2, the requests were all sent together—instead of staggered like they were with HTTP/1.1—so their start times were closer together, which meant they were all likely to time out.
No, browsers can pipeline requests (send the requests back-to-back, without first waiting for a response) in HTTP/1.1. The server has to send the responses in order, but it doesn't have to process them in that order if it is willing to buffer the later responses in the case of head-of-line blocking.
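For the curious, pipelining is nothing more than serializing the requests back-to-back on one connection before reading any response; a sketch of what goes on the wire (host and paths are placeholders):

```python
# HTTP/1.1 pipelining: write several requests back-to-back on one
# connection before reading any response. (Host and paths are placeholders.)
def pipelined(paths: list) -> bytes:
    return b"".join(
        f"GET {p} HTTP/1.1\r\nHost: example.com\r\n\r\n".encode()
        for p in paths
    )

wire = pipelined(["/app.css", "/app.js", "/logo.png"])
print(wire.count(b"GET "))   # 3 -- three requests queued in a single buffer
```

The catch, as the replies below note, is that responses must come back in order, which is exactly the head-of-line blocking HTTP/2 multiplexing was designed to remove.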
Honestly, over the long run, this is a feature, not a bug. The server and client can make better use of resources by not having a trivial CSS or JS request waiting on a request that's blocked on a slow DB call. Yes, you shouldn't overload your own server, but that's a matter of not trying to process a large flood all simultaneously. (Or, IDK, maybe do, and just let the OS scheduler deal with it.)
Also, if you don't want a ton of requests… don't have a gajillion CSS/JS/webfont for privacy devouring ad networks? It takes 99 requests and 3.1 MB (before decompression) to load lucidchart.com.
> If you do queue requests, you should be careful not to process requests after the client has timed out waiting for a response
This is a real problem, but I've suffered through that plenty with synchronous HTTP/1.1 servers; a thread blocks, but it's still got other requests buffered, sometimes from that connection, sometimes from others. Good async frameworks can handle these better, but they typically require some form of cancellation, and my understanding is that that's notably absent from JavaScript & Go's async primitives.
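One common mitigation is to record each request's arrival time and have the worker skip anything the client has presumably given up on. A minimal sketch (the timeout value and simulated clock are made up):

```python
import time
from collections import deque

CLIENT_TIMEOUT = 5.0   # seconds the client will wait (assumed)

# Each queued item records its arrival time; the worker drops anything
# the client has already timed out on instead of wasting backend time.
queue = deque()

def enqueue(request, now=None):
    queue.append((now if now is not None else time.monotonic(), request))

def next_live_request(now=None):
    now = now if now is not None else time.monotonic()
    while queue:
        arrived, request = queue.popleft()
        if now - arrived < CLIENT_TIMEOUT:
            return request        # still worth processing
        # else: the client has given up; skip the stale request entirely
    return None

# Simulated clock: one stale request and one fresh one.
enqueue("stale", now=0.0)
enqueue("fresh", now=4.0)
print(next_live_request(now=6.0))  # fresh ("stale" is 6 s old, so dropped)
```

Real cancellation (tearing down in-flight work, not just queued work) is the harder part the comment is pointing at, and it needs support from the framework's async primitives.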
> No, browsers can pipeline requests (send the requests back-to-back, without first waiting for a response) in HTTP/1.1. The server has to send the responses in order, but it doesn't have to process them in that order if it is willing to buffer the later responses in the case of head-of-line blocking.
Browsers can pipeline requests on http/1.1, but I don't think any of them actually do in today's world, at least that's what MDN says. [1] And from my recollection, very few browsers did pipelining prior to http/2 either -- the chances of running into something broken were much too high.
When we first tried to enable HTTP/2 on our load balancers a few years ago, we ended up breaking several applications built on (iirc) gunicorn. We eventually determined the root cause to be:
1) The browser was sending a "streaming data follows" header flag followed by a 0-byte DATA packet in the HTTP/2 stream to work around an ancient SPDY/3 bug.
2) The load balancer was responding to the HTTP/2 "streaming data follows" header packet by activating pipelining to the HTTP/1.1 backend.
3) The backend was terminating the HTTP/1.1 connection from the load balancer with a pipelining-unsupported error.
The browser removed the workaround, the load balancer vendor removed the HTTP/2 frontend's ability to activate HTTP/1.1 pipelining, and after a few months we were able to proceed.
Diagnosing this took weeks of Wireshark, source-code browsing, and experimental testing. We were lucky that it broke so visibly that the connection to enabling HTTP/2 was obvious.
If you can recollect more details, I would love to know what happened, but I'm not sure about 3): I'm not aware of a pipelining-unsupported error in HTTP (it is a thing in SMTP). It would take a very special HTTP server to look for another request in the socket buffer after the current one and respond with a failure.
I looked it up and I remembered incorrectly: the bug was that the load balancer activated chunked transfer encoding to the backend nodes upon receiving the described HTTP/2 request. It did not involve pipelining.
cURL recently removed its 1.1 pipelining support, it was rarely used and pretty broken in practice because few clients had been using it: https://news.ycombinator.com/item?id=19595375
Firefox supported pipelining, but as far as I remember, that setting was always disabled by default, and you had to manually enable that through about:config. It was a very common performance tip, and there were even some extensions[1] that enabled that for you, but the usage was still limited to only a small group of power users.
> Good async frameworks can handle these better, but they typically require some form of cancellation, and my understanding is that that's notably absent from JavaScript & Go's async primitives.
Go added cancellation support to the standard library in 1.7. I don't like its coupling with contexts, but the implementation is solid and supported throughout most blocking operations, so this statement is patently untrue for Go.
JavaScript really doesn't have a standard way for doing cancellation, which is a shame.
How many requests does your page make on initial load (that can't be handled by a CDN)? If you're making more than six XHRs to your application servers concurrently, this sounds like a problem that would have existed anyway had it not been for the browser's (rather arbitrary) connection limit.
It's also curious to me that the load balancer doesn't smooth this out. If you have ten application servers and a client makes ten requests over a single HTTP/2 connection, I'd expect each server to respond to one request each. The details are a little fuzzy, but it sounds like the load balancer is only distributing connections, not requests. That seems wrong.
High CPU load should be fine, really, if your application servers are processing requests. If the load is unbalanced, then by definition you need a load balancer to balance the load. If you have one and the load is unbalanced, something is misconfigured.
> How many requests does your page make on initial load (that can't be handled by a CDN)? If you're making more than six XHRs to your application servers concurrently, this sounds like a problem that would have existed anyway had it not been for the browser's (rather arbitrary) connection limit.
I don't know the exact number, but definitely higher than six. And I certainly agree that is a failing in our application. The point is that the browser _did_ arbitrarily limit connections, and our application (unknowingly) depended on that.
I've seen the same kind of thing happen when people switched from one load balancer to another: people were unaware they were dependent on a particular kind of queueing or rate limiting to protect their backend services, and they got hit hard when their new load balancer did not protect them in the same way.
Concurrency/rate limiting/queueing for HTTP apps is, I agree, certainly not a trivial thing. You want to maximize utilization of your available host resources while minimizing latency even under unexpected loads (for both median and upper percentiles). Usually you are dealing with CPU limits, but other issues can be IO contention or contention for limited shared resources like an RDBMS, all while also not maxing out your RAM.
As with anything involving concurrency and hard-to-predict-exactly usage patterns, it can easily get complicated.
This stuff ain't easy. Anyone who thinks this would only happen to an unusually "wrong" app, I think, hasn't looked into it seriously. This post was good information, I think it's unfortunate that so much of the discussion seems to be trying to shame the author (making it less likely people in the future will generously share such post-incident information!).
It can also be affected a lot by what platform you are on, the language/framework and server architecture(s). They each have different request-handling concurrency patterns, built-in features, and affordances and standard practices. Node is gonna be really different from Rails. I am curious what platforms were involved here.
I think the reactions are somewhat justified because HTTP/2 was presented as the problem -- or at least it seemed obvious to read the OP in that way.
If the OP had been framed as a cautionary tale about how the devs did not realize their traffic patterns were throttling their requests, the reactions probably would have been more positive.
Turning on HTTP/2 led to a problem for them. That's what they said, it's true, and it's a good warning for others; I don't think it will be at all rare for others with high volume to have similar experiences. You can't necessarily just turn on HTTP/2 without paying attention to how it will affect your performance characteristics (which you may never have paid much attention to before). The nature of the potential problems that can arise with concurrency/routing/queueing can make them not that obvious to diagnose/debug. Your stack may have been tuned (by you, or by the open source authors/community that established the defaults and best practices for whatever you are using) for pre-HTTP/2 usage patterns.
This is useful notice, and post-mortem. Because I agree some discussion around HTTP/2 seems to have the assumption that it will be basically a transparent freebie.
Some people just like to feel superior. shrug. I was hoping for more interesting discussion about HTTP request concurrency and queueing from those who had been in the trenches, which is what you get from HN technical posts at their best. Instead of a reddit-style battle over who was wrong and who is too smart to make that mistake, which is what you get from HN technical posts at their worst. :)
It seems odd to pin the blame on HTTP/2, no? Nobody made the intentional decision to lean on the browser's artificial connection limit. It sounds like nobody also made the intentional decision to make lots of XHRs on page load, or configure the load balancer to work the way that it's working, etc. Had you been using HTTP/2 all along and made a change to your load balancer (that resulted in its current behavior), your blog post would likely not have blamed HTTP/2 but rather haproxy/ELB/etc.
> Had you been using HTTP/2 all along and made a change to your load balancer (that resulted in its current behavior), your blog post would likely not have blamed HTTP/2 but rather haproxy/ELB/etc.
I'm not exactly blaming HTTP/2, just saying the claim that switching to HTTP/2 is easy, safe, and only brings benefits is false.
> I'm not exactly blaming HTTP/2, just saying the claim that switching to HTTP/2 is easy, safe, and only brings benefits is false.
Eh. Technically anything could cause problems. I don't think you'll find much in the way of claims that swapping out subsystems could only bring benefits.
Google certainly leans pretty hard in that direction.[0] Any and all changes may have unintended side effects, but this particular combination didn't seem obvious to me except in retrospect.
Ah, the good old "unrealised infrastructure dependency" - nice to see you my old friend. People that have never been bitten by one of these never built anything worth talking about :)
It's worth observing that Gatling (load testing tool) supports HTTP/2.
Once you've got the hang of it, you can fairly easily build load profiles to simulate situations like these. Probably wouldn't have helped you prevent the situation - unknown-unknowns being what they are; but you might find it helpful during remediation.
> The details are a little fuzzy, but it sounds like the load balancer is only distributing connections, not requests. That seems wrong.
They may be using sticky sessions or affinity in some regard, having the load balancer hold each client connection intentionally to a server. It's not necessarily wrong, entirely depends on what you need to accomplish.
If they had been using sticky sessions, this should have been a problem before, but to a lesser extent. The same server would have needed to process all of the requests for a single client. You'd still have spikey metrics.
This might not be such a problem with one client artificially limited to a single application server. But in practice, it means that individual servers will be overloaded when they are chosen to handle multiple clients concurrently (while other servers are idle).
> It's also curious to me that the load balancer doesn't smooth this out. If you have ten application servers and a client makes ten requests over a single HTTP/2 connection, I'd expect each server to respond to one request each.
A lot of people set up their load balancers with session pinning (i.e. always choose the same backend based on the session id). This can improve things like cache performance.
Not sure if this is the case here, but it sounds like it.
This is a neat little writeup. Although the issues were not fundamental and were relatively easy to spot and fix, it's valuable input, especially since HTTP/2 advocates seem to insist that you just need to put your webapp behind an HTTP/2-capable proxy and you won't even notice a difference. We haven't enabled it yet on our servers, and now there's definitely something to test before we roll it out.
This is why Envoy exists. It will take HTTP/2 requests from the user and shard the actual requests out for backends to handle. It appears that what happened to the author is that their web server only balanced TCP connections, which indeed no longer works.
I see. So it sounds like the issue was one of timing, where a bunch of converted HTTP/1.1 requests all arrived at your application at the exact same instant.
Did the ALB open a new TCP connection for each request, or does it use a pool of connections?
The underlying lesson that I have learned (the hard way) repeatedly: anything that changes the traffic pattern can result in hard-to-predict infrastructure issues. The change can be as direct as a new protocol (as shown here) or as seemingly unrelated as a UI change.
We experienced issues when we enabled H2 on our HAProxy 1.8 reverse proxies into our K8s cluster. Didn't anticipate the increased memory consumption and we ran into a few memory-related defects with older versions of HAProxy that were fixed with more recent versions. We'll re-enable it at some point, but we've upgraded our reverse proxies in anticipation of it.
There is a setting in HTTP/2 called SETTINGS_MAX_CONCURRENT_STREAMS. If set to 1, it works like HTTP/1.1, with no multiplexing. Setting it to 4-8 would make it behave similarly to how a browser actually uses HTTP/1.1 (creating multiple connections in parallel).
While it is correct to say that it is a client setting, it is also a server setting. The HTTP/2 specification uses the word "peer" since SETTINGS_MAX_CONCURRENT_STREAMS (0x3) is negotiated in both directions as part of the client/server handshakes.
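On the server side this is often a one-line knob. For instance, nginx exposes it through the ngx_http_v2_module (the value here is illustrative; the default is 128 and the right number depends on your backends):

```nginx
# Cap how many streams one client may multiplex over a single
# HTTP/2 connection; lower values approximate HTTP/1.1 browser behavior.
http2_max_concurrent_streams 8;
```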
Surely your application/serving process is able to handle the request burst coming from any single user? (If not, you have a bigger problem to solve first.)
If so, I don't quite see why queueing is discussed as an option at all. Queueing means extra latency and worse user experience (not to mention DoS potential).
What you should be discussing instead is how to (auto-)scale your app and infrastructure to handle your users' requests.
It’s hard to tell because there isn’t anything concrete provided.
That said, if this is a traditional web app, this smells to me of a poorly designed application. It sounds like they’re doing on-the-fly compilation of static assets or something crazy like that, and in any event they need to reduce the total number of requests per page or resource and look for opportunities to make things static or cached.
> Sometimes "thundering herd" is a feature, not a bug.
To expand: HTTP/1.1 naturally caused the latency of the Internet to pace requests. A sort of implicit intrinsic rate limiting. HTTP/2 intentionally avoids that "problem" by batching/pipelining requests.
Great article on approaching massive technical change. Honestly, I think a lot of people assume that most things are just a "switch flip". Even something as simple as implementing SSL on internal apps can cause a big change, let alone changing the underlying protocol for managing your requests.
Thanks for this, because honestly I hadn't thought about the implications myself and it'd be good not to accidentally walk into this problem.
Predicting spikes is actually somewhat difficult.
Say you have n clients, d timeslices. When does the probability exceed 0.5 that you get more than k requests concurrently?
Unfortunately the solution is exponential in k.
People might recognise the birthday problem here; for d=365, k=2 (days a year, single share) the well known answer is 23.
Wikipedia gives a formula for a rough approximation for n for p=0.5 and k < 20.
Isn’t it happening because the load balancer does not distribute the HTTP requests evenly? We used to use advanced load balancers that took the actual HTTP requests from a client and distributed them through a fixed number of TCP connections to the backend. Maybe HTTP/2 does not allow this style of load balancing?
The article is short on detail but it sounds to me like they are balancing their traffic by connection instead of by request. Either nginx or haproxy should be able to spread those multiplexed requests across a number of servers and give more the desired backend behavior.
Do you terminate http/2 at the load balancer and convert it to http1.1? Or do you support http/2 all the way to the end service?
I would imagine the former would solve these issues.
If your service can't gracefully handle a bunch of requests arriving at once, it doesn't matter whether it gets them as individual HTTP/1.1 requests or as multiplexed HTTP/2 requests. It only makes a difference if the bottleneck is the HTTP/1.1 parsing logic.
Not sure what the authors application is, but we run a dozen servers behind http/2 load balanced and dozens of sites and haven't seen anything similar to what he is describing.
Our application is Lucidchart (www.lucidchart.com). It is a very sophisticated web app with significant amounts of dynamic data, running on hundreds of servers. I would imagine applications with less dynamic data and fewer requests that require a substantial amount of compute wouldn't run into this problem.
What's wrong with that? Lucidchart is a fully-featured application that runs in a web browser with a cloud-backend. I don't understand what you're trying to argue. That you have to optimize for request count? Why?
There's an argument to be made that this [0] (generated via [1]) is not something to be celebrated. But that's just me, not necessarily what the parent poster had in mind.
This is what's known as 'moving the goalpost'. OP never mentioned that their issues with lucidchart were due to them using ad trackers and analytics libraries. They talked about how terrible it was that an application made a ton of requests. If they have an issue with the former, I take their point.
I was hoping to experience this as a lucidchart visualisation of "sweaty spikes" and "work spreaders" because of too many "Internet tubes". Oh well. Another time :D
The maturity of HTTP/2 is not a causative factor here. They removed a limit on the number of concurrent backend requests that they hadn't been aware of, which overflowed their backends. They could have experienced this same outage by simply removing that limit without enabling HTTP/2, and then hitting a peak demand period sufficient to cause the outage. Yes, HTTP/2 changes traffic patterns, but the issue could easily have occurred with HTTP/1.1 as well.
HN is aimed at an engineering audience, not a political one. I, and others like me (at least used to) come here for enlightenment, not rhetoric.
Moreover, the article itself is not argumentative. It didn’t need to be spiced up with a provocative title to be valuable.
It doesn't matter if you're writing an opinion piece, a political piece, or a technical piece, if you want to structure your writing in a way that can be easily read, understood, and followed, you want to obey the principles of rhetoric.
i.e. strict pedants with no comprehension of/tolerance for ambiguity or emotion?
Yes it did. HN title-fu is the same appeal to emotion as any political blog and is just as effective at surfacing stories, thanks to people just like you.
You came to this thread and made your noise for pedantic reasons and helped surface it higher so the correct audience could actually see it
Besides, I think it's quite reasonable to argue that it's a majority bad idea, even if it's not a universally bad idea. I think many people are probably in the same boat.
A better title would be "why turning on http/2 was a mistake (for us)." The current one implies failure due to http/2 itself, and not architectural decisions that lead to making the upgrade to http/2 not as easy as imagined.
Who’s right?
“Turning on HTTP/2 increased request burstiness, breaking our application”
If one enables HTTP/2 and production goes down, someone could quite rightly point out that "you performed action A, causing impact B, which broke the app". Determining in root cause analysis that impact B stemmed from underprovisioned peak demand compute resources in no way contradicts the usage of "broke".
I'm having some trouble picturing this. Can you add some numbers? Like, how many nodes is the load balancer spreading the load over, and how many simultaneous requests were you seeing from a browser?
Designing infrastructure for concurrent requests is definitely not. I've worked on shared hosting systems with high concurrency requirements and it definitely was more complicated than just installing an Apache MPM—we had to think about balancing load across multiple servers, whether virtualizing bare-metal machines into multiple VMs was worthwhile (in our case it was for a very site-specific reason), how many workers to run on each VM, how much memory we should expect to use for OS caching vs. application memory, how to trade off concurrent access to the same page vs. concurrent access to different pages vs. cold start of entirely new pages, whether dependencies like auth backends or SQL databases could handle concurrency and how much we needed to cache those, etc. At the end of the day you have a finite number of CPUs, a finite network pipe, and a finite amount of RAM. You can throw more money at many of these problems (although often not a SQL database) but you generally have a finite amount of money too.
I would be surprised if most people had the infrastructure to handle significantly increased concurrency, even at the same throughput, as their current load. It's not a sensible thing to invest infrastructure budget into, most of the time.
(You can, of course, solve this by developing software to actively limit concurrency. That's not a given for exactly the reasons that developing for concurrency is a given, and it sounds like Lucidchart didn't have that software and determined that switching back to HTTP/1.1 was as good as writing that software.)
Yes, most such cases should be rearchitected to not go through a single choke point. But my claim is that this isn't automatic merely by developing for the web, and going through a CP database system is a pretty standard choice for good reason.
In seriousness, though, I'm both curious and a little bit skeptical of what user experience benefit that architecture would give over a server-side request queue and a single worker against the queue. That would allow you to pay the cost of networking for the next request while the mainframe is working. You could even separate the submission of jobs from collecting the result so that a disconnected client could resume waiting for a response. Anyway, I'm not saying you needed all that to have a well-functioning system, I'm just not convinced that a single threaded architecture is ever actually good for the user unless it gives a marked reduction in overhead.
If you do expect the time to process requests to be multiple minutes in some cases, then you absolutely need a queue and some API for polling a request object to see if it's done yet. If you think that a request time over 30 seconds (including waiting for previous requests in flight) is a sign that something is broken, IMO user experience is improved if you spend engineering effort on making those things more resilient than building more distributed components that could themselves fail or at least make it harder to figure out where things broke.
Migrating to HTTP/2 delivers business value.
Responsive servers are more satisfying to customers than slow servers. That has business value.
Improving tangible metrics is not done to satisfy your emotions, it's done because of engineering rigor.
But to suddenly hit 50 for 1 second, then nothing for 9 seconds, well, that’s a tough spot to be in.
There must be some hard to find sequencing happening there, that they were not really exposed to before.
This is an extremely common issue with Apache configurations, which often default to accepting hundreds of simultaneous requests without regard for memory capacity. If peak demand causes enough workers to spin up that Apache starts swapping, the entire server effectively goes offline.
Depending on the specific characteristics of the application, this could occur when load increases from 50 concurrent requests to 51 concurrent requests, or from 200 to 201, or from any integer A to B where A was fine but B causes the server to become unresponsive.
Saying that their A is 1 seems unnecessarily dour, given how common this problem has been over the past couple decades due to Apache's defaults alone.
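The prefork tuning being described looks roughly like this (the numbers are illustrative, not recommendations; size MaxRequestWorkers to your measured per-worker memory so the total stays under physical RAM):

```apache
<IfModule mpm_prefork_module>
    StartServers             5
    MinSpareServers          5
    MaxSpareServers         10
    MaxRequestWorkers       64    # e.g. ~4 GB RAM / ~60 MB per worker
    MaxConnectionsPerChild 1000   # recycle workers to limit memory creep
</IfModule>
```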
"API server not fast enough, how do we fix it?"
"More threads! More connections!"
The problem, of course, is that HTTP/2 behaves like having infinite connections, so "more threads" on the server is almost always detrimental to performance.
Less is more is the mantra I have unsuccessfully tried to drill in. If your API (assuming a basic REST-like service) is running at 100% CPU utilization, you've likely over-provisioned its thread count.
For example, 200 threads with 200 connections to a single service is insane and likely causing you to be slow already. Increasing that will negatively impact performance.
Going from 16 -> 32? That's more reasonable.
The actual post seems perfectly reasonable though (essentially “you might think you can just turn on HTTP/2 as a drop-in on your load balancer, but if your server code hasn't been written to rapidly handle the quick bursts of requests that let HTTP/2 provide faster overall loads to the client, this can cause issues; you should test first and make sure your server systems are able to handle HTTP/2 request patterns”).
I appreciate when people share war stories; I like to think that wisdom is knowledge survived.
Presumably that's not the case for you?
A typical non-tuned Rails deployment, for instance, is gonna have queueing built in, with really not as much concurrency as one would want (enough to actually fully utilize the host; the opposite problem). So I'm guessing you aren't on Rails. :)
Curious what you are on, if you're willing to share, for how it effects concurrency defaults and options and affordances.
(I know full well that properly configuring/tuning this kind of concurrency is not trivial or one-size-fits all. And I am not at all surprised that http/2 changed the parameters disastrously, and appreciate your warning to pay attention to it. I think those who are thinking "it shouldn't matter" are under-experienced or misinformed.)
Sure. We use the Scala Play framework (https://www.playframework.com/). And it does have some queuing built in, but we have tweaked it to meet certain application needs.
Even then you would be handling more requests in parallel than the number of cores you have, but your concurrency would be limited by the cost of context switching and your memory capacity (having to allocate a sizable stack for each thread in most threading implementations).
Queueing is usually required for a stable multi-threaded server, but if you were doing async I/O you wouldn't need it. The extra memory overhead for each extra concurrent request (by means of lightweight coroutine stacks, callbacks or state machines) is not much different from the size it would take on the queue, and there is no preemptive context switching.
In most cases, you'll get the same behavior as having a queue here. Cooperative task-switching happens only on async I/O boundaries, so if you're processing a request that requires some CPU-heavy work, your application would just hog a core until it completes the request and then move to the next one.
It is not so easy. The article said it timed out (on the client).
When the queue is on the client side, the timer starts as soon as the request is first issued.
The speed observed by clients has to come from somewhere. In their case, they did not have a large reserve of performance to tap.
The problem is not making the pizzas in time, but trying to start all the pizzas at once when there is not enough table space to even roll out that much dough, and then trying to squeeze all the pizzas into the oven at once, whereby several of them got messed up.
The logic here is not dissimilar at all: if the backend has no ability to queue and prioritise the requests, then the same function needs to be done elsewhere to safeguard quality of service.
This is only true when you look at a single client. If you look at a larger number of clients accessing the service at the same time, you would expect similar numbers of concurrent requests on HTTP/2 as on HTTP/1.1. Clients send larger numbers of requests at the same time, but they are done sending requests earlier so there are requests from fewer clients being processed concurrently. It should average out.
If you have, say, a 1000 clients accessing your service in one minute, I doubt the number of requests/second would be very different between both protocol versions. It would only be an issue if the service was built with a small number of concurrent users in mind.
Under HTTP/1.1 requests may have been hitting the LB and then being scattered across a dozen machines. Each of those machines was in a position to respond on their own time scale. Some requests would get back quickly, others slowly, but still actively being handled.
Under HTTP/2 with multiplexing, if the LB isn't set up to handle it (and they often aren't) they can be hitting the LB and _all_ ending up on a single machine, which is trying to process them while some of those requests might be requiring more significant processor resources, dragging the response rate for all the requests down simultaneously.
According to the author “we do load balance by request, not connections and we do not use sticky sessions.” (Source: https://news.ycombinator.com/item?id=19722637)
But it didn't, unless you're saying that Lucidchart made an incorrect analysis. Is that your argument?
>Clients send larger numbers of requests at the same time, but they are done sending requests earlier so there are requests from fewer clients being processed concurrently. It should average out.
Again, it didn't average out. And you assume it 'will average out' at your peril. Maybe it will, maybe it won't. Lucidchart engineers thought that too and it turns out that was wrong in a way that wasn't foreseen.
>It would only be an issue if the service was built with a small number of concurrent users in mind.
I doubt Lucidchart 'was built with a small number of concurrent users in mind'.
This comment suggests otherwise: https://news.ycombinator.com/item?id=19722637
> all existing applications can be delivered without modification....
> The only observable differences will be improved performance and availability of new capabilities...
Lucidchart may have an inadequate backend, but it wasn't a problem until they moved to HTTP/2, so those statements weren't true for them. For anyone else rolling out HTTP/2, that is worth bearing in mind.
[1]: https://developers.google.com/web/fundamentals/performance/h...
The change in traffic patterns http/2 imposes was.
Hence the blog post.
Most would not think about the fact that your spikiness could increase 20x.
> And secondly, because with HTTP/2, the requests were all sent together—instead of staggered like they were with HTTP/1.1—so their start times were closer together, which meant they were all likely to time out.
No, browsers can pipeline requests (send the requests back-to-back, without first waiting for a response) in HTTP/1.1. The server has to send the responses in order, but it doesn't have to process them in that order if it is willing to buffer the later responses in the case of head-of-line blocking.
Honestly, over the long run, this is a feature, not a bug. The server and client can make better use of resources by not having a trivial CSS or JS request waiting on a request that's blocked on a slow DB call. Yes, you shouldn't overload your own server, but that's a matter of not trying to process a large flood all simultaneously. (Or, IDK, maybe do, and just let the OS scheduler deal with it.)
Also, if you don't want a ton of requests… don't have a gajillion CSS/JS/webfont requests for privacy-devouring ad networks? It takes 99 requests and 3.1 MB (before decompression) to load lucidchart.com.
> If you do queue requests, you should be careful not to process requests after the client has timed out waiting for a response
This is a real problem, but I've suffered through that plenty with synchronous HTTP/1.1 servers; a thread blocks, but it's still got other requests buffered, sometimes from that connection, sometimes from others. Good async frameworks can handle these better, but they typically require some form of cancellation, and my understanding is that that's notably absent from JavaScript & Go's async primitives.
Browsers can pipeline requests on http/1.1, but I don't think any of them actually do in today's world, at least that's what MDN says. [1] And from my recollection, very few browsers did pipelining prior to http/2 either -- the chances of running into something broken were much too high.
[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Connection...
1) The browser was sending a "streaming data follows" header flag followed by a 0-byte DATA packet in the HTTP/2 stream to work around an ancient SPDY/3 bug.
2) The load balancer was responding to the HTTP/2 "streaming data follows" header packet by activating pipelining to the HTTP/1.1 backend.
3) The backend was terminating the HTTP/1.1 connection from the load balancer with a pipelining-unsupported error.
The browser removed the workaround, the load balancer vendor removed the HTTP/2 frontend's ability to activate HTTP/1.1 pipelining, and after a few months we were able to proceed.
Diagnosing this took weeks of wireshark, source code browsing, and experimental testing. We were lucky that it broke so obviously that the proximity to enabling HTTP/2 was obvious.
On the other hand, a quick search found evidence of some very special HTTP servers doing bizarre things with HTTP: https://github.com/elastic/elasticsearch/issues/2665
[1] http://fasterfox.mozdev.org/
Go added cancellation support to the standard library at 1.7. I don't like its coupling with contexts, but the implementation is solid and supported throughout most blocking operations, so this statement is patently untrue for Go.
JavaScript really doesn't have a standard way for doing cancellation, which is a shame.
It's also curious to me that the load balancer doesn't smooth this out. If you have ten application servers and a client makes ten requests over a single HTTP/2 connection, I'd expect each server to respond to one request each. The details are a little fuzzy, but it sounds like the load balancer is only distributing connections, not requests. That seems wrong.
High CPU load should be fine, really, if your application servers are processing requests. If the load is unbalanced, then by definition you need a load balancer to balance the load. If you have one and the load is unbalanced, something is misconfigured.
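Per-request balancing is straightforward to sketch: the balancer terminates the client connection and picks a backend independently for each request, e.g. with a round-robin counter. A toy illustration (backend names are hypothetical):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin distributes each request (not each connection) across
// backends -- what you'd expect a layer-7 load balancer to do.
type roundRobin struct {
	n        uint64
	backends []string
}

func (r *roundRobin) next() string {
	i := atomic.AddUint64(&r.n, 1) - 1
	return r.backends[i%uint64(len(r.backends))]
}

func main() {
	rr := &roundRobin{backends: []string{"app1", "app2", "app3"}}
	// Ten requests arriving over one HTTP/2 connection still spread out.
	for i := 0; i < 10; i++ {
		fmt.Print(rr.next(), " ")
	}
	fmt.Println()
}
```

A balancer that only picks a backend at connection-accept time effectively runs this loop once per client, which is the failure mode being described.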
> How many requests does your page make on initial load (that can't be handled by a CDN)? If you're making more than six XHRs to your application servers concurrently, this sounds like a problem that would have existed anyway had it not been for the browser's (rather arbitrary) connection limit.
I don't know the exact number, but definitely higher than six. And I certainly agree that is a failing in our application. The point is that the browsers _did_ arbitrarily limit connections, and our application (unknowingly) depended on that.
As with anything involving concurrency and hard-to-predict-exactly usage patterns, it can easily get complicated.
Who else remembers the Heroku routing debacle of 2013? (https://blog.heroku.com/routing_performance_update)
This stuff ain't easy. Anyone who thinks this would only happen to an unusually "wrong" app, I think, hasn't looked into it seriously. This post was good information; it's unfortunate that so much of the discussion seems to be trying to shame the author (making it less likely that people will generously share such post-incident information in the future!).
It can also be affected a lot by what platform you are on, the language/framework, and the server architecture(s). They each have different request-handling concurrency patterns, built-in features, affordances, and standard practices. Node is gonna be really different from Rails. I am curious what platforms were involved here.
If the OP had been framed as a cautionary tale about how the devs did not realize their traffic patterns were throttling their requests, the reactions probably would have been more positive.
This is useful notice, and post-mortem. Because I agree some discussion around HTTP/2 seems to have the assumption that it will be basically a transparent freebie.
Some people just like to feel superior. shrug. I was hoping for more interesting discussion about HTTP request concurrency and queueing from those who had been in the trenches, which is what you get from HN technical posts at their best. Instead of a reddit-style battle over who was wrong and who is too smart to make that mistake, which is what you get from HN technical posts at their worst. :)
I'm not exactly blaming HTTP/2, just saying the claim that switching to HTTP/2 is easy, safe, and only brings benefits is false.
Eh. Technically anything could cause problems. I don't think you'll find much in the way of claims that swapping out subsystems could only bring benefits.
Ah, the good old "unrealised infrastructure dependency" - nice to see you, my old friend. People who have never been bitten by one of these have never built anything worth talking about :)
It's worth observing that Gatling (a load-testing tool) supports HTTP/2. Once you've got the hang of it, you can fairly easily build load profiles to simulate situations like these. It probably wouldn't have helped you prevent the situation - unknown-unknowns being what they are - but you might find it helpful during remediation.
They may be using sticky sessions or affinity in some regard, having the load balancer hold each client connection intentionally to a server. It's not necessarily wrong, entirely depends on what you need to accomplish.
This might not be such a problem with one client artificially limited to a single application server. But in practice, it means that individual servers will be overloaded when they are chosen to handle multiple clients concurrently (while other servers are idle).
A lot of people set up their load balancers with session pinning (i.e. always choose the same backend based on the session id). This can improve things like cache performance.
Not sure if this is the case here, but it sounds like it.
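Session pinning of this kind usually works by hashing some stable client attribute (IP, session cookie) to a fixed backend, the way nginx's `ip_hash` does. A minimal sketch with hypothetical backend names:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickBackend pins a client to one backend by hashing its address,
// the way ip_hash-style session affinity works in many load balancers.
func pickBackend(clientAddr string, backends []string) string {
	h := fnv.New32a()
	h.Write([]byte(clientAddr))
	return backends[h.Sum32()%uint32(len(backends))]
}

func main() {
	backends := []string{"app1", "app2", "app3"}
	// The same client address lands on the same backend every time,
	// which helps cache locality but concentrates load from busy clients.
	fmt.Println(pickBackend("203.0.113.7", backends))
	fmt.Println(pickBackend("203.0.113.7", backends))
}
```

The trade-off is exactly the one described above: determinism helps caches, but a few heavy clients can pile onto one backend while others sit idle.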
Did the ALB open a new TCP connection for each request, or does it use a pool of connections?
If so, I don't quite see why queueing is discussed as an option at all. Queueing means extra latency and worse user experience (not to mention DoS potential).
What you should be discussing instead is how to (auto-)scale your app and infrastructure to handle your users' requests.
That said, if this is a traditional web app, this smells to me like a poorly designed application. It sounds like they're doing on-the-fly compilation of static assets or something crazy like that; in any event, they need to reduce the total number of requests per page and look for opportunities to make things static or cached.
> Sometimes "thundering herd" is a feature, not a bug.
To expand: HTTP/1.1 naturally caused the latency of the Internet to pace requests -- a sort of implicit, intrinsic rate limiting. HTTP/2 intentionally avoids that "problem" by multiplexing requests.
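The pacing effect can be modeled as a semaphore: the browser's per-host connection cap (six in most browsers) bounds how many requests are in flight at once, and multiplexing removes that bound. A toy Go simulation (all numbers are illustrative, not from the article):

```go
package main

import (
	"fmt"
	"sync"
)

// maxInFlight reports the peak number of concurrently in-flight
// "requests" when total requests are gated by a semaphore of size limit.
func maxInFlight(total, limit int) int {
	sem := make(chan struct{}, limit)
	var mu sync.Mutex
	inFlight, peak := 0, 0
	var wg sync.WaitGroup
	for i := 0; i < total; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{} // acquire one of `limit` slots
			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()
			// ... handle the request ...
			mu.Lock()
			inFlight--
			mu.Unlock()
			<-sem // release
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println(maxInFlight(30, 6))  // never exceeds 6 (HTTP/1.1-like cap)
	fmt.Println(maxInFlight(30, 30)) // may reach 30 (HTTP/2-like multiplexing)
}
```

Servers sized against the first pattern can be swamped by the second, which is the whole story of the post.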
Thanks for this, because honestly I hadn't thought about the implications myself and it'd be good not to accidentally walk into this problem.
People might recognise the birthday problem here; for d=365, k=2 (days in a year, a single shared birthday) the well-known answer is 23.
Wikipedia gives a formula for a rough approximation for n for p=0.5 and k < 20.
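For k=2 and p=0.5, the standard approximation is n roughly sqrt(2d ln(1/(1-p))), which reproduces the classic answer. A quick check (function name is mine):

```go
package main

import (
	"fmt"
	"math"
)

// smallestN approximates the smallest group size n at which the
// probability of at least one shared birthday among d equally likely
// days reaches p, using n ≈ sqrt(2d · ln(1/(1-p))).
func smallestN(d int, p float64) int {
	return int(math.Ceil(math.Sqrt(2 * float64(d) * math.Log(1/(1-p)))))
}

func main() {
	fmt.Println(smallestN(365, 0.5)) // 23, the classic answer
}
```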
Our application is Lucidchart (www.lucidchart.com). It is a very sophisticated web app with significant amounts of dynamic data, running on hundreds of servers. I would imagine applications with less dynamic data and fewer requests requiring substantial compute wouldn't run into this problem.
On a decent connection, kinda. Anything mobile, or worse, mobile and moving, will suffer terribly.
[0]: https://i.imgur.com/LhEahvi.png
[1]: https://www.evidon.com/solutions/trackermap/
Second. Turning it on when it is not yet mature is.
I suspect you're just saying things to cover for the fact that you don't have anything meaningful to say.