StackExchange Performance Stats

(stackexchange.com)

125 points | by ksec 1 day ago

22 comments

  • f6v 1 day ago

    I’m underwhelmed by the absence of 139 microservices (written by 11 one-pizza teams), each with its own database instance.

    • qes 1 day ago

      I serve near StackOverflow levels of traffic. Similar # of page views and more than 5x the egress bandwidth. Also .Net.

      Also running a traditional monolith style system. One main SQL DB. Another for some logs. Redis. Blob storage. KISS.

      We have various miscellany and a dozen+ web apps for the public website, internal admin, content editing, monitoring, on-the-fly image resizing, etc., but even the APIs serving those live in one web app, and our high-traffic data & content service is just one app. Nearly everything is in one solution and repo.

      Tbf, we've been utilizing chromium more over the past year and I think we're going to have to split that out into its own service.

      We do a lot of data-driven image generation and used a couple .Net imaging libraries for that until last year. Since then we've been building new data-driven content solely in HTML/JS and screenshotting through chromium to generate static images (we serve the same content in animated HTML and JPG and used to build each separately).

      • Two-pizzas teams.

      • matt2000 1 day ago

        I love to read about how StackExchange does things, it's such a stark contrast to the cloud maximalists. I'm not saying there aren't advantages to each camp, it's just nice to have a solid example on the "vertical scaling can actually get you pretty damn far" side of things.

        • One day stack overflow will be offline for weeks and then people will have an answer to this.

          • zamadatix 1 day ago

            It's 20 servers, many of which are hot spares. What about self hosting that is going to take weeks and why hasn't it happened in the last decade?

        • kburman 1 day ago

          I have worked on a project that required double the infrastructure SO is using and rendered pages in > 200ms. Our DB was 1/4 the size of Stack Overflow's, but we didn't have even 0.1% of the content that Stack Overflow offers.

          • GordonS 1 day ago

            What stack were you using?

            The only time I've personally seen render times that high is with large, complex databases in bad need of indexes and/or restructuring, and with hybrid cloud stuff, where cloud services are using an on-prem API or database.

            • f6v 1 day ago

              Do live queries for frequently accessed data without caching and you can go into seconds to render a page. I once came across a Postgres full-text query without an index. The page took a minute to render, I kid you not. And that page was arguably crucial to internal operations.
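The caching point above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `slow_lookup` function standing in for a live database query; the class name `TTLCache` and the 60-second TTL are made up for the example and are not anything Stack Exchange is known to use.

```python
import time


class TTLCache:
    """Serve repeated lookups from memory until an entry is older than ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # function that performs the expensive query
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested without sleeping
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # fresh entry: no query issued
        value = self._loader(key)  # missing or expired: run the query once
        self._entries[key] = (now + self._ttl, value)
        return value


def slow_lookup(key):
    # Hypothetical stand-in for a live, uncached database query.
    return key.upper()


cache = TTLCache(slow_lookup, ttl_seconds=60)
print(cache.get("front-page"))  # runs slow_lookup
print(cache.get("front-page"))  # served from the cache
```

The trade-off is staleness: readers may see data up to `ttl_seconds` old, which is usually acceptable for frequently read, rarely changed pages.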

              • kburman 23 hours ago

                Rails, but I won't say Rails was in any way responsible for all this. It was poor design choices on our side.

                It was a large database due to unnecessary complexity. We did have indexes, but because of the poor design it was hard to get any significant performance gains from adding more.

            • Stack overflow is awesome

              Stack overflow runs on .net

              Stack overflow squeezes lots of performance out of very little hardware or cost

              Stack overflow doesn’t do that much

              Customers haven’t added feature after feature until it resembles a half dog half cow hybrid

              Stack overflow developers spend time working out how to scale for their specific use case rather than theoretical use cases

              I honestly believe that by sticking to a singular narrow focus stack overflow has achieved magnificent things. It isn’t because of .net, it is in spite of it.

              Well done stack overflow, but unless you can stop the business or the users from piling on extra requests or expanding the product, you cannot replicate it.

              • jmaygarden 1 day ago

                There’s really no need for all of the speculation in this thread. Jeff Atwood documented the StackOverflow development process as it happened on his blog. Here’s the first announcement from 2008:

                https://blog.codinghorror.com/introducing-stackoverflow-com/

                • atdt 1 day ago

                  At first I was surprised that they don't use edge caching: it seems that all requests get proxied to backend servers, which use just two Redis servers as a shared cache. A global internet site can't get good performance this way: RTT between NYC and Sydney is >200ms, so even if your application is lightning fast, it's going to feel slow to users in many parts of the world.

                  Digging around a bit, it seems that they do use Fastly for edge CDN: https://nickcraver.com/blog/2017/05/22/https-on-stack-overfl....

                  Still, it's an impressively lean stack.
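The round-trip argument in this comment can be made concrete with a rough lower bound. The sketch below assumes a fresh connection with no edge termination: one round trip for the TCP handshake, one for a TLS 1.3 handshake (two for TLS 1.2), and one for the HTTP request and response. The function name is hypothetical, and server processing time is ignored.

```python
def min_time_to_first_byte_ms(rtt_ms, tls_round_trips=1):
    """Lower bound on time to first byte for a brand-new HTTPS connection."""
    tcp_handshake = 1   # SYN / SYN-ACK before any data can flow
    http_exchange = 1   # request out, first response byte back
    return rtt_ms * (tcp_handshake + tls_round_trips + http_exchange)


# Using the >200 ms NYC-Sydney round trip quoted above as the input.
print(min_time_to_first_byte_ms(200))                     # TLS 1.3 -> 600
print(min_time_to_first_byte_ms(200, tls_round_trips=2))  # TLS 1.2 -> 800
```

Terminating TCP and TLS at a nearby edge node shortens the first two legs even when the HTML itself still has to come from the origin.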

                  • EE84M3i 1 day ago

                    Potentially they can't get much edge cache offload for HTML/API content because many people are logged in and/or they do page customization even for non-logged in sessions and pushing logic to the edge isn't worth it.

                    • atdt 1 day ago

                      Probably you're right!

                      A webpagetest from Sydney shows pretty dismal performance: https://www.webpagetest.org/result/200801_FJ_dd5f2a76d77bcaa...

                      ~2 seconds for first contentful paint. That's pretty dismal, IMO, and shows the limitation of their current setup.

                      • tenken 1 day ago

                        A 2 second wait for some impossibly hard IT issue resolution I've been struggling for hours/days against ..... Where do I sign up :D

                        • ksec 1 day ago

                          >That's pretty dismal, IMO, and shows the limitation of their current setup.

                          Well, they can't beat the speed of light. What could they do?

                          • lixtra 1 day ago

                            Cache their content on different continents and live with only eventual consistency. But as long as there is no competition, why should they complicate things.

                    • BossingAround 1 day ago

                      Wow, it's a bit surprising that they use C# under the hood. I wonder what was the cause for that decision; back when I started learning Java, C# was said to be the slower of the two.

                      Honestly, I never wanted to get into C# as much because typically, it also means getting into Windows (from personal experience, companies using C# are heavily invested in Windows, and I love using Linux for everything).

                      Do companies nowadays develop and deploy C# on Linux boxes?

                      Edit: Just a reaction to a number of responses, I know C# is now fully open source. That doesn't mean companies using C# develop using Linux and deploy on Linux. C# being open source means very little to me when all deployments are on a proprietary platform.

                      • GordonS 1 day ago

                        C# dev here, mainly working in enterprise size companies who are Microsoft shops. We usually deploy on Windows for on-prem services, and on Linux for cloud services (some on VMs, some on containers, some serverless).

                        I'm more of a Windows guy for desktop, but always prefer Linux for server. Nowadays dotnet is a first class citizen on Linux and MacOS, through dotnet core.

                        On performance, there has been a big focus on perf by the dotnet team over the past years - I'd be shocked if Java could beat dotnet on just about any metric. If you know what you're doing you can write allocation-free code, and you even have hardware intrinsics at your disposal - I recently ported a hashing algorithm to C# and got the performance pretty close to parity with native C code (almost memory speed).

                        C# doesn't get much love on HN, but IMO it's a truly fantastic language.

                        • cheerlessbog 1 day ago

                          Their recent blog post gives a flavor of some of the performance work in the platform:

                          https://devblogs.microsoft.com/dotnet/performance-improvemen...

                          • jeswin 1 day ago

                            > We usually deploy on Windows for on-prem services, and on Linux for cloud services.

                            Which makes a lot of sense. AWS Windows machines cost ~50% more compared to Linux boxes. I was quite surprised at Windows pricing actually; that made it an easy decision to make given that .Net core has been stable on Linux for a while now.

                          • easton 1 day ago

                            I think the major driver for it was that Joel Spolsky worked at Microsoft for a long time (among other things, he was a PM for Excel), and therefore knew the MS stack the best. For a long time, Stack Exchange ran on IIS + SQL Server, I don't know if that's true anymore.

                            • BossingAround 1 day ago

                              Ah! That makes perfect sense. Interesting, thanks for the comment.

                              • randompwd 1 day ago

                                Not the case. Spolsky was not a coder.

                                https://meta.stackexchange.com/a/122854

                                > The main reason, it was the development stack Jeff, Jarrod and Geoff knew best when they embarked on the mission.

                                Jeff Atwood comments on linked answer also.

                              • UglyToad 1 day ago

                                Yes, though being an enterprise-friendly language there's a bit of a lead time in most of the places using it, but .NET Core adoption on Linux is increasing.

                                Just this week I built and deployed a new C# .NET Core API running on a Microsoft-provided Ubuntu Docker image and it's now ticking along nicely in production.

                                I still develop on Windows because Visual Studio is, for my money, the best development environment available, but I almost never write new code to run on Windows. With Core, at publish time it's as simple as a command line flag to target Linux. If you don't want to use Windows for development then other OSes are just as viable for developing on; I developed my personal blog on Linux.

                                It's a shame that people (perhaps understandably) have this image of C# as a legacy environment when it's a language and environment that is constantly innovating and has been sitting around the top of some server benchmarks for quite a while.

                                • p_l 1 day ago

                                  Long long ago, they mentioned how it allowed them to use way less hardware - to quote, "MS licenses paid for itself in hw running costs".

                                  Features like AOT compilation, and the fact that SQL Server is a really powerful database engine that they already knew, helped. In fact, they knew the Windows platform very well, and did things like putting some critical paths directly on the HTTP.SYS kernel driver web server (IIS uses it as well, but builds extra functionality on top).

                                  Compare it with popular open source web stacks of the time - Python, Ruby, PHP, all with no real compilation support, slow or otherwise problematic (PHP's emulation of CGI environment, for example).

                                  But ultimately - they knew their tools well, and applied that knowledge to very good results :)

                                  • belltaco 1 day ago

                                    >back when I started learning Java, C# was said to be the slower of the two.

                                    Which year was that? Java apps and the JVM in my opinion and experience are very bloated and have a lot of memory consumption and leaks, which cause frequent and annoying GC slowdowns. I shudder to run large Java apps on company servers like the Atlassian stack(Jira, Confluence etc.) and Tableau Server.

                                    • oxfordmale 1 day ago

                                      There is a special circle of hell for the Atlassian stack. It is a classical example of legacy bloat.

                                    • giulianob 1 day ago

                                      C# has gone completely open source and cross platform. .NET Core was essentially a huge modernization of the entire runtime and ecosystem. It's becoming one of the most performant managed languages according to the TechEmpower benchmarks.

                                      You can also see just how much it's improving in some areas and the level of detail they're paying to it: https://devblogs.microsoft.com/dotnet/performance-improvemen...

                                      • alkonaut 1 day ago

                                        > C# was said to be the slower of the two.

                                        I don't think that was ever the case. Java VMs always had some clever optimizations due to being more mature, but if you used C# properly and used its value types effectively, you could beat idiomatic Java hands down, and the Java that ran equally fast was often bastardized C-looking Java that didn't use proper types but arrays. Basically, high-perf Java doesn't support AoS (array of structs), only SoA (struct of arrays), which may or may not be convenient.

                                        For a lot of workloads obviously the difference is minuscule, especially if a database or web server is involved (in this case, both).
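For readers unfamiliar with the AoS/SoA distinction in this comment, here is a small layout sketch. It is written in Python purely to show the two shapes of data; it makes no claim about Java or C# performance, and the `Point` type and sample values are invented for the example.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


# Array of structs (AoS): one container of whole records.
# Convenient to work with, but each record is a separate object.
points_aos = [Point(1.0, 2.0), Point(3.0, 4.0), Point(5.0, 6.0)]

# Struct of arrays (SoA): one packed primitive array per field.
# Less convenient, but each field is stored contiguously.
xs = array("d", (p.x for p in points_aos))
ys = array("d", (p.y for p in points_aos))


def sum_x_aos(points):
    return sum(p.x for p in points)


def sum_x_soa(x_values):
    return sum(x_values)


print(sum_x_aos(points_aos))  # 9.0
print(sum_x_soa(xs))          # 9.0
```

C# value types allow an array of structs to be stored inline, which is the convenience the comment says high-performance Java code had to give up.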

                                        • rockwotj 1 day ago

                                          Last I heard hudl.com runs on C# and linux using Mono.

                                        • rolls-reus 1 day ago

                                          Is stackoverflow essentially a monolith? I remember reading about the single database setup several years ago, I'm surprised that they have been able to keep scaling vertically.

                                          • I think that is the thesis of StackOverflow's devs. You can vertically scale easier and much further than what most people think.

                                            No ground-up re-architecture, no microservice complexity, no crazy high cloud bills for exotic high-power VMs. Just upfront capex for your own big iron and the skill to run it.

                                            • vii 1 day ago

                                              Amazon is offering 24TB RAM high memory machines on AWS. The Stackoverflow setup is far away from Big Iron :)

                                              In terms of opportunities to add complexity: I'm surprised there is no mention of a data-platform or any model training for ranking, etc.

                                              To reduce components, it might be possible to combine ElasticSearch and MSSQL into just PostgreSQL which has awesome text indices. As MSSQL performs very well and presents a somewhat esoteric SQL dialect this could be an expensive project.

                                              • zamadatix 1 day ago

                                                The specs seem to line up with their upgrade notes from 2015 so it was a bit more "big iron" when made. Not that it's exactly under stress at the moment by any means but that's just further proof big iron can get you really far.

                                          • vyrotek 1 day ago
                                            • heipei 1 day ago

                                              I've been patiently waiting on an update for this post and others on the Stackoverflow architecture and hardware on his blog. Nick, if you're reading this, we're dying for an updated post ;)

                                              • p_l 1 day ago

                                                If you compare the components listed on the performance stats, it seems to match the 2016 post pretty much 1:1. I wouldn't be surprised if they just updated the hardware a bit and that was it.

                                              • skunkworker 1 day ago

                                                This is all for a single site (location) though right? I wonder how many backup AZs they have and how they’re replicating across zones.

                                                It’s surprisingly less infrastructure than I was imagining.

                                                • p_l 1 day ago

                                                  Two datacenters. They used to have each site run in 2 not-completely-full server racks (or maybe 4, it's been a long time since I read the blog post about their switch gear upgrade to 10 gigabit).

                                                  AFAIK they still keep to two datacenters, and the second one is a backup. All of SO sites run from the same few racks.

                                                • ksec 1 day ago

                                                  I have always thought of StackExchange as the Gold Standard of CRUD Apps.

                                                  I would be happy if Ruby on Rails could do it in 3x the rendering time (i.e. slower) with 2x the resources / servers.

                                                  • alkonaut 1 day ago

                                                    I knew about most of their architecture, but what puzzles me is: why are some servers vertical boxes and some horizontal?

                                                    • Youden 1 day ago

                                                      It could be blade vs. regular rackmount but I think it's just a design choice since the other ways of arranging 9 web servers aren't as visually appealing.

                                                    • syspec 1 day ago

                                                      Didn't know stackexchange made such use of websockets, where are they using them?

                                                      • haneefmubarak 1 day ago

                                                        They use websockets for notifications (upvotes, comments, etc).

                                                      • So, no cloud, no AWS. All in house?

                                                        • redleggedfrog 1 day ago

                                                          I couldn't tell but are those CPU percentages real time?

                                                          • li4ick 1 day ago

                                                            I wonder about the future of Stackexchange, now that every community has its own subreddit/Discord/etc.

                                                            • Can_Not 21 hours ago

                                                              Those two things are terrible at the Q&A format, search, and SEO, so probably not much to wonder about.

                                                            • 11k peak SQL queries @ 15% peak CPU usage across how many cores/processors?

                                                              • recuter 1 day ago

                                                                300-450 rps for the web server tier seems very low. I wonder why that is.

                                                                • f6v 1 day ago

                                                                  9x300x60x60x24x30=6998400000
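The one-line multiplication above is a monthly request estimate. Spelled out, assuming the 9 web servers and the low end of the 300-450 requests per second per server mentioned in this thread, sustained around the clock for a 30-day month (which overstates real traffic, since load is not at that rate all day):

```python
web_servers = 9
requests_per_second_each = 300      # low end of the 300-450 rps range
seconds_per_day = 60 * 60 * 24      # 86,400
days_per_month = 30

requests_per_month = (
    web_servers * requests_per_second_each * seconds_per_day * days_per_month
)
print(f"{requests_per_month:,}")    # 6,998,400,000
```

So even the "low" per-server figure adds up to roughly 7 billion requests a month across the tier.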

                                                                  • recuter 1 day ago

                                                                    Sure, I meant I'm surprised one server handles so few requests per second each.

                                                                    • ksec 1 day ago

                                                                      In a blog post a few years ago they explained they could have fitted all 3000+ req/s on a single server, but its 99th percentile would have been ~100ms+. So in the name of performance and sub-20ms rendering time they decided to do it with more servers.

                                                                • jchook 1 day ago

                                                                  Surprised to see such low peak ops/sec on HAProxy

                                                                  • rogerdonut 1 day ago

                                                                    I'm curious to know what you mean by this. It's not like it can't handle more; that is just what their peak workload was. With only 18% CPU usage at peak, it's clear that it could handle much more.

                                                                    • jchook 1 day ago

                                                                      I guess I'm mostly curious how they handle websocket load balancing.

                                                                  • DonCopal 1 day ago

                                                                    Much performance! Yet they don't even have a button to select all of code in a codeblock.

                                                                    • bzb3 1 day ago

                                                                      They should rewrite everything in python so they need 10 times as much hardware. The economy is stagnating

                                                                      • brainzap 1 day ago

                                                                        What about the CDN?