10 comments

  • woeirua 1622 days ago
    This is a very inefficient implementation. Really, just poor quality work overall, as anyone with even a basic understanding of spatial indexing would know; an R-tree would be many times faster, as illustrated here: https://medium.com/@buckhx/unwinding-uber-s-most-efficient-s...
    • dang 1622 days ago
      This comment is unacceptable on Hacker News, and I'd like to explain in detail why. Please don't think I'm picking on you personally; we all react like this sometimes. Rather, I want to drive this point home to the community. It's important for discussion quality here!

      -----

      Please omit swipes like "Really, just poor quality work" and "anyone with even a basic understanding" from your posts to HN. It's great to add relevant information, such as an applicable data structure and a link to a good article on the same topic. But it's not great to put others and their work down, and HN has at least two guidelines that ask you not to:

      When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."

      Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something.

      Actually, a third guideline is relevant too:

      Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize.

      The strongest plausible interpretation is not that the engineers lacked a basic understanding of relevant CS—or, for that matter, how to use a search engine, since R-trees would pop up nearly any place you searched about this stuff. The strongest plausible interpretation is that they had some other reason for choosing the implementation they did. For example, perhaps it was efficient enough and they were smartly choosing not to build something more complicated than needed.

      https://news.ycombinator.com/newsguidelines.html

      Edit: and it turns out that the article explains why they didn't use R-trees.

      • deminature 1622 days ago
        It's disappointing to see a highly-upvoted comment from someone who didn't read the article. The article explicitly discusses why R-trees were not used:

        >Instead of indexing the geofences using R-tree or the complicated S2, we chose a simpler route based on the observation that Uber’s business model is city-centric; the business rules and the geofences used to define them are typically associated with a city. This allows us to organize the geofences into a two-level hierarchy where the first level is the city geofences (geofences defining city boundaries), and the second level is the geofences within each city.
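
        In code, the lookup they describe is roughly this (my own sketch of the idea, not Uber's code; the types and the ray-casting point-in-polygon test are just illustrative):

            package geofence

            type Point struct{ Lat, Lng float64 }

            type Geofence struct {
                ID      string
                Polygon []Point
            }

            type City struct {
                Boundary Geofence   // first level: the city boundary
                Fences   []Geofence // second level: geofences within that city
            }

            // FindGeofences does the two-level linear scan: find the city first,
            // then scan only that city's geofences.
            func FindGeofences(cities []City, p Point) []Geofence {
                var hits []Geofence
                for _, c := range cities {
                    if !contains(c.Boundary.Polygon, p) {
                        continue
                    }
                    for _, f := range c.Fences {
                        if contains(f.Polygon, p) {
                            hits = append(hits, f)
                        }
                    }
                }
                return hits
            }

            // contains is a standard ray-casting point-in-polygon test.
            func contains(poly []Point, p Point) bool {
                in := false
                for i, j := 0, len(poly)-1; i < len(poly); j, i = i, i+1 {
                    a, b := poly[i], poly[j]
                    if (a.Lng > p.Lng) != (b.Lng > p.Lng) &&
                        p.Lat < (b.Lat-a.Lat)*(p.Lng-a.Lng)/(b.Lng-a.Lng)+a.Lat {
                        in = !in
                    }
                }
                return in
            }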

        • woeirua 1622 days ago
          I did read the article before posting. They don't actually explain why they opted not to use an R-tree, except to say that they think this organizational structure makes sense intuitively. Unfortunately, intuition is wrong here: the link I posted specifically benchmarks their solution, and even basic benchmarking on their part would have shown this.

          Now there might be a valid argument for this approach if the geofences are changing constantly and reindexing the R-trees would take too long, but in the end they synchronize everything anyway, and the R-tree could easily be generated on another node, serialized, and then deserialized asynchronously before swapping to an updated tree.
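
          The swap itself is trivial in Go. A sketch, where Index and buildIndex stand in for whatever structure you actually build (R-tree or otherwise):

              package geoindex

              import "sync/atomic"

              // Index is a placeholder for an immutable, fully built spatial index.
              type Index struct{ /* nodes, geofences, ... */ }

              // buildIndex parses a geofence snapshot and bulk-loads the index;
              // it could just as well run on another node, with only the
              // serialized result shipped here. Body elided.
              func buildIndex(snapshot []byte) *Index { return &Index{} }

              var active atomic.Pointer[Index]

              // Refresh runs off the request path: build (or deserialize) a fresh
              // index, then publish it with a single atomic store.
              func Refresh(snapshot []byte) {
                  active.Store(buildIndex(snapshot))
              }

              // Current is called on the request path: a lock-free load of
              // whichever index was published last.
              func Current() *Index {
                  return active.Load()
              }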

        • icholy 1622 days ago
          An R-tree will easily outperform that approach.
          • xtomus 1622 days ago
            You can either ship a two-dimensional linear search approach today or ship an R-tree based approach in two weeks' time. I know which one I'm choosing if they both meet all of the requirements.
            • woeirua 1622 days ago
              Just about every modern language has a well established R-tree library...
            • icholy 1621 days ago
              I mean why stop there? Let's remove the database indexes too. Hell, why use a hashtable when you have lists amirite?
    • paggle 1622 days ago
      If you follow American football, your comment reads quite a bit like "Tom Brady has terrible throwing mechanics." Ultimately, it might be right, in some purely academic sense, but in a more relevant sense, good throwing mechanics are those that win you games, and good software implementations are those that meet the business requirements.
    • buckhx 1622 days ago
      Thanks for the shout out. If anyone has questions or comments feel free to reach out.

      Also I'm looking for work so if anyone is interested snag my email from my profile.

    • jorblumesea 1622 days ago
      It's funny because had you tried to brute-force any of their interview questions (spatial or not) it would have been an instant rejection. It often feels like algo analysis is an exercise to get a job and not something engineers actually do, even when required.
      • woeirua 1622 days ago
        The entire line of argument would have been immediately rejected in a phone screening or an on-site interview at any of the major companies, and even many lesser companies that deal with spatial data. Anyone doing spatial queries and not using existing spatial data structures had better have a really good reason for not doing so.
    • winrid 1622 days ago
      I've used RTrees for this too (In Java) and they're great.

      Hundreds of thousands of queries a second per cpu core on millions of data points.

    • _virtu 1622 days ago
    I'm currently implementing an R-tree in Elm. Does anyone have good references on R-trees that they like?
    • not_a_cop75 1622 days ago
      So am I to understand this shows a little bit about why Uber is not the center of excellence it needs to be?
    • thejigisup 1622 days ago
      clearly the most efficient solution is to pontificate about other people's implementations while never having done anything of the sort at that scale yourself.
      • benburleson 1622 days ago
        For future internet-searchers who stumble upon this, it might be helpful to point out that better solutions exist.
        • rutenspitz 1622 days ago
          It would be even more helpful to point out that the article talked about all of this.
      • jzoch 1622 days ago
        While he was a bit rude, I wouldn't assume he hasn't built something at this sort of scale.
        • pc86 1622 days ago
          The vast majority of programmers haven't built something at this sort of scale, so while it's very possible that he has, odds are that he hasn't.
        • thejigisup 1622 days ago
          nah i think its a pretty fair assumption
          • woeirua 1622 days ago
            Let's just ignore the fact that in this case scale has nothing to do with the actual efficiency of the algorithm they implemented. As the link I posted mentioned, had they used a more efficient algorithm they could have used far fewer resources, or used the same resources to scale up further, with much faster response times overall.
          • forgottenpass 1622 days ago
            99% of people building at "scale" are borrowing bragging rights from a large company they happen to be employed by and/or tools they use while doing zero novel work unique to the size of their deployment.

            To say that these paper tigers are above the bikeshedding of all the plebs is quintessential software echo-chamber thinking.

  • hobofan 1622 days ago
    > High performance in throughput and latency. In our main data center serving non-China traffic alone, this service handled a peak load of 170k QPS with 40 machines running at 35% CPU usage on NYE 2015. The response time was < 5 ms at 95th percentile, and < 50 ms at the 99th percentile.

    > Geofence lookups are required on every request from Uber’s mobile apps and must quickly (99th percentile < 100 milliseconds) answer

    For a 100ms total budget, a ~50ms 99th percentile for a single microservice doesn't sound like something to boast about.

    • bsaul 1622 days ago
      I don't read this the way you do. I read it as: their service had to answer in under 100ms, and they made it run in under 50ms.
    • foobarian 1622 days ago
      If that's 40 bare metal machines... assuming 60ish HT cores, and 35% CPU... that's not that earth shattering. We do that in Java :)
      • matt2000 1622 days ago
        My sense is that most people moving to Go are coming from Node and that explains their excitement about the performance. Not many upsides when you’re already on Java.
        • dashwav 1622 days ago
          I mean, there is a definite performance gain from Java to Go, although I would agree most people probably wouldn't hit it - but if it's just as easy to write, you might as well use the better language. On top of that, Go makes concurrency so much easier to factor in and program with, which in the backend server space is a very important asset over Java.

          As an addendum, I found this to be a really interesting resource when comparing languages at a very base level, as it is very well written and you can actually look at the source code / thesis papers for all of the implementations: https://github.com/ixy-languages/ixy-languages

          • matt2000 1622 days ago
            I hadn't seen the network driver comparison before, interesting. Go does really well here, maybe because this is the kind of job it was designed for originally? I'm not sure. For the kinds of things I generally write, this is usually a good way to compare something approximating real-world performance: https://www.techempower.com/benchmarks/#section=data-r18&hw=...

            C, Java, C# and Go all appear at the top there so it's basically a wash, +/-5%. I still think the main perception of Go as "the fastest option" on places like HN is because people are coming from some of the slower languages (Javascript, Ruby, etc).

            Footnote: I don't think there's anything wrong with using a slower language; a lot of them have higher productivity overall. I also just happen to prefer programming in Java, so it's overall the best choice for most jobs for me.

          • MrBuddyCasino 1622 days ago
            > I mean there is a definite performance gain from Java to Go

            I don't think that's true.

            • CamouflagedKiwi 1622 days ago
              There are dramatic memory and startup time improvements. Much less so for CPU once everything is up and running.
              • stefano 1622 days ago
                Is that still true after tuning the JVM GC for a low memory footprint? I'm genuinely asking; I'd like to see an article on the subject. I've read in the past that the default settings are geared towards long-running processes and trade off memory usage for higher throughput, as in general there's no free lunch with GCs. Better memory usage always implies worse performance (and vice versa), just like higher throughput increases pause times. If Go has lower memory usage and lower pause times, I'd expect it to have lower throughput than the JVM GC.
                • CamouflagedKiwi 1621 days ago
                  Yes.

                  Every object in Java has something like three words of overhead; also, since it doesn't have value semantics (yet), objects are typically allocated out-of-line, so an ArrayList makes a linear number of allocations, whereas a Go slice makes a constant number. Plus the binary sizes are typically much smaller; I'm not really an expert, but our observation at work is that non-trivial Java services allocate a lot of memory through classloading / JITing, whereas a Go binary will typically be very small.
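
                  A toy illustration of the allocation difference, on the Go side (not a benchmark, just the shape of it):

                      package main

                      import "fmt"

                      type Point struct{ Lat, Lng float64 }

                      func main() {
                          // One contiguous allocation; the million values live inline,
                          // so the GC tracks a single object regardless of length.
                          // The rough Java equivalent, ArrayList<Point>, boxes every
                          // element as its own heap object plus header overhead.
                          pts := make([]Point, 1_000_000)
                          pts[0] = Point{Lat: 40.7, Lng: -74.0}
                          fmt.Println(len(pts), pts[0])
                      }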

                  Basically, agreed on the GC tradeoff, but the higher memory footprint of Java mostly comes from other areas.

                  > Better memory usage always implies worse performance

                  That isn't really true. Better memory usage implies better cache-friendliness (and possibly better locality too).

                • DuskStar 1622 days ago
                  There are two more variables you're missing here besides "memory" and "pause times": how well matched to GC the language is, and how advanced/good the GC is. Some languages might just be fundamentally easier to garbage collect than others - a simple example might be that of a language with no references and no threading, where the GC is going to be extremely simple. And GC quality is certainly a thing, since I doubt major companies would spend tens/hundreds of millions of dollars producing GC 'advancements' that just move a slider back and forth.
            • erik_seaberg 1622 days ago
              The GC isn't very good yet, but brute-force sequential code can keep more state on the stack where it doesn't matter. (Just please codegen that instead of expecting everyone to read it.)
              • marsdepinski 1622 days ago
                Go's GC is better than most of the Java GCs except the new ZGC and Azul's proprietary one. JIT makes some Java code faster. Ultimately, it's more about programmer expertise in a language rather than the actual language.
                • daxfohl 1622 days ago
                  It's all relative. Go is optimized for pause times and beats all other mainstream runtimes for that. JVM GC is very configurable and can be optimized for throughput, pause times, heap size, fragmentation, or whatever you want, and can beat Go's for pretty much all of them except pause times. Both have very smart people working on them and the techniques are well established, so it's about tradeoffs. (An oldish post but I think still relevant: https://blog.plan99.net/modern-garbage-collection-911ef4f8bd...)

                  That said, Go's focus may be the right one for many use cases. On a high throughput service where you would assume you want to optimize for throughput, 100 ms pause times can wreak havoc because they're unpredictable and can cause work queues to explode and such. This isn't easily mitigated by load balancing. Whereas "less efficient" GC is at least predictable and you can just add a server to balance that extra work.

          • erik_seaberg 1622 days ago
            Go has a little syntactic sugar for the same green threads other languages have, but it's missing a lot of immutable and thread-safe data structures.
          • apta 1621 days ago
            > might as well use the better language

            Which would be Java in this case.

        • overcast 1622 days ago
          Why is it that every Java application I've encountered in my 20 years of corporate IT has been a performance pig? Is it just a case of bad developers making shit code? Is it difficult to make Java perform well, but when it does, it shines? I can't have accidentally interacted with ONLY the crappiest Java apps in my tenure.
          • geodel 1622 days ago
            I have had exactly the same experience. At this point I feel "Java programmer/developer" is a misnomer. Most of the time they are 'framework' fiddlers, be it J2EE, Spring, Spring Boot (nowadays), Vert.x and so on. When I talk about an HTTP server, the only frame of reference in their mind is Weblogic, Websphere, JBoss, Tomcat, etc. The stuff they work with is 80% auto-generated and they are quite proud of it.

            So the few applications I own have about 200 LOC of business logic and 10K LOC of Spring Boot fluff, along with some 60 jar files, and I am not sure what they really do.

          • asdfman123 1622 days ago
            It's possible to make bloated crap in any language.

            But enterprise programmers making internal software have little incentive to make their programs sleek and fast because 1) they have a captive user base who are forced to use their bad UIs, and 2) there aren't many users, so you don't really need to optimize for minimal server use.

            In the eyes of management, a good enterprise programmer can take a request from start to finish and fulfill a business requirement in a reasonable amount of time. Architecture, UI, future maintainability, speed, security... that's not really on their radar at all.

            There are plenty of managers who do not understand or respect the costs of feature creep and technical debt, so spending time refactoring looks useless and opaque to them. Developers have no incentive to write good code, and bad developers who can talk a good game get hired on, so the codebase spirals into an unmaintainable sea of crap.

            According to management, if you solve business problems, you're valuable to them. So if you care about good code, it's an uphill battle fighting for it against the other entrenched developers who really don't give a shit.

            These programs usually happen to be written in Java because there are tons of Java developers out there, which is great if you don't live in a tech hub.

            • overcast 1621 days ago
              This isn't even internal stuff; these are all paid commercial applications. Oracle is notorious for it, but most recently I've been dealing with SysAid (helpdesk software).
      • asdfman123 1622 days ago
        Yeah, boring programming languages are criminally underrated. For instance:

        > High developer productivity. Go typically takes just a few days for a C++, Java or Node.js developer to learn, and the code is easy to maintain. (Thanks to static typing, no more guessing and unpleasant surprises).

        This is why the rest of the world doesn't use JS for everything.

        > There is a lot of momentum behind Go at Uber, so if you’re passionate about Go as an expert or a beginner, we are hiring Go developers. Oh, the places you’ll Go!

        If they did it in Java, they wouldn't have to recruit for programmers quite so hard -- they could pull them out of a hat, and fairly easily get a few wizards with two decades' worth of experience in it.

        • sidlls 1622 days ago
          They also would have to sift through some really terrible programmers who have fully bought into the worst practices of enterprise-y OO development.
          • asdfman123 1622 days ago
            Is that really true? Honest question. Or is that just bias against things that are not new and fancy?
            • sidlls 1622 days ago
              It's really true, in my experience. I managed a team in a company that used Spring Boot for services and batch jobs and Python for data and analytics (including ML). My team had its feet in both worlds so I had to hire for that. My experience trying to find programmers who could code and engineer first and not be slaves to the Spring framework was quite terrible in this context.

              Another thing of note: the Python side of the team could regularly move across and help with bugs or issues in our services and jobs, but the reverse was rarely true. In the 2+ years I had that team, there was one person hired for the services side who could do it, and he preferred Go to either Java or Python.

              • shantly 1622 days ago
                I'm not sure what causes it—and I'm not even sure it's a bad thing; certainly it must not be hurting these folks' careers too much, and maybe it's even helping them—but there's a certain path for developers that ends up leaving them helpless outside whatever narrow ecosystem they've grabbed onto. Usually it's Java or some Microsoft thing.

                As someone who very much did not develop (as a person/programmer) that way, it seems baffling from the outside, but there's this whole world of programmers who work like that until they retire or are promoted into management. It seems bizarre to me but it must be working for them. Hell, they might even be the majority of all programmers. Bigcos, particularly the non-tech ones that nonetheless employ lots of developers, are full of such people.

                • asdfman123 1622 days ago
                  Well, I think the difference between them and Silicon Valley programmers is that they don't constantly learn new technologies outside of work. It's just a 40-hour-a-week job.

                  I try to learn new things, but I don't do much programming outside of work. I could learn Node.js, for instance... but I've got other things I want to do.

                  Is that kind of what you mean?

                  • shantly 1622 days ago
                    Nah, I'm a 40-hour-a-weeker myself, I'm not sure it's strongly correlated with that.

                    Some folks, you say "do you think you can do [thing you're basically familiar with] in [language and platform you're not]?" and get a "yeah, probably lemme check it out... cool, compiler and language support's installed, see a couple tutorials here, I'll poke around and get something doing [subset of thing you need] then get back to you on timeline" and very likely it works out fine.

                    Others, you get "uh, I do (Java, .net), I don't... I don't understand what this is, is it a JVM language? Is there a jar for it?" and it's not really worth pushing any harder. And maybe some of them are deflecting because they just can't be bothered (mad respect), but most seem genuinely nervous and out-of-their-element at the mere suggestion of doing anything but their Java or .net thing (usually it's one of those) they're used to.

                    Though, again, the latter seem to get along just fine, career-wise. It's just a different sort of path, I guess. Seems really weird to me, but there it is.

            • shantly 1622 days ago
              There has to be some reason that a language that benchmarks as well as Java does is used to build so many heavy-feeling, bloated programs that eat way more memory than seems reasonable, even from big "tech first" companies. Either lots of highly experienced Java programmers aren't using patterns suited to good performance, or the language actually sucks a lot more than benchmarks/theory indicate. I suspect it's mostly the former.

              It's been this way ever since I can recall. "Java's so fast you probably can't tell the difference most of the time". Well, OK, but, I can. So something's going on here.

            • soedirgo 1622 days ago
              This might be related.

              From pg: "...you could get smarter programmers to work on a Python project than you could to work on a Java project." [1]

              [1] http://www.paulgraham.com/pypar.html

            • pc86 1622 days ago
              More popular languages get more early-career developers, so they will naturally be predisposed to having objectively bad developers. You'll definitely find more bad Java devs than bad Haskell devs, but you'll also probably find a higher rate of bad devs in Java or PHP than you will in Haskell or F# or something.
            • zzbzq 1622 days ago
              Totally true: the whole reason half the popular programming languages exist is that their communities are overreacting to the bad/crazy things that happen in the other half of popular programming languages. The Java and C++ communities have a lot of bloated ideas, and the languages themselves lend themselves to that.
            • bborud 1622 days ago
              In my experience: yes, this is true. I’m not going to speculate about why, but I suspect this will eventually happen to Go as well when/if it becomes a more mainstream language.
            • earthboundkid 1622 days ago
              At my old company, we would do hiring screens by having people send us simple code exercises in whatever language they'd like. Once in a blue moon I'd see a good Java sample, but most of them were bloated crap.

              FWIW, the JavaScript samples were also pretty bad because it was before async/await, and they'd all get twisted into callback hell waiting for IO.

            • marcosdumay 1622 days ago
              Yes, hiring for mainstream languages has this cost. That's true regardless of the quality of the language or how many good people are available to work in it.
        • geodel 1622 days ago
          I agree about performance. Java would be totally able to do it.

          I feel this article is about as exciting as those Java articles describing a 'Hello World' HTTP server running under 256MB of memory as if it were earth-shattering.

        • MuffinFlavored 1622 days ago
          > This is why the rest of the world doesn't use JS for everything.

          What?

          • surfmike 1622 days ago
            I think they’re referring to static typing.
      • gmurad2 1622 days ago
        Don't sell the JVM short. Using modern concurrency models (e.g. Vertx), it will outperform Go in throughput and latency.
        • sgt 1622 days ago
          I've changed my opinion a bit about Vertx combined with Java.

          Personally I tried to push it for several years, but most Java developers (and I mean 95% in my particular case) seem to start resenting it over time.

          It's hard to write good services in Vertx, mostly due to its asynchronous model combined with Java's verbosity and boilerplate.

          Many teams have junior developers, and IMHO it's simply not safe to expect them to write production-grade services (albeit simple ones) mostly on their own. It can be done, but it drains the rest of the team. A more productive Java team would have used something like Spring Boot.

          Also, as with any technology, there are some gotchas... and with Vertx they are much harder to figure out. In the end we changed to another Java framework.

          • YawningAngel 1622 days ago
            I suspect that Loom Fibers are going to offer parity with golang in this area soon enough.
            • sgt 1621 days ago
              Loom seems very interesting. Do you know how far off they are currently?
              • YawningAngel 1621 days ago
                Nope. They 'work' in a loose sense (you can run production-grade software like Jetty on top of them) but I don't understand very much about the current state of the project or what would be required for it to ship in a regular JDK release.
              • lossolo 1621 days ago
                This is the newest update on Project Loom:

                https://www.youtube.com/watch?v=lIq-x_iI-kc

      • buckhx 1622 days ago
        As some other posts have pointed out, their algorithm was holding them back, not the language.
        • Scarbutt 1622 days ago
          Well, switching from nodejs to Go for this use case was probably already a big win even if the algorithm didn't change.
    • sethammons 1622 days ago
      Per machine, that is 4,250 rps (170k QPS across 40 machines). For any HTTP service we write in Go, that lines up with our general sizing estimates before we start work on a generic new service. After we start to hit the service with load, we start profiling and go from there. Sometimes it gets faster and sometimes it gets slower, but it is usually in the right ballpark.
    • Rapzid 1622 days ago
      With that kinda volume the 99th is not interesting at all. Should be looking at the 99.99th.
      • sokoloff 1622 days ago
        The latency to human user is what matters. If I get an answer in under 100ms as a user 99% of the time, I don’t care whether 100 or 100K times per day someone else hits that 1% worst case.
        • ses1984 1622 days ago
          It has less to do with your experience being 99% good enough and more to do with the size of the revenue opportunity that could be lost by messing up 1% of requests.
        • kevan 1622 days ago
          As an individual you don't care as long as you aren't in that 1%, but as a customer-obsessed service owner I absolutely care if a bad experience is happening 100k times per day. If you're operating at internet scale you need to look at both percentiles and absolute numbers to assess customer impact.

          Also keep in mind how percentiles compound when you have more than one service involved in serving a customer request. For example, let's say it takes 5 internal requests to serve an external customer request and each of those services measures latency SLAs at the 99th percentile. The customer request may only finish inside the SLA 95% of the time (0.99^5 ≈ 0.95).

        • scottlamb 1622 days ago
          > The latency to human user is what matters.

          Agreed, which is why the most surprising part of this article for me was this phrase: "In our main data center serving non-China traffic".

          The transit latency to the one main datacenter in the (non-China) world is a lot more significant than the server time here. (Over 50 ms just to cross the US; compare with their server-side latency numbers of 95%ile < 5 ms, 99%ile < 50 ms.) If they're serious about latency, their deployment is holding them back much more than their choice of programming language or algorithm.

          Or maybe the client is not the user's phone, but some other service running within the same datacenter?

      • sethammons 1622 days ago
        I would argue they should be considering max. Percentiles are good for understanding macro trends. But if every request is sacred, max needs to be monitored. Even at four nines, there are a couple dozen requests per second that are greater than what you are observing. How much greater? Who knows, if you are not observing max. Could be minutes in latency.
        • lclarkmichalek 1622 days ago
          Max is pretty horrible, as any comparison requires identical cohort sizes. P99.99 is much better if you want to go wild.

          Given that the service probably did not have a 100% success rate, and almost certainly had timeouts, the "max" would also likely be at the timeout.

          • sethammons 1622 days ago
            And that would be a good signal and worth investigating. Why timeouts? I run a few services doing billions of daily queries. I watch max in addition to percentiles.
          • hobofan 1622 days ago
            > and almost certainly had timeouts, the "max" would also likely be at the timeout

            You filter out the 504s, just the same way you would analyze it on a per-route basis for those metrics.

            • spookthesunset 1622 days ago
              Sure but now your "100th percentile" would be nothing but the timeout.

              Quantifying things like service latency isn't a one-size fits all thing. Every service has its nuances and use cases that make it more meaningful to measure 99%, 99.9%, 99.99% or something else.

              .... just don't measure average like I've seen naive junior devs do. Average latency is the worst of all metrics to use as it will include all the outliers at the very tippy top of the spectrum and basically render the metric meaningless.

  • scarejunba 1622 days ago
    There's a famous 'rebuttal' post to this here: https://medium.com/@buckhx/unwinding-uber-s-most-efficient-s...
    • twic 1622 days ago
      So Uber's algorithm takes 58756 ns/op, and the fastest non-brute-force algorithm took 471 ns/op. That's about 100 times faster.

      They used 40 machines to serve New Year's Eve traffic. Perhaps with a better algorithm, they could have got away with one.

      (Possibly not, because there's still a lot of HTTP and JSON munging work to be done, and the network card becomes a bottleneck at some point)

    • jeltz 1622 days ago
      Another rebuttal, which is less thorough due to the benchmark being very synthetic, but it still points out that Uber's numbers are nothing impressive.

      https://www.cybertec-postgresql.com/en/beating-uber-with-a-p...

      • Thaxll 1622 days ago
        This is a crap article tbh. They don't take any requirements and build a 5-minute PoC with no constraints.

        For instance, their demo doesn't update anything. Uber updates locations every second; I'd like to see how PG behaves when you rebuild the index thousands of times per second.

        • jeltz 1622 days ago
          No, that was not a requirement. This is not the database for the locations of cars, but the database with human-defined geofences, which probably only update a couple of times per day. The database is used to check, e.g., if a customer is currently at an airport or not. That is a very normal and simple GIS workload.
    • merb 1622 days ago
      Well, basically Uber's blog posts are really low quality, and from the outside their engineering decisions are kinda vague.

      Like the switch from Postgres to MySQL.

      I'm still clueless how you can have so much money and one of the biggest engineering teams, but still can't correctly engineer your stuff. I mean, everybody makes wrong decisions or errors in production code.

      • Slartie 1622 days ago
        From what I gathered from several HN comments by Uber insiders, the size of their engineering team is actually more likely to be one of the causes of bad engineering decisions. They seem to have too many engineers running around for too little actual work to be done, which results in those engineers coming up with stuff to do to keep themselves busy (and of course to ensure they appear busy and worth their money to their superiors).

        One of the best ways to create yourself some work is to not choose an already-proven path to a solution, but invent a new one just for the sake of inventing a new one. Of course that's not how this kind of thing is justified - the justification is usually "the proven path does not scale to our needs" or "by using a special approach adapted to our needs we can be more efficient" or "the proven path is too complex, we can get by with something simpler and easier to maintain". These might actually all be proper justifications; it's just that you should have some hard proof for such statements, like benchmark results from a comparison of different approaches. That part often gets skipped, which is actually ironic, because doing extensive evaluation and benchmarking and implementing different approaches before choosing one for production would serve quite well to create even more work to do.

        • asdfman123 1622 days ago
          Also, it's resume-driven development. Five years later they want to be the guy who can say, "Yeah, I was the original designer behind Kafka," but with a brand-new Uber technology substituted in.
        • triceratops 1622 days ago
          It might be sub-optimal for Uber but it's better for the industry as a whole to have many different implementations of the same type of solution.
      • jeltz 1622 days ago
        And none of their blog posts about their own geo stuff ever come close to answering the why. What is wrong with e.g. PostGIS or Elasticsearch? Why do they need to reinvent geo databases several times?
        • uber42163 1622 days ago
          Teams outside Infra aren't allowed persistent disk. Service authors get to pick from a small catalog of managed operational storage systems. Life would be a lot easier if a scalable Postgres were one of them, but it isn't. For the moment, the offerings with high scalability and good SLAs are all KV stores.

          We do have ES but it's operated for log search, not production critical paths.

        • loriverkutya 1622 days ago
          Too much time and not enough things to do.
          • drchickensalad 1622 days ago
            How does this happen?! They're so expensive
            • Kye 1622 days ago
              Give me twenty billion dollars to make a ride hailing app and I'll show you.

              More seriously: they raised tens of billions of dollars to make a ride hailing app and research self-driving cars. Throwing more money at a problem doesn't solve it faster, but it does pay for hiring people, and headcount is seen as a proxy for doing stuff.

              When a measure becomes a target, it ceases to be a good measure.

              https://en.wikipedia.org/wiki/Goodhart%27s_law

              Some big start-ups[0] solve this conundrum by investing in adjacent companies to find the solutions they're paid to find. Outsourcing! This still has limits because there's so much money flowing around right now. Tossing a few million at a company with hundreds still won't solve the problem faster.

              [0] Start-up definition for this post: a company that has taken funding but hasn't yet found a sustainable business model. That's how Uber is still a start-up with more money than most companies make in decades.

    • GrumpyNl 1622 days ago
      That's a great write-up. They should hire him on the spot.
      • buckhx 1622 days ago
        That'd be nice. For real though, I did meet with them afterwards, but it was basically the same as getting a referral, i.e. the same interview process, which I chose to forgo.
    • abgfm 1622 days ago
      Thank you for this :))
  • cryptozeus 1622 days ago
    I get all the criticism in the comments, but IMO kudos to the team for going after the problem with a new set of eyes and approaching it in a new language. At the end of the day you are under budget and have a scalable, fast microservice.
  • paggle 1622 days ago
    "For each lookup, we first find the desired city with a linear scan of all the city geofences, and then find the containing geofences within that city with another linear scan."

    Why wouldn't this be dogshit slow if they used all of the city geofences at once? I would think that first they would scan the country geofences, then the province geofences, then the city geofences, etc...

    • mbo 1622 days ago
      It's almost like they could have used some sort of specialised tree data structure where each tree node was assigned to some spatial region on the map, where successive levels were assigned to increasingly smaller regions.
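
      For anyone who hasn't met one, the shape of that idea looks roughly like this (a quadtree-flavoured sketch with made-up types; R-trees and S2, which the article mentions, are more sophisticated takes on the same idea):

          package spatial

          // Rect is an axis-aligned bounding box.
          type Rect struct{ MinLat, MinLng, MaxLat, MaxLng float64 }

          func (r Rect) contains(lat, lng float64) bool {
              return lat >= r.MinLat && lat <= r.MaxLat && lng >= r.MinLng && lng <= r.MaxLng
          }

          // Node covers a region; children split it into four smaller quadrants.
          type Node struct {
              Region   Rect
              FenceIDs []string // candidate geofences overlapping this region
              Children [4]*Node // nil at the leaves
          }

          // Candidates descends only into quadrants containing the query point,
          // so each level discards roughly three quarters of the space. The IDs
          // returned still need an exact point-in-polygon check.
          func (n *Node) Candidates(lat, lng float64) []string {
              if n == nil || !n.Region.contains(lat, lng) {
                  return nil
              }
              out := append([]string(nil), n.FenceIDs...)
              for _, c := range n.Children {
                  out = append(out, c.Candidates(lat, lng)...)
              }
              return out
          }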
  • rayvy 1622 days ago
    I think this write-up is very fair for a solid engineering team. Is it groundbreaking and eye-opening? Absolutely not; I'd say the most "hmm, I didn't know that" part of the entire thing was the part about R-trees and S2. Is that bad? Absolutely not. These guys did the work, logged their performance and are sharing their story.

    However - and I believe this is where the animosity in the comments is coming from - given the elitist (for lack of a better term) attitude of the engineering types at these orgs (think of the poster children of the Valley), this is pretty... lacking. I mean, the part about using the read/write lock on the second attempt and instead trying to go with an installed package just screams Node.js, and honestly made me chuckle. I guess Leetcoding and Production Engineering really are different things. I genuinely expected more.

    • 0xEFF 1622 days ago
      > I guess Leetcoding and Production Engineering really are different things.

      This is the key point.

      They delivered value to the business. That's the only thing that matters.

      • privateprofile 1622 days ago
        > They delivered value to the business. That's the only thing that matters.

        Except this is an engineering blog post, so the engineering part actually matters.

        And, as demonstrated in the article [1] linked around this discussion, there were better engineering approaches that would have been even "better for business", as being more efficient means lower costs per transaction and/or higher throughput.

        [1] https://medium.com/@buckhx/unwinding-uber-s-most-efficient-s...

        • elbear 1622 days ago
          It matters for engineers reading the article. It doesn't matter for the business. What they delivered was good enough. Could they have delivered a better solution? Yes.

          Should they have searched for a better solution instead of implementing the one they found? They could have spent some time researching, but you can always miss something. It's better to err on the side of delivering something now with a not-so-good solution than constantly searching for a better one.

          I say this as a software developer who is obsessed with efficiency. I'm starting to turn around and focus more on just delivering.

          • SilasX 1622 days ago
            >It matters for engineers reading the article. It doesn't matter for the business. What they delivered was good enough. Could they have delivered a better solution? Yes.

            The entire reason they posted the article is to brag about having accomplished something intelligently. It's relevant if their approach was actually not so intelligent.

            "We encountered a standard problem and applied standard solutions" is like "dog bites man". It's not what they were trying to say with the blog post.

            >Should they have searched for a better solution instead of implementing the one they found? They could have spent some time researching, but you can always miss something. [...] I'm starting to turn around and focus more on just delivering.

            I think the critics' point is that the efficient way was probably also cheaper than what they did, would have taken the same time to implement, and would have lower recurring costs. It would have just been a matter of using off-the-shelf tools and not reinventing the wheel because that wheel is "complicated" and "obviously our case is special". (Someone did benchmarks, and their case is not special.)

            You're right, there is a danger to what-if-ing everything and being stuck in decision paralysis. But the clear subtext is that they merit some kind of admiration for how well they did. If that subtext is wrong, it is worth pointing out.

      • ionforce 1622 days ago
        > They delivered value to the business. That's the only thing that matters.

        But this bit isn't true during the interview process.

        • vorpalhex 1622 days ago
          That's half of the interview process. The other half is convincing your future-peers you aren't a liability. Provide value, get along well, communicate decently.
      • tomtomtom777 1622 days ago
        > They delivered value to the business. That's the only thing that matters.

        Although hardware is relatively cheap, 170 QPS on 40 servers for this type of query is astonishingly horrible even from a Business/Product Engineering standpoint.

          Boasting about this as the "highest QPS" engineering achievement is just awkward. It may be better suited for an article on how throwing hardware at problems is cheaper than hiring engineers.

        • mkolodny 1622 days ago
          The article mentions 170k QPS, not 170.
      • spookthesunset 1622 days ago
        > They delivered value to the business.

        Sure, but could there have been ways to deliver that value with lower maintenance costs and shorter lead times?

        If all you care about is gross revenue and never focus on margins and cost of goods sold then you can justify any project as "adding value to the business".

  • gunta 1622 days ago
    It's sad to see these "moved from X language to Y language and it became faster!" posts every time.

    If the CPU is the biggest bottleneck, the best answer is not to optimize by moving to a lower-level language, but rather to invest in a different architecture like GPGPU or FPGA.

    For example, this paper shows a significant speed-up for a PIP (point-in-polygon) algorithm, going from 15 hours (CPU) to a mere 11 seconds (GPU) in task load time. https://pdfs.semanticscholar.org/1e51/e3c681e1afc908a41ac253...

    • justincormack 1622 days ago
      That is a high-throughput design, not a low-latency design, which is what they are optimising for here. It is a very different design space.
  • PunchTornado 1622 days ago
    To all the people saying "why not do it in Java / Java could have done it":

    Why not do it in a different language? Having Go and Node around is awesome because we can change languages. Don't you get bored using just a single language? I'd get so so so bored if I didn't change languages every year. Glad that Uber offers Go jobs.

    • jryan49 1622 days ago
      Usually when you're trying to ship something, the more boring (predictable) the parts, the better.
      • PunchTornado 1621 days ago
        But is that the only goal? Shouldn't you also enjoy life? I find that I enjoy my life more as a programmer when I change things.
    • jryan49 1621 days ago
      It's not, but I also have more fun not getting paged in the middle of the night because something is broken ;)

      It's a balance for sure. I know it's cool to hate on Java, but I actually like Java for its predictability and my deep knowledge of it, and when using IntelliJ it's a breeze for me to program in.