Django Async: What's new and what's next?

(deepsource.io)

148 points | by sanketsaurav 1350 days ago

11 comments

  • Bedon292 1350 days ago
    I love Django / Django Rest Framework and have used it for a long time, but we recently dumped it from a project in favor of FastAPI.

    There are just so many layers of magic in Django that it was becoming impossible for us to improve the performance to an acceptable level. We isolated the problems to serialization / deserialization. Going from DB -> Python object -> JSON response was taking far more time than anything else, and just moving over to FastAPI has gotten us a ~5x improvement in response time.

    I am excited to see where Django async goes though. It's something I had been looking forward to for a while now.

    • lmeyerov 1350 days ago
      We ended up with 2 python layers:

      -- Boring code - Business logic, CRUD, management, security, ...: django

      -- Perf: JWT services on another stack (GPU, Arrow streaming, ...)

      So stuff is either Boring Code or Performance Code. Async is great b/c now Boring Code can simply await Performance Code :) Boring Code gets predictability & general ecosystem, and Performance Code does wilder stuff where we don't worry about non-perf ecosystem stuff, just perf ecosystem oddballs. We've been systematically dropping node from our backend, where we tried to have it all, and IMO it's too much lift for most teams.

      • VWWHFSfQ 1350 days ago
        Similarly, we ended up doing the same. Boring CRUD/CMS stuff is all in Django. That's 90% of our codebase and by far the most important. Our "user scale" endpoints are all implemented in Lua in NGINX and just read/write to Redis; data changes go into SQS and are processed by Celery back in the Django app. It scales phenomenally well and we don't lose any of the great things about developing all of our core biz-critical stuff in Django.
      • simonw 1350 days ago
        "Async is great b/c now Boring Code can now simply await Performance Code" - that's really smart, I like that philosophy.
      • innomatics 1350 days ago
        I like this idea. Also I am looking at a separate GraphQL stack alongside Django for flexible access points.
    • stavros 1350 days ago
      • Bedon292 1350 days ago
        That is quite interesting. There are a lot of things like management, tests, and such that I love and miss from django. Going to have to really think about what I think of this.

        Edit: Although, now that I think about it a little more, it's not that surprising. Our initial tests did literally just define FastAPI schemas on top of our existing DB. The co-mingling while actually running is an interesting concept, though.

    • Nextgrid 1350 days ago
      Just FYI, for anyone reading this who has the same problem: try Serpy, which is a near drop-in replacement for the default DRF serializers. It might solve your performance problem without having to switch to a completely different API framework.
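
      For example, a minimal sketch of a Serpy serializer (model and fields invented):

          import serpy

          # Serpy reads plain attributes, so it works on Django model
          # instances without DRF's field introspection overhead.
          class BookSerializer(serpy.Serializer):  # hypothetical schema
              id = serpy.IntField()
              title = serpy.StrField()

          # Usage: BookSerializer(queryset, many=True).data -> list of dicts
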
      • neurostimulant 1350 days ago
        Thanks! I'll check it out. I'm also seeing pretty bad performance when deserializing complex objects.
      • Bedon292 1349 days ago
        We did look into it. But didn't end up going that route.
    • gonational 1350 days ago
      Django is obviously surpassed in raw performance for more basic applications and APIs, but there definitely isn’t a lot of “magic” in Django.
    • FridgeSeal 1350 days ago
      I recently wrote a Python API for work and used FastAPI. I want to like it, but it was doing so much magic behind the scenes that it ended up being frustrating to use and just got in my way, so I ended up dropping it in favour of using Starlette directly.
      • Bedon292 1349 days ago
        What didn't you like about it? Curious where our pain points will be. Only been using it about 6 weeks.
        • FridgeSeal 1349 days ago
          The way it tries to construct the return values kept getting in the way.

          I'd define the class and add it as the return value; I was manually instantiating the class and returning that, but it didn't like that and would constantly throw errors about it. I think Pydantic was the root cause there.

          The Depends functionality refused to inject my classes as well, but I was probably doing something wrong there...

          Dropping back to Starlette was good because it gave me everything I needed and got out of my way. I’ve still got everything fully typed and passing MyPy.

    • spanhandler 1350 days ago
      If your DB is Postgres and you can do everything you need to fetch the data in SQL, Postgres can output JSON directly. It’s pretty fast at it. Usually it’s not too hard to do this on a few performance-sensitive endpoints in a framework web project.
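
      For instance, a minimal sketch with psycopg2 (connection string, table, and columns are all invented): Postgres builds the JSON array server-side, and Python just hands the string through.

          import psycopg2

          conn = psycopg2.connect("dbname=app")  # hypothetical DSN

          def books_json():
              with conn.cursor() as cur:
                  # json_agg + row_to_json build the response body inside
                  # Postgres; ::text makes psycopg2 return the raw JSON
                  # string instead of parsing it back into Python objects.
                  cur.execute("""
                      SELECT coalesce(json_agg(row_to_json(b)), '[]')::text
                      FROM (SELECT id, title FROM books LIMIT 100) b
                  """)
                  return cur.fetchone()[0]
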
      • VWWHFSfQ 1350 days ago
        For performance-sensitive endpoints, Django just isn't the right tool. You can do a lot of optimizations in Django but in reality the WSGI/ASGI overhead and Django's request routing through middleware and view functions or CBVs is extremely slow. Is anyone handling 1,000 requests/second in their Django app without having to run 50 servers? The answer is no. If you're getting to the point where you're trying to figure out how to emit JSON from your database directly, then you've already lost. Django is exceptionally well suited to exactly what it was originally designed for: a content management system and "source-of-truth" for all of the business data in your application. High-velocity "user-scale" is better done in another service.
        • wwwwwwwww 1350 days ago
          Not just Django, but Python.
    • mjhea0 1350 days ago
      Interesting.

      Have any interest in expanding this into a blog post? I've been working on a similar post. Maybe we can compare notes. I'm at michael at testdriven dot io, if interested.

    • IgorPartola 1350 days ago
      Django Rest Framework has really slow serialization. After seeing it in action, I wrote my own simple serializer that I have been using quite a bit. Deserialization isn't even really needed: just feed the submitted JSON into vanilla Django forms. It works better anyways.
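
      A rough sketch of that forms-instead-of-serializers idea (form and fields invented):

          import json
          from django import forms
          from django.http import JsonResponse

          class BookForm(forms.Form):  # hypothetical form
              title = forms.CharField(max_length=200)
              pages = forms.IntegerField(min_value=1)

          def create_book(request):
              # Vanilla forms accept any dict-like data, so parsed JSON
              # slots in where a DRF deserializer would normally sit.
              form = BookForm(json.loads(request.body))
              if not form.is_valid():
                  return JsonResponse(form.errors, status=400)
              return JsonResponse(form.cleaned_data, status=201)
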
    • buttersbrian 1350 days ago
      What did you use in place of the DRF serialization to get from DB -> json response?
      • Bedon292 1350 days ago
        FastAPI uses Pydantic under it for python objects. And we have been tinkering with orjson for the actual json serialization, since it appears to be the winner in json serialization at the moment.
        • MapleWalnut 1350 days ago
          Why didn't you use Pydantic with Django if the DRF serializers were too slow?

          You can also skip the object serialization from the ORM and work with python dicts directly to significantly improve serialization performance from the database.
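
          For example, something along these lines (model invented) never builds model instances at all:

              from django.http import JsonResponse

              from myapp.models import Book  # hypothetical model

              def book_list(request):
                  # .values() yields plain dicts straight from the DB cursor,
                  # skipping model construction and DRF serializers entirely.
                  rows = list(Book.objects.values("id", "title")[:100])
                  return JsonResponse(rows, safe=False)
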

        • dec0dedab0de 1350 days ago
          Was there still a significant speedup using the standard library json module?

          For the DB requests, are you writing SQL directly, using a different ORM, or something like SQLAlchemy Core that makes SQL Pythonic without being an ORM?

          • Bedon292 1350 days ago
            Yeah, the main improvement was seen even before playing with orjson. I think it helped too, but I only started with it yesterday, so I haven't actually profiled the two side by side to get real numbers.

            And it uses SQLAlchemy under the hood, so you can use all of it. But if you want full async all the way down, you can just use Core and something like encode/databases for the DB access.

    • bredren 1350 days ago
      How detailed was the profiling on this? Reason I ask is I’ve faced this myself and had to spend a lot of time on both query and serializer optimization.
      • Bedon292 1350 days ago
        We used `silk` a lot to profile the app, and basically all the time was being spent inside Django somewhere between getting the data from the DB and spitting out the response. We would see things like 15ms in the DB but 250ms to actually create the response, on simple things. Some of our responses ran into multiple seconds (large amounts of data) but still only spent maybe 150ms in the DB. There were at least two weeks spent on and off trying to improve it before we finally decided we had to go somewhere else. And that's after having to redo some of our queries by hand because the ORM was doing something like 15 left joins.
        • dd82 1349 days ago
          You might be interested in https://hakibenita.com/django-rest-framework-slow. If you weren't able to update to a 3.x version that has https://github.com/django/django/commit/a2c31e12da272acc76f3..., this might have bit you pretty hard.
        • ldng 1350 days ago
          I'd be curious to know more about those 15 joins. Why do you think the ORM was doing those? And what DB are you using?
          • Bedon292 1349 days ago
            Basically just a complex permission model based on relationships, much better handled with a subquery. Mostly on us. I don't blame the ORM entirely, but it was more joins than necessary too.
            • ldng 1349 days ago
              I see. Would love to find an elegant way to use the PostgreSQL permissions system from Django; that would result in a great perf boost, no doubt.
    • mixmastamyk 1350 days ago
      So how does it get from DB --> JSON response? SQLAlchemy or dbapi?
      • Bedon292 1350 days ago
        Yeah, FastAPI uses SQLAlchemy under it. Along with pydantic to define schemas with typing. And then just started tinkering with orjson for the json serialization. Seems to be the fastest library at the moment.

        I have also been experimenting with encode/databases for async DB access. It still uses the SA Core functions, which is nice, but that means you don't get the nice relationship features SA has built in when it handles everything. At least not that I have found. However, it does allow for things like gets without relationships, updates of single records, and stuff like that quite nicely.
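
        For reference, the Core-only style looks roughly like this (DSN and table invented):

            import sqlalchemy as sa
            from databases import Database

            database = Database("postgresql://localhost/app")  # hypothetical DSN

            metadata = sa.MetaData()
            books = sa.Table(
                "books", metadata,
                sa.Column("id", sa.Integer, primary_key=True),
                sa.Column("title", sa.String),
            )

            async def get_book(book_id: int):
                # SQLAlchemy Core builds the SQL; databases executes it
                # asynchronously. No ORM objects, no relationship loading.
                # (Assumes `await database.connect()` ran at app startup.)
                query = books.select().where(books.c.id == book_id)
                return await database.fetch_one(query)
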

        • mixmastamyk 1350 days ago
          I see, thanks. Is it required to define models twice as this page seems to recommend?

          https://fastapi.tiangolo.com/tutorial/sql-databases/

          • takeda 1350 days ago
            FastAPI is database agnostic, although the tutorials talk about using SQLAlchemy (probably because it's the most popular).

            I am using asyncpg[1] (much more performant and provides close mapping to PostgreSQL, making it much easier to use its advanced features) through raw SQL statements without problems.

            [1] https://github.com/MagicStack/asyncpg
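
            A minimal sketch of that kind of setup (DSN, table, and route are invented):

                import asyncpg
                from fastapi import FastAPI

                app = FastAPI()

                @app.on_event("startup")
                async def startup():
                    # One shared pool per process; asyncpg speaks the Postgres
                    # protocol directly, no ORM in between.
                    app.state.pool = await asyncpg.create_pool(
                        dsn="postgresql://localhost/app")  # hypothetical DSN

                @app.get("/books/{book_id}")
                async def get_book(book_id: int):
                    async with app.state.pool.acquire() as conn:
                        row = await conn.fetchrow(
                            "SELECT id, title FROM books WHERE id = $1", book_id)
                        return dict(row) if row else {}
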

          • Bedon292 1350 days ago
            Yeah, you do end up defining everything more than once: once for SA, and then again for Pydantic. Create, Read, and Update may all be different Pydantic models as well; they define what comes in and out of the actual API. Your create request may not have the id field yet and may have some optional fields, and then the response has everything. And then an update may have everything as optional except the id. Only been using it a few weeks now, but liking it a lot so far.

            https://fastapi.tiangolo.com/tutorial/sql-databases/#create-...
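
            Condensed, the duplication looks something like this (fields invented; the SQLAlchemy table describing storage would be defined separately):

                from typing import Optional
                from pydantic import BaseModel

                # Pydantic models describe what crosses the API boundary.
                class BookCreate(BaseModel):   # request body: no id yet
                    title: str

                class BookRead(BookCreate):    # response: id always present
                    id: int

                class BookUpdate(BaseModel):   # update body: everything optional
                    title: Optional[str] = None
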

        • virtualmic 1350 days ago
          > FastAPI uses SQLAlchemy under it

          This is somewhat inaccurate. They use SQLAlchemy in the tutorial, but FastAPI is in no way tied to SQLAlchemy.

          • Bedon292 1349 days ago
            Valid point. Should have said "can use".
  • silviogutierrez 1350 days ago
    Great article. But I think this part may need a second look:

        If your views involve heavy-lifting calculations or long-running network calls to be done as part of the request path, it’s a great use case for using async views.
    
    That seems true for long-running network calls (IO). But for heavy-lifting calculations? I thought that was the canonical example of situations async won't improve. CPU bound and memory bound, after all.
    • ghostwriter 1350 days ago
      Perhaps they meant that heavy long-running calculations could be offloaded to a worker pool with the help of concurrent.futures and run_in_executor()

      - https://docs.python.org/3/library/concurrent.futures.html

      - https://docs.python.org/3/library/asyncio-eventloop.html#asy...
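
      For instance, a minimal sketch of that offloading pattern:

          import asyncio
          from concurrent.futures import ProcessPoolExecutor

          def crunch(n: int) -> int:
              # CPU-bound work; a separate process can't hold the GIL of
              # the event-loop process.
              return sum(i * i for i in range(n))

          async def handler():
              loop = asyncio.get_running_loop()
              with ProcessPoolExecutor() as pool:
                  # The coroutine suspends here; the event loop keeps serving
                  # other tasks while the worker process computes.
                  return await loop.run_in_executor(pool, crunch, 10_000_000)

          print(asyncio.run(handler()))

      (In a real server the pool would be created once at startup, not per request.)
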

      • pdonis 1350 days ago
        This will only help if the workers are separate processes. Thread workers will hold the GIL in Python and prevent network I/O while they are doing CPU bound tasks.
        • dr_zoidberg 1350 days ago
          > Thread workers will hold the GIL in Python and prevent network I/O while they are doing CPU bound tasks.

          Using cython:

              with nogil:
                  # whatever you need to do, as long as it
                  # doesn't touch a python object
          
          If you're doing heavy calculations from python you should at least be considering cython.
        • ghostwriter 1350 days ago
          Sure, the pool only cares about the concurrent.futures.Executor interface; an implementation could be processes or cloud resources.
    • Znafon 1350 days ago
      You are correct, async will only help for long-running network calls, which happens when calling another service or querying a database.

      When doing a long computation the CPU is not idle so there is no free compute power to use for something else.

      Finally, when doing IO calls in Python the GIL is usually released, so the kernel can already schedule another thread while waiting for IO; it is therefore not certain that converting to async will yield an improvement, and you should benchmark if you plan on converting an existing program.

      • pdonis 1350 days ago
        > when doing IO calls in Python the GIL is usually released, so the kernel can already schedule another thread while waiting for IO

        This is true, but scheduling another thread through the kernel can have higher overhead since it requires context switches. Running multiple threads also has other potential issues with lock contention; how problematic they are will depend on the use case.

        The potential advantage of scheduling another thread is, of course, that it can do CPU bound work; but in Python, unfortunately, doing that means the GIL doesn't get released so that thread will prevent any further network I/O while it's running, the same as would happen in an async framework if a worker did a lot of CPU work. So Python doesn't really let you realize the advantages of threads in this context.

        • Znafon 1350 days ago
          > doing that means the GIL doesn't get released so that thread will prevent any further network I/O while it's running, the same as would happen in an async framework if a worker did a lot of CPU work. So Python doesn't really let you realize the advantages of threads in this context.

          I don't think that's true; the GIL gets released for many computing-intensive or IO-bound tasks in Python. For example, when reading from a socket the GIL gets released at https://github.com/python/cpython/blob/e822e37946f27c09953bb...

          • pdonis 1350 days ago
            > the GIL gets released for many computing-intensive or IO-bound tasks in Python

            I/O bound, yes, since that requires system calls, and system calls (reading from a socket is an example of a system call) release the GIL.

            Computing intensive, no. Code that is doing a CPU intensive computation but makes no system calls will never release the GIL.

            • Znafon 1350 days ago
              > Computing intensive, no. Code that is doing a CPU intensive computation but makes no system calls will never release the GIL.

              Any code that does not involve Python objects can release the GIL, no matter whether it makes a system call or not.

              For example, NumPy, the most popular scientific computation package in Python, on which many other popular packages like Pandas are based, releases the GIL when doing operations on matrices. This is documented at https://numpy.org/doc/stable/reference/internals.code-explan...:

              > If NPY_ALLOW_THREADS is defined during compilation, then as long as no object arrays are involved, the Python Global Interpreter Lock (GIL) is released prior to calling the loops. It is re-acquired if necessary to handle error conditions.

              To do so it uses the same macro used by the socket module when doing system calls: https://github.com/numpy/numpy/blob/18a6e3e505ee416ddfc617f3...

              • pdonis 1349 days ago
                > Any code that does not involve Python objects

                And does not involve running Python bytecode. Yes, numpy and other packages that provide C extensions do this when they are doing computations that don't require running Python bytecode.

                > no matter whether it makes a system call or not

                Yes, you're right, my statement was too broad.

            • jashmatthews 1350 days ago
              The GIL gets released whenever it gets released. C extensions like zlib release the GIL while (de)compressing requests/responses.

              https://docs.python.org/3/c-api/init.html#releasing-the-gil-...

              • pdonis 1349 days ago
                > C extensions like zlib release the GIL while (de)compressing requests/responses

                Yes, you're right, my statement was too broad.

          • colinmhayes 1350 days ago
            GIL gets released on blocking calls. CPU intensive is synonymous with non-blocking calls.
        • zzzeek 1350 days ago
          There is an advantage to threads in the CPU-bound case, which is that the work of other threads will not be blocked by a CPU-intense operation. With an IO-event based scheduler, your CPU-bound task will not context switch, causing network logic elsewhere to simply time out. A particularly acute example is a network library logging into the MySQL database, which gives the client a ten-second window to respond to the initial security challenge. It was both an extremely difficult bug for me to diagnose and helpful for my role at work that I was able to track that one down in Openstack :).
    • dec0dedab0de 1350 days ago
      I thought the only reason to use ASGI is to use web sockets, and the only reason to use web sockets is to avoid making multiple requests for things that don't matter if a particular message is lost.
  • abledon 1350 days ago
    What's the most elegant way for cutting-edge Django to do websockets? Is it still to 'tack on' the channels package [0]?

    Compared to FastAPI[1], I really don't want to use it; I only miss the ORM, since in FastAPI it looks like you have to manually write the code to insert stuff[2].

    [0] https://realpython.com/getting-started-with-django-channels/

    [1] https://fastapi.tiangolo.com/advanced/websockets/#create-a-w...

    [2] https://fastapi.tiangolo.com/tutorial/sql-databases/#create-...

    • scrollaway 1350 days ago
      As someone who's done over a decade of Django work: Do use FastAPI, especially if you need websockets and such.

      Django is great for CRUD apps, MVPs and such. And I've used it with success for larger platforms, but it doesn't take long for me to want something closer to the metal whenever I need custom work. FastAPI has filled that need wonderfully well.

      I also miss the ORM though… SQLAlchemy is a pain.

    • tln 1350 days ago
      I rolled a microservice in node to handle websockets from our Django app. I can push a notification into an event queue, make an HTTP call to send a notification, and even use postgres notify/listen.

      FWIW I'd have just used Pusher.io and triggered via django signals but I didn't want clients to be able to query the list of channels, or have to think about costs.

      This was more of a Friday hack for fun rather than a core product requirement for profit. But I feel like using a separately deployed and scaled service, agnostic of business logic, was the right move.

  • hyuuu 1350 days ago
    Time and time again, whenever I start a new project, Django has always been my go-to choice after analyzing the alternatives. I've worked on everything from large-scale, mono-repo, billion-user systems to side projects over the weekend; Django really stays true to the batteries-included philosophy.
  • dec0dedab0de 1350 days ago
    I think the article and some of the comments are not really looking at this the right way.

    For most things you're probably better off "doing the work" in a Celery task, regardless of whether it is IO bound or CPU bound. Then use web sockets just for your status updates/progress bar, instead of having your front end poll on a timer.
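
    A rough sketch of that split (broker URL and helper functions are hypothetical):

        from celery import Celery

        app = Celery("tasks", broker="redis://localhost")  # hypothetical broker

        @app.task(bind=True)
        def crunch_report(self, report_id):
            chunks = load_chunks(report_id)  # hypothetical helper
            for i, chunk in enumerate(chunks):
                process(chunk)  # hypothetical helper
                # Publish progress; a websocket consumer can relay this to
                # the browser, so the front end never polls on a timer.
                self.update_state(state="PROGRESS",
                                  meta={"done": i + 1, "total": len(chunks)})
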

    • emptysea 1350 days ago
      The downside of using web sockets is they complicate deployment and reliable delivery is more difficult than `/status?since=$pk`
  • honkycat 1350 days ago
    I love django, have not used it in years though. I've been in JavaScript land.

    I'm consistently surprised that there are not awesome web frameworks in JavaScript similar to Django

    • darkhorse13 1350 days ago
      Many have tried, with mixed to poor results. I am curious to understand why one hasn't emerged as a clear winner like Rails/Django/Laravel.
    • anaganisk 1350 days ago
      I recently stumbled upon Keystone and Strapi; they seem like potential contenders.
      • midrus 1350 days ago
        I'm also coming from Django. I've found Keystone to fit my brain a lot better than most alternatives. I tried Strapi and it looks good for basic stuff, but the documentation is just absolutely terrible, and once you get out of the really basic stuff you're on your own, digging into their source code to understand how to do anything. Nonetheless, it looks promising, and maybe in a few years it could be something more interesting (to me at least).
        • midrus 1350 days ago
          Oh, and by the way, the reason I'm not using keystone either is because it doesn't support sqlite (for real).
    • 3131s 1350 days ago
      Check out Adonis, although it's not as complete as Django.
  • djstein 1350 days ago
    I’ve seen lots of blog posts, even the Django docs, saying async is available but still haven’t seen any real world examples yet. Do any exist?

    Also, I still haven't seen how the async addition will work with Class Based Views, and Django Rest Framework is still considering whether to spend time on support. Until these two use cases are viable, many users won't benefit.

  • kissgyorgy 1350 days ago
    > If your views involve heavy-lifting calculations ...

    Nooo, not at all. Your tasks should be I/O bound, not CPU bound, to take advantage of asyncio. Maybe the async server could use multiple threads with multiple event loops, but don't ever do a CPU-heavy task in an event loop, because then you've invalidated using asyncio completely.

  • IgorPartola 1350 days ago
    Looking at the async view example, at what point can we just drop async/await keywords and just have Python assume that everything is asynchronous?
    • anaganisk 1350 days ago
      The JavaScript ecosystem went out of its way to bring in the async/await keywords, despite Node.js already being asynchronous via callbacks and promises. The argument was code readability, while async/await is just a wrapper around the Promise system.
  • leafboi 1350 days ago
    Wasn't there an article about how the async syntax was benchmarked to actually be slower than the traditional way of using threads? What's the current story on python async?

    reference: http://calpaterson.com/async-python-is-not-faster.html

    • bob1029 1350 days ago
      I think the story with async is always "it depends", unless we are questioning whether the specific implementation is broken.

      For some web applications, it might actually be faster (in meaningful aggregate volume) to service a complete request on the calling thread rather than deferring to the thread pool periodically throughout execution. I think the break-even point between sync and async comes down to how much I/O (database) work is involved in satisfying the request. If each request only hits the database 1-2 times on average, incurring a few milliseconds of added latency, going sync all the way down might be better than any amount of added context switching. If each request may take 100-1000 milliseconds to complete overall due to various long-running I/O operations, then async is certainly one good approach for maximizing the number of possible concurrent requests.

      In most of my applications (C#/.NET Core) I default to async/await for backend service methods, because 9/10 times I am going to the database multiple times for something and I cannot always guarantee that it will return quickly under heavy load. For other items, I explicitly go wide on parallelizable CPU-bound tasks. All of these are handled as a blocking call against a Parallel.ForEach(). Never would a CPU-bound task be explicitly wrapped with async/await, but one may be included as part of a larger asynchronous operation.

      This stuff used to confuse the hell out of me, and then I finally wrapped my head around the 2 essential code abstractions: async/await for I/O, Parallel.For() (et al.) for CPU-bound tasks which have parallelism opportunities. Never try to Task.Run or async/await your way out of something that is CPU-bound and is blocking the flow of execution. Try to leverage asynchrony responsibly when delays >1ms are possible in large concurrent volumes.

    • pdonis 1350 days ago
      The "slower" is not really the problem--as the article notes, the sync frameworks it tested have most of the heavy lifting being done in native C code, not Python bytecode, whereas the async frameworks are all pure Python. Pure Python is always going to be slower than native C code. I'm actually surprised that the pure Python async frameworks managed to do as well as they did in throughput. But of course this issue can be solved by coding async frameworks in C and exposing the necessary Python API using bindings, the same way the sync frameworks do now. So the comparison of throughput isn't really fair.

      The real issue, as the article notes, is latency variation. Because async frameworks rely on cooperative multitasking, there is no way for the event loop to preempt a worker that is taking too long in order to maintain reasonable latency for other requests.

      There is one thing I wonder about with this article, though. The article says each worker is making a database query. How is that being done? If it's being done over a network, that worker should yield back to the event loop while it's waiting for the network I/O to complete. If it's being done via a database on the local machine, and the communication with that database is not being done by something like Unix sockets, but by direct calls into a database library, then that's obviously going to cause latency problems because the worker can't yield during the database call. The obvious way to fix that is to have the local database server exposed via socket instead of direct library calls.

      • leafboi 1350 days ago
        >whereas the async frameworks are all pure Python.

        No, it's not pure Python; it's a combination. The underlying event loop uses libuv, the C library that makes up the core of Node.js. The mention of "Uvicorn" is an indicator of this, as Uvicorn uses libuv (via uvloop).

        Overall the benchmark is testing a bit of both: the event loop runs in C, but it has to execute a bit of Python code when handling each request.

        >If it's being done via a database on the local machine, and the communication with that database is not being done by something like Unix sockets, but by direct calls into a database library, then that's obviously going to cause latency problems because the worker can't yield during the database call.

        I am almost positive it is being done with some form of non blocking sockets. The only other way to do this without sockets is to write to file and read from file.

        There are no "direct library calls", as the database server exists as a separate process from the server process. Here's what occurs:

          1. Server makes a socket connection to the database.
          2. Server sends a request to the database.
          3. Database receives the request and reads from the database file.
          4. Database sends the information back to the server.
        
        Any library call you're thinking of here is probably a "client side" library, meaning that the library actually makes a socket connection to the SQL server.
        • pdonis 1350 days ago
          > I am almost positive it is being done with some form of non blocking sockets.

          Database libraries in Python that support this (as opposed to blocking, synchronous sockets, which are of course common) are pretty thin on the ground. That's why I would have liked to see more details in the article about exactly how the benchmark was doing the database queries.

          > There are no "direct library calls", as the database server exists as a separate process from the server process.

          Yes, you're right, I wasn't being very clear. The key question is, as above, whether nonblocking sockets are being used or not.

    • toxik 1350 days ago
      None of the other replies acknowledge this, but it seems you are conflating concurrency and asynchrony. An asynchronous program can be executed sequentially; it is a distinct concept.
      • leafboi 1350 days ago
        The async/await implementation in Python and basic Python threading are both concurrent under IO. No conflation.
    • reticents 1350 days ago
      Wow, thank you for this link. I appreciate it when my assumptions are challenged like this, particularly given the fact that I have a tendency to take benchmark synopses like FastAPI's [1] for granted. I'll have to be more conscious of the ways in which authors hamstring the competition to game their results.

      [1] https://fastapi.tiangolo.com/benchmarks/

    • tomnipotent 1350 days ago
      The built-in event loop has meh performance, would love to see the benchmarks re-run using libuv - that would help close some of the gap.
      • leafboi 1350 days ago
        They max out the speed with the tests. It does use libuv; Uvicorn is the indicator, as it uses libuv underneath.

        If you've heard of Gunicorn, Uvicorn is the version of Gunicorn with libuv, hence the name.

      • calpaterson 1350 days ago
        Hi, I am the author of the above article. Libuv was used - for example by the uvicorn-based versions.
    • syndacks 1350 days ago
      Why is this being downvoted? Seems like a fair counter-point to me.
      • ghostwriter 1350 days ago
        I didn't downvote it, but apart from the fact that async io is not meant to be faster (it's all about throughput, after all), the benchmark is flawed and it's been discussed in full before https://news.ycombinator.com/item?id=23496994
        • leafboi 1350 days ago
          asyncio is meant to be "faster" for IO heavy tasks and low compute. The benchmark tests requests per second which is indeed directly testing what you expect it to test.

          It's been discussed before but the outcome of that discussion (in the link you brought up) was divided. Highly highly divided. There was no conclusion and it is not clear whether the benchmark was flawed.

          The discussion is also littered with people who don't understand why async is fast for only certain types of things and slow for others. It's also littered with assumptions that the test focused on compute rather than IO which is very very evidently not the case.

          • ghostwriter 1349 days ago
            > asyncio is meant to be "faster" for IO heavy tasks and low compute.

            the point is that it's not meant to be any faster than a parallel pool of processes that perform the same heavy IO without blocking all requesting clients. asyncio is about packing as many concurrent socket interactions into a single process as possible, hence optimising for throughput by giving up the speed that gets eaten up by task context-switching. Hence the flaw in the benchmark. The benchmark was run on the same machine where Postgres was operating. The benchmark used different numbers of processes for sync and async workloads, and the connection pool was not set up to prevent blocking when a coroutine tries to acquire a connection from the pool when the pool is exhausted (for benchmark purposes it should not have an upper-bound limit and should be pre-populated with already-established connections).

            • leafboi 1349 days ago
              > The benchmark used different numbers of processes for sync and async workloads.

              Wrong. Worker counts are the same. See the chart with the benchmark results: http://calpaterson.com/async-python-is-not-faster.html

              >The benchmark was run on the same machine where Postgres was operating.

              This wouldn't affect the variance between sync and async results very much because both frameworks were run on the same machine.

              > the connection pool was not set up to prevent blocking when a coroutine tries to acquire a connection from the pool when the pool is exhausted.

              Real world connection pools have an upper bound limit. I don't see why setting an upper bound limit to be closer to reality is not a good test.

              Also you're completely wrong about the connection pool blocking when it is exhausted. See source code:

              https://github.com/calpaterson/python-web-perf/blob/master/a...

              If all connections are exhausted then the system still yields to compute and incoming requests.

              > (for benchmark purposes it should not have an upper-bound limit and should be pre-populated with already-established connections).

              Disagree. The real world sets an upper bound. There's nothing wrong with simulating this in a test.

              • ghostwriter 1349 days ago
                > Wrong. Worker counts are the same.

                I see that aiohttp has 5, uwsgi has 16, and gunicorn has 12, 14, and 16 depending on the web framework; is this your definition of the same?

                The author says:

                > The rule I used for deciding on what the optimal number of worker processes was is simple: for each framework I started at a single worker and increased the worker count successively until performance got worse.

                That's not how a benchmark is supposed to be conducted: one doesn't fit the worker count to the result one finds "optimal"; one should use the same number of workers, find the bottlenecks, and either eliminate them or explain why they cannot be eliminated without affecting benchmark invariants.

                > this wouldn't affect the variance between sync and async results very much because both frameworks were run on the same machine.

                It will affect the variance. Firstly, because the DB will spawn processes on the same machine, pgbouncer will spawn processes on the same machine, and they will all compete for the same CPU, while the order of preemptive context switches affects individual benchmark runs differently. On top of that, there are periodic and expensive WAL checkpoints, and fsync competes with the benchmark for kernel system-call interruptions and context switches, so the multi-process worker setup may be affected dramatically. If you don't believe that external processes affect the numbers to the extent that they become incomparable, try surfing the Internet with your web browser randomly while running a benchmark.

                > Real world connection pools have an upper bound limit. I don't see why setting an upper bound limit to be closer to reality is not a good test.

                Because benchmarks are not real-world workloads; they are designed to show the unbound performance of the implementation detail selected for the test, where external resources are non-exhaustible, for the purpose of avoiding side effects external to the functionality being tested.

                > Also you're completely wrong about the connection pool blocking when it is exhausted. See source code: > If all connections are exhausted then the system still yields to compute and incoming requests.

                I didn't say that it wouldn't yield, I said that the coroutine will be blocked at the point where it tries to acquire a non-existing connection from the pool, which affects the benchmark. Now, instead of one blocking context switch at a network socket call that queries Postgres, the coroutine will yield AND WAIT twice - at the exhausted connection pool, and at the network socket call after the connection is acquired. This is exactly the reason why resources should be unbound, and why DB should be on a separate machine (unbound spawning of connection processes upon request), and why the number of OS workers should be the same in all benchmarks, because the sync version will also block twice, and the consequence of blocking there will be much more dramatic and different than in the case of async, WHICH IS THE POINT of a proper benchmark - https://github.com/calpaterson/python-web-perf/blob/master/s...

    • theptip 1350 days ago
      This article doesn't evaluate the case that you actually want ASGI for, so I don't think it's very useful. (Or at least, it confirms something that should have already been clear).

      If you're compute-bound, then Python async (which uses cooperative scheduling similar to green threads) isn't going to help you. You get concurrency, but not parallelism, from this programming model; only one logical thread of execution is running on the CPU at a time (per process), so this can only slow you down if you are CPU-constrained.

      The standard usecase of a sync API backed by a local DB with low request latency is typically going to be compute-bound.

      This is covered in the Django async docs (https://docs.djangoproject.com/en/3.1/topics/async/) and also in green threading libraries like gevent (http://www.gevent.org/intro.html#cooperative-multitasking).

      The case where async workers are interesting is for I/O-bound workloads. Say you're building an API gateway, or your monolithic API starts to need to call out to other API services, particularly external ones like Google Maps API. In this case, the worst-case result is that the proxied HTTP request times out; this could block your Django API's work thread for many seconds.

      In the async / green-threaded model, this case is fine; you have a green thread/async function call per request, and if that gthread is blocked on an upstream I/O operation, the event loop will just start working on a different API call until the OS gives a response on the network socket.

      Essentially, there's no reason to use Django async if you're doing a traditional monolithic DB-backed application. It's going to give you benefits in usecases where the standard sync model struggles.

      (Note, there's an argument that you might want green threads even in a normal monolith, to guard against cases like "developer accidentally wrote a chunky DB query that takes 60 seconds to run for some inputs", but most DB engines don't support one-DB-connection-per-HTTP-connection. There was a bunch of discussion on this topic a few years ago, with the SQLAlchemy author arguing that async is not useful for DB connections: https://techspot.zzzeek.org/2015/02/15/asynchronous-python-a... although asyncio support was added: https://docs.sqlalchemy.org/en/14/orm/extensions/asyncio.htm...)
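
      To make the gateway case concrete, here's a sketch of an async Django view awaiting a slow upstream (URL and params are invented):

          import httpx
          from django.http import JsonResponse

          async def geocode(request):
              # While this awaits the slow external API, the ASGI worker
              # is free to service other requests instead of blocking a
              # whole thread for the duration.
              async with httpx.AsyncClient(timeout=10.0) as client:
                  resp = await client.get(
                      "https://maps.example.com/geocode",  # hypothetical upstream
                      params={"q": request.GET.get("q", "")})
              return JsonResponse(resp.json(), safe=False)
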

      • DangitBobby 1350 days ago
        > The standard usecase of a sync API backed by a local DB with low request latency is typically going to be compute-bound.

        > This is covered in the Django async docs

        Where is this mentioned in the linked docs? I see four mentions of the ORM in the linked pages, including where it says they are working on async ORM, but I see nothing about performance for the ORM typically being compute bound.

      • leafboi 1350 days ago
        The tests aren't compute bound; they are testing requests per second. The testing is biased towards IO, not compute. Please read the article.
        • theptip 1350 days ago
          I read the article, thanks. I think you've missed my point. I'll TL;DR it for you to make it clearer:

          > This article doesn't evaluate the case that you actually want ASGI for, so I don't think it's very useful.

          > The standard usecase of a sync API backed by a local DB with low request latency is typically going to be compute-bound.

          (Note, I'm specifically talking about Django here)

          > Essentially, there's no reason to use Django async if you're doing a traditional monolithic DB-backed application. It's going to give you benefits in usecases where the standard sync model struggles.

          My claim is that these benchmarks are not looking at the use-case that Django async is intended to solve. It's not about increasing throughput to your local DB, and so it's not surprising that you don't see an improvement in benchmarks testing that case. Django's async is intended to enable API-gateways and other long-running requests where the upstream latency can be long enough to starve your API worker threads.

          Regarding compute-bound vs. I/O-bound, I'm sure YMMV, but my APM tracing for a mature non-trivial production Django API shows that waiting on the DB accounts for about 25% of the total request time across all my endpoints.

          Serialization/deserialization takes an embarrassing amount of time in Django, see https://news.ycombinator.com/item?id=24161828 for example. This framework is optimized for developer productivity, not for performance.

  • ArtDev 1350 days ago
    I had a bad experience with Django. I found it cluttered and slow. I really wanted to like it. It might seem funny, but a more straightforward framework like Symfony didn't get in the way and ended up much faster. Python should be much much faster than PHP, but I guess the framework matters a lot too.
    • IceWreck 1350 days ago
      > Python should be much much faster than PHP

      How? Afaik PHP is faster than Python in most aspects.