Python as a declarative programming language (2017)

(benfrederickson.com)

139 points | by akashtndn 2139 days ago

8 comments

  • mlthoughts2018 2139 days ago
    I was sad that Cython was only given a fleeting nod at the end and numba wasn’t even mentioned. I might even go so far as to say that Cython is the most user-friendly way to write new pure C programs that there is. If you’re about to write some greenfield project in C, seriously consider using Cython, even if you have no desire for any Python entrypoints in the final product. Numba as well is quite impressive: between jitclass, deferred_type, and facilities to target GPUs, it’s crazy how much speedup you can get with little extra code.

    Overall I think people don’t clearly state the speed trade-off Python presents. Everyone focuses on the “40 times slower” part and nobody points out that “equivalent program in C++” is a glib false equivalence, because a truly equivalent program would have to offer all the same run-time dynamic type manipulation and nearly arbitrarily modifiable object instances involved.

    What Python really proposes is this: “I’ll give you 100x more flexibility in exchange for 40x worse performance for certain classes of operations.”

    Don’t want the 40x slowdown? Das cool. Code around it by using extension modules that let you specify homogeneously typed contiguous memory arrays (or bare metal structs like with ctypes or numba’s jitclass), or let you bypass CPython overhead because e.g. you know you don’t need the flexibility offered by the data model looking up a generic python function to call for an __add__ operation, and instead you can arrange for a GIL-releasing call to a pure C function. And so on.
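
    As a minimal sketch of the “bare metal structs” idea, using only the stdlib ctypes module (the Point struct here is purely illustrative):

```python
import ctypes

# A C-style struct: fields are machine-typed and packed in contiguous
# memory, with none of the per-attribute dict overhead of a normal
# Python object instance.
class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_double),
                ("y", ctypes.c_double)]

p = Point(1.5, 2.5)
print(p.x + p.y)             # 4.0
print(ctypes.sizeof(Point))  # two C doubles; 16 bytes on typical platforms
```

    The same memory layout is what Cython's cdef classes or numba's jitclass give you, just with compiled code operating on it.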

    I guess my point is that the story should be “look what Python gives you” and then “this flexibility has a cost.”

    But instead it always feels more like, “this C pseudocode for this tight algorithm looks superficially similar to the Python syntax for that algorithm ... but oh boy, look how much slower Python is!”, which, apart from a possible surprise when you have never looked into Python before, should be of interest to basically no one.

    • srean 2139 days ago
      > I might even go so far as to say that Cython is the most user-friendly way to write new pure C programs that there is.

      Apart from my vehement disagreement with the claim above, I am in violent agreement with the rest of your comment.

      If I want_to / need_to write C or C++ I would rather write it in, well, C or C++. The reasons are many -- more mature tooling, deeper social knowledge, one less source of undocumented or unintended edge cases etc., etc.

      But when I want to be in a Pythonic frame of mind, which is fairly often, and want to freeze the semantics of selected parts of the code to C- or C++-like statically typed semantics, Cython is really nice. With Cython I can choose to do this where I think the code will benefit from freezing the semantics. Often I don't have to rely too hard on my own thinking, because the Cython compiler will identify those sections for me.

    • dkersten 2139 days ago
      > a truly equivalent program would have to offer all the same run-time dynamic type manipulation and nearly arbitrarily modifiable object instances involved.

      In actual production code, I rarely need this though. I mean, sure, for framework code, but in C++ I can do quite a lot with the standard library and templates. Sure, it’s not as runtime-flexible as Python, but for normal production code, I’ve rarely found it necessary. Don’t get me wrong, I absolutely prefer Python for most of the use cases that it’s commonly used for, but I don’t quite agree that the quoted text above matters for most uses.

      > What Python really proposes is this: “I’ll give you 100x more flexibility in exchange for 40x worse performance for certain classes of operations.”

      Have you used a recent version of C++? I think the 100x might have been true 15 years ago, but, while Python has a lot, lot more runtime flexibility, modern C++ is a pretty productive environment, without sacrificing performance.

      I’m not arguing that Python isn’t more flexible, especially at runtime, or that it’s not more productive. Of course it is. But you’re paying a performance premium for it and I don’t think the productivity gap is as large as the performance gap anymore.

      With all that said, I do agree with your overall message. Python gives you productivity and flexibility and it’s important to focus on that rather than performance and, as you say, Cython gives you options when you do need performance. But the difference between Python and C++ nowadays, for normal production code, isn’t that huge anymore in my opinion.

      Of course, there are plenty of reasons to prefer Python though: fantastic web frameworks and batteries included libraries, fantastic community, fantastic documentation to name a few.

      • mlthoughts2018 2139 days ago
        I won’t be able to address all the points. But in my experience, C++ has been going in the wrong direction in terms of simplicity or productivity. Modern C++ is drastically more complicated and riddled with deep esoterica. The flexibility benefit modern Python gives is probably more like 1000x, further mitigating many cases when there are concerns about the flexibility/performance trade-off.

        Just for one example of what I mean about modern C++: <https://bitbashing.io/std-visit.html>.

        This example is relevant because you mentioned you feel you never need Python’s flexibility in the day to day. But in Python (much like in many functional languages), the entire idea of the visitor pattern can be boiled down to map(), just one function call! If you want to specialize this for your own data type, you can very easily implement the iterator protocol, often simply __iter__, __getitem__, and __next__, which themselves could use e.g. numba to iterate some homogeneously typed tree class with pure C performance characteristics, or a sum type (like with Cython fused types) as in the std::visit example, etc.
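
        As a rough illustration of that point (the Tree class here is hypothetical, just to show the iterator protocol): implementing __iter__ alone is enough to make map() serve as the entire “visitor”:

```python
# A tiny homogeneous tree that opts into iteration by implementing
# __iter__ -- after that, map() works over it unchanged.
class Tree:
    def __init__(self, value, children=()):
        self.value = value
        self.children = children

    def __iter__(self):
        # Pre-order traversal: this node's value, then each subtree.
        yield self.value
        for child in self.children:
            yield from child

t = Tree(1, (Tree(2), Tree(3, (Tree(4),))))
visited = list(map(lambda v: v * 10, t))
print(visited)  # [10, 20, 30, 40]
```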

        The point is that the data model’s flexibility makes this trivial, rather than convoluted. In the cases when I’d care about this pattern, it would be no contest that Python offers hugely improved productivity.

        So it’s not just the naive flexibility of modifying object instances, but the bigger picture of flexibility in the data model overall.

        • cpitman 2138 days ago
          > The flexibility benefit modern Python gives is probably more like 1000x

          If true, that would mean that a 1 year C++ project could be done in 2 hours, and a 10 year project in half a week.

          • mlthoughts2018 2138 days ago
            Development time is not the same as flexibility, particularly because, as others have pointed out in comments, the lack of flexibility in C++ usually manifests itself in the choice to exclude huge swaths of feature sets.

            I see this Python advantage as meaning that in C++ I have to give up more possibilities if I want to restrict myself to a subset whose complexity can be reasonably managed.

            Just think of how often people even suggest C++ without exceptions. It is a mark of inflexibility that people view the complexity cost of these techniques as too high, and so must write code that intentionally cannot use them (and then contort even more to imitate their intended behaviors).

            This is a much more important thing than mere developer time, which is a shallow way to look at it.

        • stochastic_monk 2139 days ago
          It primarily depends on the size of the project. Past a certain size, I am definitely more productive in C++ than in Python. Though the compile-time checking of Cython might balance things out and make scaling a Python project easier.

          std::visit is broken, sure. std::vector<bool> is a mistake, too. You don't have to use it. Select a subset of the language which gives you what you need. That's part of what's great about having so much in the language.

          • jchmbrln 2138 days ago
            > You don't have to use it. Select a subset of the language which gives you what you need. That's part of what's great about having so much in the language.

            This is the precise reason I've not yet devoted the time to learn C++. I'm a little scared off by having to learn the biggest programming language and also which large sections of it to avoid before I can be productive.

        • dkersten 2139 days ago
          > The flexibility benefit modern Python gives is probably more like 1000x

          In my personal experience, this hasn’t been true for me. Granted, I’ve been mostly ignoring C++17 because I like to give compilers a little time to catch up, so I haven’t been exposed to std::visit beyond reading about it, but C++11 (and 14 to a lesser extent) has been really productive for me. It’s more verbose for sure, but that’s largely a non-issue, especially with editor support. I also find it easier to manage and localise state in C++ (I’m a Clojure developer at heart, so careful management of mutable state is important to me), simply because in Python everything (including method dispatch) is mutable, and in large codebases it can be very difficult to know when things are being changed behind my back. C++ allows you to mutate things too, of course, and since you have low-level memory access you can do some truly horrific things, but most of the time that doesn’t happen outside of isolated performance optimisations, so it hasn’t been an issue for me.

          TL;DR: Python is great and productive, but I personally am really productive in C++ too.

          EDIT: you edited while I was writing, so in response to the latter part of your message:

          std::visit, at least from my understanding of it, is really rather misnamed. It’s not a generic visitor pattern that you might apply over a collection but a tool for applying different functions on a variant, based on the type contained in the variant.

          Your example of Python visitor pattern boiling down to map() and a few dunder functions isn’t really different from C++ where it boils down to iterators. You have a lot of the flexibility available at compile time with templates too, my comment about not needing the flexibility was directed specifically at runtime modifying of objects (messing with its dispatch dict), stuff like overriding dunder methods is akin to templates in C++ in many ways and that flexibility exists in both languages.
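
          For what it’s worth, the closest stdlib analogue to std::visit’s dispatch-on-contained-type is probably functools.singledispatch; a small sketch (the describe function is made up for illustration):

```python
from functools import singledispatch

# Dispatch on the runtime type of the argument -- roughly what
# std::visit does for a variant, minus the compile-time
# exhaustiveness checking.
@singledispatch
def describe(value):
    return f"something else: {value!r}"

@describe.register
def _(value: int):
    return f"an int: {value}"

@describe.register
def _(value: str):
    return f"a string: {value!r}"

print(describe(42))    # an int: 42
print(describe("hi"))  # a string: 'hi'
print(describe(3.5))   # something else: 3.5
```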

          Btw to be clear, I’m not saying it’s not easier in Python, because it obviously is easier, I’m just saying that for me the productivity gap isn’t thaaaat large for most code, I don’t need the runtime flexibility in most code and C++ doesn’t pay a performance premium.

          I should add that I’m nitpicking a bit and not suggesting people use C++ over Python at all. I agree with a lot of what you’re saying, my disagreement is really over the productivity gap, since if you truly need the performance (most use cases don’t), then going directly to C++, at least for me, is often a sensible choice.

      • fjsolwmv 2138 days ago
        > I don’t think the productivity gap is as large as the performance gap anymore.

        It's impossible to make a general statement like this. Every project has a different performance budget.

        • dkersten 2138 days ago
          Sure, it’s a bit over-general. Having said that, when I write bad C++ code it still typically runs faster than decent Python code, for typical use cases. I also don’t find the productivity loss of using C++, as a language, all that great (I mean, I’d still be more productive using Django than a C++ web framework, just because Django is super comprehensive). Also, when I need performance in C++, I have a lot more options than in Python, even on core data structures (e.g. in Python I have a list; in C++ I have linked lists, vectors, deques, destructive versions of these, arrays, trees... memory-pooled versions, cache-line-aligned versions, parallel versions, etc.).

          Don’t get me wrong, for many use cases where Python is often used, like web development, I’d much rather use and be more productive in Python, but I disagree with the premise that Python is so much more productive than C++ that the performance overhead is negligible. Even if in most cases the cost is worth it.

          It’s worth noting a counter to my argument, though, and that is that for data analysis, Python is largely just glue between libraries that are implemented in C, so the performance difference there is largely negligible.

    • jlarocco 2139 days ago
      > Overall I think people don’t clearly state the speed trade-off Python presents. Everyone focuses on the “40 times slower” part and nobody points out that “equivalent program in C++” is a glib false equivalence, because a truly equivalent program would have to offer all the same run-time dynamic type manipulation and nearly arbitrarily modifiable object instances involved.

      Unfortunately, Python doesn't do well by that standard, either. For a few years now I've switched to Common Lisp for all of my small projects, and IME it's 20-30 times faster than Python by default, and more dynamic and flexible.

      Declare types and turn on optimizations, and it's more like 30-40 times faster, though still generally a little slower than C or C++. And then it also can access C and Fortran libraries to speed things up further.

      I like Python, it's my second favorite language, but it's not perfect.

    • mncharity 2139 days ago
      This seems a long-standing problem.

      Even with very early Python, it was possible to have objects providing an API of C pointers, to be dynamically combined with other such objects, to create runtime assemblages of C code, with only C's dynamically-linked function call overhead between the parts. And with cached runtime invocation of gcc and dynamic loading, it was possible for objects to provide C macro apis, and combine them into runtime-specialized fully-optimized code.

      But the approach didn't catch on. The community built C monoliths. Often multiple semi-abandoned monoliths with heavy feature overlap, but never code sharing. And then Windows slowly became important, with its lack of compiler access.

      So the problem seemed a social one, rather than technical. Far fewer people using python, and not enough of them cared about such things to achieve critical mass. And so the approach never matured or gained momentum. Was never embraced as "pythonic". In an alternate timeline, we did such very neat stuff with it. :/

      • mncharity 2130 days ago
        ERRATA: The assemblages created from objects carrying C pointers had only C function-pointer call overhead (of course). It was a different scheme, playing linker games, that had GOT overhead.

        And "multiple semi-abandoned monoliths with heavy feature overlap, but never code sharing" missed a key characteristic - the monoliths were all very partial solutions. So a common dynamic was "I need feature X, as in monolith A, and feature Y, as in monolith B, so... I'll create a new monolithic library C, with those features, but missing others from A and B that I don't care about". Leaving the next person in the same unhappy place, and scattering the little effort available into even smaller communities.

    • SlowRobotAhead 2139 days ago
      That’s certainly a fair way to put it. I guess I don’t encounter the type who really gets upset that “Python is too slow!! I’d rather use C++”, because that takes a certain type of person who is technically able but practically challenged. I personally find myself surrounded by the opposite kind of people.

      • mlthoughts2018 2139 days ago
        This is the beauty of Cython & numba for me. It lets you selectively target just the small sections of the code that need a speed-up for an understood reason.

        You can use nice Python profiling tools to gather evidence about which program hotspots truly are too slow, and then just target those for specialization either as extension module functions/classes in Cython (or even a manually written extension module, or a plain C library used via cffi), or as a JIT function or class with numba.
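
        That evidence-gathering step can be done entirely with the stdlib; a sketch, where hot_loop and cheap stand in for arbitrary application code:

```python
import cProfile
import io
import pstats

# Two stand-in functions: one hot, one cheap.
def hot_loop():
    total = 0
    for i in range(200_000):
        total += i * i
    return total

def cheap():
    return sum(range(100))

def main():
    hot_loop()
    cheap()

# Profile a run, then rank by cumulative time to find the few
# candidates worth specializing in Cython/numba; leave the rest alone.
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print(report)
```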

        You can use the annotation features of Cython & numba to further profile locations that have avoidable CPython API overhead, and it becomes a very evidence-based and iterative optimization cycle.

        At the end, for 90% of the Python code that was more than fast enough already and didn’t touch performance critical sections, you just got to save all the time & trouble of needing to rewrite it (usually painfully) and debug it and re-test it all in C or C++.

        This is why, for me, the entire idea that you should start a project out in C++ or C is a giant type of premature optimization in a lot of cases (not all cases but a lot).

        Did you prototype the boilerplate parts in Python and actually measure the sort of performance hotspots you’ll have to optimize, to a degree that you know for sure that some light sprinkling of Cython & numba certainly won’t work? If not, it’s probably premature optimization to presume you’ll need C or C++ as the primary implementation language.

        • CogitoCogito 2139 days ago
          Your defense of python in these comments is great! I feel like you're presenting my own personal philosophy...just more coherently than I usually manage.

          I agree the python first micro-optimizations second approach is usually so much better than anything else unless you absolutely know you're going to need something else for [insert reason]. Even then I think it can be great for an initial throw-away implementation (at least the high-level ideas) simply because it's so easy to focus on the idea over the details.

        • SlowRobotAhead 2139 days ago
          I use C day in day out for embedded. So you’ll get no complaints from me!

          What I’ll find interesting is whether the Cython idea is ever pushed to embedded, because right now the state of MicroPython or Zerynth trying to be one-stop shops for micros, but in Python, isn’t attractive.

          Then again, I don’t want or need Python’s standard libs. What I really want is all C code, with an option to run ultra-lightweight scripting in Python. So 90% C, 10% Py.

          No option exists yet.

          • jononor 2137 days ago
            Maybe Cython can be compiled to target microcontrollers via LLVM?

            Right now I'm usually writing embedded application logic as C++ in a pure functional style. Then exposing this to Python using pybind11 and testing/evaluating it in Python. Which is pretty decent

          • AlexCoventry 2139 days ago
            Lua sounds like a good fit.
            • SlowRobotAhead 2139 days ago
              Lua for embedded is pretty bad. We tried it, and all the options lacked something.

              eLua is out of date and tries to be 90-100% Lua by wanting you to run the VM exclusively, and the micro support is Cortex-M3 at best, where they want you to use their headers and peripheral libs.

              No one seems to have the concept of an FFI (foreign function interface) worked out correctly in a way that would allow for 90% C, calling C funcs from Lua.

              Lua proper can’t ever agree on the Lua VM vs. the JIT. The JIT is bad for embedded, because bytecode is really what you want for serialization.

              Lua overall seems good, but find me a 20-30K-ROM, non-dynamic-memory VM that doesn’t include stdlib nonsense you wouldn’t want on a micro (time, strings, etc.)... because I looked and didn’t see anything close.

              Roll-your-own is for hobbyists or 30+ person teams.

              I’d take Lua or Python, but the lightweight scripting VM isn’t ready for embedded yet.

              Which is why I have hopes for Python (bringing this full circle) because of its natural relationship to C.

              • AlexCoventry 2139 days ago
                That's a shame. (About Lua. Hope it works out with Cython.)
              • blattimwind 2138 days ago
                Is Pawn still around?
      • dnautics 2139 days ago
        It will screw you when you don't expect it. At work we needed to profile drives (SSD, HDD) with workloads, and one of our engineers wrote a disk workload replay program in Python. Our results kind of didn't make sense until I top'd it and realized that we were getting 100% CPU usage: our workloads were CPU-bound and not disk-bound, so they were totally meaningless.

    • coldtea 2139 days ago
      >Overall I think people don’t clearly state the speed trade-off Python presents. Everyone focuses on the “40 times slower” part and nobody points out that “equivalent program in C++” is a glib false equivalence, because a truly equivalent program would have to offer all the same run-time dynamic type manipulation and nearly arbitrarily modifiable object instances involved.

      Not really, it should just offer the same functionality.

      • mlthoughts2018 2139 days ago
        That’s exactly my point. When you ask for a native list in Python, you are specifically writing a program that has the functionality of using a list of possibly any type of object. You’re asking for that heterogeneous type flexibility, either because you need it or the overhead doesn’t matter in your case. But either way, by writing that Python program you are saying, “I’d like all this stuff Python gives me.”

        If you want to write a totally different program, say using a contiguous memory array of a homogeneous type, then you write something different, like with the array.array class, or numpy, or wrap your own thing in Cython.

        That is the thing whose functionality you’d compare to C/C++, because the functionality goals would be similar.

        But writing an algorithm that expects tight contiguous-memory array operations (or bare-metal structs, etc.) in pure Python would just be a strawman that is not meaningfully comparable to similar C++ code.
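
        In stdlib terms, the two “different programs” look like this (a sketch using array.array for the homogeneous case):

```python
from array import array

# A Python list holds references to arbitrary heterogeneous objects ...
mixed = [1, "two", 3.0]

# ... while array.array is a contiguous buffer of one machine type.
doubles = array("d", [1.0, 2.0, 3.0])
print(doubles.itemsize)  # bytes per element; 8 for a C double on typical platforms

# The typed array enforces its element type, unlike the list.
try:
    doubles.append("four")
except TypeError as err:
    print("rejected:", err)
```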

        • coldtea 2138 days ago
          >That’s exactly my point

          I don't think so, since I meant "the same functionality as the user visible functionality" -- not that of the language.

          >You’re asking for that heterogeneous type flexibility, either because you need it or the overhead doesn’t matter in your case. But either way, by writing that Python program you are saying, “I’d like all this stuff Python gives me.”

          That's irrelevant to the end user though. You could argue that Python makes it faster to deliver the program (which is a win for the end user), but if the program is ultimately slower and more frustrating, the user might have preferred a faster version, if he was given the chance, even if that meant paying more or waiting more to get it.

          • mlthoughts2018 2138 days ago
            This still misses the point. If the goal was always to produce a fast program because the end user wanted speed, then why were you ever writing that pure Python version to start with? If you planned to compare it to a language with native support for some performance-critical aspect, why wouldn’t you have used the facilities that Python provides for those same performance goals?

            If we want to compare the speed of two horses, A and B, and Horse A can either wear fast shoes or sparkly shoes, while Horse B only has fast shoes, then what good is a speed comparison between sparkly-shoed A and fast-shoed B? We wanted speed the whole time. Putting a known-suboptimal thing in there would just be a strawman.

            • comex 2138 days ago
              In a nutshell, because none of those facilities are really designed for writing your whole program in. If you have a program whose runtime is dominated by a few very hot functions, probably numerical, then numpy, Numba, and Cython are all decent options for making those functions fast, and you can use the full power of Python for the rest of your code. But if you have a big fat codebase where just about everything is lukewarm – and that describes a lot of programs – then you need everything to be fast. It's certainly possible to write a whole program in Cython, using static C types for everything and minimizing interaction with the Python world, but … well, nobody does that. As in, I've never seen a single large pure-Cython codebase, though I wouldn't be surprised if some exist.
              • mlthoughts2018 2138 days ago
                I’m not sure how this is related to its parent comment thread, but in any case I agree with it, and would view this as supporting my point.

                In my experience, a big lukewarm codebase is exceedingly rare, and often gets that way because there was premature optimization in the form of assuming the project had to be written in this or that language way back at the beginning, without there ever having been a prototype stage when multiple implementation languages were actually tested. And even given that, it takes a special kind of alignment of the stars for the usage patterns, resource limits, and language tools to all work out properly for a system with few hotspots.

                But regardless, I do agree there are plenty of cases when it can be the right choice to presume a need to write the whole thing in C/C++.

                Examples become irrelevant though when they try to do a horse race comparison of a contrived, unoptimized Python example with a C/C++ example that would be more optimal out of the box just due to intended differences in the languages. Because these examples are exactly like targeted hotspot optimization, which would be the case when Python augmented with Cython/numba is a stronger contender.

            • coldtea 2138 days ago
              >This still misses the point. If the goal was always to produce a fast program because the end user wanted speed, then why were you ever writing that pure Python version to start with?

              Tons of reasons. Some companies will opt for what's fashionable -- e.g. Electron SPA's today. Or with what's cheaper to churn out. Or have their users locked in and could not care less, just use what's cheapest and fastest to get something out.

              • mlthoughts2018 2138 days ago
                Sure, but for the purposes of a blog post that’s supposed to shine a light on performance considerations in specific, contrived examples, it would be a strawman. Yet that’s the setting where you see a lot of people write e.g. a naive double loop algorithm in Python and criticize its performance compared with “the same” double loop algorithm in C.

                It’s an uninformative comparison precisely because they are not “the same.” Pure Python looping (iterators) and doing pure Python operations in the loop body (run time type dynamics implicitly assumed to be needed/desired, heterogeneous container types of PyObject references assumed to be needed/desired) is conceptually a different thing than loops in C (simple increments) and, say, operations on a contiguous memory array, regardless of whether the syntax looks superficially similar to invite a visual comparison.

  • andybak 2139 days ago
    I've always been a fan of Django's "Declarative for most stuff. Procedural when you need it" approach. Models and forms are a good example, but it permeates the whole framework. It's largely a case of favouring composition over inheritance, combined with sensible defaults, but it feels declarative for the common cases.
    • dkersten 2139 days ago
      I agree, I think this is a good approach. Common things should be super quick and easy, but when you need to go further, it should get out of the way.
  • orf 2139 days ago
    Python is a great language for this kind of super-glue code. The pure tensorflow snippets are rarely run in isolation, and python makes retrieving, loading and munging data pretty simple when you tap into its rich ecosystem of scientific and utility packages.

    Python has always been about dipping down to C when needed - just look at the various stdlib features like namedtuple and lru_cache that had optimized C versions introduced once the python versions were deemed useful and performance critical.
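
    For reference, here is the kind of stdlib facility being described; in current CPython, lru_cache is backed by a C-accelerated implementation:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Memoization turns this exponential recursion into linear work;
    # the cache bookkeeping itself happens in CPython's C code.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(60))            # 1548008755920
print(fib.cache_info())   # hit/miss statistics from the cache
```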

    A commenter below seems outraged about the torch->pytorch change, but really the GitHub contributions and adoptions for each speak for themselves

    • GyYZTfWBfQw 2139 days ago
      > A commenter below seems outraged about the torch->pytorch change, but really the GitHub contributions and adoptions for each speak for themselves

      What are you trying to say exactly? Could you be more explicit?

  • Animats 2139 days ago
    Where's the "declarative" part here? It's common to do number crunching in NumPy with Python as the control layer. But the Python code is imperative.
    • dfox 2138 days ago
      It is declarative in the sense that the imperative code builds up some kind of object tree that declaratively describes the operation, which is then performed as late as possible. Numpy does not really work this way, but TensorFlow does, and viewed from the right PoV this is also how SQLAlchemy works.
    • joshuamorton 2138 days ago
      Numpy is declarative in the sense that it is vectorized.
      • Animats 2138 days ago
        ?
        • joshuamorton 2138 days ago
          Instead of explicitly writing loops, you declare an operation like add, and it is looped over a vector for you.
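
          The same idea can be sketched with the stdlib alone (numpy performs the equivalent loop in C over contiguous memory):

```python
import operator

a = [1, 2, 3]
b = [10, 20, 30]

# Declare the operation once; the element-wise looping is implicit.
result = list(map(operator.add, a, b))
print(result)  # [11, 22, 33]
```
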
          • Animats 2138 days ago
            That's just application. "Do this to all that stuff". Imperative, but possibly parallel.
  • myWindoonn 2139 days ago
    For faster Python, try PyPy. It's very compatible, very fast for many workloads, and has been constantly receiving interesting new work for over a decade.
  • magicalhippo 2138 days ago
    First I saw of this was about 9 years ago, with https://fenicsproject.org/

    The Python "front-end" allows you to write something which looks very much like the math you're trying to solve. When evaluated, the Python code generates specialized C++ code for a Python module which is in turn compiled and linked back in, and then the C++ code is executed producing the results.

    Guess this is old news now but I was pretty blown away by it back then.

  • plg 2139 days ago
    tldr: Vectorize your code. Good advice not just for Python but also for other interpreted languages like MATLAB and R.
  • qop 2139 days ago
    Where does this get off on saying torch didn't have traction until pytorch? Are you high?

    Twitter? NYU? Facebook?

    Python is a bad technology. Modern Python, like what's in the article, is all about figuring out how to make Python code that makes C do all the work, and it's pathetic to then take credit for the success of what Fortune 100 companies and world-renowned universities were using before Python attached itself to it.

    Give me a break.

    If you want to write fast code and interop with C, write Lua and use Torch. If you're doing technical computing or extreme scale numerics, go with Julia.

    But never python.