A 100x speedup with unsafe Python

(yosefk.com)

318 points | by ingve 14 days ago

15 comments

  • Animats 12 days ago
    Oh, array striding.

    This is a classic bikeshedding issue. When Go and Rust were first being designed, I brought up support for multidimensional arrays. For both cases, that became lost in discussions over what features arrays should have. Subarrays, like slices but multidimensional? That's the main use case for striding. Flattening in more than one dimension? And some people want sparse arrays. Stuff like that. So the problem gets pushed off onto collection classes, there's no standard, and everybody using such arrays spends time doing format conversion. This is why FORTRAN and Matlab still have strong positions in number-crunching.

    • yosefk 12 days ago
      From everything I'm seeing, it follows that Matlab & Fortran have been decimated by Python and C/C++ around 1990s and 2010s, respectively. Of course I could be wrong; any evidence of their still strong position will be greatly appreciated, doubly so evidence that this position is due to stride issues.

      (Of course Python provides wrappers to Fortran code, eg FITPACK, but this code specifically was mostly written in the 80s, with small updates in this century, and is probably used more thru the numpy wrappers, stride conversion issues and all, than thru its native Fortran interface)

      • MobiusHorizons 12 days ago
        I think pretty much all linear algebra libraries are still Fortran and are unlikely to ever be C because Fortran is legitimately faster than C for this stuff. I don't know if it has anything to do with strong opinions about how values are represented, I think it has more to do with lower overhead of function calling, but that is just repeating what someone told me about Fortran vs C in general, not necessarily applicable to BLAS libraries.

        Fortran at least used to be very common in heavy scientific computing, but I would bet that work relies on GPUs or other accelerators these days.

        • Galanwe 12 days ago
          > I think pretty much all linear algebra libraries are still Fortran and are unlikely to ever be C

          That is more of an urban legend than reality in 2024. Fact is, although the original BLAS implementation was Fortran, it has been at least a decade since every Linux distribution ships either OpenBLAS or ATLAS or the MKL, all of which are written in a mix of C and assembly. All modern processor support is only available in these implementations.

          LAPACK itself is still often built from the Fortran code base, but it's just glue over BLAS, and it gets performance solely from its underlying BLAS. Fortran doesn't bring any value to LAPACK, it's just that rewriting millions of algebraic tricks and heuristics for no performance gain is not enticing.

        • yosefk 12 days ago
          The question is how much new numeric code is written in Fortran vs C/C++. My guess is way below 10%, certainly if we measure by usage as opposed to LOC and I would guess by LOC as well.

          Is Fortran legitimately faster than C with the restrict keyword? Regardless of the function call cost diffs between the two - meaning, even if we assume Fortran is somehow faster here - there's no way numeric production code does enough function calls for this to matter. If Fortran was faster than C at any point in time I can only imagine pointer aliasing issues to have been the culprit and I can't imagine it still being relevant, but I am of course ready to stand corrected.

          • josefx 12 days ago
            Restrict probably helps. I can't say much about Fortran, but C still has warts that can significantly impact its math library. For example, almost every math function may set errno; that is a side effect that isn't easy for the compiler to eliminate and might bloat the code significantly. With gcc, a single call to sqrt turns into a sqrt instruction, followed by several checks to see if it succeeded, followed by a call to the library sqrt function just to set errno correctly. I just started to disable math-errno completely (-fno-math-errno) once I realized that C allows several alternative ways to handle math errors, which basically makes any code relying on it non-portable.
          • eyegor 12 days ago
            I don't think Fortran is faster by language virtue, but it's certainly easier to scrape together high-performance numeric Fortran code. And ifort/aocc are amazing at making code that runs well on clusters, which is not a priority for any other domain. Fortran is absolutely still on top for modern numerics research code that involves clusters; talk to anyone who works in simulations at a national lab. If your code is mostly matrix math, modern Fortran is very nice to work with.

            Emphasis on modern because maintaining 70s-80s numeric mathematician cowboy code is a ninth circle of hell. It has a bad rep for that reason.

            • semi-extrinsic 11 days ago
              When you speak about code from the 1970s in particular, you need to appreciate the extremely limited language features that were available.

              They did not even have while-loops in the language until Fortran 77. While-loop functionality used to be implemented using DO-loops mixed with GOTOs. You can't fault the people of that era for using the only tools that were available to them.

          • ant6n 12 days ago
            At the core, Fortran uses multidimensional numerical arrays, with the shapes being defined in the code. So the compiler knows much more about the data that's being operated on, which in theory allows better optimization.

            I thought blas/lapack is still written in Fortran, so most numerical code would still be built on top of Fortran.

      • StableAlkyne 11 days ago
        > any evidence of their still strong position will be greatly appreciated

        Fortran still dominates certain areas of scientific computing / HPC, primarily computational chemistry and CFD (https://fortran-lang.org/packages/scientific). You don't hear about most of them because they're generally run on HPC centers by scientists in niche fields. But you do get their benefit if you buy things that have the chemical sector in their supply chain.

        The common thread is generally historical codes with a lot of matrix math. Fortran has some pretty great syntax for arrays and their operations. And the for-loop parallelization syntax in parallel compilers (like openmp) is also easy to use. The language can even enforce function purity for you, which removes some of the foot guns from parallel code that you get in other languages.

        The kinds of problems those packages solve tend to bottleneck at matrix math, so it's not surprising a language that is very ergonomic for vector math found use there.

        Same for Matlab, it's mostly used by niche fields and engineers who work on physical objects (chemical, mechanical, etc). Their marketing strategy is to give discounts to universities to encourage classes that use them. Like Fortran, it has good syntax for matrix operations. Plus it has a legitimately strong standard library. Great for students who aren't programmers and who don't want to be programmers. They then only know this language and ask their future employer for a license. If you don't interact with a lot of engineers at many companies, you aren't going to see Matlab.

        • embwbam 11 days ago
          I work for the National Solar Observatory, creating Level 2 data for the DKIST Telescope’s observations of the sun. (For example, an image of the Temperature that lines up with the observation)

          Just the other day, the solar physicist I work with said “yeah, that code that runs on the supercomputer needs to be rewritten in Fortran” (from C, I think).

          He’s nearing retirement, but it’s not that he’s behind the times. He knows his stuff, and has a lot of software experience in addition to physics

        • moregrist 11 days ago
          While it's not as ubiquitous as it used to be, Matlab is still very heavily used within shops that do a lot of traditional engineering (ME, EE, Aero, etc.).

          This surprises people who just write software, but consider:

          - The documentation and support is superb. This alone can justify the cost for many orgs.

          - Did I mention the support? MathWorks has teams of application support engineers to help you use the tool. The ability to pay to get someone on the phone can also justify the price.

          - The toolkits tend to do what specific fields want, and they tend to have a decent api. In contrast, you might end up hacking together something out of scipy and 3-4 weirdly incompatible data science packages. That's fine for me, but your average ME/EE would rather just have a tool that does what they need.

          - It also has SIMULINK, which can help engineers get from spec to simulation very quickly, and seems deeply embedded in certain areas (eg: it's pretty common for a job ad with control systems work to want SIMULINK experience).

          Is Python gradually displacing it? Probably.

          (Honestly, I wish it would happen faster. I've written one large program in Matlab, and I have absolutely no desire to write another.)

      • samatman 12 days ago
        Julia has gained a lot of mindshare in array-heavy coding, with a lot of hard work put into making it possible to get comparable speeds to C++ and Fortran. The manual[0] offers a good introduction to what's possible: this includes sparse and strided arrays, as well as the sort of zero-copy reinterpretation you accomplish with pointer wizardry in the Fine Article.

        [0]: https://docs.julialang.org/en/v1/manual/interfaces/#man-inte...

      • chillfox 12 days ago
        If everyone is just using the Fortran libraries instead of reimplementing it in a modern language, then that's evidence that it's still being used for that purpose.
      • aragilar 12 days ago
        Matlab has the issue of not being open source (similarly IDL), but I still see it popping up (though naturally, if I've got any choice it the matter, it's going to be ported to Python). I've also seen new Fortran codebases as well, mainly where the (conceptual, not computational) overhead of using helper libraries isn't worth it.

        I'd suggest Python (via the various memoryview-related PEPs, plus numpy) does provide most of the required information (vs. e.g. Go or Rust or Ruby), so I'm not sure that proves much, other than: if the combined value of a more general language plus a library for handling multidimensional arrays is enough to beat Matlab or Fortran (likely due to other ecosystem or licensing effects), people will switch.

        • RugnirViking 12 days ago
          we still get new Matlab codebases at my work, also R. In my case it comes from academics in the power systems and renewables fields who move into industry and keep using the tools they used there. We try to gently ease them into python and version control and sensible collaboration but we get new hires all the time
      • xdavidliu 11 days ago
        > Matlab & Fortran have been decimated by Python and C/C++ around 1990s and 2010s, respectively.

        Nit: this failed to be respective; the years are the other way around

        • kbelder 11 days ago
          Interesting... maybe the respective comparison was Matlab to Python and Fortran to C/C++? This sentence actually had three parallel clauses.

          But that was a great nit to find.

          • xdavidliu 11 days ago
            yes that's what I originally meant. Matlab to Python was 2010's and Fortran to C/C++ was 1990s. (Probably not the exact time frame, but that's close)
    • lifthrasiir 12 days ago
      But format conversion is inevitable because C and FORTRAN have different axis ordering anyway, isn't it?
      • blt 12 days ago
        Not really, you can always change the indexing to account for it. For example, the GEMM matrix multiplication subroutines from BLAS can transpose their arguments [1]. So if you have A (m x n) and B (n x p) stored row-major, but you want to use a column-major BLAS to compute A*B, you can instead tell BLAS that A is n x m, B is p x n, and you want to compute A' * B'.

        As the article mentions, NumPy can handle both and do all the bookkeeping. So can Eigen in C++.

        [1] https://www.math.utah.edu/software/lapack/lapack-blas/dgemm....
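
        The transpose trick can be checked in numpy, where `.T` is exactly a zero-copy reinterpretation of a C-order array as a Fortran-order one (array names here are illustrative):

        ```python
        import numpy as np

        # A row-major (C-order) m x n array, read as column-major, is its transpose.
        A = np.arange(6, dtype=float).reshape(2, 3)    # 2x3, C-order
        B = np.arange(12, dtype=float).reshape(3, 4)   # 3x4, C-order

        # A.T and B.T are zero-copy views with swapped strides (Fortran order):
        assert A.T.flags['F_CONTIGUOUS'] and A.T.base is A

        # So asking a column-major routine for B' * A' yields (A*B)', whose
        # row-major reading is exactly A*B:
        C = (B.T @ A.T).T
        assert np.array_equal(C, A @ B)
        ```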

        • lifthrasiir 12 days ago
          That's conceptually still a format conversion, though the actual conversion might not happen. Users have to track which format is being used for which matrix, and I believe that was what the GP was originally complaining about.
          • yosefk 12 days ago
            Agreed. Realistically you're going to either convert the data or give up on using the function, rather than reimplement it to support another data layout efficiently; converting the code instead of the data, so to speak, is more efficient from the machine-resource POV, but it can be very rough on human resources.

            (The article is basically about not having to give up in some cases where you can tweak the input parameters and make things work without rewriting the function; but it's not always possible)

            • semi-extrinsic 11 days ago
              Talking about rough on human resources, I take it you've never written iterative Krylov methods where matrices are in compressed sparse row (CSR) format?

              Because that's the kind of mental model capacity scientific programmers needed to have back in the days. People still do similarly brain-twisting things today. In this context, converting your logic between C and Fortran to avoid shuffling data is trivial.

    • DarkUranium 9 days ago
      I'm working towards my own language, and I'm sort of stuck on this issue, too:

      Namely:

      1. Should the "default" (syntactically-sugar'd) arrays in the language (e.g. `int[]`) be restricted to simple contiguous storage, or should they also hold (potentially negative) stride, etc?

      2. Should I have multidimensional arrays? (to be clear, I'm talking about contiguous blocks of data, not arrays-holding-pointers-to-arrays)

      3. Should slicing result in a view, or a copy? (naturally, some of these options conflict)

      I'm honestly erring towards:

      1. "fat" arrays;

      2. multidimensional arrays;

      3. slicing being views.

      This allows operations like reversing, "skipping slices" (e.g. `foo[::2]` in Python), even broadcasts (`stride=0`) to be represented by the same datastructure. It's ostensibly even possible to transpose matrices and such, all without actual copying of the data.
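
      For reference, this is exactly how numpy's view machinery works — all of the operations above are stride-only reinterpretations of one buffer:

      ```python
      import numpy as np

      a = np.arange(8, dtype=np.int64)               # itemsize 8 bytes
      print(a[::-1].strides)                         # (-8,)  reversed view
      print(a[::2].strides)                          # (16,)  "skipping slice"
      print(np.broadcast_to(a, (3, 8)).strides)      # (0, 8) broadcast, stride 0
      # All three are views over a's memory; no element is copied.
      assert a[::-1].base is a
      ```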

      But there are good arguments for the reverse, too. E.g. "fat" arrays need 1 more `size_t` of storage per dimension (to store the stride), compared to "thin" arrays. And a compiler can optimize thin ones much better due to more guarantees (for example, a copy can use vector instructions instead of having to check stride first). Plus "thin" arrays are definitely the more common scenario and it simplifies the language.

      So, yeah, still not quite sure despite liking the concept. One obvious option would be for `int[]` to be a thin array, but offering `DataView<int>` [1]. The problem with that is that people implementing APIs will default to the easy thing (`int[]`) instead of the more-appropriate-but-more-letters `DataView<int>`. Ah, dilemmas.

      [1] (not actual syntax, but I figured I'd use familiar syntax to get the point across)

    • sa-code 12 days ago
      What kind of support would you have hoped for?
      • infogulch 12 days ago
        A way to reinterpret a slice of size N as a multidimensional array with strides that are a factorization of N, including optional reverse order strides. Basically, do the stride bookkeeping internally so I can write an algorithm only considering the logic and optimize the striding order independently.
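
        numpy's reshape does roughly this bookkeeping already — a size-N buffer can be viewed through any factorization of N, with reverse-order strides added per axis by slicing (a sketch of the existing numpy behavior, not of the API being proposed):

        ```python
        import numpy as np

        a = np.arange(24, dtype=np.int64)  # N = 24 = 2 * 3 * 4
        v = a.reshape(2, 3, 4)             # strides (96, 32, 8) for int64
        r = v[:, ::-1, :]                  # reverse-order stride on the middle axis
        assert v.base is a                 # views, not copies
        assert r.strides == (96, -32, 8)
        ```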
        • Animats 12 days ago
          That's where you end up after heavy bikeshedding. Lots of features, terrible performance, as the OP points out.
          • infogulch 11 days ago
            I agree with you on sparse arrays and multidimensional slices, but this is basically the same as what you'd do manually. Saying that "track strides for me" is "lots of features" is a bit uncharitable.
  • londons_explore 12 days ago
    > What can we do about this? We can't change the layout of pygame Surface data. And we seriously don't want to copy the C++ code of cv2.resize, with its various platform-specific optimizations,

    Or... you could have sent a ~25 line pull request to opencv to fix this performance problem not just for you, but for thousands of other developers and millions of users.

    I think your fix would go here:

    https://github.com/opencv/opencv/blob/ba65d2eb0d83e6c9567a25...

    And you could have tracked down that slow code easily by running your slow code in gdb, hitting Ctrl+C while it's doing the slow thing, and then "bt" to get a stack trace of what it's doing and you'd see it constructing this new image copy because the format isn't correct.

    • yosefk 12 days ago
      A 25-line pull request doing what? Supporting this format efficiently is probably more LOC, and it's unlikely to get merged since few need this and it complicates the code. Doing the thing I did in Python inside OpenCV C++ code instead (reinterpreting the data) is not really possible, since you have less knowledge about the input data at that point (like, you're going to assume it's an RGBA image and access past the area defined as the allocated array data range for the last A value? The user gives a 3D array and you blithely work on the 4th dimension?)

      And what about the other examples I measured, including pure numpy code? More patches to submit?

      • londons_explore 12 days ago
        inside the python opencv bindings?

        Or just special case resize, since there the channel order doesn't matter. If you interpret R as A you'll still get the right answer.

        • yosefk 12 days ago
          The pygame API gives you a 3D RGB array and a separate 2D alpha array, which happen to reference the same RGBA/BGRA memory. Are you suggesting that OpenCV Python bindings (in most of their functions, and likewise other libraries including numpy operators themselves) should have code assuming that a WxHx3 RGB array really references the memory of a WxHx4 RGBA array?..

          If you really want a solution which is a patch rather than a function in your own code, I guess the way to go would be to persuade the maintainers of pygame and separately, the maintainers of its fork, pygame-ce, to add a pixels4d_maybe_rgba_and_maybe_bgra() function returning a WxHx4 array. Whether they would want to add something like that I don't know. But in any case, I think that since closed source projects, open source projects, and code by sister teams exist that will not accept patches, or will not accept them in a timely manner, it's interesting what you can do without changing someone else's code.

  • C4stor 12 days ago
    All of this seems unnecessary, and easily replaced in the provided benchmark by :

    i2 = np.ascontiguousarray(pg.surfarray.pixels3d(isurf))

    Which does the 100x speedup too and is a "safe" way to adjust memory access to numpy strides.

    Whether the output is correct or not is left as an exercise to the author; since the provided benchmark only uses np.zeros(), it's kind of hard to verify.

    • yosefk 12 days ago
      I just measured this with the np.ascontiguousarray call (inside the loop of course, if you do it outside the loop when i2 is assigned to, then ofc you don't measure the overhead of np.ascontiguousarray) and the runtime is about the same as without this call, which I would expect given everything TFA explains. So you still have the 100x slowdown.

      TFA links to a GitHub repo with code resizing non-zero images, which you could use both to easily benchmark your suggested change, and to check whether the output is still correct.

  • ggm 12 days ago
    Why does numpy do column order data?

    Is it because in much of the maths domain it turns out you manipulate columns more often than rows?

    (not a mathematician but I could believe if the columns represent "dimensions" or "qualities" of some dataset, and you want to apply a function to a given dimension, having data in column-natural order makes that faster.)

    Obviously you want to believe naively there is no difference between the X and Y in the X,Y plane, but machines are not naive and sometimes there IS a difference between doing things to the set of verticals and the set of horizontals.

    • Pinus 12 days ago
      Hang on, doesn’t numpy use C (row major) array ordering by default? checks docs Seems like it does. However, numpy array indexing also follows the maths convention where 2D arrays are indexed by row, column (as does C, by the way), so to access pixel x, y you need to say im[y, x]. And the image libraries where I have toyed with with numpy integration (only Pillow, to be honest) seem to work just fine like this — a row of pixels is stored contiguously in memory. So I don’t quite see why the author claims that numpy stores a column of pixels contiguously, but I have only glanced at the article, so quite probably I have missed something.
      • KeplerBoy 12 days ago
        You're right. Numpy stores arrays in row-major order by default. One can always just have a look at the flags (ndarray.flags returns some information about the order and underlying buffer).
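
        Concretely:

        ```python
        import numpy as np

        a = np.zeros((3, 4))                # float64, C-order by default
        assert a.flags['C_CONTIGUOUS']
        assert a.strides == (32, 8)         # a row is 4 * 8 bytes; one step right is 8
        f = np.asfortranarray(a)            # column-major copy of the same data
        assert f.flags['F_CONTIGUOUS']
        assert f.strides == (8, 24)         # one step down is 8 bytes; a column is 3 * 8
        ```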
    • sdeer 12 days ago
      Probably because Fortran stores matrices and other multidimensional arrays in column order. Traditionally most numerical computation software was written in Fortran and numpy calls those under the hood. Storing in row order would have meant copying the data to column major order and back for any call to Fortran.
      • Thrymr 12 days ago
        NumPy can _store_ arrays in either row-major (C-style) or column-major (Fortran-style) order. Row-major is the default. Index order is always row, column.
    • bee_rider 12 days ago
      It is possible that my brain had been damaged by convention, but it looks much tidier to have your vector multiplicand on the right of the matrix when doing a matrix vector multiplication y=Ax, where A is the matrix and x is the vector, y is the result.

      So, x and y must be columns. So, it is nice if our language has column-order as the default, that way a vector is also just a good old 1d array.

      If we did y=xA, x and y would be rows, the actual math would be the same but… I dunno, isn’t it just hideous? Ax is like a sentence, it should start with an upper case!

      It also fits well with how we tend to learn math. First we learn about functions, f(x). Then we learn that a function can be an operator. And a matrix can represent a linear operator. So, we tend to have a long history of stuff sitting to the left of x getting ready to transform it.

      • SailorJerry 12 days ago
        I think the reason I prefer columns is I do the mental expansion into large bracketed expressions. If x is a row and kept inline, then the expansion gets really wide. To keep it compact and have the symbols oriented the same as their expansion, you'd have to put the x above A and that's just silly.
      • planede 12 days ago
        Your reasoning starts out nicely with y=Ax, but you get the wrong conclusion. The layout of x and y are not affected at all by row or column order, they are just consecutive numbers in either case. So you have to look at the layout of A.

        For the row-major layout the multiplication Ax ends up being more cache-efficient than for the column major layout, as for calculating each component of y you need to scan x and a row of A.

        • bee_rider 12 days ago
          I’m pretty sure BLAS will tile out the matrix multiplication internally anyway, so it doesn’t matter. At least for matmat it would; is matvec special?
          • planede 11 days ago
            However matmat is done, row-major vs column-major for both matrices shouldn't make a difference (for square matrices at least).

            I don't know if tiling is done for matvec. I don't think it makes sense there, but I didn't think about it too hard.

      • Joker_vD 12 days ago
        Well, there used to be a tradition in universal algebra (and category theory) of putting functions/operators at the right side of the arguments but it seems to have ultimately died out. And "y = xA" is the standard notation in the linear coding theory, even to this day: message vectors are bit strings, not bit columns.
        • bee_rider 11 days ago
          I think there’s also some other pretty big field that tends to put the vector on the left… statistics? Economics? For some reason I couldn’t find any links, so maybe I just got this from, like, one random statistician. If only they’d told me about sample sizes.
      • montebicyclelo 12 days ago
        y = x @ W + b

        Is how you'd write it in NumPy and most (/all?) deep learning libraries, with x.shape=[batch, ndim]

        I'd personally prefer that to be the convention, for consistency.

    • thayne 12 days ago
      Numpy was originally designed for mathematicians and scientists, and in those domains convention often lines up better with column major order. For example, vectors are a single column, and the column index comes before the row index in many notations. So using column major order meant familiar formulas and algorithms (including translating from Fortran code, which is column major, or Matlab for that matter) are easy to translate to numpy code.

      Also, numpy is built on some Fortran libraries, like BLAS and LAPACK, that assume a column major order. If it took row major input, it would need to transpose matrices or change the order in some cases, or rewrite those libraries to use row major form.

      • kolbusa 12 days ago
        Copying is usually not necessary. Often times you can swap data and/or shape arguments and get a transposed result out. While it is true that Fortran BLAS only supports col-major, CBLAS supports both row- and col-major. Internally, all the libraries I worked on use col-major, but that is just a convention.
    • rdtsc 12 days ago
      I think in math columns as vectors is a more common representation. Especially if we talk about matrix multiplication and linear systems.
  • antirez 12 days ago
    MicroPython allows to do even more than that, and is fantastic and I wish it was part of standard Python as well. Using a decorator you can write functions in Viper, a subset of Python that has pointers to bytearrays() and only fast integer types that have a behavior similar to C (fixed size, they wrap around and so forth). It's a very nice way to speedup things 10 or 100x without resorting to C extensions.
    • yosefk 12 days ago
      I dunno if it's "more than that" since they're not directly comparable? What you describe sounds more like numba or Cython than what TFA describes which is a different use case?..
      • antirez 12 days ago
        Indeed, I don't mean they are directly comparable, but it's a great incarnation of the "Unsafe Python" idea, because it does not go too far with the subset of Python that you can use, like introducing a complete different syntax or alike. There are strict rules, but if you follow them you still kinda write Python that goes super fast.
        • 3abiton 8 days ago
          I am curious about the quantification of "super fast", do you have some pointers?
          • antirez 8 days ago
            Viper is normally half the speed of using the MicroPython decorator to directly write assembly.
  • comex 12 days ago
    Do you actually need ctypes here, or can you just use np.reshape? That would cut out the unsafety.
    • Retr0id 12 days ago
      You can exploit memory unsafety in cpython using only built-in methods, no imports needed at all: https://github.com/DavidBuchanan314/unsafe-python/
      • yosefk 12 days ago
        You... probably shouldn't, but, very interesting stuff in that link! I didn't know CPython had no bounds checks in the load_const bytecode op
    • yosefk 12 days ago
      I think you'll have a problem with BGR data absent ctypes, since you have a numpy array with the base pointing at the first red channel value, and you want an array with the base pointing 2 bytes before the first red channel value. This is almost definitely "unsafe"?.. or does numpy have functions knowing that due to the negative z stride, the red value before the blue value is within the bounds of the original array? I somehow doubt it though it would be very impressive. And I think the last A value is hopelessly out of reach absent ctypes since a safe API has no information that this last value is within the array bounds; more strictly speaking all the alpha values are out of bounds according to the shape and strides of the BGRA array.
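
      The layout in question, sketched safely from the other direction (starting from the full BGRA array, which pygame's API never hands you — the article's ctypes trick exists precisely because you only get the derived views):

      ```python
      import numpy as np

      # One 2x3 image of BGRA pixels, as a WxHx4 uint8 array:
      bgra = np.arange(24, dtype=np.uint8).reshape(2, 3, 4)

      rgb = bgra[:, :, 2::-1]    # channels 2,1,0 viewed as R,G,B; stride -1
      alpha = bgra[:, :, 3]      # the A plane

      assert rgb.strides == (12, 4, -1)
      # rgb's base pointer sits 2 bytes past bgra's, which is why you cannot
      # reconstruct bgra from rgb through numpy's safe API alone:
      assert rgb.__array_interface__['data'][0] == bgra.__array_interface__['data'][0] + 2
      ```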
    • im3w1l 12 days ago
      I think what you want is a combination of swapaxes and flip (reshape does not turn rows into columns, it turns m input rows into n output rows), but yeah.

      Actually let me include a little figure

        original  reshape  swapaxes
      
        abc       ab       ad
        def       cd       be
                  ef       cf
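
      The figure in numpy terms:

      ```python
      import numpy as np

      a = np.array([['a', 'b', 'c'],
                    ['d', 'e', 'f']])
      # reshape re-flows the 6 elements into new rows; swapaxes transposes:
      assert a.reshape(3, 2).tolist() == [['a', 'b'], ['c', 'd'], ['e', 'f']]
      assert a.swapaxes(0, 1).tolist() == [['a', 'd'], ['b', 'e'], ['c', 'f']]
      ```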
  • jofer 12 days ago
    I'm confused on the need for ctypes/etc here. You can directly modify somearr.strides and somearr.shape. And if you need to set them both together, then there's numpy.lib.stride_tricks.as_strided. Unless I'm missing something, you can do the same assignment that ctypes is used for here directly in numpy. But I might also be missing a lot - I'm still pre-coffee.

    On a different note, I'm surprised no one has mentioned Cython in this context. This is distinct from, but broadly related to things like @cython.boundscheck(False). Cython is _really_ handy, and it's a shame that it's kinda fallen out of favor in some ways lately.
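
    as_strided does let you set shape and strides together, but it always starts from the array's existing base pointer, which is the limitation yosefk's reply gets at:

    ```python
    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    a = np.arange(12, dtype=np.int64).reshape(3, 4)
    # Reinterpret the same buffer as its transpose by swapping shape and strides:
    t = as_strided(a, shape=(4, 3), strides=(a.strides[1], a.strides[0]))
    assert np.array_equal(t, a.T)
    # There is no argument for moving the data pointer itself backwards,
    # so the "2 bytes before the base" trick still needs ctypes.
    ```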

    • yosefk 12 days ago
      You need to modify the base pointer in this example, specifically to point 2 bytes before the current base pointer (moving back from the first red pixel value to the first blue value.) I don't think you can do it without ctypes, maybe I'm wrong.

      What does Cython do better than numbs except static compilation? Honest q, I know little about both

      • jofer 12 days ago
        You actually can do the "offset by 2 bytes back" with a reshape + indexing + reshape back. But I suspect I'm still missing something. I need to read things in more depth and try it out locally.

        By "numbs" here, I'm assuming you mean numba. If that's not correct, I apologize in advance!

        There are several things where Cython is a better fit than numba (and to be fair, several cases where the opposite is true). There are two that stick out in my mind:

        First off, the optimizations for cython are explicit and a matter of how you implement the actual code. It's a separate language (technically a superset of python). There's no "magic". That's often a significant advantage in and of itself. Personally, I often find larger speedups with Cython, but then again, I'm more familiar with it, and understand what settings to turn on/off in different situations. Numba is much more of a black box given than it's a JIT compiler. With that said, it can also do things that Cython can't _because_ it's a JIT compiler.

        If you want to run python code as-is, then yeah, numba is the better choice. You won't see a speedup at all with Cython unless you change the way you've written things.

        The second key thing is one that's likely the most overlooked. Cython is arguably the best way to write a python wrapper around C code where you want to expose things as numpy arrays. It takes a _huge_ amount of the boilerplate out. That alone makes it worth learning, i.m.o.

        • jofer 11 days ago
          Ah, right, you mean _outside_ the memory block of the array! Sorry, my mind was just foggy this morning.

          That's not strictly possible, but "circular" views are possible with "stride tricks". Those can accomplish similar things in some circumstances. But with that said, I don't think that would work in this case.
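          A small sketch of the stride tricks in question (numpy's as_strided; the windows may overlap, but the view can only cover bytes from the base pointer onward, never before it):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Reinterpret one buffer with hand-picked shape and strides:
# overlapping length-3 windows that step one element at a time.
a = np.arange(6, dtype=np.int64)
step = a.strides[0]                       # bytes per element (8 here)
windows = as_strided(a, shape=(4, 3), strides=(step, step))

print(windows[1])  # [1 2 3] -- shares a's memory, no copy made
```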

  • ruined 12 days ago
    oh THIS is why image byte order and dimensions are so confusing every time i fuck with opencv and pygame

    well, half of why. for some reason i keep doing everything directly on /dev/fb0

    • TeMPOraL 12 days ago
      Image byte order, axis directions, coordinate system handedness when in 3D... after enough trying, you eventually figure out the order of looping at any given stage of your program, and then it Just Works, and then you never touch it again.
  • sylware 12 days ago
    I wonder how it would perform with the latest openstreetmap tile renderer which was released not long ago.
  • gugagore 12 days ago
    Technically, image resizing does in general care about the color channel ordering, because color spaces are in general not linear. https://www.alanzucconi.com/2016/01/06/colour-interpolation/
    • quietbritishjim 12 days ago
      That article fails to touch on the fundamental issue with its title "The Secrets of Colour Interpolation": RGB values are a nonlinear function of the light emitted (because we are better at distinguishing dark colours, so it's better to allow representing more of those), so to interpolate properly you need to invert that function, interpolate, then reapply. The difference that makes to colour gradients is really striking.
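        A minimal sketch of that decode-interpolate-re-encode step, using the common gamma-2.2 approximation (the exact sRGB transfer function also has a small linear toe segment, omitted here):

```python
# Gamma-correct interpolation between two 8-bit values, one channel.
def lerp_srgb(a, b, t, gamma=2.2):
    lin_a = (a / 255.0) ** gamma                 # decode to linear light
    lin_b = (b / 255.0) ** gamma
    lin = lin_a + t * (lin_b - lin_a)            # interpolate linearly
    return round(255.0 * lin ** (1.0 / gamma))   # re-encode

# Naive midpoint of black and white is 128; the gamma-correct value
# is noticeably brighter:
print(lerp_srgb(0, 255, 0.5))  # 186
```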
      • topherclay 12 days ago
        Do you mean converting the RGB value to LAB values in the CIELAB color space and doing the interpolation there?

        Is there a better way to do it?

        • CarVac 12 days ago
          No, you just need to linearize the brightness.
          • drjasonharrison 11 days ago
            Typically this is done by using a look-up table to convert the 8-bit gamma encoded intensities to 10-bit (or more) linear intensities. You can use the same look-up table for R, G, and B. Alpha should be linear.
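            Something like this, as a sketch (with gamma 2.2 standing in for the exact sRGB curve):

```python
# Precompute 8-bit gamma-encoded -> 10-bit linear once, then convert
# by indexing. The same table serves R, G and B.
lut = [round(1023 * (v / 255.0) ** 2.2) for v in range(256)]

def to_linear(rgb):
    # alpha, if present, would pass through unchanged
    return tuple(lut[c] for c in rgb)

print(to_linear((0, 255, 255)))  # (0, 1023, 1023)
```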
    • yosefk 12 days ago
      While this is a valid point, cv2.resize doesn't actually implement color conversion in a way addressing this issue; at least in my testing I get identical results whether I interpret the data as RGB or BGR. So if you want to use cv2.resize, AFAIK you can count on it not caring which channel is which. And if you need fast resizing, you're quite likely to settle for the straightforward interpolation implemented by cv2.resize.
    • planede 12 days ago
      Even if the RGB components correspond to sRGB, to linearize you apply the same non-linear function to each component value, independently. So even if you do the interpolation in a linear colorspace, the order of the sRGB components does not matter.
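      A quick check of that claim (gamma 2.2 as a stand-in for the sRGB decode):

```python
# The decode is applied per channel, so permuting channels before or
# after decoding gives the same result.
def decode(px):
    return [(c / 255.0) ** 2.2 for c in px]

rgb = (10, 128, 200)
bgr = rgb[::-1]

print(decode(bgr) == decode(rgb)[::-1])  # True
```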
  • Joker_vD 12 days ago
    Yep, welcome to the world of RGBA and ARGB storage and memory representation formats, with little- and big-endianness thrown in for the mix, it's all very bloody annoying for very little gain.
  • qaq 12 days ago
    Well, with Mojo it looks like you will have both safety and performance.
    • grandma_tea 12 days ago
      Closed source and not relevant.
      • ianbutler 12 days ago
        Mojo was open sourced months ago iirc

        https://github.com/modularml/mojo

        • viraptor 12 days ago
          Those are examples and docs. Mojo is still closed.
          • ianbutler 12 days ago
            Their standard lib is in there, seems more open to me

            There's the announcement where they open sourced the modules in their standard lib: https://www.modular.com/blog/the-next-big-step-in-mojo-open-...

            There's the standard lib: https://github.com/modularml/mojo/blob/nightly/stdlib/README...

            • viraptor 12 days ago
              Yeah, that's some minimal progress, but really not that interesting. (As in, it's cool that it exists, but given we've got Python stdlib and numpy already open, it's not really new/exciting) And doesn't allow you to port to platforms they don't care enough about.
              • qaq 12 days ago
                It might be not interesting to you. Having a lot of Rust features with much smarter compiler and Python syntax is pretty interesting to me.
                • viraptor 12 days ago
                  No, mojo itself is interesting. The stdlib that's posted isn't; I mean, it's not very interesting code beyond "what are the implementation details". Any decent programmer could redo that based on python's stdlib.
  • Too 12 days ago
    Clickbait title. The speedup only applies to one particular interaction between SDL and numpy.
  • intelVISA 12 days ago
    Isn't all Python, by design, unsafe?
    • emmanueloga_ 12 days ago
      As others commented "safety" is a heavily overloaded term, for instance there's "type safety" [1] and "memory safety" [2], and then there's "static vs dynamic typing" [3], and "weak vs strong typing" [4], so talking about types and safety offered by a language can be very nuanced.

      I suspect when you say "unsafe by design" you may be referring to the dynamic type checking aspect of Python, although Python has supported type annotations for a while now, which can be statically checked by linter-like tools such as Pyright.

      --

      1: https://en.wikipedia.org/wiki/Type_safety

      2: https://en.wikipedia.org/wiki/Memory_safety

      3: https://en.wikipedia.org/wiki/Type_system#Type_checking

      4: https://en.wikipedia.org/wiki/Strong_and_weak_typing

      • d0mine 12 days ago
        Python is both type and memory safe, e.g., ""+1 in Python is a TypeError and [][0] is an IndexError

        But you can perform unsafe operations e.g.,

            import ctypes
            ctypes.string_at(0)  # access 0 address in memory -> segfault
        • eru 11 days ago
          Yes, though `ctypes' is kind of marked as `unsafe' in Python by convention.
    • westurner 12 days ago
      ctypes, c extensions, and CFFI are unsafe.

      Python doesn't have an unsafe keyword like Rustlang.

      In Python, you can dereference a null pointer and cause a Segmentation fault given a ctypes import.

      lancedb/lance and/or pandas' dtype_backend="pyarrow" might work with pygame

      • Retr0id 12 days ago
        You don't even need a ctypes import for it, you'll get a segfault from this:

           eval((lambda:0).__code__.replace(co_consts=()))
        
        cpython is not memory safe.
        • westurner 12 days ago
          Static analysis of Python code should include review of "unsafe" things like exec(), eval(), ctypes, C strings, and memcpy.

          Containers are considered nearly sufficient to sandbox Python, which cannot be effectively sandboxed using Python itself. Isn't that actually true for all languages though?

          There's a RustPython, but it does support CFFI and __builtins__.eval, so

          • maple3142 12 days ago
            The example given by parent does not need eval to trigger though. Just create a function and replace its code object then call it, it will easily segfault.
            • jwilk 12 days ago
              Complete example without eval:

                def f(): pass
                f.__code__ = f.__code__.replace(co_consts=())
                f()
              • Retr0id 12 days ago
                yup, eval was just there for golfing purposes
    • eru 12 days ago
      Is this supposed to be a joke?

      Have a look at the linked article to see what meaning of 'unsafe' the author has in mind.

      • intelVISA 12 days ago
        Safety is catching (more) errors ahead of time... for which Python is grossly unsuitable imo.

        Fun lang though, just not one that ever comes to mind when I hear 'safe'.

        • recursive 12 days ago
          Memory corruption vs type safety.

          "Safety" is an overloaded term, but that happens a lot in software. You'll probably get best results if you try to understand what people are talking about, rather than just assuming everyone else is an idiot.

        • eru 12 days ago
          That's one definition of safety. But it's not the one the author uses in this case.

          The generic snide about Python that has nothing to do with the article ain't all that informative.

          You could make the same remark you just made about all Python being unsafe about Rust: 'Isn't all Rust, by design, unsafe?'. And compared to eg Agda or Idris that would be true, but it wouldn't be a very useful nor interesting comment when talking about specifically 'unsafe' Rust vs normal Rust.

          • intelVISA 12 days ago
            I would agree, Rust is not safe. We need to encourage more formal rigor in our craft and avoid misconceptions like 'safe Rust' or 'safe Python'. Thus my original comment :P
            • dahart 12 days ago
              It’s a term of art in this case. Or someone proposed the term improper noun https://news.ycombinator.com/item?id=32673100

              Fine and good to advocate rigor, whether or not it’s specifically relevant to this post, but maybe be careful with the interpretation and commentary lest people decide the misconception is on your part?

            • eru 12 days ago
              Eh, even Agda is only safe in the sense of 'does what you've proven it to do'. That doesn't mean that if you eg write a machine learning library in Agda, your AI won't come and turn us all into paperclips.

              So it doesn't really make sense to pretend there's some single meaning of 'safe' vs 'unsafe' that's appropriate in all contexts.

  • idkdotcom 12 days ago
    Safety is one of Python's greatest advantages over C or C++. Why anyone would use unsafe Python, when Python doesn't have other features such as type safety and all the debugging tools that have been built for C and C++ over time, is beyond me.

    Python is a great language, but just as the generation of kids who got out of computer science programs in the 2000s were clueless about anything that wasn't Java, it seems this generation is clueless about anything that isn't Python.

    There was life before Python and there will be life after Python!

    • jandrese 12 days ago
      Seems to me if you could do most of the work in Python and then just make the critical loop unsafe and 100x faster then that would certainly have some appeal.
      • gibolt 12 days ago
        Plenty of people would gladly not have to learn another language (especially C).

        You could also benefit from testing blocks of code with safety enabled to have more confidence when safety is removed.

        • TylerE 12 days ago
          Or if you're going to learn another language, might as well learn Nim. Keep most of the python syntax, ditch the performance and packaging insanity.
        • KeplerBoy 12 days ago
          It was explained pretty well in the blog: Installing the OpenCV python package is easy, fast and painless as long as you're happy with the default binary. Building OpenCV from source to get it into a C/C++ program can quickly turn into a multi-hour adventure, if you're not familiar with building sizeable C++ projects.
      • idkdotcom 12 days ago
        [dead]
    • bongodongobob 12 days ago
      I don't use Python because of type safety, I use it because it's easy to write. I couldn't give a single fuck about some missed pointers that weren't cleaned up in the 2.5 seconds my program ran. I'm not deploying public code.

      I tend to prototype in Python and then just rewrite the whole thing in C with classes if I need the 10000x speedup.

      • SunlitCat 12 days ago
        That "easy to write" part is something I am not so sure about. It took me ages to understand why my first try at writing a Blender plugin didn't work.

        It was because of white spaces not lining up. I was like, really? Using white spaces as a way to denote the body of a function? Really?

        • nottorp 12 days ago
          I'm curious... what rock have you been living under in the past 33 years since python was launched?
          • SunlitCat 11 days ago
            Let's see, I was living under the:

            - Amiga E
            - Assembler (on Amiga)
            - perl
            - tcl/tk
            - javascript
            - vbscript
            - java
            - vba
            - c
            - c++
            - python

            rock. Granted at some of those rocks, I just took a quick peek (more like a glance), but a rock with whitespaces being important was, at that moment, new to me!

            • nottorp 11 days ago
              Oh cmon, I can still write some z80 assembly from memory and remember the zx spectrum memory layout somewhat, but I check new programming languages now and then :)
              • SunlitCat 11 days ago
                :)

                I just meant that, for someone coming from languages where different kinds of delimiters are used to denote function bodies, a language that puts such a huge emphasis on whitespace was kinda throwing me off. I got an error message slapped at me even though I was replicating the example letter by letter (although, obviously, I missed the importance of the whitespace).

                • nottorp 11 days ago
                  It's all fuzzy, but I'm sure I knew about the importance of whitespace in python long before I actually tried python.

                  Even Raymond's article which was one of the first whined about it [1]. I definitely remember reading that one and thinking I have to check out that language later.

                  [1] https://www.linuxjournal.com/article/3882

                  • SunlitCat 11 days ago
                    Thank you for the article. I have to put it on my "to read" pile! :)
                    • nottorp 11 days ago
                      It's from 2000, python has grown up in the mean time and it has added complexity. Put a (small) grain of salt on Raymond's praise.
    • SunlitCat 12 days ago
      The thing about academia (computer science and the like) is, you need a programming language that you can get across easily and quickly, and that gets people producing satisfying results in no time.

      Back then it might have been java, then python, till the next language comes around.

      The java thing was so apparent that when you looked at c++ code, you were able to spot the former java user pretty easily. :)

      About python: some time ago, I was looking into Arduino programming in C and found someone offering a library (I think) in C (or C++, can't remember). What I remember is that this person stopped working on it because they got many requests and questions about being able to use their library from Python.

      • pjmlp 12 days ago
        > The java thing was so apparent, that when you looked at c++ code, you were able to spot the former java user pretty easily. :)

        Well, Java was designed based on C++ GUI development patterns typical in the 10 years of C++ GUI frameworks that predated Java.

        Yet, people keep using these kind of remarks.

    • gibolt 12 days ago
      "Why would anyone use" -> "when" usage generally means lots of use cases are being ignored / swept under the rug.

      >1 billion people exist. Each has a unique opinion / viewpoint.

    • omoikane 12 days ago
      The "unsafe" in the title appears to be used in the sense of "exposing memory layout details", but not in the sense of "direct unbounded access to memory". It's probably not the memory safety issue you are thinking of.
      • yosefk 12 days ago
        You can absolutely get direct unbounded access to memory with ctypes, with all the bugs that come from this. I just think/hope the code I show in TFA happens to have no such bugs.
    • eru 12 days ago
      > Why would anyone use unsafe Python when Python doesn't have other features such as type safety and all the debugging tools that have been built for C and C++ over time is beyond me.

      C and C++ 'type safety' is barely there (compared to more sane languages like OCaml or Rust or even Java etc). As to why would anyone do that? The question is 'why not?' It's fun.

    • kragen 12 days ago
      while python is eminently criticable, yosef kreinin has designed his own cpu instruction set and gotten it fabbed in a shipping hardware product, and is the author of the c++fqa, which is by far the best-informed criticism of c++; criticizing him with 'this generation is clueless about anything that isn't Python' seems off the mark