Quadsort: a stable non-recursive merge sort

(github.com)

312 points | by geophertz 286 days ago

16 comments

  • nightcracker 285 days ago

    C's qsort() is notoriously bad because the comparison function isn't inlined, meaning the majority of time is spent in call overhead.

    Zhangxp1998 ported quadsort to C++ and compared with std::sort and found it to be slower: https://gist.github.com/zhangxp1998/0e2fa30656c894017d183e0d...

    Shameless self promotion, if you want a faster sorting implementation, check out pdqsort (https://github.com/orlp/pdqsort) which is almost twice as fast as libstdc++'s std::sort for randomly shuffled integers, and has all sorts of tricks for input distributions with patterns (like pre-sorted, reverse sorted, single out-of-order element appended, etc).

    • vhvjkyhkogvv 285 days ago

      Comparing a stable sort to a normal sort is unfair. Compare it with std::stable_sort.

      • nightcracker 285 days ago

        That is a fair point, I missed that quadsort is claimed to be stable.

        • zhangxp1998 285 days ago

          I agree is unfair. The point of my benchmark was that the OP's claim "quad sort is faster than quicksort" is false.

          • abjKT26nO8 285 days ago

            libstdc++'s std::sort doesn't implement quicksort per se. It implements introsort[1]. I'm curious how a pure C implementation of introsort would fare against std::sort.

            [1]: <https://en.wikipedia.org/wiki/Introsort>

            • zhangxp1998 285 days ago

              " It begins with quicksort, it switches to heapsort when the recursion depth exceeds a level based on (the logarithm of) the number of elements being sorted and it switches to insertionsort when the number of elements is below some threshold. "

              • zhangxp1998 285 days ago

                It's just quick sort with insertion sort for small base cases.

                • abjKT26nO8 285 days ago

                  You forgot about heapsort. It's a combination of three sorting algorithms, not two. However trivial the difference may seem, I'd still prefer to look at a "C introsort vs C++ introsort" benchmark than a "C quicksort vs C++ often quicksort, but not really" benchmark.

            • vhvjkyhkogvv 285 days ago

              Too late to edit but I read the article again and they compares it favorably to qsort so it makes sense to point out it's worse than std sort.

            • abjKT26nO8 285 days ago

              libstdc++'s std::sort doesn't implement quicksort per se. It implements introsort[1]. I'm curious how a pure C implementation of introsort would fare against std::sort.

              [1]: <https://en.wikipedia.org/wiki/Introsort>

            • nneonneo 286 days ago

              How does this fare against Python's famous Timsort (used by several languages and systems)? How about the dual-pivot quicksort used by Java for primitive arrays?

              Someone has to have put together a nice benchmark for comparing many sorting algorithms. I wish that the author had done some benchmarking first, so that the proposed algorithm can properly be positioned w.r.t. state-of-the-art techniques.

              • monadic2 285 days ago

                I’d like to see this too but benchmarking sort algorithms is a pain in the ass due to the wide array of sorting shapes and sizes, I wouldn’t expect a well maintained benchmark suite for language X.

                • labawi 285 days ago

                  AFAICT dual-pivot quicksort is not a stable sort, so quadsort should fare better if actually need a stable sort.

                • jph 286 days ago

                  Summary: this is a non-recursive merge sort with improvements.

                  Benchmark of quadsort() versus C qsort():

                  * ~10x faster on forward-order items

                  * ~2x faster on reverse-order items

                  * ~equivalent on random-order items

                  Improvements:

                  * Ordering: when blocks of items are in order, or in reverse-order, then do special case handling, which gives quadsort O(n + log n) instead of qsort O(n * log n).

                  * Boundaries: compare data rather than traditional merge sort wasteful boundary checks.

                  • eru 286 days ago

                    If you have a k sorted (or reverse sorted) blocks, you can get O(n log k) performance in merge sort variants. Especially, if k is a small fixed constant, like 1 or 2 or 10, you should get linear performance.

                    A few language's standard library sorts implement that already.

                    For a serious implementation, of course, you not only care about the asymptotic performance, but also absolute runtimes.

                    • chalst 286 days ago

                      Good summary. Quibble: O(n + log n) = O(n).

                      • kannmig 286 days ago

                        I think OP made the distinction intentional to “show their working”, so to speak.

                    • xucheng 286 days ago

                      Quick sort is not the fastest sorting algorithm. It would be nice if there is a benchmark comparing with other state of the art algorithms like Timsort.

                      • nwellnhof 285 days ago

                        It depends on the implementation but Quicksort typically beats Timsort with random data. Quicksort is unstable though, so that's not a fair comparison.

                      • proc0 286 days ago

                        Interesting, but I'm surprised if this is the first time we have sorting algorithm that is swapping more than two elements at a time. I would have guessed every possible iteration of sorting algorithms has been already explored, proven and tested.

                        • thaumaturgy 286 days ago

                          It isn't. Sorting networks (https://en.wikipedia.org/wiki/Sorting_network) provide the best possible in-place sorting combinations for N elements, where N is currently < about 11. If somebody could find a way to generalize the sorting network algorithm for any number of input elements, they'd be settling the issue of sorting things pretty much forever.

                          At lazy glance, it looks kind of like Quadsort has re-derived a variation of the 4-element sorting network.

                          • csense 286 days ago

                            > If somebody could find a way to generalize the sorting network algorithm for any number of input elements

                            In section 2 in that Wikipedia article, it links to many constructions that do exactly that.

                            A couple are O(n log(n)), but complicated and impractical. The algorithms people use are O(n log^2(n)) ones.

                            FWIW, sorting networks are data-independent. I.e. for any input, you always compare-and-swap the same elements.

                            • thaumaturgy 286 days ago

                              I was imprecise. I was talking about the construction of optimal sorting networks. You're correct to point out that there are several approaches now for constructing sorting networks for arbitrary numbers of inputs, but as noted in that section, they all have some very important tradeoffs. I said, "...provide the best possible in-place sorting combinations for N elements, where N is currently < about 11"; the 11 there was a reference to this part from the second section of that article:

                              "For one to eleven inputs, minimal (i.e. size-optimal) sorting networks are known, and for higher values, lower bounds on their sizes S(n) can be derived inductively using a lemma due to Van Voorhis: S(n + 1) ≥ S(n) + ⌈log2(n)⌉. The first ten optimal networks have been known since 1969, with the first eight again being known as optimal since the work of Floyd and Knuth, but optimality of the cases n = 9 and n = 10 took until 2014 to be resolved.[11] An optimal network for size 11 was found in December of 2019 by Jannis Harder..."

                              Clearly I also should've said < ~12.

                              • csense 280 days ago

                                The point I was trying to make is that sorting networks restrict your algorithm to a data-independent memory access pattern.

                                For some particular N, you might be able to find the smallest sorting network that sorts N elements. And then you might be able to find an even faster algorithm that sorts N elements using some strategy that's not equivalent to a sorting network (e.g., if your strategy does data-dependent memory access).

                          • Hendrikto 286 days ago

                            > I would have guessed every possible iteration of sorting algorithms has been already explored, proven and tested

                            The search space is infinite, so exhaustively exploring it is impossible.

                            • foota 286 days ago

                              Nit: it's possible that there is a finite number of Pareto optimal sorting algorithms, and it may be possible to enumerate those.

                              • dhash 286 days ago

                                Oh man if I could efficiently enumerate algorithms across the Pareto front of anything I’d be a happy camper.

                                Procedures that enumerate Turing machines are generally very easy, or nigh impossible

                                • foota 286 days ago

                                  You'd probably want to start by defining equivalence and then work from there. If your criteria for equivalence is loose enough you're already done, e.g., if you just look at runtime big O for randomized arrays or something you can't do better than n lg n so there's just the one Pareto optimal choice, with many possible implementations.

                                  • eru 286 days ago

                                    It depends on your model of computation.

                                    O(n log n) is only the frontier for comparison based sorts that know nothing about the distribution of inputs.

                                    If your sorting algorithm is allowed to do anything else on your data, like hashing or looking at bits or arithmetic, different lower bounds might apply.

                            • jsnell 286 days ago

                              It's a sorting network, invented in the 1950s.

                              • senderista 286 days ago

                                If you think about it, the "quadswap" is just unrolled merge sort on 4 elements.

                                • kzrdude 286 days ago

                                  GrailSort (GitHub) and related they swap blocks at a time, isn't that a precedent?

                                • kccqzy 286 days ago

                                  This reminds me of a programming exercise I was asked to write when I first learned programming: write a sorting program generator that given N, generates a program that sorts an array of N elements optimally: the generated code has N! branches, one for each possible permutation. With some CSE help from the compiler, it can be really quite fast at the expense of code size.

                                  The author's explanation isn't entirely clear, but it seems similar to the above construction with a fixed N and then a merge sort afterwards.

                                  • praptak 286 days ago

                                    A truly optimal sorting for a given N is a nontrivial problem. By truly optimal I mean the actual absolute minimum in the number of comparisons, no O(...) approximation.

                                    For five elements the lower bound from counting the permutations is ceil(log2(5!)) which says you cannot sort 5 elements in less than 7 comparisons.

                                    An actual 7 comparison algorithm exists but it is not very easy to write it. For greater numbers it gets much much trickier - in general the log(#permutations) lower bound cannot be met.

                                    • kccqzy 285 days ago

                                      Agreed. My use of the word "optimal" in the original comment was a bit careless.

                                      • praptak 285 days ago

                                        Hope my comment did not got across as a disagreement. Just wanted to add some relevant detail.

                                    • gizmo686 286 days ago

                                      A program generator seems rather advanced for a beginner's assignment...

                                      When I started, I wrote an Excel spreadsheet to generate what I would now describe as a 50 element array.

                                      I also recall seeing a project to programatically generate mnemonic operators in Haskell, limited only by the ability of the compiler to not run out of memory. Sadly, I can't seem to find it.

                                      • kccqzy 286 days ago

                                        > A program generator seems rather advanced for a beginner's assignment...

                                        Ha, it's nothing more complicated than string concatenation.

                                    • xeeeeeeeeeeenu 286 days ago

                                      In his benchmark, the author is assuming that qsort() is implemented using quicksort, but that's not necessarily true. For example, glibc is using mergesort (although it falls back to quicksort if the system is short on memory).

                                      • hinkley 286 days ago

                                        I can’t imagine the sort of bugs you get when your code relies on stable sort but calls qsort and everything works great until the machine is under heavy load.

                                        • Sean1708 285 days ago

                                          The most annoying part of development, your users can and will rely on any observable behaviours of your software.

                                          • hinkley 284 days ago

                                            Someone recently did something very stupid/clever with an API that I wrote.

                                            In the code review I initially complained, but then I couldn’t think of any other way to interpret the design. So I guess that’s a feature now. ‘Course later we found a performance problem, but, you know…

                                      • zhangxp1998 286 days ago

                                        qsort has to invoke your comparison function repeatedly, which incurs a lot of overhead. Try C++'s std::sort

                                        • zhangxp1998 286 days ago

                                          See https://gist.github.com/zhangxp1998/0e2fa30656c894017d183e0d... for a comparison of quadsort with C++'s std::sort. The compare functions are inlined.

                                          • thedance 286 days ago

                                            Thanks for saving us the time. The punch line:

                                              Summarize: Slower than std::sort except on random tail.
                                            
                                            Also a very important tidbit that std::sort is 10x faster than c's qsort for ordered inputs.
                                          • alexchamberlain 286 days ago

                                            This feels a little unfair; the function is invoked the same number of times, but C++ has a mechanism for removing the overhead of calling a function (inlining).

                                            • zhangxp1998 285 days ago

                                              I looked at disassembly of generated binary, sure, function calls inside quad sort were also inlined.

                                          • Naac 286 days ago

                                            Since we're already using O(N) space, it would be interesting to see how this compares to Radix sort[0], which is O(N) space but O(N) time ( due to just hashing everything ).

                                            Like others have said, it would be cool to see quadsort stacked up to other current state-of-the-art sorting algorithms.

                                            [0] https://en.wikipedia.org/wiki/Radix_sort

                                            • stkdump 285 days ago

                                              Small nitpick: radix sort has nothing to do with hashing.

                                              • darawk 285 days ago

                                                A radix sort is a type of hashing, no? You're bucketing the items based on a reduced form projection of them onto some smaller subspace.

                                                • stkdump 285 days ago

                                                  Radix sort (similar to bucket sort) groups items based on individual digits of the non-hashed values. If you hash the values before, you will end up with the data being sorted according to their hash, but they will appear almost random in their unhashed form.

                                                  • dahart 285 days ago

                                                    A hash function is any function that maps arbitrary data to fixed-size values. A radix is a type of hash. Hashes are not defined as random or required to sort differently than the unhashed values. If you define a hash function that returns the first 32 bits of it’s input, then you have a hash that sorts almost the same as the unhashed values, as long as the first 32 bits are changing frequently, and you also have a hash function that you can call a radix.

                                                    • stkdump 285 days ago

                                                      I have never heard about hash function in the context of radix sort or anything similar as you describe. Wikipedia says about hash functions, that they "[S]cramble the bits of the key so that the resulting values are uniformly distributed over the key space". I would say that isn't the case for the function in radix sort that is used to 'pick' a digit.

                                                      • dahart 284 days ago

                                                        Well, good you were here, now you have heard about radixes as hashes. ;) It's good to see and understand the connections and relationships between these things.

                                                        You're right that a radix doesn't scramble the key, but the quote you've picked is a qualified subset of hash functions. That paragraph is attempting to define a practical/good hash function that is used in specific ways. Not all hash functions scramble the bits, and the Wikipedia article is very clear about this if you read the whole thing.

                                                        You skipped over two important sentences that came before it, and a whole sub-section on radix hashes after it:

                                                        "A hash function is any function that can be used to map data of arbitrary size to fixed-size values." (very first sentence, emphasis mine.)

                                                        "In some cases, the key is the datum itself." (Right before the 'scramble' quote)

                                                        https://en.wikipedia.org/wiki/Hash_function#Radix_conversion...

                                                        String hashing is sometimes similar to radix as well: "Simplistic hash functions may add the first and last n characters of a string along with the length" and I've seen string hashes in production that do only the first n characters and stop. That kind of hashing is frequently useful in small, embedded systems, video games, etc. where you have a limited set of strings and a good idea of how well distributed the keys are.

                                                  • namdnay 285 days ago

                                                    I think it's different in that you don't care what the output of a hash is, as long as it's sufficiently unique or whatever. Whereas here the buckets are are intimately tied to the input. It's more of an arithmetic hack I'd say, as it only works on decimal numbers

                                                    • klyrs 285 days ago

                                                      Radix sort works in any base

                                                      • level3 285 days ago

                                                        Radix sort also works on strings, etc. (anything that can be lexicographically ordered)

                                                        • namdnay 285 days ago

                                                          I guess it could work well for sets of strings that you know will not go above a certain length? But after that it might become painful, i.e. the algorithm complexity will start depending on the maximum string length

                                                          • level3 285 days ago

                                                            It's the same problem when working with numbers, since you have to assume some maximum number of digits as well. Radix sort essentially treats numbers as zero-padded strings.

                                                            When sorting single words consisting only of the letters A-Z, for example, you can think of it as the same thing but with 26 buckets (27 if you pad with spaces instead of As) instead of 10. Or you can think of it as a specific subset of numbers in base 36, if that makes more sense to you.

                                                            • snovv_crash 285 days ago

                                                              Well, it just means you actually see the complexity. Normally that is hidden in a strcmp, which is actually O(L) for the Length of the shorter string.

                                                              • level3 285 days ago

                                                                No, the point of radix sort is that you don't have to do comparisons. Radix sort on strings is the same as radix sort on numbers, just with more buckets.

                                                                • namdnay 285 days ago

                                                                  What they mean is that a standard comparative sort can also become very long if the strings are long, because strong compare can take up to the length of the string to return a result

                                                          • namdnay 285 days ago

                                                            Sorry, I meant decimal as in excluding some rationals e.g. not 1/3.

                                                        • Sean1708 285 days ago

                                                          Nope, if you hash the inputs then you won't be able to order them properly.

                                                    • QuinnWilton 286 days ago

                                                      I've never seen a sorting algorithm that uses a non-binary comparison function to order values. Is that a novel technique?

                                                      It seems really obvious in hindsight, so I'm sure there's just prior art I don't know about.

                                                      • klodolph 286 days ago

                                                        This is still binary comparison.

                                                        There are, in general, a number of different sorting algorithms which are optimized for a specific number of elements. In this case, four. It uses five binary comparisons to sort four elements.

                                                        You can find other algorithms like this, such as an algorithm that uses seven comparisons to sort five items, or one that uses ten comparisons to sort six items.

                                                        • QuinnWilton 286 days ago

                                                          Ah, I completely misunderstood what was going on with the quad swap at the start. Rereading it makes more sense. Thanks!

                                                        • eru 286 days ago

                                                          If tried that in eg Haskell to see if I can beat the standard library's highly optimized sort.

                                                          In theory, you can get O(n log k) performance, where k is the number of distinct elements (so k <= n). And crucially: you don't need to know k up-front.

                                                          In practice, all my attempts were absolutely slower than the standard approach based on binary comparisons only. (But that's saying more about me than about the domain.)

                                                          • tyingq 286 days ago

                                                            Perl and C++ use the <=> "spaceship" operator for that. Cmp for strings in Perl.

                                                          • loeg 286 days ago

                                                            Seems it's mergesort but with a slightly more complicated comparison primitive.

                                                            • proc0 286 days ago

                                                              Or it's like mergesort without the wasted steps that are proportionally less needed as data becomes less random.

                                                            • chkas 286 days ago

                                                              This Quicksort is almost twice as fast

                                                              https://rextester.com/XHCGA23293

                                                              • pvidler 286 days ago

                                                                Only in the random case. Already sorted in either direction and it's ~10x slower.

                                                                • chkas 286 days ago

                                                                  Who wants to sort sorted data. If the input data is more often sorted, you can test this before sorting.

                                                                  • dgudkov 285 days ago

                                                                    Data can be partially sorted and it happens quite often. As I understood, it's exactly the purpose of quadsort - to leverage locally ordered sequences.

                                                                    • imtringued 285 days ago

                                                                      Sort your data. Store it somewhere and then later add more unsorted data.

                                                                      • simias 285 days ago

                                                                        Sorting sorted or mostly-sorted arrays is not uncommon in many use cases.

                                                                        • chkas 285 days ago

                                                                          "Mostly-sorted" is a very vague definition.

                                                                          • seanhunter 285 days ago

                                                                            Usually what people mean by mostly sorted in CS is that there is some small K such that each element in the input is no more than K places from the position it would be in if the input was sorted.

                                                                            • chkas 285 days ago

                                                                              According to this definition, the "random tail" test data is not "mostly sorted".

                                                                              • seanhunter 285 days ago

                                                                                Well you could extend the definition to allow a small number of items which are entirely out of place. The point is that the right sort algorithm depends a lot on tthe distribution of input data and how much you care about worst-case vs average case trade offs.

                                                                  • thdespou 285 days ago

                                                                    I'm a bit sceptical because I don't see any mathematical proof. Only benchmarks which do not prove a lot. It may be faster only by a constant factor on a particular machine but for sufficiently large n it would be as fast as mergesort.

                                                                    1000000 is peanuts. We need to see convergence for about billions+ of numbers.

                                                                    • mcherm 285 days ago

                                                                      No one is claiming that this sort algorithm (or any other) is asymptotically faster than O(n log(n)). It can be mathematically proved that no sort algorithm can improve on this. The object with sort algorithms is to find one with good constant-factor performance on typical inputs.

                                                                      Frankly, I am more persuaded by the arguments in favor of an algorithm like "Tim-sort", which doesn't claim to micro-optimize hardware more efficiently (that's basically what "better swapping" is claiming), but instead claims that the algorithm is particularly efficient on some commonly-seen patterns (like "partially-sorted", or "pieces of the list are reverse-sorted"). Of course, any kind of argument about why one sort is better than another are inferior to actual benchmarks.

                                                                      • davuinci 285 days ago

                                                                        To be precise, we can prove that no sort comparison-based algorithm can do better than O(n log(n)) comparisons. There are some variations though that do better than that, though they are not based on comparisons (i.e counting sort, radix sort).

                                                                        • amcsi 285 days ago

                                                                          I'm quite a noob, and this is a bit off-topic, but if it's mathematically proven that no sorting algorithm can be faster then O(n*log(n)), but we know for sure that checking if an array is sorted is O(n), then doesn't that prove that P ≠ NP?

                                                                          • C4stor 285 days ago

                                                                            No it doesn't, since O(n*log(n)) is in P too. P = NP doesn't imply that the complexity of finding the solution has the exact same complexity than verifying it, just that both are polynomial.

                                                                      • aliabd 286 days ago

                                                                        Man, that's a good gif

                                                                      • WillyNourson 285 days ago

                                                                        Oh lord it's almost 1k long

                                                                        • msafadieh 285 days ago

                                                                          quad_swap64 and quad_sort64 are only about 350 lines combined.

                                                                        • ummonk 286 days ago

                                                                          I’ll read the article when I get the chance, but would this be in-place?

                                                                          • N3XT 286 days ago

                                                                            From the article:

                                                                            "These operations do require doubling the memory overhead for the swap space."

                                                                            It seems it is not in-place.

                                                                            • dpbriggs 286 days ago
                                                                              • ummonk 286 days ago

                                                                                Okay, that looks like it is proportional to the size of the array, so definitely not in-place.

                                                                              • metalliqaz 286 days ago

                                                                                I'm pretty sure it's in-place, thus the need for "swap space" to hold the values that are being moved.

                                                                                • cwzwarich 286 days ago

                                                                                  The term "in-place" almost always means O(1) additional space, whereas this uses O(n) additional space.

                                                                                  • gpderetta 285 days ago

                                                                                    technically quick sort needs O(log N) additional space and it is still considered in-place. I guess the threshold for in-place-ness would be less than linear additional space?

                                                                                    • utopcell 285 days ago

                                                                                      quicksort is not stable. the disadvantage of classic mergesort is that it needs to copy elements out of the input array, although there exist (slower) in-place variations.

                                                                                  • tedunangst 286 days ago

                                                                                    In place kinda means the opposite of in swap space.

                                                                                    • Dylan16807 285 days ago

                                                                                      But "swap space" without the "in" is pretty ambiguous. If it just holds 1 or 4 elements as you perform a single logical swap, you're still sorting in-place.

                                                                                      (This sort does not use the term that way, but another sort could.)

                                                                                    • ummonk 286 days ago

                                                                                      It looks like it is doubling the "swap space" used in a standard merge sort, not the "swap space" use in a quicksort.

                                                                                      • davrosthedalek 286 days ago

                                                                                        You technically do not need any swap space to swap two numbers. So algorithms which only use swaps could be implemented fully in place.

                                                                                        • utopcell 285 days ago

                                                                                          while you can indeed swap in-place for PODs [1], this is not true for c++ objects. Also, in-place swapping is not sufficient for in-place sorting.

                                                                                          [1] example: int a, b; a ^= b ^= a ^= b;

                                                                                    • dntbnmpls 286 days ago

                                                                                      No. He uses a temporary array for swaps. It's in the swap example he provided.