Map/Reduce for Mortals

(billwadge.wordpress.com)

76 points | by herodotus 10 days ago

11 comments

  • darkkindness 10 days ago

    The conversation Bill's touching on -- the tradeoffs between different notations -- is really valuable but I think it misses something a lot of us desire in a syntax: compositionality. Take the provided "onion" notation, loops, and the "new" syntax. They all look something like this:

      ┌────────┬────────────────────┐
      │myreduce│┌─────┬────────────┐│
      │        ││mymap│┌────────┬─┐││
      │        ││     ││myfilter│x│││
      │        ││     │└────────┴─┘││
      │        │└─────┴────────────┘│
      └────────┴────────────────────┘
    
    (The math one looks more like this:)

      ┌────────┬────────────┐
      │myreduce│┌────────┬─┐│
      │        ││mymap   │x││
      │        ││        │ ││
      │        ││myfilter│ ││
      │        │└────────┴─┘│
      └────────┴────────────┘
    
    But none of them provide the same kind of "putting pieces together" feeling as this:

      ┌────────┬─────┬────────┬─┐
      │myreduce│mymap│myfilter│x│
      └────────┴─────┴────────┴─┘
    
    which we see in the wild as this:

      (myreduce ∘ mymap ∘ myfilter)(x)
    
    or this:

      x | myfilter | mymap | myreduce
    
    or this:

      myreduce mymap myfilter x
    • nikonyrh 10 days ago

      I didn't spot Clojure here yet:

        (->> [1,2,3,4,5]
             (filter #(-> % (mod 3) (= 1)))
             (map    #(* % %))
             (reduce +))
      
      #( ... ) is an anonymous function, arrows are threading macros: https://clojure.org/guides/threading_macros

      If you wanted more trickery you could play with transducers and compose them with "comp".

      • fulafel 10 days ago

        List comperhension version without threading:

          (reduce +
            (for [n [1, 2, 3, 4, 5]
                  :when (= 1 (mod n 3))]
              (* n n)))
      • vmchale 10 days ago

        > reduce(lambda t,x t+x,map(square,filter(lambda x: x % 3 == 1,[1,2,3,4,5])))

        > Clear? As mud

        > the lambdas and the functions obscure what is being computed.

        Simply use J!

        g =: +/ @: : @ (] * =&1@(3&|))

        Or:

        f =: +/ @: *: @ (#~ (=&1@(3&|)))

        • enragedcacti 10 days ago

          This already exists in Python in the form of comprehensions

            sum(x * x for x in [1,2,3,4,5] if x % 3 == 1)
          
          I'm not familiar with Lucid but maybe evaluating the implementation of comprehensions could help solve some of the design challenges.
          • bade 10 days ago

            Might work, worth thinking about

          • lalaithion 10 days ago

            Having pointfree functions and curried application makes the functional example more readable, as it removes the obscuring lambdas:

                (reduce (+) . map (^2) . filter ((==1).(%3))) [1,2,3,4,5]
            
            The above is pseudo-Haskell. If we want to
            • lgas 10 days ago

              In actual Haskell I would write it as

                  sum [ n^2 | n <- [1..5], n % 3 == 1 ]
              
              which seems to express the idea fairly directly and I'm fairly sure has a close analog in python.
              • lightsighter 10 days ago

                python: sum([n * n for n in range(1,5,3)])

                • marai2 10 days ago

                  haskell: sum [n^2 | n <- [1,3..5]]

                  I just remembered haskell's "range" operator ".." can also take a step size.

                  ['a', 'd' .. 'z'] => "adgjmpsvy"

              • 1-more 10 days ago

                yeah to me this just hollers for the kind of "pour it through the functions" approach we see in functional languages or even with Ramda.pipe in JS. Ezpz.

                • adamch 10 days ago

                  Or in functional rust

                    (1..5)
                      .into_iter()
                      .filter(|i| i % 3 == 1)
                      .map(|i| i * i)
                      .sum();
                  
                  I don't know if we really need a new syntax for map reduce.
                  • crooked-v 10 days ago

                    That's also basically the same syntax as doing it in JS, aside from the lack of builtins:

                        import { range } from 'ramda';
                    
                        range(1, 6)
                          .filter(i => i % 3 === 1)
                          .map(i => i * i)
                          .reduce((a, b) => a + b, 0);
                    
                    Or, if you're using the pipeline operator (experimental, stage 1, nobody can quite decide on exactly the syntax but this gives the rough idea):

                        import { range, sum } from 'ramda';
                    
                        range(1, 6)
                          |> x => x.filter(i => i % 3 === 1)
                          |> x => x.map(i => i * i)
                          |> sum
                    • 1-more 9 days ago

                      We can get rid of these arrows using ramda's curried list functions, at the expense of being slower and harder to read but making us feel more clever:

                        pipe(
                          range,
                          filter(pipe(
                            modulo(__, 3), // x => x % 3
                            equals(1) // y => y === 1
                          )),
                          map(flip(Math.pow)(2)),
                          sum
                        )(1,6)
              • oneplane 10 days ago

                > "that’s as close as I can get in wordpress but you know what it looks like"

                Um, I still don't know what it looks like. I know mod and there is a range of numbers it seems and an i to the power of two, but I really have no idea what the other symbols mean or if their position/layout in that fashion is important. I can probably reverse engineer what the mathematician writes by reading the code but I'm still confused as to why many articles about software assume a complete mathematics syntax knowledge.

              • np_tedious 10 days ago

                What exactly is this pyLucid? I searched for it and found a CMS and a generative video ML library. Don't see how either one connects with this.

                https://github.com/jedie/PyLucid https://github.com/yelantingfeng/pyLucid

              • proc0 10 days ago

                List comprehension in Haskell is probably the best approach so far, imho. However once you're using these tools then map, reduce, filter should be easy to understand, so the terseness and clarity of it doesn't come without a cost.

                let mod3 n = if n `mod` 3 == 1 then n else 0

                sum [ i^2 | i <- [ mod3 n | n <- [1..5] ] ]

                • vmchale 10 days ago

                  > sort of objects are ranges? In Python they would be iterators, and that works out fine. In Haskell they would be lists.

                  > I think the best choice is to make them vectors (arrays)

                  In Haskell they are lazy linked lists! Which in theory is more efficient than vectors (in terms of space).

                  • dnautics 10 days ago

                    this seems to be a syntax problem. The for loop is not really 'equivalent' in complexity to the functional solution. In Elixir, I'd do this:

                       list
                       |> Enum.reduce(0, fn ->
                          x, acc when mod(x, 3) == 0 -> acc + x * x
                          _, acc -> acc
                       end)
                    
                    But the presented functional solution throws more functions at it, which might look like this, which I think is quite clear.

                        list
                        |> Enum.filter(&(mod(&1, 3) == 0))
                        |> Enum.map(&(&1 * &1))
                        |> Enum.reduce(acc, &Kernel.+/2)  #better yet use Enum.sum
                    • nesarkvechnep 10 days ago

                      You might even want to use the Stream module. Beautiful Elixir.

                      • cutety 10 days ago

                        Stream would technically better, however given the discussion is about Map/Reduce, the only thing Stream has in common with Map/Reduce is it's lazy. If you wanted something comparable (mapping is done in parallel, reducing as well just over partitions), then you'd want to use the Flow[1] library. As it does the same thing as Stream.map |> Enum.reduce just parallelized/partitioned, and what's great is the Flow module is more-or-less a drop in replacement for Enum/Stream (with a few caveats like calling Flow.partition before Flow.reduce). But, with just some quick a dirty benchmarks you can see Flow outperforms Stream on all but the smallest data set (range 1..100):

                            with_stream = fn range ->
                              range
                              |> Stream.filter(&(rem(&1, 3) == 0))
                              |> Stream.map(&(&1 * &1))
                              |> Enum.reduce(0, &Kernel.+/2)
                            end
                            
                            with_flow = fn range ->
                              range
                              |> Flow.from_enumerable()
                              |> Flow.filter(&(rem(&1, 3) == 0))
                              |> Flow.map(&(&1 * &1))
                              |> Flow.partition()
                              |> Flow.reduce(fn -> [0] end, fn val, [acc | _] ->
                                [Kernel.+(val, acc)]
                              end)
                              |> Enum.sum()
                            end
                            
                            iex(4)> Benchee.run(
                            iex(4)>   %{"stream" => with_stream, "flow" => with_flow},
                            iex(4)>   inputs: %{"small" => 1..100, "medium" => 1..10_000, "large" => 1..10_000_000}
                            iex(4)> )
                            Operating System: macOS
                            CPU Information: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
                            Number of Available Cores: 4
                            Available memory: 8 GB
                            Elixir 1.9.4
                            Erlang 22.2.1
                            
                            Benchmark suite executing with the following configuration:
                            warmup: 2 s
                            time: 5 s
                            memory time: 0 ns
                            parallel: 1
                            inputs: large, medium, small
                            Estimated total run time: 42 s
                            
                            Benchmarking flow with input large...
                            Benchmarking flow with input medium...
                            Benchmarking flow with input small...
                            Benchmarking stream with input large...
                            Benchmarking stream with input medium...
                            Benchmarking stream with input small...
                            
                            ##### With input large #####
                            Name             ips        average  deviation         median         99th %
                            flow          0.0994        10.06 s     ±0.00%        10.06 s        10.06 s
                            stream        0.0782        12.78 s     ±0.00%        12.78 s        12.78 s
                            
                            Comparison:
                            flow          0.0994
                            stream        0.0782 - 1.27x slower +2.72 s
                            
                            ##### With input medium #####
                            Name             ips        average  deviation         median         99th %
                            flow           83.87       11.92 ms    ±20.48%       11.30 ms       25.53 ms
                            stream         74.88       13.35 ms    ±32.02%       12.32 ms       30.22 ms
                            
                            Comparison:
                            flow           83.87
                            stream         74.88 - 1.12x slower +1.43 ms
                            
                            ##### With input small #####
                            Name             ips        average  deviation         median         99th %
                            stream        4.98 K        0.20 ms    ±87.16%       0.169 ms        0.56 ms
                            flow          0.70 K        1.42 ms    ±21.58%        1.35 ms        2.52 ms
                            
                            Comparison:
                            stream        4.98 K
                            flow          0.70 K - 7.06x slower +1.22 ms
                        
                        [1] https://hexdocs.pm/flow/Flow.html