Serverless Raspberry Pi Cluster with Docker

(blog.alexellis.io)

247 points | by alexellisuk 2412 days ago

9 comments

  • mrmondo 2412 days ago
    > 'Serverless is an architectural pattern resulting in: Functions as a Service, or FaaS'

    Then call it FaaS - it's not serverless, and that term is misleading marketing bunk. Even the title of the post describes the servers used in its deployment. I'm quite sick of this term - I don't find it at all helpful when describing application architecture.

    • alexellisuk 2412 days ago
      It is what it is - and I put a paragraph at the start of the blog for folks who are still confused. It explains that Serverless is a pattern and not a literal term. I even wrote a blog post about it - we've done such a poor job in this industry of explaining it. https://blog.alexellis.io/introducing-functions-as-a-service...
      • senko 2412 days ago
        Maybe if a better term were used, it wouldn't need so much explaining.

        To me, "Functions as a Service" is massively more obvious than "Serverless".

        • xupybd 2411 days ago
          And "the cloud" would be better described as "a server somewhere on the internet", but terms like this get created because it's easier to sell a one-word concept.
          • alexellisuk 2411 days ago
            Yes... can we all start berating every blog post that uses the word "cloud"? There are no clouds in this server! :-D
        • DJHenk 2411 days ago
          It is not to me. Isn't a 'Function as a service' just a... service? Why not just call it a multi-server "Platform for services"?
    • stonewhite 2411 days ago
      Exactly, because what can be considered serverless can wildly vary, such as:

      Google App Engine, Google BigQuery, AWS Lambda, AWS Athena

      The name (which is just a marketing term, not technical) just reflects that you don't deal with servers but with services.

    • jabzd 2411 days ago
      I consider it "serverless" similar to how "stainless" steel only lessens, but isn't a full 100% guarantee. That makes me feel slightly better about the term, ha.
    • avip 2411 days ago
      Even more confusing, as the term "server" is currently pretty meaningless IMO (exhibit one: a tiny rpi board is also a server).
    • bananarepdev 2411 days ago
      I agree with you, to an extent. By the same criteria, some PaaS services could also be considered serverless, but the term does not apply to them. However, since the term evokes the idea of "less infrastructure to manage" and, consequently, "lower costs", it works well to draw the attention of executives.
    • anotherbrownguy 2411 days ago
      AWS Lambda is "serverless" to customers. You only pay for running the functions, even though Amazon pays for running the underlying architecture which obviously runs on servers.
      • yeukhon 2411 days ago
        Well, the difference between that and running on Elastic Beanstalk is that you choose the sizing there, while Amazon chooses the sizing for you in Lambda. The reason people seem to advocate FaaS is that Lambda can be viewed as a job sent to a queue which then runs based on event criteria (time based or event trigger based). Look at Lambda like writing and calling a job from a queue (e.g. Celery).

        Nothing fancy about serverless.

    • zwischenzug 2411 days ago
      I disagree; I find it very helpful.

      It's not terribly complicated either - if you are only concerned with code, then your deployment is 'serverless' in the sense that _you are not concerned with servers_.

      Of course 'servers' are involved; the point is whether that's of concern to your deployment.

      • frou_dh 2411 days ago
        Daemon processes (100% software) also get called "servers".

        In an old job, in an effort to disambiguate, I would always use "computer" to mean hardware (losing battle).

      • icebraining 2411 days ago
        I'm not sure that distinction is clear. For example, in AWS Lambda, you still have issues like functions being "warm" or not, and people write code specifically to handle that. Meanwhile, nowadays you can acquire and configure even dedicated machines using just code.
        • joseph 2411 days ago
          If you have to configure a machine, even in code, then you aren't "serverless". To me, serverless means that you do not manage the machines or even have direct access to them.
          • icebraining 2411 days ago
            What's a machine? :) A dedicated server sure is, but a VPS? A Heroku dyno? Where's the line?
            • monkmartinez 2411 days ago
              > What's a machine?

              I would say "machine" means the hardware and/or the operating system of the "server" (which is a fancy word for "computer").

              The "and/or" part being very important!

        • zwischenzug 2411 days ago
          Fair points. I'd say it's as clear as 'middleware'. When you are trying to get to the limits of performance (or just optimizing), then you tend to start thinking about the platform it runs on.

          This is the problem with most abstractions in IT I guess - they're beautiful and clean until reality bites.

      • lucaspiller 2411 days ago
        Wouldn't that definition apply to services like Heroku as well though?
    • erikpukinskis 2411 days ago
      > I don't find it at all helpful when describing application architecture.

      If you can draw the application architecture without anything that is best labeled a server, wouldn't that make it a "serverless" architecture?

  • elcapitan 2412 days ago
    This may be a stupid question, but is there an actual use case for building clusters of raspberry pi's?
    • zeta0134 2412 days ago
      Part of me wants to believe that a solar powered raspberry pi cluster has a legitimate use case. The things use like ~1W of power, so the board itself is surprisingly efficient compared to a "real" server.

      But really, I think the primary use case is cost. Actually having access to that many physical machines to play with in a classroom or home learning environment is sort of new! The market hasn't really had such accessible linux computers at "Ehh, if it breaks I'll just buy a new one, no big deal" prices. It's educational, and the more stable the ARM support is, the better a student's skills will transfer over into the real world of systems administration.

      • creshal 2412 days ago
        > The things use like ~1W of power, so the board itself is surprisingly efficient compared to a "real" server.

        Try 3.5 watts[1], not counting overhead of most USB power bricks being incredibly inefficient.

        A current-gen 35W laptop CPU will be some 10 times faster[2] than a RasPi, have much faster storage available (SATA3 or NVMe versus… USB2), much faster I/O (GBit LAN and GBit Wifi versus… USB2), and a lot of other benefits. (Like an integrated screen and battery and keyboard and …) It also won't need external hardware to communicate with other cluster members – that 10-port ethernet switch will need power, too.

        One RasPi is relatively energy efficient; RasPi clusters… not so much.

        > But really, I think the primary use case is cost.

        Indeed.

        [1] http://raspi.tv/2016/how-much-power-does-raspberry-pi3b-use-... , see the numbers for "Multi-threaded CPU Tests", which is the most applicable for server workloads

        [2] Running that script manages ~9 runs/second on an i7-6700HQ, vs. ~0.9 run/second on a RPi3.
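
As a rough check of the per-watt comparison above, the sketch below uses only the figures quoted in this comment (about 0.9 runs/second at 3.5 W for an RPi3, about 9 runs/second for a 35 W laptop CPU). The 5 W allowance for an Ethernet switch is an assumed placeholder, not a measurement, and the function names are made up for illustration.

```python
# Figures quoted in the comment above. The 35 W value covers the CPU only.
RPI3_RUNS_PER_SEC = 0.9
RPI3_WATTS = 3.5
LAPTOP_CPU_RUNS_PER_SEC = 9.0
LAPTOP_CPU_WATTS = 35.0

# ASSUMPTION: placeholder draw for a small Ethernet switch, not a measured value.
ASSUMED_SWITCH_WATTS = 5.0


def runs_per_joule(runs_per_sec: float, watts: float) -> float:
    """Benchmark runs completed per joule of energy (runs/s divided by W)."""
    return runs_per_sec / watts


def cluster_runs_per_joule(nodes: int, switch_watts: float) -> float:
    """Efficiency of `nodes` RPi3 boards plus one shared switch, assuming perfect scaling."""
    total_runs = nodes * RPI3_RUNS_PER_SEC
    total_watts = nodes * RPI3_WATTS + switch_watts
    return runs_per_joule(total_runs, total_watts)


single_pi = runs_per_joule(RPI3_RUNS_PER_SEC, RPI3_WATTS)
laptop_cpu = runs_per_joule(LAPTOP_CPU_RUNS_PER_SEC, LAPTOP_CPU_WATTS)
ten_pi_cluster = cluster_runs_per_joule(10, ASSUMED_SWITCH_WATTS)

print(f"single RPi3:        {single_pi:.3f} runs/J")
print(f"laptop CPU only:    {laptop_cpu:.3f} runs/J")
print(f"10 RPi3s + switch:  {ten_pi_cluster:.3f} runs/J")
```

On these numbers a single RPi3 and the laptop CPU come out about even per watt, and any shared overhead such as the switch pushes the cluster below the CPU-only figure. The comparison leaves out the rest of the laptop's power draw.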

        • ineedasername 2411 days ago
          They're using Pi Zeros in this post, which draw much less than the others: between 0.4W and 1.0W, so it's probably safe to assume 0.7W as an average load. [1]

          And at $5 each, if we're talking hardware costs for setting up a "toy" cluster for, say, self-learning or student labs, that's hard to beat. I suppose you could do better using VMs for a virtual cluster, but that adds other complications unrelated to the clustering task. But I agree there doesn't otherwise seem to be much practical purpose here, and the overhead of running an OS on each Pi really cuts into performance compared to a single multi-core chip.

          [1] https://www.jeffgeerling.com/blogs/jeff-geerling/raspberry-p...

        • marmaduke 2411 days ago
          If you had ten of them, ideally you'd use a USB power supply where that overhead is a much lower percentage.
        • omgwtfbyobbq 2411 days ago
          You can probably shave off another .2W by disabling the HDMI and LEDs, but an RPi3 at load will probably be at 4+W from the wall.

          At the same time, you're comparing the power consumption of, let's say, 10 whole RPis/platforms to the consumption of a single processor. Stick that processor in a platform (laptop), and it's going to use much more than 40W.

          Like you said, you get a lot more with the laptop, but given your benchmark (10x difference), my guess is that 10x RPis would still be more power efficient than a laptop with a 6700HQ at that specific task.

        • marmaduke 2411 days ago
          6700HQ is 45 W TDP?
          • jtuente 2411 days ago
            • wulfklaue 2411 days ago
              /Looks at price...

              Gets you 11 Pi's ... Gets you only 1 Intel CPU, no memory, motherboard, heatsink, fans.

              Reminds me of the Celeron® Processor J3455... a 10W rating on Intel's page. On AVERAGE! Then when you see the real power usage under load for MB + CPU + 16GB memory, it's actually doing 35W.

              Whereas the Pis are doing 3.7W max per piece. So even with 4 pieces to match the performance, you're still at half the wattage.

              If Intel really scaled that well in power vs performance, why are we not seeing x86 phones all the time?

      • deelowe 2411 days ago
        If you do a TCO calc against TDP and performance, you'll find that a cluster of RPIs is lower TCO than a more traditional low power intel solution.
        • deelowe 2411 days ago
          That should be higher tco...
    • _b8r0 2411 days ago
      I have a Clusterhat[1] and a Pi Zero cluster in an MPI (Beowulf) configuration.

      It's the 3rd beowulf cluster I've ever built, the 2nd being from recycled PowerMacs and the 1st being built with Pentium IIs. It's the most powerful Beowulf I've ever built. It's also the smallest. It fits in my hand and it runs off USB.

      Now you know what it is, I'll tell you about what I use it for.

      The first problem I used it for was to approximate 1 billion digits of Pi. I started with Monte Carlo methods, but while they scale well they're non-optimal. Eventually I managed to implement a Chudnovsky-type algorithm that worked despite the limitations of the Pi 3 head node and Pi zero nodes.

      Most recently I wrote code to explore the Mandelbrot set. Using some custom software I knocked up, I set a start and finish x,y,z,w and h coordinate set and it renders individual frames which are then stitched together with ffmpeg.

      I need to rebuild the cluster because I made some booboos with how it was set up, and there have been substantial advances in the HAT configuration. I'm thinking of doing it over Christmas.

      What I've found works best are:

      * Learning about problems
      * Learning about scaling problems
      * Learning about scaling problems with solution constraints
      * Learning about scaling problems with solution constraints over a very long period of time.

      As long as you're not in a rush to finish calculations and don't mind picking something up, pecking at it and coming back later (like say, a week or so) the Pi is mostly fine. Although ISTR my final Pi approximation was in the order of minutes to run to a million digits.

      I know other people host sites, I just like doing basic maths problems to improve my maths and algorithms knowledge.

      [1] - https://clusterhat.com/
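
The Monte Carlo approach mentioned above splits naturally across nodes: each node counts random points that fall inside a quarter circle, and a head node sums the counts. The sketch below simulates the nodes in one process with plain Python; it is not the commenter's MPI code, and the function names and seeds are assumptions for illustration.

```python
import random
from typing import Iterable


def count_inside(samples: int, seed: int) -> int:
    """Work for one node: count random points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)  # a different seed per node gives each node its own stream
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside


def estimate_pi(samples_per_node: int, seeds: Iterable[int]) -> float:
    """Head-node step: sum the per-node counts and scale by 4 (the area ratio is pi/4)."""
    seed_list = list(seeds)
    total_inside = sum(count_inside(samples_per_node, s) for s in seed_list)
    total_samples = samples_per_node * len(seed_list)
    return 4.0 * total_inside / total_samples


if __name__ == "__main__":
    # Four simulated nodes, 50,000 samples each.
    print(estimate_pi(50_000, seeds=range(4)))
```

The error shrinks only in proportion to 1/sqrt(N) for N samples, which is why this method parallelizes well yet is a poor route to many digits.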

    • vorpalhex 2411 days ago
      It's a cheap way to play with clusters while still using pretty common hardware and operating system choices. In addition the pi is small, the hardware is pretty simple and available, and it has enough flexibility for any basic project.

      In terms of any performance benefit? No.

      • marmaduke 2411 days ago
        Is that very different from running a dozen containers on a single machine?
        • vorpalhex 2410 days ago
          Containers come with their own set of expectations, needs, requirements and problems. They can be a useful tool but I suspect the cluster of Pis is a much more accurate model for multi-computer modeling.
        • tyingq 2411 days ago
          It probably models bandwidth constraints better, lets you simulate network splits easily, etc.
    • gkya 2411 days ago
      Learning how to work with computer clusters is an actual use case I believe.
      • Slimbo 2411 days ago
        You can do that on any reasonably powerful desktop and virtualization and not have the hardware costs.
        • zdkl 2411 days ago
          But then you're missing out on a whole class of hardware related problems that you may need to learn.
    • snarfy 2411 days ago
      If you used the GPU, maybe, but in reality you are better off using a modern Intel or AMD CPU. Not only do they have much better performance, but they even have better performance per watt.
  • metakermit 2412 days ago
    If you still like going "serverful", but want an easy method of deploying to your Raspberry Pi using Docker, check out this open source Docker Hub for ARM alternative

    https://marina.io/

    (Full disclosure: I am one of the authors)

  • gkya 2411 days ago
    Does anybody really use this FaaS thing? This is the weirdest of the New Things I've seen recently.
    • alexellisuk 2411 days ago
      Despite appearing like "magic" it's basically smaller micro-services, but with a different packaging, deployment and monitoring model. The entire Alexa skill set is driven from these functions.
    • tracker1 2411 days ago
      Been using AWS Lambda for a few things... it works surprisingly well if you have intermittent processes that can fit in the memory and time constraints... for example, one of the lambdas I worked on is triggered from S3 uploads (CSV for processing from a client). The Lambda parses the CSV into bundles of JSON objects that are then sent into SQS for processing individual items, which can take up to 2 minutes per item.

      You can build pipelines from S3, SQS, SNS, Lambda to do a lot of work very quickly in parallel with less overhead than similar self-hosted or self-managed solutions. You don't have to worry about spinning up extra VMs, or dealing with overprovisioning. It all just works.
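
The parsing step of the pipeline described above might look roughly like the following. This is a sketch using only the Python standard library: the function name and batch layout are hypothetical, the trigger wiring and the actual SQS send are left out, and the batch size of 10 reflects the SQS `SendMessageBatch` limit of 10 entries per request.

```python
import csv
import io
import json
from typing import Iterator, List

BATCH_SIZE = 10  # SQS SendMessageBatch accepts at most 10 entries per request


def csv_to_json_batches(csv_text: str, batch_size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Turn CSV text into batches of JSON strings, one JSON object per CSV row.

    The first CSV line is treated as the header and supplies the object keys.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    batch: List[str] = []
    for row in reader:
        batch.append(json.dumps(row))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly short, batch
        yield batch


if __name__ == "__main__":
    sample = "id,name\n1,a\n2,b\n3,c\n"
    for bundle in csv_to_json_batches(sample, batch_size=2):
        # In a real function each bundle would be handed to the queue client here.
        print(bundle)
```

Keeping the parsing separate from the queue call makes the step easy to test locally without any cloud credentials.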

    • bananarepdev 2411 days ago
      Like most new tech, it has its applicability. Depending on your needs, and the characteristics of your application load, it can save you money.
      • dvfjsdhgfv 2411 days ago
        Could you give an example of a scenario where it can actually save me money?
        • brango 2411 days ago
          Chromeless looks promising. Being able to run a load of scrapers in parallel at certain times could be useful if time sensitivity is an issue. E.g. someone sends you a batch of URLs to screenshot and each site takes a while to render. You could use lambdas to run all those processes in parallel instead of wasting money keeping capacity lying around just for those spikes.
        • jclulow 2411 days ago
          I suspect it makes economic sense only when your workload is relatively elastic, having a relatively low duty cycle. The ability to pay less when you aren't actually using any resources is likely of more economic benefit as it becomes more fine-grained. If you aren't in that position, other models of lease or ownership of computing resources probably merit consideration.
          • dvfjsdhgfv 2411 days ago
            Well, to be honest I pay 40€/month for a bare-metal server (i7 Skylake, 64 GB RAM, 4TB HDD). It's a powerful machine running many services, including a few virtual machines. I'd say whatever it does would qualify as low-duty. Now, I know that each month I'm paying 40€ for all this. The last time I read about serverless was when someone discovered first-hand that the money-saving part is quite tricky: https://news.ycombinator.com/item?id=14982220.
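
One way to frame the duty-cycle argument is as a break-even count: how many invocations per month a flat-rate server's fee would buy on a pay-per-use platform. The sketch below uses the 40€/month figure from this comment; the per-request and per-GB-second prices are made-up placeholders, not any provider's actual pricing, and the function names are hypothetical.

```python
def cost_per_invocation(seconds: float, memory_gb: float,
                        price_per_gb_second: float,
                        price_per_request: float) -> float:
    """Pay-per-use cost of one call: compute time billed by memory size, plus a per-request fee."""
    return seconds * memory_gb * price_per_gb_second + price_per_request


def break_even_invocations(flat_monthly_fee: float, per_invocation: float) -> float:
    """Monthly invocation count at which pay-per-use costs as much as the flat fee."""
    if per_invocation <= 0:
        raise ValueError("per-invocation cost must be positive")
    return flat_monthly_fee / per_invocation


# ASSUMPTION: placeholder prices in euros, chosen only to make the arithmetic visible.
PLACEHOLDER_PRICE_PER_GB_SECOND = 0.00002
PLACEHOLDER_PRICE_PER_REQUEST = 0.0000002

FLAT_SERVER_FEE = 40.0  # euros per month, the figure quoted in the comment above

per_call = cost_per_invocation(
    seconds=2.0,       # ASSUMPTION: illustrative run time per call
    memory_gb=0.5,     # ASSUMPTION: illustrative memory allocation
    price_per_gb_second=PLACEHOLDER_PRICE_PER_GB_SECOND,
    price_per_request=PLACEHOLDER_PRICE_PER_REQUEST,
)
threshold = break_even_invocations(FLAT_SERVER_FEE, per_call)
print(f"break-even at about {threshold:,.0f} invocations per month")
```

Below the threshold the pay-per-use model is cheaper on these assumptions; above it, the flat-rate machine wins. The model ignores storage, data transfer, and staff time.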
        • Prefinem 2411 days ago
          When the cost of a Dev Ops or Sys Admin is more than the service.
          • dvfjsdhgfv 2411 days ago
            That's a general answer on how you can save by using the cloud, I specifically meant the "serverless" variant.

            (Apart from that, I think this is a misconception - Amazon seems to have convinced people you don't need a sysadmin anymore, whereas in fact once you start exploring the whole AWS infrastructure and its complexity, you quickly realize you still need sysadmin's knowledge plus understanding of how their services work and all their quirks.)

            • Prefinem 2411 days ago
              I understand that with AWS, you need a sysadmin. That is why I was saying with serverless (in this case, AWS Lambda) you didn't need a sysadmin. I assume it's the same with Google Functions and whatever Azure has.
              • tracker1 2411 days ago
                But you will need other things... you need some kind of fronting system to tie things together, you need some sort of DB/Storage. And while you can get away with fewer admins, someone will be spending part of their time in a sysadmin role. It's more a matter of how much can get done with how many admins.
                • Prefinem 2411 days ago
                  I get the point you are trying to make, but as AWS offers many services, I don't need a sysadmin for a DB (RDS, DynamoDB, Elasticsearch, S3), for API Gateway, or for any coordination between systems (SNS, SES).

                  True serverless lets me offload that cost to AWS instead of having a sysadmin.

                  • tracker1 2411 days ago
                    Who maintains the database, schema, updates? Application deployments, testing, qa, updates? There's someone doing the job, even if it's fewer people, or someone with multiple hats.
                    • Prefinem 2411 days ago
                      > database, schema, updates

                      Whoever created them

                      > Application deployments

                      Build pipeline

                      > testing, qa

                      QA / Customer Support

                      So, still no sysadmin.

                      Not saying it has to be this way, just saying originally that serverless can save you money.

                      In fact, we are in the process of moving all our APIs over to AWS Lambda w/ ES and it's going to save us 25-50% of our EC2 costs.

                      We might be able to do the same since we are re-writing in another language, but without AWS Lambda, we would have never gotten that shot.

  • zxcmx 2412 days ago
    This is super awesome and fun but personally I had to migrate my micro datacenter from pis to nucs.

    The "armhf tax" is that you tend to have to build your own images for stuff :( Then you need your own build infra (or "heath robinson" qemu builds) because pis run out of memory building a lot of stuff... but mainly if C++ is involved so ymmv.

    That said, I got a rack of 8 pis doing nothing right now, so...

    (unrelated http://www.bitscope.com/product/BB04/ is handy if you want to rack a lotta pis, not affiliated...)

    There is probably a micro business for someone running a slick docker build system for armhf handling the qemu emulation or toolchain dirtiness "under the hood" in the cloud somewhere, on x86-64 boxes with a lot more than 1GB of RAM.

    • jacobush 2412 days ago
      You could try running a Raspberry but add a Network Block Device from another machine (xNBD: https://bitbucket.org/hirofuchi/xnbd/wiki/Home ).

      Then export a RAM disk from that machine and add the NBD disk as swap on the Raspberry. It would be slow, but builds would complete. Then you'd need only one low-to-moderate power machine (a PC presumably) in your Raspberry cluster, just with lots of RAM in the one PC.

      • Artemis2 2411 days ago
        Scaleway has a good array of ARM cloud offerings: https://www.scaleway.com/armv8-cloud-servers/

        Could this work for speeding up builds of ARM images and then deploying locally?

        • alexellisuk 2411 days ago
          Packet.net can go one further - take those 8 cores and upgrade to 96 cores and 120GB RAM.. it's ARMv8 which will be next for OpenFaaS when the Docker support catches up :-)
    • rubenbe 2412 days ago
      Can you elaborate on your NUC setup? Which NUC did you choose?

      I have a similar RPI rack collecting dust for the same reason, hence the question.

    • omgwtfbyobbq 2411 days ago
      I've built ABS (archlinux) packages by having some swap space on my pogoplug mobile's boot drive (1TB USB). It's a slog, but stuff will finish sooner or later, and by that I mean later or really late.

      Now that I'm thinking about it, I'd like to see if going from the RPI's USB3->SATA adapter->M.2 adapter->16GB of Optane ($40+tax locally) would work, and if it did work (a big if), what performance is like.

      Edit - Scratch that, I just remembered the Pi 3 is still USB 2.

  • lngnmn 2411 days ago
    Serverless meme means that all I got is a chrooted directory with only systemd and /lib but without signals, similar to an apache vhost with a cgi-bin, so I could run my full-stack crap and it is supposed to be so damn cool because actual server maintenance is now someone else's problem?
  • Hortinstein 2411 days ago
    great article, been a huge fan of your stuff in the past. Helped me get some good ideas of things to use my Pi and Clusterhat with. https://clusterhat.com
  • syvanen 2412 days ago
    So this blog instructs you to install commercial Docker instead of Moby. I wonder how the licensing goes if you were to use this.
    • filipn 2412 days ago
      I believe it actually installs the Docker community edition (CE) which is under the Apache 2.0 license. Docker (the product) is assembled using the Moby libraries and components.
      • alexellisuk 2412 days ago
        Correct, Moby is not a distribution of a container runtime.. it's not the "docker" you are looking for. Docker CE is and that's what's used in the guide.
      • maniktan 2411 days ago
        Full Disclosure: I, Manik Taneja, am the Product Manager for all the open source efforts at Docker and work on the Moby Project.

        As suggested here, the blog post only talks about installing the Docker Community Edition (CE) that is published under Apache 2.0 License. This is the official incarnation of the Docker Product and is provided to have:

        - a consistent user experience across different linux distributions
        - strong security guarantees
        - regular bug fixes and updates

        Moby Project serves as the upstream for the entire Docker Product and includes all open source components that make up Docker, such as runc, containerd, notary, moby, infrakit, linuxkit, libnetwork, hyperkit, vpnkit, datakit, etc.

    • zeta0134 2412 days ago
      I imagine the licensing goes something like this:

      https://www.docker.com/components-licenses

      It's my understanding that the bulk of the components that make Docker work are fully open source, and some extra support and deployment related things are the only things that are commercially licensed. This would include Docker Swarm, as it's a part of Docker itself and not something separate. IANAL though.

      I'm pretty sure the reason they go for plain Docker over Moby is sheer ease of use. Despite being a bit weird to understand under the hood, Docker is just dead simple to get up and running with, and using the clustering mode that's built right in is easier to teach new readers than Moby, which from its Github page is obviously designed for folks that are already rather comfortable with Docker. Straight from Moby's Github page:

      "Moby is NOT recommended for: Application developers looking for an easy way to run their applications in containers. We recommend Docker CE instead."

      https://github.com/moby/moby

    • alexellisuk 2412 days ago
      That's wrong - this is the free / open-source Docker version. How did you get that impression?
  • imtringued 2412 days ago
    I'm surprised that the raspberry pi monoculture has prevailed to this day.