BeagleY-AI: a 4 TOPS-capable $70 board from Beagleboard

(beagleboard.org)

133 points | by smarx007 31 days ago

19 comments

  • mynegation 30 days ago
    I wish there were some reference for what I can actually do with 4 or 8 TOPS and 4 or 8 or whatever GiB of accelerator memory. Can I run speech recognition? A model for object detection in images? In video? LLMs - probably not. Stable Diffusion also seems out of the question. But really - what can I run?
    • juitpykyk 30 days ago
      For scale, Microsoft requires 40 TOPS to certify a (consumer) computer as Copilot-capable.

      https://www.techpowerup.com/320933/microsoft-copilot-to-run-...

    • yjftsjthsd-h 30 days ago
      Agreed! What I really want to know from any hardware accelerator is:

      * What can it accelerate?

      * What software supports it (and what's the state of driver support)?

      * How much faster or more power efficient is it vs CPU/GPU?

      * How exactly do I use it? (Okay, I buy the thing, it arrives in a box, I plug it in... what exact steps do I take to get to running real workloads on it?)

      • randunel 30 days ago
        There are about two 240p object-detection demo videos with source code, one still-image object-detection demo, and a bunch of forum posts with incomplete code about some audio processing - and they're all about a different SBC. Software-wise, there's not much to it; you're on your own.
    • jdietrich 30 days ago
      4 TOPS is last-gen smartphone territory - you aren't going to have a good time running big generative models, but it's a useful amount of performance for typical edge workloads like machine vision and robotics.
      • eternityforest 30 days ago
        EfficientDet gives almost acceptable performance for security cams on an i5 if you only decode keyframes. I wonder how many NVR channels it could handle?
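        A rough capacity estimate is easy to sketch (pure back-of-the-envelope; the function name, detector throughput, and keyframe interval below are all made-up illustration values, not EfficientDet benchmarks):

```python
def max_nvr_channels(detector_fps: float, keyframe_interval_s: float) -> int:
    """Estimate how many NVR channels one detector can serve if only
    keyframes are decoded and fed to the model.

    detector_fps        -- sustained model inferences per second
    keyframe_interval_s -- seconds between keyframes on each camera stream
    """
    inferences_per_camera = 1.0 / keyframe_interval_s  # per-camera demand
    return int(detector_fps // inferences_per_camera)

# e.g. a detector sustaining 10 inferences/s, with cameras emitting a
# keyframe every 2 s -> ~20 channels, ignoring decode and I/O overhead
print(max_nvr_channels(10, 2.0))  # -> 20
```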
      • sciencesama 30 days ago
        The Google Coral TPU is 4 TOPS, btw.
        • hcfman 30 days ago
          These are INT8 TOPS, are they not? That being the case, the Orin NX series is 50 TOPS I believe, the Orin AGX being 100 TOPS.
          • KeplerBoy 30 days ago
            The Orin AGX is advertised at 275 INT8 TOPS.
        • mkl 30 days ago
          From 2019 though.
    • robcohen 30 days ago
      One thing that comes to mind is AI detection in Frigate https://docs.frigate.video. I’ve not played very much with AI but I’ve dabbled with Coral this past month and it would definitely be nice to have a fully integrated machine that could be dedicated to my security video feeds.
      • bombela 30 days ago
        I was given a Coral USB by a friend, and it is handling detection for 6 cameras without issues in Frigate.

        I have configured the cameras (had to go through an ActiveX control on IE6 via an MS Windows VM, yay) to serve two streams: 480x360 @ 5 fps for the AI detection, and 2592x1944 @ 25 fps for recording.

        Only the low quality stream is decoded on the computer, for a very light load.
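        In Frigate's config this two-stream split looks roughly like the sketch below (camera name and RTSP paths are placeholders; check the Frigate docs for the exact schema of your version):

```yaml
cameras:
  front_door:                          # placeholder camera name
    ffmpeg:
      inputs:
        - path: rtsp://cam/substream   # 480x360 @ 5 fps, decoded for AI
          roles:
            - detect
        - path: rtsp://cam/mainstream  # 2592x1944 @ 25 fps, saved as-is
          roles:
            - record
    detect:
      width: 480
      height: 360
      fps: 5
```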

    • explorigin 30 days ago
      4 TOPS is the same as the Coral Accelerator. Look on YT for how people have used it. Another poster mentioned video object recognition with Frigate.
    • sheepybloke 30 days ago
      The bigger question for me is whether it's easy to use or add the accelerator memory to your application. For something like a virtual assistant or basic image recognition, it can be really confusing to figure out how to take advantage of the different hardware tools that are there. This is especially true in the maker space, when you're trying to leverage open-source toolkits and don't necessarily have the time or knowledge to write a full, lower-level driver for them.
    • gigatexal 30 days ago
      Probably good enough to run OpenCV and turn it into a people- and object-recognizing brain for cameras, but I'm not sure.
  • Alupis 30 days ago
    BeagleBoard is akin to a Raspberry Pi. It's not meant to be a powerhouse PC, or in this case, a powerhouse AI computing platform.

    It's meant for embedding, tinkering, and learning.

    To that end, 4GB of RAM on an AI Accelerator board is fine - the expected workloads will not consume a lot of RAM. This also makes the lack of NVMe sufficient as well.

    For more "horsepower" there is also the BeagleBone AI-64[1], which claims up to 8 TOPS.

    [1] https://www.beagleboard.org/boards/beaglebone-ai-64

    • mmoskal 30 days ago
      Looks like 1/10 the TOPS and 1/3 the memory bandwidth of a Jetson Orin Nano (the TI chip seems able to run 3733 MT/s on a 32-bit bus, so ~15 GB/s), but much cheaper.
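      The back-of-the-envelope math behind that ~15 GB/s figure (the MT/s rating already counts transfers, so it just multiplies by the bus width):

```python
transfers_per_s = 3733e6          # LPDDR4 at 3733 MT/s
bytes_per_transfer = 32 // 8      # 32-bit bus -> 4 bytes per transfer
bandwidth_gb_s = transfers_per_s * bytes_per_transfer / 1e9
print(round(bandwidth_gb_s, 1))   # -> 14.9
```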
      • reaperman 30 days ago
        The 5-year old Google Coral Edge TPU does 4 TOPS for $40-130. Comparing exact features, the $70 BeagleY-AI probably most directly compares to Coral's $130 option.

        https://coral.ai/products/

        https://www.aliexpress.us/item/3256806327903926.html?src=goo...

      • Alupis 30 days ago
        For reference:

        - BeagleY-AI - 4 TOPS - $72.00

        - BeagleBone AI-64 - 8 TOPS - $187.50

        - Jetson Orin Nano Dev Kit - 40 TOPS - $499.95

        The 20 TOPS Jetson Orin Nano seems to only be available as a commercial module, not in a dev kit. The commercial module is priced at $259.00.

        • phlipski 30 days ago
          Keep in mind that once you've prototyped a product on the BeagleY-AI board - you can actually BUY the TI AM62A74 chips in volume for ~ $25. Good luck sourcing the SoC on that Jetson Orin Nano board...
      • Aurornis 30 days ago
        You can buy a full desktop GPU for the price of a Jetson Orin dev board. Not really a good comparison point for a $70 SBC.
        • imtringued 30 days ago
          A full desktop GPU would drain your battery and take up too much space, so it is inferior in almost every aspect for the use cases the Jetson Orin is built for.
          • KeplerBoy 30 days ago
            Laptops with dedicated GPUs are the most TOPS/watt you're going to get without any of the software-stack problems.

            Probably the solution to look for if you need >30 Watts of compute.

        • moffkalast 30 days ago
          The Orin boards can do what a desktop GPU can do while drawing 10-20x less power. At the cost of terrible software support and hardly any memory bandwidth.
    • pjmlp 30 days ago
      The first PC where I installed Slackware 2.0, had 16 MB of RAM....

      Not a powerhouse indeed.

      • yjftsjthsd-h 30 days ago
        Your first PC probably couldn't run any modern AI programs. I know there are times when it's reasonable to question whether we really need so much hardware for the things we're running, but I'm pretty sure current-gen AI is actually a place where we're using the hardware to its limit.
  • rcarmo 30 days ago
    This looks interesting, but the trouble with these small built-in accelerators is that they were mostly designed when YOLO was the pinnacle of edge applications. These days they're grossly underpowered…
    • bradfa 30 days ago
      Grossly underpowered compared to a fat stonking GPU, yes. But the whole BeagleY-AI board probably consumes less power at full tilt than just the fans to keep a modern 2-slot NVIDIA GPU cool.

      For embedded applications, the C7x DSPs are mighty. Lots of existing C6x DSP code can be recompiled to target the C7x with minimal effort. Making your C6x code work efficiently on the C7x may require more than just a recompile, but being able to leverage existing C6x codebases with minimal investment while getting the extra performance of the C7x is a very big deal.

      I don't think a quad-core ARM Cortex-A53 with a pair of C7x DSPs is trying to compete with an NVIDIA Hopper. But if you're making a $100 embedded product that has to handle video, this seems quite attractive.

      • rcarmo 30 days ago
        I’ve been trying to use rknn on RK3588 boards (which can go a bit higher in wattage due to the rest of the SOC and peripherals), and I get your point, but the overall landscape for non-GPU accelerators is still pretty much… just plain bad.

        I’m hoping for Ryzen APUs to be a good stopgap for larger models, or for enough support for Mali GPUs (which can add a little more compute) to be more usable, but in general, you have a huge abyss between “oh, that’s a face in that picture” and “here’s my current estimation of movement for this human”.

    • teaearlgraycold 30 days ago
      What do the cool kids use these days? I know YOLO is still relevant but I’m not sure what’s the new stuff.
      • rcarmo 30 days ago
        Well, I’ve been playing around with RK3588 boards. YOLO is still their reference example, and rknn and TFlite are just… too challenging to use for anything else, really.

        (I’ve been looking at timeseries and audio stuff, so I can sort of butcher my models to fit, but it’s still too small. And a GPU is still too power-hungry, etc. Am hoping for Ryzen APUs to be a useful stopgap.)

    • tucosan 30 days ago
      Plus, they have way too little memory for any meaningful LLM workloads that go beyond simple inference tasks or running YOLO.
  • cjs_ac 30 days ago
    I find it interesting that the single-board computer market seems to be coalescing around the Raspberry Pi B models as the standard form factor. This device in particular has almost all of the same IO connectors as the Raspberry Pi 5 (one microHDMI port is missing) in all the same places, so it should be compatible with Raspberry Pi 5 cases. I wonder whether the pinout on the 40-pin header is the same as that on the Pi?

    I think most of us here are familiar with the fact that amd64 machines are made entirely from commodity parts: ATX cases, ATX power supplies, and so on. I wonder whether there's a similar commodification in the offing for the Pi form factor?

  • kitd 30 days ago
    4 TOPS-capable

    So, reach out & it'll be there?

  • synergy20 30 days ago
    The Jetson Orin Nano has no video hardware encoder, and probably no hardware decoder either, so you need to use the CPU to do that heavy computing in 2024.

    Not sure about the BeagleY-AI here; since it has a DSP inside, it's probably also doing software encoding, just using the DSP instead of the CPU.

    For power efficiency, I would think a cheap hardware encoder is the way to go; I'm surprised both are using software encoders.

  • sandwichukulele 30 days ago
    I don't know anything about the Beagle. How is it different from, say, a Raspberry Pi 5? It says open-source hardware, but what does that actually mean? I'm curious about buying one, but I want to understand them better first.
    • dade_ 30 days ago
      Over 10 years later, I can say that the original BeagleBone was far more reliable than the original Raspberry Pi. The Texas Instruments specifications and documents for the processor were far better as well. I've had several Raspberry Pis just randomly die, but every one of the BeagleBones is still going. I've heard similar feedback from people with IoT startups: RPis work great until some just randomly die.

      Today I find the quality of the Raspberry Pi is dramatically improved, the tooling to help a beginner get started is amazing, and of course there's just simply a much larger user base that have likely resolved the problems you'll encounter.

      However, the real magic of a BeagleBone is listed in its features: an Arm Cortex-R5 subsystem for low-latency I/O and control.

      There are many use cases, such as robotics, where there is a need for real-time control, and BeagleBones have this capability built in, whereas with a Raspberry Pi you'll see people connecting an Arduino and a Raspberry Pi together to meet this requirement. It's kludgy, burns tons of time, and also adds up cost-wise.

      In any case, if you're just starting out and you ever get interested in these things, know that you'll likely end up with many of them over the years. RPi is great for beginners, but if I were deploying something into the field as an IoT startup etc., I'd still prefer a Beagle.

      My 2c

      • synergy20 30 days ago
        The BeagleBone PRU is not an R5 MCU, though?
        • mastax 30 days ago
          But BeagleY-AI does have an R5 MCU.
    • smarx007 30 days ago
      The main difference is the Cortex-R5 subsystem. The BeagleBone, its famed predecessor, had PRUs. These features, paired with a powerful Cortex-A processor, are really useful if you have a complex program that needs to do something (I/O or control, for example) with very high timing precision. If you have strict timing demands but a rather simple algorithm, you can simply use an Arduino or a Raspberry Pi Pico (RP2040). If you just have a heavy load, you can use a Raspberry Pi. One common use for the BeagleBone was (is?) 3D printer control.

      As far as I am concerned, the AI acceleration in the BeagleY-AI's DSP is just a fad (though the TI DSP itself is quite respected in the industry).

      I would probably say it's better to buy a Pi 4/5/Zero and one of RP2040/Arduino/micro:bit/Nucleo and master them separately, possibly with a UART/I2C/SPI link in between before moving on to a single package.

    • Retr0id 30 days ago
      https://openbeagle.org/beagley-ai/beagley-ai - at a glance, it looks like there's everything you need to be able to fab your own boards, if you really wanted to.
    • Gibbon1 30 days ago
      Similar, but BeagleBones have a flash controller they boot off, so they do better at not corrupting themselves on power cycling. I don't know about the BeagleY-AI, but the BeagleBone Black's power sequencer has "won't fix" issues that cause the board to fail to start if the power comes on too slowly.

      Open-source hardware means they give you the schematics and the board layout as an Altium project. You can take that and customize it to fit your needs. When it comes to processors that can run embedded Linux, that's actually common. The advantage: if you don't modify the layout of the high-speed buses, your prototype will likely just work.

    • a2800276 30 days ago
      Well, it's certainly a good idea to get a better understanding of potential purchases. Unmotivated impulse purchases just clutter up your mind and your house, and are bad for the environment.

      "If you have to ask," you're almost certainly better off with a Pi 5 (or probably a Pi 4 if you're looking for a mini PC to get some use out of that monitor you have lying around). Raspberry Pi has a larger community and is therefore more accessible when starting out.

      If you're looking to get a better understanding of the "low-level" workings of computers, consider an Arduino.

    • moffkalast 30 days ago
      > Quad-core 64-bit Arm® Cortex®-A53 CPU subsystem at 1.4GHz

      It's slower.

      > Memory 4GB LPDDR4

      Has less and slower memory.

      > Arm Cortex-R5 subsystem for low-latency I/O and control

      Might use less power at idle, the Pi 5 is extremely power hungry when doing nothing (~5W).

      > Dual general-purpose C7x DSP with Matrix Multiply Accelerator (MMA) capable of 4 TOPs

      An AI chip with questionable software support.

      The rest seems about the same, it's loosely based on the same PCB layout.

      • kkielhofner 30 days ago
        > An AI chip with questionable software support.

        As many can attest with the random "ARM SBC of the day" the Raspberry Pi continues to reign supreme because the software and ecosystem is second to none in the ARM SBC space. You can Google "[insert project/task here] raspberry pi" and unless it's completely ridiculous you will have your pick of implementations, how-to guides, and things that are often packaged up really nicely for even the novice Raspberry Pi user.

        NPUs on ARM boards are nothing new. Like many people I have a graveyard of SBCs including things like the Khadas VIM3 (2019). If you Google "khadas npu" you will find that it basically only supports OpenCV and that took three years[0]. That's not nothing but compared to GPU it almost is.

        The BeagleBone ecosystem is likely the clear very distant second to Raspberry Pi. I'm hoping they do better.

        As we can see from the near-absolute dominance of CUDA (> 90% market share) software support and ecosystem matters much, much more than "X TOPS".

        [0] - https://opencv.org/blog/working-with-neural-processing-units...

        • moffkalast 30 days ago
          Speaking of CUDA, Nvidia is probably somewhere close to BeagleBone in support for their Jetsons. Though the JetPack builds are all there is, so everyone's just resigned to dockerizing anything they actually want to work on them lol.

          Getting the cheapest Orin Nano with 20 TOPS for ~2.5x the price of this is still probably the better choice for any proper inference.

  • Havoc 30 days ago
    Other boards have NPUs with 4 TOPS too, plus more memory, e.g. the Orange Pis.

    … but it doesn't look like any of the usual LLM suspects can run on the NPU, so I'm not sure it's much use. I've seen some OpenCV code, but that's about it.

  • shadowpho 30 days ago
    4GB of RAM seems low. No NVMe even though it has PCIe :(
    • diggan 30 days ago
      Doesn't PCIe mean you could trivially add NVMe support via an expansion board, though?
      • shadowpho 30 days ago
        Huge PITA.

        Do they even support Raspberry Pi HATs? If yes, that's another $25, plus now it blocks your fan unless you get the proprietary sideways fan HAT. It also won't mount in any cheap cases anymore.

        This means it goes from "hey I can buy this for $70, throw into a case with 128gb nvme and run stuff" to "ok so $70 + $25 hat + different fan hat + $40 to get taller clearances"

  • franciscop 30 days ago
    Beware of the documentation though, which is literally a blank page "coming soon!":

    https://docs.beagleboard.org/latest/boards/beagley/ai/

  • andrewmcwatters 30 days ago
    I think we need some sort of mlbench but for hardware ranking... I have no idea what these devices are capable of and neither do the vendors apparently.
    • phlipski 30 days ago
      MLCommons provides MLPerf, a series of benchmarks anyone can run and compare results on. For devices like the BeagleY-AI, the "edge" category provides various benchmarks covering things like object detection, speech recognition, and now smaller LLMs.

      https://mlcommons.org/benchmarks/inference-edge/

  • Double_a_92 30 days ago
    I can't wait for it to be sold out, or only sold at shops that want like $40 for international shipping...
  • RobotToaster 30 days ago
    Is the low level firmware open source?

    I know it isn't on the RPi, which runs a proprietary Broadcom version of Microsoft ThreadX.

  • renewedrebecca 29 days ago
    With the two DSPs, seems like this would be a nice board to build a guitar effects pedal around.
  • T-A 30 days ago
    • sunshine-o 30 days ago
      Ok Microsoft, let's add a copilot key to our keyboards.

      But can we then remove the Windows key? unclear...

    • weikju 30 days ago
      reminds me of the "multimedia PC" of the 90s. But at least that branding didn't enforce cloud services and Microsoft telemetry
  • fithisux 30 days ago
    4GB RAM is not much.
    • moffkalast 30 days ago
      And it's not honest work either. LPDDR4 is now more expensive per GB and comes in smaller packages than LPDDR4X, plus it uses twice as much power. Not using LPDDR4X is a lose-lose-lose situation.
  • RantyDave 30 days ago
    AI accelerator? Really? I thought AI acceleration in this day and age meant more than multiplying two matrices together. If not, what's CUDA all about?
    • imtringued 30 days ago
      You are confusing programmable shaders and GPGPU with AI.

      AI only needs matrix multiplication and activation functions to be accelerated. For everything else, the GPU is already so fast that there is no point in further improvements.
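      A toy dense layer makes the point concrete: the entire inner loop an NPU has to accelerate is a matrix multiply followed by an activation (pure-Python sketch for illustration; real NPUs run this as fused int8 hardware ops):

```python
def matmul(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def relu(v):
    """Element-wise ReLU activation."""
    return [max(0.0, a) for a in v]

# One dense layer: y = relu(W @ x)
W = [[1.0, -2.0],
     [0.5,  0.5]]
x = [3.0, 1.0]
print(relu(matmul(W, x)))  # -> [1.0, 2.0]
```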

  • lproven 31 days ago
    I honestly thought this thing was going to run TOPS-10 or TOPS-20.

    I'm a bit disappointed.

    https://en.wikipedia.org/wiki/TOPS-10

    https://en.wikipedia.org/wiki/TOPS-20

    • johnohara 30 days ago
      Given the recent HN post about an update to the OpenVMS project I thought the same thing when I saw "TOPS-capable." Hmm, maybe some kind of port from long-retired propeller heads at DEC/Compaq/HP.

      Nope. Looked up "4 TOPS" and learned the famed Motown group was still performing though some of the original members had passed.

      Finally determined that "TOPS" did indeed mean "trillion operations per second."

      It wasn't so long ago that this kind of compute was only found inside some mountain laboratory surrounded by a heavily guarded 5-mile restricted-access perimeter.

      ~$70.00 sometime in June 2024. Don't tell me the industry isn't still sprinting all-out 54 years after TOPS-10 was first introduced.