10 comments

  • cshenton 10 days ago
    You absolutely cannot implement stream compaction “at the speed of native” as WebGPU is missing the wave/subgroup intrinsics and globally coherent memory necessary to do that as efficiently as possible.
    • tehsauce 10 days ago
      It's possible you might not need direct access to wave/subgroup ops to implement efficient stream compaction. There's a great old Nvidia blog post on "warp-aggregated atomics"

      https://developer.nvidia.com/blog/cuda-pro-tip-optimized-fil...

      where they show that their compiler is sometimes able to automatically convert global atomic operations into the warp-local versions, and achieve the same performance as manually written intrinsics. I was recently curious if, 10 years later, these same optimizations had made it into other GPUs and platforms besides CUDA, so I put together a simple atomics benchmark in WebGPU.

      https://github.com/PWhiddy/webgpu-atomics-benchmark

      The results seem to indicate that these optimizations are accessible through webgpu on chrome on both MacOS and Linux (with nvidia gpu). Note that I'm not directly testing stream compaction, just incrementing a single global atomic counter. So that would need to be tested to know for sure if the optimization still holds there. If you see any issues with the benchmark or this reasoning please let me know! I am hoping to solidify my knowledge in this area :)
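
      For anyone following along, here is a rough CPU-side sketch in Python (not WebGPU or CUDA code) of what warp aggregation buys you in a compaction pass. The function names, the `WARP_SIZE` of 32, and the sequential simulation are all assumptions made for illustration. It only counts how many atomic adds each approach would issue, and on a real GPU the output order would not be guaranteed.

```python
WARP_SIZE = 32  # assumed subgroup width; real hardware varies


def compact_naive(data, keep):
    """Simulate compaction with one atomic add per kept element."""
    out = [None] * len(data)
    counter = 0      # stands in for the single global atomic counter
    atomic_ops = 0   # how many atomic adds we would have issued
    for x in data:
        if keep(x):
            slot = counter   # an atomic add returns the old value
            counter += 1
            atomic_ops += 1
            out[slot] = x
    return out[:counter], atomic_ops


def compact_warp_aggregated(data, keep):
    """Simulate compaction with one atomic add per warp that keeps anything."""
    out = [None] * len(data)
    counter = 0
    atomic_ops = 0
    for start in range(0, len(data), WARP_SIZE):
        warp = data[start:start + WARP_SIZE]
        flags = [keep(x) for x in warp]   # plays the role of a warp ballot
        total = sum(flags)
        if total == 0:
            continue                      # nothing kept, no atomic needed
        base = counter                    # leader lane reserves `total` slots at once
        counter += total
        atomic_ops += 1
        rank = 0                          # number of kept lanes before this one
        for x, kept in zip(warp, flags):
            if kept:
                out[base + rank] = x
                rank += 1
    return out[:counter], atomic_ops


if __name__ == "__main__":
    data = list(range(1000))
    is_even = lambda x: x % 2 == 0
    print(compact_naive(data, is_even)[1])            # atomic adds, per element
    print(compact_warp_aggregated(data, is_even)[1])  # atomic adds, per warp
```

      Whether a given driver's compiler performs this rewrite automatically is exactly what the benchmark is trying to find out; the sketch only shows why it matters.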

    • FL33TW00D 10 days ago
      Oh look it's subgroup support landing last week: https://github.com/gfx-rs/wgpu/pull/5301
      • jsheard 10 days ago
        That's a wgpu-specific extension, not part of the actual WebGPU spec, so you can't use it on the web.

        https://github.com/gpuweb/gpuweb/blob/main/proposals/subgrou...

        There is a proposal for supporting subgroups in WebGPU proper but it's still in the draft stage.

        • FL33TW00D 10 days ago
          I'm aware. It is an implementation of the linked proposal.

          The `wgpu` implementation linked will make its way into Firefox eventually. Dawn will follow up with a similar one for Chrome.

          I was linking it to demonstrate there are no technical hurdles and it's only really approval remaining.

          • sp332 10 days ago
            Ok, but that's not what "landing" means.
      • pjmlp 10 days ago
        Native extensions unusable on Web browsers don't count.
        • littlestymaar 10 days ago
          Then nothing involving WebGPU counts since it's not implemented on other browsers than Chromium and not on Linux even in Chromium…

          WebGPU is brand new, and the paint is still wet. It doesn't make sense to dismiss things that haven't landed in browsers yet as “unusable on the web”.

          • mr_toad 10 days ago
            There’s an advanced setting in Safari to enable it, but I can’t say how well it works. In this instance it doesn’t.
          • pjmlp 9 days ago
            Welcome to Web standards, and Google's ChromeOS transformation of the Web, with help of many Web developers out there.

            Doesn't change the fact that it is a Web standard, for Web browsers.

            • littlestymaar 9 days ago
              It is a WIP web standard. And the spec is still evolving (most things are stable at this point, but new features are still being added, like this one!).

              And that's how the web works, it was the same for WebRTC which spent 2-3 years in such a state, same for MSE, etc.

    • torginus 9 days ago
      I think compilers should be smart enough to substitute group-shared atomics with horizontal ops. If they're not already doing it, they should be!

      But anyway, Histogram Pyramids is a more efficient algorithm for implementing parallel scan. It essentially builds a series of 3D buffers, each having half the dimensions of the previous level, with each value containing the sum of the counts in its underlying cells, and the top cube being just a single value: the total count across all cells.

      Then instead of doing the second pass where you figure out what index each thread is supposed to write to, and writing it to a buffer, you simply drill down into said cubes and figure out the index at the invocation of the meshing part by looking at your thread index (let's say 1616), and looking at the 8 smaller cubes (okay, cube 1 has 516 entries, so 1100 to go; cube 2 has 1031 entries, so 69 to go; cube 3 has 225 entries, so we go to cube 3), and recursively repeat until you find the index. Since all threads in a group tend to go into the same cubes, all threads tend to read the same bits of memory until getting down to the bottom levels, making it very GPU cache friendly (divergent reads kill GPGPU perf).
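
      A minimal Python sketch of that build-then-drill-down idea, in case it helps. To keep it short it flattens each 2x2x2 block into a run of 8 values in a 1D list, which is my simplification and not how the paper lays out its buffers. The names `build_pyramid` and `find_cell` are made up for this example.

```python
BRANCH = 8  # 2x2x2 children per parent, stored here as a flat run of 8


def build_pyramid(counts):
    """Return levels from the base (index 0) up to the top (a single total)."""
    if not counts:
        raise ValueError("need at least one base cell")
    level = list(counts)
    levels = []
    while True:
        if len(level) > 1:
            # pad with empty cells so every parent has exactly BRANCH children
            level = level + [0] * (-len(level) % BRANCH)
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [sum(level[i:i + BRANCH]) for i in range(0, len(level), BRANCH)]


def find_cell(levels, k):
    """Map output index k to (base cell index, offset inside that cell)."""
    total = levels[-1][0]
    if not 0 <= k < total:
        raise IndexError(k)
    node = 0
    for level in reversed(levels[:-1]):   # walk from just under the top to the base
        child = node * BRANCH
        # skip children, subtracting their counts, until k falls inside one
        while k >= level[child]:
            k -= level[child]
            child += 1
        node = child
    return node, k


if __name__ == "__main__":
    levels = build_pyramid([516, 1031, 225])
    print(find_cell(levels, 1616))   # third child (index 2), 69 entries in
```

      With child counts of 516, 1031 and 225, index 1616 skips the first two children and lands 69 entries into the third. On a GPU each thread would run the `find_cell` loop for its own index, and neighbouring threads mostly touch the same upper-level entries.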

      Forgive me if I got the technical terminology wrong, I haven't actually worked on GPGPU in more than a decade, but it's fun to note that something that I did circa 2011 as an undergrad is suddenly relevant again (I implemented HistoPyramids from a 2007ish paper, and Marching Cubes, a 1980s algorithm). Everything old is new again.

    • masspro 10 days ago
      You seem knowledgeable, and I’m possibly going back into a GPGPU project after many years out of the game, so: overall do you see a good future for filling these compute-related gaps in the WebGPU API? Really I’m wondering whether wgpu is an okay choice versus raw Vulkan for native GPGPU outside the browser.
      • jsheard 10 days ago
        The answer to that for any given feature is "can untrusted code be trusted with that?". Wave intrinsics are probably doable. Bindless maybe, but expect a bunch of bounds checking overhead. Pointers/BDA, absolutely not.

        Native libraries like wgpu can do whatever they want in extensions, safety be damned, but you're stepping outside of the WebGPU spec in that case.

        • littlestymaar 10 days ago
          What's BDA in that context, please? I can only confidently assume it's not “battle damage assessment”.
          • jsheard 10 days ago
            Buffer Device Address, the Vulkan name for raw pointers in shaders.
      • tormeh 10 days ago
        Don't know about GPGPU, but can give you a probably correct answer: Compared to "native" APIs you trade features for compatibility. It's always going to lag behind Vulkan/DX/Metal. Are you OK with excluding platforms? Then go with Vulkan/Metal/DX. If not, then I'd give wgpu a chance. Wgpu is also higher-level than Vulkan, which is both a pro and a con.
        • pjmlp 9 days ago
          Middleware, the portability, latest features of native APIs, and nice GPGPU tooling.
    • dekhn 10 days ago
      shhh... you might cause their reality distortion field to fail!
      • Archit3ch 10 days ago
        The demo doesn't work on mobile Chrome. Worse, the blog post crashes the embedded browser in the HN app. May I suggest just linking to the demo instead?
    • spintin 10 days ago
      This is the eternal browser bros' attempt to make us think native has zero value now that we have a completely captured and bloated browser.

      The browser is dead, the only thing you can use it for is filling out HTML forms and maybe some light inventory management.

      The final app is C+Java where you put the right stuff where it is needed. Just like the browser used to be before Oracle did its magic on the applet.

      • worik 10 days ago
        > The browser is dead,

        Yea. Nah!

        That obit is a bit premature

      • teaearlgraycold 10 days ago
        So you're telling me you write Java professionally?
        • pdpi 10 days ago
          Funnily enough, in a world with WASM, we might actually have Java in the backend and C in the frontend rather than vice versa as it would've been likelier in the 90s.
          • pjmlp 9 days ago
            The irony of half world backed by VC money, trying to reinvent Erlang, Java and .NET application servers, while pretending to be innovative.
          • spintin 10 days ago
            WASM is adding GC... recreating the wheel of the applet but without escaping the problem of javascript glue.

            Go is just Java without the VM.

            Rust is just a native compiler that creates slow programs and complains a lot.

            • worik 10 days ago
              > Rust is just a native compiler that creates slow programs and complains a lot.

              Good morning Troll

              I'll give you "complains a lot."

            • neonsunset 9 days ago
              Corrective upvote from me - the comment is too funny
            • junon 9 days ago
              You had me all the way up until the rust bit.
        • spintin 10 days ago
          It's pretty much the only professional language you can write.

          If you consider respect and responsibility.

  • SuboptimalEng 10 days ago
    Ah, so that's how you do it. Having a template for WebGPU projects is a good idea. I'll have to do the same so I don't waste time setting up web graphics projects.

    Cool project btw! Adding this to my long list of graphics blogs to read.

  • lukko 10 days ago
    Hmm, why does the hydrogen atom look like a d orbital?

    Isn't it 1s1 in the ground state, so the probability distribution would look like a sphere?

    • plus 10 days ago
      I assume it's just mislabeled. It's a high-angular-momentum hydrogenic orbital, chosen because it looks cool and because it's trivial to evaluate (a spherical harmonic times a simple radial term).
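
      To make the "spherical harmonic times a simple radial term" point concrete, here is a small Python sketch comparing the 1s ground state with a 3d orbital. I am assuming the 3d_z² state (n=3, l=2, m=0) purely for illustration; I don't know which orbital the demo actually renders. Units are atomic (Bohr radius = 1), and the function names are my own.

```python
import math

A0 = 1.0  # Bohr radius in atomic units (assumed unit choice)


def psi_1s(r, theta):
    """Ground state (n=1, l=0): depends on r only, so it is a sphere."""
    return math.exp(-r / A0) / math.sqrt(math.pi * A0 ** 3)


def psi_3dz2(r, theta):
    """n=3, l=2, m=0: a radial term times the angular factor (3 cos^2(theta) - 1)."""
    radial = (r / A0) ** 2 * math.exp(-r / (3 * A0))
    angular = 3 * math.cos(theta) ** 2 - 1
    return radial * angular / (81 * math.sqrt(6 * math.pi) * A0 ** 1.5)


def density(psi, r, theta):
    """Probability density |psi|^2 at radius r and polar angle theta."""
    return psi(r, theta) ** 2


if __name__ == "__main__":
    r = 6.0
    for name, psi in (("1s", psi_1s), ("3d_z2", psi_3dz2)):
        along_z = density(psi, r, 0.0)
        in_plane = density(psi, r, math.pi / 2)
        print(name, along_z, in_plane)
```

      The 1s density is the same in every direction, while the 3d_z² density along the z axis is four times its value in the xy plane at the same radius, which is where the lobes-and-ring shape comes from.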
  • bhouston 9 days ago
    nice! (I notice WebGPU support is now up to somewhere around 60% of all browsers: https://web3dsurvey.com/webgpu)
  • codewiz 9 days ago
    WebGPU has been under development since 2017, and has been a working draft since 2021. What issues are holding the W3C from publishing the final standard? Is there a timeline?
  • pjmlp 10 days ago
    Only if we are talking about the state of native in ca 2015.
  • dailykoder 9 days ago
    Nice, now we just need to be able to directly boot into chrome without that OS bloat around it.
  • worik 10 days ago
    Exciting!

    But: "Error: Your browser does not support WebGPU"

    Sigh

    • jsheard 10 days ago
      Chrome and Chromelikes are still the only browsers shipping stable WebGPU, on Firefox it's behind a flag, and on Safari it's only on the TP branch. Then on Chrome it's not available on Linux yet, only Windows and Android, and only on a subset of Android GPUs.

      We have a way to go yet.

      • capitainenemo 10 days ago
        I was trying it out in nightly firefox, and regardless of the webgpu flags I tried, it still errored:

            Shader '' parsing error: the type of `SCAN_BLOCK_SIZE` is expected to be `u32`, but got `i32`
            10 │ @id(0) override SCAN_BLOCK_SIZE: u32 = 512;
               │                 ^^^^^^^^^^^^^^^ definition of `SCAN_BLOCK_SIZE`
  • spxneo 10 days ago
    but somebody told me a while back on here that WebGPU was outdated? what's the consensus?
    • chrysoprace 10 days ago
      You may be thinking of WebGL. My understanding is that WebGPU is effectively supposed to supersede WebGL.
    • adfm 10 days ago
      WebGPU is most definitely not outdated. It's a unified interface for all things floating point. From the datacenter to the watch on your wrist. However, most folks not deep into the inner workings will never touch it. What it does do is close the door on the App Store model. Apple already knows this, which is why we have the AVP.
    • ramon156 10 days ago
      Why would WebGPU be outdated if it is still in the middle of being rolled out in browsers?
      • spxneo 10 days ago
        this is what I want to know, but someone on here said it was not suitable for running GPU-powered games. i will see if i can dig up the thread
        • pjmlp 9 days ago
          Depends on how many GB you want to force into users' browsers, how you plan to work around browser blacklists and deep variation in API support, and how you protect the game code.

          There is a reason why in 2024, there is yet to exist a WebGL 2.0 game that can match Infinity Blade from 2011, the game used by Apple to demo iPhone's OpenGL ES 3.0 capabilities.

          WebGPU on top of that, is Chrome only for the time being, still years away from a sound 1.0 release on Safari and Firefox.

        • lukan 9 days ago
          It sounds like that someone is really just "someone". And people online have lots of opinions. Not all worth reading.

          But that someone probably just said that with WebGPU you do not get the power you would have with a native feature set, and this is true. So we likely won't see AAA games anytime soon in the browser. But it is definitely suitable for games in general.

  • superkuh 10 days ago
    Nice. Web browser exploits at the speed of GPU driver bugs. A fount that will never run dry.
    • tkzed49 9 days ago
      lame. gpus are sick, and the web is cool.