12 comments

  • mrwyz 20 days ago
    Cool, but not touching this; no license and requires Inria's proprietary rasterizer.

    People should stop basing all of this new research on proprietary software, when we have open source implementations [1][2].

    [1] gsplat: https://github.com/nerfstudio-project/gsplat

    [2] opensplat: https://github.com/pierotofy/opensplat

    • jacoblambda 19 days ago
      It has a license now fwiw.

      It's a pretty basic "free for non-commercial use, contact us for commercial use" license.

      And the Inria rasterizer is not proprietary either. It's non-commercial open source with the option to purchase a commercial license.

      These are perfectly reasonable tech stacks for research projects to build off of. If you have an issue with the license, implement it yourself based on the papers (which all outline the necessary details to do so).

    • cubefox 20 days ago
      I'm surprised anything in 3D Gaussian splatting uses a rasterizer. I thought those were only used for polygonal data.
      • reasonableklout 19 days ago
        Rasterization is actually why 3D Gaussian Splats have been so successful. Being able to render 3DGS scenes by iterating over the objects and drawing the pixels each one covers is much faster than ray-marching every pixel, which is how neural radiance fields (the last hot 3D reconstruction technology) are rendered.
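        As a toy sketch of the difference (2D only, grayscale, made-up scene data, nothing like the real CUDA renderer):

          # Toy contrast between the two approaches; 2D isotropic "splats",
          # no sorting subtleties. Illustration only.
          import numpy as np

          H, W = 64, 64
          splats = [(20.0, 30.0, 4.0, 0.8),   # (cy, cx, radius, opacity)
                    (40.0, 25.0, 6.0, 0.6)]   # hypothetical, depth-sorted

          # Splatting: loop over primitives, touch only the pixels each covers.
          img = np.zeros((H, W))
          for cy, cx, r, opac in splats:
              y0, y1 = max(int(cy - 3*r), 0), min(int(cy + 3*r), H)
              x0, x1 = max(int(cx - 3*r), 0), min(int(cx + 3*r), W)
              ys, xs = np.mgrid[y0:y1, x0:x1]
              g = opac * np.exp(-((ys - cy)**2 + (xs - cx)**2) / (2*r*r))
              img[y0:y1, x0:x1] += (1 - img[y0:y1, x0:x1]) * g  # alpha blend

          # Ray marching: every pixel queries the whole field (here collapsed
          # to one evaluation per gaussian; a NeRF samples dozens of points
          # per ray, each one a network evaluation). Far more work per pixel.
          img2 = np.zeros((H, W))
          for y in range(H):
              for x in range(W):
                  T = 1.0                      # transmittance along the ray
                  for cy, cx, r, opac in splats:
                      g = opac * np.exp(-((y - cy)**2 + (x - cx)**2) / (2*r*r))
                      img2[y, x] += T * g
                      T *= 1.0 - g
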
      • VelesDude 20 days ago
        I mean technically rasterization means taking any vector data and plotting it in a 2D space... so I guess it is correct.

        But yes, I know what you are getting at. This would normally be done via a software/shader pipeline rather than the GPU's standard polygon pipeline.

  • blovescoffee 20 days ago
    Crazy good results, but without the paper (the paper link currently just loops back to this site) it's a bit difficult to check how good. What data is required, and how long are the training runs / how many steps?
    • logtempo 20 days ago
      Using 200 photos taken with a conventional camera, it renders at 105 frames per second (video-game image quality), giving the illusion of walking through the scene. Better still, if you zoom in, you can see finer details, such as the spokes of a bicycle wheel, in excellent detail.

      It uses techniques from neural networks, but it's not strictly an NN.

      It gets the same result as Google's NeRF in 30 minutes, and NVIDIA's result in 7 minutes. It can achieve more than 100 fps if you let it train longer.

      https://www.inria.fr/fr/3d-gaussian-splatting-vision-ordinat...

    • was_a_dev 20 days ago
      The fact that the code was released before the paper is wild. Typically the promise of open-sourced code never comes to fruition.
  • speps 20 days ago
    Actual title is: Real-Time View Synthesis for Large Scenes with Millions of Square Meters

    Which makes more sense than: Real-Time View Synthesis for Square Meters

    • corysama 20 days ago
      Title edited. Thanks. I couldn't fit the whole title, but I didn't think I'd cut out "Millions of"...
  • londons_explore 20 days ago
    Please Google, implement this in google maps (especially on mobile).

    It's been over a decade and we're still stuck with 2D maps and boxy untextured buildings.

    • astrange 20 days ago
      The reason there isn't much investment here is that it's expensive to update the image data and the result isn't very useful.

      You barely ever need to look at 3D photogrammetry buildings for anything and there aren't many questions it answers outside of curiosity.

      I do wonder if they could integrate street view images into it better.

      • londons_explore 19 days ago
        Even old image data is pretty useful. If they could make a 3d view that seamlessly integrated satellite, plane, and street level imagery into one product, it would be a much better UX than having to manually switch to street view mode.
        • astrange 19 days ago
          Well, almost all of satellite view is actually plane images. Satellite images aren't good enough resolution for 3D as far as I know.

          The other problem is you can only update them in sunny weather. So SAR is a lot more useful because it can see through clouds.

      • logtempo 19 days ago
        It could be a service for local use: you select an area and ask Google to render it. Could even be a premium service, hehe
    • cubefox 20 days ago
      Google uses texture mapped polygons instead of 3D Gaussians, so this wouldn't work for Google Maps. But there actually is a collection of libraries which does the same thing for polygonal data: https://vcg.isti.cnr.it/nexus/

      One of the guys working on this is Federico Ponchio. His 2008 PhD thesis, which provided the core insight for Unreal Engine's Nanite, is referenced at the bottom of that page.

      • londons_explore 19 days ago
        > Google uses texture mapped polygons instead of 3D Gaussians,

        Time to switch I'd say...

        Polygons are a poor fit, especially for trees and windows and stuff that needs to be semitransparent/fluffy.

        I suspect the gaussians will compress better, and give better visual quality for a given amount of data downloaded and GPU VRAM. (the current polygon model uses absolutely loads of both, leading to a poor experience for those without the highest end computers and fast internet connections).

    • bufferoverflow 20 days ago
      Google Maps has 3D (in some areas). Click on Layers -> More -> Globe view.

      Looks like this: https://i.imgur.com/wcCJmbd.png

      • londons_explore 19 days ago
        But that's desktop, not mobile.

        And when you zoom right into streets, storefronts and stuff are barely visible because they haven't properly integrated street level imagery.

    • leodriesch 20 days ago
      I am really impressed by the Apple Maps implementation. I think it also uses textured polygons, but does so in a very good-looking way and at 120 fps on an iPhone, showing even a whole city in textured 3D.
      • martinkallstrom 20 days ago
        Apple bought a Swedish startup called C3, and their tech became the 3D part of Apple Maps. That startup was a spin-off from Saab Aerospace, which had developed a vision system for terrain-following missiles. Saab ran a project with the municipal innovation agency in Linköping, and the result was that they decided this tech should find civilian use cases. C3 flew small Cessnas in grids across a few major cities (and the Hoover Dam), and built a ton of code on top of the already extremely solid foundation from Saab. The timing was impeccable (now many years ago) and they managed to get Microsoft, Apple and Samsung into a bidding war, which drove up the price. But it was worth it for Apple to have solid 3D in Apple Maps, and the tech has stood the test of time.
        • dxjacob 19 days ago
          I remember seeing a Nokia or Here demo around that time that looked like the same or similar tech. Do you know of anything published about it with technical details? Enough time has passed that it should be more accessible now. I would love to learn more about it.
  • jiggawatts 20 days ago
    So this is just Level-of-Detail (LoD) implemented for Gaussian splats? Impressive results, but I would have figured this is an obvious next-step...
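    What I'd have imagined is a tree cut driven by projected screen size; a rough sketch with invented structure, since the paper isn't out:

      # Hypothetical LoD selection over a splat hierarchy: stop descending
      # once a node's projected size drops below a pixel budget and emit a
      # coarse "merged" splat standing in for the whole subtree.
      import math
      from dataclasses import dataclass, field

      @dataclass
      class Node:
          center: tuple            # (x, y, z) world position
          radius: float            # bounding-sphere radius
          merged_splat: object     # coarse proxy for the subtree
          children: list = field(default_factory=list)

      def select_splats(node, cam_pos, focal_px, px_budget, out):
          # projected size ~ focal * world_radius / distance (pinhole model)
          dist = max(math.dist(cam_pos, node.center), 1e-6)
          if not node.children or focal_px * node.radius / dist < px_budget:
              out.append(node.merged_splat)
          else:
              for child in node.children:
                  select_splats(child, cam_pos, focal_px, px_budget, out)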

    Also, is it bad that the first thing I thought of was that commanders in the Ukraine war could use this? E.g.: stitch together the video streams from thousands of drones to build up an up-to-date view of the battlefield?

  • littlestymaar 20 days ago
    Gaussian splatting feels magical, and with 4D Gaussian splatting now being a thing, 3D movies that are actually 3D, and that you can navigate through, could become a reality in the coming years. (And I suspect the first use case will be porn, as usual.)
  • angusturner 20 days ago
    Can anyone familiar with 3d graphics speculate what would be required to implement this into a game engine?

    I'm guessing that adding physics, collision-detection etc. on top of this is non-trivial compared to using a mesh?

    But I feel like for stuff like tree foliage (where maybe you don't care about collisions?) this would be really awesome, given the limitations of polygons. Plus any background scenery, stuff out of the player's reach.

    • modeless 20 days ago
      It's easy to render these in a game engine. I'm sure physics and collision detection are possible. The big, huge, gigantic issue is actually lighting.

      These scenes come with real world lighting baked in. This is great because it looks amazing, it's 100% correct, far better than the lighting computed by any game engine or even movie-quality offline ray tracer. This is a big part of why they look so good! But it's also a curse. Games need to be interactive. When things move, lighting changes. Even something as simple as opening a door can have a profound effect on lighting. Anything that moves changes the lighting on itself and everything around it. Let alone moving actual lights around, changing the time of day, etc.

      There's absolutely no way to change the baked-in lighting in one of these captures in a high quality way. I've seen several papers that attempt it and the results all suck. It's not the fault of the researchers, it's a very hard problem. There are two main issues:

      One, in order to perfectly re-light a scene you first have to de-light it, that is, compute the lighting-independent BRDF of every surface. The capture itself doesn't even contain enough information to do this in an unambiguous way. You can't know for sure how a surface would react under different lighting conditions than were present in the pictures that made up the original scan. Maybe in theory you can guess well enough in most cases and extrapolate, and AI can likely help a lot here, but in practice we are far away from good quality so far.

      Two, given the BRDF of all surfaces and a set of new lights, you have to apply the new lighting to the scene. Real-time solutions for lighting are very approximate and won't be anywhere near the quality of the lighting in the original scan. So you'll lose some of that photorealistic quality when you do this, even if your BRDFs are perfect (they won't be). It will end up looking like regular game graphics instead of the picture-perfect scans you want. If you try to blend the new lighting with the original lighting, the boundaries will probably be obvious. You're competing with perfection! Even offline rendering would struggle to match the quality of the baked-in lighting in these captures.
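      To make issue two concrete: even under the simplest possible BRDF assumption (pure Lambertian, illustrative names only), relighting is just a cosine term per point, and everything hard hides in producing the clean albedo and normals it consumes:

        # Minimal relighting sketch assuming a purely Lambertian BRDF.
        # 'albedo' and 'normals' are exactly what the de-lighting step would
        # have to recover, and in practice never recovers cleanly.
        import numpy as np

        def relight(albedo, normals, light_dir, light_rgb):
            """albedo (N,3), normals (N,3) unit, light_dir (3,) unit."""
            ndotl = np.clip(normals @ light_dir, 0.0, None)  # cosine falloff
            return albedo * ndotl[:, None] * light_rgb       # new radiance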

      To me the ultimate solution needs to involve AI. Analytically relighting everything perfectly is infeasible, but AI can likely do approximate lighting that looks more plausible in most cases, especially when trying to match captured natural lighting. I'm not sure exactly how it will work, but AI is already being used in rendering and its use will only increase.

      • esperent 19 days ago
        You've elucidated very clearly an issue that I've been thinking about since the very first time I saw gaussian splats. The best idea I've had (besides "AI magic") is something like pre-calculating at least two different lighting states, e.g. door open and door closed, or midday and evening, and then blending between them.
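        The blend itself would be trivial, something like this (made-up fields, assuming each splat stores one color per captured state):

          import numpy as np

          def blend_states(colors_a, colors_b, t):
              """Cross-fade per-splat colors between two captured lighting
              states (e.g. door closed -> open) with t in [0, 1]."""
              return (1.0 - t) * colors_a + t * colors_b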

        Do you know if anyone has tried this? Or otherwise, what're the best current attempts at solving it?

        • jacoblambda 19 days ago
          There is at least one "large scale gaussian splatting" type paper that did splatting for a few city blocks; they used data from across the day to build the final model, so you could set the time of day and the model would roughly reflect the lighting at that time.
      • angusturner 18 days ago
        Fascinating - thanks for the detailed reply. I can’t believe I failed to think of lighting but this makes so much sense.

        It’s almost like for a pure ML solution you would need a nerf (or similar) which is conditioned on the entire (dynamically changing) scene geometry and lighting positions?

        Graphics isn’t my area of ML though, so I’m sure there’s a lot of nuance that I don’t appreciate.

      • rallyforthesun 20 days ago
        Thanks for pointing out the challenges with gaussian splats. Are there any AI-based relighting methods out there? Some prompt-based editing like nerf2nerf or language-embedded NeRFs, maybe?
    • corysama 20 days ago
      I worked in game engines for a long time. The main hurdle is just that it’s new. There’s a many-decade legacy pipeline of tools and techniques built around triangles. Splats are something new.

      The good news is that splats are really simple once they’ve been generated. Maybe simpler than triangles depending on how you look at it. It’s just a matter of doing the work to set up new tools and pipelines.
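      For a sense of how simple: once training is done, each splat boils down to a handful of parameters (roughly this, going by the original 3DGS paper):

        # A trained splat, per the original 3DGS formulation: a position, an
        # anisotropic covariance stored as scale + rotation, view-dependent
        # color as spherical-harmonics coefficients, and an opacity.
        from dataclasses import dataclass
        import numpy as np

        @dataclass
        class GaussianSplat:
            position: np.ndarray   # (3,) world-space center
            scale: np.ndarray      # (3,) ellipsoid axis scales
            rotation: np.ndarray   # (4,) unit quaternion
            sh_coeffs: np.ndarray  # (16, 3) SH color, degree 3
            opacity: float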

    • Karliss 20 days ago
      Game physics often uses a separate mesh from the one used for rendering, or even a combination of primitive shapes, so it doesn't matter how the graphics are rendered. There's no point wasting resources on details that don't affect gameplay, and too much tiny collision geometry increases the chance of the player getting stuck or snagging on it.
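      In engine terms it's the usual split between the render asset and a much simpler physics proxy; a sketch with invented types:

        # The physics engine only ever sees simple proxy shapes; whether the
        # visuals are splats or meshes is irrelevant to it. Types invented
        # for illustration.
        from dataclasses import dataclass

        @dataclass
        class BoxCollider:
            center: tuple        # (x, y, z)
            half_extents: tuple  # (hx, hy, hz)

        @dataclass
        class SceneObject:
            splat_chunk: int        # handle into the splat renderer
            collider: BoxCollider   # what collision/physics simulates
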
    • gct 20 days ago
      Given they're indexing into a tree, animation will be a pain.
      • littlestymaar 19 days ago
        For this particular implementation, yes. But it's mostly for landscapes (or static scenes), and you don't need this kind of LOD machinery for the things you want to animate.
  • Retr0id 20 days ago
    I hope the next-gen google earth looks something like this.
  • corysama 20 days ago
    I got excited because the code was just released. But, apparently the paper is still not available? Sorry...
  • cchance 18 days ago
    Every time I see these I wonder: what if you took a few of those massive 50-gigapixel images and threw them into gaussians?
  • lend000 20 days ago
    Looks even better than Microsoft flight simulator. Awesome!
  • KaiserPro 20 days ago
    Sorry to be naive, but isn't this basically applying point cloud decimation to achieve dynamic level of detail?

    Am I missing something, or is there a new concept here that doesn't exist in standard point cloud renderers?