Visualizing Large Datasets on the GPU with Vega and MapD

(mapd.com)

117 points | by tmostak 2465 days ago

5 comments

  • coherentpony 2465 days ago
    > MapD uses Vega to drive the rendering engine directly on the result set of a SQL query without ever requiring the data to leave the GPU

    How big is the dataset? If it can never leave the GPU, then it is at most a few GB? Unless there are several GPUs at play, in which case it's N * (a few GB). If there are only a few GPUs at play, then this dataset would fit into DDR3 RAM on a single mainstream Xeon node, or entirely into MCDRAM on a Xeon Phi node.

    Please correct me if I'm wrong.

    • tmostak 2465 days ago
      MapD customers typically run our product on multiple servers with multiple GPUs per node. So 4 servers with 8 Nvidia P40s each have 4 x 192GB = 768GB of VRAM (24GB per card; see the back-of-envelope sketch below). Note that MapD compresses data and also keeps data in CPU RAM as needed. Even two servers with these GPUs, or four servers with gamer GPUs, are enough to query and visualize an 11B-record shipping dataset without a hitch (https://www.mapd.com/demos/ships), a demo running on four servers with 8 Nvidia 1080 Tis each.

      Other customers with smaller datasets (i.e. less than a few hundred million records) are able to run with a single GPU.

      We're not going after petabyte-scale datasets (where sub-100ms querying is rarely important), so the ability to scale further has rarely been an issue.
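
      A back-of-envelope sketch of the capacity arithmetic above. The 24GB-per-P40 figure is the card's published VRAM; the bytes-per-row and compression numbers are illustrative assumptions, not MapD specifics:

      ```python
      # Rough capacity estimate for a multi-server, multi-GPU cluster.
      # All sizing constants below are illustrative assumptions.
      servers = 4
      gpus_per_server = 8
      vram_per_gpu_gb = 24            # an NVIDIA Tesla P40 has 24 GB of VRAM

      total_vram_gb = servers * gpus_per_server * vram_per_gpu_gb
      print(f"Aggregate VRAM: {total_vram_gb} GB")     # 4 * 8 * 24 = 768 GB

      # Hypothetical hot columns for a ship-position scatter plot: lon/lat as
      # 4-byte floats plus a couple of dictionary-encoded attributes.
      bytes_per_row = 16
      compression_ratio = 2           # assumed effect of fixed-width/dict encoding

      rows_in_vram = total_vram_gb * 1e9 * compression_ratio / bytes_per_row
      print(f"~{rows_in_vram / 1e9:.0f}B rows of hot columns fit in VRAM")  # ~96B
      ```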

      • qeternity 2465 days ago
        I would love to see some comparisons to other MPP in-memory databases because, and I mean this with all due respect, it's difficult to gauge what the impact of the GPUs is. Have you benchmarked against anything like MemSQL? Also, do you mind if I ask where you got the ship data from?
      • tmostak 2465 days ago
        Note that although we can run on CPU, CPUs do not have the graphics pipeline and the memory bandwidth necessary for interactive server-side visualizations like this.
        • coherentpony 2465 days ago
          Cool, thanks for the explanation.
      • dogma1138 2465 days ago
        Do you use Pascal's unified memory for over-provisioning? Since Maxwell only supports unified virtual addressing with the host up to the VRAM limit.
        • tmostak 2464 days ago
          MapD predates good virtual (unified) memory support on GPUs, so we built our own caching mechanism where each GPU has its own buffer pool and pulls from a CPU buffer pool (i.e., a network of buffer pools). This approach still has the advantage of giving us a lot of control over where we put data and how much we leave for other processes, as well as allowing us to pin data in VRAM on a specific GPU.
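
          A minimal sketch of that kind of two-tier caching (my own illustration of the idea, not MapD's actual implementation): each GPU pool faults chunks in from a shared CPU pool, evicts least-recently-used entries, and supports pinning.

          ```python
          # Toy model of a two-tier buffer pool: each GPU pool pulls chunks
          # from a shared CPU pool and can pin hot chunks against eviction.
          from collections import OrderedDict

          class CpuBufferPool:
              def __init__(self):
                  self.chunks = {}          # chunk_id -> bytes

              def get(self, chunk_id):
                  # Stand-in for loading a column chunk from disk into CPU RAM.
                  return self.chunks.setdefault(chunk_id, b"...chunk data...")

          class GpuBufferPool:
              def __init__(self, capacity_chunks, cpu_pool):
                  self.capacity = capacity_chunks
                  self.cpu_pool = cpu_pool
                  self.resident = OrderedDict()   # LRU order: chunk_id -> bytes
                  self.pinned = set()

              def pin(self, chunk_id):
                  self.get(chunk_id)              # ensure residency first
                  self.pinned.add(chunk_id)

              def get(self, chunk_id):
                  if chunk_id in self.resident:
                      self.resident.move_to_end(chunk_id)
                      return self.resident[chunk_id]
                  # Evict unpinned chunks in LRU order until there is room.
                  while len(self.resident) >= self.capacity:
                      victim = next((c for c in self.resident
                                     if c not in self.pinned), None)
                      if victim is None:
                          raise MemoryError("all resident chunks are pinned")
                      del self.resident[victim]
                  # "Copy" the chunk over PCIe from the CPU pool.
                  self.resident[chunk_id] = self.cpu_pool.get(chunk_id)
                  return self.resident[chunk_id]
          ```
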
    • Asooka 2465 days ago
      AMD are working[1][2] on a graphics card that includes a 1TB SSD, which appears to your application as having 1TB of RAM and then just swaps to the SSD. I would say that's a pretty good approach to enabling virtual memory on the GPU: you get really fast swapping and you don't tax the PCIe bus and CPU with moving data to and from GPU RAM. No release date AFAIK, sadly.

      [1] https://www.amd.com/en-us/press-releases/Pages/amd-radeon-pr...

      [2] https://www.youtube.com/watch?v=g-8pMM2wV7k

  • greyskull 2465 days ago
    At first, I thought it was referring to AMD's new Vega GPU family. I was hoping they found a particularly good use case for it.
    • shusson 2465 days ago
      Honestly I find that Vega, the visualisation grammar, is much more interesting. Finally we might have a standard for describing charts!
    • microcolonel 2465 days ago
      Yeah, I feel tricked frankly, given the proximate mention of GPUs.
      • tmostak 2465 days ago
        Sorry, as the OP, that wasn't the intention (the Vega rendering API has been around for some time and predates our porting it to GPUs).
        • microcolonel 2465 days ago
          Fair enough, name collisions are becoming very hard to avoid.
      • jnbiche 2464 days ago
        Vega the visualization language has been around far longer than AMD's Vega.
  • jarmitage 2464 days ago
    @tmostak Large-dataset visualisation like this looks great, but one of the most appealing parts of Vega for me is interaction. It's just as easy with Vega to create composite interactions for filtering and navigating data as it is to visualise it. Is there any scope for this type of architecture to support more than just serving rendered PNGs? (Can it do that at 60fps? :P)
    • tmostak 2464 days ago
      We've considered video rendering using the H.264 encoders on the GPU; another approach might be to create a format that carries all the info needed for interactions and could be deciphered on the frontend (i.e., in WebGL). So it's something we're definitely looking at.

      We already provide some support for hover interactions in our charting library on top of Vega, but it would be nicer to do this in Vega itself.

      Right now customers still get a lot of value from server-side rendering simply because we can render visualizations of billions of records interactively and in real time, which is difficult or impossible to achieve on other platforms without some sort of pre-computation.
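
      For readers unfamiliar with the workflow being discussed, here is a schematic of the general shape of a server-side Vega spec whose data source is a SQL query. The table, field names, spec details, and the render call are assumptions for illustration and may not match MapD's exact Vega dialect:

      ```python
      # Schematic Vega-style spec backed by a SQL query (illustrative only).
      import json

      vega_spec = {
          "width": 1024,
          "height": 1024,
          "data": [{
              "name": "points",
              "sql": "SELECT lon, lat, speed FROM ships WHERE speed > 0"
          }],
          "scales": [
              {"name": "x", "type": "linear", "domain": [-180, 180],
               "range": "width"},
              {"name": "y", "type": "linear", "domain": [-90, 90],
               "range": "height"},
              {"name": "color", "type": "linear", "domain": [0, 40],
               "range": ["blue", "red"]}
          ],
          "marks": [{
              "type": "points",
              "from": {"data": "points"},
              "properties": {
                  "x": {"scale": "x", "field": "lon"},
                  "y": {"scale": "y", "field": "lat"},
                  "fillColor": {"scale": "color", "field": "speed"},
                  "size": 1
              }
          }]
      }

      # The server evaluates the SQL, renders the marks against the result set
      # that is already resident on the GPU, and returns a PNG. The client call
      # below is hypothetical (a pymapd-style connection named `con`):
      # png_bytes = con.render_vega(json.dumps(vega_spec))
      ```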

      • jarmitage 2464 days ago
        Thanks for your reply, and yes, I can imagine your tools are already very useful. Are there any real-time examples online? It would be interesting to see what your state of the art is!
        • tmostak 2464 days ago
          We do support real-time ingest (helped by the fact that we do not need to index on insert). The only example of that we have online right now is our Tweetmap demo (https://www.mapd.com/demos/tweetmap/).
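
          As an illustration of what index-free, append-only ingest looks like from a client, here is a sketch assuming the pymapd client; the connection parameters, table name, columns, and exact load_table signature are assumptions and may differ by version:

          ```python
          # Sketch of streaming rows into a table for a Tweetmap-style demo.
          import time
          from pymapd import connect   # assumed client; API may differ

          con = connect(user="mapd", password="...", host="localhost",
                        dbname="mapd")

          def poll_new_tweets():
              # Stand-in for a real feed: (timestamp, lon, lat, text) tuples.
              return [(time.time(), -122.4, 37.8, "hello from SF")]

          while True:
              rows = poll_new_tweets()
              if rows:
                  # No index maintenance on insert, so appends stay cheap.
                  con.load_table("tweets", rows)
              time.sleep(1)
          ```
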
    • lmeyerov 2464 days ago
      We separated the analytics from rendering at Graphistry, connecting GPUs on both ends. You'll notice most of their visualizations aren't showing much geometry in the client: Leveraging that, we aim for 15fps+ on datasets 100X+ bigger than normal stuff in browsers, and soon, even more. And we're looking into including MapD as part of our backend GPU analytics stack... so, yes!

      And if you're into visualization engineering like that, we're hiring :)

  • edejong 2464 days ago
    A lot of these visualisations would benefit greatly from using 2D or 3D kernel density estimates instead of a simple scatter plot. See for example: https://youtu.be/Xz_7Ej6JsMY
    • tmostak 2464 days ago
      We can already do simple histograms (heatmaps). More complicated features such as Gaussian weighting and hexagonal bins are coming.
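
      For concreteness, the kind of Gaussian-smoothed 2D density edejong is describing can be sketched with NumPy/SciPy; the synthetic data, bin count, and bandwidth below are arbitrary choices:

      ```python
      # 2D histogram of point data, smoothed with a Gaussian kernel to
      # approximate a kernel density estimate for heatmap-style rendering.
      import numpy as np
      from scipy.ndimage import gaussian_filter

      rng = np.random.default_rng(0)
      lon = rng.normal(loc=-122.4, scale=2.0, size=1_000_000)
      lat = rng.normal(loc=37.8, scale=2.0, size=1_000_000)

      counts, xedges, yedges = np.histogram2d(lon, lat, bins=512)
      density = gaussian_filter(counts, sigma=3.0)   # bandwidth in bins

      # `density` can now be colour-mapped and drawn as a heatmap texture.
      ```
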
  • chenster 2465 days ago
    Can I use it on a MacBook Pro? My concern is that it doesn't have a dedicated GPU.
    • shusson 2464 days ago
      You can use MapD on your MacBook Pro in CPU mode, but it will not be able to do any Vega rendering because that requires a GPU.