Visualizing Large Datasets on the GPU with Vega and MapD

(mapd.com)

117 points | by tmostak 2465 days ago

5 comments

  • coherentpony 2465 days ago
    > MapD uses Vega to drive the rendering engine directly on the result set of a SQL query without ever requiring the data to leave the GPU

    How big is the dataset? If it can never leave the GPU, then it is at most a few GB? Unless there are several GPUs at play, in which case it's N * (a few GB). If there are only a few GPUs at play, then this dataset would fit into DDR3 RAM on a single mainstream Xeon node, or entirely into MCDRAM on a Xeon Phi node.

    Please correct me if I'm wrong.

    • tmostak 2465 days ago
      MapD customers typically run our product on multiple servers with multiple GPUs per node. So 4 servers with 8 Nvidia P40s each have 4 x 192GB = 768GB of VRAM (24GB per card; see the back-of-envelope sketch below). Note that MapD compresses data and also keeps data in CPU RAM as needed. Even two servers with these GPUs, or four servers with gamer GPUs, are enough to query and visualize an 11B-record shipping dataset without a hitch (https://www.mapd.com/demos/ships), a demo running on four servers with 8 Nvidia 1080 Tis each.

      Other customers with smaller datasets (i.e. less than a few hundred million records) are able to run with a single GPU.

      We're not going after petabyte-scale datasets (where sub-100ms querying is rarely important), so the ability to scale further has rarely been an issue.
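
      A back-of-envelope sketch of the capacity arithmetic above. The 24GB-per-P40 figure is the card's published VRAM; the bytes-per-row and compression numbers are illustrative assumptions, not MapD specifics:

      ```python
      # Rough capacity estimate for a multi-server, multi-GPU cluster.
      # All sizing constants below are illustrative assumptions.
      servers = 4
      gpus_per_server = 8
      vram_per_gpu_gb = 24            # an NVIDIA Tesla P40 has 24 GB of VRAM

      total_vram_gb = servers * gpus_per_server * vram_per_gpu_gb
      print(f"Aggregate VRAM: {total_vram_gb} GB")     # 4 * 8 * 24 = 768 GB

      # Hypothetical hot columns for a ship-position scatter plot: lon/lat as
      # 4-byte floats plus a couple of dictionary-encoded attributes.
      bytes_per_row = 16
      compression_ratio = 2           # assumed effect of fixed-width/dict encoding

      rows_in_vram = total_vram_gb * 1e9 * compression_ratio / bytes_per_row
      print(f"~{rows_in_vram / 1e9:.0f}B rows of hot columns fit in VRAM")  # ~96B
      ```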

      • qeternity 2465 days ago
        I would love to see some comparisons to other MPP in-memory databases because, and I mean this with all due respect, it's difficult to gauge what the impact of the GPUs is. Have you benchmarked against anything like MemSQL? Also, do you mind if I ask where you got the ship data from?
      • tmostak 2465 days ago
        Note that although we can run on CPU, CPUs do not have the graphics pipeline and the memory bandwidth necessary for interactive server-side visualizations like this.
        • coherentpony 2465 days ago
          Cool, thanks for the explanation.
      • dogma1138 2465 days ago
        Do you use Pascal's unified memory for over-provisioning? Since Maxwell only supports unified virtual addressing with the host up to the VRAM limit.
        • tmostak 2464 days ago
          MapD predates good virtual (unified) memory support on GPUs, so we built our own caching mechanism where each GPU has its own buffer pool and pulls from a CPU buffer pool (i.e., a network of buffer pools). This approach still has the advantage of giving us a lot of control over where we put data and how much we leave for other processes, as well as allowing us to pin data in VRAM on a specific GPU.
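
          A minimal sketch of that kind of two-tier caching (my own illustration of the idea, not MapD's actual implementation): each GPU pool faults chunks in from a shared CPU pool, evicts least-recently-used entries, and supports pinning.

          ```python
          # Toy model of a two-tier buffer pool: each GPU pool pulls chunks
          # from a shared CPU pool and can pin hot chunks against eviction.
          from collections import OrderedDict

          class CpuBufferPool:
              def __init__(self):
                  self.chunks = {}          # chunk_id -> bytes

              def get(self, chunk_id):
                  # Stand-in for loading a column chunk from disk into CPU RAM.
                  return self.chunks.setdefault(chunk_id, b"...chunk data...")

          class GpuBufferPool:
              def __init__(self, capacity_chunks, cpu_pool):
                  self.capacity = capacity_chunks
                  self.cpu_pool = cpu_pool
                  self.resident = OrderedDict()   # LRU order: chunk_id -> bytes
                  self.pinned = set()

              def pin(self, chunk_id):
                  self.get(chunk_id)              # ensure residency first
                  self.pinned.add(chunk_id)

              def get(self, chunk_id):
                  if chunk_id in self.resident:
                      self.resident.move_to_end(chunk_id)
                      return self.resident[chunk_id]
                  # Evict unpinned chunks in LRU order until there is room.
                  while len(self.resident) >= self.capacity:
                      victim = next((c for c in self.resident
                                     if c not in self.pinned), None)
                      if victim is None:
                          raise MemoryError("all resident chunks are pinned")
                      del self.resident[victim]
                  # "Copy" the chunk over PCIe from the CPU pool.
                  self.resident[chunk_id] = self.cpu_pool.get(chunk_id)
                  return self.resident[chunk_id]
          ```
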
    • Asooka 2465 days ago
      AMD are working[1][2] on a graphics card that includes a 1TB SSD, which appears to your application as having 1TB of RAM and then just swaps to the SSD. I would say that's a pretty good approach to enabling virtual memory on the GPU: you get really fast swapping and you don't tax the PCIe bus and CPU with moving data to and from GPU RAM. No release date AFAIK, sadly.

      [1] https://www.amd.com/en-us/press-releases/Pages/amd-radeon-pr...

      [2] https://www.youtube.com/watch?v=g-8pMM2wV7k

  • greyskull 2465 days ago
    At first, I thought it was referring to AMD's new Vega GPU family. I was hoping they found a particularly good use case for it.
    • shusson 2465 days ago
      Honestly I find that Vega, the visualisation grammar, is much more interesting. Finally we might have a standard for describing charts!
    • microcolonel 2465 days ago
      Yeah, I feel tricked frankly, given the proximate mention of GPUs.
      • tmostak 2465 days ago
        Sorry, as the OP, that wasn't the intention (the Vega rendering API has been around for some time and predates our porting it to GPUs).
        • microcolonel 2465 days ago
          Fair enough, name collisions are becoming very hard to avoid.
      • jnbiche 2464 days ago
        Vega the visualization language has been around far longer than AMD's Vega.
  • jarmitage 2464 days ago
    @tmostak Large-dataset visualisation like this looks great, but one of the most appealing parts of Vega for me is interaction. It's just as easy with Vega to create composite interactions for filtering and navigating data as it is to visualise it. Is there any scope for this type of architecture to support more than just serving rendered PNGs? (Can it do that at 60fps? :P)
    • tmostak 2464 days ago
      We've considered video rendering using the H.264 encoders on the GPU; another approach might be to create a format that carries all the info needed for interactions and could be deciphered on the frontend (i.e., in WebGL). So it's something we're definitely looking at.

      We already provide some support for hover interactions in our charting library on top of Vega, but it would be nicer to do this in Vega itself.

      Right now customers still get a lot of value from server-side rendering simply because we can render visualizations of billions of records interactively and in real time, which is difficult or impossible to achieve on other platforms without some sort of pre-computation.
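
      For readers unfamiliar with the workflow being discussed, here is a schematic of the general shape of a server-side Vega spec whose data source is a SQL query. The table, field names, spec details, and the render call are assumptions for illustration and may not match MapD's exact Vega dialect:

      ```python
      # Schematic Vega-style spec backed by a SQL query (illustrative only).
      import json

      vega_spec = {
          "width": 1024,
          "height": 1024,
          "data": [{
              "name": "points",
              "sql": "SELECT lon, lat, speed FROM ships WHERE speed > 0"
          }],
          "scales": [
              {"name": "x", "type": "linear", "domain": [-180, 180],
               "range": "width"},
              {"name": "y", "type": "linear", "domain": [-90, 90],
               "range": "height"},
              {"name": "color", "type": "linear", "domain": [0, 40],
               "range": ["blue", "red"]}
          ],
          "marks": [{
              "type": "points",
              "from": {"data": "points"},
              "properties": {
                  "x": {"scale": "x", "field": "lon"},
                  "y": {"scale": "y", "field": "lat"},
                  "fillColor": {"scale": "color", "field": "speed"},
                  "size": 1
              }
          }]
      }

      # The server evaluates the SQL, renders the marks against the result set
      # that is already resident on the GPU, and returns a PNG. The client call
      # below is hypothetical (a pymapd-style connection named `con`):
      # png_bytes = con.render_vega(json.dumps(vega_spec))
      ```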

      • jarmitage 2464 days ago
        Thanks for your reply, and yes, I can imagine your tools are already very useful. Are there any real-time examples online? It would be interesting to see what your state of the art is!
        • tmostak 2464 days ago
          We do support real-time ingest (helped by the fact that we do not need to index on insert). The only example of that we have online right now is our Tweetmap demo (https://www.mapd.com/demos/tweetmap/).
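
          As an illustration of what index-free, append-only ingest looks like from a client, here is a sketch assuming the pymapd client; the connection parameters, table name, columns, and exact load_table signature are assumptions and may differ by version:

          ```python
          # Sketch of streaming rows into a table for a Tweetmap-style demo.
          import time
          from pymapd import connect   # assumed client; API may differ

          con = connect(user="mapd", password="...", host="localhost",
                        dbname="mapd")

          def poll_new_tweets():
              # Stand-in for a real feed: (timestamp, lon, lat, text) tuples.
              return [(time.time(), -122.4, 37.8, "hello from SF")]

          while True:
              rows = poll_new_tweets()
              if rows:
                  # No index maintenance on insert, so appends stay cheap.
                  con.load_table("tweets", rows)
              time.sleep(1)
          ```
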
    • lmeyerov 2464 days ago
      We separated the analytics from rendering at Graphistry, connecting GPUs on both ends. You'll notice most of their visualizations aren't showing much geometry in the client: Leveraging that, we aim for 15fps+ on datasets 100X+ bigger than normal stuff in browsers, and soon, even more. And we're looking into including MapD as part of our backend GPU analytics stack... so, yes!

      And if you're into visualization engineering like that, we're hiring :)

  • edejong 2464 days ago
    A lot of these visualisations would benefit greatly from using 2D or 3D kernel density estimates instead of a simple scatter plot. See for example: https://youtu.be/Xz_7Ej6JsMY
    • tmostak 2464 days ago
      We can already do simple histograms (heatmaps). More complicated features such as Gaussian weighting and hexagonal bins are coming.
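
      For concreteness, the kind of Gaussian-smoothed 2D density edejong is describing can be sketched with NumPy/SciPy; the synthetic data, bin count, and bandwidth below are arbitrary choices:

      ```python
      # 2D histogram of point data, smoothed with a Gaussian kernel to
      # approximate a kernel density estimate for heatmap-style rendering.
      import numpy as np
      from scipy.ndimage import gaussian_filter

      rng = np.random.default_rng(0)
      lon = rng.normal(loc=-122.4, scale=2.0, size=1_000_000)
      lat = rng.normal(loc=37.8, scale=2.0, size=1_000_000)

      counts, xedges, yedges = np.histogram2d(lon, lat, bins=512)
      density = gaussian_filter(counts, sigma=3.0)   # bandwidth in bins

      # `density` can now be colour-mapped and drawn as a heatmap texture.
      ```
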
  • chenster 2465 days ago
    Can I use it on a MacBook Pro? My concern is that it doesn't have a dedicated GPU.
    • shusson 2464 days ago
      You can use MapD on your MacBook Pro in CPU mode, but it will not be able to do any Vega rendering because that requires a GPU.