Making Direct3D games faster in Wine using modern OpenGL

(comminos.com)

216 points | by BlackLotus89 2252 days ago

9 comments

  • platz 2252 days ago
    I'm fascinated by how someone with this domain level knowledge finds a problem like this and solves it.

    Is there some correlation between WoW players and wine/graphics programmers?

    Is this a game dev who happens to play WoW?

    Externally, the phenomenon of a gamer (i.e. user) having domain level knowledge of complex technical understanding is intriguing.

    (I'm partially motivated by the fact that, as a generalist programmer, I don't think I'd get anywhere near the level of understanding needed to produce something like this.)

    • zanny 2251 days ago
      This guy had a problem, slow fps in a game, knew the generalized view of how Wine worked, and used the tools they knew of to try to fix the problem.

      I've never written a line of DX, GL, etc., but I know what command buffers, driver synchronization, and AZDO are, as mentioned in the article.

      I also play WoW on Linux, and am kind of embarrassed I didn't think to try perf monitoring the game for easy-to-fix huge slowdowns like this. I kind of assumed that, since WoW is one of the most popular Wine games and generally pushes the DX API support to make sure it always works, the main Wine devs would have optimized it more.

      That being said, buffer_storage is a GL 4.4 extension and Wine has this awful habit of trying to strictly support OSX, which will never see OpenGL beyond 4.1, and I'm not sure if buffer_storage is available there. That alone might mean these patches are never merged mainline, which would be... inconvenient.

    • bronxbomber92 2252 days ago
      Looks like he's just a talented under-grad student with some OpenGL side project experience.
  • twtw 2252 days ago
    "Fundamentally, it’s a function that maps a slice of GPU memory into the host’s address space, typically for streaming geometry data or texture uploads"

    Can someone clarify this for me? Are OpenGL/D3D buffers that get stuff memcpy'd into them by the CPU actually "slices of GPU memory," or are they more often reserved driver memory that eventually get DMA'd to the GPU? (I realize both probably happen at different times, but I'm curious which is more typical for modern systems)

    It seems like spending CPU cycles writing every byte over the bus would perform much worse than a fast write to sysmem followed by a DMA transfer.

    EDIT: I looked into it, and it seems like the typical implementation is that map returns a pointer to some pinned driver sysmem and unmap kicks off an async DMA to GPU memory.

    • elFarto 2252 days ago
      > Can someone clarify this for me? Are OpenGL/D3D buffers that get stuff memcpy'd into them by the CPU actually "slices of GPU memory," or are they more often reserved driver memory that eventually get DMA'd to the GPU?

      The answer is, as with all things OpenGL, it depends. You might get back a pointer to GPU memory that you can directly write to, or you'll get back some chunk of system memory the driver has.

      The ARB_buffer_storage extension improves matters as you can almost guarantee that you'll get GPU memory, and you can keep it mapped for the entire lifetime of your application (the old buffer APIs wouldn't let you keep it mapped during a draw call). The downside is that you're now responsible for synchronising access to that data.

      But as for "is it quicker?", maybe. DMA transfers aren't free; they take time to set up. Usually they need to operate from a limited pool of source memory. If the driver has to take a local copy of your data to copy it (which it will do for every glBufferData/SubData call), then you might as well copy it yourself, since GPUs aren't hurting for PCIe bandwidth these days. In addition, you can use a separate thread/CPU core to do the copy, since unlike every other OpenGL call, mapping memory and memcpy'ing doesn't require an OpenGL context.
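
      To make the "you're now responsible for synchronising access" part concrete, here is a minimal sketch in Python rather than real GL, so it runs without a context. It assumes a ring of per-frame regions inside one buffer that stays mapped. The class `PersistentRing` and its methods are made up for illustration: in real code the bytearray would be the pointer returned by glMapBufferRange on a buffer created with glBufferStorage and GL_MAP_PERSISTENT_BIT, and the frame counters would be fences from glFenceSync checked with glClientWaitSync.

```python
class PersistentRing:
    """Toy model of a persistently mapped buffer split into per-frame regions.

    A bytearray stands in for the mapped GPU memory, and integer frame
    numbers stand in for GL fence objects. No OpenGL calls are made.
    """

    def __init__(self, region_size, regions=3):
        self.region_size = region_size
        self.regions = regions
        self.memory = bytearray(region_size * regions)  # mapped once, kept mapped
        self.fence = [None] * regions  # frame that last used each region, if any
        self.gpu_done = -1             # newest frame the "GPU" has finished
        self.frame = 0

    def begin_frame(self):
        """Return (offset, must_wait) for the region this frame will write."""
        idx = self.frame % self.regions
        pending = self.fence[idx]
        # The application, not the driver, has to check that the GPU is done
        # reading this region before the CPU overwrites it.
        must_wait = pending is not None and pending > self.gpu_done
        return idx * self.region_size, must_wait

    def write(self, offset, data):
        """Copy data into the mapped memory; a plain memcpy, no GL call."""
        if len(data) > self.region_size:
            raise ValueError("data does not fit in one region")
        self.memory[offset:offset + len(data)] = data

    def end_frame(self):
        """Record a 'fence' for the region just written and advance."""
        self.fence[self.frame % self.regions] = self.frame
        self.frame += 1

    def gpu_finish(self, frame):
        """Simulate the GPU signalling the fence for `frame`."""
        self.gpu_done = max(self.gpu_done, frame)
```

      With enough regions the CPU rarely catches up with the GPU, so `must_wait` is almost always false and no driver call is needed per upload.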

  • lostmsu 2252 days ago
    The site looks beautiful and loads insanely fast. Can we easily reuse the whole theme?
    • stickydink 2252 days ago
      It is delightfully simple, nice and clean.

      A pleasing font family, no JavaScript, some basic CSS, stick to basic HTML tags and use them properly.

      https://comminos.com/css/default.css

      • Tech-Noir 2252 days ago
        > no JavaScript

        Not to knock an otherwise nice site (CTRL & + improves readability for me personally, though), but:

            (function(d, s, id) {
              var js, fjs = d.getElementsByTagName(s)[0];
              if (d.getElementById(id)) return;
              js = d.createElement(s); js.id = id;
              js.src = 'https://connect.facebook.net/en_US/sdk.js#blahblahblah';
              fjs.parentNode.insertBefore(js, fjs);
            }(document, 'script', 'facebook-jssdk'));
        
        AFAIK, even for sites that want to feed the monster, that's unnecessary:

        http://chrisltd.com/blog/2015/04/social-share-like-buttons-w...

        https://sharingbuttons.io/

        • kbenson 2252 days ago
          So I decided to test the chrisltd ones:

          Twitter seems to work (brings up form).

          Facebook redirects to an error.

          LinkedIn seems to work (brings up form when logged in).

          Pinterest seems to work (brought up create board dialog when logged in)

          Google+ seems to work (brings up share form).

          I do have to say that, for me, these would actually work where the JS likely wouldn't in some cases, since I now make heavy use of Firefox's containers to sandbox a lot of online identities, and have new windows for certain URLs automatically load in the correct container.

    • TheCycoONE 2252 days ago
      Though it has a poor mobile experience due to the lack of a <meta name="viewport"> tag.

      https://developer.mozilla.org/en-US/docs/Mozilla/Mobile/View...

    • andrepd 2251 days ago
      It's HTML and some simple CSS. It's enough to make a website look good, and it's fast (as webpages should be) on any moderately modern computer. Why don't we all do this again?
  • stefan_ 2252 days ago
    This is a great and concise writeup, thanks.

    But I was left wondering what the actual problem was. Why is glMapBuffer slow? Is it just the impedance mismatch between D3D and GL, which don't have the same synchronization guarantees for that specific call? Why does Wine have its own command handling thread when, very likely, the underlying OpenGL driver has one too?

    • Jasper_ 2252 days ago
      glMapBuffer doesn't have any ability to declare that you won't overwrite data. All you can say is whether you want read/write access to the buffer. So the driver has to assume that the client might overwrite in-flight data, so synchronization is required.

      As for the command stream handling, it makes decent sense to do translation up-front and put your drawing commands into a command stream so a separate thread can just hammer through it as fast as possible, rather than doing GL calls in-line with the translation. Partly so the game can return to doing its thing as fast as possible, and partly to fix issues with GL's threading model being horrible (see e.g. https://bugs.winehq.org/show_bug.cgi?id=24684).

      • kllrnohj 2252 days ago
        > glMapBuffer doesn't have any ability to declare that you won't overwrite data.

        Well, yes and no. glMapBufferRange, which is basically a drop-in replacement for glMapBuffer, does have such a flag: GL_MAP_UNSYNCHRONIZED_BIT.

        glMapBufferRange requires OpenGL 3.0, whereas glMapBuffer goes all the way back to OpenGL 1.5, but this looks more like an oversight in Wine than anything else.
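
        As a rough sketch of what choosing those bits could look like in a translation layer, the function below picks glMapBufferRange access flags from D3D9-style lock flags. The constant values are the real ones from the D3D9 and GL headers, but `gl_map_access` is a hypothetical helper and not Wine's actual code, and the use of D3D9 lock flags (rather than another D3D version) is an assumption. Python is used only to keep it runnable without a GL context.

```python
# Constant values as defined in the D3D9 and OpenGL headers.
D3DLOCK_READONLY = 0x0010
D3DLOCK_NOOVERWRITE = 0x1000
D3DLOCK_DISCARD = 0x2000

GL_MAP_READ_BIT = 0x0001
GL_MAP_WRITE_BIT = 0x0002
GL_MAP_INVALIDATE_BUFFER_BIT = 0x0008
GL_MAP_UNSYNCHRONIZED_BIT = 0x0020


def gl_map_access(d3d_lock_flags):
    """Hypothetical mapping from D3D9 lock flags to glMapBufferRange bits."""
    if d3d_lock_flags & D3DLOCK_READONLY:
        return GL_MAP_READ_BIT

    access = GL_MAP_WRITE_BIT
    if d3d_lock_flags & D3DLOCK_DISCARD:
        # Old contents are not needed, so the driver may hand back fresh storage.
        access |= GL_MAP_INVALIDATE_BUFFER_BIT
    if d3d_lock_flags & D3DLOCK_NOOVERWRITE:
        # The caller promises not to touch in-flight data, so skip the sync.
        access |= GL_MAP_UNSYNCHRONIZED_BIT
    if access == GL_MAP_WRITE_BIT:
        # Plain lock: readable and writable, fully synchronized.
        # GL forbids combining READ with the invalidate/unsynchronized bits.
        access |= GL_MAP_READ_BIT
    return access
```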

    • twtw 2252 days ago
      This is an excellent question. The current wine implementation using OpenGL should mirror the behaviour of D3D, since it is using glMapBufferRange and passing GL_MAP_UNSYNCHRONIZED_BIT when possible. I am also wondering what is actually going on that is hurting performance.

      EDIT:

      After further research and more thought, I suspect that the "pipeline stall" doesn't involve waiting for the GPU to complete work using the buffer, just waiting for the driver. The map/unmap with overwrite or discard is working as intended, but the persistent buffer heap he implemented outperforms it because it reduces the number of calls into the driver required.

      I initially had the impression that the existing wine implementation was somehow deficient, but really what the author did was find a way to use the new persistent buffers feature to optimize D3D code using the older per-frame map/unmap method.

      This is in fact what the post essentially said (after a re-read), I just misunderstood and thought the pipeline stall was actually waiting for the GPU. The note in the post about the "GPU" line really being the driver is important.

      • acomminos 2252 days ago
        There are two main parts to the stall, which aren't well illustrated by the diagram (I'll get on updating it):

        1. Waiting for the resource to exit the command stream (wined3d_resource_wait_idle).

        2. Waiting for the CS thread to finish after the map (occurs in wined3d_cs_map).

        It's a pipeline stall because the D3D thread has to wait for the CS thread to do things, and thus is unable to dispatch more commands to the CS (and thus the GPU) during this time. I don't consider the actual glMapBufferRange to be part of the stall.
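
        A toy timeline model can show why this hurts. It is only a sketch under simplifying assumptions: costs are in arbitrary units, draws are queued to the CS thread without waiting, and a synchronous map blocks the D3D thread until everything queued so far plus the map itself has finished. The function name and the op format are invented for illustration and do not correspond to anything in wined3d.

```python
def d3d_thread_blocked_time(ops, synchronous_map=True):
    """Toy timeline: how long the D3D thread sits blocked on the CS thread.

    ops is a list of (kind, cs_cost) pairs, kind being "draw" or "map".
    Draws are queued without waiting. A synchronous map waits for everything
    queued so far, plus the map itself, to finish on the CS thread.
    """
    d3d_time = 0  # current time on the D3D (application) thread
    cs_free = 0   # time at which the CS thread finishes its queued work
    blocked = 0
    for kind, cs_cost in ops:
        if kind == "draw":
            cs_free = max(cs_free, d3d_time) + cs_cost
        elif kind == "map":
            if synchronous_map:
                cs_free = max(cs_free, d3d_time) + cs_cost
                blocked += cs_free - d3d_time
                d3d_time = cs_free
            # Otherwise the write goes into already-mapped memory: no wait.
        else:
            raise ValueError(f"unknown op kind: {kind!r}")
    return blocked
```

        In this model every synchronous map pays for all the draws queued before it, which is the time the D3D thread cannot spend dispatching more commands.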

        Edit: saw your edit :)

        • twtw 2252 days ago
          Thanks for clarifying, that makes sense.

          I'm glad you saw the edit. I initially wrote it based on my first interpretation of what you wrote, but after re-reading I realized that your description was totally accurate, I just misunderstood.

          Great work, and thanks for writing it up.

  • garaetjjte 2252 days ago
    Another option is using Gallium Nine, which uses the D3D state tracker in the driver directly, skipping the GL layer: https://wiki.ixit.cz/d3d9 (though on NVIDIA, Nouveau will probably be slower than the proprietary GL driver).
    • stuaxo 2251 days ago
      Weird, since Nouveau itself is based on Gallium.
      • garaetjjte 2251 days ago
        I mean that D3D -> Nouveau could be still slower than D3D -> GL -> NVIDIA.
  • flafla2 2252 days ago
    Excellent writeup. I would be curious as to the specific considerations given to a heap allocator on the GPU. Related: I'm not too familiar with Wine patches - what is the easiest way to view the final source code of this patch?
  • Sytten 2252 days ago
    Very nice article and project! Have you talked with Wine people to see if you could eventually merge your patch in the official codebase? Seems like it would help a lot of us Linux gamers :)
    • l1n 2252 days ago
      From the article: "I hope to mainline this once the patchset becomes more mature."
  • hawski 2252 days ago
    Wined3d can probably use Vulkan instead of OpenGL in the future. Did anyone start working on such a port?
  • asdfv09s9d80fu9 2252 days ago
    Really cool! Would love if the blog had RSS though!