Create mixed reality models in PowerShell

(cosmosdarwin.com)

173 points | by cosmosdarwin 2294 days ago

7 comments

  • oelmekki 2293 days ago
    Cool tech. If anything, it shows that Windows MR tools are mature enough to be used in most languages... on Windows.

    I know one can work on projects that target HoloLens using only Unity3D, but that doesn't help much since the SDKs needed to build them are only available on Windows.

    Given the "recent" interest in Linux at Microsoft and their mea culpa regarding Internet Explorer, I would have hoped they'd make their HoloLens SDK cross-platform. Platform lock-in worked well for iOS, but then again, I wonder if Android would have had so much success if it hadn't had the only cross-platform mobile SDK (I may be mistaken here: did BlackBerry have a cross-platform SDK?).

    Anyway, it's cool to see them going forward. Best of luck.

  • throwaway7645 2293 days ago
    This is pretty impressive. PowerShell has been pretty slow for a lot of my uses lately. The author said this took ~30 secs. How long would a similar script take in Python? They're both slow on the language performance spectrum, but I bet Python would be significantly faster, if anyone less lazy than me actually writes the code.
    • da_chicken 2293 days ago
      PowerShell is slow, but writing to files the way he does it is just a bad pattern. He uses Add-Content not just in loops, but in nested loops. The problem with that is that each time Add-Content is called, the system opens the file, adds the content, flushes to disk, and then closes the file. That's a lot of overhead.
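
      A minimal sketch of the difference (the vertex loop here is made up, not his actual script):

        # Slow: every Add-Content call opens, appends to, flushes, and closes the file
        foreach ($i in 1..1000) {
            Add-Content -Path .\model.obj -Value "v $i 0 0"
        }

        # Much faster: build the lines in memory and write the file once
        $lines = foreach ($i in 1..1000) { "v $i 0 0" }
        Set-Content -Path .\model.obj -Value $lines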

      In his wine glass script he also uses the common PowerShell pattern of creating an array ($x = @()) and then appending to it (foreach ($i in $set) { $x += $value * $i }). The problem here is that PowerShell arrays are fixed in size. To append a value to an array, the system creates a new array, copies all the values over along with the new one, and then disposes of the old array. That works fine up to about 100 items, but it gets noticeably slow after that. Since it's done in a loop, the whole thing is somewhere between O(n log n) and O(n^2). It's better to just output all the values as a single array ($x = $set | ForEach { $value * $_ }) or to create an ArrayList or List<String> or some other C# collection that supports an O(1) append.
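
      Roughly, the three shapes (with stand-ins for his $set and $value):

        # Stand-ins for the $set and $value in the snippet above
        $set = 1..360
        $value = 0.5

        # Slow: += reallocates the array and copies every element on each append
        $x = @()
        foreach ($i in $set) { $x += $value * $i }

        # Better: let PowerShell collect the pipeline output into a single array
        $x = $set | ForEach-Object { $value * $_ }

        # Or use a collection type with cheap appends
        $x = [System.Collections.Generic.List[double]]::new()
        foreach ($i in $set) { $x.Add($value * $i) }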

      He also assigns a lot of variables in his loops and then uses them only once to format a string. He could eliminate those variables and just embed the expressions in the strings.

      I got a ~25% performance improvement (210 ms to under 140 ms) on just the script embedded in the article when I switched to a StringBuilder instead of Add-Content, and that script doesn't have the poor array pattern. A StreamWriter would work too, with less memory pressure than a StringBuilder. I suspect that with better code you could easily get the wine glass script down below 3 seconds.
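
      For reference, the StringBuilder swap looks roughly like this (the loop is made up, not the article's script):

        $sb = New-Object System.Text.StringBuilder
        foreach ($i in 1..1000) {
            # AppendLine returns the StringBuilder, so cast to void to keep it off the pipeline
            [void]$sb.AppendLine("v $i 0 0")
        }
        Set-Content -Path .\model.obj -Value $sb.ToString()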

      • lburton 2293 days ago
        I got a lot more than a 25% improvement... you seem to have a much quicker machine, or perhaps you were running one of the earlier scripts? For the wine glass script it went from >2 minutes to 1.5 seconds once the output was buffered. https://github.com/cosmosdarwin/obj-in-powershell/pull/1

        Also, for those who haven't come across it, https://github.com/dlwyatt/PowershellProfiler is pretty useful, even though Add-Content is the obvious offender here.

        • da_chicken 2293 days ago
          I have a 10-year-old PC. The 210 ms to 140 ms time was with the single cylinder script, which is much shorter, and the < 3 second prediction was for the wine glass script. I just guessed by looking at it that it wouldn't be hard to improve by an order of magnitude. Sorry if I wasn't clear!
        • juststeve 2292 days ago
          get-content is very slow as well
      • throwaway7645 2293 days ago
        Thanks for the analysis. This is one of my main issues with PS: all the obvious ways turn out to be wrong. I tried 3-4 different file write methods before, and all were unbearably slow.
        • ygra 2293 days ago
          I wouldn't say any obvious way is wrong. Dropping down to the .NET mechanisms should be more of a last resort if performance is really abysmal for some reason, but apart from that, common sense applies. As they said, opening, writing to, and closing a file repeatedly in a loop is a stupid idea. I wouldn't say that's an obvious way to do file I/O in PowerShell.

          On that note, what are the obvious ways you've tried?

          • throwaway7645 2292 days ago
            Agreed that having to drop to .NET is a fail. Usually, when trying to learn how to do file I/O in a new language, you google it. The first few examples I came across were slow, and when I looked into it, they were doing what this example does. It's been a while, but it wasn't obvious how to do it with a stream without resorting to a C#-esque version, at which point you might as well use C#. I'm not sure why there isn't a PS one-liner option like Python's with statement. I really like the idea of PS, but some simple things are more trouble than they're worth.
            • ygra 2292 days ago
              Well, the most natural way would be to use the pipeline. Instead of writing to the file in the innermost ForEach-Object, just pipe the whole pipeline into Out-File. I'd say it's the most straightforward and natural way, considering that PowerShell is a shell.
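
              Something like this, with a made-up loop: one pass through the pipeline, one file open and close at the end.

                1..1000 | ForEach-Object { "v $_ 0 0" } | Out-File -FilePath .\model.obj -Encoding ascii
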
            • juststeve 2292 days ago
              > Agreed that having to drop to .NET is a fail.

              I run into performance walls with PowerShell so many times that calling the .NET libraries directly is basically required.

      • juststeve 2292 days ago
        Piping to ForEach-Object is super slow as well; it's much faster to use foreach ($x in $y) {}.
        • da_chicken 2292 days ago
          Here, yes, I agree that the foreach statement would be better, possibly even significantly, but in the general case the difference is usually not worth bothering with.

          While ForEach-Object is like an order of magnitude slower, we're talking something like 100 ms vs 10 ms to iterate through 10,000 items. (https://blogs.technet.microsoft.com/heyscriptingguy/2014/07/...) In other words, you need a good number of objects before foreach is meaningfully faster, and when you have that many, you often find that either a) you lose the time again to powershell.exe allocating more system memory to store the collection, or b) the loop body is already two or three orders of magnitude slower, so it's ultimately a trivial optimization.

          Also, the foreach statement waits for the entire set ($y in your example) to be allocated in memory, while ForEach-Object begins processing as soon as the first object comes through the pipeline. If $y comes from a command that returns its output with some lag, you might find that ForEach-Object actually turns out to be faster, because the first objects start being processed immediately. If you're returning data over a network or dealing with particularly large objects, ForEach-Object can be better.

          Finally, the foreach statement itself doesn't work with pipelines either as input or output, so it's not appropriate for a lot of scenarios.
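
          A small illustration of the streaming difference (the commands here are just placeholders):

            # foreach: Get-ChildItem must finish and the whole array must exist before the first iteration runs
            foreach ($f in (Get-ChildItem -Recurse)) { $f.Name }

            # ForEach-Object: each item is processed as it arrives, and results keep flowing down the pipeline
            Get-ChildItem -Recurse | ForEach-Object { $_.Name } | Select-Object -First 10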

          • juststeve 2292 days ago
            Is this based on experience? I would disagree.

            The pipeline is powerful, but it's slow. I've seen a meaningful performance gain with foreach(), and even more so when loops need to be nested. I believe the speed increase is because the runtime/CLR/JIT can infer the types contained within the collection before it actually starts executing the statement. Yes, using generic collections with foreach helps as well, because (I think) PowerShell can completely avoid re-determining the type of every object at execution time (as the collection is generic underneath).

            The downside is readability and memory usage, as you pointed out, but I could argue that piping to ForEach-Object can also be painful to read, and many times the loop could be refactored into a proper function and given a meaningful name.

            • da_chicken 2291 days ago
              Yes, based on my experience it really isn't worth bothering with.

              Compare:

                Measure-Command {
                    # ~4,000 files
                    $Files = Get-ChildItem -Path "C:\Windows\System32\DriverStore" -File -Recurse 
                
                    foreach ($file in $Files) {
                        $file.LastWriteTime.Date;
                    }
                } | Select-Object -Property TotalMilliseconds
                
                Measure-Command {
                    Get-ChildItem -Path "C:\Windows\System32\DriverStore" -File -Recurse | ForEach-Object {
                        $_.LastWriteTime.Date;
                    }
                } | Select-Object -Property TotalMilliseconds
              
              On my system and run hot (i.e., after running both multiple times), the foreach statement takes 790-855 ms. The ForEach-Object takes 860-890 ms. That's a 10% cost at most, and this particular operation is trivial. Is that "meaningful"? In some senses, yes, because 10% is a lot, but realistically, no because the scripts both run in less than a second. I'm not writing an application here.

              However, let's take something decidedly non-trivial:

                Measure-Command {
                    # ~4,000 files
                    $Files = Get-ChildItem -Path "C:\Windows\System32\DriverStore" -File -Recurse 
                    $Algorithm = @('MD5','SHA1','SHA256')
              
                    foreach ($file in $Files) {
                        Get-FileHash -Path $file.FullName -Algorithm ($Algorithm[(Get-Random -Minimum 0 -Maximum 3)]);
                    }
                } | Select-Object -Property TotalMilliseconds
                
                Measure-Command {
                    $Algorithm = @('MD5','SHA1','SHA256')
                    Get-ChildItem -Path "C:\Windows\System32\DriverStore" -File -Recurse | ForEach-Object {
              
                        Get-FileHash -Path $_.FullName -Algorithm ($Algorithm[(Get-Random -Minimum 0 -Maximum 3)]);
                    }
                } | Select-Object -Property TotalMilliseconds
              
              
              Now, run hot, foreach takes an average of 20.3 seconds over three runs. ForEach-Object takes an average of 20.8 seconds over three runs. Run cold (i.e., after a reboot), they both take about 63-65 seconds. This is on a slightly aging laptop with a spinning metal disk.

              Most of my workloads involve calling commands that take significant time, like Get-FileHash does. I'm doing things like splitting PDFs based on content and inserting them into an SQL database, or fetching 8,000 records from Active Directory and verifying file share permissions. I have found that for nested loops I tend to use a ForEach-Object in the pipeline and then use foreach statements for operations on arrayed properties of the object, and that works well; but for the outer loop, no, it's not usually worth the time to refactor or eliminate the pipeline.

              • juststeve 2291 days ago
                ok, you've partially convinced me :)

                I mean, if the code is 90% I/O bound, I agree: micro-optimisations can be a waste of time. But tight, nested loops that are only memory bound are worth the effort, I think. So for me, if I need to do a lot of field comparisons for 300k accounts or groups from AD, I will download the AD data into memory and use foreach(), hashtables, and generic collections wherever possible. I actually think we're making the same point anyway. I now reach for C# before PowerShell for situations like the above where performance matters, but PowerShell does have advantages for smashing out some quick work. I honestly cringe, though, when I hear people talk about PowerShell scripts that take minutes to execute for tasks that should be reasonably quick.
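
                Roughly what I mean (the group and property names are made up, and it assumes the ActiveDirectory module):

                  # Pull the AD data into memory once
                  $users = Get-ADUser -Filter * -Properties Department

                  # Index the users by key once; lookups inside the loop are then O(1)
                  $byName = @{}
                  foreach ($u in $users) { $byName[$u.SamAccountName] = $u }

                  foreach ($m in Get-ADGroupMember -Identity 'Some-Group') {
                      if ($byName.ContainsKey($m.SamAccountName)) {
                          $byName[$m.SamAccountName].Department   # compare fields here
                      }
                  }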

                • da_chicken 2291 days ago
                  No, I'm right there with you. Almost all my tasks are I/O bound by local disks, network disks, or some form of network data store. The more I get into PowerShell and use it for complex tasks, the more I find I'm really just writing C# code.

                  And, yes, generally the easiest path to faster performance is just loading everything into memory and working with it there. In my PDF script I need to validate that the ID numbers being read are valid, so I pull all 20,000 of them and put them into a HashSet<String>. That's extremely fast to validate against compared to an array, and it performs a bit better than a Hashtable. There are times when I need to use an SqlDataReader or StreamReader and read row by row or line by line (4 GB+ text files suck), but that's only when memory has become a problem.
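
                  The shape of it is roughly this (the ID values are stand-ins for the real list):

                    # Build the set once from the known-good IDs
                    $knownIds = '1001', '1002', '1003'
                    $validIds = [System.Collections.Generic.HashSet[string]]::new()
                    foreach ($id in $knownIds) { [void]$validIds.Add($id) }

                    # Membership checks while reading the PDFs are then O(1)
                    foreach ($candidate in '1002', '9999') {
                        if (-not $validIds.Contains($candidate)) {
                            Write-Warning "Unknown ID: $candidate"
                        }
                    }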

                  I've been meaning to really dig into the High Performance PowerShell with LINQ article (https://www.red-gate.com/simple-talk/dotnet/net-framework/hi...), but I just don't have a need for it at present unless I refactor an existing script, and those all work great.

                  • juststeve 2290 days ago
                    Yep, it really depends on what you're trying to do, hey. I mean, fancy PowerShell can't outperform good SQL either, if the data is already sitting in a database... Thanks for the link though; those LINQ queries are super fast. I would like to incorporate this into some code at work, but don't quite have the need for it right at this moment.
    • mpw222 2293 days ago
      PowerShell is terribly slow compared to basically anything, but the slow bit here is probably the I/O. Each Add-Content call opens, appends to, and flushes the file, in the inner loop. The fact that ForEach-Object is vastly slower than the foreach keyword, and that Add-Content itself is much slower than System.IO, doesn't help, but I suspect this is mostly I/O plus the fact that this type of file access pattern tends to drive AV software crazy. Using a StreamWriter and getting the associated buffering would probably be a lot faster.
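
      Roughly, for a hypothetical model.obj (the loop is just a placeholder):

        # One StreamWriter, opened once; writes are buffered and flushed when it is disposed
        $writer = [System.IO.StreamWriter]::new((Join-Path $PWD 'model.obj'))
        try {
            foreach ($i in 1..1000) { $writer.WriteLine("v $i 0 0") }
        }
        finally {
            $writer.Dispose()
        }
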
    • marviel 2293 days ago
      Seems as good a place as any to mention the superb Trimesh library in Python [1], which makes working with and creating .stl and .obj files very easy.

      [1] https://github.com/mikedh/trimesh

    • na85 2293 days ago
      On my old i7-powered ThinkPad, generating flow data for a 2D airfoil modeled with vortex panels was nearly instantaneous in Python. I can't imagine that generating 40 vertices would be taxing for it.
  • dingo_bat 2293 days ago
    I never knew mixed reality viewer was so awesome!
  • wodenokoto 2293 days ago
    Who's Jeffrey and why should he be proud?
    • ygra 2293 days ago
      Jeffrey Snover, creator of PowerShell, I guess.
  • k_sze 2293 days ago
    The real lesson here is to make sure you have a real programming language development environment on any laptop you bring with you.

    I’m only kidding, of course.

    This post is actually quite cool.

    • Udik 2292 days ago
      > make sure you have a real programming language development environment on any laptop you bring with you

        Or just write JavaScript and execute it in your browser.

  • Udik 2292 days ago
    > What apps do I have? Aha! PowerShell.

    Yep. And a browser that will execute any javascript at breakneck speed.

  • oblio 2293 days ago
    This should be unflagged, IMO.
    • darklajid 2293 days ago
      I agree.

      There's a vouch link for bad comments. Is there something similar for stories? Can a number of people "unflag" this?

      • oblio 2293 days ago
        If there is, I don't see it :(
        • mschuster91 2293 days ago
          Me neither, despite >6k points... @dang, is there a way "vouch" can be implemented for ordinary users?
          • 2510c39011c5 2293 days ago
            well, how to define “ordinary user” is a huge problem for the system administrator — access control is at the root of all security issues...
            • mschuster91 2293 days ago
              In HN speak, another karma threshold (downvoting ~500 points, flagging ~1k points if I'm not mistaken). ;)
              • 2510c39011c5 2293 days ago
                it is all about different labels... and different labels have different visibility, in terms of the effort to get to that information... so I guess by narrowing unflagging privilege to a very small group of users, the mods hope to make it significantly more difficult for people to promote a certain post here through purposeful manipulation... thus perhaps the “unflagging” capability, as one of the last lines of defense, is only given to a very small group of thoroughly verified users, through manual assignment or some threshold based on percentage rather than an absolute karma value...

                that’s my guess...

    • megaman22 2293 days ago
      Why in the world would this be flagged? This is the kind of interesting technical hacks that I, at least, come here for.
    • JepZ 2293 days ago
      Is there a way to see why it was flagged?