Whip Your Scripts into Shape: Optimizing PowerShell for Speed


Formal Metadata

Title
Whip Your Scripts into Shape: Optimizing PowerShell for Speed
Title of Series
Number of Parts
60
Author
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers
Publisher
Release Date
Language
Producer
Production Year
2018

Content Metadata

Subject Area
Genre
Abstract
With a handful of techniques tucked away in your tool belt your PowerShell code will be running faster than ever. Buckle up for a demo heavy session that can’t drop below 50 mph. When every second counts, you can’t sit around waiting for your script to finish.
Transcript: English(auto-generated)
Welcome to Whip Your Scripts Into Shape. I'm Josh King. You can find me at king.geek.nz and I'm on Twitter, at windosnz.
If you're anything like me and you just go by Twitter avatars, that's me. Or rather, it's not me, but that's who you'll find if you Google Joshua King. Assuming you don't find the sports person first. So, I'll address my accent up front. I'm from New Zealand.
Roughly there. So, I come from a town called Hastings. I work for the local city council there. And to answer the follow-up questions that I generally get after that is, it took three flights to get here, was about 16 hours in the air.
Back home, it's 7 a.m. tomorrow. Yeah, let's go on. So, the topic of today is speed and PowerShell. Those are two things you generally don't hear in the same sentence, unless it's followed with some sort of complaint or curse or, you know, generally negative sentiments.
Frankly, my first experience with PowerShell was really slow startup times. They have been getting better, but if that's your first experience with PowerShell, it's gonna taint your perception of it going forward.
And unfortunately, it is a really, really common complaint. So, how are we going to look at speed today? We're gonna be doing some A/B testing. And for the purists, I am bastardizing that term a little bit. But in general, we'll have a control
and a variant, and we'll run them and see how they compare. So, I guess in general, if you're looking for improvements, you've gotta measure. There's no point trying to make changes for anything, whether it's speed or disk space,
without having something there to prove that what you're doing is right. Make one change at a time. And I guess this applies to all of IT, because if you've got a network issue, you don't go and change your network card, your patch leads, your fly leads, and your switch all at the same time. Because one of those things probably fixed it.
But you're never gonna know which. And test more than once. A lot of this is gonna come down to random variables. You might have had something recently crash and free up a lot of memory. Or your system might be doing something really intensive at the time, and you just ran into bad timing
that affected your results. The reality is that by taking measurements, you are going to be impacting performance a bit. So you need to try and minimize that impact. So my tests are gonna be run through a runbucket module and the next couple of slides
are gonna call out a couple of different modules. So I apologize for it sounding a bit like an advertisement. But like I said, we're gonna have a control, which is a script block, and a variation, which is a script block, and it's gonna do its thing. So, a little bit more about what I'm using here. As I said, I'm gonna be calling out some modules. It's using PoshRSJob behind the scenes
and it's running each of those controls and variants 100 times each, unless the example up there shows something else, which is generally done because those test cases took way too long for us to sit here for 100 iterations. Just in case of any massive outliers,
I'm ignoring the slowest and the fastest run of each test in both the control and the variant. And I'm using the Stopwatch class rather than Measure-Command. And this really only comes up in the first example. Spoiler alert. But Measure-Command tends to ignore console output.
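As a rough sketch of the timing approach described here (the path is a placeholder, not the speaker's actual test target), the Stopwatch class times a command while still letting its console output render and count towards the result:

```powershell
# Time a command with the Stopwatch class. Unlike Measure-Command, the
# command's output still reaches the console and is included in the timing.
$sw = [System.Diagnostics.Stopwatch]::StartNew()
Get-ChildItem -Path $env:TEMP -Recurse
$sw.Stop()
"Elapsed: $($sw.Elapsed.TotalMilliseconds) ms"
```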
So if that's part of what you're testing, it might not be the best fit for measuring things. And again, another module. I'm using PowerShell Universal Dashboard to display the results. I did this presentation at work
and this slide is here because of some feedback I got from that. So you're gonna be seeing some notifications pop up during this. And they look a little something like this. It isn't Slack, and I haven't left Twitter open; they're part of the presentation.
They basically tell me that all my test cases are finished, so that means the results are ready to have a look at. And another module: they're created via BurntToast. So I've got 11 demos today that are covering 10 concepts.
They will be quick, so we won't be here for hours. And these demos may have unexpected results, and that's sort of the point. Just to throw a spanner in the works there, one of these tests is meant to fail, and if it doesn't, then you're gonna be wondering which one of those statements was a lie.
I'm using numerous VS code and PowerShell tools and if you're interested in how any of those mechanics and how that's all wired up, have a chat at lunch. The only other thing to keep in mind is you won't see me actually running any code. That's all happening behind the scenes
as I pull up each demo and that's mainly in interest of me not forgetting to start them and us having to sit here in awkward silence for a while. Just to go through what the dashboard's gonna look like. Controls on the left, variation in the middle, results on the right.
The speeds in the middle there are all milliseconds accurate down to three decimal places and in general, everywhere, green is good, red is bad. What I'm using to determine the winner is the average because like I said, we've got all sorts of random occurrences that could get in the way.
So the maximum might be high, but the average overall is still low. And a minus there in the percentage means it was, say, 80% quicker. So let's pull up our first demo
where I mentioned console output. So let's talk about chatter. Have you ever noticed that running Get-ChildItem, when it displays everything to the screen, is a lot slower than if you're putting that into a variable?
And that's because console output slows things down a little bit. I don't know if that's intentional or not, but I tend to use it for looking for patterns, and if I see something not in the pattern that I expect, I know to stop it and dive in and find out what's wrong. So the difference here between our control and our variation is that in the control
we're just dumping it out to the console, and in the variation, we're assigning it to the items variable. This is probably the only demo where the outputs of the two things are different. Everywhere else I've tried to end up with the same output.
And as you can see, we've shaved off, let's say, 13 milliseconds on average by not having the console output. And also, the maximum on this is consistently a lot lower. So even if your averages are pretty close there,
the maximum runtime you're gonna be waiting for is pretty low. And of course this scales: this was testing my drivers folder. If that was a whole file server, that's gonna be a much different story.
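The control and variation described here might look something like this (the path is a placeholder standing in for the speaker's drivers folder):

```powershell
# Control: every item renders to the console, which slows things down.
Get-ChildItem -Path 'C:\Windows\System32\drivers' -Recurse

# Variation: the same call, but output is captured in a variable,
# so nothing has to be rendered on screen.
$items = Get-ChildItem -Path 'C:\Windows\System32\drivers' -Recurse
```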
So the first thing we all learn when we're learning PowerShell is filter left, format right. And I think we've all just taken that as a universal truth. But let's go ahead and measure it and see what happens.
So in our control, we're getting a bunch of events out of the event log. Specifically, we're looking at the Application log. We've chosen a specific ID, and we just want everything that was created in the last hour. So in that example we are piping Get-WinEvent to Where-Object.
But Get-WinEvent lets you filter directly on the cmdlet. This, assuming nothing's failed, is probably going to be the one that has the most significant improvement today, I hope.
So just by changing that from one pipeline going to Where-Object, we've dropped from two and a half seconds to 15 milliseconds. And conceptually that makes sense, because what our control's doing is grabbing everything in the Application log,
chucking it through the pipeline, and then figuring out which bits it needs. Whereas when we're using a filter hashtable, it's not dealing with that extra data that we don't need.
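A sketch of the two approaches (the event ID is a placeholder; the talk doesn't say which ID was used):

```powershell
$cutoff = (Get-Date).AddHours(-1)

# Control: pull the entire Application log, then filter in the pipeline.
Get-WinEvent -LogName Application |
    Where-Object { $_.Id -eq 1001 -and $_.TimeCreated -gt $cutoff }

# Variation: let Get-WinEvent do the filtering itself, so the extra
# events never enter the pipeline at all.
Get-WinEvent -FilterHashtable @{
    LogName   = 'Application'
    Id        = 1001
    StartTime = $cutoff
}
```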
Stop poking me. So we get into this a little bit later on, but cmdlets and functions in and of themselves have a little bit of overhead. So if you're getting multiple sets of data,
the example that I couldn't pull off here was Get-ADUser: if you're pulling users from multiple OUs or in multiple situations, rather than making the same call twice, it's probably better to make one call and then split them out later.
So like I said, I would have preferred to do this with AD, but I couldn't get it working on my laptop, so we're back to win events. We've learned filter hashtables are really good. We're getting a bunch of stuff from our Application event log and a bunch of stuff from our System event log.
Rather than doing that, we can do one call to Get-WinEvent and then split the events up. So we end up with the same variables at the end, app events and sys events. And fingers crossed, we shave off five milliseconds.
I could say this difference is relatively tiny, but when you start applying this to AD, or much larger data sets, or many more calls, say if you were doing four different event logs, it scales quite well.
Well, in this case, I picked ones that I know aren't common between them. So yes, good call.
Yeah, you would need, so yes. So in my AD environment, say our staff are in one OU, our contractors are in another, I could get them at the same time and then split them based on distinguished name.
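A minimal sketch of the single-call-then-split idea with the event logs used in the demo:

```powershell
# One call to Get-WinEvent covering both logs, paying the cmdlet
# overhead once instead of twice.
$events = Get-WinEvent -FilterHashtable @{
    LogName   = 'Application', 'System'
    StartTime = (Get-Date).AddHours(-1)
}

# Split the combined results back out into the same two variables
# the two-call version would have produced.
$appEvents = $events | Where-Object LogName -eq 'Application'
$sysEvents = $events | Where-Object LogName -eq 'System'
```

The same shape works for the AD example: fetch users once, then split them on a property such as the distinguished name.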
So one thing that's fairly popular with PowerShell is one-liner contests and trying to make your lines as short as possible. But using things like aliases and shorthand
does have a performance impact. So here you'll see I'm calling service rather than Get-Service, the question mark alias rather than Where-Object, and I've shortened -First to -F. Rather than doing that, when we expand everything out,
we see it's about, not quite there, but almost twice as fast. Now to be fair, this one is probably mostly impacted by just using service there. So if you didn't know, you can drop Get-
off the start of any Get-noun cmdlet and PowerShell should be able to figure it out and run the appropriate command. But what it's doing is, when you type in service, it's looking for a cmdlet and it doesn't find anything, and I am simplifying this, so I apologise.
It tries to find a cmdlet, doesn't find it; tries to find a function, doesn't find that; drops down to your system path, tries to find an executable, doesn't find that; and then it goes back to the start and prepends Get-. So it's doing a lot of work in order to get to that point.
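The two forms being compared probably looked something like this (the exact filter is my guess at a typical example, not taken from the slides):

```powershell
# Shorthand: 'service' relies on Get- prefix resolution, '?' is an
# alias for Where-Object, and -F is an abbreviated -First.
service | ? { $_.Status -eq 'Running' } | Select-Object -F 5

# Expanded: every command and parameter resolves immediately.
Get-Service |
    Where-Object { $_.Status -eq 'Running' } |
    Select-Object -First 5
```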
Right, the pipeline. And, jeez, that completed quickly. So for this one I'm scrutinising my temp directory, which is filled up with random stuff. And basically I just want to get out
anything that's longer than 5,000 bytes. And that's being done by piping to Where-Object. The alternative, as of, I think, PowerShell version 4, is the .Where() method.
So your syntax is basically the same. You'll see the filter script is identical between them, but it takes about two thirds of the time to run. And again, like all of these examples
where I'm showing the file system stuff, I'm running on an SSD here and on a really small directory. When I do this sort of stuff on a file server, it shaves off a lot of time.
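Side by side, the two filters described here would look like this; note the filter script is identical in both:

```powershell
$files = Get-ChildItem -Path $env:TEMP -File

# Pipeline: the Where-Object cmdlet.
$big = $files | Where-Object { $_.Length -gt 5000 }

# Method: .Where() on the collection itself (PowerShell 4 and later),
# same filter script, without the pipeline overhead.
$big = $files.Where({ $_.Length -gt 5000 })
```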
So loops are a foundational part of any sort of scripting, and you'll often see this construct here. Again, I'm picking on that poor temp directory on my hard drive. We're piping the contents to foreach and this time adding up a total length.
And of course the alternative to doing that is putting Get-ChildItem into a variable and then running foreach over that. I'm sure it's not news for most people in here,
but those two foreachs aren't equal. The one up the top, next to the pipeline, is an alias for ForEach-Object, which is a cmdlet that's abstracting things, doing extra processing, and has all sorts of overhead. Whereas the foreach at the bottom is a statement in the language.
And again, it's two thirds the speed: it takes two thirds the time to run as compared to piping it.
So what I'm about to show probably isn't the best demonstration of this but there's a very real possibility that in a lot of our scripts we're doing things we don't really have to be doing.
So we're doing all sorts of unnecessary operations. The reason why I picked this demo is because I recently found out about the version object or type. So this control example is going to be how I was going to start trying to compare version numbers at one stage.
So I'm comparing my PowerShell version, which comes out as an object that looks something like that comment. And I'm going to compare it against a string. And I know from looking at that that I need to compare the five at the front of that string to the major number, the one to the minor number, and so on. And there's a period between each of those numbers,
so I can split it and then start going down with a series of if statements, just doing comparisons, seeing if it's greater than or equal to, and returning true or false depending on where it trips. Of course, as you might have guessed from the title, that's not necessary.
Just cast that string as a version and you can make the comparison directly. And like I say, this isn't a good example, but it does work, in that we've shaved off 0.4 milliseconds.
But at that scale, it's a third. And again, I really wanted to put this one in because that casting really helped.
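The contrast might look like this (the version string is a placeholder, and the manual comparison is abbreviated to its first step):

```powershell
# Control-style approach: split the string and compare piece by piece
# with a chain of if statements (only the first comparison shown).
$parts = '5.1.16299.98'.Split('.')
$isNewer = [int]$parts[0] -gt $PSVersionTable.PSVersion.Major  # ...and so on

# Variation: cast the string to [version] and compare directly.
$isNewer = [version]'5.1.16299.98' -gt $PSVersionTable.PSVersion
```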
So here's where all the public speaking training comes in. So I mentioned before, there's a bunch of abstraction happening in PowerShell to make your life easier. At the end of the day, it's running on .NET,
and that abstraction comes with natural overhead. So in this example, I'm generating a thousand GUIDs and I'm just dumping them out. Rather than doing that, we can generate new GUIDs via this.
The output is exactly the same if we generate a thousand of them. Other examples I've seen where this applies are things like pinging servers. There's a, I think it's the .NET connection class
you can do it through. I always have to look it up when I do it. But if you're able to write it, doing anything in the .NET syntax is gonna be faster. I mean, frankly, if you wanted the absolute best speed, you'd be writing machine code, but nobody's got time for that.
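A sketch of the GUID comparison described here:

```powershell
# Cmdlet: New-Guid (available in PowerShell 5 and later).
$guids = 1..1000 | ForEach-Object { New-Guid }

# Direct .NET call: identical output, without the cmdlet abstraction.
$guids = foreach ($i in 1..1000) { [guid]::NewGuid() }
```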
So we've been using collections a lot so it's time to become a collector.
The most common thing I see in scripts is this notation here, where we're defining an empty array and then using the += operator to assign new values to that array, or to that collection. But even though you're using that +=, that collection doesn't actually have a method
of adding new items to itself. You're going to PowerShell saying, here's a collection, here's a new item, I need these two things mushed together, and it goes, all right, I'll create a new thing with that attached. And then it just repeats, destroying and recreating the object over and over again.
Instead of doing that, the default way I've started doing this now is using an ArrayList, which does have an Add method, so you can add items to the collection. So this example here, in both cases, is generating a thousand random numbers and
adding them to the collection. The only thing I will say about ArrayLists is that the .Add method will output to the console, hence why I'm doing null equals there to silence it.
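The two collection-building styles might be sketched like this:

```powershell
# Control: += destroys and recreates the array on every addition.
$list = @()
foreach ($i in 1..1000) { $list += Get-Random }

# Variation: ArrayList has a real Add() method. Add() returns the new
# index, so assign it to $null to keep the console quiet.
$list = [System.Collections.ArrayList]@()
foreach ($i in 1..1000) { $null = $list.Add((Get-Random)) }
```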
Yeah, yes? Yes, and I didn't put that in as a demo, but yes. I think if you do
redirection like that, it's closer. But for me, I just always go for that one. But yes, definitely, Out-Null is a lot slower, yes.
So yeah, that's another one where the increase is significant. Well, sorry, the speed-up is significant.
All right, so after you've become a collector, the next step is becoming a hoarder. So our control here is the variant from the last example,
and our variation is just assigning the output of that foreach loop directly into a variable. This one is faster, but personally I still go
with the ArrayList, just because I like the syntax. But if your goal is speed, doing that works. And I guess the mother of all speed-ups
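Assigning the loop's output directly might look like this:

```powershell
# PowerShell collects everything the foreach statement emits
# into a single array; no manual collection-building needed.
$list = foreach ($i in 1..1000) { Get-Random }
```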
is parallelization, which is always a tongue twister. So as you might have guessed, I'm using PoshRSJob for this one. But the example code I've got here is: I've got four targets out on the internet, the two Google DNS servers,
Quad9, and a random news site from New Zealand. We're looping through each of those, pinging it twice just to slow things down a little bit, and then returning an object saying what the host was and whether or not it responded.
And the alternative there is using PoshRSJob to do those test connections in parallel. So again, probably not so much for this room, but for the benefit of people at home: everything so far has been basically a sequential operation.
So think of it like the supermarket, all the tasks you're doing are people waiting to get through the checkout, and the supermarket's got one checkout open. Checkout operator can only process a customer in about a minute. So if there's 30 customers, it's gonna take that operator 30 minutes. If they opened 10 checkouts,
they can process 10 customers at once. Each customer still takes a minute to get through the checkout, but now it's taking three minutes to process all 30 customers. And that's what using a parallel job like this is gonna do. And as you might expect, the maths didn't add up.
Hang on, let me rerun that and hope it works. That's probably the worst one to have run because it takes about 10 seconds.
Like I said, the awkward silence of waiting for it to finish. All right, I know what it is, I'm not on the internet.
So we'll talk through this one. Generally what you'd see here with four hosts and two echo requests is that it'll take about eight seconds
to complete the control. And the maths works out that it should take about two seconds to complete the variation because it's doing all four of them at once rather than four of them in sequence. Yes, didn't think about signing into the Wi-Fi. In fact, airplane mode.
So I guess the question after seeing all of those is: are they a silver bullet? And the simple answer is, well, yeah... no. So depending on the data you're working with,
or the systems you've got, or the workstation you're working on, and did I say data sets? You might run into memory consumption issues. You might run into CPU bottlenecks. So you need to sort of profile what you're doing. Also, by trying to optimise,
you might break a working script, and we'll touch more on that in a second. And your environment's a snowflake. Just being here, looking around the conference in general, I see a lot of Lenovo laptops. I see a lot of Dell laptops. In New Zealand, we have a lot of HP.
Our server environment at work is HP. I'm going to go out on a limb and say here, maybe not as much, at least maybe. But what I'm getting at there is I don't know what's gonna work well in your environment.
That's your baby. So the takeaways from this is don't assume or don't be a stick in the mud. If you've learnt to do something one way, don't think that that's the only way to do something. If you hear of or see or think of a new way of doing something, give it a go.
Test it, see how it works. See if you like the syntax. You might find you like that better than the way you've been doing things. Measure twice, like I said at the start, and as happened here with the Wi-Fi. So random variables will come into play.
And running a test once isn't gonna be a true reflection of truth. So every time I've run these demos, I've had different results. I've had different things not be faster, depending on which way the wind's blowing.
And revisit. Microsoft's really good at pushing out updates. And chances are those updates are going to occasionally potentially make something slower, make something faster. So there's no harm in, especially if you've saved all your test cases, trying them again in six months, 12 months, two years.
Just keep an eye on it. And worry about optimization after it's working. Some of these things I've started doing naturally, like the ArrayList, is just the default way I do collections now. But if they're not second nature to you, work the way you're comfortable, and then start worrying about any optimizations.
If that's even what you care about, and I'm not here to say, you should be caring about speed. It's just something that might come up. And test your code. So I mentioned on the previous slide, you might break a working script.
If you've got good code coverage and you do break your script, Pester should pick it up, and you'll know right away. Especially if you're making one change at a time, you'll know what broke it. And I have a note: Glenn's doing a Pester session at 3:45, so if you're interested in testing your code,
go and see that. So, yeah, bang on time. So you can get the slide deck, all the demo, all my VS code stuff at that link. I sort of wanted to throw the floor open for questions or comments, or hearing your favorite way of speeding things up.
I will mention, I brought some chocolates from New Zealand, so if you come and find me at lunch, help yourself. Yes? You mentioned making concessions for CPU or memory consumption by doing this kind of stuff. So what's a good way to do that?
Open Task Manager and keep an eye on it. All right, cool. Yep. Well, the thing is that you might be fine if you're not using the system for anything else at the time. But yeah, the reality is just,
what I tend to do is have the resource monitor up and I just keep an eye on it. Thankfully, working in a place like New Zealand, I don't generally deal with massive data sets. So we're not dealing with thousands of AD users or tens of thousands of lines in a CSV file. So I personally don't run into it very often.
Yes? What I found personally. Okay, sure thing. Yeah. I just recently went through a project
of speeding up the health check we were running. Yes. It was taking 14 minutes to run, and I went through each stage to look for ways to optimize it using parallel coding. And we went from 14 minutes to 90 seconds. Nice. Okay, sure. One thing I found is that you're firing up a lot of jobs.
Yes. With RSJobs, versus using the standard PowerShell jobs. Let's say you kick off 32 jobs simultaneously. Yes. Initially there's a large load; the jobs are longer running. Yes. So you just staggered them slightly, like half. Yeah. Oh, I didn't mention that. I am doing the 100 iterations,
but they're throttled to 25 at a time. Yes, yeah. So I set it to 25. For most of mine, that seemed to be the sweet spot.
Anyone else? Awesome. Well, thank you all for coming. Thank you all for having me. And. Thank you.