
Five Steps to Make Your Go Code Faster & More Efficient


Formal Metadata

Title
Five Steps to Make Your Go Code Faster & More Efficient
Number of Parts
542
Author
Bartek Płotka
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Language
English

Content Metadata

Abstract
Go is a pragmatic choice for developing reliable and robust programs, especially in cloud environments. However, with big-data demands, a costly economy, and ecological concerns, every Go developer will inevitably be required to handle efficiency issues in critical parts of their Go applications or services. In this talk, Bartek Płotka, author of the "Efficient Go" O'Reilly book and maintainer of open-source Go projects, will walk you through 5 simple steps that will guide you on how to make effective and pragmatic optimizations in your code in a data-driven manner. The audience will learn about essential open-source tools and strategies that allow them to make their code faster or use fewer resources like memory or CPU when needed. There will also be a chance to win free signed copies of the "Efficient Go" book!
Transcript: English (auto-generated)
Okay, welcome back. While you all have been walking in, I've been quickly reading this book, Efficient Go. It reads very quickly, and now Bartek will make sure that my code is 10 times quicker. So tell us everything about it, thank you.
Thank you very much, everybody. Welcome, and I hope your travels went well. Mine were totally canceled: flight canceled, a change of route, so I had lots of adventures, but generally I'm super happy I made it, and we are at FOSDEM.
So in this talk, I would like to invite you to learn more about the efficiency of our Go programs. There were already two talks today that mentioned optimization in their names, and generally how to make software more efficient. I wonder why; it's not hype, but it's already three talks about one topic. Why is it so popular? Is it because everybody wants to save money? That might be a reason, but I'm super happy we are really uncovering this for Go, because Go alone might be fast, but that doesn't mean we don't need to care about making programs better
and use fewer resources when we execute them, right? So let's learn about that. It turns out you can literally save millions of dollars if you optimize some code, sometimes in production, long-term, so it really matters. But before we start, a short introduction. My name is Bartłomiej Płotka.
I'm an engineer at Google. Normally I work at Google Cloud, on the Google-managed Prometheus service, but generally I'm an open-source enthusiast. I love Go, I love distributed systems and observability topics. I maintain Thanos, which is an open-source,
scalable Prometheus system. I maintain Prometheus as well, and generally lots of things in open source. I mentor a lot, and I suggest you also try to mentor others; it's super important to bring a new generation of people up to speed in open source. I'm also active in the CNCF. And recently, as you see, I published a book,
and I think it's kind of unique. Everybody's doing TikToks now, and YouTube, and I was like, yeah, let's be old school, because you need to be unique sometimes in this world. I really enjoyed that, I learned a lot during it, and I would love you to learn as well. So I'm kind of summarizing some concepts
from my book here in this talk, so let's go. And I would like to start with a story; apparently the best talks have to start with a story. This is something that maybe triggered me to write the book, right? So imagine:
it was around five years ago. We had just started the open-source project called Thanos. It doesn't really matter what it does right now, but it consists of microservices, I think six different microservices written in Go, which you put in Kubernetes or any other cloud, and together they form a distributed database.
One part of this database is the compactor. It's a component, and again, it doesn't matter much what it does. What matters is that it touches object storage, and it processes sometimes gigabytes or terabytes of metrics daily, of some data. So what happened is that at the very beginning of the implementation, as you can imagine,
we implemented an MVP. It kind of functionally worked, but of course the implementation was naive, definitely not optimized. We didn't even run any benchmark at all, other than just running it in production and, yeah, it kind of works. And you're laughing, but this is usually
what development at high velocity looks like, and it was working very well until, of course, more people put load into it, and we got some issues, like OOMs. One user pointed us to graphs of an incredibly high spike of memory usage on the heap,
on the Go heap, right? And you can see a drop, which means there was a restart, or someone killed the process. And the numbers are not small, like 15 gigabytes. I mean, for a large dataset maybe it's fine, but it was kind of problematic. So it was really interesting to see
what feedback and what suggestions the community was giving us, and by community I mean everybody: users, other developers, maybe product managers. We don't always know who they are, but probably depending on their background, the proposals were totally different, right? So I would like you to
check whether you have had the same situations in your experience, because this is a very common, ongoing problem, and I would like to showcase it. So the first suggestion was: can you give me a configuration that doesn't OOM?
As if... what, do you expect a very new project to have flags like "don't OOM" or "use less memory"? It's not as simple as that, yet many, many users are asking us this question, and probably you have heard this question for your project too: what configuration should I use so it uses less memory, or so it's more optimized?
How can I optimize using configuration? It's just not as simple as that. Maybe in Java, in the JVM, you have lots of performance flags, you sometimes tune things and it gets better, but it's not like that here. Go is kind of low-level; you need to do more than that, right? Another interesting approach,
and actually a very good one in some ways: okay, I will just put this process on a bigger machine, and it's done. That's a totally valid solution, maybe short-term, maybe sometimes it's enough, but in our case it was not sustainable, because of course you couldn't grow vertically forever, and also, even if you could find
a machine big enough for your dataset, then obviously you were overpaying a lot if the code is naive and wasting a lot of memory, right? Then finally, the most fun approach: okay, let's split this one microservice into a scheduler and then workers,
and then we'll just replicate it in my super nice Kubernetes cluster, and it will horizontally scale, so I can use many hundreds of small machines, and it will work. Yes, but you are putting so much complexity on a small microservice that it will generally be more expensive, right?
Because of the network cost: distributed systems inject new problems, like having to replicate data. So you overpay more and more while distributing this non-optimized code to different places, and that's not always the solution.
Sometimes the code cannot be optimized further, and then we probably should horizontally scale, but not at the very beginning of the project, right? Yet that was the first suggestion from the community. Of course, you can also just switch from Thanos to something else; that's also a solution. But if you take this approach, you will probably just jump from project to project: this one is not super efficient, but maybe
some parts of that project are better, some worse. That's an option. Another suggestion, of course, was paying a vendor: they will solve the performance problem for me for real money. But that's not always a good solution either; it's just giving up, and it also means
migrating data, a huge cost of learning new tools, and so on. And all of this effort, while in the code we had something like this: super easy waste that could have been avoided, right?
And of course, that example was malloc, so C++; in Go we don't have malloc as such, but memory overhead and memory leaks like that are very common in Go too. Just imagine how many goroutines you sometimes create; you forget to close some abstraction, the goroutine leaks, and so you are leaking memory just like with that malloc, right?
And what was the actual solution? Some contributor finally came, investigated, thought about this efficiency problem on the code and algorithm level, and rewrote a small part of the compactor to stream data, right?
So instead of building the resulting object that the compactor produces in memory, it was streaming it to the file system as soon as possible. A generally easy, easy change. Yet there were lots of discussions, lots of stress, lots of weird ideas,
and over time I found it amusing that this story kept repeating in many, many cases, right? And that's not only my experience; there are so many nice examples where only a small, two-character change brought so much improvement across large systems.
So sometimes there is very easy waste that we can just pick up and remove, right? But we need to know how. So, two learnings from this story. One is that software efficiency on the code and algorithm level, changing the code, matters.
And learning how to do it can be useful. The second learning is that there is a common pitfall these years. In the past we had premature optimization: everybody was playing with the code and trying to over-optimize things. I think now we are lazy, and we are more into DevOps,
into changing configuration, into horizontal scaling, because we have this power, we have the cloud. And this is usually the more commonly chosen solution than actually checking the code, right? I call it closed-box thinking, and I think this is a bit of a threat in our ecosystem. So we should acknowledge that there are different levels.
We can sometimes scale out. We can sometimes use a bigger machine. We can sometimes rewrite to Rust, if that makes sense. But that's not the first solution that should come to your mind, right? Okay, before we go forward: I have five books to share, and I will share a link to a quiz at the end. It's super simple, but pay attention, right?
Because there will be some questions, and you need to answer and send me an email, and I will choose five lucky people to get my book. So, yeah, pay attention. All right, five steps. Five steps toward efficient programs.
One thing I want to mention: I don't know if you were at the previous talk, or the one before it, but the previous speaker explained a lot of optimization ideas, and Maciej before him mentioned string optimization with interning,
and, I think, something around pre-allocations, and struct padding, and generally all those kinds of ideas. This is fine, but that is the optimizing itself, and it's not like looking things up in a dictionary of tricks from the past.
It's more fuzzy, more involved. So what I would like you to focus on is not the particular optimization in the example I will show, because it's super simple and trivial, but how we get there, right? How we found what to optimize, and how we found out whether we should even optimize, okay? So focus on that. The first step, the first suggestion I would have,
and this is from the book: I defined this name, TFBO, which is essentially a flow for efficiency-aware development that worked for me, and generally I see other professionals doing it a lot as well. Test, fix, benchmark, optimize. So essentially, what it is:
it's like TDD plus something else. TDD you are probably familiar with: test-driven development. You test first, as you can see, and only then you implement or fix, until the test is passing, right? I would like to do the same for optimizations as well. So we have benchmark-driven optimization,
because as you can see, we benchmark first, then we optimize, and then we profile, and I will tell you later why. All of this is a closed loop, right? So after optimizations, we have to test as well. It feels complex, but we'll make one loop, actually maybe two, during this talk on a simple piece of code, so let's do it.
So let's introduce a simple function, super simple, super stupid: we are creating a slice with a million elements, and each of those elements is just the same constant string. Super simple.
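A minimal sketch of what such a function could look like (the exact code from the slides isn't in the transcript, so the function name and the string itself are assumptions):

    // create builds a slice of n elements, each holding the same constant
    // string. Naive first iteration: it lets append grow the backing
    // array repeatedly instead of allocating it once.
    func create(n int) []string {
        var out []string
        for i := 0; i < n; i++ {
            out = append(out, "some constant string")
        }
        return out
    }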
It's the first iteration of the program we want to write. So what do we do regarding TFBO? Okay, so we test first. I mean, now we have the code and we want to maybe improve it. We test, test-driven development, so let's assume I already had the test; it could look like this:
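A sketch of such a unit test, under the same naming assumptions (it would live in a _test.go file of the same package):

    func TestCreate(t *testing.T) {
        out := create(1_000_000)
        // Check the size and spot-check the content.
        if len(out) != 1_000_000 {
            t.Fatalf("expected 1000000 elements, got %d", len(out))
        }
        if out[0] != "some constant string" {
            t.Fatalf("unexpected element: %q", out[0])
        }
    }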
I then make sure it's passing, so there is nothing functional I have to fix. So what's next? Next is the measurement, a benchmark, and again, the previous speaker already mentioned how to make benchmarks, but I have some additions and extensions that you might find helpful. Something I want to mention is that
we are talking here about microbenchmarks, because for this level of tested behavior, for example for this small create function, a unit test is totally enough, right? On the micro level we are making just a unit test, and it's fine, but sometimes, if you have a bigger system,
you need to do something on the macro level, like an integration test, an end-to-end test, whatever is bigger, right? And the same split happens in benchmarks: this is a microbenchmark, kind of a unit benchmark. There are also macrobenchmarks, which I cover in my book, and there you need a more sophisticated setup with load testing, maybe some automation,
and some observability, like Prometheus, which measures resources over time. But here we have a simple unit, the create function, so we can keep it simple with a microbenchmark. As was already mentioned, there is a special signature you have to put in a test file, and then there are optional helpers
that I actually like to put almost everywhere: ReportAllocs, which makes sure that this benchmark will measure allocations as well, and ResetTimer, which is super cool, because it resets the measurement: anything you allocate or spend time on before it will be discarded from the benchmark result, so the benchmark will only focus on what happens within the loop iterations, right? And then this for loop: you cannot change it, don't try to change it, always copy it. It's boilerplate that has to be there, because it allows Go to check the repeatability of your results by running the loop body many, many times.
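A sketch of such a microbenchmark, using the helpers just described (same naming assumptions as before):

    func BenchmarkCreate(b *testing.B) {
        b.ReportAllocs() // also report allocations per operation
        b.ResetTimer()   // discard any setup above this line
        for i := 0; i < b.N; i++ { // b.N is controlled by the framework
            create(1_000_000)
        }
    }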
Okay, so how do we execute it? Again, this was already shown: this is how I do it to focus on one test, but in my opinion it's not enough, because by default it runs the benchmark only once, for one second. I recommend making sure
you explicitly state some parameters, right? And I have a one-liner in Bash, for example, that I often use. Essentially, I'm creating a variable so I can reference this result later on, v1 for example,
so this will create a v1.txt file locally. It will run the benchmark for a duration I specify, which is super helpful, because otherwise you have this v1 file, you wonder what you did with it, and then you check your Bash history: okay, that was one second,
and that one was something else, right? So it's kind of useful. And then, this is crucial, I don't know why I didn't learn it from the beginning: the -count flag. What it does is run the same benchmark a couple of times, six times in my case, so one second, six times. This is super important, because then you can use further tools, as you will see,
to check how reliable your results are. They will essentially calculate the variance between the timings, and if the variance is too big, then your environment is not stable, right?
And then I pin the number of CPUs. Pinning in general is super important, not pinning to one specifically; just pick something that works for you, for your concurrency, maybe something similar to what runs in production, but never change it between tests, right? I also recommend using fewer than the total number of CPUs, because your operating system has to run on something too, so those things matter.
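Put together, such a one-liner could look like this (the benchmark name, duration, and CPU count are assumptions; the flags are the standard go test ones just described):

    export ver=v1
    go test -run '^$' -bench '^BenchmarkCreate$' \
        -benchtime 1s -count 6 -cpu 4 -benchmem \
        | tee "${ver}.txt"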
Also, don't run benchmarks on a laptop without power connected, because you will be CPU-throttled. There are lots of small things where you think, oh, it doesn't matter; no, it matters, because otherwise you cannot rely on your results, right? So try to take this a little bit seriously, and at least don't benchmark with the laptop on your lap in bed, because it will be overheating.
Yeah, small things, but they matter; I was doing all of that all the time, by the way. All right, so the result looks like this. You can see many result lines, but this is not how you are supposed to consume them. There is an amazing tool called benchstat, which presents them in a more human-readable way:
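A sketch of the invocation, assuming the v1.txt recorded above (benchstat can be installed with go install golang.org/x/perf/cmd/benchstat@latest):

    benchstat v1.txt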
You can see it also aggregates the runs, gives you averages over them, and tells you, for example, that for the time latency there is a variance of 1%, which is tolerable. You can also customize how it calculates this variance, and so on. So we can trust it: within 1% or so, depending on what you do, it's generally not too bad. Allocations, fortunately, are super stable. So, hey, we benchmarked, we measured, okay, we know our function has these numbers. What's next, right?
Everybody is like, yeah, let's make it faster, let's make it faster. But wait, wait: why? Why should we make it faster? Okay, maybe that's a lot, 100 megabytes for every create invocation, but maybe that's fine, right? This is where I think we are usually missing a lot of experience.
I mean, you have to set some expectations, right? To what point are you optimizing? And usually we don't have any expectations. Even from product management we maybe get functional requirements, but rarely concrete performance requirements. So we don't know what to aim for, and honestly,
if you ignore requirements entirely and say, okay, I just want to make it faster, then it's always premature optimization, because it's a random goal you don't really understand, right? And just "make it fast" is also very fuzzy; obviously that's not very helpful.
So what is helpful? I know it's super hard, I know it's kind of uncomfortable, but I suggest writing some kind of efficiency requirements spec, as simple as possible. I call it a RAER: resource-aware efficiency requirements. What it means, essentially: try to find some kind of function, right?
Some kind of complexity, but not asymptotic complexity: a more concrete estimation of the complexity based on the inputs. And for simple functions, like our create function, we can estimate what we think should roughly happen. So for runtime, we know
we do something one million times. We don't know how many nanoseconds each iteration takes; let's pick 30, which is actually pretty big for one iteration of just an append, but really, just pick some number. You can iterate on this number later, but if you don't know where you're going, how can you make any decisions?
For allocations it's a little bit easier, because we expect a slice of one million string elements, and as we learned from Maciej's talk, every string has two parts. One part is the 16-byte header, which has the pointer and the length
(capacity only exists for slices, not strings), and then there is the other part, the data, which lives on the heap. But here we can assume 16 bytes, right? So every element is 16 bytes, and now we just multiply. That's our expected cost function, that's what we expect, right?
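Written out, the back-of-the-envelope estimate from the talk is:

    runtime: 1,000,000 iterations × ~30 ns/iteration ≈ 30 ms per create call
    memory:  1,000,000 string headers × 16 B (pointer + length) = 16,000,000 B ≈ 15 MiB per call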
And with this, we can expect that every invocation of create should allocate around 15 megabytes. But what we see is that we allocate 80 megabytes, right? So already we see that either there is some easy waste, or there is something I don't understand about this program,
and this is what leads us to spotting easy wins, and to knowing whether we need to do anything at all, right? In terms of time latency, it's also more than we expected, but that is more of a guess; I just guessed those 30 nanoseconds. Okay, so what do we do? Now we know we are not fast enough,
and we are over-allocating, right? So then we profile; we know we have a problem, now let's find out what's going on. And on the micro level we can use profiling very easily, by just adding two flags to the benchmark invocation:
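For example (the file names here are assumptions, consistent with the v1 naming above):

    go test -run '^$' -bench '^BenchmarkCreate$' -benchtime 1s \
        -memprofile v1.mem.pprof -cpuprofile v1.cpu.pprof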
They gather the memory profile and the CPU profile into files, like v1.mem.pprof. On the macro level there are other ways of gathering profiles, but you can use the same format and the same tools. There are even continuous profiling tools in open source, like parca.dev, which I really recommend,
and it's super easy then to gather those profiles over time. So what we really want to learn is: what causes this problem? This is a CPU profile, and we can spot things in it: wider means a function spends more CPU cycles; the depth doesn't matter, it's just how many functions deep we are, right?
So you can see that create, of course, is one of the biggest contributors, but also growslice, right? Why do we spend so many cycles growing the slice? Ideally, I know how many elements I have, so why doesn't it grow once? And by the way, to dig in you can run go tool pprof with -http locally on this profile file, to expose an interactive UI:
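A sketch of the invocation (the port and file name are assumptions):

    go tool pprof -http=:8080 v1.cpu.pprof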
I use it a lot. You can do the same for memory, but honestly here it's not that useful, because append is a standard library function and its internals are not very well exposed; they're hidden, so this is not very helpful. Actually, the CPU profile was more helpful,
because it pointed us to growslice, and if you just Google that, you will notice it comes from append; then you can go to the documentation of append and learn what it actually does. And as you are probably familiar, because this is a trivial case: append resizes the slice, it resizes the underlying array whenever it's full, right?
And resizing is not trivial: it has to create a new, bigger array and copy things over, and garbage collection will eventually kill the old one, but not immediately, so the old arrays pile up as additional allocations, right?
So this is what happens, and the fix is to just pre-allocate: to tell Go, when you create the slice, how much capacity you want to prepare for it.
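A sketch of the optimized version, under the same naming assumptions as before:

    // v2: pre-allocate the backing array so append never has to
    // grow and copy it (growslice disappears from the profile).
    func create(n int) []string {
        out := make([]string, 0, n) // capacity known up front
        for i := 0; i < n; i++ {
            out = append(out, "some constant string")
        }
        return out
    }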
So in TFBO terms, we just did the optimize step. Now we test before measuring, because if you don't test whether the code is still correct, you might be happy that things are faster but functionally broken. So always test; don't be lazy, run those unit tests, it's easy. And once they are passing, you can comfortably measure.
Again, I just changed the variable to v2 to get another file on the file system, and then I can run benchstat on v1.txt and v2.txt. Actually, I could put a hundred of those files there and it would compare all of them, but here we compare two, and not only do we get the absolute values of those measurements,
but also a diff, right? You can see we improved a lot, and if we check the absolute values against our efficiency requirements, you see that we met our threshold, roughly; we only estimated it, so it's totally good: 15 megabytes expected, 15 megabytes measured, and it's faster than our goal.
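The comparison is a one-liner, assuming the two files recorded earlier:

    benchstat v1.txt v2.txt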
So now we are good to go and release it, right? That's the whole loop, and you keep doing it until you're happy with your results. So this is it; the learnings, again, five learnings. Follow TFBO: test, fix, benchmark, optimize. Use the benchmarks built into Go; they are super amazing:
go test -bench. Set clear goals; goals are super important here, right? And then profile: Go comes with pprof, which you can Google as well. It's an amazing protocol and set of tools, integrated with the clouds and so on, and I use it every day
whenever I have to optimize something. And then finally, the key is to try to understand what happens, what I expected, and what's wrong. Reading documentation, reading code: this is what you sometimes have to do. And a general tip: whenever you want to optimize something super, super carefully in some bottleneck part of your code, avoid standard library functions,
because they are built for generic functionality; they do a lot of things to handle edge cases that you might not have. Many times I just implemented my own integer-parsing function and it was much faster. So this is a general tip that always works, but again, do it only when you need it,
because you might introduce bugs in that code, right? So that's it, thank you. You have a link here: bwplotka.dev. Thank you. Thank you.