Testing and Automation in the Era of Containers

Video in TIB AV-Portal: Testing and Automation in the Era of Containers

Formal Metadata

Title
Testing and Automation in the Era of Containers (with Go)
Title of Series
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
So, a round of applause for Veronica.
Okay, so thank you for being here at my talk, those who could make it, and those who were told they were not allowed to stand on the sides and sat down, which is very clever. My talk title is very self-explanatory and very simple. Again, as I think everybody says, it's just thirty minutes, so we cannot go through every single thing that we do, but I'm going to talk to you about how we do it at CoreOS, from my perspective. It will be more like a chat, rather than "this is the best practice and it is wrong if you don't do it this way"; it's just sharing our perspectives. So, I'm a senior software engineer at CoreOS now. Okay, so: who am I?
Right now I work on automating Red Hat Linux support for Tectonic, and in general I now work with distributed systems. Before that I worked in scientific computing, and then for a period in mobile, but through all of this time I have always been using Linux, so this is a very important aspect for me. And Kubernetes sort of is Linux for the cloud, so I am really, really happy to be working with this.

So this is what the talk will look like today. First, how we do it at CoreOS, and this has two parts: the strategies that we have been following, those that have worked and those that have not, and then what I've learned. Then a change of paradigm, and you will see what I mean by that. Then why Go is great for tooling, because in the end this is not a CoreOS room and this is not the testing room, so I have to explain why Go is great for this. And then some closing thoughts.

Of course, even though we recently got a new team that takes care of testing and automation, it's not like the old-school flow where people would write their code and some testers would just do the QA things. It is not like that: every team takes care of its own code and its own tests.

By the way, I know that a lot of people do not enjoy a lot of text in the slides, but since this is live-streamed and these slides are going to be consumed by people not watching the talk, a lot of things will be redundant. So feel free to just listen to me, or just read, or do both if you're super concurrent about it.

So every team takes care of its own tests. This is pretty cool for accountability reasons, but also because our products are very different. We also have very strict release automation guidelines, and the main goal is that if some tests don't pass, you shouldn't merge. This is not always a hundred percent true, but we aim for that. As I said, our products
are very different: we have Tectonic, which is more enterprise; we have etcd, which everybody loves; and then we have Container Linux, and so on. For example, even though I don't work directly with the etcd team, they're the ones with the toughest testing strategy, for many reasons: of course because etcd is one of the most popular pieces of software right now, but also because of the backgrounds of their engineers. They follow a very, very strict workflow. I don't know if you have seen this, but you can look it up on the internet: out of more than 100k lines of code, 60k are just for testing, and that includes everything from unit tests to integration, migration and end-to-end tests.

Then we also use Bazel. Bazel is a pain; it's the worst thing that could ever happen to me. Okay, so Bazel is a great tool when you don't have to work with it. To rephrase: it's a great tool when you get to benefit from it, but when you have to benefit others through it, it just doesn't work. At least for us it has been really, really painful. There are a couple of in-house Bazel experts, but that's not the norm at the company, so it has been really painful.

Some perspective on this, and it's not part of the slides, but you're here to listen to me, so I will tell you why I hate it. It's because all of our workflow is, I don't know, 90 or 95 percent based on Go. Bazel is a great tool written in and for Java, so Java, C and all of those are great environments for it. Then you start using Kubernetes and you realize that they use Bazel for very specific reasons, and I'm not going to discuss that, but Kubernetes is one of the most popular Go projects, and then you have this tool that works better with Java or C or Python projects. So a lot of times I have talked to people, and I'm not going to disclose names, but it goes like: "Hey Dalton, how do I use this Go rule for Bazel?" And Dalton will tell me, oh no, that rule for
Bazel and Go doesn't work, so you know what I did? I used the basic rule for Python and I created a wrapper, and then I did it in five more steps, and then we have it, so it's breezy. It is never like that.

Okay, so back to the experience and how we do it at CoreOS. The testing and automation team works on building targeted tools, not on QA. As I said, even though we're a team, we don't work together on the same project; every one of us works toward different efforts. For example, some work on the Prometheus team, some work on the Tectonic team, and I work for the Red Hat Linux team, which is like in the ether or something. But then, for example, I don't know if you're here because either you're
trapped here, I guess, and cannot get out, or because you really wanted to hear my talk. But if you read the description of my talk, it says something about telling you the story of building a framework inside CoreOS, and that is no longer true: we started building one for testing and automation, but, as I said, that team is new, so we're still learning a lot and evolving, and learning means screwing up a lot. But that is always cool. We're trying to adapt this workflow, and we try to replicate as much as we can, just like many other infrastructure companies do, something like the Google infrastructure workflow. The real name for the testing and automation team is Engineering Services.

On the other hand, there is my experience and what I have learned. I had never worked in a team that was 100% oriented to testing or automation. When I was hired they were looking more for a distributed systems person, because they valued that more than a testing person. So of course I have struggled to succeed at many things, but my perspective is also a fresh look, because I didn't have all the best practices, or bad practices, or the things that testing people used to do in other types of workflows, which is also the reason for this talk.

At the beginning we were trying to focus a lot on test coverage, to be super stringent with our repos, and, as I said at the beginning, all of our code had to be a hundred percent passing and so on. From my perspective that was a bit too much work for the benefit that we could get, especially because the test coverage figure was based only on unit testing, and, as you will see or as you might already know, this is not enough for containers, Kubernetes and distributed systems anymore.

This also causes incomplete end-to-end scenarios. The very definition of
end-to-end testing means the tests have to be as comprehensive as possible. It's not end-to-end if you only test one end and the other end; you also have to test everything in between. So when developers like me, or like you, don't have the necessary empathy with the system, or are not that familiar with the system, we cannot create effective end-to-end test scenarios.

I always mention my experience in Latin America, because I'm Mexican and I used to work in that side of the world. When you work in those environments you're working with very, very limited resources. I'm not saying that it's better or worse, but you have to deal with non-technical people, non-technical bosses who won't give you money for a new server or a new cluster or whatever, so you have to be super efficient and good at optimization. On the other hand, privileged or bleeding-edge cultures like the one that I work in now can afford over-engineering and rewrites. This is not bad at all. What I mean is that, for example, instead of following a more formal testing workflow, what we have done a lot of times is write a component, write an operator, or write something, and if it doesn't meet the standards that we were looking for, either in quality or in test results, we throw it away and write another one. We can afford to do that, also because we have very good engineers who are very good at writing code really fast, but not all teams can afford this. So what I recommend is of course having very skilled people on your team, but also not getting rid of code just because it doesn't work so fast; rather, iterate on what really works.

Okay, so testing distributed systems is hard, and new considerations are necessary. I don't know how many of you are familiar with the formal verification of a distributed system, but it is really, really hard. When I say
formal verification I'm talking about the academic aspect of it. Very, very few distributed systems in the world are formally verified, and that is neither good nor bad. In an ideal world every single distributed system would be verified, but it is very expensive, it is very hard and it takes a lot of time. So we have things like monitoring, we have things like testing, and many, many tools that we use today, as maybe a lot of people here do too.

Containers, service meshes and Kubernetes solve many problems, and they are great. For example, fault tolerance is not built into the Go programming language, but now that we have Kubernetes we don't have to think about that anymore; we don't have to solve the fault-tolerance problems directly anymore. But not having to solve the problem doesn't mean we don't have to be familiar with how it works. With this I mean that we need to have different levels of specialist skills. Of course, at CoreOS
we build those systems that end users have to rely on, so we have to be very specific in our approaches. But it doesn't matter: even if you're the end user in production, you still have to be familiar with how the tools that you use work.

So there are two perspectives on testing and containers: the first is how to use containers for testing, and the second is how to test containers.

For using containers for testing, I will go through this super quick. You package your test suite and make your system run against it. The benefits are that it is practical, neat, fast, portable, scalable, etc.: every benefit that you see in a Docker image or any container image. It is also a standard for distributed components. For example, at CoreOS we're pushing towards that strategy right now, because it is very easy, since we work in very different teams towards one goal. Let's say Tectonic: Tectonic includes Prometheus support, etcd support, all of that. So the goal of this new testing strategy is to package all the tests in a container and just run our systems against it. That is super effective and super easy; of course, easier said than done, so we're still working on that.

Then the second perspective is testing containers. Testing distributed systems is hard. If you go to the Kubernetes documentation you can find this, specifically about the end-to-end test suite: it is not uncommon that a minor change may pass all unit and integration tests but cause unforeseen changes at the system level. As we said, end-to-end by definition should be as comprehensive as possible. What I'm trying to say with this is that people who work with distributed systems usually have plenty of experience and therefore different backgrounds, so a lot of the testing best practices that we used to have may not be enough anymore. No, it's not that they may not be enough anymore: they're
not enough. Because, just as this quote says, at a system level many, many interactions can happen. The components of a distributed system might work on their own but not when they interact, or they might work when they interact, but they can interact in many different ways: they can work in, let's say, seven different interactions, but at the eighth it's broken, and when it's broken it doesn't matter that all the other scenarios worked.

So the fact that we don't have to worry about fixing things anymore doesn't mean that we don't have to know how they work, and especially how they fail, because if we know how they fail, we know how to fix them or how to implement solutions from the very beginning. For example, I was talking about fault tolerance and the Go programming language versus the Elixir programming language, and the latter has fault tolerance included. So I was like, okay, why doesn't Go have fault tolerance? That is none of my business right now, but my friend Jaana (JBD) told me: in Go, instead of crying, or whatever your favorite way to deal with fault tolerance is, what you should do is design with failure in mind. So you don't only catch exceptions, like you do in Java or any other programming language; you design with failure in mind. That sounds horrible, but that is how you have to do it, and the only way to effectively design with failure in mind is being aware of and very familiar with your system and your tools, to whatever degree you need. Of course you don't need to be an expert at every single level if you don't have to, but if you work with Kubernetes at a production level, know how it works; if you build the operating system behind that, you have to know how that works.

Okay, so unit testing is always important, but with distributed systems we have two outcomes: either it is incomplete or it's too complicated, because you have to consider every
single input that has to happen in order to make it complete, to make it fail or pass. Massive integration tests are an anti-pattern. And when you use only unit testing to assess the health of your code, what I have seen is that people rely a lot on mocks. But imagine: the larger your distributed system is and the more interactions it has, the more massive your mock becomes, until it's not sustainable anymore; it's just another system on its own. So what do you do? Of course there are testing and automation tools and frameworks, etc. But we also have the notion, and it is true, that only three nodes are necessary to test a whole distributed system, no matter how large or small it gets after that. What matters is not the number of nodes but the number of inputs that we feed in, and that is why the mock gets massive.

Monitoring and support teams must not act as your system's nanny. What a lot of people do is build a distributed system, not test it correctly, and just send it to production that way, and then rely super heavily on either the monitoring strategy or the support team, and that is not the goal.

While writing this talk I found scary similarities with many people who work on similar projects. For example Cindy, and you can find her on Twitter as @copyconstruct, likes to write about it. Well, I don't like to write about it, but she writes very cool articles and Medium posts about it. Her strategy is more to talk about testing from the monitoring perspective. She has this very deep knowledge about the difference between observability and monitoring; for me it's the same thing. I'm not saying that it is right or wrong, good or bad; it's just that I don't have all that specific knowledge. On the other hand, since I have an academic background, every single problem that we have I try to tackle
from my experience, from what I'm very good at. So I was reading this paper, which is really, really good, just look it up on the internet, and they found that 92% of catastrophic system failures are the result of incorrect handling of non-fatal errors. This is nothing other than error handling, like try/catch or whatever it looks like in your favorite programming language. And this brings me to the Go tooling perspective, just to
finish. If you haven't already, go and read about the origins of go cover, that is, the coverage tool for Go; you can see that Go was created with tooling in mind. Not that you need to read that article to know that, but in case you're still exploring Go for your first projects, it is a very good introduction to Go tooling. Also, error handling and the famous article "Errors are values", written by Rob Pike, are a very, very good introduction to why Go is really, really good for tooling.

Okay, to sum up everything that I have said already. One: distributed systems need much more than unit testing. Two: we have to be familiar with what we're building, even if we don't have to fight against it anymore. If we don't have to fight with fault tolerance because Kubernetes now takes care of that, it doesn't mean that we don't have to know how it works, because if sometimes it doesn't work as the vendors promise, we have to know how to fix it, maybe not even at that layer but at the next layer. And three: Go is a very nice language for tooling, for creating your own tools. As a testing and automation engineer at CoreOS, working on the bleeding edge of containers and that sort of technology with Go, we have found that the tools that you create with only the standard library are a very good option, for all of the reasons that you can find in any Go talk, like its simplicity, its syntax, etc. I also recommend Kelsey Hightower's Kubernetes the Hard Way, not as an introduction or anything like that, but as a means to really
understand what you're doing, even if you don't get to build your own Kubernetes distribution or your own distribution, but just to know how it
works and how to fix things. Okay, so I tried to be really explicit, but the time is up. So thank you, and we're hiring. It is small because, if you heard the news, we are now becoming part of Red Hat (hi, Derek), and I don't know how the hiring will work right now, but we are always told that in our talks we should say that we're hiring. [Laughter] And one thing everybody at CoreOS wants you to know is that Container Linux will still be alive. [Applause]