I'm going M.A.D
Formal Metadata

Title: I'm Going M.A.D
Title of Series: FOSDEM 2011
Number of Parts: 64
Part Number: 19
Author: Spike Morelli
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/45930 (DOI)
Language: English
Transcript (English, auto-generated)
00:07
Ladies and gentlemen, our next and final speaker in the system track is Spike Morelli, who will be doing a talk entitled I'm Going Mad. Thank you very much.
00:27
Is it working? Okay, all right. Sorry about that. I've messed up. I've gone mad already. So, I'm going mad. I wanted to share what I've gone through in the past six years, from being purely a system administrator to
00:44
actually doing development, and then from being a full-time developer going back to do system administration, and how that worked out for me. How many developers are in the audience? Operations folk? Do we have any QA people?
01:03
Some. All right. So, I can't say that this will work for you, but it has worked pretty well for me in my various jobs. So, I'm going M.A.D. It's monitoring-aided development.
01:22
And one point that I'd like to start with is that we all code. Even when you start with your one-liner in a bash shell, you're coding. You're effectively coding. And then maybe your one-liners become not enough
01:41
and you move to a bash script. And then you can go further and start to do Perl, Python, Ruby, anything that you want. But you all code. Even if you are an admin, you're coding. If you are a QA person, test automation is becoming bigger and bigger, and test automation requires a lot of coding.
02:01
And also, we've seen a huge uptake of configuration management lately. Whether you use Puppet or Chef, so a DSL, or you're writing in Ruby, you're still coding. So we're still talking about code. But somehow, especially for people in operations, where I come from, you don't feel like a developer. You're not treated as a developer.
02:24
And this is harmful, because the fact that operations and development speak two completely different languages and don't recognize each other is part of the reason why, in organizations, we have these kinds of problems.
02:41
And this is a snippet of a Puppet manifest. As you can see, it looks a lot like a piece of code. So, if we all code, we should try to talk to each other more than we do, and we should try to learn from each other more than we currently do. And it's a weird thing, as a systems person, this not feeling like a developer.
03:05
Why don't I feel like a developer? Why are my scripts not treated as if they were code? When people start developing and they write their first hello world, is it any more...
03:21
It's not about complexity, because a hello world is not more complex than your automation script. And nonetheless, you feel like a developer when you write your first hello world, while the people in operations that I've known and met in my career never feel like developers.
03:41
They feel like something completely different. Now, monitoring and metrics are a lifetime love for me. Since I started my career as a sysadmin, the fact that I had Nagios checking all my systems,
04:01
for me, was awesome. The first day someone showed me Nagios, I was: wow. Now something is checking things for me. Now I can sleep much better than I used to. And then there was this huge step up when, from monitoring, I discovered metrics.
04:24
Talking of monitoring and talking of metrics, some people use one term rather than the other. They all kind of belong to the same world, but effectively they have two completely different meanings. Metrics tell you how a system is doing.
04:42
Monitoring tells you much more like, is the system okay? Is the system broken? And if you think about medicine, if you think about the metaphor of a doctor, doctors in our Western culture, they come in when you get sick. So when you get sick, you go and see the doctor,
05:00
and the doctor tries to backtrack what happened, how you were doing in the last ten months, and figure out what might have gone wrong. When I went to China, I got to see how Chinese doctors traditionally work. A Chinese doctor always follows you. He follows you when you are okay.
05:22
And you pay them to keep you healthy. And when you get sick, that's when you don't pay them, because they failed. And in this way, metrics to me are a much better way of understanding and looking at a system, because you kind of follow the system, rather than just getting to know when something is broken or something works.
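To make the distinction concrete, here is a minimal sketch of the two styles in Python (my illustration, not from the talk): a Nagios-style check that only answers okay-or-broken, next to a metric that records how the system is doing over time.

    import os
    import time

    # Monitoring, Nagios-style: a binary verdict with plugin exit codes.
    def check_disk(path="/", critical_pct=90):
        st = os.statvfs(path)
        used_pct = 100 * (1 - st.f_bavail / float(st.f_blocks))
        if used_pct >= critical_pct:
            print("DISK CRITICAL - %.0f%% used" % used_pct)
            return 2  # CRITICAL
        print("DISK OK - %.0f%% used" % used_pct)
        return 0      # OK

    # Metrics: record the raw value with a timestamp so you can follow the trend.
    def record_disk_metric(path="/"):
        st = os.statvfs(path)
        used_pct = 100 * (1 - st.f_bavail / float(st.f_blocks))
        with open("disk_used_pct.log", "a") as f:
            f.write("%d disk.used_pct %.2f\n" % (time.time(), used_pct))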
05:46
And I spent some time in data warehousing. And in data warehousing, I met this guy who was a pure data geek at his best.
06:00
He had these huge piles of data, heaps and heaps of data, and he could pull incredible things out of these heaps of data. And that's where my love for metrics comes from, from this experience. This guy had a great message, and one day he told me:
06:21
there is no such thing as too much data. Only data you don't know how to make sense of. And this is very important, because obviously there is a cost in having all this data and having all these metrics. There is a data analytics dev room, which was really interesting, and they talk a lot about how you do data mining
06:42
and what you can get out of that data. And certainly it isn't a simple task. But that task has huge payoffs. And so sometimes you come across people that say, well, why would I want to keep certain metrics? Why would I want to, you know, I just keep CPU or I just keep memory?
07:02
There are a few of these: what's the point of keeping more of them? What's the point of keeping them for longer? They're going to take all this space, and then there's noise: how am I going to find the good information hidden in all this data? So yes, there is a cost, and that point is partially true.
07:21
There is a cost to it, but it's important to recognize the benefits. And so, certainly, you have to mind information overload. There are no free lunches. You cannot expect that by starting to collect data you'll instantly be able to figure out everything about your systems
07:40
that you didn't know before. And there certainly is a risk that you will be misled by all those metrics. So, I didn't test my code. As a system guy, when I started, I did no testing whatsoever.
08:03
Obviously, when I'm saying testing, I'm talking about unit testing to begin with. Mostly I would run my scripts and test them, maybe at a dead time; there barely were VMs back then, maybe some Xen, the beginning of Xen,
08:20
and you would use some of those, or you would have a box in the office to run some of this stuff on, and that was mildly all right. But unit testing? Nothing, because that's not how we roll in ops, apparently. And then I changed company, I changed job, and I went to this company where there were better developers.
08:41
And better is maybe abusing the term, but they were more friendly toward operations, and they really believed in testing and wanted to talk about it. They were doing talks during lunch, and various things to which we ended up going
09:01
as an operations department, and it inspired me. I got inspired by how they were approaching software development, how much value they were putting on testing, and how much they were getting out of this testing. They had metrics to prove that testing worked really well for them. They produced better software, with fewer bugs,
09:21
and they had fewer problems in production. And so I started to want to do this testing, but I still couldn't, because systems people don't do testing at all. But then I moved on, I started my own business and I wanted to actually make something. And I guess the same thing is happening even with free software projects:
09:41
all the new free software projects coming out these days have tests, and they actually brag about how much coverage they get. So I started to do something, and the amazing thing is that as I started to do testing, I realized that, in a way, as an ops person,
10:02
I was doing testing already. And when I started to do TDD, I realized that I used to do TDD with Nagios: you would bring up a service check in Nagios and see it fail,
10:20
and then you would bring up the system and you would see the check pass, which is what people do in test-driven development. They write the test first, they see the test failing, then they write the code, they satisfy the test, and they see the test pass. And that gives them a certain amount of certainty that that code is correct.
10:41
And test-driven development is really interesting for many reasons. One is security. I realized that sometimes developers don't care too much about security depending on what they're doing, and there are always fights between operations and development because developers maybe don't care as much as operations
11:00
would like them to about security. TDD is great for security because you can write tests with things like user input in mind. So if you have a function that validates email addresses, you can write a test that checks for escape characters in the input. So TDD is a great way to do testing.
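As a minimal sketch of that idea (the validate_email function here is hypothetical, not from the talk): you write the security-minded test first and let it drive the validation code.

    import re
    import unittest

    def validate_email(address):
        # Hypothetical validator: reject escape/control characters outright,
        # then apply a deliberately simple shape check.
        if any(c in address for c in "\\\"'\n\r\0;"):
            return False
        return re.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", address) is not None

    class TestValidateEmail(unittest.TestCase):
        def test_accepts_plain_address(self):
            self.assertTrue(validate_email("user@example.org"))

        def test_rejects_escape_characters(self):
            # The security-minded test: hostile input must not pass.
            self.assertFalse(validate_email('user@example.org"; rm -rf /'))
            self.assertFalse(validate_email("user\n@example.org"))

    if __name__ == "__main__":
        unittest.main()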
11:21
And so when I started applying testing, I also started to record the success rate of my tests and how I was adding tests, and I had this kind of thing coming up. This is from the beginning of a project. And this graph tells a story that you would not be able to see
11:40
if you just looked at the numbers at any given time. What this graph is telling me is that I didn't do as much testing in the beginning, where I was introducing unit testing, and then I got to tag 01, where I wanted to do a release, and the lines matched: all my tests passed.
12:00
And then I diverged again. I was adding tests but some of them were failing and I wasn't really caring, and then again another release and I matched again. And now there is an interesting thing happening up there. I'm not diverging anymore, and I'm not diverging anymore because I've introduced TDD. And so I'm writing the test before writing my code, and I'm not forgetting.
12:21
In Git, or GitHub, or whatever you use, you can set up hooks to run your unit tests when you do a commit and reject the commit if the tests don't pass.
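A minimal sketch of such a hook (my illustration; assumes a unittest suite under tests/; save as .git/hooks/pre-commit and make it executable):

    #!/usr/bin/env python
    # .git/hooks/pre-commit - abort the commit when the test suite fails.
    import subprocess
    import sys

    # Discover and run every unit test under tests/ (hypothetical layout).
    status = subprocess.call([sys.executable, "-m", "unittest", "discover", "tests"])

    # A non-zero exit status makes git reject the commit.
    sys.exit(status)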
12:41
Now, as you add tests and think, okay, I'm doing well, then you have the question of: how much of my code base does this cover? Because it's useless if you're adding a lot of tests for just one small portion of your code. And then you can start to measure this as well and graph it. And you can get something like this. And something like this, at a first glance,
13:00
again, it says something. You can see that you didn't have coverage, you went up in the beginning, you got up to 80%, and then you dropped from 80, so you lost coverage, probably because you added code and you added some tests, but those tests weren't covering the new code. And then you go up to the top and you're reaching 100% coverage. And it's interesting that you reached 100% coverage at tag 02,
13:24
but you've added almost no tests, just a few tests. So that is an interesting thing. And we'll come back to it in a minute. And then you start thinking, well, I've got this coverage, I've got this test, are there any other interesting metrics? How big is my code base?
13:42
And you can say, it's lines of code. Well, lines of code is a really bad metric because how big is your actual code is not really what you're interested in, but it's a good starting point. So you kind of go, thanks, but no thanks. If you start to graph it, nonetheless, it says something interesting again,
14:03
because as you can see, you've got a number of tests and your coverage, and then your lines of code going up. And now, do you remember from the previous slide that I had achieved 100% coverage on Tag02 even though I didn't have many tests?
14:20
Look what happened to lines of code: it went down. Why did it go down? Because I refactored. So in this graph, having these metrics recorded and having them graphed tells me something that I would not necessarily catch if I didn't have something like this. So at this point, I know, and if you think about it,
14:43
over the span of a year, you could pinpoint every time you refactored, every time you changed something big in your code. But obviously, as I said, lines of code is not a good metric. What you're really interested in is complexity, which is your enemy. You don't want complex code, because you're more likely to introduce bugs,
15:01
the same way that you don't want complex systems, because they're more prone to failure. But how do you measure complexity? I started with some ideas that I got from SpamAssassin: simple scoring. So you begin to think, how do I assess complexity, and what metrics can I use to score it?
15:24
One thing that you can use is call graphs. There are many, many libraries that will scan your code and give you a call graph, which function is calling which function, and build a tree. And from that, you can see the amount of nesting.
15:40
Now, if you have a lot of nesting, where functions go four, five, I've seen six, even ten layers deep, that is really bad. That counts: it is a metric that you can use to score your code as complex.
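As a rough sketch of that kind of scoring (my illustration, using only Python's standard ast module), you can walk a source file and report the deepest statement nesting inside each function:

    # Rough complexity scoring: report the deepest statement nesting
    # inside each function of a Python source file.
    import ast

    NESTING = (ast.If, ast.For, ast.While, ast.Try, ast.With, ast.FunctionDef)

    def max_depth(node, depth=0):
        best = depth
        for child in ast.iter_child_nodes(node):
            best = max(best, max_depth(child, depth + isinstance(child, NESTING)))
        return best

    def score_file(path):
        tree = ast.parse(open(path).read())
        for func in ast.walk(tree):
            if isinstance(func, ast.FunctionDef):
                print("%s: nesting depth %d" % (func.name, max_depth(func)))

    score_file("mymodule.py")  # hypothetical file name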
16:05
Number and size of functions matter too. Some people impose arbitrary limits on the length of functions, because long functions are hard to read and, again, it's easier to introduce bugs, so you can use that as a complexity metric as well. The other thing is code closure. Code closure is something I came across recently
16:22
with Michael Feathers. He made a post on his blog saying that if your code is good code, you tend not to change the same files many times; you tend to add new files or add classes. If you think in terms of object-oriented programming, the idea is that you extend your classes;
16:42
you don't necessarily touch classes that you've written before. And so he graphed his commits on GitHub, and you could see that a lot of the files were never touched again; he was introducing new files or adding functions. And then he had a couple of areas with hundreds of changes.
17:02
And so he targeted those places for refactoring. So it's again a metric that helps you write better code, which in turn works better in production and keeps your operations people happier.
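A quick sketch of measuring that kind of churn from a Git repository's history (my illustration, not Feathers' actual tooling):

    # Count how many commits touched each file in the current Git repo;
    # frequently-touched files are candidates for refactoring.
    import subprocess
    from collections import Counter

    log = subprocess.check_output(
        ["git", "log", "--name-only", "--pretty=format:"], text=True)
    churn = Counter(line for line in log.splitlines() if line)

    for path, changes in churn.most_common(10):
        print("%4d  %s" % (changes, path))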
17:23
And then you have the complexity of the build system. If you have a complex application, you generally end up with a complex build system, so the complexity of the build system is a good indicator and can be used in the scoring to judge the complexity of your code. And again, here is a graph of those things, and this line is reassuring
17:40
because it tells me that despite several changes, my code complexity has not gone up. And also, it's important to do it with style. Style can also be a source of metrics: how good is your code? For example, you can use something like pylint, a checker that will run through your code
18:03
and tell you a lot of things about it: whether the way you name your variables makes sense, the length of your lines, all sorts of things that you can pick up. And there are specific ones like pep8. I use a lot of Python, so I'm using pep8.
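A small sketch of turning style checks into a metric (my illustration; assumes the pep8 tool is installed and on the PATH, and the file name is hypothetical):

    # Turn the pep8 checker's output into a number you can record and graph.
    import subprocess

    result = subprocess.run(["pep8", "mymodule.py"],  # hypothetical file name
                            capture_output=True, text=True)
    violations = len(result.stdout.splitlines())      # one line per violation
    print("style violations:", violations)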
18:22
And then: reward beautiful code. This is another thing that I found to be really important. It works both for operations and for development, but especially for operations it is really important. This is a thing that I picked up from development. Developers tend to reward beautiful code.
18:42
They tend to reward good code, so they put value on doing it right. Operations, if operations does it right, nobody notices. Culturally, that is how it's expected, and that is also harmful. Because nobody is supposed to notice that,
19:01
and nobody directly values what operations has done. And this creates a conflict between operations and development, because operations go: well, they get all the credit, we don't get any credit. And so you create this animosity. Now, the thing with metrics is,
19:25
people complain. The biggest complaint that I've heard when I talked about metrics is that as soon as you introduce a metric, people will start gaming it. And all the companies that have tried to introduce metrics, for example to judge developers, how well they were doing, how good their stuff was,
19:43
have more or less, to a certain extent, failed, because developers will start gaming those metrics just to get a raise at the end of the year or something. So there are problems with metrics. Again, there are no free lunches: it's not just the cost of analyzing, there is also a mindset that has to be changed
20:05
in order to make good use of metrics, but they are extremely powerful. And then, of course, all this stuff that I was doing, I was doing manually, and you shouldn't be doing it manually. Really. So I started to use CI: I stopped exporting data by hand when I was doing commits,
20:22
or looking at my commits and scanning my code by hand, and I started using Hudson, which has since been renamed to Jenkins. Buildbot is great as well, but anything that does continuous integration is really useful. And continuous integration is also being picked up by operations. For system scripts, there are people that use continuous integration
20:40
to run through their Puppet and Chef manifests; again, all this configuration management stuff has changed operations a lot. But in doing this, I kind of forgot where I was coming from, because I got really excited about the metrics and about the development. And where I came from is this, and this is really bad.
21:03
And if you haven't been on call, you probably don't realize what it is to be on call. When I wasn't on call in my first year of system administration, I didn't really know what I was talking about. The first time that I went on call, it really shook me. And when I had this, I had a specific ringtone
21:23
on the phone that the company gave me, and we couldn't change the ringtone. After I left the company, when I was in a public place and someone would have the same ringtone, I would twitch. So you kind of have these things that, you know, it really gets into your life, it's really difficult to kind of, being on call really sucks big time.
21:43
And so you want to avoid it. And one thing that you can do to avoid it, one path that has worked for me, is to introduce metrics, because "how" is greater than "if". And it is never too early to start monitoring your application's behavior.
22:02
And this is key, this is where operations and development can start to collaborate much more. Operations can bring to development, and this is happening to a certain extent in new development environments. Put monitoring in those development environments. Add monitoring, track CPU usage, memory usage,
22:21
and add those metrics. And those metrics can help developers because, again, we do TDD to catch, for example, TDD is good for refactoring because you know that if you have your tests and you refactor and you break something, you will know. Now, how about using metrics for CPU usage or memory usage to figure out that when you refactor,
22:42
you actually introduce a loop in your code or a memory leak. Wouldn't that be useful? Wouldn't it be useful to figure out that you've introduced a memory leak before it hits production, before you finish the sprint if you're doing Agile or whatever you're doing and you get two months later, a month later to run, maybe you're still testing
23:01
and you're still doing load testing, so you figure it out before production, but it happens later, and you have to go back and figure out how it went wrong, where it went wrong. So monitoring and having metrics from day one can be really, really helpful.
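A minimal sketch of that kind of check (my illustration, using Python's tracemalloc in a unit test to put a rough bound on memory growth; the function under test is hypothetical):

    import tracemalloc
    import unittest

    def process_batch(items):
        # Hypothetical function under test; imagine it was just refactored.
        return [item.upper() for item in items]

    class TestMemoryBehaviour(unittest.TestCase):
        def test_no_runaway_memory_growth(self):
            tracemalloc.start()
            for _ in range(100):
                process_batch(["hello"] * 1000)
            current, _peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            # Crude leak check: repeated calls should not keep memory around.
            self.assertLess(current, 1024 * 1024)  # under 1 MB still held

    if __name__ == "__main__":
        unittest.main()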
23:22
And it has helped me directly in some of the code that I've written, because I'm still not a great developer and I've made a lot of mistakes. Testing has saved my life in many cases, and having metrics has saved me in many, many cases. Another thing that can be done is to write code that is monitoring-friendly. This is another thing developers can help with, a point of contact
23:41
between development and operations in this sort of DevOps cultural change. And here, this is a small Flask app, and as you can see, I've got a /mon/status, a /mon/self-test and a /mon/metrics. If you have stuff like that, I can point my Nagios,
24:02
my monitoring system, at those kinds of endpoints and, very easily, very simply, get the status or the performance of an application. For example, if it's a web app, you can keep the last hundred return codes in memcache and expose them there.
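The slide shows such an app; as a reconstruction of the idea (the endpoint paths are from the talk, the bodies are my sketch):

    from flask import Flask, jsonify
    import time

    app = Flask(__name__)
    started = time.time()
    request_count = 0

    @app.before_request
    def count_request():
        # Keep a trivial metric up to date on every request.
        global request_count
        request_count += 1

    @app.route("/mon/status")
    def status():
        # Cheap liveness answer for the monitoring system to poll.
        return jsonify(status="ok", uptime_seconds=int(time.time() - started))

    @app.route("/mon/self-test")
    def self_test():
        # Run quick internal checks; a 500 tells Nagios something is wrong.
        checks = {"config_loaded": True}  # placeholder checks
        code = 200 if all(checks.values()) else 500
        return jsonify(checks), code

    @app.route("/mon/metrics")
    def metrics():
        # Raw numbers for the metrics system to scrape and graph.
        return jsonify(requests_served=request_count)

    if __name__ == "__main__":
        app.run()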
24:22
And this is happening a good deal in system tools. Think of memcached, for example: you can telnet to its port and run a stats command, and you get out a list of the current status of memcached, which is really useful for judging how your caches are doing.
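A sketch of that same stats query from Python (my illustration; assumes a memcached on the default port, 11211):

    import socket

    # Ask a running memcached for its stats, the same way telnet would.
    sock = socket.create_connection(("localhost", 11211))
    sock.sendall(b"stats\r\n")
    data = sock.recv(65536).decode()
    sock.close()

    for line in data.splitlines():
        if line == "END":
            break
        _stat, key, value = line.split(" ", 2)  # lines look like "STAT key value"
        print(key, value)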
24:45
And the bottom line is that ops is changing, operations is changing. Configuration management has made a huge difference in how operations has been moving in the last two or three years, and we're closing in on something that looks much more like what developers are used to. And this is really important because, again,
25:01
in this conflict between operations and development that many organisations have, one of the big problems is the language that the two parties speak. And the fact that both parties can speak code is greatly helping to reduce that divide. Configuration management has also given birth to this infrastructure-as-code sort of thing,
25:22
which basically means that now how my systems are set up is close to writing a piece of code: my infrastructure really can be represented with code, which helps in this process. And then you have behaviour-driven development, which is fairly new, but it's something that developers love.
25:43
There are a lot of developers that really like to do behaviour-driven development, using Cucumber or Robot Framework. Those are tools that allow you to write your tests in natural language, in plain English or whatever your language is. You write your tests and you say something like:
26:01
when I connect to such-and-such a page, I expect such-and-such output. And those "when" and "expect" are keywords that tools like Cucumber, Robot Framework, or Lettuce if you do Python, know how to interpret and convert into a test. So now you have something that developers are really happy with.
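A minimal sketch of what that looks like with Lettuce (my illustration; the feature text, file names, and step definitions are hypothetical; the endpoint matches the earlier Flask sketch):

    # features/status.feature (the natural-language side):
    #
    #   Feature: Monitoring endpoint
    #     Scenario: The application reports its status
    #       When I connect to "/mon/status"
    #       Then I expect the output to contain "ok"

    # features/steps.py (glue code Lettuce matches against those sentences):
    from lettuce import step, world
    import urllib2  # Lettuce is a Python 2-era tool

    @step(r'I connect to "(.*)"')
    def connect(step, path):
        world.response = urllib2.urlopen("http://localhost:5000" + path).read()

    @step(r'I expect the output to contain "(.*)"')
    def expect_contains(step, text):
        assert text in world.response, "missing %r in response" % text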
26:22
They love to do BDD. And now operations can use it too. There is a plug-in called cucumber-nagios that allows you to run tests written in natural language with Nagios, to monitor your application. So we no longer have this situation where developers write their own tests and then pass them to operations,
26:41
which have to rewrite the tests into something else to fit whatever monitoring infrastructure they're using. The two groups can use the same language and the same tools. And then continuous integration, as I was saying, it's already happening in ops. And having these things in common can greatly help with the dialogue.
27:02
And so you can help, but how do you help? If you are an op, realize and accept that you code. Don't think that being in operations justifies not testing, not using unit tests, not using the sort of paradigms that developers are using.
27:22
Learn from your developers. Understand how they do it, why they do it, what they do, and adopt those kinds of techniques. There's lots of good stuff. And advertise your achievements. I touched on this earlier: developers are generally identified as the ones that produce the features,
27:41
produce what is sold to customers. So that is what is visible. Operations are never visible. And so start to advertise your achievements. Start to talk about it and engage your developers. In my experience, when I started to actually go to developers and ask, well, how could I test my scripts?
28:03
they were more than happy to talk to me. So really, it's not that they don't care, it's just that they're speaking a different language and there is a hurdle in getting over that divide. If you ask for things that they recognize as familiar, they will be more than happy to help you.
28:23
If you're a developer, treat ops as developers. Understand that they're writing code, and recognize that. Share your knowledge and how you do things; again, the mirror image of what I said ops should do about getting in contact. Code applications that are easier to monitor, like we went through,
28:43
and learn from operations: tap into their knowledge about metrics and monitoring, because it can be really useful. And the most important metric, this is something that I stole from Patrick's talk: Patrick Debois gave a talk about DevOps in London last week.
29:01
It was a great talk and it was all about culture, no tools. And his talk was about trust. And trust is the most important metric. Trust is the most important metric because if you're trying to get these two groups together to talk to each other, there is a gain to be made there,
29:22
because if you have, say, ten people, and each one of those people can in theory produce ten units of whatever work you need, and then one of the developers doesn't trust one of the ops people, they will hold their work, waiting for the ops person of the group that they do like,
29:43
or maybe they will install their applications on their own; they become a bottleneck, so your production goes down, because you're not trusting each other. And to close: don't let uncertainty drive you insane. Go M.A.D.
30:01
Thanks everybody. Questions? No?
30:20
Thank you. No? One? One? Oh, there you go. Well, okay. So there's a little part of your presentation I didn't catch: what is TDD? What does it mean?
30:41
TDD. Oh, TDD, sorry. Test-driven development. It's the practice of developing your tests before you develop your code. So rather than writing your code, then asking, well, how should this code behave, and then writing a test, you write your test first,
31:00
and then you write the code that makes that test pass. That is much more likely to guarantee that you will have all your tests in place and that your tests will cover all your code. Does that make sense? Hi. Could you speak to the automation of tests
31:21
in a deployment kind of scenario? So, test automation in deployment, sure. What most people I've seen do, and what I've done myself, is done with virtual machines. What you generally end up doing is spinning up virtual machines, creating environments from scratch,
31:44
which is the sort of thing QA is used to, and then you deploy your code; Hudson can deploy your code to any machine that you want, and so can Buildbot. So the idea is that you spin up an instance: you can tell it to create an instance with Xen or KVM,
32:04
or in the cloud if you want, deploy your code to it, and then run a script, and that script can run all your tests and report everything back. For other things, like with continuous integration, you could run metrics on the same box that Hudson, or whatever
32:24
continuous integration you use, runs on. That is perfectly fine. It is not that good for integration testing: it works well for getting metrics from unit tests, not so well for integration testing. For integration testing, you really want either environments that are created on demand
32:41
or environments that you can at least clean out between runs, because you don't want to reuse the same environment twice. Does that answer the question? Good.
33:01
Yeah, I think we're good. My question might sound a bit cynical,
33:31
but do you think it's really possible for developers and operations to trust each other? You said you spoke to some developers when you were in operations and they were really interested in helping you
33:43
making their applications more testable and everything, but my experience is exactly the opposite. I'm in operations, and most developers don't want to talk to operations; they just want to code and do nothing else, and they have their own idea of how things should work, and it's really hard to get through to them
34:01
and to explain to them how real life works. Well, the thing is that I'm not saying that developers should take on the responsibilities of operations. I totally appreciate that as a developer, I don't want to know. In fact, when I start coding, I'm bothered by the fact that I have to install something
34:20
or take care of something because it breaks. I wouldn't want to do that. So I appreciate that. But what I'm saying is that in terms of how you handle certain problems, that kind of thing can happen transparently. So think, for example, continuous integration. It's a good thing where both operations and development
34:43
contribute to that system. Say, as an operations person, I contribute the fact that when your continuous integration spins up an instance on Xen or KVM, it installs monitoring, monitoring that watches everything that's happening on that box.
35:04
As a developer, then, I deploy my application on that box, my application ends up on the box, and I don't have to know how those metrics are being collected. All I care about is that I get that feedback.
35:20
So I don't require the developer to be involved in creating that environment. I'm saying that the developer has an advantage if he looks at those metrics and takes what operations can give him. So there is a collaboration that is possible without requiring either party
35:42
to actually learn the details of how the other side is working. You shouldn't need to know the inner details of how to set up Nagios. That is irrelevant. But if you think, like, for example, in behavioral-driven development
36:01
and writing a test in natural language, that is a good example. The developer doesn't have to learn anything about how Nagios works. He writes his test with Cucumber or Robot Framework, and then he passes the test to operations. So there, operations and development give to each other,
36:21
they gain from each other, without actually having to learn anything in terms of the underlying details. Does that make sense? No, go ahead. Why doesn't it make sense? Because in the end, from what I've seen, developers live in a different world.
36:40
Can you speak a bit louder, please? Sorry. So from what I've seen, developers live in a different world and they are... Sure, but is that good? No. Right, okay. So if we agree that it's not good, what is the simplest thing that you can do to try and make it work? Try and fix that.
37:01
And from my experience, the simplest thing that you can do is to congregate around testing and congregate around common tools. Because the fact that you're using completely different terminology and tools contributes to that attitude of: as a developer, I don't want to know about all that stuff.
37:23
So your idea is that the two worlds of developers and ops can come together exactly around testing, using that as an intermediate language? Yeah, you use testing as an excuse, as a common language, to talk about what eventually has to happen in production.
37:46
Is it better now? I think there's one there.
38:10
Just a practical question. When you talk about the scripts that you write and the testing that you apply over them and all that, which are the languages that you use,
38:22
which are the tools that you use to unit test those scripts or those pieces of code or whatever you want to call them? Sorry, I didn't catch the beginning of your question. When I spoke about what? When you talk about the code that you write.
38:42
The code graph. Yeah, the code that you write. So in which language do you write that code? Which kind of things does that code do and what problems does it address and how you test it? So which framework or whatever do you use to unit test that
39:03
and even to record those unit tests results? Sure. So I develop mostly in Python. So the tools and things that I use are pretty much all around Python and a bit of Ruby these days, but still. So for the code graph and that kind of thing, I use pycode graph, which works really well in outputs in dot format.
39:26
So you can even graph it with Graphviz. That works really well. I mentioned pep8 and pylint to do the style checks on my code. And I used to use Buildbot for continuous integration.
39:43
I ran into problems configuring Buildbot, and at the time I couldn't really be bothered to figure them out, so I tried Hudson, and Hudson seems to do it all in an easy way, so I switched to that. The other thing I use is KVM on my box.
40:04
There's the Python libvirt binding to drive it. Libvirt is an abstraction on top of KVM, well, on top of every possible known virtualization system, except for the other ones. So from Python you can drive, for example, creating a new virtual machine, getting Hudson to deploy code to it, and then running all your checks in there.
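A minimal sketch of driving KVM through the libvirt binding (my illustration; the domain name is hypothetical and assumes a guest that has already been defined):

    import libvirt

    # Connect to the local QEMU/KVM hypervisor.
    conn = libvirt.open("qemu:///system")

    # Look up a guest that has already been defined, boot it, then let
    # your CI job deploy code to it and run the test suite.
    dom = conn.lookupByName("test-vm")  # hypothetical domain name
    if not dom.isActive():
        dom.create()  # start the virtual machine

    print("running:", [d.name() for d in conn.listAllDomains()])
    conn.close()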
40:21
For the behaviour-driven development, I use Lettuce, which is a Python clone of Cucumber. Robot Framework is also very interesting, and it's probably more widely known than Lettuce.
40:46
It is more powerful, but for simpler stuff, I would say Lettuce is much more approachable, so I would recommend that, at least to start with. What else is there? To store metrics, I've used a few different things.
41:03
I tried SQLite because I didn't need anything big, and it was mostly just me, so I used SQLite to store all the metrics in the beginning. And then for other metrics and system metrics, I used to use RRDtool, and then I moved a lot of stuff to Graphite just because it's simpler.
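For reference, a sketch of sending a metric to Graphite's plaintext listener (my illustration; assumes the default Carbon port, 2003, and a hypothetical metric name):

    import socket
    import time

    def send_metric(name, value, host="localhost", port=2003):
        # Graphite's plaintext protocol: "metric.path value timestamp\n"
        line = "%s %f %d\n" % (name, value, int(time.time()))
        sock = socket.create_connection((host, port))
        sock.sendall(line.encode())
        sock.close()

    send_metric("myapp.requests_per_second", 42.0)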
41:24
The problem with RRDtool is that it expects data points at exact time-series intervals, and if you miss certain time slots, it will give you trouble. Graphite is more tolerant of sporadic events,
41:40
which is kind of what you end up doing when you just send metrics on commits and similar things. I think that's about it; that's pretty much all the tools that I'm using. Oh, and for unit testing, I use the built-in unittest framework, the one that comes with the standard library.
42:04
pytest is cool as well; it has a bunch of advantages. The other thing you might want to look at is tox, T-O-X, which allows you to set up different environments, even with different versions of Python.
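For illustration, a minimal tox configuration of the kind he's describing (my sketch; the environment names and test layout are examples):

    # tox.ini - run the same test suite on several Python versions
    [tox]
    envlist = py26, py27, py33

    [testenv]
    commands = python -m unittest discover tests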
42:20
So you can run, like concurrently, you can test on 2.4, 2.5, 2.6, and 2.7 and 3, all of them, and test all your code. And that is also interesting because you can collect metrics. It's another case where metrics can be very useful. You can collect metrics and see how your code performs on different versions of Python. So that can be interesting because you might say,
42:41
well, I'll focus my development on 2.6, because I get better performance. Yeah, that's all. And I think we're done. Done. Yes. Done. There are still some more questions.
43:09
No. Are there still some more questions? Otherwise, I would say thank you. There's one more.
43:25
Hi. I have to implement application monitoring in a quite heavy corporate environment, and I'm facing the opposition, I would say, or the reluctance, of the operations people
43:42
to sending so many events, so much data, into their system. Do you have any comment on that? So obviously, if you have a lot of data, it's problematic; it can be a pain. All this stuff, I've been doing mostly for myself,
44:00
so my needs were small. A couple of jobs ago I used to deal with a lot of data, and we had huge MySQL clusters; we also used HDFS, which works pretty well. There is a project called OpenTSDB,
44:24
which was launched by... what are they called? Not the Twitter people... Digg? One of these big companies, I'm forgetting the name now. It is basically RRD on HDFS. So you get kind of the sort of thing
44:40
that round-robin databases and RRDtool give you, in terms of graphs and storing metrics, but on HDFS. So that's quite interesting. It gives you a back end that, as file storage, scales really well, and at the same time you still have an interface that you manage
45:02
much like you would normally interface with RRD. The other interesting thing you can do, if you want to use RRD-based tools: RRD now supports rrdcached. rrdcached caches your reads and writes, and you can chain instances, so you can have different boxes
45:25
that store different slices, you sort of partition your data, and then you query all the rrdcached instances through a cache, so you have one master node chained to all the others,
45:40
and that makes scaling RRD easier and fairly feasible. Otherwise, it really depends on what kind of data you have. For stuff like this, a key-value store would probably work really well, because in the end, with metrics,
46:02
we're really talking about a label, a value, and a point in time; you have just three variables. So using something like Redis works really well. Redis, Cassandra, Mongo, depending on what you're doing; for metrics,
46:21
I would probably use Redis over Mongo. And the thing with Cassandra is that it works really nicely, but it comes with heavy luggage, because you have to get Thrift and the other things coming out of Facebook, which are really nice, but then you have to maintain them. And so Redis, in that sense, is a bit more lightweight.
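A sketch of that label/value/timestamp idea on Redis (my illustration, using redis-py with one sorted set per metric name):

    import time
    import redis

    r = redis.Redis(host="localhost", port=6379)

    def record(label, value):
        # One sorted set per metric: score is the timestamp, member the value.
        ts = time.time()
        r.zadd(label, {"%f:%f" % (ts, value): ts})

    def fetch(label, since):
        # Pull back every point recorded after `since`.
        return r.zrangebyscore(label, since, "+inf")

    record("cpu.load", 0.73)
    print(fetch("cpu.load", time.time() - 3600))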
46:43
Anything else? No. Thank you very much, then.