
Software Archaeology for Beginners

Formal Metadata

Title
Software Archaeology for Beginners
Subtitle
Code, Culture and Community
Title of Series
Number of Parts
199
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Most open source projects are rightly proud of their communities, long histories (both measured in time and version control), passionate debates and occasional trolling. Newcomers to these communities often face an uphill battle, though. Not just in understanding decision-making processes and community standards, but in coming to terms with often complex, contradictory, and poorly documented code bases. This talk will introduce you to the concepts and tools you need to be an expert code, culture, and community archaeologist and quickly become productive and knowledgeable in an unknown or legacy code base.
Transcript: English (auto-generated)
He lives in the Big Apple, so he has a little jet lag, but he's really enjoying talking to us all. You might know him from Puppet, you might know him from Docker. Give a big warm round of applause for Mr Turnbull.
Good afternoon. This is Software Archaeology for Beginners, Code, Community and Culture. I'm James Turnbull. As the kind introduction said, you may know me from such open source projects as Puppet and Docker, but I've been in the IT industry for about 20 years and working
with open source for about, I don't know, 10 or 15 of those, it seems like a very long time anyway. I have a funny accent, I'm originally Australian, I think there's a lot of people thankfully here in the room with funny accents, so if I say something you don't understand, or if I speak really fast, you can do what most Americans do and go ahhh, and wave your hand and I'll try and slow down.
So, I'm also reasonably well known for having written seven technical books, what my wife describes as doing things for free, about open source software. I also write really bad code, you can have a look at my GitHub, and every now and again I write rants about various things, usually about technology sort of stuff.
I just finished a book about Logstash last year, and this is, I assure you this is the only self-promotional part of the talk. You should buy it mostly because it pays for my mortgage, and also because Logstash is awesome, and in a long tradition of writing about technologies for companies that I now work for, I'm also writing a book about Docker.
So, who here has contributed to an open source project? Can you put your hand up if you've submitted some code? Wow, I knew I had a fairly captive audience, but I thought I'd try. Who here has ever had a bad experience submitting code to a project? Have you ever been rejected, or had someone troll you? There are a few more reluctant hands there.
Honestly, if you've had a bad experience with your contribution or open source project, put your hand up. Okay, well we'll say it's about half the room and the rest of you are lying. So, those sort of questions are the reasons that I started to think about this talk.
And last year I joined the Docker community, not long after the code was open sourced, and I basically went through what I call my checklist for how to actually become part of a project, and I was really excited about Docker, and I was like, I want to make an impact here, I want to be part of this community, this is part of something
I really want to engage with. And I thought to myself, there's a bunch of things I do that I've never bothered to write down or say anything about that sort of helped me understand how to become part of that community, both from a technical point of view and from a cultural community point of view. And the second aspect of it was, over the last year or so I've been talking to a
bunch of people about their experiences in open source, and a number of people have said to me, it's really lucky I have a thick skin, because otherwise the asshole I first dealt with when I first submitted a question on a mailing list would have scared me off for life. And I think there's far too many of those experiences, I think there are some amazing open source projects out there, I think there are also isolated individuals and islands
of projects where things aren't necessarily so great. And so some of the stuff I'm going to talk about are ways you can avoid that problem, ways you can avoid those issues. I'm going to cover three sort of broad things, I'm going to talk a bit about culture and why that matters, I'm going to talk a bit about community, and I'm
going to talk a bit about code and documentation and tests and how you actually make an impactful contribution to a project. So, several people over the last couple of days have said to me, but I thought you were going to talk about profiling Go code for 40 minutes. And I said no, no, no. There are far smarter people who can do that talk, in fact there are probably some
Googlers here who can put together that talk more awesomely. But more importantly, why would I give a crap about talking about the culture and community stuff? I think the reason it's really critical is that making a good technical contribution to an open source project is sometimes the easier part of the problem, the easier part of
the engagement. Making a contribution to an open source project that is actually well received, that someone accepts, and that earns you that sort of trust and collaboration is much harder to do. So that's why I'm going to talk a bit about culture and community. So culture, really quickly, it's the way you think, the way you act, the way you
interact. And it's kind of important to open source, mostly because the way you see the project, the way the project interacts as a group is a pretty good indication of how easy it is for you to be able to get inside the door and also gives you a pretty
good indication of the health of the project. So I guess the reason I start looking at cultural things is I'm interested in a couple of different things. I'm interested in watching how people interact, and a big part of how I start to
engage with the community is I spend a lot of time reading mailing lists, lurking on IRC, reading blog posts, reading documentation, seeing how people in the community interact. And sometimes at this point I go, you know, I found something wrong with this project, and I could submit a patch, I could get involved, but I'm not going to because these people are really trollish.
So it's a pretty good indication to me of how healthy that community is. Most of the time, though, I go, okay, I can overcome this, I can subvert and infiltrate this community, and I can make it a bit better. And the reason I bring this up as the reading and listening side of things is that all too often, and this is something I used to do a lot when I was younger,
is I would find something wrong, and my immediate response would be to fire off an email to the project's mailing list going, I found a broken thing. Aren't I awesome? Here's a fix for the broken thing. And I found that in the past that telling people their stuff is shit and broken generally doesn't endear you to them. It generally doesn't sort of make them feel like, wow, I'm going to welcome this guy
into my community, and he's going to help me out. So I found that a bit of patience, a bit of reading, and a bit of understanding about a community is kind of a valuable asset. And also, I found that if I'm going to, I'll use the word, infiltrate or subvert a community that I feel could be a bit more welcoming or a bit friendlier,
having a good understanding of how they interact and what's going on is a good idea. And this leads me on to sort of the second aspect of the things I care about, which generally tends to be community. I describe community as sort of like both the people inside the community and the infrastructure of the community. So the way people interact, the way they collaborate, the tools, all of the sort of
mechanics of the community. And I care about two aspects of this when I'm sort of looking at a new community. I look at how the community communicates and look at how it collaborates. And I found that over the years of watching people come into new communities,
I found that there's a couple of really key mistakes people make. The first one is about tools: communities tend to be sort of solidified around the tools they use and the way they communicate. Some communities are very email centric. And in open source, this is particularly true. We're often distributed across the world. We often speak different languages.
We often have different time zones. So we're a very email-centric culture. Same with IRC. And I found that if you swing by a project and, a good example here, you submit a unified diff attached to an email to the mailing list saying, here's a fix for a problem, and that project is centered around GitHub or around a ticket tracking system, then more
often than not, someone is not going to take that contribution and go, I'm going to incorporate this in the code base. I'm going to go to the trouble of creating a branch and pulling this in. So the first lesson I always tell people is use the mechanisms the project uses. And realistically, you're not always going to like those mechanisms. Like get a bunch of engineers in a room and ask them which bug tracking system they
hate the most. Seriously, you can't get engineers in a room and ask them which bug tracking system they like the most. It just doesn't happen. The other thing I found, something that I wasn't very good at when I was younger, is that asking questions is really important, but more importantly, asking the right questions in
the right way. One of the things I dread when I wake up in the mornings is I've got like a backlog of a couple of hundred emails in various mailing lists that I care about. And I see a question that says, please tell me how to build, you know, solve some massive problem, like using Docker or Puppet or something else.
And I look at it and go, OK, I'm going to respond to this email and point this person at the documentation. I'm going to maybe point them at some specific links. But it's pretty frustrating, it's one of those sort of sigh, here we go again moments. So learn to ask good questions. And there's plenty, if you Google the term, ask a good question, there's four or five
references out there that give some pretty good advice about this. But ask contextual questions. So ask questions that say, I have a problem in this context and it looks, you know, and this is the sort of answer I'm looking for, can you help me? That is going to be a far better received communication than please do my homework for me. A lot of people who send that email do not realize that's what the statement
they're making is, please do my homework for me. When in reality, they're actually asking for fairly specific help, they just don't know where to go. And that's often a reflection of a project's poor documentation, or of how hard it is to navigate or understand what's going on. But overall, you can enhance the quality of that communication by asking
better questions. The other thing is overshare. I remember I submitted a patch very early on to a project I was working on that changed the behavior of something pretty important. And I thought it made perfect sense to me and I had a commit message that was terrible. It was like one line and, you know, naturally I got a little bit crucified by the
project. They were not very amused by my change, radical change in the core behavior of this particular application, because I hadn't explained why or why I cared. And I find that we're all busy people, but throwing a couple of paragraphs of text together saying, I found this, I think we should change this
behavior in this context for these reasons. Here's the pros and cons. Provide the sort of amount of content that's going to ignite some sort of discussion. I guarantee you it's very hard to be trollish to somebody who is coming in with like, I found a problem, here's some data around the problem, here's some metrics, here's some information, here's a solution I proposed,
and here's some pros and cons. It's very hard to say to someone, you know, or you can still say to someone, you know, something fairly trollish like, you know, fuck off, we're not changing that. But you come across looking like a real douchebag if you do that. And as humans, we're not very well wired to, you know, people
don't actually want to look like douchebags most of the time, with certain notable exceptions, mostly politicians. If you actually articulate your question and share all of that information, you're going to encourage that community to respond to you in sort of a positive way. They may not be happy about your change, but you engage in an actual conversation, versus them either ignoring you or giving you a
one-line response. And this last thing, be polite and be upbeat. I cannot, I guess, emphasize this approach enough. And the reason I can't emphasize it enough is that we, as I said earlier, we tend to use mediums like email that are not very
full of emotion. It's very hard to determine how emotional someone is in an email, emoticons aside. IRC is the same way. They don't have a lot of contextual feeling about them. So you don't know whether someone's being sarcastic.
Someone may not speak the same level of English as you. Someone may not speak English at all, might be their third language. I'm always embarrassed at an event like FOSDEM when most people in the room tend to speak four languages and I can barely speak one. So I find it's really important that if you're going to communicate something, be polite, be positive and be upbeat.
And there are a couple of reasons. The first reason, obviously, is that you want to try and convey that this is a positive contribution. But the second thing is it makes you look better and it makes you feel better about your communication. I talked earlier about people having trollish responses. It's very hard for someone to be a troll back to you if you keep
being positive and upbeat. And my colleague Solomon Hykes describes this as killing people with kindness. Because you will look like an idiot if you continue to be like, I fucking hate you. Why have you asked this stupid question? Whereas somebody is like, thanks very much for responding. That's really awesome. I don't quite understand why this is an issue. Here's what I've got. Thanks again so much.
It's really awesome. I do that all the time, even when I look at somebody's communication and go, wow, that person's really being awful to me. But by responding in that way, by responding in that sort of upbeat, sort of positive way, I actually subvert their means of communication. They isolate themselves. And what I want to do is I want to see a lot more open source communities
isolate those people who are trollish. I look at this like I'm from the Pacific sort of region. I look at this like there's a lot of little tiny Pacific islands and there's a lot of rising seawater. The positive, polite, happy people are the rising seawater. Guarantee you they're going to win. Those islands are going to drown.
Now, it's probably bad for the islands. I apologize for that. But generally speaking, as a metaphor, it's a good one. And collaboration: I find that lots of little things, like contributing help on the mailing list, answering people's questions, submitting lots of little pull requests,
little things have big impacts in open source projects, and sending one thing over, saying, here's my massive contribution to the project, I've refactored this entire thing, is often not as valuable as being seen to be a contributor to a large number of things
and earning that sort of trust through those little steps. Now, this is a section that's probably going to earn me some controversy. I'm not a big fan of telling people things are fucked. I really, and by the way, I'm Australian. I tend to swear a little bit. It's OK here, I think.
But in America, I get in a lot of trouble. But I'm not a big person who is very negative. Like, I hate all software, yes, but things in communities and things in open source projects, I tend not to approach things like, this is fucked. How do we, you know, I tend to look at things as like, here's something that I can improve. Here's something that I can look at and I can leave it
a bit better than I found it. So if I look at something in a project, and I was having a discussion last night with some friends of mine who work on the Puppet project, and Puppet just moved from Redmine to Jira. And I talked earlier about no one liking ticket management systems. So we had a room of people who went, you know, I really hate Jira.
It's a piece of crap. And I'm like, yeah, but, you know, so is RT and so is Redmine. And everything's kind of a piece of crap. It's a different sort of piece of crap. You know, I prefer to look at it in terms of like, there's an opportunity here, like, how do we make this a bit better, rather than I'm going to be completely negative and down on this whole sort of project.
This is not to say you shouldn't be sort of, you know, you shouldn't respond to negative things and that you should sort of avoid conflict. But there's a way to engage positively with people, a way to engage in such a way that you leave things a bit better than you found them. And hopefully you educate someone a bit better about how maybe they could deal
better next time with a problem. And I describe this as something my mother used to say: you catch more flies with honey than with vinegar. And this is sort of a, you know, I'm not sure whether this is scientifically true, I sure hope it is. But you can change people's attitudes and engage with people a lot more
strongly if you do it in such a way that it's not an aggressive response. There's a lot of people out there who would argue that there are some open source projects you really want to hit with a big stick. I'm not sure I like that approach.
I think sometimes you should definitely call out bad behavior, but don't do it in such a way that you're being aggressive. Saying, you know, you're an asshole, is not a way to call out bad behavior. Here's why I didn't like this communication. Here's why this was disrespectful. Here's the way we address this better. And the example I use here all the time, I think the Linux kernel
mailing list is a prime example of this. I would describe some of the conversations in there as robust. That's a very diplomatic way of saying, wow, I wouldn't fucking join that community. And obviously there's a very high technical bar there. But I also look at that and go, a lot of this communication lacks any semantic value, like insulting one another or being aggressive or vicious or unpleasant.
To be honest, I'm like, I don't want to be part of that. I don't want to get up in the morning, have to gird my sort of spirits to read the mailing list of things because people are like that. I don't want to be part of that. And some other people have been far more articulate than me about that community. Sarah Sharp, for example, you know, and I think that they're going to struggle
to grow beyond a certain point, to scale beyond a certain point. And you see a little bit of that in some subsystems because of that behavior. Now, I don't really mind, Linus can run the project any way he likes. I personally don't like that approach. And I tend to look at projects in terms of like, how can I positively impact that?
How can I make that better? And I'm going to keep doing that where I can. So on to the meaty part of the thing, more meaty, perhaps not so much touchy-feely. I have an approach I take to learning how to contribute to an open source project.
And this is probably more clearly the software archaeology piece. I guess someone described the earlier part of this talk to me yesterday as software anthropology. So this is more the archaeology piece. And I describe it as the fine art of software dumpster diving. And those of you who may remember dumpster diving, it's where you dive into a dumpster to find, you know, old DVDs and floppy disks and documentation, stuff like that.
It was sort of a software hacking thing back in the day. I'm very old, so I remember this. So these are the things I look at when I first start to look at an open source project. I'm going to talk about each of these individually in a bit more detail. But the first thing I do is I read all the documentation.
And this is probably the first thing that everyone encounters when they start an open source project. And there's been a dozen really great talks on how to write awesome documentation. But one of the things that I always found really strange about that is that people talk about documentation frameworks and automation and stuff like that. I find there's some very simple things you can do to add a huge amount of impact to documentation.
And one of the things that I look at is, I tend to come across a new project and I'm like, I'm reading their install docs, or I'm reading their contributing guide, or something like that. And I start editing on the fly. Like, I write books, so my English is hopefully pretty good. My spoken English is obviously Australian, so it's terrible. But my written English is pretty good.
So I tend to edit things on the fly. And I find that small, subtle changes, like improving spelling and grammar, help. And we had a previous talk just about Wikipedia. There's nothing more grating than having to stop trying to understand some technology because you're struggling to understand the documentation. Occasionally, I'm looking at a sentence and I'm going, I can't parse this sentence. It doesn't make sense to me.
And as a result, I completely stop any sort of analysis of the technical part as I go, if I turn it that way, does it make more sense? So little subtle things like that, I'll do in one pass through. If you looked at my GitHub commit history for new projects, I would say 80% of my first commits are focused on the installation documentation,
and they probably include a bunch of grammar and spelling fixes. I find that that sort of stuff, little things, big impact on a project. If someone finds the documentation easy to consume, particularly the first piece of documentation they see easy to consume, then they're likely to embrace the product far faster.
So I take this approach for every part of the documentation, like every time I do a new thing, like I'm trying to configure something or work something out, I look at that bit of documentation and I rewrite it or I edit it. And it's not a time-consuming process, like editing a document for me is quite quick. And it's also a thing where you don't need to have any actual software engineering experience,
just a fairly basic level of technical skill, in order to actually do this. So it means that it opens this opportunity up to a bunch of people who are not necessarily confident enough to contribute to the code base. And then there's a much broader sort of rewrite, like I think a lot of software, a lot of open-source software projects
suffer from what I call a lack of narrative. And if I can explain this, if you read the documentation, it tends to be disjointed islands of things, like here's a bit where somebody has cared enough to write a document, and here's a bit over here about this thing. If you look at, say, a good example might be like a lot of infrastructure bits of software,
like Puppet, there are some topics that people come across all the time where there is some detailed level of documentation. And this is, I guess, going back in the day, documentation is very different these days. And there are other areas where there's no documentation. I was talking about ZooKeeper to someone the other day. Some of you may know ZooKeeper. It's a configuration management data store.
If you try and find help documentation and blog posts and how-tos for ZooKeeper, there's a handful of them online because it's not something that people mess with all that much. Beyond the basic steps, it's not something that people put a lot of effort into.
Or it's knowledge that lives inside an organization. If you actually change that story, if you actually say, what would the documentation look like if I wanted to cover an actual story, like going from, and this is the way I also write books, is I say, how do you start at the beginning? What's the beginning of the story? What's the problem I care about? And then how do I build on that? Like, how do I install the basic thing?
How do I install the basic thing and then expand its use over here? How do I do all of these things until I get to the point of more complex, advanced things? And you lay out a whole story. That's a non-trivial investment in a project, but you can do lots of little things that join things together, that create more of that, here's a getting started, all the way up to being an advanced user story.
And I think that the problem with a lot of open source documentation is it lacks that narrative. This is another thing that until recently I hadn't really thought about very much, but as software engineers a lot of us spend a lot of time on whiteboards drawing pictures. I have never encountered a group of software developers
who, faced with a whiteboard, can literally sit in a room without attempting to draw on it at some point. And the reason I find this really interesting is almost none of those drawings ever end up in the documentation or in the project. Like, occasionally you'll find someone's drawn something, an OmniGraffle picture or an ImageMagick picture, whatever, but they rarely appear,
like all of this stuff we have in our heads about how something operates rarely appears. And I found that a really fast way to understand how a project works is to draw some pictures. And this can range from an architecture drawing, I look at Puppet for example, I look at the sort of questions that are asked on the mailing list back when we first started the project, and there are a couple of really cool questions that are asked all the time.
One was, how does Puppet's authentication work? It uses SSL, and I need to be able to understand how SSL works. A few simple diagrams would have probably saved us a shit ton of support questions. Same with things like, what ports does it use? A few simple diagrams save yourself paragraphs and paragraphs of documentation.
There's an English saying, a picture is worth a thousand words, and I think this is very true. Same with subsystems and things like that. A complex part of your application or your project, like the interaction between some transaction, or how an event flows through, or how logging works, all that sort of stuff could be simplified and made very clear to someone if you draw some pictures.
I've got a bunch of pictures I drew for the Logstash project that I haven't managed to get into the documentation yet, that I think make it a lot easier to understand how a log event goes through Logstash. And humans consume this stuff a lot easier than they do words. And particularly, if you don't speak English as a first language and all the documentation is in English,
then a picture clarifies a bunch of things for you. And I think this is something that not enough projects do and not enough people contribute, and anyone can do it. It's relatively easy to achieve. This is something that we drew early on, it's like, what are the basics of the Docker system? This helped me a fair bit when I first tried to understand the sort of workflow.
And I'm always surprised that more projects don't do this. The next thing I find really interesting is that not a lot of projects document their development requirements and their contribution well. Pretty much every project has something in there, maybe there's a contrib or a contributing file
in the root of the Git repository or whatever it happens to be. But it's often not really a very detailed document, not a very clear document. And this is the entry point for anyone who wants to contribute a patch to your project. This is the way that they're going to see, the first thing they're going to see is like, how do I do this?
So you should make this document incredibly simple and incredibly easy to use. So the first thing I tend to do after I've sort of had to read the documentation and actually use the actual product is I tend to go, alright, let's see if I can build that development environment and then document it. And then contribute that back upstream to the project.
I love watching when somebody just says, I have this problem, I want to contribute a patch, I think I know how the solution works, how do I do that? And you can refer them to a really simple guide that explains how everything works and how they can avoid a bunch of the missteps, like if the project uses a DCO or something like that, you submit it with one.
You can reduce the level of friction involved in them contributing. So those of you who probably know me, I'm a sysadmin so I'm a bit skeptical about tests. And it took me a long time to be persuaded of the value of them because I was a bit of an idiot.
But I found that one of the really interesting ways to engage with a project is to actually run the tests. And there are a couple of things you learn from this. The first one is that if there are broken tests, a really, really fast way to become very popular in a project is to fix the broken tests. Because no one wants to. Seriously.
If the tests are broken, it's because someone's looked at it and gone, I'll get to that. This feature is more important or this bug fix is more important. If you fix broken tests, you'll be a very, very popular person in a project. I kid you not, some of the contributions I've seen to projects where someone has said
the build now actually runs in the development environment we claim it does, and the tests all pass, and everyone goes, woo-hoo, first time ever! And another aspect of this, sort of related to this, is what do the tests cover? So I did this random crisscross on top of Docker.
I cheated a little bit by only choosing a little bit of Docker. You can visualize this, it shows you how much test coverage there is, how much of the methods and classes inside a project actually have test coverage. Wow, do we suck in Docker. 10%, that's terrible, that's not true, but I cheated a little bit. But there are lots of open source projects out there where there are really complex parts of them
where the test coverage sucks. If one of your first commits is adding a bit of test coverage to a subsystem that's previously not tested, that goes an enormously long way to, A, giving you an understanding of how that project works, because you have to understand how that subsystem works in order to test it, and B, it will give you a huge amount of trust and credibility,
because no one else is going to contribute that. People are always keen to contribute features. Everyone goes, wow, that awesome new feature. Somebody who contributes tests, somebody who improves the quality of a project, that contributor, I'm like, I want 20 of those guys. Features are awesome.
I'm happy to have that, but people who fix bugs and contribute tests, I'm like, you people are awesome and I will give you t-shirts, I will buy you drinks. So anyone who contributes some test coverage to Docker, by the way, drinks are on me. Has to be a good test, has to pass. But that's a really powerful sort of message to send.
Tests also represent, for a lot of projects, the only documentation some parts of the system have. And I find that being able to understand how the tests work, and it took me a long time to get my head around, particularly in Puppet, how testing works. We had a mix of Test::Unit and RSpec, and I was like, what makes a good acceptance test?
Why is this acceptance test different from an integration test? What's a unit test mean? Because I came from an operations and systems background where these things are not typically something you do. In order to become a better developer, working out how the tests worked and working out what they tested and why they tested it and the things that those tests cared about made me a better developer.
Well, it made me a developer of sorts. We'll judge how good a developer I actually am. The other aspect of this that I found really interesting, that I've started doing recently, is there are a bunch of really great metric suites out there that tell you a bit about the code base. And the reason that I find this really interesting is that if you're in the Ruby world,
this is like metric_fu, they produce reports, and the reports reveal very interesting things. And the things they reveal tend to be code smells and troubled subsystems. And obviously there's two reasons why I care about this. The first one is that obviously this is probably a shit part of the code and someone probably should look at it.
The second thing is the people that fix the shit part of the code, they're very popular in a project because no one else wants to touch that code smell. If that code smell has been there for more than, say, three months, no one is ever going to fix it. Say you run the metrics over the top and you find that there's this method and it's, you know, 4,000 lines long.
If you refactor that method and break it into something appropriate, there's a serious amount of kudos attached to that particular thing. No feature in there, no massive behavioral change, but you've managed to improve something, you've managed to improve the quality of the project. So I find running the metrics over the top of things really useful, and on a similar note, style and linting.
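[Editor's illustration, not part of the talk.] The "run the metrics and find the very long method" idea can be sketched in a few lines. This is a minimal Python sketch using only the standard library, not the Ruby metrics suite the speaker mentions; the function name `long_functions` and the 50-line default threshold are assumptions made up for the example.

```python
import ast


def long_functions(source: str, max_lines: int = 50):
    """Return (name, length) pairs for functions longer than max_lines.

    Length is counted from the `def` line to the last line of the body.
    Requires Python 3.8+ for `end_lineno`.
    """
    tree = ast.parse(source)
    results = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                results.append((node.name, length))
    # Longest functions first: these are the likely refactoring candidates.
    return sorted(results, key=lambda item: item[1], reverse=True)
```

To try it on a real file, read the file's text and pass it in, for example `long_functions(open("some_module.py").read())`, where `some_module.py` is a placeholder for whatever file you are exploring.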
Again, little things, seems like a very small thing, can have a huge impact on a project. Think about the style of a project. One of the things that people find a bit frustrating about, I think Puppet's a good example of this, is that occasionally the style is a little bit hard to understand.
It very heavily uses Ruby metaprogramming, for example. If you're a newcomer to that, and before the engineering team at Puppet Labs built a bunch of comments into the Puppet code base, it was very hard to understand. I find that a lot of little things, like running style checkers,
in the Go world you've got the go tool and gofmt, making it easier for people to understand that code, adding code comments, identifying style problems with something like pep8 in Python, Python is another good example of this. All of that stuff lowers the bar to entry.
It means that the next person who comes along is probably going to find it easier to contribute to that project. This comes back to the sentiment I mentioned earlier about leaving things slightly better than you found them. Lastly, and this is probably the hardest thing to do, but something that is really valuable to a project, is profiling and performance.
As a sysadmin, I spend a lot of time doing performance analysis, and I'm quite good at it. So I find that I actually have a bunch of useful skills to contribute to an open source project and identify places in that project where potentially there is something bad, like the performance sucks, or there's a bottleneck, or there's something odd going on.
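[Editor's illustration, not part of the talk.] As a rough idea of what a first profiling pass can look like, here is a minimal Python sketch using the standard library's `cProfile` and `pstats`. The helper `profile_call` and the toy workload `slow_sum` are hypothetical names invented for this example; the speaker does not say which profiling tools he uses.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=5, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time,
    which is usually a reasonable first view of where time goes.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def slow_sum(n):
    """Toy workload standing in for a slow action in a real project."""
    total = 0
    for i in range(n):
        total += i * i
    return total
```

Calling `profile_call(slow_sum, 100000)` returns the function's result together with a report you can print or attach to a bug report, which is often enough to start a conversation with the maintainers about a bottleneck.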
I do a lot of this sort of profiling work to say, where are we going wrong? What's happening? Somebody's complaining about the fact that it takes a long time to do this particular action. I find this performance profiling data, it's often tied to code smells, and if you run things like metrics, style, and linting and profiling,
you'll probably find the same subsystems come up again and again and again. By making small changes across those subsystems, by making them a little bit easier to understand, better documented, even visualizations, better style, better code comments, better refactoring, not only are you likely to improve or make some of those performance problems go away,
but you're also likely to expose those a little bit more clearly for someone to say, ah, there's a project I can take on and refactor. So, how are we doing for time? I have not got much. So what have we learnt?
If you want to contribute to an open source project, if you want to make an open source project better, you've got to appreciate the fact that they are more than just their code. The culture of the project, the community, how it interacts, all things you need to take notice of, and all things you need to improve on. As I said, the concept of little things make big impacts, communicate well, make change.
By the way, I have yet to provide a presentation where there wasn't a cute kitten in it. Apparently this is a requirement. It's a very cute kitten too. Little things, big impacts, small changes to documentation, making contributions easy to do, making people more accepting and open to contributions.
And leave things better than you found them. So, if you come across a project, you want to contribute something, do it in the right way, do it in such a way that you've added some value to that project instead of it resulting in an argument or a fight. It's not hard to do. Positive, upbeat, friendly contributions,
make people in that project happy to see you, happy to see your contributions. Make a project maintainer's day by having them go, someone has submitted a patch or a pull request or some updated documentation that requires me just to merge it and it makes my project more awesome.
That person is going to be happier for that. They're going to respond to the next contributor in a better way and you're going to make that community a much friendlier place. Now, I know I sound like a total hippie. Like I know I sound a bit like, you know, wow, freedom and happiness and joy. But I want to work in that sort of environment. I want to work in an environment where I contribute to projects where people go,
that's awesome. As somebody else I know in the open source community, Fred, says quite well: make more awesome. Don't make more assholes. Make more awesome. So, questions?
There was one hand that went up. I stunned you into silence. That's awesome. Sorry, I'll give you the microphone.
Thanks. I've got a question about the flip side. You've given a great talk about how to get involved in a project. What about if you're trying to start off a project and get more people involved? Because you can have great documentation, have great diagrams and stuff on the site, but that doesn't necessarily encourage people to come back and contribute. It does make it easier for them to go off and use your stuff, which you do want, but they don't necessarily come back and then start contributing more.
Do you have any thoughts about that?
I think when I think about starting an open source project, I look at all the projects, I go, what are these projects that I love, that have things that I love about them? It's really easy to contribute. The barrier to entry is really low. And I just copy it all. I literally take the way they've done it.
I look at a project like, I love the way the Docker project does contributions. I look at that and I would steal that entire methodology. You know, stuff like that. Find a project you respect and admire and just steal their methodology.
Hello, thanks for the brilliant talk.
Just more a comment than a question. You said that about diagrams, you said that humans tend to consume diagrams more than documentation, textual documentation.
I would have said it a bit differently. I would have said that sighted people tend to consume diagrams. So for people who do diagrams, I think it's good, but it's even better if there is a textual alternative to diagrams.
I agree. From an accessibility point of view, you wouldn't ever only have diagrams, you need documentation to supplement that. I agree. From an accessibility point of view, you should cover both. It's a good point.
Please, you were talking about tests,
and some people coming and fixing them and whatever, but shouldn't the build fail if the tests fail?
Sorry, I didn't hear the question. If people have questions, please come down. I think people are going in and out. It's too hard to answer the questions as a group. Thank you very much for your time. Enjoy the rest of the conference.
Bye.