
Open source is just about the source, isn't it?


Formal Metadata

Title
Open source is just about the source, isn't it?
Title of Series
Number of Parts
84
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Your project's code base is rock solid, you are rolling releases early and often, your test suite is comprehensive and running regularly, your code is performing well without any glitches. Everything is in place that defines a successful open source project - or isn't it? This talk tries to highlight some of the key questions software developers will quickly be faced with when dealing with open source: in addition to coding skills, topics like people management, naming, trademark enforcement, licensing, patents, PR and more have to be dealt with.
Transcript: English (auto-generated)
So, our next presentation covers all the complex, or maybe complex, tasks that open source projects face beyond pure code, and I'm very happy to introduce Isabelle for the presentation.
Please give a warm round of applause for her. Okay, good morning. I'm going to continue in English because I know at least one person here doesn't speak
German well enough to follow the presentation. If you don't speak English, sorry. So this is going to be about open source and how it's not just about source. Who am I to tell you something about what open source is all about? I'm a software engineer at Elasticsearch. We do lots of open source: we've got Logstash,
we've got Kibana, we've got Beats, and Elasticsearch core, of course, which are all Apache licensed. Apart from that, I happen to be a director of the Apache Software Foundation. I'm a co-founder of Apache Mahout. Would you raise your hand if you know the project?
One, two, three. I want to talk to you after the presentation about why you know it. Apart from that, I'm a co-founder of Berlin Buzzwords, a conference on all things scalable and storage, which happens in Berlin. So if you need an excuse to make your employer pay for a trip in June, when it's sunny and nice,
to Berlin, go to this conference. Okay. In order to wake you up: how many of you are running your own open source project? A little more than half, okay. How many of you have ever contributed?
Pretty nearly everyone, okay. Speaking of contributions, has anyone written about an open source project in their blogs, publications, press articles, whatever? Okay, good. Did you ever help other users get started with open source?
Come on, I want to see your hands. I don't do this meetup trick of handing out microphones and asking questions. Okay, nice, nearly everyone. How many of you are using open source in your day job? Keep your hands up if you contribute to open source as part of your day job. Nearly half of everyone, kind of, sort of-ish.
Okay. How many of you are using open source in their spare time? Yes, nearly everyone, good. Okay. So why should all of you care about what open source is about apart from the technology
behind it? Let me tell you a story. I convinced my mom to use Ubuntu several years ago. You remember the user interface that they used very early on, before they switched to Unity, which looked completely different? That totally threw my mom off. I would totally-
So she's not the only one, awesome. She's also a fan of Shotwell. After missing a few upgrade cycles, we suddenly had to go from one version to another, except the database schema wasn't compatible anymore. I was really, really happy to have my husband dig through these database entries and convert them to the new schema, jumping several versions.
So it was one day lost, mommy's still happy, and I'm happy that I have a husband who can do that. So if you do use open source in your spare time, you definitely want to know how it works so that you can talk to the right people in order to fix your problems, or so that you can fix these problems yourself.
I've seen quite a few of you who are using open source as part of your day job. Essentially, this boils down to betting your business on an external dependency. What happens if this project stops receiving any security updates?
What if you need a tiny little change to the project to make it work for you, but you don't have the time and skills to do that yourself? Can you motivate the project to do it or can you motivate a consultant to do that for you? And what happens to this patch afterward? How does it get applied to upcoming new versions?
So I would suggest that even if you are not building open source yourself but only using it, you still want to understand how these projects work if you're betting your business on it so you know what's going on. Last but not least, I've seen a few of you who have raised your hands when I asked whether you run an open source project.
When you get started, coding probably is the topmost thing you want to do and want to focus on. There are a few things to keep in mind even when you start out. The first thing that you want to think about when starting an open source project, in my personal opinion, is what your goals are with doing that.
Do you want to build a business around that software that breaks into an existing market by changing the economics? That goal might have an influence on your decision with respect to licensing, for instance.
Do you want to collaborate with others who've got the same need as yourself, to fix the problems that you have and that others may have, so that you don't have to do all the work? That may require a different community model, for instance. Or do you simply want to build up your CV? Do you want to build up your reputation and skill set? That also may decide how you run
your project. Essentially, it boils down to how much control you want to exercise personally versus how robust that project should be. If you want to build a company around that project, my personal take would be that you probably want to control the direction of that project.
If that at some point is supposed to be a product, you don't want to give up that control. If however this is like, I want to build that thing, but I want to collaborate with others to build something that's bigger than what I could achieve, then you would want to think about how robust that project should be, how easy it should be for people to
contribute, and how interesting it should be for people to contribute. Okay, now, what factors do I think about that are not code? Let's go for the easy ones. These are just the legal ones. You want to think about copyright, patents, and you want to think about trademarking.
Let's focus on copyright first, because this is kind of, sort of trivial. This is inspired by a post that was published at gnu.org. Essentially, what you want to decide first is: do I care about any and all of my downstream
users, including those that use derivative versions of my software? In that case, you go for a copyleft open source license. If you do libraries, especially if there are other libraries around that do similar stuff, go for the LGPL so that people decide to use your library.
If you go for server software, there's a huge wealth of discussion around the AGPL, but if you go for something that's going to be hosted somewhere anyway, where users typically won't run it on their own, you want to at least take a look at the AGPL because it might be something for you.
For everything else, go for the GPL and you're pretty much set to go. There's the other option: I only want to ensure that the users of my very own project get the four freedoms. That's the ideal case for a non-copyleft license.
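The decision described so far can be written down as a small rule of thumb. The following is only a sketch, assuming the three copyleft licenses that the gnu.org recommendations distinguish (LGPL for libraries competing with similar libraries, AGPL for hosted server software, GPL for everything else). The `Project` class and its field names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical description of a project; field names are illustrative."""
    protect_derivatives: bool        # care about users of derivative versions too?
    is_library: bool = False         # a library competing with similar libraries
    is_hosted_server: bool = False   # server software users rarely run themselves


def suggest_license(project: Project) -> str:
    """Return a license family following the rules of thumb from the talk."""
    if not project.protect_derivatives:
        # Only users of the original project need to get the four freedoms.
        return "non-copyleft"
    if project.is_library:
        return "LGPL"
    if project.is_hosted_server:
        return "AGPL"
    return "GPL"


print(suggest_license(Project(protect_derivatives=True, is_library=True)))  # LGPL
```

A real decision involves more than three flags, so treat this as a way to see the order of the questions, not as legal advice.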
When do you have something like that? If you build something that's teeny-tinyish and you don't care about license enforcement anyway, go for non-copyleft to begin with, because if you don't enforce it, then there's no real reason for copyleft. If you have a library that pushes forward a standard that's near and dear to your heart
but which isn't widespread yet, go for some non-copyleft license. And again, if you want to have a project that changes established economics, go non-copyleft. The Apache License especially is well established among businesses, so if you want
to drive some competitor out of the market, my personal choice would be non-copyleft, probably Apache licensed. So much for the easy stuff. Software patents: sorry, I'm not going down this dragon hole. If you want to talk software
patents, go out there; they do great work countering software patents. Ask them what this is all about. Trademarking: why should you care about trademarking? First of all, what makes a good non-infringing project name?
When you build your first open source project, you don't want a company to go after you because you are infringing their trademark. Even worse, you don't want another open source project going after you for infringing their project name. Just a little anecdote: how many of you know the name Hadoop?
Keep your hands up if you know the story behind it. The story behind it is that the child of the project's founder had a little stuffed toy elephant that was called Hadoop. Doug Cutting once coined the sentence that children are very good at coming up with
non-infringing project names. So that's how Hadoop came into existence. It's also how Nutch got its name. If you are searching for an internet-scale search engine (nowadays it's more like an internet crawler), go and have a look at Nutch.
Nutch was the first word that this project founder's child ever said. So if you have children, make a little note of all the words that they invent. Might be useful one day. Okay, now you've got a great non-infringing name. Why should you continue to care?
Well, it only remains a trademark if you really take care of it, or if you register it. So you have to decide if it's okay for people to sell copies of your software on eBay without mentioning where that software actually comes from, that is, without attribution.
It's typically not a good thing. You also need to take decisions like, is it okay for a fish pedigree of company to use a logo that's similar to yours? There's one over there, that one was actually created.
You have to take decisions on whether you should register your trademark or not, and you will have to find and counter trademark-infringing occurrences of your name and logo. What's pretty good for that is to have something like a Google Alert for your project.
It's good to have anyway because people are discussing your project, so it's great to get that feedback on what people are using it for. It's also great for finding people using your trademark for conferences that you are not involved with. It's great for finding products that are using your trademark without having contacted you first.
Then you need to identify whether that usage is actually infringing. If this is a product that's sufficiently similar to your project so that there could be confusion, it could be infringing. And then you have to actively go out and fight that infringement. From experience at the Apache Software Foundation, what's usually sufficient is to send a reasonably
friendly email to the person or to the marketing department and tell them like this is our trademark policy, you're not following it, please fix it. Typically this is sufficient if you do it often enough. So much for the easy legal stuff.
So it's like easy rules, all fine. We're going to go to the slightly messier topic of people aspects. Honestly, I don't believe in the lonely brilliant hacker. I believe that's a lie.
Every one of us who writes great software stands on the shoulders of giants, either reusing software others have built, or reusing ideas of others, or even better, collaborating with other people. Also, great software is refined over and over, so it's not like write once.
So I believe that a project without people is a dead project. And a project with a single point of failure when it comes to contributors is pretty high risk. At Apache we've got the saying "community over code". There's nothing more important than having a vital community behind
your open source project because this is what keeps it alive. So what do I mean by people and what do I mean by community? Some of the people that should be interesting to you are potential users of your project. Where do you find them?
How do you turn them into actual users? And once converted, how do you retain them? So there's a term for that. It's called marketing. How do you do marketing? You go to social media, you tweet about it, you probably use your own hashtag.
You use a separate Twitter handle. For Apache Mahout, we have the @ApacheMahout Twitter handle, which retweets all interesting news related to Apache Mahout. You use mentions of your project to find out what people are using it for. You get involved in these discussions to find out more.
Why should you do that? First of all, it's good feedback for your development: what are people really interested in? On the other hand, you will come into the situation where people ask you, why should I use this project? What are other people doing with it? At Apache Mahout we used a "Powered By" wiki page, which was just an alphabetically
sorted list of people who admitted to using Apache Mahout, and sometimes they were brave enough to tell us what they were doing with it. This was extremely helpful for answering the questions: which people are using your project? What can I do with it?
Look at this: those are the real-world use cases. There's also this one instance when I went to an ApacheCon in Amsterdam and heard one of Apache Solr's users talk about what they did with the project. It's typically much more believable, much more approachable if your downstream users
talk about what they do with your project than you selling it. So if you can get some of that information on your project public, that's super nice. At some point you may want to decide if you want to run your own conference, if
your project becomes really, really successful, or you may want to leverage some pre-existing one. There are a couple of things you can do without conferences. You can talk to the press. You can write press announcements and hand them out. I've had very good experiences with talking to Heise people here in Germany.
People at Software and Support also usually are open to receiving news that they can publish on their site. It helps if these press announcements are generally understandable, not just for the hardcore geeks of your project. What you can do as well is that over time news magazines will come to you asking for
guest articles so you can start writing them yourself. You can start reviewing books if you have the time in order to know what other people are writing about your project. If you really have lots of time left, you can write your own books or you can start writing your documentation such that it can be published as a book as well.
So speaking of writing books and supporting users, some of your downstream users will be happy reading just through the docs. Many of them won't. Quite a few will prefer going to conferences like FrOSCon here, being told what's new,
being told what's interesting. So you will end up giving talks at conferences. More important than that, you will probably end up talking to people in the hallway. What I found helpful is to have your presentation at the beginning of a day or at the beginning
of a conference even, because that means that people will come to you and ask you questions on your project, because it's easier for you to remember me standing here than for me to remember everyone in the audience. Sorry, I'm pretty good with faces but not good enough to remember everyone. You may end up standing at a booth answering questions, just being available.
Here at FrOSCon, you should of course check out the Elasticsearch booth outside. We've got Philip with us who is happy to answer any questions and happy to channel all of your runs to the project and company internally.
There's another booth by the Apache Software Foundation that you should check out. They've got nice stickers and they can answer all the questions about Apache. Of course you should also check out the Free Software Foundation booth. These are just my three main favourites and there are many more outside.
Over time you will have to do some kind of support. What does support look like? You will have people just beginning in your project and you want to mentor them and you want to support them, not just scare them away. There will be questions that come in over and over again and you will have answered
them already. Instead of telling them to go to the frequently asked questions page, make this frequently asked questions page linkable and link them to the correct question. That's pretty helpful, and anyone searching for the same answer will find the correct and detailed answer without you having to type it up every day.
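One way to make an FAQ page linkable is to give every question a stable anchor derived from its text, so a support answer can point straight at it. A minimal sketch, assuming an HTML FAQ page whose headings carry these anchors as `id` attributes; the function names and the example.org URL are made up for illustration.

```python
import re


def faq_anchor(question: str) -> str:
    """Turn a question into a stable, URL-safe anchor id."""
    # Lowercase, then collapse every run of non-alphanumeric characters to "-".
    return re.sub(r"[^a-z0-9]+", "-", question.lower()).strip("-")


def faq_link(base_url: str, question: str) -> str:
    """Build a deep link that jumps straight to one question."""
    return f"{base_url}#{faq_anchor(question)}"


print(faq_link("https://example.org/faq.html", "How do I build the project?"))
# https://example.org/faq.html#how-do-i-build-the-project
```

Deriving the anchor from the question text keeps links readable, at the cost of breaking them when a question is reworded.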
One hint about that is that if your first time users and beginners are happy, they might one day turn into successful contributors. Are there any students in this room or people dealing with students?
Are you aware of the Google Summer of Code internships? One, two, three. So essentially it's a way of getting paid to contribute to open source. Not always quite as well paid as working for an IT company in, say, Germany, but it gives you the ability to contribute to your favourite project and get money in return.
So in my opinion this is quite a nice deal. Yes, it's only for coders, unfortunately. One hint concerning beginners and concerning giving support.
People may not use the communication channels that you prefer. At Apache we've got the saying, what didn't happen on the mailing list didn't happen at all, but there are people who prefer having their questions posted and answered on Stack Overflow. So it does pay to spend some time there and fetch users from where they are, especially if your goal is to grow your community.
Speaking of mailing lists, when you create your project, helping out isn't just about providing code samples, it's also about answering questions. So what we did at Apache Mahout is to give out the commit bit,
like the OK to commit to Subversion back then, now it's Git, just for people answering questions and just for people helping others grow the project, because machine learning isn't quite that easy. So we've had a couple of people who were into the field and who knew a lot of use cases,
but didn't have the time to contribute code-wise. But they made great contributions on the mailing list, answering questions, giving architectural advice, et cetera. So at some point if you have a project of your own, you want to reward that.
Speaking of mailing lists, how many of you speak one more language than just English? Pretty much everyone. There are projects that do a great job at providing localized resources, be it mailing lists, for people who are uncomfortable communicating in English
or whatever the project's native language is. There are also projects doing a great job at translating documentation. Just one example: there's the Apache HTTPD. They've got great documentation, not just in English, but also in German and other languages. So despite the fact that probably nobody installs the web server
by going to the download button on the Apache website, people still come back to the Apache website for the documentation, which I think is a great thing, especially if you look at some of the major big data projects:
their documentation usually is lagging behind quite substantially. Okay, there may be users who are not fond of using mailing lists. There exist communication forums, in this case Discuss at Elasticsearch, which provide good access for people who are uncomfortable
with using stuff like mailing lists and who would prefer a discoverable user interface, and which also provide a side channel that gets mirrored to a mailing list so that people can interact either way they want. Honestly, I'm a mailing list person,
but at least for Elasticsearch, this thing is configured well enough so that I can deal with just using the browser front end. I wasn't convinced to begin with at all. Right now I'm a convert. Okay, now you've got potential, now you've got users and you've got potential contributors to your project. How do you turn them into contributors?
One thing that I found helps is prioritization. One of the most common questions we got at Apache Mahout was: I want to contribute, what can I do? You know, like: go to the JIRA issue tracker and look for something that looks like it fits your need.
What's actually usually slightly more effective is to use the project yourself and fix something, scratch your own itch, kind of-ish. Fix something that bugs you hard enough. The second most common question you get
is: when will you implement or ship feature X? What's a common answer for that? The common answer is: patches welcome. Sounds pretty deflective, pretty defensive: go away, I don't want that patch. What it truly means in the projects
that I've been involved with is: I'm sorry, I don't have the time, please help me. So it's really an invitation to help the project out. And this is also how the project should be using it. Why shouldn't you use it as a defensive strategy?
If your user actually sits down, writing this patch, and from the very beginning you had no intention of merging it or using it, then this will turn into a huge mess and a huge mass of frustration, because there's a user who put lots of time into it, probably spent a lot of time cleaning it up,
so better tell them upfront if you don't want that: if you really want that, you're free to fork the project, go ahead, but this is not the direction that we want to go in. Now about inviting contributions. What I've learned the hard way at Apache Mahout
is that you should be explicit about which kind of contributions you want. Apache Mahout is about machine learning, so what people thought we were after was just implementations of new machine learning algorithms. After a couple of years, this was like the least wanted patch ever.
What we wanted was cleanup, more testing, more documentation, help on the mailing list, help with public relations, help with helping other users, et cetera, help with scaling, help with benchmarking. So at some point I sat down and wrote a call to action, a "Mahout needs your help" email,
listing everything that was not writing a new algorithm. So it helps to be very explicit about what you want. It also helps to write down how a contribution actually works, step by step. This is how you check out, this is how you communicate,
this is how you build, these are the tools you need for building, and this is how you contribute your patch. I know several senior-level software engineers who have no clue how to read a diff. They're great engineers, they do great architectural work, they do great coding, they still don't know how to read a diff
or how to read a patch. So if you want these people, and if you want to get them in, teach them and train them. What helps of course as well is to have an up-to-date issue tracker, to have real-time help requests there,
to track feature discussions there, and to make it visible to others which problems are currently being worked on or which have been decided as not being on the roadmap right now. A little anecdote on that: we ran a local Hadoop hackathon in Berlin, as a side hackathon to Berlin Buzzwords.
So we had many non-Hadoop coders here, but we also had a few core Hadoop coders. So we ran like a little poll. What do you want to do? Some people wanted to work on feature hacks, some people wanted to do use case of why. What was by far the task that was most voted for
was to get a walkthrough of what checking the project out, building it, making a change, and contributing to the project looks like, and maybe getting a glimpse of what the other side of the fence looks like, that is, what the developer on the other end does to your patch. So what helped for this hackathon
was that the Hadoop project had a few issues in their task trackers that were really trivial. Like, look, here's a typo in our documentation, go fix it. Here's a little typo in our variable name, go fix it. So people had some tiny change that probably didn't break anything,
and they could just walk through that process to get familiar with it. It's like if you're living in this open source world, this feels really, really natural. If you're living in a corporate world where oftentimes code reviews aren't even common practice, this can feel very, very scary even to a senior person. So having these tiny issues
that can get people started is very helpful. What else? What do you do if you have someone who submitted their first patch? Here's your patch. Clock's ticking. You wanna give feedback early. You wanna automate as much as you can
to avoid work on your site and decrease the time to first response. Hadoop does a good job with that. They've got like a patch checker that checks if all the tests have been submitted, if everything is correct style-wise, so no human being has to look at that. You also want clear rules
for what constitutes an acceptable patch. So if you say no, it's clear why. It's not a personal issue. Of course, people spending lots of time working on a patch only to have it rejected don't make happy contributors. So don't go through these endless cycles. When you do this review, remember that people may do this on their corporate time.
So context switching takes time. Projects move on. If it takes a couple months to get this patch in, you may no longer get the feedback that you need. So if this is an interesting patch, get the feedback and changes early on. Otherwise, you may have to do them yourself.
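The kind of automated first response described above can be sketched in a few lines. This is only an illustration, not Hadoop's actual patch checker: the function name `check_patch`, the 100-character limit, and the rule that a test file lives under a `test/` or `tests/` directory are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field

MAX_LINE_LENGTH = 100  # assumed style limit, not taken from any real project


@dataclass
class PatchReport:
    """Result of the hypothetical pre-check on one unified diff."""
    touches_tests: bool = False
    style_problems: list = field(default_factory=list)

    @property
    def acceptable(self) -> bool:
        # A patch passes only if it changes a test file and has no style issues.
        return self.touches_tests and not self.style_problems


def check_patch(diff_text: str) -> PatchReport:
    """Scan a unified diff and report missing tests and simple style issues.

    Simplification: any line starting with '+++ ' is treated as a file
    header, so an added line whose content begins with '++ ' is misread.
    """
    report = PatchReport()
    current_file = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            path = line[4:].strip()
            current_file = path[2:] if path.startswith("b/") else path
            # Assumed convention: tests live in a test/ or tests/ directory.
            if re.search(r"(^|/)tests?/", current_file):
                report.touches_tests = True
            continue
        if line.startswith("+"):
            added = line[1:]
            if added != added.rstrip():
                report.style_problems.append(f"{current_file}: trailing whitespace")
            if "\t" in added:
                report.style_problems.append(f"{current_file}: tab character")
            if len(added) > MAX_LINE_LENGTH:
                report.style_problems.append(f"{current_file}: line too long")
    return report
```

Running something like this on every submitted patch and posting the report back to the contributor gives them an answer within minutes, and a rejection then points at a written rule rather than at a person.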
How can you motivate as well? You can ship chocolate. I once was offered iTunes coupons for doing a patch. I didn't accept the iTunes coupons. What was more important to me was to get this person to blog about Apache Mahout. I was essentially writing this blog post
as a guest blog post, getting more reach. What I got as well for my first contribution was lots of thank yous. I got it in the JIRA issue that I didn't have to open myself. I got it in the commit message plus in the release notes. It's just a tiny little name mention.
But if you Google for my name plus a project, you still find it today. And it was very helpful to have this name mentioned there in order to justify to my employer why I was doing that because suddenly this employer could go out bragging about how one of their employees is actually into open source, which for them being a consulting company was a big thing.
Saying thank you only gets you so far. At some point, if people invest too much, you may want to think about getting them financed, getting them funded. So how do you find payment? There are foundations whose whole purpose
is to funnel money from sponsors to developers. You can seek funding for some projects; for some other projects, you can find freelance gigs for your collaborators. With that, a little warning:
Open source project owners and contributors usually wear multiple hats. In my case, multiple jackets. Today I'm wearing the Elasticsearch jacket, but I'm also talking about my experiences at the ASF and my experiences in the wider open source ecosystem, so always be aware of what hat is on your head.
Okay, speaking of funding, what do you need funding for? You need funding for the machines to host your infrastructure, like issue tracker, source control. You can use canned hosting or self-hosting. Currently it's pretty common to use GitHub. Remember that if your project is long-living,
GitHub may not be the coolest kid in 10 years' time. SourceForge once was and is no longer, so you may think about how easy it is to move all your assets out of that canned hosting. You need time to configure this infrastructure. Even if you use GitHub or even if you use an issue tracker that's hosted,
you still wanna configure it to meet your needs. You probably need machines to actually work on yourself sitting in front of you, like your laptop. And you need time to do coding work, and you need time to do the non-coding work, so that's why you need funding. And most likely there's different sponsors
for different points on that list. Speaking of funding, you can of course fund the ASF. That's how. If you need help, talk to me after this talk, and I can get you on that list. Okay, another thing, communication. You wanna communicate your vision very clearly,
so you can tell people what's going on, so that you can keep people out and stop them from spending time if what you do is actually not what they need, and you can embrace people and pull them in if this is what they really need. Also tell them what your priorities are to avoid discussion about a patch,
because what is better depends on your definition of quality. If you work in a tiny team, one-on-one communication is great. If a team grows, this turns into mass media and doesn't scale anymore, so you want a central app.
So what kind of communication channels can you have? You can have meetings in person. They're really high-bandwidth. You can talk to each other, you can see each other, you can communicate, you can talk back and forth. But they're expensive to set up. They are synchronous, both in time and space, so you have to ship people to one location.
They're also not durable, because they have to be repeated for every new human in the project. Let's go a step back. We do a video chat. It's still pretty high-bandwidth, but it's just faces and a little less body language. It's still kind of pretty expensive to set up, because it is still synchronous in time. Imagine having one person in Australia,
one person in the US. It's gonna be really, really tricky. So it needs good technology. You need a good internet connection. You need a good computer. If you've got someone where internet connection's bad, this is probably not an option. Also, it's barely durable. Imagine having to watch all the video chats again when you join the project.
You're not gonna do that. You can go for an online group chat. IRC is popular, there's HipChat, there's Slack, whatever. It's lower bandwidth, so it's text only. There's little cues like your partner's typing right now. It's rather cheap to set up, but it's still synchronous in time,
and it needs a decent client. It's rather durable because you can search the logs, but trust me, it's hard to follow in retrospect. There's web fora, low bandwidth, text only, cheap to set up, suddenly it's asynchronous. So somebody can post a question.
Somebody can post the answer once they are online, and there's no need for them to coordinate. It's pretty durable. You can search these discussions. You can follow archived discussions. That's pretty nice. Mailing lists are similar. Except now it's really text only.
You can use your issue tracker. It's nice because it's low bandwidth, it's asynchronous, it's durable, it's well-structured, but it's really, really fine-grained. If you look at the bug tracker of Elasticsearch Core, you will have a hard time figuring out what the strategy is unless you know exactly how this bug tracker is being used by the project.
So for higher-level views, you can use Wiki pages. They are, well, sometimes structured. You can go for web pages which hopefully are really well-structured, which lead you through everything you want, which have documentation, which have the high-level view. So what I wanna tell you is use
the right communication medium for the task at hand. You will need all of them. You will have to use all of them, like if you have a burning conflict where people are fighting each other, get them on a video chat to talk to each other. Probably sometimes it goes boof and all is fine. Sometimes it's just like, okay, technology failed us,
we misinterpreted, there was a misinterpretation of states. Like when I call my husband, and O2 tells me that his line is busy, and it's not actually busy, but he's just clicking me away because he's in a meeting, so this is like, okay, we talk an hour later, and we figure out O2 was the culprit, everything is fine.
No need to communicate through the broken channel anymore. What you do want is one canonical place for keeping current status. Where do you go to figure out if a build failure is fixed already? Is that on a mailing list? Is that in the issue tracker? Is that somewhere else? Have it in one place.
You want one canonical place for documentation. No separations there. You want one canonical place for tracking previous decisions. At Apache, this long-term memory is provided through mailing lists mainly. It can be different at different projects, but have one place where people can go.
If you pull more and more people in, you suddenly will think about mental health, and you will think about overcommitment. One thing to avoid, avoid the liquid effect, jumping on every task. I'm gonna do it. Five months later, it's still sitting there. So there's a couple dozen people who could have done it,
but didn't do it because you jumped on it. Leave it there, leave it sitting if you don't do it immediately. Maybe someone follows up. Avoid getting too much on your plate. At some point, you will need to tell people, my pipeline is full, patches welcome, please help me out. Ask for help.
So there's a few nice pages, especially at Apache, on remaining sane and not overcommitting. You will also have to think about physical health, because if you're sitting at your laptop for hours and hours and hours,
there's a good chance that your hands will be very angry with you. There's a good chance that your neck will be angry with you. I have a better time carrying my 11-kilogram child all day than sitting in front of my laptop all day. My neck is worse when I'm sitting in front of my laptop all day, because I'm like huddled together.
There was a great keynote at Berlin Buzzwords two years ago by Eric Evans on ergonomics and what pain he went through by not following this advice. If you don't believe my words, watch this keynote. Probably you will believe him. You will also need to think about project growth.
As soon as you have several people working on your project full-time, for newcomers it will feel like drinking from the fire hose, because they are contributing eight hours a day to your project, multiple people. So you will need to find a way if you want to have a diverse community
to enable those who don't have that time, who are probably doing it after work, or just as a side project at work, to follow up with what's going on. You will also need to deal with poisonous people who try to destroy the culture of your project by trolling or by asking the same questions
over and over and being very persistent. So you will need to figure out strategies to identify these people, to gather data about what's happening and how much energy is actually draining
from your project and to kick them out if worst comes to worst. There's a few talks, one by Ben Collins-Sussman and one by Brian Fitzpatrick on dealing with poisonous people. There is a great YouTube recording of a talk that Kristian Köhntopp gave at this very conference a few years ago on how he dealt with flame wars
and breaking communications. There's also a nice presentation on the Rust community online that talks about how publishing acceptable and unacceptable behavior, together with the countermeasures that are going to be taken if someone breaks this contract, can help build a friendly community
and can help keep people out that you do not want. Because one thing to keep in mind is, no matter which rules you set up, even if you don't set up any rules, you will always exclude some people. And somehow some culture will evolve. My advice would be to build this culture consciously,
to read strategies on how to build the cultures that you want to see. Some culture will evolve, it may not be the one that you want. Finally, change management. What's the biggest change in an open source project?
Leader leaves, nobody's there anymore. Prepare your exit well in advance. Me personally, I did Berlin Buzzwords first time in 2010 with no intention of running this event more than once. Unfortunately, we had attendees who wanted to come back.
So now I was left with a conference that depended on me doing all the marketing, all the outreach, all the sponsorship. Fortunately, not doing all the accounting, that's what I had a producer for, but I still had to do the talk selection together with a few other open source contributors who helped me with this conference,
one being Simon Willnauer, the other one being Jan Lehnardt, but suddenly we had to find a way to make this conference stand on its own feet. For me personally, it took four years to get rid of this conference without breaking it. So build this handover in from the beginning, document from the very beginning,
and delegate from the beginning, and find a way to build a memory into the project you do. So it's really mostly about delegation. And with that, it's time for me to wrap up and to start the discussion, because I want your feedback and I want your questions.
If you have any questions, I have the microphone. Well, come on. So what I'd like to ask you is,
especially if you set up a project and if you don't have the right communication culture, are there any hints you could give us to improve on that?
The first one about improving culture is: lead by example. You should be the committers that you want to meet coming to the project. Well, the other ones, there are tools out there. One book by Pieter Hintjens on building successful open source communities.
There's one on building successful online communities that you can read, where there's actual tools like gamification tools, being transparent, et cetera, that help you do that. There's quite a bit of literature that can help you. One thing that I find helps is to lead by example.
This helps very much. The other one is, if there is like rude behavior, to keep the discussion technical, but to call out people on their rude behavior, and then lead the discussion back to a technical topic. What also helps if you keep,
like sometimes you read an email and you're really upset about it. What helps is to take a breath, walk away from the keyboard, come back and reply to this email in sort of like a professional context, with keeping most of this anger out, and keeping all of the ranting out.
What you need for that is like a distance between you and the email you are just answering. One hint for the first three questions, you should go to the Elasticsearch booth. There's a surprise waiting for you.
Okay, I'm lucky then, I think. I wanted to ask, what did work for you best for event management from the infrastructure side? I've been organizing events myself, and I've been struggling finding an open source solution to gather all the details for a conference, for example. What do you mean with infrastructure?
For accounting, for managing speakers? Yeah, for managing speakers, websites. So, I do have an event producer. I don't like the systems they are using. The one that I like from a speaker's perspective is the one that FrOSCon is using, and that is frab.
There's also a talk, I think, today, I don't know, about what tools FrOSCon is using. So, if you don't make it to this talk, check the schedule; there will be a recording of it, and they will talk about it at great length. I'm interested because I'm a mentor at Google Summer of Code, and we are doing an open event tool, and we wanted to improve and maybe work also on that.
Thank you. Over there. Hello, thank you very much for the inspiring talk. I was just wondering, this is now a very long list of very big words, and if in the beginning you are just one coder with one idea, where do you start?
I mean, it's like, I feel a little bit overwhelmed. Do you take a depth-first approach where you say, okay, I do the code, and then I spread out, or do you go breadth-first before I code anything, I start building up all the departments? It depends on your preference. So, the way I started with Apache Mahout was
that I tried to find, so my goal was to build something not necessarily to build a business around, but to build something that would be bigger than me. So, the first step I did was to find people that are like-minded, that want to do the same thing. What we together then did was to figure out
if there's a project already there that covers our use case. Unfortunately, there wasn't, and the third thing then was, okay, we decide on a license, we decide on hosting, and then we go coding, and we never forget about getting new people in. So, it really depends on your goal.
I think there was another question. Yes. First, thank you very much for the talk, very interesting. Second, I was wondering if you know any studies, because you spoke about that it's very important
to answer quickly to pull requests, to contributions. Otherwise, they might get lost, and that makes a lot of sense. But I'm wondering if you know any numbers around how many contributions might be lost if you, you know, in open-source projects. Sorry, no clear numbers. No clear numbers.
Sorry? No numbers here. Oh, okay, it's interesting. Oh, but it would be an interesting study to do. We work at the same company. You can go to Elasticsearch Core, check our GitHub issues, and see how old they are. I'm actually thinking to the weight. Go ahead. So, we have one more giveaway,
because he's working for the same company, so same gimmicks. Sorry. One more question, come on. So, it's like a Rubik's Cube, there's power chargers, so I'm not talking about pens. Here you are.
One thing I didn't hear you talk about is when one's thinking about coding, one often tries to make the number of differences in approach to a project as small as possible from established practice. It seems to me there are big advantages
doing the same thing on the non-coding side, especially, for example, you talked about walking people through the committing process. There'll be a lot of people who are familiar with committing to other open-source projects, and presumably there's quite a lot of mileage in making your process only different when it's really important to your project.
Are, I suppose, I was wondering, really, whether you have anything to say about resources for, I suppose, turn-keying that in the same way as you can turn-key quite a lot of the technical, cyber-technical infrastructure for a project.
So, the reason why Apache Mahout is Java is because at the time when we created this project, there was a huge number of Java developers. So, we knew that there are just a tiny fraction of people who know about machine learning, and going for some exotic programming language
would even further reduce the communities that we can draw from. So, that's why, for Apache Mahout, we decided on Java. The reason to go for Maven/Ant was because it was a well-known, well-established build system, so that people who get started know how to get started. You may hate Maven as much as you want.
If you want to make it easy for Java devs to contribute, make your project follow the Maven structure, so they know where the source code is lying, so they know instantly how to get stuff into the IDE without reading documentation. Any additional hurdle makes it even harder
for people to get started. So, that's very, very true. So, those are all technical choices about the project. I was really wondering the same thing, maybe more with processes on the non, I was asking on the non-coding side, so making it as easy for people to apply their knowledge
from other projects on the non-coding side as it is on the coding side in the way you're talking about using skills that are common technical skills. So, for some of the topics I've been talking about, you don't see a lot of commonalities between projects.
I only have experience within the ASF and within Elasticsearch, of course, so it may well be that similar processes exist within the Debian community, Fedora community, and what have you. I can't talk about that because this is apparently not my background. What I found helpful when getting started with projects
is to write down some of these processes. When I started at Apache, nothing was written down, no, that's not true. I didn't find the documentation, sorry. Try googling for Apache and Jenkins. What you find is the documentation on how to set up Jenkins behind Apache. You don't find the Jenkins of the Apache Software Foundation.
Not helpful. So you need some good documentation on how things work. It does help to look across boundaries if you are within the ASF. It does help to have some insight into other communities. What I did realize from my own background is that this kind of looking behind the scenes
is pretty time consuming. So I've got some insight into the Linux kernel community by virtue of having a husband who's into this community. I've got some insight into the FSFE just because they are also in Berlin and I know some of these people. This is how it works, but it's time consuming
and there's, I know a few people who have experience with community, like crossing boundaries and comparing things and how things work. But it's definitely something that I would find interesting. Okay, then. No more gimmicks doesn't mean no more questions,
but you can talk to me afterwards. So you will be around at your booth for the remainder of the day? ASF booth, children's room here. So I've got a little giggling with me and I've got my husband with me. If you see a little girl about this age with a pitchy patchy fingering,
I'm probably not far away. All right. Then once again, thanks for your insights into project management of open source projects. Thank you very much.