
The 750,000-line long pull request: crafting a more resilient open source community

Formal Metadata

Title
The 750,000-line long pull request: crafting a more resilient open source community
Number of Parts
32
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
As open source communities grow, they need to adapt to new dynamics: different types of expertise between contributors, different expectations, etc. The arrival of a very large PR on the OpenFisca project became the test of the work we had put in to create a more resilient community. Since 2011, a community of developers and economists has been developing OpenFisca, an open source framework in Python that turns law into software so it can be used by administrations, economists and activists. The contributors are split between tech experts and domain experts, each bringing interesting skills, mindsets and issues. However, a full-time tech team became the full-time core team, shifting the balance of the community. Having a full-time team working on the project was a plus, but it gave tech experts a central role that put a lot of strain on domain experts’ contributions. In this talk, I will describe how we worked to create a dynamic community that can deal with uncertainty (new contributors, very large PRs, …) and grow to reach new heights, in the hope it can inspire other communities.
Transcript: English (auto-generated)
Hello, everyone. So my name is Anna Olivia, and I'm on Twitter if you want to check that out. But other than finding very catchy titles for talks, I'm also a Python developer. And I co-host the PyLadies Meetup in Paris.
And I've been working on open source projects as a full-time job now for the last two years. And something that has been in the back of my mind for those two years is to see the relationship between tech and community.
And I realized that more and more, I don't want to separate soft skills from other skills that would be not soft. I don't know. But basically, this idea that they're completely separate things is bothering me more and more. And I wanted to talk to you today about a case where
the fact that we worked on the community helped us to deal with a big technological challenge. So what's going on? First, I'm going to talk to you about this project called OpenFisca, because I need to tell you a bit about the context around the 750,000-line
pull request. And then I'm going to talk about building bridges, which is one of the two things you can do with social capital. One is building bridges, and the other one is doing some bonding. And we're going to see when you can use one and the other to deal with technological challenges.
And finally, we'll have some closing words, and I'll take some questions. OK, so what is OpenFisca? OpenFisca is an international and contributive open source project. The GitHub is OpenFisca. And the idea is to turn law into code.
So the idea is that you take any fiscal or benefit text of law, and you turn it into Python. So as one of my friends said, it's like The Sims, but for real life. So this is an example.
For countries starting out, we have something called the country template. It's basically like when you run django-admin startproject and it just creates the whole architecture of your project. This is what you do when you want to use OpenFisca: you use our country template. And we have some very simple equations.
I don't know if you know, but fiscal law is complicated. So I'm just going to take that very simple example. But basically what it does here, it says that if you want to have the income tax of someone, then you take that person's salary and you multiply it by something called an income tax rate. And that would give you their personal taxes.
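The equation described here can be sketched in a few lines of plain Python. This is only an illustration, not the code from the country template: the function name `income_tax`, the constant `INCOME_TAX_RATE`, and its 15% value are assumptions made for the example.

```python
# Hypothetical flat rate, chosen only for illustration.
# In a real system the rate would come from the legislation being modelled.
INCOME_TAX_RATE = 0.15


def income_tax(salary: float, rate: float = INCOME_TAX_RATE) -> float:
    """Return the income tax for one person: their salary multiplied by the rate."""
    return salary * rate


if __name__ == "__main__":
    # Print the tax for two example salaries.
    for salary in (1000.0, 2500.0):
        print(f"salary={salary:.2f} income_tax={income_tax(salary):.2f}")
```

Keeping the rate as a separate parameter mirrors the idea in the talk that the formula and the legal parameter are two different things: the formula stays the same when the rate changes.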
So what do we do with that? Because it's very nice to have Python equations. We all like them. But what do we do with that? Two things, two projects right now, mostly in France, but in other countries as well. One is LexImpact, which is a tool for Parliament, so
that they can know what effects changes in the tax law will have on certain types of households. And the other one is Mes Aides: there is a French version, and there is now a version for the city of Barcelona. And the idea is that you input your whole situation,
how many kids you have, how much you earn, and all that information. And then it tells you which benefits you can apply for. So instead of going around and having to apply for one after the other, you just run this one simulation once, and it tells you everything you can apply for.
So how does it work? OpenFisca is like a game console. Pretty much, you have one big engine that we call OpenFisca Core, and then you have several cartridges that are the country packages. So you have one engine, and then you can have the Tunisian cartridge, or the French cartridge, or
the one for New Zealand, and you also have local cartridges such as the city of Barcelona, or the benefits from the city of Paris. And to do that, we have the core, and the core uses vectorized computing, which basically means you can run a simulation on millions of households at a time,
which is great for researchers who use OpenFisca on anonymized databases covering the whole French tax system. And they can do that because it's vectorized computing. So we have two types of experts in that community. One is tech experts, who mostly do the engine, and
the other one is economics experts, who understand the law. So the more tech experts you have, the better the open source project, because you will have code that is reusable, because you will have tools to help new contributors come in, because you will have complete documentation.
And if you have more econ experts, you'll have new use cases, and you can use all that code
to do more simulators and to create more value for citizens. But in every open source community, you have the issue of occasional interpersonal conflicts. And when there is no social capital left,
well sometimes these people fork. A note about forking, what is it? The idea of forking is that you have a community that has a product, and then suddenly, a part of the community wants to change and the other doesn't. And so they just decide to move away. The problem with forking is that usually after a while,
you cannot really reconcile the two projects anymore. So it's a big loss for a project when someone leaves, because as time passes, it's harder and harder for those contributors to come back. And when I arrived on the project, one of the contributors had just left, and so
we wanted to prevent this. So as well as working on the engine, on making the documentation better, and on creating new content, we actually worked on this social capital,
for example by having a newsletter that doesn't ask a lot of work from you. I don't know how many of you started a newsletter and it kind of disappeared after a while. It happened to me a lot. It always seems like a good idea, but after a while, you need to keep on doing it. So we found a way to have a newsletter every two weeks.
And what we did is we just took the names of the PRs we merged, and we just listed them, and it's good enough. It's good enough to have a way to talk to your audience, to show them that things are moving along, and to show their work. Because as they receive the newsletter, they tell you about their news.
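The low-effort newsletter described here, a plain list of merged PR titles, could be produced by something like the sketch below. The function name `build_newsletter`, the header wording, and the sample titles are hypothetical. How the titles are collected (for example from the hosting platform) is deliberately left out.

```python
from datetime import date


def build_newsletter(merged_pr_titles, issue_date):
    """Format a list of merged PR titles as a plain-text newsletter body."""
    lines = [f"Merged pull requests ({issue_date.isoformat()})", ""]
    if merged_pr_titles:
        # One bullet per merged PR, in the order given.
        lines.extend(f"- {title}" for title in merged_pr_titles)
    else:
        lines.append("No pull requests were merged in this period.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented titles, for illustration only.
    titles = ["Fix a typo in the documentation", "Update a tax parameter"]
    print(build_newsletter(titles, date(2019, 7, 1)))
```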
And you can put it in the newsletter. We started having monthly after-work events and co-working sessions where new contributors could come to our offices, get bootstrapped, and ask questions in real life. And for people who are not used to working in open source, being able to talk to someone instead of writing down an issue
can be something very useful. And finally, we started having roadmapping workshops where everybody in the community could come in and we could agree together on where that core component should go.
And so this is where we were. We were working on knowing more about our contributors. And suddenly, this happened. One fine morning. So this is the error message GitHub gives you when you have more than 3,000
changed files. So I never saw it before. And the first thing to do is not panic, because things are not as dire as they seem. We panicked a little bit at first, but then we realized we know this person. We've had coffee with them, and we've worked with them on other projects.
So we just gave them a call and tried to understand their context. And that's one of the first things you need to do when you work on that social capital inside your open source project: understand that your context as a developer might not be the context of other people who have
other kinds of work environments, other experiences, and other ways of working. And especially in the economics world, for example, one of the things that we discovered is that their habit was to finish a project completely before they would show it to the world.
So the idea was this person, she worked for three months, and she updated the whole French tax system. So this PR would update everything tax-wise on OpenFisca France, which is a huge deal for us. And she just waited until everything was perfect to open the PR,
which is a good intention. So we analyzed it with her, and we discovered that she had actually written a script to automatically generate tests, which was great. I mean, she understood that tests were really important, but they were randomly and automatically generated.
So once we took that out, we went from 752,000 lines to a mere 215,000. So that was a big improvement on the work we had to do now. But still, it's a large number of lines to read.
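The triage step described here, setting the generated tests aside to see how much hand-written change is left, can be sketched as below. Everything in it is hypothetical: the file paths, the line counts, and the `tests/generated/` prefix are invented for illustration and are not taken from the actual pull request.

```python
def count_changed_lines(changes, excluded_prefixes=()):
    """Sum changed lines per file, skipping paths that start with an excluded prefix."""
    excluded = tuple(excluded_prefixes)
    return sum(
        lines
        for path, lines in changes.items()
        if not path.startswith(excluded)
    )


if __name__ == "__main__":
    # Invented example: changed-line counts keyed by file path.
    changes = {
        "parameters/income_tax_rate.yaml": 120,
        "model/income_tax.py": 80,
        "tests/generated/test_0001.yaml": 5000,
        "tests/generated/test_0002.yaml": 4800,
    }
    print("all changes:", count_changed_lines(changes))
    print("without generated tests:", count_changed_lines(changes, ["tests/generated/"]))
```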
So with my colleagues, there were three different strategies. One was the "just push the merge button and let's go to lunch" strategy. We'll deal with it later. Another one was just reject it, just close that PR, say no. But that would mean maybe our contributor who did such great work
would just fork and maybe not come back. And the other one was to find a compromise. And that's the option we chose, because we thought this is just too much work, she worked for three months. If we close it now, if we say no now, in human terms,
it's not something you come back from very easily. So we met. We met for half a day and we talked. We talked about their context. We talked about what they could give in and what we could give in so that we could move forward together. So in the end, we agreed on 15 smaller PRs
that would amount to the 200,000 lines of code. We agreed on review guidelines: which parts of PEP 8 we would apply, which parts of testing we would apply. And the point that was the most important is that they needed the tools to succeed.
Because telling someone just do a cherry pick and then a rebase and I'll come back in a few days. This doesn't fly. You need to be with people and you need to hold their hands at first so that they know how to do this and then they can teach others how to do this.
A quick point, what is git rebase? Git rebase allows you to take a branch from your project and then actually base it on another commit. And cherry picking, it allows you to take a commit from a branch and put it on another branch, which is quite useful.
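To make the two operations concrete, here is a toy model in Python. It is not how Git is implemented and it does not call Git: a branch is modelled as a simple list of commit labels, and the function names `cherry_pick` and `rebase` are just borrowed for illustration. Real Git creates new commits with new hashes and can stop on conflicts, which this sketch ignores.

```python
def cherry_pick(target, commit):
    """Return a copy of `target` with `commit` applied on top of it."""
    return target + [commit]


def rebase(branch, onto):
    """Replay the commits that are unique to `branch` on top of `onto`."""
    # Count the commits both branches share from the start (their common history).
    shared = 0
    for ours, theirs in zip(branch, onto):
        if ours != theirs:
            break
        shared += 1
    # Keep the new base as is, then re-apply only the branch's own commits.
    return onto + branch[shared:]


if __name__ == "__main__":
    main = ["A", "B", "C"]
    feature = ["A", "B", "X", "Y"]  # branched off after commit B
    print(rebase(feature, main))    # feature's own commits now sit on top of C
    print(cherry_pick(main, "X"))   # only commit X is copied onto main
```

In this model, splitting a large branch into smaller PRs amounts to cherry-picking a few related commits at a time onto fresh branches, then rebasing whatever remains once those are merged.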
But those are technical tools. The real thing that we developed is social capital. Because doing those 15 PRs took us two months, 65 days, to merge those 215,000 lines of code. And by doing this, we created social capital.
We created trust in our project. So what is social capital? Social capital has many definitions. It's quite elusive, so I'm going to define it as the quality and momentum of relationships within a community. And there are ways you can create that momentum
and that positive quality in your project. And the first thing you want to do is put energy and enthusiasm into what you do, but also into what others do in your community. And if you want to create that social capital,
something that is great is organizing things: organizing conferences, organizing events around what you do. Ease cooperation. Make it easier for people to work with you. And find the things that are kind of hard, let's say documentation that is not quite right. For us, doing our country template
was something that helped a lot of new contributors work with us. And finally, work on trust and reciprocity. Having guidelines that you never budge on will not create that trust. Trust means taking some risks and people can then take risks with you.
So it's important to find those opportunities to build trust. So once you have capital, what do you do with it? There are two things you can do with it. And I think, I mean, there are many things you can do with it. But two of the main things are called bridging, which means that you're going to create external ties.
For example, for us, we used our social capital to bridge the gap between our developers' culture and the economists' culture. But the other thing you can do is called bonding, which means strengthening the ties within your community. And I have another example of an open source community
that used social capital to do something great, but more through bonding than bridging. I don't know if you've heard of OpenStreetMap, but it's a contributive open source project. And they had this project of mapping city limits in France.
There are more than 36,000 cities in France. It took them six years to do it. So it's an open source project doing a six-year-long technical challenge. And by their own account, having some community meetings, having their yearly conference
was one of the determining factors in their success. So finally, what does it mean for open source? Well, we often think about community work as strengthening the bonds between us, and I think it's great. But if you want more diversity,
I think we should use that social capital we built to actually build bridges towards other kinds of skills than the ones we have, but also other kinds of people than the ones we usually see in open source. Because if we can create the tools they need and the paths they need to become part of our communities,
I think we'll be richer and we can have those open source projects that are well designed and well marketed which can be used by other types of people. So thank you very much, and if you have any questions, we'll be happy.
Thank you. Are there any questions? There's always a question, Russell. Sorry, yes, there is always a question. The contributor who submitted this 100,000-line patch,
or however many it was, the question that comes to my mind is, how did we get into a situation where they got so far down that path before they came to you or even knew they were working on that? Preempting a problem is always better than solving it. How much do you think the role of setting up expectations
about how people are going to engage before they start engaging is part of this process? So if I understand the question well, you are asking first, was there any way for them to kind of ping us beforehand before we got to this part?
And then if we could have some kind of documentation for it, like a ritual. So what I understood from my conversation with them is that for them, well, for her beforehand, you wouldn't show your work before it was perfect.
So if you have that in mind, there's no reason for you to ping anyone until it's perfect. And I mean, if you want information, you have to look for it and that's the thing. If you think you know, you're not going to look for that information. But now she knows, so now she tells her colleagues
and so we have a better community for it. So having gone through this experience and having been on the other side of this experience now, it really helped the community as a whole learning this together. But it's still an issue and I think it can be an issue for any community working with people who are not used to
just opening a PR as soon as it works. There's this idea of pride, there's the fear of being judged on your work, and with all of that, I think we also need to behave better when we answer and when we talk to new contributors, to show them that any work is good work.
And I mean, at least that's what I believe and I believe we can have better open source communities if we can convey this idea. All right. Are there more questions? Are there questions from the internet audience? No.
Okay. Thank you again. Thank you very much. Round of applause.