The 750,000-line long pull request: crafting a more resilient open source community

Formal Metadata

Title

The 750,000-line long pull request: crafting a more resilient open source community

Title of Series

DjangoCon Europe 2019

Number of Parts

32

Author

Gomart, Anna-Livia

License

CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/45450 (DOI)

Publisher

DjangoCon Europe

Release Date

Language

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

As open source communities grow, they need to adapt to new dynamics: Different types of expertise between contributors, different expectations, etc. The arrival of a very large PR on the OpenFisca project became the test of the work we had put in to create a more resilient community. Since 2011, a community of developers and economists are developing OpenFisca, an open source framework in Python that turns law into software so it can be used by administrations, economists and activists. The contributors are split between tech experts and domain experts, each bringing interesting skills, mindsets and issues. However, a full time tech team became the full time core team, shifting the balance of the community. Having a full time team working on the project was a plus, but it gave tech experts a central role that put a lot of strain on domain experts’ contributions. In this talk, I will describe how we worked to create a dynamic community that can deal with uncertainty (new contributors, very large PRs, …) and grow to reach new heights in the hope it can inspire other communities.


Transcript: English (auto-generated)

00:00

Hello, everyone. So my name is Anna-Livia, and I'm on Twitter if you want to check that out. But other than finding very catchy titles for talks, I'm also a Python developer. And I co-host the PyLadies Meetup in Paris.

00:24

And I've been working on open source projects as a full-time job now for the last two years. And something that has been in the back of my mind for those two years is to see the relationship between tech and community.

00:42

And I realized that more and more, I don't want to separate soft skills from other skills that would be not soft. I don't know. But basically, this idea that they're completely separate things is bothering me more and more. And I wanted to talk to you today about a case where

01:02

the fact that we worked on the community helped us to deal with a big technological challenge. So what's going on? First, I'm going to talk to you about this project called OpenFisca, because I need to tell you a bit about the context around the 750,000-line

01:22

pull request. And then I'm going to talk about building bridges, which is one of the two things you can do with social capital. One is building bridges, and the other one is doing some bonding. And we're going to see when you can use one and the other to deal with technological challenges.

01:42

And finally, we'll have some closing words, and I'll take some questions. OK, so what is OpenFisca? OpenFisca is an international and contributive open source project. On GitHub, it is OpenFisca. And the idea is to turn law into code.

02:01

So the idea is that you take any fiscal or benefit text of law, and you turn it into Python. So as one of my friends said, it's like The Sims, but for real life. So this is an example.

02:21

For countries starting out, we have something called the country template. And it's basically like when you run django-admin startproject, and it just creates the whole architecture of your project. This is what you do when you want to use OpenFisca: you use our country template. And we have some very simple equations.

02:42

I don't know if you know, but fiscal law is complicated. So I'm just going to take that very simple example. But basically what it does here, it says that if you want to have the income tax of someone, then you take that person's salary and you multiply it by something called an income tax rate. And that would give you their personal taxes.
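The equation described here can be sketched in plain Python. This is only a minimal illustration, not OpenFisca's actual API: the function name `income_tax`, the constant `INCOME_TAX_RATE`, and the 15% value are assumptions made for the example.

```python
# Hypothetical rate, chosen only for illustration; real rates come from the law.
INCOME_TAX_RATE = 0.15


def income_tax(salary: float, income_tax_rate: float = INCOME_TAX_RATE) -> float:
    """Return a person's income tax: their salary multiplied by the tax rate."""
    return salary * income_tax_rate


if __name__ == "__main__":
    # A person earning 2,000 pays 15% of that amount under the assumed rate.
    print(income_tax(2000.0))
```

Keeping the rate as a parameter rather than hard-coding it in the formula mirrors the idea in the talk that the rate is a named quantity ("something called an income tax rate") separate from the calculation itself.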

03:07

So what do we do with that? Because it's very nice to have Python equations. We all like them. But what do we do with that? Two things, two projects right now, mostly in France, but in other countries as well. LexImpact, which is a tool for Parliament, so

03:24

that they can know what effects changes in the tax law will have on certain types of households. And the other one is Mes Aides; there is a French version, and there is now a version for the city of Barcelona. And the idea is that you input your whole situation,

03:42

how many kids you have, how much you earn, and all that information. And then it tells you which benefits you can apply for. So instead of going around and having to apply for one after the other after the other, you just run this one simulation once, and it tells you everything you can apply for.

04:02

So how does it work? So OpenFisca is like a game console. Pretty much you have one big engine that we call OpenFisca Core, and then you have several cartridges that are the country packages, so you have one engine, and then you can have the Tunisian cartridge, or the French cartridge, or

04:22

the one for New Zealand, and you also have local cartridges such as the city of Barcelona, or the help from the city of Paris. And to do that, we have the core, and the core uses vectorial computing, which basically means you can run simulations on millions of households at a time,

04:41

which is great for researchers who use OpenFisca on anonymized databases of the whole French tax system. And they can do that because it's vectorial computing. So we have two types of experts in that community. One is tech experts who do the engine mostly, and
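The idea of applying one formula to many households at once can be sketched as follows. OpenFisca itself relies on NumPy arrays for this; the sketch below uses plain Python lists only to stay dependency-free, and the function name `income_tax_for_all` is an assumption made for the example.

```python
from typing import List, Sequence


def income_tax_for_all(salaries: Sequence[float], income_tax_rate: float) -> List[float]:
    """Apply the same formula to a whole column of salaries in one call.

    Each position in `salaries` stands for one household, so the result
    holds one tax amount per household, in the same order.
    """
    return [salary * income_tax_rate for salary in salaries]


if __name__ == "__main__":
    # Three households simulated in a single call instead of three separate runs.
    print(income_tax_for_all([1000.0, 2000.0, 0.0], 0.5))
```

The point of the design is that the formula is written once against a column of values, so scaling from three households to millions changes the input size, not the code.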

05:02

the other one is economics experts who understand the law. So the more tech experts you have, the better the open source project, because you will have code that is reusable, because you will have tools to help new contributors come in, because you will have complete documentation.

05:23

And if you have more econ experts, your systems will be, so you'll have new use cases, and you can use all that code

05:41

to do more simulators and to create more value for citizens. But in every open source community, you sometimes have the issue of interpersonal conflicts. And when there is no social capital left,

06:00

well sometimes these people fork. A note about forking, what is it? The idea of forking is that you have a community that has a product, and then suddenly, a part of the community wants to change and the other doesn't. And so they just decide to move away. The problem with forking is that usually after a while,

06:22

you cannot really reconcile the two projects anymore. So it's a big loss for a project when someone leaves, because as time passes, it's harder and harder for those contributors to come back. And when I arrived on the project, one of the contributors had just left, and so

06:44

we wanted to prevent this. So we worked a lot on, as well as working on the engine and working on making the documentation better and working on creating new content. We actually worked on this social capital,

07:00

which is like having a newsletter that doesn't ask a lot of work from you. I don't know how many of you started a newsletter and it kind of disappeared after a while. It happened to me a lot. It always seems like a good idea, but after a while, you need to keep on doing it. So we found a way to have a newsletter every two weeks.

07:20

And what we did is we just took the names of the PRs we merged, and we just listed them, and it's good enough. It's good enough to have a way to talk to your audience, to show them that things are moving along, and to show their work. Because as they receive the newsletter, they tell you about their news.
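The low-effort newsletter described here, a plain list of merged PR titles, can be sketched like this. It is a hypothetical helper, not the team's actual tooling: the function name `build_newsletter`, the default heading, and the sample titles are assumptions, and fetching the titles from GitHub is left out.

```python
from typing import Iterable


def build_newsletter(
    merged_pr_titles: Iterable[str],
    heading: str = "Merged in the last two weeks",
) -> str:
    """Format the titles of merged pull requests as a plain-text newsletter."""
    lines = [heading, ""]
    # One bullet per merged PR, in the order the titles were given.
    lines.extend(f"- {title}" for title in merged_pr_titles)
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample titles invented for the example.
    print(build_newsletter(["Fix docs", "Update tax rates"]))
```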

07:41

And you can put it in the newsletter. We started having monthly after-work events and co-working sessions where new contributors can come to our offices, get bootstrapped, and ask questions in real life. And for people who are not used to working in open source, being able to talk to someone instead of writing down an issue

08:01

can be something very useful. And finally, we started having roadmapping workshops where everybody in the community could come in and we could agree together on where that core component should go.

08:20

And so this is where we were. We were working on knowing more about our contributors. And suddenly, this happened. One fine morning. So this is the error message GitHub gives you when you have more than 3,000

08:41

changed files. I had never seen it before. And the first thing to do is not panic, because things are not as dire as they seem. We panicked a little bit at first, but then we realized we know this person. We've had coffee with them, and we've worked with them on other projects.

09:04

So we just gave them a call and tried to understand their context. And that's one of the first things you need to do when you work on that social capital inside your open source project: understand that your context as a developer might not be the context of other people who have

09:22

other kinds of work environments, other experiences, and other ways of working. And especially in the economics world, for example, one of the things that we discovered is that their habit was to finish a project completely before they would show it to the world.

09:41

So the idea was this person, she worked for three months, and she updated the whole French tax system. So this PR would update everything tax-wise in OpenFisca France, which is a huge deal for us. And she just waited until everything was perfect to open the PR,

10:01

which is a good intention. So we analyzed it with her, and we discovered that she had actually written a script to automatically generate tests, which was great. I mean, she understood that tests were really important, but they were randomly, automatically generated.

10:20

So once we took that out, we went from 752,000 lines to a mere 215,000. So that was a big improvement on the work we had to do now. But still, it's a large number of lines to read.

10:40

So with my colleagues, there were three different strategies. One was the just push the merge button and let's go to lunch strategy. We'll deal with it later. Another one was to just reject it, just close that PR, say no. But that would mean maybe our contributor, who did such great work,

11:01

would just fork and maybe not come back. And the other one was to find a compromise. And that's the option we chose, because we thought this is just too much work, she worked for three months. If we close it now, if we say no now, in human terms,

11:20

it's not something you come back from very easily. So we met. We met for half a day and we talked. We talked about their context. We talked about what they could give in and what we could give in so that we could move forward together. So in the end, we agreed on 15 smaller PRs

11:40

that would amount to the 200,000 lines of code. We agreed on review guidelines: which parts of PEP 8 we would apply, which parts of testing we would apply. And especially, the point that was the most important is that they needed the tools to succeed.

12:02

Because telling someone just do a cherry pick and then a rebase and I'll come back in a few days. This doesn't fly. You need to be with people and you need to hold their hands at first so that they know how to do this and then they can teach others how to do this.

12:21

A quick point: what is git rebase? Git rebase allows you to take a branch from your project and then actually base it on another commit. And cherry-picking allows you to take a commit from a branch and put it on another branch, which is quite useful.
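The two operations just described can be pictured with a toy model in which a branch is simply an ordered list of commit labels. This is an illustrative sketch, not how Git is implemented: the function names and commit labels are assumptions, and real Git creates new commit objects with new hashes when it replays or copies commits.

```python
from typing import List


def cherry_pick(target_branch: List[str], commit: str) -> List[str]:
    """Copy a single commit onto the tip of the target branch."""
    return target_branch + [commit]


def rebase(branch: List[str], old_base: List[str], new_base: List[str]) -> List[str]:
    """Replay the commits that belong only to `branch` on top of `new_base`."""
    # Everything after the old base is the branch's own work.
    own_commits = branch[len(old_base):]
    return new_base + own_commits


if __name__ == "__main__":
    main_before = ["A", "B"]
    feature = ["A", "B", "X", "Y"]      # branched from main at B
    main_after = ["A", "B", "C"]        # main moved on in the meantime

    print(rebase(feature, main_before, main_after))
    print(cherry_pick(main_after, "X"))
```

In this picture, splitting one huge branch into smaller pull requests amounts to cherry-picking groups of related commits onto fresh branches, then rebasing each one as the main branch moves.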

12:43

But those are technical tools. The real thing that we developed is social capital. Because by doing those 15 PRs, it took us two months, 65 days, to merge those 215,000 lines of code. And by doing this, we created social capital.

13:03

We created trust in our project. So what is social capital? Social capital has many ways of being defined. It's quite elusive, so I'm going to define it as the quality and momentum of relationships within a community. And there are ways you can create that momentum

13:23

and that positive quality in your project. And the first thing you want to do is put energy and enthusiasm into what you do, but also into what others do in your community. And, I mean, if you want to create that social capital,

13:40

something that is great is organizing things. Organizing conferences, organizing events around what you do. Ease cooperation. Make it easier for people to work with you. And find the things that are kind of hard. Let's say documentation that is not quite right. For us, doing our country template

14:02

was something that helped a lot of new contributors work with us. And finally, work on trust and reciprocity. Having guidelines that you never budge on will not create that trust. Trust means taking some risks and people can then take risks with you.

14:21

So it's important to find those opportunities to build trust. So once you have capital, what do you do with it? There are two things you can do with it. And I think, I mean, there are many things you can do with it. But two of the main things are called bridging, which means that you're going to create external ties.

14:41

For example, for us, we used our social capital to bridge the gap between our developers' culture and the economists' culture. But the other thing you can do is called bonding, which means strengthening the ties within your community. And I have another example of an open source community

15:01

that used social capital to do something great, but more in the bonding than the bridging sense. I don't know if you've heard of OpenStreetMap, but it's a contributive open source project. And they had this project of delimiting city limits in France.

15:21

There are more than 36,000 cities in France. It took them six years to do it. So it's an open source project doing a six-year-long technical challenge. And by their own account, having some community meetings, having their yearly conference

15:40

was one of the determining factors in their success. So finally, what does it mean for open source? Well, we often think about community work as strengthening the bonds between us, and I think it's great. But if you want more diversity,

16:01

I think we should use that social capital we built to actually build bridges towards other kinds of skills than the ones we have, but also other kinds of people than those we usually see in open source. Because if we can create the tools they need and the paths they need to become part of our communities,

16:20

I think we'll be richer and we can have those open source projects that are well designed and well marketed which can be used by other types of people. So thank you very much, and if you have any questions, we'll be happy.

16:40

Thank you. Are there any questions? There's always a question, Russell. Sorry, yes, there is always a question. The contributor who submitted this 100,000 line patch

17:03

or however many it was, the question that comes to my mind is, how did we get into a situation where they got so far down that path before they came to you or you even knew they were working on that? Preempting a problem is always better than solving it. How much do you think the role of setting up expectations

17:24

about how people are going to engage before they start engaging is part of this process? So if I understand the question well, you are asking first, was there any way for them to kind of ping us beforehand before we got to this part?

17:44

And then if we could have some kind of documentation for it, like a ritual. So what I understood from my conversation with them is that for them, well, for her beforehand, you wouldn't show your work before it was perfect.

18:01

So if you have that in mind, there's no reason for you to ping anyone until it's perfect. And I mean, if you want information, you have to look for it and that's the thing. If you think you know, you're not going to look for that information. But now she knows, so now she tells her colleagues

18:20

and so we have a better community for it. So having gone through this experience and having been on the other side of this experience now, it really helped the community as a whole learning this together. But it's still an issue and I think it can be an issue for any community working with people who are not used to

18:40

just opening a PR as soon as it works. There's this idea of pride, there's the fear of being judged on your work and all of that I think we need to have also, we need to behave ourselves better when we answer and when we talk to new contributors to show them that any work is good work.

19:02

And I mean, at least that's what I believe and I believe we can have better open source communities if we can convey this idea. All right. Are there more questions? Are there questions from the internet audience? No.

19:20

Okay. Thank you again. Thank you very much. Round of applause.