Beyond Wikipedia
Formal Metadata
Number of Parts | 542
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/61688 (DOI)
FOSDEM 2023 (part 227 of 542)
Transcript: English (auto-generated)
00:06
Hi, and welcome, everyone. I am here today to speak to you a little bit about Wikimedia's open source ecosystem. I assume all of you know what Wikipedia is, and maybe some of you know that it runs on software called
00:24
MediaWiki. All the wikis run on this software, but there are also tens of thousands of websites around the world that use it. A very cool example is NASA, which uses it for some of its projects. But this is sort of
00:42
the core of Wikipedia and the other projects, and it is of course open source, so anyone can contribute to it. But it's not what I'm going to be talking about today, because surrounding Wikipedia and all the other projects there is a huge ecosystem of software tools. You can
01:02
think of these as third-party integrations. People build bots that make edits on Wikipedia and fight vandalism. There are machine learning algorithms, and there are data pipelines that feed research. There's a
01:21
lot going on, and that is the part I'm going to be talking about. Just a quick word about me: I'm a software engineer with the Technical Engagement team. You can see we're part of the Technology department, and our team is split into two parts. We have the Cloud Services team, SREs and engineers that
01:45
build services and platforms for all these tool developers. Then we have Developer Advocacy. They do a lot of things: they write documentation, run outreach programs, and do everything needed so that our technical
02:01
contributors can build cool stuff on top of our platforms and content. So just to give you an idea of the scale of this, we have over 300,000 editors that contribute to Wikimedia projects every month. Wikimedia Commons,
02:23
which is the project for free media files (videos, images, and so on), now has over 90 million media files, and 1.7 billion unique devices access Wikimedia. Again, that is per month.
02:43
Some of you may recognize at least some of these projects. Of course, Wikipedia is the flagship. There are others like Wikidata, Commons, which we just mentioned, Wiktionary, and many more. We're gonna take a look at
03:00
the tools ecosystem that I mentioned in the beginning, and we're gonna start from Wikipedia. This is the thing that most people know about us, and from there we can explore the tools: community-created software that interacts with and contributes to Wikimedia projects in some way. An
03:20
example here is Pywikibot. It's a framework for building bots. So if you have a wiki and you want to run a bot that makes some type of edit, you would very likely use Pywikibot as the framework to develop it. From the tools themselves, we're gonna go and have a
03:43
look at the services and the platforms that support them. We're gonna start with an example of a couple of tools and how they actually integrate with one of the projects. This is a WikiProject called Women in Red, and a
04:04
WikiProject is a group of users on Wikipedia who decide that they want to work on something specific. They come together to work as a group, and in this case it's to fight the content gender gap. They observed that only around 15% of English Wikipedia's biographies were about women, and as of
04:27
the 23rd of January this year, they have managed to take this number up to around 19.45% in about seven or eight years. So where does this
04:41
very precise statistic come from? You can see it mentioned here, in red: it's something called Humaniki. This is what we would call a tool, and in this case it is a dashboard. It provides statistics about the gender gap on all the Wikimedia projects, and you can see here that female content is
05:02
the orange part and the rest is male. If you go to this website you can see it in a more granular way: by country, by project, by date of birth, and so on. So if you want to contribute to this project, an easy way
05:30
to do it is to go to this WikiProject's site, where you can see different lists that have been curated. For instance, here we can see female activists, so you can get a list of all the female activists, and there are
05:44
many, many categories like this. Some of these lists are curated by humans, but most of them actually come from a bot called ListeriaBot, which runs queries on Wikidata, another one of the projects. You can think of Wikidata as a huge knowledge graph that you can
06:05
query using a language similar to SQL, called SPARQL. So you can use Wikidata to get lists with very high granularity: you can have activists from Germany, or activists from Germany who were born
06:22
after a certain date. This is what ListeriaBot does. So we have seen two different tools: one was a dashboard, one was a bot. There are thousands of these tools and thousands of maintainers, and we're gonna
06:41
take a look at how we sustain these people. I mentioned that my team is the Cloud Services team, and what we do is provide hosting, compute (virtual machines), and data services for all these tools to
07:01
function. Again, to give you an idea of the scale of this: as of 2020, 30% of all the edits on Wikipedia were made by bots hosted on our services. This is just to make you aware that this is quite an important part of the ecosystem.
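As an aside on the Wikidata lists described a moment ago, here is a minimal sketch of what a "female activists from Germany born after a certain date" query could look like when built from Python. It is not ListeriaBot's actual code. The function names are hypothetical, and the property and item IDs in the comments (P21, Q6581072, P106, Q15253558, P27, P569, Q183) are assumptions that should be checked on wikidata.org before use. The block only builds the query text and a request URL for the public endpoint; it does not send anything.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_activist_query(country_qid: str, born_after: str, limit: int = 50) -> str:
    """Build a SPARQL query for female activists from one country born after a date.

    Assumed IDs (verify on wikidata.org): P21 = sex or gender,
    Q6581072 = female, P106 = occupation, Q15253558 = activist,
    P27 = country of citizenship, P569 = date of birth.
    """
    return f"""
SELECT ?person ?personLabel ?born WHERE {{
  ?person wdt:P21 wd:Q6581072 ;
          wdt:P106 wd:Q15253558 ;
          wdt:P27 wd:{country_qid} ;
          wdt:P569 ?born .
  FILTER(?born > "{born_after}T00:00:00Z"^^xsd:dateTime)
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


def build_request_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


# Q183 is assumed to be the Wikidata item for Germany.
query = build_activist_query("Q183", "1970-01-01")
url = build_request_url(query)
print(query)
```

Keeping the country and the date as parameters mirrors the point about granularity: the same template gives a broader or narrower list just by changing its arguments.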
07:28
I mentioned there are thousands of tools, and as of a couple of years ago we have a catalog where you can browse, search, and find the tool you need for your project. If you are a tool maintainer, you can add your tool here so
07:42
that people know it exists. Then what you see here are lists that have been curated. We have something called the Coolest Tool Award. If you look down you can see that Humaniki was one of the award-winning tools in 2021. Some of
08:03
you may recognize this as a Jupyter notebook. This is a JupyterHub deployment that we have that is directly integrated with all of our data services, so that people can access dumps, they can access the Wiki Replicas, they can access a lot of things that would otherwise be gigabytes and gigabytes and
08:25
gigabytes of data that they would have to download onto their own computers. A similar tool is called Quarry. It's a public query interface for the Wiki Replicas. The Wiki Replicas, which I didn't mention, are replicas of our
08:43
production databases. The cool thing about this is that all the queries are public, so people can search and see other people's queries and be inspired, or, if you're not very good with SQL, you can adapt someone's query to your needs. Here you can see a specific
09:11
query. These services are still tools that serve this ecosystem, but we also need somewhere to host them. So we have a
09:26
platform-as-a-service offering called Toolforge. It's not quite as fancy as Heroku or DigitalOcean or anything of the sort: if you look closely, you see that you actually have to SSH into it. But it's still very
09:41
powerful and very convenient for our users. Again, it integrates with our data sources, and it has managed databases, a Redis instance, and an Elasticsearch cluster that everyone can use without having to maintain all these systems themselves.
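To make the Quarry-style SQL queries against the Wiki Replicas a little more concrete, here is a rough sketch under loud assumptions. An in-memory SQLite database stands in for a replica (the real replicas are MariaDB and vastly larger), only a handful of columns from MediaWiki's `page` table are modelled, the sample rows are invented, and the function name `longest_articles` is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for a replica database (assumption: the real
# Wiki Replicas are MariaDB). Only a few `page` columns are modelled.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER NOT NULL,
        page_title TEXT NOT NULL,
        page_is_redirect INTEGER NOT NULL,
        page_len INTEGER NOT NULL
    )
    """
)

# Invented sample rows, purely for illustration.
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?, ?, ?)",
    [
        (1, 0, "Example_article", 0, 5400),
        (2, 0, "Example_redirect", 1, 30),
        (3, 0, "Short_stub", 0, 800),
        (4, 1, "Example_article", 0, 1200),  # namespace 1 is the talk namespace
    ],
)


def longest_articles(db: sqlite3.Connection, limit: int = 10) -> list:
    """Return (title, length) for non-redirect main-namespace pages, longest first."""
    return db.execute(
        """
        SELECT page_title, page_len
        FROM page
        WHERE page_namespace = 0 AND page_is_redirect = 0
        ORDER BY page_len DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


rows = longest_articles(conn)
print(rows)
```

A query of this shape is the kind of thing one could adapt from someone else's public query by changing the filter or the sort column.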
10:00
The back end here is Kubernetes. Then, for more complicated projects (some projects need more compute, for instance), we also have a Cloud VPS offering so that people can spin up their own virtual machines and basically do what they want on them. This runs on top of OpenStack.
10:32
And how could one get involved with this? So it's possible to get involved in any of these layers, either as a tool maintainer or as
10:46
a maintainer of any of these platforms. And that's the thing I wanted to highlight a little bit today: this is kind of a unique opportunity to actually contribute to platform and infrastructure. We have
11:08
people who work with our team; they are on our IRC channels and they push patches just like everyone else. If you didn't know, you would think they were just another software engineer on the team. I asked some
11:24
of them, what brings you here? And of course many of them associate with the free knowledge movement and open source and all of that. But many also said that this is a unique opportunity to actually play with things like OpenStack or Terraform or Kubernetes in a
11:46
situation where you actually have real traffic and real users, which is very difficult to do at home. There are not many other projects where you would have this possibility. So, some
12:01
ways to get involved: we have several outreach programs. We have Outreachy, an internship that runs twice a year and is targeted more towards underrepresented demographics. Google Summer of Code runs once a year. And
12:28
both programs are open to anyone, so you don't have to be a student. You can be someone who is changing careers or making some kind of lateral move. Google Summer of Code has also become more flexible: it's not just summer
12:42
anymore. There are shorter projects. There are longer projects. So that could be a way to get involved and get some kind of hands-on mentorship. Another way would be to come to the Wikimedia hackathons. We have one in Athens, in May
13:00
this year. Then there is one that is part of Wikimania, which takes place in Singapore in August. And of course, if you are brave, you can just dive right in, because everything we do is open and out there on the internet: documentation, of course, but even our project boards
13:25
on Phabricator, which is collaborative software for task management and such. If you go there, you will see that there is a huge variety of tasks. You can see the work boards of different teams at the Foundation. You can see volunteer-led projects and projects where people
13:44
work alongside each other. So one way could be simply to find something that interests you, look at the documentation, and then come to our IRC channels and contact us, and that's it. So yeah,
14:07
I have added some links which can be helpful to get started. And you are of course free to just reach out to me. I had my Twitter handle on the first slide. My slides are published on the website. We have 45 seconds
14:24
for questions. Thank you.