
Beyond Wikipedia


Formal Metadata

Title
Beyond Wikipedia
Subtitle
Discovering Wikimedia's Open-Source Ecosystem
Number of Parts
542
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
While the Wikimedia Foundation is best known for its flagship project, Wikipedia, and the MediaWiki software that powers it, the Foundation's open-source ecosystem extends far beyond these well-known projects. In this talk, we will explore the fascinating world of Wikimedia's open-source tools ecosystem and the cloud infrastructure that makes it possible. We will showcase some of the coolest tools and projects, and we will highlight the unique opportunity that the Foundation offers for contributing to its cloud infrastructure – a rare chance to work on infrastructure for a cause that does good in the world, supporting the Foundation's mission of providing free and open knowledge to the global community. Whether you are a seasoned open-source developer or a newcomer to the field, this talk will provide valuable insights and inspiration for getting involved in Wikimedia's vibrant community of contributors.
Transcript: English (auto-generated)
Hi and welcome, everyone. I'm here today to talk to you a little bit about Wikimedia's open-source ecosystem. I assume all of you know what Wikipedia is, and maybe some of you know that it runs on software called MediaWiki. All the Wikimedia wikis run on this software, but tens of thousands of other websites around the world use it too; a very cool example is NASA, which uses it for some of its projects. MediaWiki is the core of Wikipedia and the other projects, and of course it is open source, so anyone can contribute to it. But it's not what I'm going to talk about today.

Surrounding Wikipedia and all the other projects, there is a huge ecosystem of software tools. You can think of these as third-party integrations: people build bots that make edits on Wikipedia or fight vandalism, there are machine learning algorithms, and there are data pipelines that feed research. There's a lot going on, and this is the part I'm going to talk about.

Just a quick word about me. I'm a software engineer on the Technical Engagement team. As you can see here, we're part of the Technology department, and our team is split into two parts. The Cloud Services team is made up of SREs and engineers who build services and platforms for all these tool developers. Then there is Developer Advocacy, who do a lot of things: they write documentation, run outreach programs, and do everything they can so that our technical contributors can build cool stuff on top of our platforms and content.

To give you an idea of the scale, over 300,000 editors contribute to Wikimedia projects every month. Wikimedia Commons, the project for free media files such as videos and images, now hosts over 90 million media files, and 1.7 billion unique devices access Wikimedia. Again, these are monthly statistics.
Some of you may recognize at least some of these projects. Wikipedia is of course the flagship, but there are others, like Wikidata, Commons (which we just mentioned), Wiktionary and many more. We're going to take a look at the tools ecosystem I mentioned at the beginning, starting from Wikipedia, since that's what most people know about us. From there we can explore the tools: community-created software that interacts with and contributes to Wikimedia projects in some way. An example here is Pywikibot, a framework for building bots. If you have a wiki and want to run a bot that makes some type of edits, you would very likely use Pywikibot as the framework to develop it. From the tools themselves, we'll move on to the services and platforms that support them.

Let's start with an example of a couple of tools and how they integrate with one of the projects. This is a WikiProject called Women in Red. A WikiProject is a group of Wikipedia users who decide to come together and work on something specific, and in this case the goal is to fight the content gender gap. They observed that only around 15% of English Wikipedia's biographies were about women, and as of 23 January this year they had managed to raise that number to around 19.45%, over about seven or eight years.

So where does this very precise statistic come from? You can see it mentioned here in red: it's called Humaniki. This is what we would call a tool, and in this case it's a dashboard. It provides statistics about the gender gap across all the Wikimedia projects; here, female content is the orange part and the rest is male. If you go to the website, you can see the data in a more granular way: by country, by project, by date of birth and so on.

If you want to contribute to this project, an easy way is to go to the WikiProject page, where you can find different curated lists. For instance, here we have female activists, so you get a list of all the female activists, and there are many, many categories like this. Some of these lists are curated by humans, but most of them actually come from a bot called ListeriaBot, which runs queries on Wikidata, another one of our projects. You can think of Wikidata as a huge knowledge graph that you can query with a language similar to SQL called SPARQL, so you can use it to get lists with very high granularity: activists from Germany, or activists from Germany who were born after a certain date. That is what ListeriaBot does.

So we have seen two different tools, a dashboard and a bot, and there are thousands of these tools with thousands of maintainers. Let's look at how we sustain these people. As I mentioned, my team is the Cloud Services team: we provide hosting, compute in the form of virtual machines, and data services so that all these tools can function. Again, to give you an idea of the scale, as of 2020, 30% of all edits on Wikipedia were made by bots hosted on our services. So this is quite an important part of the ecosystem.
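ListeriaBot's lists, described above, come from SPARQL queries against Wikidata. The following is a minimal Python sketch, not ListeriaBot's actual code, of how such a query ("female activists, optionally from a given country and born after a given date") could be built and sent to the public Wikidata Query Service. The endpoint URL is the public one; the property and item IDs listed in the comments, the function names and the User-Agent string are assumptions for illustration, so check them before reusing anything.

```python
import json
import re
import urllib.parse
import urllib.request
from datetime import date

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Wikidata IDs used below (assumed; verify on wikidata.org before reuse):
# P31 instance of, Q5 human, P21 sex or gender, Q6581072 female,
# P106 occupation, Q15253558 activist, P27 country of citizenship,
# P569 date of birth.


def build_activist_query(country_qid=None, born_after=None, limit=100):
    """Build a SPARQL query for female activists, optionally filtered
    by country of citizenship and by a minimum date of birth (YYYY-MM-DD)."""
    lines = [
        "SELECT ?person ?personLabel WHERE {",
        "  ?person wdt:P31 wd:Q5 ;",
        "          wdt:P21 wd:Q6581072 ;",
        "          wdt:P106 wd:Q15253558 .",
    ]
    if country_qid is not None:
        # Only accept item IDs such as "Q183" so no arbitrary text reaches the query.
        if not re.fullmatch(r"Q[0-9]+", country_qid):
            raise ValueError(f"not a Wikidata item ID: {country_qid!r}")
        lines.append(f"  ?person wdt:P27 wd:{country_qid} .")
    if born_after is not None:
        # fromisoformat() rejects malformed dates before they reach the query.
        day = date.fromisoformat(born_after).isoformat()
        lines.append("  ?person wdt:P569 ?born .")
        lines.append(f'  FILTER(?born > "{day}T00:00:00Z"^^xsd:dateTime)')
    # The label service fills in ?personLabel with the English label.
    lines.append('  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }')
    lines.append("}")
    lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


def run_query(query, user_agent="hypothetical-example-tool/0.1"):
    """Send a query to the endpoint and return the list of result bindings."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}", headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]
```

For example, `run_query(build_activist_query("Q183", "1970-01-01"))` would ask for activists with German citizenship born after 1 January 1970; each returned binding holds `person` and `personLabel` entries whose `"value"` field contains the item URI and the English label. Building the query from validated parts keeps the "very high granularity" filters composable without string-pasting user input into SPARQL.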
As I said, there are thousands of tools, and for a couple of years now we have had a catalog where you can browse and search for the tool you need for your project; if you are a tool maintainer, you can add your tool there so that people know it exists. What you see here are lists that have been curated. We also have something called the Coolest Tool Award, and if you look down, you can see that Humaniki was one of the award-winning tools in 2021.

Some of you may recognize this as a Jupyter notebook. This is a JupyterHub deployment we run that is directly integrated with all of our data services, so people can access dumps, wiki replicas and a lot of other data that they would otherwise have to download onto their own computers, gigabytes upon gigabytes of it.

A similar tool is Quarry, a public query interface for the wiki replicas. I didn't mention this before: the wiki replicas are replicas of our production databases. The cool thing about Quarry is that all queries are public, so people can search other people's queries and be inspired, or, if you're not very good with SQL, adapt someone else's query to your needs. Here you can see a specific query.

These services are still tools that serve this ecosystem, but we also need somewhere to host them. So we have a platform-as-a-service offering called Toolforge. It's not quite as fancy as Heroku or DigitalOcean or anything of the sort; if you look closely, you'll see that you actually have to SSH into it. But it's still very powerful and convenient for our users. It integrates with the data sources, and it provides managed databases, a Redis instance and an Elasticsearch cluster that everyone can use without having to maintain these systems themselves.
Toolforge's backend is Kubernetes. For more complicated projects, for instance ones that need more compute, we also have a Cloud VPS offering, so that people can spin up their own virtual machines and do basically whatever they want on them. This runs on top of OpenStack.
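To make the data and hosting layers above a little more concrete, here is a minimal Python sketch of the kind of small web tool someone might host on Toolforge: a WSGI app that runs a Quarry-style SQL query and returns the result as JSON. It is an illustration under stated assumptions, not Toolforge's documented setup. A real tool would query a MariaDB wiki replica with its own credentials, whereas this sketch uses an in-memory `sqlite3` database with made-up, hypothetical rows; the table mirrors only a handful of columns of MediaWiki's `page` table; and the `app` entry-point name is an assumed hosting convention.

```python
import json
import sqlite3


def make_demo_db():
    """Hypothetical stand-in for a wiki replica: an in-memory SQLite table with
    a few of the columns of MediaWiki's `page` table and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,"
        " page_title TEXT, page_len INTEGER, page_is_redirect INTEGER)"
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?, ?, ?)",
        [
            (1, 0, "Article_A", 5200, 0),    # main-namespace article
            (2, 0, "Article_B", 4800, 0),    # main-namespace article
            (3, 0, "Redirect_to_A", 40, 1),  # redirect, filtered out below
            (4, 1, "Article_A", 900, 0),     # talk page (namespace 1), filtered out
        ],
    )
    return conn


# A Quarry-style query: the longest non-redirect pages in the main namespace.
# SQLite uses "?" placeholders; a MariaDB driver would use its own paramstyle.
LONGEST_ARTICLES_SQL = """
    SELECT page_title, page_len
    FROM page
    WHERE page_namespace = 0 AND page_is_redirect = 0
    ORDER BY page_len DESC
    LIMIT ?
"""


def longest_articles(conn, limit=10):
    """Run the query and shape each row as a small JSON-friendly dict."""
    rows = conn.execute(LONGEST_ARTICLES_SQL, (limit,)).fetchall()
    return [{"title": title, "length": length} for title, length in rows]


DB = make_demo_db()


def app(environ, start_response):
    """WSGI entry point (the name `app` is an assumed hosting convention)."""
    body = json.dumps(longest_articles(DB, limit=5)).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the app on port 8000. Keeping the query separate from the web layer means the same SQL could be tested in a notebook or in Quarry first and then wrapped in a tool.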
So how can you get involved? It's possible to get involved in any of these layers, either as a tool maintainer or as a maintainer of any of these platforms. That's the thing I wanted to highlight today: this is a unique opportunity to contribute to platforms and infrastructure. We have people who work with our team; they are on our IRC channels and push patches just like everyone else, and if you didn't know, you would think they were just another software engineer on the team. I asked some of them what brings them here. Of course, many of them identify with the free knowledge movement, open source and all of that, but many also said that this is a unique opportunity to work with things like OpenStack, Terraform or Kubernetes in a setting with real traffic and real users, which is very difficult to do at home. There aren't many other projects where you would have this possibility.

So, some ways to get involved. We have several outreach programs. Outreachy is an internship that runs twice a year and is targeted more towards people from underrepresented groups. Google Summer of Code runs once a year. Both programs are open to anyone, so you don't have to be a student; you can be someone who is changing careers or making some kind of lateral move. Google Summer of Code has also become more flexible: it's not just in the summer anymore, and there are both shorter and longer projects. So that could be a way to get involved and get some hands-on mentorship.

Another way would be to come to the Wikimedia hackathons. We have one in Athens in May this year, and another as part of Wikimania, which takes place in Singapore in August.

And of course, if you are brave, you can just dive right in, because everything we do is open and out there on the internet: not just the documentation, but even our project boards on Phabricator, a collaborative software tool for task management. If you go there, you will see a huge variety of tasks. You can see the workboards of different teams at the Foundation, volunteer-led projects, and projects where people work alongside each other. So one way could simply be to find something that interests you, read the documentation, then come to our IRC channels and contact us, and that's it.

I have added some links that can be helpful for getting started, and you are of course free to reach out to me directly. My Twitter handle was on the first slide, and my slides are published on the website. We have 45 seconds for questions. Thank you.