
#bbuzz: A Crisis of Critical Thinking


Formal Metadata

Title: #bbuzz: A Crisis of Critical Thinking
Number of Parts: 48
License: CC Attribution 3.0 Unported. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Abstract
A crisis of critical thinking: what Wikipedia can teach us about the future of information and technology's role.

Over time, we have all become avid consumers of information. Yet that information is increasingly fragmented and partisan, based on isolated effort instead of consensus. As we look to the future of information production, how can we build solutions that incentivize people to learn from each other, to understand the full complexity and nuance behind an issue instead of just their own perspective? For the past two decades, Wikipedia has been helping humans and machines process the entire spectrum of information, from lengthy research and graduate level dissertations at one end to soundbites and trivia questions dictated to voice assistants at the other. We can provide a valuable model for how technology can help the next generation of knowledge-seekers decode, decipher and analyze information. This talk will focus on knowledge in the next evolution of the internet, and how we must enable not just more access to information, but also critical thinking at global scale.
Transcript: English (auto-generated)
Hello, everyone, and thank you for tuning in, and welcome to Berlin Buzzwords. I'm excited to be here, and I thank you for showing up in support of this conference. This is, of course, a very different experience from the very first Berlin Buzzwords, held some 11 years ago. But regardless of whether it is in person in one of my favorite cities in the world, Berlin, or done virtually as we are now, I'm really excited to have the opportunity to kick things off with you all today.
Before we dive into the next 30 minutes, I do want to step back and acknowledge what's happening all around us, even while we still show up here and are trying to learn and improve ourselves. Whether you are from the United States or not, or personally affected by current
events, whether that is systemic racism in your home country, the ongoing global pandemic, the worldwide protests, or whether you are simply deeply concerned about your ability to earn a living, I hope you and your loved ones are safe. And I hope we can all come to these really tough questions facing us as a world with an open mind and an open heart, one that is focused on making this conference, our collective shared interest in technology, and our broader world welcoming to all. To that end, I want to explore with you a question that is the inspiration for why I joined the Wikimedia Foundation as CTO
10 months ago or so, one that I think is very much relevant to this group and the topics that we all think about daily. As someone with a background in search, discovery, and information seeking, it's a question that has plagued me for years, never more so than
now, given current events, the rise of disinformation, and gratification culture. I want to explore this question of how we can use tech to help people think more critically, ideally, as tech is wont to do, with less work or at least less time.
Put another way, I want to ask the question: how do we shrink the time it takes for each of us to get to that personal moment, especially when we are considering deeply difficult topics that often require multiple points of view and engagement over time?
Now, when I first submitted this talk way back in what seems like the ancient times of, I think, January, I originally thought I would explore this idea of critical thinking through the lens of a current event, but I ultimately decided to start with a lighter subject instead and show
how quickly it can lead to very deep and thoughtful reflection on a subject. So I hope you will indulge me as I want to explore these questions of critical thinking by using a personal hobby of mine, cycling,
both road and mountain bike for the record, as well as, of course, our most well-known project here at the Wikimedia Foundation, Wikipedia. Don't worry, you don't need to read the slide. In case you aren't familiar with us here at the foundation, some quick, fun facts about our little slice of the world.
We serve up more than 50 million articles across some 300 languages. Chances are, if you're listening to this, there is a Wikipedia in your language. In April alone, we had 26 billion page views and 53 million edits from around the globe.
And all of those are, of course, really cool stats, but perhaps the most amazing thing about us from a tech standpoint is, as a top 10 global site, we are incredibly efficient at doing this, requiring just 2200 servers across five data centers.
And trust me, now that I've seen under the hood here, that is incredibly efficient, even though it feels like a big number. But enough about us, let's get back to our question on critical thinking. I want us to explore this question by going back to the English Wikipedia page on cycling. Despite this being a topic that is relatively straightforward on the surface (after all, it's a bicycle), the page itself covers everything from the basic to the incredibly rich,
as one might expect if you've ever perused Wikipedia. It covers different bikes, from the unicycle, which I've always wanted to learn to ride (hey, fun for the whole family), to recumbents, which combine two of my favorite things, biking and reclining. It even has a disambiguation link to my favorite team sport, ice hockey. No surprise, when you dig into this, the community engagement is also incredibly robust. This page has been edited nearly
2,500 times, and it's not even anywhere near the top of our most edited pages. And it's been edited by approximately 1,300 editors, with the last edit happening just last week. But here's where it gets a bit more interesting.
With just a few clicks, we can get into some of our world's most pressing concerns. Topics like air pollution, or medical topics like arthritis and cycling's effects on it. Of course, we can also touch on
social and political movements like Critical Mass and a cyclist's right to be on the road, which of course leads to much deeper reading on strategies for political activism. In some ways, we've now come full circle to my opening reflection on where we are in the world.
But in reality, you're probably wondering, so what? All I really did there was demonstrate what we as humans do with Tim Berners-Lee's magical little invention we call the web: we clicked a few links, we browsed through some pages.
In essence, most of the critical thinking here was done by the contributors up front, as they decided what links to include, as they decided what's relevant to a topic and how to organize it.
While I wasn't completely passive, in that I chose what to click based on my own personal curiosity, I'd have to do a lot of additional work to inject my own critical thinking into the process, perhaps by asking questions like: what else should I be thinking about related to Critical Mass?
What are the pros and cons of direct action political activism? What do experts say about the physics of a bike or about air pollution? And then of course, I would have to go do the requisite searching, browsing, refining, that is all too
familiar to all of us as users of Wikipedia and search engines like Google. But of course, there is a lot more to this story. I want to dig in a little bit with you on what's going on behind the scenes, both from a contributor's point of view and from ours. How does Wikipedia do this work of putting together the millions of articles that you and others read? Every single article on our site is written by individual volunteers working
alone or together. Each contribution involves summarizing sources and evaluating different perspectives in order to present the most comprehensive and neutral perspective on any topic, whether that's celebrity biographies or current affairs or historical events.
By its very nature, collaborative editing encourages people to consider other viewpoints. Wikipedia is better because of the fact that editors disagree. I'll touch on this more later. It forces us to question why we think a topic
should be represented in a certain way. Of course, our editors rely on things like secondary sources. They cite open access journals. They work with libraries to add more references as they improve and build articles. From a Wikipedia article, you can click to navigate
the citations to great journalism, the collections of some of the world's leading museums and archives. And of course, you can access the latest research in open access science. Because Wikipedia is written by the world, we are also, of course, a reflection
of its biases. We have a ton of articles about Europe, about military history, and about men. That is reflective not only of the interests of people who currently edit Wikipedia, but it's also the sort of information historically considered worthy.
And I emphasize historically considered worthy of being in Wikipedia. We also still struggle with finding and rooting out actors with malicious intent or who are out for personal, corporate,
or political gain. Wikipedia is by no means perfect. In fact, the more we know what is in Wikipedia, the more we know what is missing. So that's a little bit about the content on Wikipedia and the people behind it. But what about the users?
All of you in the audience, the people who read Wikipedia, who check it as their main source of information on any given topic. Here's the thing. Perhaps it's not all that surprising. But despite all of the work our editors do to
organize that content, users often have their own agenda on our site, just as I'm sure they do on your site. Our users routinely create their own desire paths, their own herd paths, their own navigation through our content, whether we want them
to or not. It's almost as if our users know something that we as curators don't know. We see this behavior throughout our experience here. For example,
one of the things we've been working on with our research team here at the Foundation is mapping out our topic spaces and then overlaying them with our anonymized user session behavior. One of the things you quickly see is there are both very
focused browsing sessions and less focused browsing sessions. Additionally, many of our users also go back out to our search or to sites like Google or DuckDuckGo and come back in as well to reengage with topics
along different vectors, different lines of thinking. Interestingly enough, this type of behavior also depends on the topics that users are searching for. For instance, our team has also found that users reading STEM articles, science, technology, engineering and math are often very much
in a get-in, get-your-information, and get-out mode, whereas users reading about culture-based topics spread out more to cover a variety of different topics.
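As a rough illustration of this kind of session analysis (not the Foundation's actual pipeline, and with invented sessions and topic labels), one simple way to quantify how focused or spread out a reading session is would be to measure the entropy of the topics it touches:

```python
from collections import Counter
from math import log2

# Hypothetical anonymized sessions: each is a list of topic labels for the
# pages visited (real research would map pages into a learned topic space).
sessions = {
    "s1": ["physics", "physics", "engineering", "physics"],        # focused
    "s2": ["cycling", "air_pollution", "activism", "ice_hockey"],  # exploratory
}

def topic_spread(topics):
    """Shannon entropy of the topic distribution within one session.

    Low entropy roughly corresponds to a focused "get in, get out" session;
    high entropy to a session that spreads across many different topics.
    """
    counts = Counter(topics)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

for sid, topics in sessions.items():
    print(sid, round(topic_spread(topics), 2))
```

Low-entropy sessions would look like the STEM-style reading described above, and high-entropy sessions like the broader, culture-style browsing.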
Guess what? And I'm not bursting your bubble here, but I think this user behavior is also spilling over at the societal level. We see this in the rise of privacy-first search engines like DuckDuckGo, or chat apps like Telegram. We see it in new laws like GDPR, or even in the greater awareness of things like end-to-end encryption in the context of applications like Zoom and Slack and our messaging platforms. And of course, we see these same questions in the never-ending debate around Facebook's use of data.
It all begs the question in my mind, what is our responsibility here as technologists? I'd like to share with you for a moment how my thinking has evolved on this subject and then how it leads
to the crux of this idea around critical thinking. Like many of you, I suspect, I got into tech because it was a ton of fun to work on and solve really hard problems on this infinite landscape that is the computer and the Internet. And of course,
I'd be lying if I didn't say the pay is pretty good, too. Once I discovered search and open source and distributed systems, it only further amplified that joy I got from problem solving as my career grew alongside the growth of the Internet. Throw in natural language processing
and machine learning, and I felt like I was in engineering heaven. Perhaps you all can relate. Until the doubt started. It first crept in at the dawn of the big data era and the explosion of
ad tracking tech. And it grew louder the more I engaged with it. Don't get me wrong, I fully participated in this movement. I've stood on stages like this one, in fact, I've done it at Buzzwords, thinking about the opportunities
to improve applications and understanding of users and their desires through data mining, natural language processing, and of course, the big buzzword of all, artificial intelligence. But I could never quite shake the doubt that there has to be more to it.
I assuaged that doubt by telling myself I wouldn't work in industries that don't align with my values. But the doubt persisted. There has to be something more. The thing is,
I still believe in the incredible opportunity we have in front of us to use computers and data to progress ourselves. And I still believe private companies have the right to collect data as long as their terms of use are clear, well understood, and respect our privacy.
I had to ask myself some pretty hard questions, many of them centered around whether there is a way to bring these seemingly competing things together. And I think there is. I think we have an opportunity, you might even say a critical opportunity, in front of us. We have a chance to raise the discourse and understanding online while still selling our wares and our eyeballs, doing our searches, all of the things that we as technologists like to do. I believe we have a chance
to enable more critical thinking online. We are at a point in time where we as technologists, scratch that, we as humans, can help people make better use of our information, to understand the nuances of information, to make more informed decisions and form more informed opinions, not just whatever comes to mind in the moment. I believe our users are already telling us that they want this. Their behavior shows it. And while I think the fact that they don't get it can often be attributed simply to a lack of awareness, I do also think it's often due to our desire to funnel people into specific actions. This leads me to believe we need a new
paradigm for content creation, curation, and readership. We need one that matches all of the information needs of users. And for those of you in the audience, especially those who have studied search, I think you know what I'm talking about here when you really think about all of the information needs we have. Not just those that are very tactical in the moment, not just those that are designed for QA systems like Alexa or search systems like Google. We have to find ways to get beyond that quick, snackable content. We need to recognize that our users are spending a lot more time than we give them credit for looking for information, trying to dissect it, trying to understand it. We have to get past the session-based analytics that drive so much of our behavior and look beyond them. We have to recognize
that our users want to build a deeper understanding, and they want to do it faster. If you reflect back, you'll also come to the conclusion that we as humans, over thousands of years, have built many constructs for critical thinking. Maybe not always in the most inclusive way, but I can't help but feel like we are letting those gains slip away online, as we push people to find their tribe, or as we funnel them to meet our analytics goals or our sales goals. We need to recognize that we as builders of these large tech platforms have forgotten about the social constructs that encourage critical thinking. The Wikipedia movement has built many of these constructs into the movement, some through software, but most through people, policies, and processes: ideas like trying to achieve a neutral point of view, or fact-based reasoning, or consensus building, or verifiability, or, perhaps our most cherished, that everyone has the right to knowledge. Here's the really cool thing about these social constructs the community here at Wikipedia has built. And by the way, keep in mind, I'm new to this movement. I'm still learning. There's so much to learn, and nobody fully knows how this 20-year-old experiment started by Jimmy Wales and Larry Sanger works. But there are a few bits I want to have you take away. First,
there was a recent study, covered in the Harvard Business Review, that showed that when people contribute to Wikipedia, their contributions, and perhaps, I like to think, hopefully, their views, get more neutral over time. They get more neutral over time. And second, related work has come to the conclusion that the articles that are most debated also tend to be of much higher quality.
When was the last time you heard either of those kinds of behaviors online in 2020? This feels almost like an impossible dream. And yet it's happening. In fact, I think it's just the opposite of an
impossible dream. I think we as a group actually have many of the tools and skills in place to enable this deeper and richer engagement with our content and each other.
And I believe we can do right by all of the social constructs developed over the years that encourage critical thinking, while putting a modern spin on them that also makes room for so many of the voices that have traditionally been shut out. Think about it. We have tools like search and recommendation engines for serving content to users. What would it look like if we tweaked our scoring algorithms for diversity of results when we detect exploratory intents on controversial topics? We also have natural language processing models for identifying related topics, as well as topics that bring in different perspectives. What if we fed those into our feeds, piece by piece, instead of the usual fodder that reinforces our belief systems? We also have crawlers that can harvest the entire web and build authority measures with algorithms like PageRank. And of course, we have analytics capabilities (oh, do we have analytics capabilities) for identifying all of these user journeys and testing out these hypotheses.
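As a rough sketch of the kind of scoring tweak hinted at here (not a description of any production system), one could re-rank the top results with a maximal-marginal-relevance style trade-off between relevance and diversity; the documents, scores, and similarity function below are invented for illustration:

```python
def rerank_for_diversity(candidates, similarity, lambda_=0.7, k=10):
    """Greedy MMR-style re-ranking.

    candidates: list of (doc_id, relevance_score) pairs, highest first.
    similarity: function(doc_a, doc_b) -> similarity in [0, 1].
    lambda_:    1.0 means pure relevance; lower values favor diverse results.
    """
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def mmr(item):
            doc, rel = item
            max_sim = max((similarity(doc, s) for s, _ in selected), default=0.0)
            return lambda_ * rel - (1 - lambda_) * max_sim
        best = max(pool, key=mmr)
        selected.append(best)
        pool.remove(best)
    return selected

# Hypothetical usage: when an exploratory intent on a contested topic is
# detected, lower lambda_ so the first page of results spans viewpoints.
docs = [("pro_article", 0.92), ("pro_article_2", 0.91), ("con_article", 0.88)]
sim = lambda a, b: 0.9 if a.split("_")[0] == b.split("_")[0] else 0.1
print(rerank_for_diversity(docs, sim, lambda_=0.5, k=3))
```

Lowering lambda_ only when an exploratory intent is detected would keep quick lookups untouched while spreading viewpoints across the first page of results for contested topics.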
Yet there are still many open questions, especially on the social construct side. But here's where I think some of the Wikipedia values can actually help us on the technology side. For a start, we might ask ourselves: how might we use this neutrality on our platforms?
First off, I do want to note here that, for us, a neutral point of view doesn't mean our community of volunteers doesn't try to understand and represent multiple sides of the story. Nor do I want to pretend that some of them don't feel very strongly about a particular one of those sides.
But it does mean they have a commitment to try to present those sides in a neutral, unbiased way. How might we use this on other platforms? How might we use things like text classification to recognize content that would benefit from other
viewpoints? How might we use it to help people better flesh out their side of an argument? How might we use it to help people see the other aspects of a problem? Perhaps most importantly, how do we do this without taking away our users' creativity and freedom of expression?
Neutrality feeds on consensus. One of the interesting aspects of Wikipedia, one that I didn't quite get at first, is that everyone sees the same article on any given topic. There's some nuance
there. There's no customized content for someone who's a fan of bikes from Trek versus Giant, for example. Because there is only one article, our editors have an incentive to come to consensus. They must edit this one article on a topic. Therefore, they must
find a way to reconcile their different perspectives, emphasize a neutral point of view, and decide what information to include in the article. For many of you coming here from open source, this perhaps is nothing new at a social level. But it probably isn't something you've thought about building into your tech, other than perhaps at the really low level of things like consensus algorithms such as Paxos in our distributed systems, where we have to build consensus in order to make sure the machines stay in agreement. So let me ask the question: what might it look like for you to build consensus into your system, using tools like collaborative filtering or other recommendation engines to incentivize consensus on your platform? Maybe it's not obvious at first, depending on what your platform does, but I bet there are ways that you can do it.
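One naive place to start, purely as an illustration rather than actual collaborative filtering, would be to surface the items your users disagree about most and route them to a shared discussion instead of personalizing the disagreement away; the ratings and item names below are made up:

```python
from collections import defaultdict
from statistics import pvariance

# Hypothetical ratings: item_id -> list of scores from different users.
# A typical recommender would feed these into per-user personalization;
# here we instead look for the items with the least consensus.
ratings = defaultdict(list)
for user, item, score in [
    ("a", "article:cycling", 5), ("b", "article:cycling", 5),
    ("a", "article:critical_mass", 5), ("b", "article:critical_mass", 1),
    ("c", "article:critical_mass", 2),
]:
    ratings[item].append(score)

def most_contested(ratings, top_n=5):
    """Rank items by score variance: high variance means low consensus."""
    scored = [(item, pvariance(scores)) for item, scores in ratings.items()
              if len(scores) > 1]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_n]

print(most_contested(ratings))
```

The same disagreement signal could just as easily come from conflicting labels, edits, or votes rather than star ratings.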
Right now, a lot of work goes into verifying content on sites, not just ours but I'm sure yours as well, or into fact-checking politicians, or into trying to figure out whether something is misinformation or disinformation. In fact, active disinformation campaigns are making it even more difficult for us to know what is true. We as technologists need to find a way to build verifiability into our user-generated content systems. We need a way for such systems to mark content as unverified. And on top of that, we need a way for our users to accept that and try to do something about it, not just try to legislate it away. I wish I had an answer for this one, but I will ask the question: how might we use all of our content harvesting systems, our fancy crawlers and our deep learning models, even sites like Wikipedia and Wikidata, to better know what is credible, what is verifiable, and what is trustworthy?
Now, all of these ideas, in my mind, really point to a need for a better set of open, transparent, verifiable moderation tools. It's interesting: when I talk to my peers at other user-generated content sites, they too express this same need for better moderation tools. How do we help our users better care for our sites? They want tools that can be used across platforms so that we all don't have to reinvent the wheel. We want tools that combine basics like spam control with some of the more advanced features we've talked about today. Tools that allow communities to easily describe, manage, update, and enforce their values. Tools that provide privacy-preserving, community-controlled governance
at their heart. Most of all, we need tools that enable critical thinking and encourage us to be better versions of ourselves. One such way we are doing this here at the Foundation and in the community is through the creation of a system we call Jade; I'll link it up at the end for you all. It is designed to capture the labeled data that we all know is so important for machine learning algorithms, but we're doing it in a very Wikipedia way. Not only does it allow us to capture the usual annotations that many of you in the audience are familiar with when building training and test sets for our machine learning algorithms, but it also allows our community to register disagreements, to decide on the best label, and to make improvements over time, just as they do when improving articles through talk pages, revisions, and reverts.
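To make that concrete, here is a purely illustrative sketch of what a judgment record that keeps disagreement visible might look like; this is not Jade's actual schema, and every name in it is invented:

```python
from dataclasses import dataclass, field
from collections import Counter
from typing import List, Optional

@dataclass
class Judgment:
    """One community member's label for an item (e.g. an edit), plus a rationale."""
    user: str
    label: str            # e.g. "damaging" / "good-faith"
    note: str = ""

@dataclass
class LabeledItem:
    """A labeled item that records every judgment instead of a single label.

    Disagreements stay visible, and the preferred label can be revised over
    time, loosely mirroring how articles evolve through talk and reverts.
    """
    item_id: str
    judgments: List[Judgment] = field(default_factory=list)
    preferred: Optional[str] = None   # set by community consensus, can change

    def add(self, judgment: Judgment):
        self.judgments.append(judgment)

    def tally(self) -> Counter:
        return Counter(j.label for j in self.judgments)

item = LabeledItem("revision:12345")
item.add(Judgment("editor_a", "damaging", "removes sourced content"))
item.add(Judgment("editor_b", "good-faith", "looks like a test edit"))
item.preferred = "good-faith"   # outcome of discussion, still revisable
print(item.tally())
```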
Our hope is that Jade will play a critical role in helping our community track the trends and biases in the machine learning models we host for them. By making sure that Wikipedians have the final say
over the data we use to train and evaluate models, in many ways we are giving them a powerful tool to govern the algorithms managed on their behalf. Why does this ability to enable critical thinking matter? Perhaps it feels just too hard.
But I think beyond the obvious societal gains of more informed decision makers and a more informed electorate, practically speaking, I think it leads to deeper engagement and loyalty
as users are able to make more informed decisions, whether that's in what they buy, how they spend their time, or who they vote for. We have a lot of work ahead of us and it won't be easy. But I trust we can go further and faster the sooner we burst our own bubbles and get our users out of theirs. Let me leave you with a few resources on how we might engage on this, and then, of course, I'm happy to answer a few questions. We'll make sure these links are available. Beyond that, I want to thank you for your time
and I hope you enjoy the rest of the conference. Hey, Grant, I'm Josh. I'm here in Berlin with the Buzzwords crew and I have a few questions
and comments on the Slack channel. So the first one comes from Matteo, and he asks: is no bias equal to neutrality? He further asks: to see some changes in society, isn't some degree of polarity necessary? That's a great question and a very deep, rich topic. The Wikipedia community doesn't so much try to get rid of bias completely as think through the fact that there are multiple points of view. I think, in fact, if you just bring back up our resources here
and we can share that. If you look under the Jade and neutral point of view bullets towards the end of that list, it's interesting: they've actually codified quite a bit of what neutral point of view means, and it does not mean that we try to get everything to be this one perfect, unbiased view. What they try to do instead is represent all of the different views, or at least many of them, so as to be able to compare and contrast. Interestingly, they've also developed a kind of subtlety and nuance that is designed to give weight to the views that are more commonly accepted, so as to not, you know, unduly elevate fringe ideas, or what some people might consider fringe. I am by no means an expert here, but I do think, you know, to the first question, there is a need to have some of that tension. The real question then in the critical thinking approach
is how do we then resolve those tensions? How do we get past them? How do we build up constructs that allow us to have more constructive debates, as opposed to just everybody going into their camps like they do now? That's a great question. On a related subject, so this is a question from me,
how do you deal with not only biases in communities, but communities that really gang up on each other in a way? And we see this right now in a lot of the political debates. Is there a way, you know, when we think about bias in machine learning, bias is effectively the information that is not knowable by the machine.
So how do we deal with this, how do we detect it? How do we trust certain communities, communities of editors? How are you thinking about these types of topics at Wikipedia? Yeah, that is a hard problem. I think, you know, there is a lot of work in the communities around governance. It certainly does help reduce individuals' biases, and, as I mentioned in one of the earlier slides, that shows up in very real ways. There have also been a number of studies around things like the gender gap on Wikipedia, and around representation for groups that have traditionally been underrepresented because history hasn't told their stories, so there are therefore no citable references. But there's a lot of work to bring forward those stories and make them citable in an encyclopedic way.
For instance, the main way most articles get discussed: you can go to any Wikipedia page and you'll see a discussion tab, or what's called a talk page internally. Some of those talk pages can in fact be quite enlightening. They can also be quite toxic. And so I think, you know, we have to recognize all points of view. We have to make space to listen to them and then come to our own opinions, hopefully informed opinions. I do think, you know, perhaps Wikipedia has some things going for it, like I mentioned here, in that at the end of the day there's one page, right? It's not like somebody spins off and says, here's another page that has my view on it, right? The community will police that, and the policies and systems the volunteer community has built have been thoroughly debated over the years. I think, you know, we have to find ways to provide that governance as we go forward. And I suspect there are ways machine learning can help. I don't fully know if it can, but perhaps it can at least give us evidence that allows us to recognize
when it's happening. It reminds me, Stack Overflow, I believe, published a blog post on one of the things they did to help with unfriendly comments, their unfriendly robot. I think you can just search for Stack Overflow and unfriendly robot. What they did was they trained a machine learning bot to recognize when users' comments weren't particularly helpful for solving a problem. And pretty quickly, when the people making the comments were told by the bot, hey, you could probably improve this, you could probably make this more open and welcoming, the users actually did.
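As a toy sketch of that general idea (nothing like Stack Overflow's actual system, and with hand-invented examples), a small text classifier could flag a draft comment and nudge its author before posting:

```python
# Toy sketch: train a tiny text classifier on comments labeled friendly
# vs. unfriendly, then use it to nudge commenters before they post.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [
    "Thanks, could you add the error message you are seeing?",
    "Did you even read the documentation?",
    "This helped a lot, great explanation.",
    "Obviously wrong, why would you post this?",
]
labels = [0, 1, 0, 1]  # 0 = fine, 1 = unfriendly (hand-labeled examples)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(comments, labels)

draft = "Why would you even ask this, just search first"
if model.predict_proba([draft])[0][1] > 0.5:
    print("Hey, you could probably make this comment more welcoming.")
```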
So I think we could start with some of these simpler things and then build and learn from those. So maybe a good related question: Peter asks, is one of Wikimedia's goals to become the barometer of opinion versus balance? Yeah, I don't think so. Well, I of course can't speak for the community; there are a lot of people with a lot of opinions. But are we meant to be the barometer of opinion? I think our goal is to make sure that people have the right to free and open access to knowledge. Right? And so I think our goal is to make sure that knowledge is properly represented and properly served. I'm not even sure if that's the right word, but that knowledge has a place to live and is free and open to anyone in the world. So the next one is not so much a question, more of a comment, from Sarah.
She says, I agree with Grant that it should be a goal to enable critical thinking. At the same time, there seems to be this almost unsolvable problem that we have masses of information available, which is constantly coming in, making it very hard to stay on top of an in-depth understanding of facts. This is why we came up with text summarization, for example.
If these methods, though, take away depth in favor of making the masses readable, critical thinking could suffer. What are your thoughts on that? Yeah, for sure, Sarah, I think that's spot on. And I mean, I think that's why, you know, I talked about like if you study information theory, information retrieval, et cetera,
you know, there's often this bias towards the short, the ten blue links, the question answering systems. But if you actually, you know, and I've had this experience in prior jobs, you know, as you actually dig into people's search behavior, for instance, you see them routinely coming back to a site around
different viewpoints on a topic. You know, you see this at the e-commerce layer. People search one day for a specific thing. They forget about it. They come back a week later for it again, maybe different things. They're doing that opinion gathering. They're looking for alternatives. They're looking for
cheaper prices, whatever it is, right? And I think, you know, it's recognizing that people have this need over time. You know, one of the fields in search that does this and sees this firsthand is e-discovery. If you've ever spent any time in the
e-discovery space, you know that lawyers and legal scholars, et cetera, are often looking to gather all of the things that are on topic and they have to wade through thousands these days, millions of pieces of content to find, you know, the smoking gun, the defense, whatever it is
that helps serve their case. And so they have to think across time. Same goes for things like Ph.D. students, et cetera. And so part of my challenge here for all of us is like, how do we think about better ways to capture and learn from our users over time beyond just the very tactical
ten blue links, or even beyond just the session and the query rewriting that many of us are so focused on, right? Those are, of course, valid. We still need those very quick, snackable moments to settle a bet or find out what the price of a widget is. But we also have these broader needs across
time. And I think we are doing our disservice if we don't spend at least some time on those types of problems. So Peter is hosting, I believe, a talk later this week on Thursday about diversifying search results. And this was brought up in this talk. So if anyone's interested,
feel free to check out that talk as well. There was another question about search relevancy at Wikipedia. So effectively, how is it handled? So especially given that users are usually not logged in to Wikipedia itself and there's no tracking or personal data involved. How at a high
level is it handled? It is funny, for somebody coming from the search space: in the time I've been here now, search is probably an area where I've spent some time, and my take was that, hey, it works pretty well. There certainly are some areas of improvement that we are working on, around image search and things like that. But for the most part, it's been working, especially for the more popular sites. So we've spent a lot of time on
some of the underserved search communities. Our search team has blogged fairly extensively about search. In fact, one of our computational linguists just put up a post about processing Khmer as a language for search.
So all that to say, digging in a little bit on the things I do know about, and I can certainly provide some links: we have a fairly large Elasticsearch cluster across multiple nodes. We are using the learning-to-rank plugins quite extensively, and on a number of the sites we use anonymized user behavior, query logging, and all that kind of stuff to get at the best relevance for our topics. We don't have query intent in place yet. Let's see, what else? We do some light session analysis, again on anonymized behavior. We do have A/B testing capabilities in our search engine. So if you've ever heard me talk about things like learning to rank or A/B testing, we do have many of those things in place.
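For readers less familiar with how a learning-to-rank plugin typically slots into Elasticsearch, the query shape below shows the general pattern of a cheap first pass plus a model-based rescore; it is modeled on the open source Elasticsearch LTR plugin in general, not on Wikimedia's actual configuration, and the index name, model name, and parameters are invented:

```python
# Rough shape of a learning-to-rank rescore query, in the style of the open
# source Elasticsearch LTR plugin (which must be installed on the cluster).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

query = {
    # Cheap first-pass retrieval over the full index...
    "query": {"match": {"text": "cycling"}},
    # ...then re-rank the top candidates with a trained LTR model.
    "rescore": {
        "window_size": 100,
        "query": {
            "rescore_query": {
                "sltr": {
                    "params": {"keywords": "cycling"},
                    "model": "my_ltr_model",
                }
            }
        },
    },
}

response = es.search(index="enwiki_content", body=query)
for hit in response["hits"]["hits"]:
    print(hit["_id"], hit["_score"])
```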
Let's see, what else? Like I said, we refer people to the blog posts. And as I said earlier, too, we do have and support search across some 300 languages. And we have contributed a number of improvements back to Lucene, I believe, or they are
at least available. Our analyzers are available in our open source repositories. So, yeah, I wish I could say more there. There's a lot to dig into there, I'm sure. We're out of time, unfortunately. Grant, thank you so much for your talk. And we have now a
breakout room as well if people want to join. There is a studio on the Slack channel; the link is pinned to that channel. So if people want to ask more questions of Grant, feel free to join the Jitsi channel. We have a short break, 10 minutes, and the next talk will be up after that. Thanks again.