
Rage Against The Machine Learning


Formal Metadata

Title
Rage Against The Machine Learning
Series Title
Number of Parts
275
Author
Contributors
License
CC Attribution 4.0 International:
You may use, change, and reproduce, distribute, and make the work or its content publicly available in unchanged or changed form for any legal purpose, provided you credit the author/rights holder in the manner they specify.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Abstract
This talk explains why audits are a useful method to ensure that machine learning systems operate in the interest of the public. Scripts to perform such audits are released and explained to empower civic hackers. The large majority of videos watched by YouTube's two billion monthly users is selected by a machine learning (ML) system. So far, little is known about why a particular video is recommended by the system. This is problematic since research suggests that YouTube's recommendation system is enacting important biases, e.g. preferring popular content or spreading fake news and disinformation. At the same time, more and more platforms like Spotify, Netflix, or TikTok are employing such systems. This talk shows how audits can be used to take the power back and to ensure that ML-based systems act in the interest of the public. Audits are a ‘systematic review or assessment of something’ (Oxford Dictionaries). The talk demonstrates how a bot can be used to collect recommendations and how these recommendations can be analyzed to identify systematic biases. For this, a sock puppet audit conducted in the aftermath of the 2018 Chemnitz protests for political topics in Germany is used as an example. The talk argues that YouTube's recommendation system has become an important broadcaster on its own. By German law, this would require the system to give important political, ideological, and social groups adequate opportunity to express themselves in the broadcasted program of the service. The preliminary results presented in the talk indicate that this may not be the case. YouTube's ML-based system is recommending increasingly popular but topically unrelated videos. The talk releases a set of scripts that can be used to audit YouTube and other platforms. The talk also outlines a research agenda for civic hackers to monitor recommendations, encouraging them to use audits as a method to examine media bias. 
The talk motivates the audience to organize crowdsourced and collaborative audits.
Keywords
Transcript: English (auto-generated)
Hello everyone and welcome back to our Chaos West stream. Our next speaker is Hendrik Heuer.
Hendrik Heuer is a researcher at the University of Bremen, at the Institute for Information Management Bremen. And in his talk, Rage Against the Machine Learning, he will explain why audits are a useful method
to ensure that machine learning systems operate in the interest of the public. Welcome, Hendrik. Hi, I'm Hendrik Heuer and welcome to Rage Against the Machine Learning, auditing YouTube and others.
In this talk, I will explain why audits are a useful method to ensure that machine learning systems operate in the interest of the public. My goal is to empower civic hackers and I'm going to do that by releasing the scripts that I use to audit YouTube and I'm also going to explain to you how to use these scripts.
Why would it be interesting to audit YouTube or other machine learning based curation systems? Well, YouTube has more than 2 billion users per month and 70% of the videos watched on YouTube are recommended by a machine learning based curation system.
This is remarkable because every fourth person worldwide relies on YouTube as a news source. That percentage is even higher for younger people. There, every third 18 to 24 year old consumes his or her news on YouTube. This means that YouTube's machine learning based system plays an important role
in what billions of people watch and how they see the world. Why do we need machine learning in these systems? Well, there are 82.2 years of video uploaded to YouTube every day. So that's 500 hours of video uploaded per minute. For a team of human experts, it would be impossible to review and categorize this user generated content.
Now, YouTube markets its recommender system as a sophisticated algorithm to match each viewer to the videos they are most likely to watch and enjoy. In this talk, I will show that popular, unrelated content is king. So who am I? My name is Dr. Hendrik Heuer and I'm a researcher at the University of Bremen
and the Institute for Information Management Bremen. And this talk is based on my doctoral thesis focused on auditing machine learning. And in this talk, I will explore audits as a way of making sense of complex and proprietary machine learning systems used by YouTube as well as others.
And this is based on a research project I conducted together with Henry Koch, Andreas Breiter and Yannis Theocharis. And this talk is based on my doctoral thesis called Users and Machine Learning-based Curation Systems. Using the link, you can download the thesis for free from the library of the University of Bremen.
As many of you may know, machine learning based curation systems are a special type of artificial intelligence. The definition of AI, according to Hansen, is that it's an umbrella term for computer systems that are, quote, able to perform tasks normally requiring human intelligence.
And I prefer the term machine learning because it's a bit more precise and also because many of the successes in AI that we've seen in the recent years have been obtained through what is called statistical machine learning. You might even have heard about the term deep learning, which is an even smaller subset.
So what am I referring to when I say machine learning? It's a certain kind of artificial intelligence that infers decisions from data. Formally, Mitchell defined machine learning as follows: a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P
if its performance at tasks in T, as measured by P, improves with experience E. And machine learning enabled many of the recent advances in artificial intelligence. It is used to recognize handwritten digits, to recognize people and objects in images,
to translate from one language to another, to drive cars, and, and that's the focus here, to recommend postings, photos, and videos on platforms like Facebook and YouTube. And in my research, I focus on Facebook and YouTube because they are two of the most visited websites worldwide.
And they use ML-based systems to curate the content for billions of users. Now recommender systems, like the ones you find on Facebook and YouTube, have a long history. Famous early examples include Luhn's system from 1958, the Information Lens by Malone et al.,
the Tapestry email filtering system by Goldberg et al., and GroupLens by Resnick et al. Looking at the research out there, you find that Facebook received a lot of attention regarding algorithmic awareness, user beliefs about the system, how its system works, and the biases that the system enacts.
Meanwhile, especially when I started my research, there was comparatively little on YouTube, despite its importance and the many people who use YouTube. And what motivated my research was a study by Eslami et al. from 2015. They found that 62.5% of Facebook users were not aware of the existence of Facebook's newsfeed algorithm.
They also showed that users are upset when posts by close friends or family are not shown. And users mistakenly believe that their friends intentionally chose not to show them these posts.
Eslami wrote, in the extreme case, it may be that whenever a software developer in Menlo Park adjusts a parameter, someone somewhere wrongly starts to believe themselves to be unloved. And there's a lot of research that has pointed out the political, social, and cultural importance of machine learning-based curation systems.
Zeynep Tufekci wrote in the New York Times that YouTube may be one of the most powerful radicalizing instruments of the 21st century. And challenges like fake news, biased predictions, and filter bubbles make
an understanding of ML-based curation systems an important and timely concern. Journalists and researchers have accused ML-based curation systems of enabling the spread of fake news or conspiracy theories in general. And these accusations make sense because such systems can shape users' media consumption and influence their decisions.
I will talk a lot about bias in this talk, so I want to operationalize what I understand as bias. In the context of my thesis and also this talk, I operationalized bias as an inclination, prejudice, or overrepresentation for or against one person, group, topic, idea, or content, especially in a way considered to be unfair.
And there are famous examples of biased predictions. Epstein and Robertson, for instance, found that biased search engine results can shift the voting preferences of undecided voters by 20% or more.
So considering this prior work, I started to believe that it is important to understand whether users are aware of the ML-based systems they are interacting with and whether users understand how such systems work. Otherwise, users might believe that they are presented with an objective reality even though the news they are seeing is
the result of a co-production between their actions as users and a machine learning system's ability to infer their interests. Now consider this example. If we have a recommender system, it can easily lead to a virtuous circle. So you end up watching a video related to human rights, and then you learn about the treatment of asylum seekers at the European borders.
And that leads you to develop an interest in the decriminalization of civil sea rescue. However, it can also lead to a vicious circle, where you watch a video about a crime committed by a foreigner, and then you see many videos about crimes committed by foreigners, because the system just infers:
that's what this user is interested in, that's what they like. So it's just giving you what you like. And you may end up with a distorted view of reality, changed political views, and even xenophobia. This poses the question: how does machine learning influence people?
And you might have heard about the potential dangers of so-called online radicalization and algorithmic rabbit holes, where people end up in the loop that I just described: they start with one topic and then see more and more related to that particular topic. And there was one incident that really motivated me to understand what's going on on YouTube and what kind of recommendations the system provides.
So most of you probably remember that in Chemnitz, the stabbing of a citizen spawned street demonstrations and rioting. And the New York Times wrote an article called As Germans Seek News, YouTube Delivers Far-Right Tirades.
So according to the Times, people who try to inform themselves on YouTube were shown increasingly radical far-right videos about the incident, which allegedly radicalized them and which fueled the protests. And this motivated us to perform audits to see whether YouTube is actually systematically recommending more and more radical content.
Why is this important? Well, video recommendations of political topics and news have special requirements, especially in Germany. We have laws that force broadcasters to provide fair and balanced reporting.
We also have laws that make sure that minorities are protected. In Germany, one of these laws is the so-called Rundfunkstaatsvertrag, the interstate broadcasting agreement. And it's a law that enforces that broadcasting services report in a fair and balanced manner that takes minority views into account.
So motivated by this, we performed the 2019 YouTube Chemnitz audit. And we found that YouTube is not pushing users towards politically extreme content by consistently suggesting more extreme videos. YouTube is also not leading users down a rabbit hole by zooming in on specific political topics.
What we found is that YouTube is pushing increasingly more popular content as measured by the views and likes. The sadness evoked by the videos decreased while the happiness increased.
Now, let's take one step back, because this is only part of a much larger puzzle. To thoroughly understand radicalization on YouTube and how YouTube influences the behavior of users, research would have to show that YouTube is presenting users with increasingly extreme content, that this extreme content negatively affects users' attitudes, that this affects their intentions, and that this changes their behavior.
With this talk, I really only can talk about the first point, that is, whether YouTube is presenting users with increasingly extreme content. But I strongly invite other researchers to look at all these different aspects.
And these audits can really be a voice of the voiceless. The dictionary defines the word audit as a systematic review or assessment of something. And in this talk, I will show you that audits can enable researchers and civic hackers to uncover the potential hidden agendas of social networking sites.
Audits are especially interesting because they're immediately meaningful to users, as newspaper reports by Smith had also suggested. So unlike explainable AI techniques, which might require a deeper understanding of statistics, these audits can be interpreted by anybody.
So why perform audits? Because they enable individuals and society at large to monitor and control the recommendations of machine learning systems. And I found that audits are a very useful way to identify potential biases enacted by these systems.
Sandvig et al. distinguish five different kinds of algorithmic audit studies: code audits, non-invasive user audits, scraping audits, sock puppet audits, and crowdsourced audits. And I'm going to explain each one of them step by step in the following.
So with a code audit, you obtain a copy of a relevant algorithm and then you study the instructions in a programming language. And this is challenging since the code is considered valuable intellectual property. And the code is commonly concealed using trade secret protection.
Understanding systems through code audits is also challenging because algorithms depend on personal data. That is, they need to be audited with real data to be understood. Machine learning code is also often quite trivial; the data is the most important thing. And to illustrate this, I have the code example here on the right.
And that's actually a fully functioning machine learning system that can detect spam. It was coded using the Python library scikit-learn, which makes it quite easy to train machine learning systems, hiding a lot of the complexity. And what you can see here is that the things that are specific to the spam filtering use case are just the two things that are highlighted.
It's the file with the data, that's the emails.csv, as well as the dimensionality of the data. Now, if we would change the data from emails.csv to cars.csv, we could easily turn the system into a car recommendation system.
We could also swap the file emails.csv for a file called cancer.csv and turn this into a breast cancer detection system. It all goes to show that the algorithm, and studying the algorithm, is not sufficient and not really helpful for our use case. So we really have to look at the output.
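To make that point concrete, here is a minimal sketch of the kind of scikit-learn spam filter described above. The emails.csv file from the slide is not reproduced here, so the sketch stands in a tiny made-up in-memory dataset; everything specific to spam filtering is, again, just the data.

```python
# A minimal scikit-learn text classifier, in the spirit of the talk's example.
# The texts and labels below are made-up stand-ins for the emails.csv data;
# swapping in car or cancer data would change the task, not the code.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win money now", "cheap pills offer",    # spam examples
         "meeting at noon", "see you at lunch"]   # ham examples
labels = [1, 1, 0, 0]                             # 1 = spam, 0 = ham

# Bag-of-words features plus naive Bayes: the algorithm is generic,
# only the data makes it a "spam filter".
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["cheap money offer"]))  # → [1], i.e. spam
```

Note that nothing in the pipeline mentions spam: the same four lines of modeling code become a different system the moment a different CSV is loaded, which is exactly why auditing the code alone tells you so little.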
The second type of audits that Sandvig et al. recognize are the so-called non-invasive user audits. There you ask users questions using a survey format. However, this comes with serious sampling problems, because how do you actually reach the users that you want to reach? This also comes with important validity problems.
For instance, due to cognitive biases, because people might just remember things wrongly and might not be good about explaining why they did certain things. The third kind of audits are the so-called scraping audits. And there you have a script that interacts with a platform, for instance by querying a particular URL.
And this allows researchers to obtain a large number of relevant data points. A more sophisticated version of these audits are the so-called sock puppet audits. Here a script is really impersonating a user and creating programmatically constructed traffic. And this is what I will be focusing on, and I'm going to explain it in a lot more detail in the next step.
So the other potential way of performing an audit are the so-called crowd-sourced audits. And there you recruit a large number of users to use a particular platform. So it's quite similar to the sock puppet audits, but it's doing it with real people.
However, this is challenging, because you need to find a large number of people. And that can either be done through Amazon Mechanical Turk or through inviting volunteers, but that can be quite challenging. So I performed a so-called sock puppet audit, where I wrote a script that is remote controlling a browser and impersonating a real user.
Just a reminder of what we were trying to do. We were motivated by the Chemnitz incidents, and we wanted to know whether YouTube is actually showing increasingly radical far-right videos for a variety of political topics, as the New York Times had claimed.
So using a Firefox-based bot that I'm going to release with this talk, we performed 150 random walks that always followed the same procedure. We randomly picked one of nine political topics from Germany. Then we entered the topics in German into the YouTube search bar. Then we randomly picked one of the top 10 search results.
And then we saved the video page and watched it for a random number of seconds. Then we randomly chose one of the top 10 video recommendations displayed in the right sidebar next to the video. And then we repeated this 10 times. And we looked at both quantitative metrics, like the number of likes and views, as well as qualitative metrics.
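The released scripts remote-control a real Firefox instance; as a rough sketch of just the control flow described above, the random walk can be written as follows. The `search` and `recommend` functions are placeholders of my own (not the released bot's API) standing in for the actual browser automation.

```python
import random

def random_walk(topic, search, recommend, steps=10, top_n=10, rng=random):
    """One audit walk: search a topic, then follow `steps` recommendations.

    `search(topic)` and `recommend(video)` are placeholders for the real
    browser automation; each returns a ranked list of videos.
    """
    video = rng.choice(search(topic)[:top_n])  # one of the top 10 search results
    walk = [video]
    for _ in range(steps):                     # follow 10 recommendations
        video = rng.choice(recommend(video)[:top_n])
        walk.append(video)
    return walk

# Stubbed usage: a fake platform whose recommendations just append a step marker.
fake_search = lambda topic: [f"{topic}-result-{i}" for i in range(20)]
fake_recommend = lambda video: [f"{video}>rec{i}" for i in range(20)]
walk = random_walk("Klimawandel", fake_search, fake_recommend)
print(len(walk))  # 11 videos: the initial one plus 10 recommendations
```

In the real audit, the two stubbed functions drive Firefox, save each video page, and watch the video for a random number of seconds before reading the sidebar recommendations.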
And for this qualitative investigation, we performed an in-depth analysis for which we randomly selected three random walks per topic and coded three videos per random walk. We coded the initial video, the fifth video, and the tenth video.
And this coding was performed by three independent raters, one male, two female, all in their 20s to 30s, who did not know about the research question. And they really watched the videos for five minutes or more. And then they assessed how closely related the videos are to political topics.
They also rated whether the videos evoked sadness or happiness on an 11-point Likert scale from 0 (least) to 10 (most). So we simulated a regular user by remote-controlling a web browser. And we collected between 12 and 25 random walks per topic.
So for each random walk and each topic, we started a new browser instance and cleared all cookies. All random walks were collected in May 2019 with the same laptop on the same network. So the decision to select the fifth and the tenth recommendation for the in-depth analysis was made at the beginning of the study.
That is, before reviewing any of the material and before we performed any kind of analysis. The raters reviewed all videos in the same randomized order. We computed Krippendorff's alpha to understand how strong our inter-rater agreement is. And we found substantial agreement regarding how similar the videos were to the topics in our investigation.
That's at 0.765 and the sadness evoked by the videos, which is at 0.613. We also have moderate agreement for the happiness, which is at 0.441. So you might wonder what are the topics that we chose.
So we took nine political topics from a representative telephone poll conducted on behalf of the WDR, the Westdeutscher Rundfunk. You find the topics here on the slide, I won't read them out, but they were what people at the time thought were the most pressing issues. And we used the keywords just like they were in the telephone poll.
And our audit revealed that recommendations become significantly more popular measured by views and likes. You can see a steep increase from the initial videos to the recommendations. Note that we operationalized popularity as the number of views and likes. We included both because views are an implicit measure of popularity, while likes are an explicit measure of popularity.
Regarding views, it also remains unclear how many seconds a video must be watched before it's counted. We have a table here which provides the median and mean numbers of views and likes. Comparing the initial videos and the fifth recommendations, a substantial increase in views
and likes can be observed, especially between the initial videos and the fifth recommendation. While the initial videos have a median of 9,500 views, the fifth recommendations have a median of around 200,000 views. After following a chain of 10 recommendations, the views have a median of almost 300,000 views.
The number of likes increases significantly too. The initial videos have a median of 170 likes, while the fifth recommendations have a median of 1,404 likes. We performed two-tailed Mann-Whitney U-tests which support the finding that the
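The same kind of significance check can be scripted with SciPy. The view counts below are made up for illustration and are not the study's data; the test is non-parametric, which suits skewed quantities like views and likes.

```python
from scipy.stats import mannwhitneyu

# Illustrative view counts only (not the audit's data): initial videos vs.
# videos reached after following five recommendations.
initial_views = [9500, 1200, 300, 15000, 800, 4200]
fifth_views = [200000, 150000, 98000, 310000, 87000, 120000]

# Two-tailed Mann-Whitney U-test, as used in the audit.
stat, p = mannwhitneyu(initial_views, fifth_views, alternative="two-sided")
print(p < 0.05)  # a small p-value suggests the view counts differ systematically
```

Because the Mann-Whitney U-test compares ranks rather than means, a handful of viral outliers cannot dominate the result the way they would in a t-test.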
number of views and the likes changed between the initial videos and the recommendation. The audits also revealed that recommendations become significantly less related to political topics. The median topic similarity rating of the initial videos was 8. This decreased dramatically to 0.83 after following only five recommendations.
The similarity remains very low for the tenth recommendations with a median of 1. Two-tailed Mann-Whitney U-tests indicate that the topics in the videos change between the initial videos and the fifth recommendations and between the initial videos and the tenth recommendations.
So all these results indicate a strong topic drift. We also found that the happiness in the video increased while the sadness decreased. So the happiness changes from a median of 0 for the initial videos to a median of 2 for the fifth and tenth recommendations.
So while 75% of the initial videos have a happiness rating between 0 and 2, more than half of the fifth and tenth recommendations have a happiness rating higher than 2. Regarding the sadness evoked by the videos, the trend is opposite. The median ratings in the boxplot in the figure move from 1.67 for the
initial videos down to 0.0 for the fifth and 0.33 for the tenth recommendation. So while more than half of the initial videos have a sadness rating higher than 1.67, 75% of the tenth recommendations have a rating smaller than 1.
Overall, in contrast to what the New York Times reported, our findings suggest that the dangers of online radicalization may be exaggerated. Now taking a step back and taking the power back, I want you to understand that scraping audits and sock puppet audits are in my opinion the most promising method to investigate complex machine learning systems.
Because these audits can be used to identify popularity biases like the one that I showed you, but they can also be used to see whether a system is enacting a gender bias or whether a system has a tendency to discriminate against a particular ethnic group. So from your experience reading the news, you know that controversial political topics require
a balanced presentation of all arguments in a way that weighs the pros and cons. However, the audit suggests that YouTube's recommendation system is not suited to help users inform themselves about complex political issues.
Popularity, as measured by likes and views, was the defining factor for selecting recommendations. And if this is the case, then minority views are not adequately taken into account by the system. So the popular recommendations that you see here are of course attractive for the majority. And this could be motivated by financial incentives that try to optimize the watch time for a broad majority.
Our audit corroborates Smith et al. who also performed random walks. Smith et al.'s random walks were criticized as artificial because they relied on YouTube's API. In contrast to that, we remote controlled a Firefox browser from a university network.
So in a way, the audit is a prime example for the recentering of public engagement around the complementary interests of the broad majority and profitability. And this connects to Harper's investigation of the so-called big data public and its problems.
Because, as I said, from a platform perspective, where longer watch times result in more shown ads, which leads to more money, it makes a lot of sense to target the majority. One important limitation of my approach is that we cannot rule out that a rabbit hole effect exists for a particular subset of users and topics, especially for users actively looking for fringe content.
And that's why it's important to understand the users and their understanding of the system. Because YouTube is a complex socio-technical system with human and non-human actors who all influence how information is accessed and understood.
And we also wrote a paper on this in more detail where we examine how middle-aged users without a background in technology think YouTube works. That is, we asked them why they think they see the recommendations they see. So we found four big user beliefs. One is related to the current users' previous actions and how they influence the recommendations.
The second is related to social media, that is, how other users' actions influence the recommendations. The third is related to the algorithm and what the algorithm regards as similar, who is similar, what is similar, and also what kind of context the algorithm actually takes into account.
And the fourth user belief relates to the organization and the company policy. And here it's interesting because especially company policy is connected to a lot of negative beliefs where people think that YouTube is actually selling the recommendations and they think that YouTube has psychological experts that just try to keep them watching and watching.
If you want more, read the paper. It's written by Oscar Alvarado, me, Vero Vanden Abeele, Andreas Breiter, and Katrien Verbert and was published at the CSCW conference this year.
Now, here's my call to action. I want you to collect YouTube search results, video recommendations, and advertisements for different topics. And I want you to do this without user accounts and with user accounts. And my goal is to systematically analyze the recommendations by YouTube's machine learning system.
And the next step after that would be to design, implement, and evaluate algorithmic transparency tools that help users understand and influence their recommendations. And in the following, I will show you the script that I wrote, and I'm also going to point out where it can be adapted to not only study YouTube
across countries, languages, and topics, but also to study other platforms like Instagram, TikTok, and a variety of others. The audits, in my opinion, could be a powerful tool to surveil surveillance capitalism.
With the audits that I described, it would be possible to investigate how content is targeted to individual users. So the audits could be used to explore how advertisements are targeting specific users. And this directly relates to the dangers of so-called surveillance capitalism. Shoshana Zuboff describes surveillance capitalism as claiming human experience as raw material for translation
into behavioral data, which is declared as a proprietary behavioral surplus, fed into advanced manufacturing processes known as so-called machine intelligence, and fabricated into prediction products that anticipate what you will do now, soon, and later.
Considering these political and economic forces, it's vital to investigate how advertisements are targeted to users. So these audits that I presented could be a tool to investigate personalization, as well as the user profiles of contemporary surveillance capitalism.
I also believe that a foundation for machine learning-based systems is needed. And in the thesis, I describe in detail two different models that could be used. One is following the German Association for Technical Inspection, the TÜV. The other one is following the German Foundation for Product Testing, the so-called Stiftung Warentest.
And both approaches could be used to make sure that machine learning systems act in the interest of society at large. TÜV institutions, for instance, evaluate each car in Germany every year to ensure that a car is street legal. The purpose of the German Foundation for Product Testing is to compare goods and services in an unbiased way.
So the TÜV ensures that something complies with a certain norm, commonly making binary decisions whether something is permitted or not. The Stiftung Warentest usually develops a catalogue of criteria used to compare different instances of a specific kind of product or service.
So an expert consortium defines these criteria for specific products or services in a particular context. Now, a foundation for machine learning-based systems could adopt this schema and iteratively develop criteria for the control of ML-based curation systems. Audits could then be used to make sure that the system is not enacting a
popularity bias or that the system is not discriminating against ethnic minorities or certain gender identities. And I really hope that this talk will inspire other researchers to examine users' understanding of machine learning-based curation systems or other machine learning systems and to motivate them to design and develop novel ways of explaining and auditing such systems.
But until these bigger things are established, it's kind of up to you and me. So here's my call to civic hackers. Use the script to investigate the recommendations and the ads on YouTube. And here are some ideas. You could look at fake news and pseudoscience related to climate change or
the COVID-19 pandemic, vaccination in general, the moon landing conspiracy or the so-called flat earth theory. So in the repository, there are two scripts that I'm providing. One is called Crawl YouTube and the other one is called Extract Data from DownloadedVideos.py.
And they're both Python scripts. So let's consider the first one called Crawl YouTube. As I told you, the goal is to remote control a web browser and we're using the web testing library Selenium for that.
Selenium is also available in other programming languages, but I'm using it here via Python. And this is based on a Chrome browser. You can use different browsers; there's also support for Firefox and others. And in this script, you have different parameters that you can set. So here's the number of paths to collect per keyword.
And I set that to 20. Then the number of search results to consider, which is set to 10. And the number of related videos to consider, which is set to 10. And then the depth of related videos to visit, and that's the length of the recommendation chains we're collecting. The naming is a bit weird.
Here are the different keywords that we're entering into YouTube to download the recommendations. And it's quite easy for you to add your own keywords. You could just add one here and then save the file, and that would be sufficient.
I'm going to remove it for now. If you just want to replicate the same approach that I showed you in the paper, then that would be sufficient. So when running the script, we randomized the order of the keywords. And then we have the main loop here where we select for each of
the keywords, we select the number of recommendations that we specified in the parameters. And for that, we start a new Chrome instance and clear all the cookies. That's what we're doing here. And then we're taking the keywords and we're entering them as a search query to YouTube.
We're opening up the link, the URL. And we're waiting a bit. And the reason for that is because YouTube is dynamically loading a lot of the videos. So when the web browser finished loading, there's still loading going on in the background where a lot of data is retrieved.
And that's what we're waiting for here. And I just basically wait until the browser knows there's an element called comments, and it knows that by the ID, an element with the ID comments. And then we're preparing a file name, because we can't just save a URL to the file system.
And if we haven't already visited that website, we're going to write that down and we're writing the entire source of the website. And then we're collecting the top N recommendations. And then we're selecting a random video from these recommendations and then we're following the recommendations up to a certain depth.
And it's always the same procedure. We open the website, waiting for a random amount of time. We're waiting until we can see the comments and then we're saving the path and we're finding one of the recommendations and selecting one of the recommendations.
And it's quite nice because we can just use the CSS classes to find certain elements in the website based on the ID and based on the class. And after that, we're not only saving each of the individual videos that we're visiting, but also the path. So which video led to another and we're saving that to a file called crawl underscore paths.
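The building blocks of that loop can be sketched like this. The clean-cookies step and the wait-for-`#comments` trick follow the talk; the helper names, the search-URL scheme, and the file-name sanitizer are my own illustrative choices, and the browser function is only a sketch that needs Selenium plus a local chromedriver (it is defined but never called here):

```python
import random
import re
from urllib.parse import quote_plus

def search_url(keyword):
    """YouTube search URL for a keyword (assumed URL scheme)."""
    return "https://www.youtube.com/results?search_query=" + quote_plus(keyword)

def filename_for(url):
    """URLs contain characters like '/' and '?', so sanitize them
    before using the URL as a file name."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", url) + ".html"

def pick_recommendation(urls, top_n, rng=random):
    """Select one random video from the top-n recommendations."""
    return rng.choice(urls[:top_n])

def save_page_source(video_url):
    """Open one video in a remote-controlled Chrome, wait until the
    dynamically loaded element with id='comments' exists, then save
    the full page source. Sketch only: requires selenium and a
    chromedriver on the PATH."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()
    driver.delete_all_cookies()  # start every path with a clean session
    try:
        driver.get(video_url)
        # YouTube loads most data after the initial page load, so wait
        # until the comments element exists in the DOM.
        WebDriverWait(driver, 30).until(
            EC.presence_of_element_located((By.ID, "comments")))
        with open(filename_for(video_url), "w", encoding="utf-8") as f:
            f.write(driver.page_source)
    finally:
        driver.quit()
```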
So if you were to adapt this code, the easiest way would be to add your own keywords. And of course, to change the parameters, but you can easily adapt this code also
to visit other websites like Instagram or Telegram and then collect data through this mechanism. So the system selected education policies and it's now downloading the different videos. I'm stopping it here to show you the downloaded videos, and I do that by typing Ctrl-C.
And let's have a look at the source code of one of the videos that we downloaded.
And you can see that this is really the whole HTML document, including all the CSS and the JavaScript. And you can find a variety of things in the data.
For instance, if we look for the video title, we find the CSS for that video title, but we also find the actual video title, which is "How our education system is embarrassing itself".
And that's really what we're doing programmatically, right? We have the HTML, and then we use the HTML to extract certain information. And I also provide a script to help you with that, the script Extract Data from Downloaded Videos. Here we're using a Python library called Beautiful Soup that allows you to parse HTML and to search the HTML efficiently.
So what we're doing here is we're looping over all the videos that we downloaded, and then we're parsing the HTML that we downloaded. What you can see here is we're selecting the number of views of a video based on the ID info-text and the class view-count.
So how do I know where the views are? Well, it's quite simple because I just looked at the source code. So if we find a video that we're interested in, like this wonderful talk by
David Kriesel, we just look at the CSS selectors by right-clicking and clicking inspect in Chrome. But it's the same in all the browsers. So we have a span with the class view-count, and that's within a div with the ID info-text. And based on that, now going back to the source, we're selecting the count, and we're taking the first one because there's usually just one.
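That lookup can be mimicked without Beautiful Soup using only Python's standard library. This is a sketch against a tiny hand-written stand-in document; the `view-count` and `info-text` selectors are the ones from the page inspection above, and the class and id names used by YouTube may of course change at any time:

```python
from html.parser import HTMLParser

class ViewCountParser(HTMLParser):
    """Collects the text of every <span class="view-count"> element,
    mimicking the Beautiful Soup lookup from the extraction script."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # > 0 while inside a matching span
        self.view_counts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1     # nested tag inside the matching span
        elif tag == "span" and "view-count" in dict(attrs).get("class", ""):
            self.depth = 1
            self.view_counts.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.view_counts[-1] += data.strip()

# A tiny stand-in for a downloaded video page:
page = '<div id="info-text"><span class="view-count">1,234 views</span></div>'
parser = ViewCountParser()
parser.feed(page)
print(parser.view_counts)  # ['1,234 views']
```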
And we do this for all the different things. For instance, the date on which the video was posted, the name of the channel, the number of subscribers. And as you can also see here, it's a bit more tricky to get the likes and the dislikes.
But you can have a look at the code on your own to figure it out. It's not really rocket science either. So here are the references of the paper. I mentioned quite a large number of papers, so I'm just quickly going to scroll through them to give you a chance to stop the video and look at them.
And I again invite you to have a look at my doctoral thesis, Users and Machine Learning-based Curation Systems. Thank you very much for your attention.
OK, thank you for your interesting talk, Hendrik. And now you have the option to answer some questions. Use the #rC3CWTV hashtag on social media or the IRC chat to do so.
And there already are some questions and a wish. And the wish is, can you please provide the link to the slides? Sure. Yeah, I can upload them. I just put them in the repository.
I think there's a link to the repository in the video. And yeah, I can definitely provide the slides. Excellent. OK, then the first question. How would you want a platform like YouTube to make minority views more attractive without advertising similar small extremist views?
Very, very good question. I think my main the main idea behind the talk and also behind a lot of the other research that I'm doing is to give people more control. And it kind of starts even just with the knowledge that these recommendations
are selected by a machine learning system and they're selected with a particular rule. Right. And that's the first step. So everybody knows I'm seeing the recommendations because there's a system that's actually like Amazon trying to find things that are similar to what I've done in the past and just trying to show me stuff that's similar to what I've done in the past.
And I think that understanding is the first and maybe the most important step. But the second step would then be also to give people control over what they're seeing. And that would be to just give them more tools and more configuration settings to
decide what recommendations they want to see and in what context they want to see them. And I think the second question with the more extremist as in like far right extremists, I think that's a that's a different issue in a way that's kind of policing what's uploaded. And that's a bit like it's orthogonal to what I'm talking about.
I think this is more related to actually making sure that people can't like them can't just upload things. This is not it shouldn't be on the platform in the first place. Right. So that's not a recommendation system issue per se. But very good question. Thank you. OK. Thanks. So next question.
What do you think about the recent "AI made in Germany" initiative that aims to make AI trustworthy and responsible? Yeah. To be honest, I don't know much about it, so I really can't comment on it. I think the idea of having responsible AI, I'm all for that.
But it's something that's really, really hard to do. And I think a lot of people are working on this actively, so I very much welcome that. But I can't comment on it in particular because I don't know it. OK. Do you also run audits in which you choose two videos of the same topic before picking the rest randomly?
I didn't get that fully. What do you mean? You talked about how you do the audits.
And the question is, did you also run audits in which you choose two videos of the same topic before picking the rest randomly? I haven't done that yet, but I think it's something that's worth doing. I think the most important thing that I really want to do with the audits is understanding personalization.
So me, for instance, I'm not sure when I created my YouTube or Google account, but it's been years, 10 years or something. Right. So they really know a lot about me. And I think nobody really understands yet how that influences the recommendations. And I think it would be really interesting if people with their very old accounts, not just accounts they created a week ago,
but things they've used for years and years, start to do these audits relating to the topics that I presented here, but also related to more urgent topics. For instance, you can just go to the ARD-DeutschlandTrend, which every now and then is asking people in Germany what's interesting.
It's a representative poll by the ARD, or the WDR. And you can then use these topics to see what people are or might be googling or might be looking for on YouTube. OK, yeah. While doing that, are there any limits YouTube imposes on you?
Yeah, I mean, that's the thing. There are different ways of doing this. Like I commented, the Smith et al. work was using the YouTube API, and the YouTube API has very clear limitations on what you can and can't do.
But I did it by remote controlling a browser, so there are no natural limits. Of course, you will be blocked if you're too eager, let's say. So you should be responsible and you should have delays every now and then and take some time. But there is no technical limit. Right. But I mean, be nice and don't overload the service.
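A minimal way to build in such delays between requests is a small helper like this; the base and jitter values are arbitrary examples of mine, not anything YouTube prescribes:

```python
import random
import time

def polite_sleep(base=5.0, jitter=10.0, rng=random):
    """Sleep for a random duration between `base` and `base + jitter`
    seconds between requests, so the crawl does not hammer the service.
    Returns the chosen delay so callers can log it."""
    delay = base + rng.random() * jitter
    time.sleep(delay)
    return delay

# e.g. call polite_sleep() after every page load in the crawl loop
```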
OK, now we have a question in German. Thank you.
Let me just repeat it in English for the other people. So the question is about cookies: just deleting the cookies is not sufficient, because you're coming from the same IP address.
And there are a lot of things like that which are really limiting you here. And that's just one of the limitations I have to live with in this particular audit. I did not control for that. And I did the audit in May 2019, quite recently after the Chemnitz incidents.
But that's also kind of my idea of releasing the script because there's so many things that have an influence on the recommendations, at least have a potential influence that we really need a lot of people to do audit studies to really get an idea. And in a way, that's why I want other people to be able to do these kind of things, because I wholeheartedly agree with the comment. And I think that the IP makes a difference.
I mean, I kind of scientifically said, OK, these are the limitations of our approach. This is what we know, this is what we controlled for. But yeah, probably it has an influence, but we don't know. So I think we just need a lot of people, kind of Wikipedia style, to go about this problem.
OK, thanks. Then there are a lot of questions about how you did this, like how YouTube reacts to you doing this. Did you run into any countermeasures against automation on YouTube?
No, actually not. I think that's in a way why this is such a nice hack, in that we just use Selenium, and a lot of people use Selenium as part of their web testing. It's a very widely used tool. Right.
It can be part of your continuous integration workflow, just making sure that certain things about your website work. And it's quite hard to block in a way, because it's an actual Firefox. It's not just trying to be Firefox, it is Firefox. And of course, if you look at the behavior of the user, then it's quite artificial in a way,
and you probably can detect it. But when we did this, we didn't see anything. I mean, you definitely need to look at the legal situation and make sure that you comply with all the laws. But for the academic purposes that we've been doing this for, we didn't see any problems.
Next question. YouTube might be using several different algorithms and keep changing their tech. How would you research address this? Again, this is limited in a way. I mean, we did this at this particular time and with this particular purpose.
But that's also why I'm open sourcing it. I want other people to look at this. And it's known that there are a lot of A/B tests and there are probably dozens of versions of YouTube running at the same time, targeting different people. But in a way, that's just showing how important this kind of research is, because, again,
billions of people are using this, 70 percent of the videos watched are recommended by the algorithms, and we don't know shit about it, to be blunt. So, yeah, I think just take the script and have many people do this. Yeah, yeah. Thank you.
Yeah, there are... oh, there's a new question. Why is crawling with Chrome no problem, while surfing with a Tor browser is constantly blocked?
Is Tor blocked, or what's the question? Surfing with a Chrome browser is no problem, but using the Tor browser is a problem most of the time because there are captchas and verification steps and it's constantly blocked.
I have no idea, to be honest. I know that you can use Selenium with Tor. And I also know that this can be interesting because we can get different exit nodes, of course. So that can be quite useful. But what YouTube is actually doing to prevent people from using Tor, and why, I don't know. OK. Yeah.
Then there are some people in the IRC telling you that it's a really nice talk. Thank you very much. Thank you very much for that. You're welcome. Yeah. Are there any questions for this Q&A session left?
You can go to the chat, it's linked on the streaming page at media.ccc.de, or do it on social media. I put the slides in the repository. I can just upload them and people can find them.
Especially if, let's say, media science students and the like use these scripts and then do exciting stuff, because there's a lot of interesting research on Twitter, for instance, since it's quite easy to do this kind of research there. And I really hope, in a way, that releasing the scripts makes researching YouTube a bit
easier, and hopefully also Instagram and Telegram, because in principle it's really the same in a way. You just toss in a URL and then look for the HTML elements like I showed. You need to know HTML a bit and Python a bit, but you can only do so much in a talk, right?
A new question appeared. Doesn't randomly choosing a recommended video have its own bias? This may prevent the ML algo from learning a user preference and following a rabbit hole? Yeah, very good question. And definitely it has an effect.
I reflect on that in the thesis, and in a way it's a conscious decision, right? I mean, you can do a lot of different things, and that's just the one way I tried, which I thought was the most interesting way that I could do now. But it definitely has an effect, and it's definitely also quite different from a human being on the website, right?
But again, that's, I think, why this is only one small piece of the puzzle and we definitely need more than that. Another question. Did you investigate on the effect on having a German IP address or the browser language in German?
I had a university IP address. The browser was actually set to English. My whole system was set to English at the time.
I'm acknowledging it again. And in a way, this is like this puzzle piece which has these different settings. But yeah, we don't know what difference it would make, right? Do we get different recommendations? And yeah, that's about it.
Okay. Then, thank you very much for your awesome talk. Yeah. Thank you, Alfredi. And have a beautiful rC3. Same to you, and happy hacking to everybody, and let's hope we can do a bit of surveillance of surveillance capitalism.
Okay, bye bye. Bye.