
Tackling disinformation with OSS


Formal metadata

Title
Tackling disinformation with OSS
Alternative title
Tackling disinformation using open-source software: The case of Qactus
Number of parts in series
542
License
CC Attribution 2.0 Belgium:
You may use, modify, reproduce, distribute, and make publicly available the work or content, in unchanged or modified form, for any legal purpose, provided that you credit the author or rights holder in the manner they specify.

Content metadata

Abstract
Disinformation has become a crucial issue for our democracies. This presentation, based on a real use case, will show how to efficiently identify the real owner of a disinformation website using OSINT techniques based on open-source software. We will also present the methodology and tools we used to better understand the disinformation ecosystem this website evolves in, its influence beyond the far-right social platforms, and the financial motivation of its creator behind the scenes.
Transcript: English (automatically generated)
Good morning, everyone. Okay, so my name is Hervé and I'm French, so sorry, my English
is not perfect; it's not my mother tongue. I work for an NGO called OpenFacto,
which is a French NGO, and I'm going to talk about it right now. It's a self-funded NGO based in France, and we created it in 2019. Our initial goal was to federate the Francophone OSINT (open-source intelligence) scene, as we had noticed that the Anglophone one was pretty
active and brilliant, with Bellingcat, for example, and people like that. So we decided to create this NGO, and we wanted to assist newsrooms and activists with OSINT investigations. We also wanted to train journalists in what we knew, and to promote
young journalists based in France, in order to help them gain OSINT skills, which is a fairly pragmatic way to find a job in France right now. And as we are
a self-funded NGO, we also wanted to set up some philanthropic projects: we trained investigative journalists from Syria who have relocated to Europe, and we ran some programs in West Africa, across the Francophone area. Today the NGO has about 260 members, and
the number is still growing. About our work first: we took part in some pretty cool investigations, such as Green Blood for a media outlet called Forbidden Stories. We also published and worked on stories for the BBC and for French television
on the Wagner Group, for example. We also won a prize with Swiss TV for a documentary on war crimes in Syria, sorry, in Ukraine. And recently we published two reports, which I recommend, on InfoRos. I don't know if you've ever heard of it. It's a Russian company
backed by Russian intelligence, and we worked on it and tried to dismantle its disinformation network inside its borders. As for me, I'm not a coder, obviously, as you can probably see. I'm the co-founder of the NGO. I'm a former judicial
investigator and an open-source software enthusiast: I used to compile the Linux kernel back in 1999 or so. So I'm an open-source software enthusiast and evangelist. I'm a techie, but not a coder, and I consider myself rather
like a Swiss Army knife. I'm very lazy, and believe me, if you want to solve a problem, hire a lazy person. And I'm very curious. So, what about disinformation? This talk is about disinformation. Disinformation is, as you probably know,
false information that is deliberately spread by an actor in order to deceive people. There is a fantastic researcher called Ben Nimmo who summarized this concept using the
"4Ds" acronym. The first one is dismiss. The next one is distract, distracting people, of course. Then distort the truth, and finally dismay. That's a very good way to define disinformation. It's a craft practiced by several threat actors. You can compare CTI (cyber threat intelligence), for example,
and disinformation: there is some overlap between these actors. There is a really great investigation, and all the links are in the presentation, called Doppelganger, by EU DisinfoLab. I strongly recommend
this report about media clones serving Russian propaganda. I'm going to switch between slides from time to time during the presentation. Disinformation is about war, but also health, politics, and the economy; it covers almost every topic, actually. And of course, the internet and the web
are probably the biggest echo chamber they use. In France, we have that problem: we have a problem with QAnon. There is a website in France called Qactus. The name comes from
Q, plus "actus" for news, and cactus the plant, I don't know. QAnon is an American political conspiracy theory and political movement; I'm pretty sure you've heard of it. It's mostly
far-right, and it appeared online in 2017. This French website is, of course, conspiracist. It's all about anti-vax content, and it's also very antisemitic. So it's a problem.
It's present on several platforms: the web, of course, but also Telegram, Twitter, Gab, et cetera, all the networks. It has some translated content: when you arrive on a web page, depending on your language, you get the news automatically
in French or in English. This is via Tor, the Tor Browser, so it's basically in English, but you can also see it in French, for example. Most of the readers are French or Francophone. It publishes five to seven articles a day, mostly crap. It's always crap.
It has more than two million visitors per month, so it's not nothing; it's a lot of people. I have a personal blog, and there's the OpenFacto blog, and we don't get two million visitors a month, I can assure you. And since it opened, it has claimed more than 80 million
visitors. So yeah, it's kind of a big thing. It's one of the most productive QAnon websites. So in order to carry out the investigation, we needed a methodology for disinformation, and there is a very good framework called ABCDE, which is very easy to remember. It's a good start. A stands for actor: who is running the disinformation campaign? B is behaviour: how?
C is content: what is the content, of course. D is degree, meaning the scale of the disinformation campaign. And the last one, E, which was added later, means the effect the actor is seeking. So if we applied this to OpenFacto's research on Qactus, we would ask
something like: what is Qactus? What is its audience, its environment, its ecosystem, its influence, and above all, its motivation? And last but not least, if we can identify who is behind Qactus,
that's really great for us as journalists. So we decided to approach this by trying to qualify the environment. To do that, we used three different tools that I call a sweet combo. The first one is Hyphe. It's a tool that was made,
like most of these tools, by the médialab at Sciences Po, and Hyphe is one of the best. It's a tool that you can download to your computer, and it essentially crawls the web starting from a URL. You can build a web corpus of all the websites that are connected
to this website through the links inside its articles. It was very useful for Qactus, because Qactus's main activity is writing articles, as I mentioned, and inside them they put links to other articles, sometimes legitimate ones, sometimes links that lead you to other conspiracy websites.
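The core idea behind Hyphe, following the links inside a site's articles to build a corpus of connected websites, can be illustrated with a toy breadth-first crawl. This is a minimal sketch, not Hyphe's implementation: the `fetch` callable, the in-memory pages, the `.example` domains, and the depth convention (the start page is depth 0) are all assumptions, and it ignores robots.txt, rate limiting, JavaScript, and Cloudflare challenges. Links are aggregated into domain-to-domain edges, loosely similar to how Hyphe groups pages into web entities.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def domain_of(url):
    # Hypothetical grouping rule: one node per host name, with "www." stripped.
    return (urlparse(url).hostname or "").removeprefix("www.")


def crawl_domain_graph(start_url, fetch, max_depth=2):
    """Breadth-first crawl from start_url (depth 0), following links up to max_depth.

    fetch(url) is a hypothetical callable returning HTML, or None when the page
    cannot be retrieved. Returns a set of (source_domain, target_domain) edges.
    """
    edges = set()
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            source, target = domain_of(url), domain_of(link)
            if target and source != target:
                edges.add((source, target))  # only cross-site links become edges
            if depth < max_depth and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return edges
```

With a dict of saved pages, `pages.get` can serve as `fetch`; raising `max_depth` from 1 to 2 is what makes the crawl reach the sites cited by the sites that the starting website cites, which is also why it takes much longer.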
It was a fairly long process, because at the time of the investigation, Qactus had about 8,000 articles on the website, so it took me a long time to scrape everything.
I had to go to depth two in order to scrape the links from the websites cited by Qactus, if that's clear. So I had to go to a depth of two. It works pretty well, with a caveat: there are some limitations with Hyphe, especially one that
really bothers us, which is Cloudflare. Sorry for the typo. As soon as you hit a Cloudflare website, and a lot of websites are protected by Cloudflare, you cannot scrape it with Hyphe. Otherwise it works pretty well. Actually, not on my computer: I tried
at the hotel last night and it didn't work, but believe me, it works. So I scraped the links, and I'll show you the result afterwards. The first tool that I combined with Hyphe was, of course, Gephi. I've been a Gephi enthusiast since 2011 or so.
It's a very interesting tool to explore and analyse graphs, especially if you use the modularity algorithm to detect sub-communities. It's a brilliant tool that I'm sure you already know. The last one was important for us: as journalists, we try to illustrate our articles with images, graphs, etc., and it's very difficult to render graphs on
websites. This tool is brand new. One of its authors is in this room, actually, and I'm super happy about that. It's Retina. Retina lets you embed a graph in
your website, explore it and analyse it, or at least help people understand it. That's really cool. Here is what we got; I'm going to show you the rendering in the article. This is Hyphe. Based on these three tools, what I tried to do was
to illustrate the environment of the website. At the top level is Qactus. All these websites were mentioned at least once in its articles, and these websites also cite each other. Here, for example, you can see the American disinformation network.
Here you've got the Francophone one, and here the Canadian one. It's very interesting to see that Qactus is in the middle of everything. That's how I tried to illustrate the graph.
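Sub-communities like these are what Gephi's modularity algorithm is meant to detect. Modularity scores a partition by how much denser the links inside communities are than expected by chance. Gephi's Modularity statistic searches for a high-scoring partition (it is based on the Louvain method); the sketch below only scores a partition you already have, using the Newman-Girvan formula Q = Σ_c [L_c/m - (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. It assumes an undirected, unweighted graph without self-loops or duplicate edges, and the toy graph is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    edges: list of (u, v) pairs, assumed to contain no self-loops or duplicates.
    community: dict mapping each node to a community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: sum of the degrees of the nodes in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(round(modularity(edges, split), 3))  # 0.357 (= 5/14)
```

Putting every node in a single community always gives Q = 0, so a clearly positive score means the clusters are more internally connected than a random graph with the same degrees would be.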
These three tools give you a very good idea of the website's environment. Inside, you can also see some legitimate websites, such as Mediapart, for example, or YouTube. That's one of their techniques to generate activity on the website:
they link to mainstream websites as well as conspiracy websites in order to improve their SEO, their ranking on Google, for example. Mixing legitimate and disinformation websites is very important for them. That was the first part of the investigation, done purely with open-source software.
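The mix of mainstream and conspiracy links described here can be quantified once the outbound links have been collected. This is a hypothetical sketch: the two domain sets are assumptions supplied by the investigator (Mediapart and YouTube are the examples mentioned in the talk), matching is on the exact host name with "www." stripped, and it measures only the share of links in each category, not any effect on search ranking.

```python
from collections import Counter
from urllib.parse import urlparse


def link_mix(outbound_urls, mainstream, flagged):
    """Share of outbound links pointing to mainstream, flagged, or unclassified domains.

    mainstream and flagged are hypothetical, investigator-maintained sets of domains.
    """
    counts = Counter()
    for url in outbound_urls:
        host = (urlparse(url).hostname or "").removeprefix("www.")
        if host in mainstream:
            counts["mainstream"] += 1
        elif host in flagged:
            counts["flagged"] += 1
        else:
            counts["unclassified"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in ("mainstream", "flagged", "unclassified")}
```

Subdomains such as `m.youtube.com` fall into "unclassified" with this exact-match rule, so a real list would need either more entries or suffix matching.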
The second part was influence, which is a fairly standard thing to analyse. We needed a good scraper for Twitter. Our idea was to get a sense of how the disinformation could escape from
far-right networks such as Gab or Parler, for example, and spread from those platforms to mainstream ones. Twitter used to be a very good platform to analyse. I don't know what's going on, and I don't know what it's going to be like in a few days,
but snscrape is a cool tool for this. I don't know if you've heard of it. We collected every tweet, since the website's creation, that contained a link to qactus.fr, the website's domain. We wanted to know about its activity, so you basically just open a terminal
and copy-paste this. I'm not going to run it all the way, but here is what you get; I hope my connection is okay. Yes: you get all the tweets. You can grab a coffee, you don't have to scroll, and when you come back, you have all the tweets.
There are approximately 124,000 tweets that mention it. Once again, that's not nothing, and you can export them as a JSON file here. It's going to work, blah, blah, blah, blah.
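snscrape can write its results as JSON Lines, with a command along the lines of `snscrape --jsonl twitter-search "<query>" > tweets.jsonl`. The sketch below does in Python what the talk goes on to describe doing in OpenRefine: flattening nested JSON into CSV columns, plus a per-month count. The field names `date`, `url`, and `user.username` are assumptions about snscrape's tweet output and may differ between versions; missing fields are left blank, and lists are kept as JSON text in a single cell.

```python
import csv
import io
import json
from collections import Counter


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. {"user": {"username": "a"}} -> {"user.username": "a"}."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)  # keep lists as JSON text in one cell
        else:
            flat[name] = value
    return flat


def jsonl_to_csv(lines, columns):
    """Convert JSON Lines records to CSV text, keeping only the requested columns."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(flatten(json.loads(line)))
    return out.getvalue()


def tweets_per_month(lines):
    """Count records per "YYYY-MM", assuming an ISO-8601 "date" field (an assumption)."""
    counts = Counter()
    for line in lines:
        if line.strip():
            counts[json.loads(line)["date"][:7]] += 1
    return counts
```

In practice you would read the exported file line by line and pass it to `jsonl_to_csv` with the columns you care about, then load the CSV into a spreadsheet, OpenRefine, or R.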
I'm going to stop it right now. Once you have these tweets, you have to process them and wrangle your data. I used another tool called OpenRefine; you probably know it. It's a powerful data-cleaning tool. There is a wow effect for journalists,
especially with one feature I like: the way it deals with JSON files. This may or may not work. You open the file like this, and it knows how to render a JSON file as CSV, which is a format we all like when we do data
journalism. You can then analyse all the data you have. That's why I love OpenRefine, and also for clustering. The last one was, of course, R and the tidyverse. I'm not going to dive into R and the tidyverse; there have been several presentations on that,
but these tools are really easy, because R is an easy language for statistics, with a simple syntax that is very readable when you don't know how to code. This is really important for newsrooms: most of the time, few journalists know how to code, so having code that is readable
is important. You get a quick overview. We've got 10 minutes? Yes, thank you. It's very convenient for prototyping, and reproducibility is also important, but if you want, you can use Python. I don't want to enter the war between R and Python. For the motivation, we used Firefox and dove
into the source code. This one was pretty easy, because Qactus has a Google ads program inside: every click on this web page earns its owner some money. This is the main motivation, apart from ideology. For the identity, the tools we used come from
GitHub: Holehe, GHunt, and Sherlock. These are command-line tools that are now packaged by a French company called Epieos, but you can run them in your terminal, and they help you dig into nicknames and email combinations on websites. We also used some leaks,
and we found out who was behind this website: Patrick. Let's wrap this up. First of all, I want to talk about the myth of the single tool in OSINT. Most of the time, when I tell people, well, I'm a journalist and I work on OSINT, they say, oh, which tool
do you use? There are several tools. There is no single tool in OSINT; you've got to mix everything. As for this investigation, this guy, Patrick, is a retired IT technician in a remote town in the middle of France. He's a conspiracist, and as we saw, he sits in the middle of a bubble of
conspiracist influence. He earns approximately $4,000 a month just by writing crap. It's all on the website. Patrick, we know who you are. That's my thing. Why do we use open-source
software? Because it's powerful and adaptable, it respects your privacy, and it allows reproducibility. I never know how to pronounce that word. Most of the time, these tools are free, which is really important, and I insist on that. You know, we don't have money, and when you go to Africa to train NGOs or journalists, for example, it's very complicated
to buy software for them. They don't even have decent machines. Having open-source software is really important. So thank you, all of you: if you code for open-source software, it's really important for civil society. Thank you. The last point is collaboration, which I haven't talked about
yet. You're all in this room today, and it's important for us to say that collaboration is key in our investigations, first of all through software, but also to share information and indicators, for example. Data capitalization
is important. There is a project called OpenCTI that I hope we will manage to establish in Europe as a tool. It's an open-source tool, and we'll try to establish it as a standard together with the DISARM matrix. All the links are in the presentation, to help foster cooperation. Thank you very much. And now, on to the questions.
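The money trail in this case started from ad code found by reading the page source in Firefox. Assuming the monetization code is Google AdSense, whose publisher IDs usually look like `ca-pub-` followed by 16 digits, a minimal sketch can pull those IDs out of saved HTML and group sites that share the same ID, a common OSINT pivot for linking several websites to one account holder. The regex, the 16-digit length, and the `{site: html}` input format are assumptions, not a description of how this investigation was actually carried out.

```python
import re

# Assumed format: "ca-pub-" followed by 16 digits (the usual AdSense publisher ID shape).
ADSENSE_ID = re.compile(r"\bca-pub-\d{16}\b")


def adsense_ids(html):
    """Return the distinct AdSense-style publisher IDs found in a page's source."""
    return sorted(set(ADSENSE_ID.findall(html)))


def sites_by_publisher(pages):
    """pages maps a site name to its saved HTML; returns {publisher_id: [sites using it]}."""
    owners = {}
    for site, html in pages.items():
        for pub_id in adsense_ids(html):
            owners.setdefault(pub_id, set()).add(site)
    return {pub_id: sorted(sites) for pub_id, sites in owners.items()}
```

A shared publisher ID is a strong lead rather than proof: it points to one AdSense account, and identifying the person behind that account still needs other evidence.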
Yes? About the leaks: are those leaks where friends gave you information, or things you found that had inadvertently been put in the public sphere? One second. I mentioned the fact that we use leaks, and the question was,
is this information that some friends gave me, or information that is publicly available on the internet? That was the question. The answer is: OpenFacto is an NGO. We don't have friends, so we use public leaks. We dig through Telegram, the darknet, et cetera, and we use
basically keywords, passwords, and emails in order to make connections between nicknames and try to identify people that way. That's what we do. I'm a former judicial investigator, so what I try to apply is judicial methodology.
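Connecting nicknames through emails or passwords shared across leaks amounts to finding connected components in a graph of accounts and identifiers. A minimal union-find sketch, assuming the leak records have already been normalized into dicts with a `nickname` field plus identifier fields; the field names and the sample data are hypothetical.

```python
def link_nicknames(records, keys=("email", "password")):
    """Group nicknames that share any identifier value (hypothetical field names).

    Nodes are ("nick", name) and (key, value) pairs; union-find with path
    halving merges every nickname with the identifiers it appears with.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for record in records:
        nick = ("nick", record["nickname"])
        find(nick)  # register nicknames even when they carry no identifiers
        for key in keys:
            if record.get(key):
                union(nick, (key, record[key]))

    groups = {}
    for node in list(parent):
        if node[0] == "nick":
            groups.setdefault(find(node), set()).add(node[1])
    return sorted(sorted(group) for group in groups.values())


records = [
    {"nickname": "patou", "email": "p@example.org"},
    {"nickname": "qpat", "email": "p@example.org", "password": "hunter2"},
    {"nickname": "pat_fr", "password": "hunter2"},
    {"nickname": "other", "email": "o@example.org"},
]
print(link_nicknames(records))  # [['other'], ['pat_fr', 'patou', 'qpat']]
```

A very common password such as `123456` would merge unrelated people, so overly common identifiers should be excluded before linking, and every resulting link still needs manual verification and a documented source.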
We try to be open about everything in our investigations, and we state all the information that we get. If we get a leak from somewhere on the internet, we explain where we got it, every time. Transparency is important. Another question here?
Are you Jewish? No, I'm not. The question was, am I Jewish? No, I'm not. No, no, no. I'm not Jewish. Other questions? Yes. The point is that we identified one person behind this website, but there are probably more
people and more websites like this. We are just a bunch of cool people working behind
our computers, so we try to do our best, but at least we've covered one website that is really influential. We started with the most productive website we wanted to investigate, but of course there are millions, well, not millions, but thousands and thousands of people doing this kind of thing. We do what we can, and we try to inspire other journalists
to carry out their own investigations and to out these kinds of people.
The question is: are there a lot of people in France, for example, working on disinformation? Some newspapers and newsrooms are starting to investigate these kinds of topics. It's not very popular or well known, but I think that over the last three or four years, disinformation has become a major issue, especially in Europe, and we see plenty of
NGOs such as EU DisinfoLab, for example. They are friends of ours; we really like them, and we try to collaborate with them. I hope we'll get more and more people working on this. Other questions? How can you stop automatic
disinformation? I'm thinking about ChatGPT and bots and stuff like that. Can you distinguish whether it's been auto-generated or human-generated? The question is: how can we stop automatic disinformation made, for example, with ChatGPT? ChatGPT is going to be a very big
issue for influence; I'm stating the obvious here. It's going to be a problem because, as I mentioned, websites use SEO techniques to climb search-engine rankings.
One of the biggest issues, for Google for example, will be to figure out not only whether content is duplicated, which was fairly easy to do, but now whether it was automatically generated. I don't have any solution for the moment; we haven't faced this problem so far. ChatGPT
is quite recent and its use is not widespread right now, but that's going to change within the next six months, I'm sure of it. Given a text, it will give you a confidence
level of whether it's human-generated or computer-generated. Yeah. Sorry, sorry. Yep. Sorry. This one. Yeah. Could you explain more about the color classification?
The level? Yes. So the level is basically the title of the website. Yeah, sorry, the coloring. Ah, the coloring. We used betweenness centrality on this graph. It's usually what I do with this kind of graph, in order to figure out the importance
of a node as a bridge in the connections between sub-communities. As for the distribution of the links, for example, it's basically standard Gephi.
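Betweenness centrality, used here for the node coloring, counts how often a node lies on shortest paths between other pairs of nodes, which is why it highlights nodes that bridge sub-communities. A minimal sketch of Brandes' algorithm for an unweighted, undirected graph given as an adjacency dict; it returns raw (unnormalized) pair counts, and Gephi's displayed values may use a different normalization. The example graph is hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to a list of its neighbours (every edge listed both ways).
    """
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        pred = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph each pair is counted once from each end.
    return {v: c / 2 for v, c in score.items()}


path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(betweenness(path))  # {'a': 0.0, 'b': 2.0, 'c': 2.0, 'd': 0.0}
```

On a path a-b-c-d, node b sits on the shortest paths for the pairs (a, c) and (a, d), hence a score of 2; end nodes never lie between two others and score 0.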
Move on? No, no, no, the spatialization. ForceAtlas. ForceAtlas. So it's this one? Yeah, this one. So it's ForceAtlas: nodes that are closely linked to each other will be
packed together, and a group, a cluster, that is not strongly linked to another one will be placed far away from it. There is a bias in the representation, because I wanted Qactus to be at the top. So that part is my doing, but for the rest,
it's basically the export from Gephi. Yeah, I was just wondering, because at the beginning you said that disinformation is deliberate. Yes. Incorrect information. Yes. How can you judge whether it's deliberate? I mean, this guy behind the website,
is he thinking, I'm going to push this lie, or does he actually believe it himself? Is there a method to... First of all, when you work on disinformation... So the question is, how can we distinguish between misinformation and disinformation? Basically, when you start an investigation, this is the first question you ask yourself: is it going to be disinformation or misinformation?
Misinformation is something that is false, but the person spreading it thinks it's true. But when you look at the content of the website, you see that he always talks about the same topics, always distributes the same
kind of information, so you know that he does this deliberately. But motivation is important, and that's why in this case it was important to identify the Google Ads tag. His motivation is probably ideological, if you look at his Facebook wall, for example, but there is also the money: the ads, Google,
and the platforms are fueling disinformation through the ad system. So this is really important to identify. Okay, last question over there. Sorry, I didn't get the question. So, is there a strategy from... The question is:
is there a strategy to tackle disinformation from authoritarian states? I think so. The problem
is that everyone has their own strategy, so there are millions of strategies, and that's the big issue, in my opinion. Europe, for example, is trying to organize this through different programs. EU DisinfoLab is leading the effort to federate organizations working on disinformation, and hacktivists,
in order to carry out wider and bigger investigations like this. So I think, yes, there is a strategy. The problem is that for the moment there are multiple strategies, but that's going to change.