Breaking News Detection [DEMO #2]

Formal Metadata

Title
Breaking News Detection [DEMO #2]
Title of Series
Number of Parts
19
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
The Cortical.io team demonstrates some Twitter analysis.
Transcript: English (automatically generated)
Hi. So we are the Cortical.io team, and this is the breaking news hack. The team is made up of Pablo, Christian, and myself.
So the agenda is: we describe the problem, the data we used, and our solution. The problem is filtering real-time data, with the goal of semantically detecting new topics in a feed and possible new trends. As for the data, for our training and testing purposes we have the offline data feed from the Archive Team. This is a substream of the firehose from February 2013. And for our live demo, well, as you can hear, we keep asking for new tweets, so we're using the real Twitter feed. And we'll hand you over to Pablo now.

OK. So I will present to you the solution.
This is a simple, minimalistic workflow of what we use to solve the problem. We need Twitter feed access. We also need the NuPIC HTM technology. And we combine it with the Cortical.io API to get a semantic representation of the tweets.
So the idea is that we should be able to detect two different kinds of anomalies. One is the new topic anomaly. And the other one is the new trend anomaly. So this is a small graph of the anomaly scores.
For this experiment, what we did is, from the data set that Peter just described, we extracted 100 tweets with the same hashtag. And here we can differentiate two types of anomalies. The high scores are the new topics, and the lower scores are the new trend anomalies. And why are they different? It's because we chose a hashtag that is about news, so the tweets are, more or less, about different but similar topics. When you have something that is seen for the very first time, so it's really new news, you get this peak. And when people are retweeting or just creating tweets about the same news, that's when we can see this score going down.
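A minimal sketch of how the two anomaly types could be told apart from the score alone. The function name and the thresholds 0.9 and 0.4 are hypothetical values chosen for illustration; the talk does not state which cutoffs, if any, were used.

```python
def classify_anomaly(score, topic_threshold=0.9, trend_threshold=0.4):
    """Label an anomaly score in [0, 1].

    Both thresholds are illustrative assumptions, not values from the talk.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score >= topic_threshold:
        return "new topic"   # first time this news item is seen
    if score >= trend_threshold:
        return "new trend"   # more tweets about an item already seen
    return "expected"        # well predicted, nothing new


scores = [0.95, 0.55, 0.10]
labels = [classify_anomaly(s) for s in scores]
print(labels)  # ['new topic', 'new trend', 'expected']
```

In practice the cutoffs would have to be tuned on a stream like the 100-tweet sample described above.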
So this is an example of a high score, a high anomaly value. And this is just because this tweet is the first time that something related to this news item appears in the stream.
We can see that the first fingerprint is the real tweet, and the second fingerprint is the prediction. What is happening there is that it cannot predict anything, or almost anything, and the prediction is completely different from the input, just because the tweet has never been seen before in the stream. Here we have an example of the other type of anomaly. The first tweet shown there is where we detect the anomaly, and the other tweets just below are tweets that happened in the past in this same stream. And we can also see that now the real fingerprint, the SDR of the tweet, is much more similar to the prediction than what we saw before.
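One common way to turn this comparison of real fingerprint and prediction into a number is the fraction of active bits in the real fingerprint that the prediction missed. The sketch below assumes fingerprints are given as sets of active bit positions; the function name and the sample bit positions are made up, and the talk does not say that the demo computes its score exactly this way.

```python
def anomaly_score(actual, predicted):
    """Fraction of active bits in `actual` that were not predicted.

    Returns 1.0 when nothing was predicted and 0.0 on a perfect prediction.
    """
    actual, predicted = set(actual), set(predicted)
    if not actual:
        return 0.0  # an empty input carries nothing to be surprised by
    return 1.0 - len(actual & predicted) / len(actual)


# Hypothetical fingerprints as sets of active bit positions.
first_seen = anomaly_score({3, 17, 42, 99}, set())          # nothing predicted
follow_up = anomaly_score({3, 17, 42, 99}, {3, 17, 42, 7})  # 3 of 4 predicted
print(first_seen, follow_up)  # 1.0 0.25
```

A tweet about a topic never seen before behaves like `first_seen`, while a tweet that resembles earlier ones in the stream behaves like `follow_up`.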
And yeah, this is just a collection of all the new topics detected. So we can clearly see that they are all about different topics, different news, and all are from this BBC breaking.
So now we'll pass to the live demo. I will let Christian continue with this. So could you please tweet something using the hashtag of the hackathon?

Yeah, thank you. You've all been really lazy tweeters, I must admit. So cool how it works.
Yeah, so basically what we do here is apply the algorithm that Pablo and our team developed to the Twitter feed and to our hashtag. We see that the HTM started to learn a bit, and there were a couple of boring tweets, maybe, in the beginning, and that caused the really good prediction. But yeah, towards the end we see that still every tweet seems to be some sort of breaking news here.
At the moment we are at almost 50 tweets. That's not a lot, yeah. Yeah, yeah, start retweeting. It's looking good now, 51, 52.
But still it's breaking news. But it's really helping you, Matt, right? In promoting the next hackathon.
Yeah, see, now the score is going down. Yeah, I guess it would take a lot more tweets to learn from the stream.
OMG. Everyone agrees we should try OMG?
Matt's new video is trending, yeah. What's Matt's new video? OK, sounds interesting. OK.
You want to change the hashtag or? Matt's new video. Let's try it. The only problem is that the Twitter sample feed is really slow. We would need, like, the Twitter firehose. Matt's new. Matt's, yeah, yeah. And from that 1% feed I'm selecting, or filtering, every tweet that's got English language assigned to it.
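The filtering step described here could look roughly like the sketch below. It assumes each tweet arrives as a dict with `lang` and `text` keys, similar in shape to tweet objects from the Twitter API; the function name, the `demo` hashtag, and the sample tweets are made up for illustration.

```python
def select_tweets(tweets, hashtag, lang="en"):
    """Yield tweets in the given language that carry the given hashtag.

    Each tweet is assumed to be a dict with "lang" and "text" keys.
    """
    needle = "#" + hashtag.lstrip("#").lower()
    for tweet in tweets:
        if tweet.get("lang") != lang:
            continue  # keep only tweets tagged with the wanted language
        words = {w.strip(".,!?:;").lower() for w in tweet.get("text", "").split()}
        if needle in words:
            yield tweet


# Made-up sample standing in for the 1% sample feed.
sample = [
    {"lang": "en", "text": "Go watch the new video #demo"},
    {"lang": "de", "text": "Neues Video ansehen #demo"},
    {"lang": "en", "text": "Unrelated tweet #other"},
]
kept = list(select_tweets(sample, "demo"))
print([t["text"] for t in kept])  # ['Go watch the new video #demo']
```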
Sorry? Yeah, OK.
There are incoming tweets now. Matt's new tweet. What stream?
We pick a tag or whatever you would like to filter. And then we start capturing tweets. And after a few tweets, we start predicting. So the only pre-processing that's done is with the Cortical.io API where we, for example, remove the, what's it called, pound sign or the hash from hashtags. And that's pretty much it.
I mean, a lot of tweets are really kind of junky looking. We got like, you know. Yeah, tweets are really, really hard, yeah. I mean, the examples you showed from BBC were really nice and clean. So I didn't know.

Yeah, there is a small pre-processing step that is basically removing the HTTP, because HTTP is contained in the retina, and this would bias the meaning of all the tweets. And something else, like the retweet symbol, RT. And I think that's all the pre-processing that is needed.

Yeah, but here we kind of see that we got 35 tweets.
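The pre-processing steps mentioned in this exchange (dropping the hash sign, the links, and the RT marker) could be sketched as follows. The function name and regular expressions are illustrative assumptions, and removing whole URLs rather than only the "http" token is a guess at what the team did.

```python
import re

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)  # whole links (assumption)
RT_RE = re.compile(r"\bRT\b")                        # retweet marker


def preprocess(text):
    """Strip links, the RT marker, and '#' signs before semantic encoding."""
    text = URL_RE.sub(" ", text)
    text = RT_RE.sub(" ", text)
    text = text.replace("#", "")   # keep the hashtag word, drop the sign
    return " ".join(text.split())  # collapse leftover whitespace


print(preprocess("RT @bbc: Breaking #news story http://example.com/abc"))
# @bbc: Breaking news story
```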
And the HTM is already learning to predict it. So they all seem to be pretty similar. Yeah, they all say go watch Matt's new video. So in case there is some little new news in there,
we would get notified by this anomaly detection, right? Oh, OK. Some tweet did crash us.
I guess we hand over the mic to the next team.