Mapping the Chatter: Spatial Metaphors for Dynamic Topic Modelling of Social Media
Formal Metadata
Title | Mapping the Chatter: Spatial Metaphors for Dynamic Topic Modelling of Social Media
Number of Parts | 351
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/68936 (DOI)
Production Year | 2022
Transcript: English(auto-generated)
00:00
Thank you. So the presentation is titled "Mapping the Chatter". What does this mean? It means that we would like to present a novel visualization technique for dynamic topic modeling. So what is dynamic topic modeling, first of all?
00:20
It's a natural language processing technique. Right, it's a problem area in which some algorithms are used to determine what a text is about, what its topic is, or its mixture of topics in the case of longer documents. Now, we apply this to social media posts,
00:42
which are very short. So basically, it's what people are talking about in the modern piazza of social media. To do this, we have to collect many social media posts. Then we use an algorithm to determine
01:00
what each post is about. And then we cluster them in order to determine the topics and the popularity of every topic. So this is what we do at the Australian Digital Observatory, which is a project jointly funded
01:22
by the University of Melbourne, University of Technology, University of Queensland, and some other entities. Every day, we collect 400,000 social media posts, mainly from Twitter, but also YouTube, Flickr, and stuff like that, about Australia, from Australians or from people who are living in Australia at the moment.
01:45
So we have collected 121 million posts so far. We store them in a clustered database. Every night, there is a topic modeling algorithm that runs through those data. It's a deep learning algorithm based on Google's BERT language model.
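The step of grouping posts into topics and counting how popular each topic is can be illustrated with a minimal sketch. It assumes each post has already been turned into an embedding vector by a language model; the tiny 3-dimensional vectors, the `cluster_posts` function, and the greedy threshold rule are all hypothetical stand-ins, not the algorithm used in the project.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cluster_posts(embeddings, threshold=0.8):
    """Greedy clustering (an illustrative assumption, not the talk's method).

    A post joins the first topic whose seed vector has cosine similarity of
    at least `threshold` with it; otherwise it starts a new topic.
    """
    seeds = []   # one unit-length seed vector per topic
    labels = []  # topic id assigned to each post
    for vec in embeddings:
        unit = normalize(vec)
        for topic_id, seed in enumerate(seeds):
            similarity = sum(a * b for a, b in zip(unit, seed))
            if similarity >= threshold:
                labels.append(topic_id)
                break
        else:
            seeds.append(unit)
            labels.append(len(seeds) - 1)
    return labels


# Hypothetical post embeddings (real ones would have hundreds of dimensions).
posts = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 1.0],
    [0.85, 0.15, 0.05],
]

labels = cluster_posts(posts)
popularity = Counter(labels)  # number of posts per topic
print(labels, dict(popularity))
```

Here the popularity of a topic is simply the number of posts assigned to it, which is the quantity later shown as circle size or surface height.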
02:04
And then we determine the topics, and we cluster them. Now, there is one more problem, though. How do you visualize these topics? Because every topic is a vector in 384 dimensions,
02:23
which means that unless you are a Spacing Guild navigator, you cannot conceptualize that. Traditionally, this has been the visualization used. So you reduce the number of dimensions from 384 to 2.
02:44
And then you plot them. So every circle is a topic, and the size of the circle is the popularity, that is, the number of social media posts on that specific topic. And you do it for every day. Now, it's simple enough.
03:01
The problem is when you have dynamic topic modeling, so you want to see the evolution through time, then it's more complicated, because you need to have so many plots to look at. So we thought about doing something different, so to use a spatial metaphor. So on the x-axis, you have time.
03:23
On the y-axis, you have the distance between topics, semantic distance. So a topic on the Russian-Ukrainian war will be, say, closer to a topic on the Ukrainian economy than to one on the US economy, for instance.
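As a rough illustration of semantic distance, the sketch below compares topic vectors with cosine distance. The talk does not say which distance measure is used, so cosine distance is an assumption, and the three-dimensional topic vectors are invented purely for the example.

```python
import math


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two vectors (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical topic vectors; real topic vectors would have 384 dimensions.
topics = {
    "russia_ukraine_war": [0.9, 0.4, 0.1],
    "ukrainian_economy": [0.6, 0.7, 0.2],
    "us_economy": [0.1, 0.8, 0.6],
}

d_war_ua = cosine_distance(topics["russia_ukraine_war"], topics["ukrainian_economy"])
d_war_us = cosine_distance(topics["russia_ukraine_war"], topics["us_economy"])

# With these made-up vectors, the war topic sits closer to the Ukrainian economy.
print(round(d_war_ua, 3), round(d_war_us, 3))
```

On the landscape, topics with a small distance between them would end up near each other along the y-axis.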
03:43
And then you have the z-axis, which is the topic popularity, so the number of social media posts for that topic for the day. You have then a point cloud, basically. You drape. Sorry, you don't drape. You interpolate that with a 3D continuous surface,
04:00
and then you have something that resembles a physical landscape. So for instance, this is the Ridge of Batman. So this over there is the lead-up to the release of The Batman, the movie. And you see these topics over there, "book, chapter, writer".
04:23
The one on the ridge is "movie, Batman". These are the top terms, yep. So if you look a little bit further, that's at the beginning of March. You see here, this is Musk. We call it the Musk Peak, because apparently Musk tweeted about his buying of Twitter.
04:41
And you see the "Twitter, tweet" topic, social media, with its bump in popularity. And you see over there the Ridge of Batman. And how did we do it? We reduce the dimensionality from 24 to 1 using UMAP, then use a kernel density estimator for the surface.
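The surface step can be sketched as a popularity-weighted Gaussian kernel sum over the point cloud. This is a minimal sketch under stated assumptions: each topic already has a one-dimensional semantic position (for example from the UMAP step), and the `kde_surface` function, the bandwidths, and the sample points are hypothetical. The result is an unnormalized height field rather than a probability density.

```python
import math


def kde_surface(points, xs, ys, bandwidth_x=1.0, bandwidth_y=1.0):
    """Evaluate a weighted Gaussian kernel sum on a regular grid.

    points: list of (day, semantic_position, popularity) tuples.
    xs, ys: grid coordinates along the time and semantic axes.
    Returns a list of rows, one row per y value, one height per x value.
    """
    surface = []
    for y in ys:
        row = []
        for x in xs:
            height = 0.0
            for px, py, weight in points:
                dx = (x - px) / bandwidth_x
                dy = (y - py) / bandwidth_y
                height += weight * math.exp(-0.5 * (dx * dx + dy * dy))
            row.append(height)
        surface.append(row)
    return surface


# Hypothetical point cloud: one topic growing over three days at position 0.2
# (a "ridge"), and an unrelated topic appearing once at position 0.8.
points = [
    (0, 0.2, 100),
    (1, 0.2, 400),
    (2, 0.2, 900),
    (1, 0.8, 150),
]
xs = [0, 1, 2]
ys = [0.2, 0.5, 0.8]

surface = kde_surface(points, xs, ys, bandwidth_x=0.5, bandwidth_y=0.1)
for y, row in zip(ys, surface):
    print(y, [round(z, 1) for z in row])
```

A topic that gains posts day after day shows up as a ridge rising along the time axis, while a one-day spike shows up as an isolated peak. The bandwidths control how smooth the landscape looks and would need tuning on real data.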
05:01
And then we use QGIS to do the visualization, using Qgis2threejs. And that's all.