A practically useful history of network visualization

Formal Metadata

Title: A practically useful history of network visualization
Number of Parts: 4
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Year: 2023-2024
Production Place: Berlin

Content Metadata

Abstract
This talk is about the science of visualizing networks. Although the field of graph drawing, as we call it, has no shortage of theoretical contributions, those contributions have always followed practice, and with a noticeable delay. Mathieu Jacomy, co-founder of the network visualization tool Gephi, will sketch the double story of the craftsmanship and the academic evaluation of representing networks as dots and lines: from Jacob Moreno’s sociograms, to Peter Eades’ first force-driven layout algorithms, to Helen Purchase’s aesthetic criteria, and finally to the data deluge and the availability of large complex networks, which paved the way to network science. This story will show how practices shape cultures in ways that escape the control of academics: at once bad and good, uninformed yet innovative, unjustified yet meaningful. Reading a large complex network is not the same as reading a small diagram, and the talk will illustrate the ideas that had to be left behind to properly frame graph drawing as a mediation driven by technology, which will also help you read large network maps properly. Mathieu Jacomy is Doctor of Techno-Anthropology and assistant professor at the Aalborg University Tantlab and the MASSHINE center. He was a research engineer for 10 years at the Sciences Po médialab in Paris, and is a co-founder of Gephi, a popular network visualization tool. He develops digital instruments involving data visualization and network analysis for the social sciences and humanities. His current research focuses on visual network analysis, digital controversy mapping, and computational social science.
Transcript: English (auto-generated)
So, as you know, I'm a tool maker, and now I'm an assistant professor, so I moved from tool making to the other side of academia, but I've been making tools for the sciences and the humanities for more than 10 years, the best known of which is Gephi, for network visualization. Let me start with this statement: the user of a tool or algorithm may have different goals than those expected by the author of that tool or algorithm. This question came to me as a tool maker first, you know, with people doing weird things with my tools. So what do you do when you are in that situation? But more importantly, is it controversial to say that users repurpose technology?
No. So, can you repurpose technology? Of course. If you do something that was not planned by the algorithm designer, then maybe you're doing it wrong, but then the question is: who tells right from wrong, who says what you're allowed to do, and who sets the rules? Is it the algorithm designer? Are you necessarily wrong to do something else? Does it mean it's not working? At the heart of this question, for me, is the fact that if you want to do something other than what was initially planned (that's what repurposing means: you have another purpose), then it says nothing about whether or not it works. It may well work, or it may not, but your goals are different. So suddenly, as a user, repurposing puts you outside the boundaries of what was anticipated by the algorithm designer, for instance. And I say that because the question I'm going to address is: what's an effective network visualization? That's the question this talk revolves around.
And I met someone for whom this statement, that users have their own goals, seems controversial. I thought it was completely obvious, but it's not. In an abstract (this is all fresh), I wrote that observing network analysis practices shows that users have their own epistemic culture, with a distinct characterization of effectiveness. I should have written a plural here: there are different cultures. But this just means that once I got to observe what people actually do with Gephi, I realized that they do something other than what I thought. And they do something else because they are in a different situation than me.
So they want something else, and what is important to them is maybe not what is important to me. Importantly, I don't mean that they are necessarily right. And yet this is how one of my reviewers answered, with sarcasm: "How caricatural science has become in some areas, where criticizing is considered a problem. The user is always right, even when wrong, because epistemic cultures, etc. Existing practices are perfect, and not understanding is great."
So now I'm pushed into the seat of someone who says not understanding is great, which is the complete opposite of what I've always been trying to do. But then I wonder: why would that reader understand that when I say users have their own goals, I am implying that they are necessarily right, which I did not imply or state? Well, I think you react like that if you think that practices should be bounded by academic authority: that what people do should be framed by what science says, and is otherwise wrong. And OK, some people out there think like that. What strikes me here is that academic authority, the theory of what you should be doing with networks, has always followed the practices. What science says we should do comes from observing what people were doing in the first place. So it's circular: you should be doing the thing that everyone has accepted as ideal, which was actually just observed in the field.
That's the gist of what I'm going to try to tell you about today, so we'll come back to it: who sets the rules? OK, that was just a preamble, to set the mindset.
This is my actual outline. I'm going to show you an example so that we know what we are talking about, an example from my own practice, which is not representative of all practices. I'm going to compare network analysis with social network analysis, which sounds the same but is not. I'm going to be quick on that, but I can go deeper if you have questions. I'm going to push that a little further to talk about other fields dealing with networks, and then I'm going to go into the history of graph drawing, as it's called: who holds the academic authority about what a good graph drawing or a good network visualization is, what it means, and where it comes from. And finally, I will talk a little more about the layout, another word for the same thing: how you visualize the network. I realize that I'm already using a lot of words: network, graph, and so on.
So, just for clarity: I'm assuming here that you are somewhat familiar with network analysis, but if you're not, basically I mean dots connected by lines. Sometimes they're called nodes and links, or vertices and edges, and so on. I hope an example will make it clear.
And graph is just a fancy math word for network. So here is a fairly complex example of this kind of thing. It's from my own practice, a fairly recent work. It's a network that we, with the other people involved here, Anas, Matilda, and others, have annotated. We annotated the data represented as this network, and we annotated it manually on a big printed map. In the beginning, it was a network in Gephi.
You don't see the dots here because they are too small, but there are a lot of them, and you see the lines connecting them. Each dot is a word or an expression from 1.5 million abstracts of academic publications from Scopus that all mention algorithms, AI, or machine learning in various ways. So this is a kind of landscape of 10 years of AI and algorithms in science, from Scopus, which is one of the databases that index such publications.
I'm going to skip over some of the methodological details, but we extracted named entities and other terms from those papers, and when they co-occur a lot, they are linked. So each dot (they are so small you don't see them) is an expression, and the lines are connections, meaning the two expressions appear together in the same abstract. When you have a semantic field, if you will, where many words appear together all the time because they talk about the same thing, they form a small cluster, as we say. You see some of those clusters, and then you can use a community detection algorithm to detect communities, which gives you the colors.
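The linking rule just described (two terms are linked when they appear in the same abstract often enough) can be sketched in a few lines of Python. This is a minimal illustration, not the project's pipeline: the term extraction is assumed to be already done, the toy abstracts are invented, and `min_count` is a hypothetical threshold standing in for "co-occur a lot".

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(documents, min_count=2):
    """Link two terms when they appear together in at least `min_count` documents.

    `documents` is a list of term collections, one per abstract. `min_count`
    is a hypothetical threshold standing in for "co-occur a lot".
    Returns {(term_a, term_b): count} with term_a < term_b.
    """
    counts = Counter()
    for terms in documents:
        # Each unordered pair of distinct terms in one document counts once.
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Invented toy "abstracts", already reduced to extracted terms.
abstracts = [
    {"EEG", "cortex", "neural network"},
    {"EEG", "cortex", "mental health"},
    {"mental health", "youth", "cortex"},
    {"neural network", "image segmentation"},
]
edges = cooccurrence_network(abstracts)
print(edges)
```

Each key of the returned dictionary is an edge, and its count can serve as an edge weight for the community detection step.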
But whatever the colors, we didn't use them. What we actually did is use an algorithm named clique percolation, which finds clusters, and we annotated each cluster. Each cluster is a bunch of words that appear together.
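Clique percolation builds communities out of k-cliques: two k-cliques belong to the same community when they share k − 1 nodes, and a node can belong to several communities at once. That overlap is one way a topological cluster can end up looking like a bridge on a map. The sketch below is a naive, stdlib-only illustration for toy graphs (brute-force clique enumeration, k = 3 by default), not the implementation used in the project; libraries such as NetworkX ship a real one (`k_clique_communities`).

```python
from itertools import combinations


def k_clique_communities(edges, k=3):
    """Naive clique percolation on a small undirected graph.

    A community is the union of k-cliques that can be chained together through
    cliques sharing k - 1 nodes. Communities may overlap. This brute-force
    version tests every k-subset of nodes, so it is only meant for toy graphs.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # 1. All k-cliques: every pair inside the subset must be linked.
    cliques = [
        frozenset(subset)
        for subset in combinations(sorted(adjacency), k)
        if all(b in adjacency[a] for a, b in combinations(subset, 2))
    ]

    # 2. Union-find over cliques; two cliques are adjacent if they share k - 1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # 3. A community is the set of nodes covered by one group of adjacent cliques.
    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return [frozenset(c) for c in communities.values()]


# Toy graph: triangles a-b-c and b-c-d share an edge; triangle c-e-f shares only c.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("c", "e"), ("c", "f"), ("e", "f")]
communities = k_clique_communities(edges)
print(sorted(sorted(c) for c in communities))
```

Node `c` belongs to both communities, which is exactly the kind of membership a flat map has trouble showing.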
Then we went back to the articles that contain those specific words, and we tried to summarize what algorithms do in them. That's a lot of manual work, like qualitative coding. Then we put these annotations on the map, as you can see here. Sometimes the clusters are very nice in the visualization, and sometimes they are not. Here, in black, you have the printed version. You see some of the words (every dot has a word, but I cannot show all of them because there are too many): neurons, stimuli, EEG, cortex, prefrontal, youth, psychiatric, whole brain, and mental health.
And by manual coding we determined that part of it is brain science, part of it is analyzing brain imagery, and part of it is mental health. Those are short names, but we actually have a whole paragraph describing what each of those labels means.
So we reduced these thousands of dots to about 250 annotations. We used algorithms in this process, but we also used the printed map as a substrate for manual annotation. Here's another part. That one was very dense.
A lot is going on here; this is more like social science. In yet another part, things were complicated. All these orange lines come from the fact that the clique percolation algorithm works on the network itself, not on the visualized version, so it finds clusters that sometimes are not clusters visually. In terms of topology they are clusters, but on the map they get stretched across different parts because they are basically bridges between clusters. We'll get more into that, but we had to compromise, because the visualization loses some information: the network is not really flat. We make it flat to be able to visualize it, and one of the things we lose is those bridges. The bridges are as much clusters as the others, but they appear as bridges. So it was hard.
We were trying to find compromises between all of that. To give you an example here, we have a connection between youth, mental health, and so on, and something that goes over here, neonatal sepsis, risk estimation, and so on, and the label is "children". So there's a cluster for children, but its words also overlap with two other clusters.
Then we tried to make the visuals clearer about these bridges. I'm not going to explain yet why it's red, but the blue is the original words, the dots we worked on, and the red is my annotations. We produced a big map that you can see here. It's actually made to be printed as a big poster. I couldn't bring it with me, but you can see areas: for instance, computer science and signal processing on the left, and health and medical science on the right. They talk about different things.
And if you zoom in, if you get closer (you cannot see it here), you would see the little annotations. These are all the things algorithms do in science, if you will. And then you can use this map as a support for further analysis. Here, for instance (sorry for the low contrast), in a paper that is currently under review at Big Data & Society, we produce this kind of visualization. On the left, you have the words that appear in the articles with "algorithms based on" in the title, which I would characterize as one way to grasp papers about algorithms, about making algorithms: algorithm papers. Those tend to be on the left of the map, in computer science, right? Whereas if you look for articles containing "algorithms used to", so using algorithms, you get something that is more on the right of the map. So one of the big divides you can find in this map is making algorithms versus using algorithms. We study more things, but I'm not going to go into the details. Let's jump to the takeaway. This is an example of network analysis. I don't have very big findings here for you, but that's because visual network analysis is not really the kind of thing that gives you big findings; it's better at raising questions. It's very exploratory. What I've been doing here is very descriptive: I've been describing what algorithms are doing in science, documenting it in a way that looks like distant reading, if you will. It's about clusters and bridges.
It requires qualitative knowledge to be read: if I had not annotated it, this map would be useless. And it's just one part of a larger research design. We can use those maps as an elicitation device, for other people to engage with the field, to make them talk, to do interviews. It's part of a quali-quantitative method. That's why network analysis is very exploratory: it's not hypothesis-driven, it's not hypothetico-deductive. So that's an example of what you can do with network analysis. Now let me compare it with social network analysis.
In case you're not familiar with it, the main difference is that for me, the network was something I built. It comes from the literature, but I built it myself through a process of extracting words from documents. Whereas for someone like the sociologist Mark Granovetter, the word network refers to the empirical reality of people being in relation with each other. Granovetter's most famous paper is probably "The Strength of Weak Ties". You may know that paper already; I'm going to assume so, and we can talk about it, but I want to use another paper to show you what the network means for sociologists. "How behavior and institutions are affected by social relations is one of the classic questions of social theory", in his words. And if you look at the quote in orange on the right: "the embeddedness argument stresses instead the role of concrete personal relations and structures (or 'networks') of such relations". So that's what the network means for him. In short, this paper argues that people are not just agents that do what the social structure tells them to do. They are not robots completely determined by the social structure, but they are also not unaffected by it, and the reality of social relations lies somewhere between those two caricatures. So for Granovetter, the network is the phenomenon, the relations, while for me, the network is a tool, a methodological object.
So that's part of the differences you can find if you look at the different fields that deal with networks. I have to mention actor-network theory; we'll jump over it, but it's one of the fields closest to me, because I've been working with STS, science and technology studies, for so long. Essentially, network is a metaphor in this context. It just means that you can look at something as a network of actors or as a single actor. Humboldt University, for instance, could sometimes be seen as an actor: it wants something, new money for the new building. But it's also a network of actors, in the sense that it can be divided into agents that want different things. So it just means that every actor is potentially a network, and every network an actor. But that's a metaphor, right? You don't actually get nodes and edges in this context, and if you do, they mean something else.
But that's kind of an outlier. The three main fields are network science, social network analysis, and network analysis. We've seen social network analysis, where the network is the phenomenon. The one we've not talked about is network science. It's an interdisciplinary research field with fuzzy boundaries, if you will, but it's mainly concerned with what I would call complex networks. It deals with complexity in many ways, so it has its own concerns, theories, and so on. Social network analysis, the oldest of all, deals with the social world as a matter of relations. It really is the oldest: in the 1930s, you find the first visual representations of networks. I'm going to come back to that. And then you've got network analysis, which is simply for people who deal with relational data, that is, data about things connected together. If you need to look at the structure, you are interested in the relations more than in the substance: there are things, like notoriety or authority maybe, that you would not find by looking only at the entities, only by looking at their relations. Then you need to analyze networks, and for that you need techniques; network analysis is the field about analyzing networks. So these fields overlap, but they are not the same.
And visual network analysis is basically network analysis with pictures, nothing more. You have some data; if your data is relational, you need network analysis; and if you use pictures to analyze it, you are doing VNA, visual network analysis. That's the end of it. You don't have to, though: you can analyze networks with non-visual means.
And there are other ways to visualize networks, like matrices, and yet other ways that I'm not talking about; I'm just acknowledging that they exist. Here I'm really talking about what we call dot-line diagrams. One of the big questions when you do that is how the visuals relate to other forms of analysis that may not be visual. From the structure you can compute centrality metrics, for instance betweenness centrality, which finds the bridges, if you will, or closeness centrality, and they mean different things. Here you have an actual network with actual metrics, and the node with the highest betweenness centrality is not the same as the one with the highest closeness centrality. Centrality can be implemented in different ways, depending on the nuances of what you think it means. You can see it to some extent, but of course, linking the metrics with the visuals requires a literacy and experience that you have to acquire. It's not that easy. So it's also a practice in that sense: if you use Gephi a lot, you will start to see where the bridges are, and so on.
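To make the betweenness/closeness contrast concrete, here is a stdlib-only Python sketch (not Gephi's implementation) computing both metrics on Krackhardt's "kite", a classic textbook network chosen here for illustration; it is not the network shown in the talk. Closeness is computed from breadth-first-search distances, and betweenness with Brandes' algorithm for unweighted, undirected graphs, left unnormalized. On this graph, the node with the highest betweenness (Heather, the bridge to the tail) is not one of the two nodes tied for the highest closeness (Fernando and Garth).

```python
from collections import deque


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def closeness(adj):
    """Closeness = (number of reachable nodes) / (sum of shortest-path distances)."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair (s, t) was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


# Krackhardt's kite network.
KITE = [
    ("Andre", "Beverly"), ("Andre", "Carol"), ("Andre", "Diane"),
    ("Andre", "Fernando"), ("Beverly", "Diane"), ("Beverly", "Ed"),
    ("Beverly", "Garth"), ("Carol", "Diane"), ("Carol", "Fernando"),
    ("Diane", "Ed"), ("Diane", "Fernando"), ("Diane", "Garth"),
    ("Ed", "Garth"), ("Fernando", "Garth"), ("Fernando", "Heather"),
    ("Garth", "Heather"), ("Heather", "Ike"), ("Ike", "Jane"),
]
adj = build_adjacency(KITE)
btw, clo = betweenness(adj), closeness(adj)
print("highest betweenness:", max(btw, key=btw.get))
print("highest closeness:", max(clo, key=clo.get))
```

On a drawing, Heather sits where the tail leaves the kite, while Fernando and Garth sit in its dense core, which is the kind of link between metrics and visuals that takes practice to read.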
We will open that question a little, but not too much, because what's really important is that visual network analysis is something you do through a methodological chain. You get data, you extract the data, and then you extract the network from the data. By this I mean that your data can sometimes be transformed into a network in multiple different ways. If you have a list of papers, you could make a network of papers linked when they have the same authors, or a network of keywords linked when they appear in the same paper, or a network of authors linked when they appear in the same papers. Those are completely different networks that tell you different things, and they come from the same data. Then you have the step we talk about the most, which is putting the dots somewhere in space so that it makes a kind of map. And then you have another step, which is putting colors and other things so that it's readable and you can share it with other people. I'm making the distinction here because the last step is just semiotics. I say "just" semiotics; I mean, there's more to it.
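The point that one list of papers can yield several networks can be sketched in a few lines of Python. The paper records below are invented toy data and the function names are hypothetical; the sketch only shows that the three projections mentioned above (papers sharing authors, keywords co-occurring in papers, co-authorship) are different graphs built from the same records.

```python
from collections import Counter
from itertools import combinations

# Invented toy records: each paper has authors and keywords.
papers = [
    {"id": "P1", "authors": ["Ana", "Bo"], "keywords": ["graph drawing", "layout"]},
    {"id": "P2", "authors": ["Bo", "Cy"], "keywords": ["layout", "clustering"]},
    {"id": "P3", "authors": ["Dee"], "keywords": ["graph drawing", "clustering"]},
]


def pairs_weighted(groups):
    """Count how often each unordered pair of items appears in the same group."""
    weights = Counter()
    for group in groups:
        for a, b in combinations(sorted(set(group)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def invert(records, field):
    """Map each value of `field` (an author or a keyword) to the papers containing it."""
    index = {}
    for p in records:
        for value in p[field]:
            index.setdefault(value, set()).add(p["id"])
    return index


# 1. Papers linked when they share at least one author.
paper_network = pairs_weighted(invert(papers, "authors").values())
# 2. Keywords linked when they appear in the same paper.
keyword_network = pairs_weighted(p["keywords"] for p in papers)
# 3. Authors linked when they sign the same paper.
coauthor_network = pairs_weighted(p["authors"] for p in papers)

for name, net in [("papers", paper_network), ("keywords", keyword_network),
                  ("authors", coauthor_network)]:
    print(name, net)
```

Paper P3 is isolated in the first network, while its keywords are fully connected in the second: the same record plays very different roles depending on the projection.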
But the really problematic part is going from the network data, which is a list of nodes and a list of edges, to the positions of the dots. This is done by an algorithm, and you need to understand what the algorithm is doing. And then people will say: but you don't know what you're doing. Right. And here we are. OK, so let's jump into the main dish: who says what a good network visualization is? I mean here the question of putting the dots in space: what is a good way to do it, who says it's good, and why?
And I'm planting a seed here, because this is also going to be part of my conclusion, but it's so important. As you will see throughout the story, two visions confront each other in the literature.
One of them is what I would call the diagrammatic way of interpreting networks. It works when you have to read a network that is essentially a diagram, like the one on the left: you don't have that many nodes, and in this situation, reading means being able to follow the paths, that is, the connections.
So reading is about following the connections. And then you have what I would call the topological interpretation regime, where there are too many lines anyway, so you cannot follow them; it's hopeless. What you actually want then is to get indications about the structure from the positions of the nodes.
So you want to see the clusters: the blue cluster, the red cluster. They are in different places, and that mediates the knowledge about the structure. We want to see the clusters. That's a completely different thing.
Okay, before we move on, just briefly: how does this work? You have the nodes, which are basically the dots, and the edges, which are basically the lines. The algorithms I'm talking about are called force-driven. It's a family of algorithms; they come in different flavors, but the principle is always the same.
You have two forces: a repulsion force between every pair of nodes, and an attraction force between nodes that are linked, and only those. You can think of the attraction force as a spring: a spring pulls harder when
the nodes are far apart, and when they are close, it barely pulls. So the attraction is stronger when the nodes are far away. The repulsion, on the other hand, is like magnets of the same polarity: they repel more when they are close and do almost nothing when they are far apart. So when nodes are close, the repulsion dominates,
and when they are far away, the attraction of the spring dominates. So they tend to find an equilibrium. Except it's not just for two nodes, it's for every node at once, so it's like a big mobile. The algorithm is iterative: step after step, it moves all the nodes in the directions that minimize the energy.
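For two linked nodes on their own, the balance can be computed directly. The sketch below assumes a Hooke-like spring, where the force is proportional to distance, and a Coulomb-like repulsion, where the force is inversely proportional to the squared distance. These force laws and constants are illustrative assumptions, not the equations of any specific layout algorithm.

```python
def net_force(d, spring=1.0, magnet=8.0):
    """Net force between two linked nodes at distance d (> 0).
    Positive means they are pushed apart (repulsion wins),
    negative means they are pulled together (attraction wins)."""
    attraction = spring * d        # spring: pulls harder when the nodes are far apart
    repulsion = magnet / d ** 2    # magnet: pushes harder when the nodes are close
    return repulsion - attraction

def equilibrium_distance(lo=1e-3, hi=1e3, tol=1e-9, **params):
    """Bisection on net_force, which is positive when close and negative when far."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_force(mid, **params) > 0:
            lo = mid   # still repelling: the balance point is farther out
        else:
            hi = mid   # already attracting: the balance point is closer in
    return (lo + hi) / 2
```

With these constants, the balance solves d³ = magnet / spring = 8, so `equilibrium_distance()` returns a value very close to 2.0. In a real layout, every node feels many such pulls and pushes at once, which is why the algorithm has to iterate.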
This is essentially Newtonian physics, if you want. It reaches an equilibrium, and that gives you the positions of the nodes. Here you have three different algorithms. They all use the same principle, but the laws are different: the equations are different, and they give different results.
The most obvious difference is cluster separability, which I think is what we want most nowadays: we want the clusters to mediate the structure, so we want them as well separated visually as possible; that really helps. So you have three different algorithms on the same network, and they produce three different kinds of visual affordances, right?
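The iterative procedure can be sketched in a few dozen lines of Python. This is a toy, not the code of any real tool. Linked nodes attract with a force of magnitude d^a, and every pair repels with a force of magnitude d^r. Each step moves every node along its net force, with the move size capped so the simulation does not blow up.

The ForceAtlas2 paper describes Fruchterman-Reingold, ForceAtlas2 and LinLog along these lines: attraction exponent a of 2, 1 and 0 respectively, and repulsion exponent r = -1. That description leaves aside their scaling constants and refinements such as degree-weighted repulsion. Changing `a` is a simple way to see what "the laws are different" means in practice.

```python
import math
import random

def force_layout(nodes, edges, a=1.0, r=-1.0, iterations=300, step=0.05,
                 max_move=0.1, seed=0):
    """Minimal force-directed layout sketch (not any tool's real implementation).

    Linked nodes attract with magnitude d**a; every pair of nodes repels with
    magnitude d**r, where d is their distance. Real algorithms add scaling
    constants, degree weighting, cooling schedules, etc.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes (the "magnets").
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = d ** r
                disp[u][0] += f * dx / d
                disp[u][1] += f * dy / d
                disp[v][0] -= f * dx / d
                disp[v][1] -= f * dy / d
        # Attraction only between linked nodes (the "springs").
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d ** a
            disp[u][0] -= f * dx / d
            disp[u][1] -= f * dy / d
            disp[v][0] += f * dx / d
            disp[v][1] += f * dy / d
        # Move every node a small, capped step along its net force.
        for n in nodes:
            mx, my = step * disp[n][0], step * disp[n][1]
            length = math.hypot(mx, my)
            if length > max_move:
                mx, my = mx * max_move / length, my * max_move / length
            pos[n][0] += mx
            pos[n][1] += my
    return {n: tuple(p) for n, p in pos.items()}
```

Take two triangles joined by a single edge. With the default `a=1`, the triangles should end up apart from each other. Rerunning with `a=2.0` or `a=0.0` is a cheap way to see how the choice of attraction law changes the picture.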
I think we can reach an intersubjective agreement that the one on the right is where the clusters are the most visible. I hope it works for you; if not, I'm interested in debating it with you. Okay.
So the first network visualization, or at least that's what I'm going to claim here, was made by Moreno in the thirties, and it was presented as "emotions mapped by new geography". The nodes here, triangles and circles, represent people, actually kids.
It's a classroom. The boys are triangles and the girls are circles, and he asked them: who would you like to sit next to? They gave their answers, and he connected them accordingly.
So you can see that there is only one girl who wants to sit next to a boy or vice versa: the boys want to sit next to boys and the girls next to girls. You can also see, on the top right, two girls who are disconnected from the rest. Moreno knew that what you see depends on where you put the nodes.
He was aware of that. You can see it in how he handles the relative separability of the boys and the girls: he puts them on different sides. He basically had to invent that and do it by hand, trying different things. This is what we
do nowadays with algorithms; that's the purpose of a layout algorithm with attraction and repulsion. I also want to highlight that this was empirical data: kids, and whether or not they want to sit next to each other. So this is the thirties, and now I'm jumping ahead.
It's not that nothing is going on, but not much is, especially because the question of graph drawing will reboot from a different place. I'm going to restart the story in the eighties, when
people in industry were drawing diagrams. For industrial purposes, they needed to draw these networks, as we would call them today. But as you can see on the top left, these are not just dots: those are boxes, and they have different shapes. It's a diagram. The nodes are not just dots;
there's more to them. The question was: what is a good way to draw those diagrams? So Sugiyama and others did an empirical study. They went to the people who draw diagrams and talked to them about what they consider good practice in drawing diagrams.
The practitioners said various things, and the researchers compiled these expert views into a list of what they called aesthetic criteria: what people say works best when you draw. Here is an example of such a list, compiled in a later paper from 1993. So this idea of aesthetic criteria comes from practice.
In the beginning, no one except the people doing it knew what a good network visualization was. So the researchers went and listened to what people do, and people said many things. Here on the top left you have, from the paper by Sugiyama and others, the network they start from; then they apply their principles:
symmetry, aligning things, preventing edge crossings where possible, and so on. There are a number of aesthetic criteria, and they produce the drawing at the bottom, which they find more readable. Over time, more and more of these aesthetic criteria will appear.
But the idea was there. Then, in 1984, the first force-driven algorithm for graph drawing was invented by Peter Eades. It's called the spring embedder, and it literally uses the forces of springs and magnets:
the literal Newtonian equations. A few things to note here. First, the networks are quite small. Second, I have to explain something: this is from my PhD
thesis, and I didn't have the right to reproduce the pictures, so I redrew them myself. It's basically a hand-drawn version of something that looks exactly the same. I'm going to show you a lot of pictures from papers, because my argument here is visual. These are just dots and lines. All the rest, the different shapes of the boxes, the different kinds of lines, the dashed lines, all this visual
complexity, the semiotics, is now gone: it's just dots and lines. Eades shows the results he obtains with his algorithm, on small networks, and he grounds them in the aesthetic criteria of Sugiyama
and others: he uses these criteria to justify why his approach works. Some criteria work better than others, but okay. Then I'm jumping to another paper, from 1988, which refers both to Sugiyama and to Peter Eades, and
explores different directions. The semiotics are back here. I really like this one, where the nodes are lines as well and everything is aligned horizontally. So there is a bit of invention, and we start to see a big network that is empirical.
This is quite rare at the time. They push the justification based on aesthetic criteria even further. As for the algorithm, they don't propose one.
Do they? I think they don't. They really reflect on what makes a good visualization. Then comes the next big step in graph drawing algorithms: Fruchterman and Reingold. The Eades algorithm is not in Gephi, but this one is.
We are not much later, but this algorithm is much better. I don't have time to show you, but it is more efficient and gives more visually pleasing results, because, contrary to Eades, they allow their
forces not to closely mimic the physics of the real world; the forces differ in ways that make the result better. They give themselves more freedom. And once again, we are back to just nodes and edges: the semiotics have disappeared.
The boxes are gone; there are just dots. But what's really interesting is that they start breaking with the aesthetic criteria which, remember, came from observing the practices.
They contrast their approach with the aesthetic criteria of Eades, Tamassia and others, and write: "We have only two principles for graph drawing: first, vertices connected by an edge should be drawn near each other; second, vertices should not be drawn too close to each other." So instead of treating all these aesthetic criteria as
many goals that you want to maximize and cover all at once, they say it's better to focus on just two. One of them in particular differs from what Eades does: vertices connected by an edge should be drawn near each other.
So if you're connected, you should essentially be as close as possible. For Peter Eades, on the other hand, nodes should be at the same distance from each other. Remember the spring and the magnet: there is an ideal distance, and it is not "as short as possible". The ideal is wherever the balance lies, which
depends on the coefficients of the forces, but his algorithm wanted the nodes evenly spread at that distance. Fruchterman and Reingold say: no, we want connected nodes as close as possible, while accepting that we still must keep nodes apart to tell them apart. And with that, we are on the way to what will later be called cluster separability.
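For reference, here is how Fruchterman and Reingold's two principles become two forces in their 1991 paper. Here d is the distance between two vertices, |V| is the number of vertices, "area" is the size of the drawing frame, and C is a constant tuned experimentally. The second line is a worked balance for a single linked pair considered in isolation. This is a simplification, because in a full graph every vertex also feels the repulsion of all the others.

```latex
f_a(d) = \frac{d^2}{k}, \qquad f_r(d) = -\frac{k^2}{d}, \qquad k = C\sqrt{\frac{\mathrm{area}}{|V|}}

\frac{d^2}{k} = \frac{k^2}{d} \;\Longrightarrow\; d^3 = k^3 \;\Longrightarrow\; d = k
```

A lone pair therefore still settles at a characteristic distance k. The new ingredient is that the attraction grows with d² rather than linearly. Long edges are pulled in much harder than short ones, which is one way to read "vertices connected by an edge should be drawn near each other".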
And I went fishing for this quote in that paper because, even though they state in black and white that aesthetic criteria don't make good algorithms, this gets forgotten. So this is 1991, and six years later we have a paper
that will be cited a lot later on. I'm not saying it's a bad paper or bad science; it's really good science, because nobody had been evaluating the quality of layouts. So someone is trying to do it: thank you, Helen Purchase, with "Which aesthetic has the greatest effect on human understanding?"
She does empirical research with people reading networks, looking at which kinds of visualization help or not, and which aesthetic criteria actually make people perform certain tasks better. That's all well and good, but her test case has 16 nodes.
Right? You can see an example here: the same network, visualized in different ways. If I go back, we already had networks in 1991 with more than 16 nodes. Now we are back to evaluating very small networks. That's 1997; Google is 1998, right?
And the arrival of network science is 1999. We are on the verge of the data deluge, and this is what's going on. Now we move to 2001, and suddenly the means of computation are much greater. The data deluge is not quite there yet, but what we
have is a lot of meshes, which come from the 3D world, from 3D graphics. This is a new algorithm, a very good one. As you can see in the pictures, we no longer have very small networks; we have big networks, but those networks are basically lattices: they look like grids. That's a specific kind of network, but they are big.
Their algorithm works very well with big networks that are lattices: you see here up to 10,000 nodes. In 2005, FM3, a famous algorithm, once again with even bigger
networks from 3D worlds. And finally, the one that is nowadays considered the gold standard: the LinLog paper by Andreas Noack. Here, not only do we have big networks, but they are empirical again. Networks had not been very empirical since
the thirties, and now the data is coming: you have big relational data, and we can start visualizing networks of web pages and things like that. In this case, these are countries with a... I don't remember what this is. Okay, that's 2007. And once again, during all that time,
since 1991, the aesthetic criteria have systematically been invoked by the papers to justify that their algorithm is better than the previous one. And Noack just kills it, in the sense that he says outright what we want. This is what I call the topological interpretation regime.
He says: we want to optimize network visualization for cluster separability; we want to group densely connected nodes and separate sparsely connected nodes. If you want to do that, you have to violate aesthetic criteria, in particular small edge length or
uniformly distributed nodes. So some of these aesthetic criteria, which go back to the eighties and to looking at people's practices, are actually a problem. If you try to enforce them, it will never work. I'm not going to go too much into the details, but it's quite a material thing: they are just not good.
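Noack's LinLog energy model, in its basic node-repulsion form, makes this trade-off explicit. Here p(u) is the position of node u, E is the set of edges, and V is the set of nodes. A layout is good when this energy is low.

```latex
U_{\mathrm{LinLog}}(p) = \sum_{\{u,v\} \in E} \lVert p(u) - p(v) \rVert \;-\; \sum_{\{u,v\} \subseteq V,\; u \neq v} \ln \lVert p(u) - p(v) \rVert
```

An edge costs its length linearly, while every pair of nodes gains by being far apart, but only logarithmically. Differentiating gives a constant pull along each edge and a repulsion of 1/d between every pair. Nothing in this energy rewards uniform edge lengths or evenly spread nodes, which is exactly the departure from the aesthetic criteria described here.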
And if you want to see the clusters, you have to get rid of those. With this move, Noack puts himself in trouble, because he is saying that he will intentionally not do what the literature says you should be doing. So how do you fight against the consensus here?
Well, the heart of it is that his algorithm was simply the best, for practical reasons. So people loved it and kept using it. Then there are other, later algorithms, and now you can see some big networks that are also empirical. This is a map of science; this is the algorithm developed for the map of
science by Klavans. This is OpenOrd. This is my own paper on ForceAtlas2, one of the algorithms in Gephi, which is not so different from LinLog; I actually didn't know LinLog when I created it. Anyway, there are a number of things worth
mentioning at this stage. One of them is that one of the reasons you want a metric for the quality of
graph drawings is that measuring what makes a good visualization allows you to make an algorithm. One of the ways math works is that if you can frame a real-world issue as an optimization problem, then you
can try to solve that optimization problem: you can try to find the solution that comes closest to your goal. But you need a quantitative goal. You cannot do that if your goal is subjective, for example if every time you have to run a survey to ask whether people like it or not. So this quest for good criteria for graph
drawing is not just about assessing drawings; it's also about making algorithms. And yet, as you can see, the people who made those algorithms, especially the successful ones, Fruchterman and Reingold, later LinLog, and even later
all the following ones, just broke free from these criteria. They said: okay, I'm not going to listen to what you think is good, because I've tried and it doesn't work. So my move is to forget about it and optimize for something else. Noack says: okay, cluster separability. He sets a different goal: I get rid of symmetry, for instance.
I may have it or not. I get rid of short edges; if you try to enforce that, it's not going to work. I don't want to optimize for that, I want to optimize for this, because I think it's better. And it worked, because people liked it. So we have roughly 15 years where the supposed justification, the aesthetic criteria, is not matched by people's practices.
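The argument about quantitative goals can be made concrete. An aesthetic criterion such as "few edge crossings" only becomes usable for optimization once it is a number. Below is a minimal Python sketch with straight-line edges. The function names are hypothetical and not taken from any graph drawing library.

```python
from itertools import combinations

def _orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): +1, -1, or 0 if collinear."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)

def segments_cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 properly cross (touching or collinear doesn't count)."""
    return (_orientation(p1, p2, q1) * _orientation(p1, p2, q2) < 0 and
            _orientation(q1, q2, p1) * _orientation(q1, q2, p2) < 0)

def edge_crossings(pos, edges):
    """Count pairs of edges that cross in a straight-line drawing."""
    count = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if len({a, b, c, d}) < 4:
            continue  # edges sharing a node meet there; that is not a crossing
        if segments_cross(pos[a], pos[b], pos[c], pos[d]):
            count += 1
    return count

k4 = list(combinations(range(4), 2))                    # complete graph on 4 nodes
square = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}    # nodes on a square's corners
nested = {0: (0, 0), 1: (2, 0), 2: (1, 2), 3: (1, 0.7)}  # node 3 inside the triangle
```

Drawn as a square, the complete graph on four nodes has one crossing, where the diagonals meet. With one node moved inside the triangle formed by the others, it has none. A function like this is what an optimizer would minimize. The talk's point is that the successful algorithms chose to optimize something else.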
The criteria are not listened to by the people who make the good algorithms that people use. So the practices have been at the forefront of what it means to make good visualizations all along, and basically the whole graph drawing literature has been running behind. This is why we are still evaluating networks of
16 nodes in 1997, and four years later we have algorithms handling 10,000 nodes. Obviously the criteria have to be different, because a 16-node network and a 10,000-node network are very different objects, right? It's like comparing a pile of sand with a few grains of sand: they behave
differently. So the practices, and the literature with them, have been moving to the topological interpretation regime, and this regime hinges on cluster separability. That's the heart of it.
The kind of mediation it involves is: make the clusters visually clear, and that mediates the knowledge about the structure. It doesn't matter if we cannot follow the edges. Following the edges can still matter, but only if your network is small enough that you have any hope of following them.
So from there, I want to talk a little bit about something more practical: how do you read the network, how do you read the layout? I'm going to rely on the Gestalt law of grouping.
I think you know Gestalt. The law of grouping states that if you have a bunch of dots, for instance, and we are talking about dots in space here, that's what this is all about, your eyes make groups whether you want it or not. Your brain sees those groups intuitively, without any effort on your end, just
from how close the dots are. So you probably see, like me, a square of dots on the left and three columns of dots on the right. There are no columns and no square; these are just dots in space, but we group them this way in our minds. For me, at least, this works very well. There are other Gestalt laws, but this one is
the most important for us. So one question would be, for instance: if I take three groups of dots, keep the positions within each group the same, and slowly separate the groups more and more,
you will see that there is a kind of edge case. The dots on top, I assume, you see as one cluster, one group, like on the left. The ones at the bottom, which are a little hidden behind the zoom controls here, but whatever, you would probably see as three different
groups. And then we could play a game, which I sometimes do (I have some slides for it, but not with me today): at which point does the separation make you see multiple groups? That's how we see. Our visual system is made like that, whether we want it or not; that's just how it is. And I'm mentioning all of this because I want
to show that there are very legitimate reasons why people's practices may differ from the goals set by a computer scientist.
On top is how the algorithms think: they look at the distances between the groups from center to center. So for the algorithms, these three groups
are the same: they would be occupying the same space. Whereas here, for the computer, the three groups occupy different spaces, because they are not overlapping: they are in three different positions, since it sees the gap from center to center.
We, on the other hand, need actual empty space to start seeing gaps, because we see gaps from border to border. So here we see the gaps, and here we see no gap, even though the centers are apart. This is a situation where the algorithm has
created a separation, but we don't see it. It's an illusion, if you want, but who's right? Are we right? Is the computer right? No one is right, or everyone is right; we just don't think the same way. And I can actually measure that experimentally.
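Here is a minimal sketch of the two ways of seeing a gap, assuming each group is a plain list of 2D points. `center_gap` is the algorithm's center-to-center view, and `border_gap` is the border-to-border view we rely on. The `looks_separate` rule (a gap wider than the spacing inside a group) is a hypothetical heuristic for illustration, not a validated model of perception.

```python
import math
from itertools import product

def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def center_gap(g1, g2):
    """How an algorithm tends to 'see' separation: distance between group centers."""
    return math.dist(centroid(g1), centroid(g2))

def border_gap(g1, g2):
    """Closer to how we see it: the empty space between the nearest dots of two groups."""
    return min(math.dist(p, q) for p, q in product(g1, g2))

def looks_separate(g1, g2, spacing=1.0):
    """Hypothetical rule of thumb: a gap is visible if it is wider than the
    spacing between dots inside a group."""
    return border_gap(g1, g2) > spacing

def grid(x0, n=3, spacing=1.0):
    """An n-by-n grid of dots whose left column sits at x = x0."""
    return [(x0 + i * spacing, j * spacing) for i in range(n) for j in range(n)]
```

Consider two 3×3 grids with unit spacing placed side by side (`grid(0)` and `grid(3)`). Their centers are 3 apart, yet their border gap is 1, the same as the spacing inside each grid, so they read as one block. Moving the second grid to `grid(6)` opens a border gap of 4.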
This is a model called planted partition. You take two clusters: nodes have a certain probability of being linked inside their own cluster and a certain probability of being linked to the other cluster, and the two probabilities sum to 100%. So it goes from the most separated case to more and more homogeneous ones. At the bottom here, if you have the same chance of being connected whether or not
you are in the same cluster, you basically have random networks, and your groups don't exist anymore. What's really weird, and my point is quite subtle but I will still try to make it convincingly, is this: as long as we have the gap, it's really
clear to us that we have two groups, right? The game here is: can you guess the groups from the positions? Of course, you have the colors, and because of the colors you always see the groups as long as they exist. But imagine you don't have the colors. At 60 against 40%, you don't see the
separation anymore, but the algorithm has still put one group on the left and the other on the right. So it has sorted them out; it still knows which is which, but you just don't see it. I can give you an actual example of
that, but first I'm going to give you the clue that helps you see it. When this happens, the network gets elongated in a certain direction, with denser parts at the ends and sparser parts in the middle. That's how you know that there are clusters, but they do
not have the gap that would help your visual system. So this is C. elegans, a super famous network from the 1998 paper on small-world networks by Watts and Strogatz. I didn't display the edges, but the edges are there, because the layout uses the edges to produce the
positions; you just see the dots. And if I ask you what the clusters are, it looks like one big thing, except that maybe these guys are their own thing, and maybe those, but basically it's one big structure. The trick is to realize that it's not a ball; it's not round. So if you run a community detection
algorithm, for instance, it finds three different clusters, colored here in three different colors. They were actually hidden in the structure, but the shape, this elongation produced by the layout, gives you visual clues about them, if you know how to interpret it.
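One rough way to put a number on "elongated" is offered here as an assumption, not as a method from the talk. It compares the spread of the dots along their main axis with their spread across it, using the eigenvalues of the 2×2 covariance matrix of the positions. A round blob scores close to 1, and a stretched one scores higher. This only hints that clusters may be hiding; it does not find them.

```python
import math

def elongation(points):
    """Spread along the main axis divided by spread across it, i.e. the square
    root of the ratio of the two covariance eigenvalues (1 means round)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    mean = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major, minor = mean + spread, mean - spread
    return math.sqrt(major / minor) if minor > 0 else math.inf
```

A 3×3 grid of dots scores exactly 1, and a 9×3 grid scores √10 ≈ 3.16. On a real layout, a high score is a reason to look for denser ends and a sparser middle, not proof of clusters.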
So you have to tell yourself: okay, clusters may not have a gap between them, but they will stretch the network in a certain way, and that gives me a visual clue. And if I take this exact same network and run community detection again and again, I'm going to find different results, because it's a non-deterministic process.
So basically you could say it's unclear which nodes belong to which community, which cluster, but it's unclear in the middle. This time it even found four communities instead of three. But on the sides, in the places
where it pulls the network, this is stable. So, as a metaphor, certain nodes know where they belong and others hesitate. And this kind of ambiguity is actually a feature of the data; it's just how it is. You're not in just one social circle. It would make no sense for me to ask you whether you belong to your
family or to your circle of friends: obviously you belong to both, and maybe your childhood friends are almost part of your family. It's gradual, it's overlapping, it's messy and complicated. Reality is like that. So it's not really surprising that empirical data also has this kind of effect.
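The talk does not say which community detection method was used here; Gephi's built-in one is modularity-based. To keep this self-contained, the sketch below swaps in a different and simpler non-deterministic method, asynchronous label propagation. It measures how often each pair of nodes lands in the same community across repeated runs. Pairs that always or never end up together are "stable"; pairs with in-between frequencies are the ones that hesitate.

```python
import random
from collections import Counter
from itertools import combinations

def _settled(v, adj, labels):
    """True if v already carries one of the most frequent labels among its neighbours."""
    counts = Counter(labels[u] for u in adj[v])
    return counts[labels[v]] == max(counts.values())

def label_propagation(adj, rng, max_rounds=100):
    """One run of asynchronous label propagation. The update order and tie-breaking
    are random, so repeated runs can give different partitions."""
    labels = {v: v for v in adj}  # every node starts in its own community
    order = list(adj)
    for _ in range(max_rounds):
        rng.shuffle(order)
        for v in order:
            if adj[v]:
                counts = Counter(labels[u] for u in adj[v])
                best = max(counts.values())
                labels[v] = rng.choice([lab for lab, c in counts.items() if c == best])
        if all(not adj[v] or _settled(v, adj, labels) for v in adj):
            break
    return labels

def co_assignment(adj, runs=50, seed=0):
    """Fraction of runs in which each pair of nodes ends up in the same community."""
    rng = random.Random(seed)
    pairs = list(combinations(sorted(adj), 2))
    together = Counter()
    for _ in range(runs):
        labels = label_propagation(adj, rng)
        for u, v in pairs:
            if labels[u] == labels[v]:
                together[(u, v)] += 1
    return {pair: together[pair] / runs for pair in pairs}
```

On two disconnected triangles, every within-triangle pair scores 1.0 and every cross pair scores 0.0. On a network with overlapping groups, the pairs in the middle would get values in between, which is the ambiguity described above.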
So let's land on how you interpret a network like this one. This one is almost too good to be true. I mean, it is true, but it works very well for a good reason. These are blogs just before the second Bush election. It's a very famous network by
Lada Adamic and Natalie Glance: blogs of Democrats and Republicans, connected by an edge when one links to the other. This is pre-Twitter social media, right? If you're not familiar with blogs, you could think
of it as a Twitter network of who follows whom. Anyway, the data is from 2004, the paper is from 2005, and it's one of the most convincing examples of network analysis. Everyone sees the blue ball and the red ball, and the paper makes a statistical measurement
based on network analysis: they measure the polarization of the network, in the sense that Democrats tend to link to Democrats and Republicans tend to link to Republicans. There are not many links from
one side to the other. They don't claim that we see this in the picture; they use the picture as an illustration of their method, if you want. And yet the picture was so famous that it was also reused in the well-known paper
on computational social science by Lazer and others. The question is: how are you supposed to read it? Obviously it looks like polarization, if you want to see it that way, because the blue are on one side and the red on the other. But can you actually explain how this
visualization relates to the polarization? Can you go through (I'm missing a slide here) the chain of mediation: what the layout is doing, and so on? The key to reading this image is that there are two different
things going on. One is where the dots are placed, and this is done by the layout. We've talked about that a lot, so you know what it does. Here the layout has found two clusters, which means that position in the picture tells you who you're
connected to: basically, you're connected to the dots around you. That's the bottom path: the layout mediates the topology. The top path is that I use color to represent the coding of the nodes, whether they are Democrat or Republican. With blue and
red and two balls, you could have different situations: the blue and the red could correspond to the two clusters, or they could be mixed. The part that is not obvious at all is that the color ignores the position and vice versa: the process for the color and the process for the positions of the nodes are
independent. If they were not independent, this would mean nothing, and you can't tell that just by looking. Actually, if you look closely (and we know this is empirical data), there are a few red dots that are
in the blue and a few blue dots in the red: a few outliers and oddities, as is always the case with real-world data, right? That's actually one of the ways you can tell that these are two independent things that just happen to correlate strongly. So the way to interpret it is to go through the layers of mediation. The image: everyone
agrees that the blue dots are on the left and the red dots are on the right. Next, the layout places connected nodes together. That's what the layout does; it's the effect of the algorithm, because it is optimized for cluster separability: it wants to put connected nodes
closer, and so on. And then: blogs tend to connect more with blogs of the same political affiliation. We know that because the blue are in the blue area and the red are in the red area. They gather by color, which is not a given, right?
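The two paths, edges plus colours on one side and positions plus colours on the other, can be checked separately. This sketch uses hypothetical toy data and simple measures chosen for illustration. The first is the share of same-colour links, compared with what random linking would give. The second is how often a dot's nearest neighbour on the map has the same colour. Neither is the polarization measure used by Adamic and Glance.

```python
import math
from collections import Counter

def within_group_share(edges, colour):
    """Edge-based view: fraction of links that stay inside one colour."""
    return sum(colour[u] == colour[v] for u, v in edges) / len(edges)

def random_baseline(colour):
    """Share of same-colour links expected if links ignored colour
    (every pair of nodes equally likely to be linked)."""
    n = len(colour)
    same_pairs = sum(c * (c - 1) / 2 for c in Counter(colour.values()).values())
    return same_pairs / (n * (n - 1) / 2)

def neighbour_colour_agreement(pos, colour):
    """Picture-based view: fraction of dots whose nearest dot on the map has
    the same colour. Uses only positions and colours, never the edges."""
    agree = 0
    for v, p in pos.items():
        nearest = min((u for u in pos if u != v), key=lambda u: math.dist(p, pos[u]))
        agree += colour[nearest] == colour[v]
    return agree / len(pos)

# Hypothetical toy data: two "Democrat" and two "Republican" blogs.
colour = {"a": "D", "b": "D", "c": "R", "d": "R"}
edges = [("a", "b"), ("c", "d"), ("a", "c")]
pos = {"a": (0, 0), "b": (0.5, 0), "c": (5, 0), "d": (5.5, 0)}
```

On this toy data, two of the three links stay within one colour: 2/3, against 1/3 expected under random linking. Every dot's nearest neighbour also shares its colour. Because the layout never sees the colours, high agreement is a finding about the data, not something the drawing imposes.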
So then we can say: when bloggers add another blog to their blogroll, it is generally one with similar political content, and that behavior of the people who make those blogs creates the situation. Now you can see what the political polarization means, but that is not nearly as obvious as, you
know, the strength of this image suggests. Ah, this is the slide I wanted to show: the colors belong to this step, and the layout belongs to this step, if you want. And I had to go back further: this step is not
nodes and edges, this step is "those are blogs", and the initial step is "those blogs represent political life in the USA and its polarization". That's a complicated process, but that's what reading the layout properly
entails. So if you really want to unpack a layout for someone, that's what you have to do. Note that in this description of the image I have used the topological interpretation regime: I have only looked at the positions of the nodes. The edges could have been hidden. The readability of this image doesn't depend on drawing the
edges if I use it this way, because the edges are there anyway, encoded into the positions of the dots. Yeah. So I'm basically done. Maybe the takeaway, to
go back to the preamble, would be: the practices are not necessarily right, but they may sometimes be right against what the academic literature has been saying. And in this case, one of the reasons why academic authority was wrong is not that it was wrong in the absolute; it's just that it was
sticking to a situation that is obsolete, and not even completely obsolete. The same way TV didn't replace radio, big networks didn't replace small networks. There are still people who have small networks to interpret, and they may rely on readability as following the edges. That's still there, and they need these aesthetic criteria. But most people's
situation, like "Divided They Blog", like this picture, is different, and it took an incredible amount of time for the academic literature to abandon the old situation in favor of the new one, and to accept that we are now in a different world and that we have to
rethink everything. Also think about it from the standpoint of someone who writes the papers: it's just much easier to tie your work back to something that exists, satisfy the reviewers and move on, even if in some ways it's a little artificial or a little fake, than to be the one who comes with the bad
news and says we have to rethink everything, right? So the internal dynamics of research also produce this situation. That's one of the reasons why practices can sometimes be right against academic authority. They are not necessarily right, but they may be, and that's why it's worth looking at what is actually going on in practice.
And that's what I had to say to you today. Thank you.