
How to create inspiring data


Formal Metadata

Title
How to create inspiring data
Title of Series
Number of Parts
160
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
How to create inspiring data [EuroPython 2017 - Keynote - 2017-07-11 - Anfiteatro 2] [Rimini, Italy] Many times data visualizations need to communicate insights clearly and effectively. But sometimes the goals of a visualization go beyond that: they need to inspire and engage people. But how do you draw them in? What is the process behind creating a creative data visualization? During this talk, I will show some of my projects, and explain a little about the process behind them. Peter Hoffmann - Infrastructure as Python Code: Run your Services on Microsoft Azure. Infrastructure as Python Code: Run your Services on Microsoft Azure [EuroPython 2017 - Talk - 2017-07-11 - Anfiteatro 1] [Rimini, Italy] Using Infrastructure-as-Code principles with configuration through machine-processable definition files, in combination with the adoption of cloud computing, provides faster feedback cycles in development/testing and less risk in deployment to production. The Microsoft Azure Cloud (https://azure.microsoft.com/) allows different ways to provision, deploy and run your Python service: The Azure Resource Manager Templates (https://azure.microsoft.com/en-us/resources/templates/) allow you to provision your application using a declarative template. With parameters, variables and Azure template functions, the same template can be used to deploy your application in different stages (dev, test, production) and environments for different customers. We open sourced the tropo library (https://pypi.python.org/pypi/tropo/) to create Azure Resource Templates from Python. Azure SDK for Python (http://azure-sdk-for-python.readthedocs.io) for low-level access to manage resources in the Azure Cloud. An Azure Ansible Module (https://docs.ansible.com/ansible/guide_azure.html) based on the Azure SDK to automate software provisioning, configuration management, and application deployment in a single environment. Each of the alternatives has different strengths and drawbacks.
Presenting our learnings from migrating our infrastructure into the Azure Cloud will help to avoid common pitfalls and show deployment patterns that will ease the life of DevOps
Transcript: English (auto-generated)
Good morning everyone. How are you doing? Are you enjoying the conference? Yeah? Good. So my name is Jan Willem Tulp. Just a moment. And I thought, instead of just
telling you what I do, maybe it's a nice idea to try to find out what people think that I do. So I went to my Twitter account. I have quite a following by now, and quite a few
people have added me to a list. These lists are named, so I extracted some of the words people have been using to name their lists, and then I just looked at the frequency of those words. I'm very happy to announce that one third of the words that have been
in those lists is data visualization, which is actually what I do. Some other words that are related to what I do are data or data science, sometimes design, infographics, data-driven journalism. I can hear you thinking: why is this guy on stage, because this is a Python conference.
But in 0.4% of cases people have also added me to a list where Python was in the name, so I have a reason to be here, I think. Actually, I use Python for all of my projects. I use it to prepare my datasets before I actually create a visualization. So I'm actually a Python
user, but I don't tweet a lot about it, so that probably explains it. So let's get started. First, let's just take the time to think about the word archive and come up with some associations. Let emotions flow: what kind of feelings do you have with the word archive?
Good ones? Something? Anyone? I'm sorry? Old and dusty. I think that's probably one of the most common things people think of. You also have your email archive, which is also a
kind of place where you put your emails when you don't want to read them anymore but you store them just in case. But there are also these physical archives, and I think most people think these are kind of old, dusty locations where people store books, and
it's maybe a little bit cold because of the temperature control, so probably not the most fun place to be. But that's not true for every archive, and I want to talk about one archive in particular, which is the state archive for Dutch architecture and urban planning.
This is quite an interesting archive because it contains about 150 years of Dutch architecture. It is maintained by a museum called the New Institute in Rotterdam, and it's actually not one archive, it's a collection of archives, because when an architect
or an architecture organization decides, well, we have an archive and we want to have it maintained, they transfer it to the New Institute. Last year the Dutch national government also assigned the status of national heritage to this archive, so it really is an important and valuable archive, which is kind of a small
piece of the collective memory of the Dutch. Inside this archive there are a lot of photographs, sketches, models, books, reports, all kinds of documents, so it's really a big archive. It's
the biggest architecture archive in the Netherlands and even one of the biggest in the world. The New Institute has put quite a bit of effort into making the archive accessible online, so you can go to the website and you will end up on this page,
and you can enter a search word, a keyword, at the top of your screen, and then all the archives that match your keyword will show up in the result list. On the left you see some facets. It's a very common way of searching. Then if you click on one of the
archives you get a detail page specifically for that archive, and there are all the documents. Some of them you can look at because they have scans; for other ones you can at least make a reservation so that you can go to the archive and have a look. And this way of searching is really common. This is a Dutch house search website, and on the
right is a car search website, AutoScout, and you basically do the same thing. You know what you're looking for: I can pay this amount of mortgage, I'm looking for a house with three rooms, what are the results? And then you narrow down
the search space. The same goes for a car: I want a diesel car, it should be a station wagon, from this year, and you also narrow down the search space. It's a very effective way of searching through a large collection of data, but there's one big assumption here, and that is that
you know what you're looking for, and that's not always the case. My wife and I are almost at the stage of looking for a new house because our house is becoming a little bit too small, but one of the things that I'm personally interested in is: the house that I have in mind, what are the chances that I find it in the location where I would like to live? Should I look
somewhere else, or should I look for a different type of house, or something like that? The same goes for a car: how unique is my search, what are some alternatives that are easier to find, those kinds of questions. And also the more general sense of what a data set looks like,
that's really hard to get with a search interface like this. One of the ways that you can access data, search through data, and get a better sense of the overall contents of a data set is by using data visualization. And when I think of data
visualization, I think of something like this. When you look at this picture, you see a lot of people walking, there are a lot of things going on, and if you wanted to make sense of it, it's a little bit difficult. But there's a Dutch photographer who travels the world and goes to cities, and there he takes pictures of people
and publishes them on his website and in his books, but he does it like this. I think this is really amazing, because he groups them by the way people dress, and suddenly people become part of a group that they were not aware of. This is exactly the same thing a data visualization designer does: you look at a data set and you
try to come up with a design that shows some structure and some outliers and allows you to see some patterns in the data set. Now, every data visualization contains three components, and you might label them with three questions: what, why and how.
What is the data that you're looking at, what is represented. Why is: why is a user using a data visualization, what should he get out of it, what are the questions that he should answer with the visualization. And how is the visual design and the interactions.
These are also the areas where it can go wrong. If you have the wrong data set, then your visualization is wrong; if you ask the wrong questions, it's not right; and if you have the right questions and the right data but you're not showing it correctly, it also goes wrong. These are also the areas where you can improve things: if you improve the quality of your data, if you ask better questions, or if you make a better design, then your
visualization will improve. So this is part of every visualization. So let's talk about the concept for the architecture archive. The New Institute approached me and asked me: we have this big archive and we would like to know, what does it look like? That was their question, and
it is actually a very good question to start a data visualization process with, because sometimes people think that you just think of a design and then you build it, or design it and implement it, and
that's it. But that's not the case. Sometimes I describe what I do as finding a visual representation of a data set that works for a particular situation. You always have to discover what works and what doesn't work. So this question is really good because it gives you a direction, but at the same time it doesn't tell me what it should look like. But at the
same time it was still a little bit too abstract for me, so I came up with two sub-questions. One is: what do the contents of the archive look like, and more specifically, are there archives that are similar if you look at their contents? The other one is: what does the
structure of the archive look like? Now let's have a look at the data of the architecture archive. If you look at the website, this is a detail page of one of the archives, and there are several components already visible that can be used as a data set. First there's
a title: it has an identifier, a title, and then after the slash you see the type of archive. Then there's some other metadata, for instance the period the archive is about, or the size. Just for your information, that is the physical size of the archive: there are archives that are over 250 meters long, so that's really a big archive. And then this was really the part that was
most interesting to me, because this is a tree structure, similar to the folders and files on your hard disk, and all the elements in these tree structures had labels, and that's what I could use for understanding the contents of the archive. Now this is a Dutch sentence,
but this is an example of a title that was there, and what I wanted to do was extract some kind of informative words. So I did some natural language processing on this data, and I wanted to extract the nouns and the verbs and the morphology, so the combination of words.
I wanted to extract them, get rid of punctuation, get rid of numbers and things like that. Here's another one. I also wanted to know if something was a person, like Kromhout is a person, and in the middle that's Mediterranean, so that's a location. So these are all the
kinds of things that I wanted to extract so that I could get a better sense of what this archive is about. Only a week and a half before I had to deliver the project I came across this one. I had been using a Python package called pattern, which you may know, for language processing,
but the thing is that many of these language processing tools are very good at English, and sometimes they also support other languages, and those are reasonably supported. So pattern worked really okay, but I had to do a lot of work in order to make it right. And then I came
across this, and this was developed by two Dutch universities specifically for Dutch, so this worked extremely well. The only thing is that if the original language you are processing is not really good grammatical Dutch, then the result is also not very good, and that was
also quite often the case, because those titles were just descriptions, and sometimes it was just cover one, cover two, cover three, so that doesn't really tell you anything. So there was still a bit of a challenge. So for the visual design I would like to give a live demo of the result, and I cannot see my screen, so I have to switch to mirroring.
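[Editor's sketch] The cleanup steps described for the archive titles (dropping punctuation and numbers, keeping the informative words, then counting them) might look roughly like the Python below. This is not the speaker's pipeline: the talk used the pattern package and a Dutch-specific tool for part-of-speech and entity tagging, neither of which is reproduced here. The stop-word list, the function names and the sample titles are all hypothetical, and only the standard library is used.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real Dutch list is much longer.
STOP_WORDS = {"de", "het", "een", "van", "en", "in", "voor", "met"}


def informative_words(title):
    """Lowercase a title, drop punctuation and numbers, and remove stop words."""
    # [^\W\d_]+ matches runs of letters only, so digits and punctuation fall away.
    tokens = re.findall(r"[^\W\d_]+", title.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def top_words(titles, n=3):
    """Return the n most frequent informative words across a list of titles."""
    counts = Counter()
    for title in titles:
        counts.update(informative_words(title))
    return [word for word, _ in counts.most_common(n)]


if __name__ == "__main__":
    # Made-up titles standing in for the labels in one archive's tree structure.
    sample_titles = ["Ontwerp gebouw", "Ontwerp hotel", "Ontwerp gebouw 2", "Omslag 1"]
    print(top_words(sample_titles))
```

Titles such as "Omslag 1" (cover one) survive this cleanup as a single uninformative word, which is the limitation the speaker mentions: counting alone cannot tell descriptive labels from filing labels.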
So this is the end result. This is the first part: the content of the archive. What you see right here are the archives, and they are clustered by
similarity based on their contents, and they're also colored and positioned based on that. If you hover over an archive you can see the name of the archive on top, and below that you see the three most frequent words. On the left and on the right you see two lists of words, and
those morphologies I mentioned, the combinations of words, that's what I'm showing here. I was interested because, for instance, the word building occurred many times, but there's a hotel building, there's a congress building, there are all kinds of buildings, so I wanted to get a better sense of all those types of buildings. So on the left you see a list of
parts of words that are the beginnings of bigger words, and on the right you see parts that are the endings of bigger words. Here you see, for instance (it's all in Dutch), project, so you have project documentation or project
correspondence or something like that. If you click on it you can see in which archives these words occur, and the same goes for the ones where the words end with a part of a word, and then you can see where it occurs. The other part of the visualization was
about the structure of the archives. Once you click on a visualization you can see what the structure of the visualization looks like, and this is one of the visualizations. Over here you see it's actually just a line chart where you can select different archives, and it's
based on the number of nodes in the network, so here are the bigger archives. Personally I think this is very interesting, because this is some kind of signature image of each archive: each archive has its own unique appearance. As I mentioned briefly,
when I start a visualization project I need to get a sense of the data set, so I usually try to visualize data really early in the process. Here are some sketches that I did just to get a sense of: is this a big data set, is there a lot of variety, and things like that, and what works visually, because I also have in the back of my mind the idea that I
need to communicate it and it needs to look nice. So it's not just about what's correct or most effective, but also what's nice to look at. So here are some different ways to look at maybe the same archive, but using different algorithms to show a network,
and it was just trying out what works, what doesn't work, what looks nice. These could all be the same archive but looking completely different. And this one, for example, I thought is not really working: you see one archive which has a lot of nodes at
one level deep, but there's a small exception there, so I thought this was not the best solution. Well, you saw what the end result was. The clustering was also a kind of exploratory process, because there are so many parameters you can play with: the strength of the attraction
and repulsion of the nodes in the network, the size of the nodes. So here everything is shown at once; here's one big cluster on the right; here's only a cluster in the center, and the ones on the outside are not connected, which is also not very good; here everything is one big
blob in the center. But also the styling: how do you show the different nodes, do you just use transparency, should I show the links between the nodes so that you better see the connections, maybe a combination, what if I have a fixed radius for every
circle? The natural language processing also provided me with lists of verbs, nouns, adjectives and things like that, and also persons and locations. So one of the ideas that I had was: maybe I should offer the user an option to dynamically cluster the network
based on nouns or adjectives or something like that. This one, for instance, is based on persons, and it actually shows that a person is very much related to one archive and not to other archives, which makes sense: an archive is about one architect or architecture organization.
And even those connections may be coincidental, because maybe two people have the same name, so it's not even the same person. So I decided not to do this in the end. These are just a few examples. And also the highlight color: should it be white, or does
that work? I also had the idea of showing the most frequent word per cluster, but I decided not to, because to me it was too much of a reduction to a single word where there was really a lot of variety in the words that were used in the archives. So I decided not
to do this. And also the line chart you saw at the bottom of the rotating networks, this was another example: I was looking for a visual representation of that which didn't interfere with the rotating network and didn't take up too much space, because the rotating network was the main point of that view. So it's really a lot of experimenting
and trying things out and seeing what works and what doesn't work. So what makes a data visualization interesting? Now, there are researchers in social science that are trying to
figure this out: what makes something interesting? And there are at least two challenges: people differ in what they find interesting, and what's interesting now may not be interesting in the future. But these researchers do some experiments, and I would like to try this experiment with you. I've done it a few times in the past, and most of the time it's successful, but sometimes it's not,
so don't feel guilty if the experiment fails. What I'm going to do is show you a visualization, and what I want you to do is rate this visualization on a scale from one to ten, with one being not interesting at all and ten extremely interesting, for whatever reason. That's just it, so just give a number: how interesting do you think this is?
You don't have to mention it, it's just for yourself. I think everybody has a number. Okay, let me explain a little bit about this visualization. This is a visualization done by Boris Müller, and he's a professor of information visualization
at the University of Potsdam in Germany. He created this visualization as part of a festival called Poetry on the Road. For several years he has been asked to create a visualization based on the actual poems of the festival, and this visualization was used on the poster that
was used to announce the festival. What you see right here is that every big circle is one poem, and the bigger the circle, the longer the poem. Then he came up with an idea where you assign a number to a letter, so a is one, b is two, c is three, etc.,
and then for every word he summed up those numbers. Each circle is actually a scale with zero at the top, and the red dots are representations of words, so they're put on a radial scale based on the sum of these numbers. The sizes of the red
circles are bigger if more words have the same number, and the gray lines are used to connect the words in the original order of the poem. So now that you know a little bit more about how you could read this visualization, who
made it, and what it was used for, how would you rate the visualization now? For whom did it go up? Quite a few. For whom did it go down? Also quite a few. For whom did it stay the same?
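[Editor's sketch] The encoding just described for the poem circles (a is 1, b is 2, and so on, summed per word, placed on a radial scale with zero at the top, with bigger dots where more words share a number) can be sketched in Python as below. It only follows the verbal description in the talk and is not the designer's actual code. The clockwise direction, the `scale_max` parameter that sets which value completes a full turn, and the choice to ignore characters outside a to z are assumptions.

```python
import math
from collections import Counter


def letter_sum(word):
    """Sum letter positions with a=1, b=2, ..., z=26.

    Assumption: characters outside a-z (digits, accents, punctuation) are ignored.
    """
    return sum(ord(c) - ord("a") + 1 for c in word.lower() if "a" <= c <= "z")


def radial_layout(words, scale_max, radius=1.0):
    """Map each distinct letter sum to a dot (x, y, count) on a circle.

    The value 0 sits at the top of the circle; values increase clockwise
    (an assumed direction) and scale_max corresponds to one full turn.
    count is the number of words sharing that sum, which would drive dot size.
    """
    counts = Counter(letter_sum(word) for word in words)
    dots = {}
    for value, count in counts.items():
        angle = 2 * math.pi * value / scale_max
        dots[value] = (radius * math.sin(angle), radius * math.cos(angle), count)
    return dots


if __name__ == "__main__":
    poem = "the road is a poem and the poem is a road".split()
    for value, (x, y, count) in sorted(radial_layout(poem, scale_max=100).items()):
        print(value, round(x, 3), round(y, 3), count)
```

The gray connecting lines from the talk would follow from the sequence `[letter_sum(w) for w in words]`, drawn from dot to dot in poem order.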
Oh, that's also quite a few. I really couldn't tell if there's a majority or not. The idea is that interestingness, according to the researchers, has two main components. The first one is novelty, which is what we usually think of when something is interesting, because
it's surprising, it's new, it's unexpected. But there's another component to it which is just as important, which is comprehensibility: you have to understand what you're looking at. If you put this in some kind of diagram, you can think of it like this. If something is very common and it's comprehensible, then it's a little bit boring. You can think of bar charts,
for instance: they're very effective, everybody understands them, they're used all the time, but at the same time they're also a little bit boring. If something is incomprehensible but also novel, maybe like the visualization I just showed you, it can be very beautiful, but you can think: what am I looking at? It looks nice, but I don't know what it is. If it's common and it's
incomprehensible, then it's a failure, and you can think of visualizations like, for instance, a pie chart where the segments don't add up to 100%, which is kind of manipulating. And if it's comprehensible and novel, then it should be interesting. So let me tell
you about another project I did. This is a project I did for the European Space Agency, and it's about this satellite. This satellite is called Hipparcos, and Hipparcos is a satellite that was in space a few years ago. I don't know if it still is, but at least a few years ago it
made measurements of stars, and all the data collected ended up in a large catalog that is available both digitally and in printed format, where you have several volumes
with lots of tables and diagrams. It was at that time the largest star catalog in the world. Inside this catalog you can see diagrams like this, where you can see locations of stars, brightness of stars, whether stars are moving or not and things like that, or the direction
they're moving in, because they're all moving. When the European Space Agency approached me, they asked me if I could create a visualization that communicated what a star catalog is and what's in it, because the European Space Agency is an international government-funded organization, and one of the things that they also do is
communicate to the outside world what they're doing. So they have a group of people that work in education, to explain to students what they're doing, and a large communication part in their organization. They also did a little bit of research into what was already out there
with regard to interactive visualization of star catalogs, and there wasn't very much at the time. If you look, for instance, on Google for stellar motion, you don't find a lot of results, and on YouTube you find a few animations, and this is one of them, where you see
one constellation and it moves over a hundred-thousand-year period, and then it changes shape, but that's about it. So I thought there was quite a bit of an opportunity there to create an interactive visualization. So let me again give you a demo. This is the starting
screen of the star catalog visualization and what you see here on screen is um on the right you always have a short explanation of what you're actually looking at
Here at the top you see some small suggestions for how you could interact with the visualization. Here on the lower left you have buttons to turn on star names or constellations. And here at the bottom of the screen you have a few sections, which I will go through, that show you several aspects of the star catalog. Each of those views has a small control to interact with the visualization.
This particular view is about the apparent magnitude, which is the brightness of the stars as you would see them from Earth. With this slider you can change the brightness of the stars. These are all the stars that the satellite measured. You can also move around in space, and you can zoom in and out. Personally, I think it's quite interesting to see, when you turn on the constellations and especially when you zoom out, that the choice of stars of a
constellation was really a human choice at a specific moment in time. It made sense based on what people saw from Earth, but not necessarily based on the stars' distance from Earth.
Another view is the absolute magnitude, which is the brightness of the stars as if they were all at a fixed distance from Earth. Here you can switch between the two, and you can see that most of the stars become bigger and brighter and some become smaller. What I'm actually doing, as you can see when you zoom out, is placing all the stars at a fixed distance, and that's how the brightness of the stars changes. To explain the concept of absolute magnitude there's another diagram that astronomers often use, the Hertzsprung-Russell diagram. On the y-axis you see the absolute magnitude and on the x-axis the temperature. Here you can switch between white and color, where the color is based on the temperature. For astronomers this diagram is interesting because the location of
a star in this diagram tells you something about the life cycle of the star.
The next view is about stellar motion, and what I've done here is create a stereographic projection. You can see it when you zoom in, especially if I turn on the constellations: in the previous views you saw the constellations distorting, and here they just become bigger and smaller. Here again you can zoom in and move around, but you can also simulate how stars would move over time, so you can move forward and backward in time. I don't know if I have a good example right here, but over here you see, for instance, that some stars are moving in clusters, which is quite interesting to notice.
Finally, there's a view where you can play with all the controls at once. You can turn on more stars, you can make them colored, you can turn on the motion, and you can switch between 3D and the stereographic projection. Now there's a nice cluster. So this is the Star Mapper for the European Space Agency.
Now, the reason why I showed
you this is, well, maybe someone is thirsty, I don't know, but the reason is that this is an example of storytelling with data. Storytelling with data is a little different from literary storytelling, where you usually have a main character who goes through an adventure and it's really a chronological sequence of events. With data that's not really the case.
With data you tell a story and you use the data to support your story. Researchers have looked at a lot of visualizations and discovered that several models are used over and over again. Three of them are very popular, and this is one of them: the martini glass model. It's called this because a martini glass is narrow first and then widens, and that's exactly the way the visualization is structured. First I showed very limited views: you had only one option to interact with the visualization and it was about only one topic, so it's really focused. Then you move on to the next view, which is the same, so you're still in the narrow part of the martini glass. In the final view it opens up and you can play with everything at once.
The reason you want to do that is that if you want to explain something a little more complex to a user with a visualization, you usually don't want to throw the visualization at them and leave them to figure it out themselves. You want to guide the user through the process a little and explain what they should look at, so that at the end, when they can play with everything at once, they know: okay, this is what I'm looking at.
Another interesting or engaging visualization is this project. Does anyone have an idea how many trees there are in the world, the total number of trees? Five? Almost. Anyone? Well, let me reveal the result: it's 3.04 trillion trees. If you chopped down one tree every second, it would take you about 100,000 years to chop down every tree. So it's still a big number, but maybe a little more understandable than just a really big number.
This was the topic of research published in the journal Nature. Before this study, researchers used only satellite images to estimate the total number of trees in the world. For this research they did something different: they also dispatched people and had them count trees, so whenever they looked at a satellite image they were much more certain that it gave an accurate estimate of the number of trees. Nature was publishing this research and wanted an animation to showcase it, and they approached me and asked if I could do that using a visualization. Let me show you the animation.
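The one-tree-per-second estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the tree count is the figure quoted in the talk, while the function name and the 365.25-day year are assumptions made for illustration.

```python
# Rough check of the "one tree per second" estimate.
TOTAL_TREES = 3.04e12                      # figure quoted in the talk
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # assumption: 365.25-day year


def years_to_chop(total_trees: float, trees_per_second: float = 1.0) -> float:
    """Years needed to fell every tree at a constant rate (hypothetical helper)."""
    return total_trees / trees_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{years_to_chop(TOTAL_TREES):,.0f} years")
```

Under these assumptions the result is a little over 96,000 years, which matches the rounded "about 100,000 years" given in the talk.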
We have data not just on how much land is covered in forests, but how dense those forests are as well. Measurements were collected by thousands of people out counting trees in forests all over the world. The average density of trees at each location is represented by the height of the green lights. Some of the densest areas are these subarctic forests, where you can find a tree every square meter. But almost half of all our trees are found growing in the vast tropical and subtropical forests. All this data is going to help our understanding of where endangered species might be able to live, how long to recycle the living system, or how much carbon dioxide is being absorbed from the atmosphere. It also helps us work out what we ought to be doing to preserve and replenish our planet's forests, because even counting the new growth, we're currently losing about 10 billion trees a year. There are about 400 trees for every person on Earth, but they're disappearing at a rate of 1.4 trees per person every single year. If we keep going at this rate, a walk in the woods will soon become a lot trickier. Years ago, before human civilization took hold, the Earth had almost twice as many trees as it does now. Most of Europe, for example, was one big forest. Groups all over the world are working on planting trees and restoring natural habitats. This new data may change their targets, and it will certainly change our fundamental understanding of our planet, which we now know we share with 3.04 trillion trees.
So for this project I
received a data set, and you're looking at it right now. It was an image, but a rather special image, because it was one gigabyte in size. The reason is that you can zoom in and zoom in and zoom in until you end up with one pixel, which is one square kilometer. So the researchers were able to create a map of the Earth with a resolution of one square kilometer, and for each square kilometer they were able to estimate the density of trees, which is really amazing.
When I created this visualization, I was looking for a view like this. When I think of trees, I thought, maybe I want something like this: it's green, but it's also a little bumpy. That's what I had in mind when I set out to create this visualization. So I started out, and at first I had all the coordinates mixed up. Then I got to the original idea that I had in mind, which was also a failure, because what I had in mind was just to
show only the trees on a globe, but a transparent globe, and displace the dots based on the density. But as you can see, this image doesn't make sense at all. So I quickly moved on to a solid globe, and to bars to show the density of the trees. But getting the bumpy appearance right was quite tricky. As you can see in this image and in the next one, the green is really just one big area of green, and it doesn't have the bumpy appearance I was looking for. I tried all kinds of different things, different colors, different ways of representing the bars, but none of them produced the right result: you still see one big area of green and not the bumpiness.
Then finally I figured out what did work. In the end, every bar has a gradient. It starts out with the color of the globe, so kind of bluish, and ends up with the green color that I assigned to the bar. The bars have different heights, so because of the gradient you get different colors at different heights, and this way you do see the bumpy image. It works really well, and it also ended up on the cover of the magazine,
which is really nice, of course.
Now, the reason I'm showing this project is that what I've done here is use metaphors. In this case the project was about trees, so I thought: what are some characteristics of trees that I could use? Well, obviously the green, and the bumpiness. That is something you could think about in general when you create a visualization. Think of the actual characteristics of the data you're visualizing. What's the meaning? Can I use some of those properties in my design?
I have some examples of other people's work where this is done as well. This, for instance, is a visualization by the South China Morning Post about the deaths in Iraq. Basically it's a bar chart, but since they flipped the y-axis and made the bars red, it clearly has the appearance of blood. So this is a good example of using a metaphor and thinking about what the data represents and what it means. Here's another one. This is a wind visualization, an animation if you go online, by Martin Wattenberg and Fernanda Viégas. Common weather maps usually have arrows, but in this one, especially in the online version, you see flowing lines that become narrower and wider, and that's much closer to the way we experience wind than an arrow.
The final project I would like to show you is this one. For visualization, I strongly believe that, just as
with writing code in general, you only get very good at it if you practice. You can read a lot about it, but if you don't do the actual coding, or in this case create visualizations, it's hard to become really good at it. So I really like doing personal projects as well, and this is one of them. It's about the Dutch national elections of 2012. When the elections were over, people had a few questions. Which party won? Which party lost? What are the possible coalitions, which is always a question in the Netherlands? How did my city vote? These are all very good questions, of course, because that's what elections are about. But you can also ask different kinds of questions. The question I personally had was: which cities vote in a similar way to my city, and are there maybe some patterns in the data?
The concept is more or less like this. You take the voting results of two cities, overlay them, look at the differences per party, and sum them up. You can imagine that if the sum is zero, the results are exactly the same, and if it's not, they differ to some degree. Now, this is the visualization that I created. You see a map
of the Netherlands. I didn't actually draw a map: the Netherlands is so densely populated that you can see its shape just from the cities. You can hover over and click on cities. When you click on one, it becomes bigger and more orange, and other cities also become bigger and more orange the more similar their voting results are. In the top left corner you see a bar chart with the voting results of the city that you selected.
And this is another layout, a radial layout, where the selected city is in the center of the screen, and the more similar the voting results are, the closer the other cities are to the center.
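The similarity measure and the radial layout described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code behind the visualization: the function names, the use of vote shares rather than raw counts, the even angular spacing of cities, and the sample data are all hypothetical.

```python
import math


def vote_distance(a, b):
    """Sum of absolute differences in vote share over all parties.

    0.0 means two cities voted identically; larger means less similar.
    """
    parties = set(a) | set(b)
    return sum(abs(a.get(p, 0.0) - b.get(p, 0.0)) for p in parties)


def radial_layout(selected, results, max_radius=1.0):
    """Put the selected city at the origin and every other city at a
    radius proportional to its vote distance from the selected city."""
    others = sorted(city for city in results if city != selected)
    distances = {c: vote_distance(results[selected], results[c]) for c in others}
    largest = max(distances.values(), default=0.0) or 1.0  # avoid dividing by zero
    positions = {selected: (0.0, 0.0)}
    for i, city in enumerate(others):
        angle = 2 * math.pi * i / len(others)  # assumption: evenly spaced angles
        radius = max_radius * distances[city] / largest
        positions[city] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


# Made-up vote shares for three hypothetical cities and parties.
results = {
    "city_a": {"p1": 0.40, "p2": 0.35, "p3": 0.25},
    "city_b": {"p1": 0.40, "p2": 0.35, "p3": 0.25},
    "city_c": {"p1": 0.10, "p2": 0.30, "p3": 0.60},
}
positions = radial_layout("city_a", results)
```

With this sample data, `city_b` has the same shares as `city_a` and so lands on the center, while `city_c`, the least similar city, sits on the outer edge.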
I also included another data set, about population size, because I thought there might be a correlation between the size of a city and how it votes. And it turned out there were actually some interesting findings.
The first one is something like this, which you can find in several places on this map: regional clusters. This one is around the city of Eindhoven, which is a relatively large city, and funnily enough Eindhoven itself is not in this cluster. So it's really the cities around Eindhoven that vote in a similar way, and a little differently from the rest of the Netherlands. I think this is quite interesting, and if you extended this project, it would be interesting to research whether these kinds of clusters stay the same over time.
Another thing is that in the Netherlands we have a Bible Belt. If I select one city in the Bible Belt, you clearly see that they all vote in a similar way. Here I've selected Wassenaar, which is a city associated with richer people, and the ones that show up as similar are also cities where people think rich people live. So that's also something you can see in the data. In this view I've selected Amsterdam, and although the relationship is not that strong, you do see that the bigger cities are more on the inside than on the outside, so bigger cities also vote in a somewhat similar way. And as with many data sets, there are also outliers. This is Urk, and there's no city that votes like Urk.
Now, I've received emails from people who were using this for half an hour during work time, just playing with the visualization, and I wondered why they were doing this. I recently discovered this book, called Hooked: How to Build Habit-Forming Products. It's basically about what you do when you want to create something like Facebook, so that people keep using it. Inside this book they have this model, and I think it also applies to this visualization. First you need to have a trigger: if you move your mouse over the visualization, the cursor turns into a hand, so you are triggered to click. Then clicking is the actual action. But what's key here, I think, is the variable reward: sometimes you see something and sometimes you don't, and you also have to put in some effort for it. I think that's one of the main reasons why people played with this visualization for quite some time, because sometimes you really find something and sometimes you don't.
Also, take something like the Bible Belt that I discovered. The visualization itself doesn't tell you there is a Bible Belt. I just thought of it myself and then confirmed it with the visualization. But you can do that yourself as well. You can think, well, maybe I know where farmers live, or something like that, and see if there's a pattern there. So you can find things yourself, and I think that if you can apply that to your visualization, it's really useful.
Thank you very much.
Thank you very much for this inspiring talk. I think the image of thousands of people counting trees all over the world will stay in my mind for a while. It's also amazing how you can get from one black-and-white picture to a representation that contains much more detail. We have time for maybe just one quick question. Yes, here.
Hi. Do you use off-the-shelf tools and libraries, or do you build stuff yourself to do these kinds of visualizations?
They are all custom visualizations. The only thing I do use is some frameworks. One of the most popular ones is D3, but for the European Space Agency, for instance, I used three.js. So I use libraries like that, but other than that it's all custom.
Maybe one other quick question, if there is one. Here, just come close to the microphone.
A similar question: when you deploy these projects, what's your typical technology stack?
The typical technologies that I use? I would say that today most of my visualizations are web-based, and for the data preparation, wrangling, and everything around that I use Python. Those are basically the tools and programming languages that I use. I sometimes use Tableau, which is a business intelligence tool, but it allows you to quickly get a sense of the data, because you can just open an Excel or CSV file, drag and drop, and it already creates diagrams. So sometimes I use that, but most of the time I use Python for that. So mostly Python and web-based frameworks.
Okay, thank you very much. Now there's a coffee break. Let's thank Jan Willem again. Thank you.