Did you know Matplotlib could do that?
Formal Metadata
Number of Parts | 8
License | CC Attribution 4.0 International: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/68623 (DOI)
Transcript: English (auto-generated)
00:05
So, for the next talk, we will have Teresa Kubacka, and she will talk about "Did you know Matplotlib could do that?" She's a senior data scientist at ETH Zurich and also a data science freelancer with Xers.
00:23
Yeah, she's teaching data visualization and machine learning and AI. So, yeah, I mean, most of us have probably used Matplotlib at some point, I guess. I have used Matplotlib a lot. And sometimes we probably did struggle with Matplotlib
00:41
to get it to display the plot exactly like we want. And, yeah, I really hope that Teresa will show us how we should be doing stuff instead so that we don't fight Matplotlib but can work with it. Is the screen showing what it is supposed to show? It's showing the loading.
01:01
Okay, thank you, Google Slides. We'll fall back to the PDF. It's good to work. So, the stage is yours. Yes, thank you. Thank you for the introduction. I'm super happy to be here and to spread a bit of love towards Matplotlib.
01:25
Sorry for the slightly lower quality of the images in the PDF, but I hope everything is still visible. So, as you said, I'm a data scientist. In my previous life, I used to be a physicist. I'm not a computer scientist. And now, for the last six, seven years,
01:41
I have been working as a data scientist. I'm a senior data scientist at the ETH library, a freelancer with Ksurs, and I'm teaching data visualization at Hochschule Lucerne for the Bachelor in AI. And exactly, I have heard some murmurs when people said, yes, we use Matplotlib.
02:00
I use Matplotlib very extensively. And so do 60% of data scientists on a regular basis. And the general sentiment is that people are quite frustrated with it. Which is a pity, because Matplotlib is actually a really powerful tool. It can do a lot out of the box.
02:21
It can do 3D plotting, animations, interactive charts. It can be embedded in graphical user interfaces in many different systems. It can also create beautiful charts. I have some examples to convince you of that. So you can create really cool charts for typical data analytics applications or more scientific applications.
02:41
But you can also create very impressive infographics. This is also created with matplotlib. And if you are more into creative coding and artistic feelings, then you can also do data art with matplotlib directly. And for today, I want to encourage you to use matplotlib.
03:04
And I want to explain how it works, give you some tips from my personal experience, and also give you some tips on how to design better charts in general at the very end of the talk. So for the first part, let's understand Matplotlib.
03:20
There is something about data visualization and intuition that I want to say upfront. And this is that we have a natural intuition for working with visuals. Because we are born with hardware that is pre-configured to help us, for example, find food and distinguish those important nutritious objects out of our environment.
03:41
Now, in data visualization, we are kind of using the same hardware, but instead of apples, we are looking at red circles or green objects or dark points. So we also have some intuition for data visualization. And we have been training it since our very early years:
04:02
all those little toys teach us, as young humans, how to match colors or how to put different shapes together, and so on. And this is a really great background for having some intuitive understanding of data visualization.
04:20
Now, there is a third part to intuition and data visualization, and this is intuition about coding visualization. A lot of people say, oh, a particular charting library is not intuitive for me. The problem is the intuition that we have for coding is a very different kind of intuition than this one. Because we are not born with intuition how to code.
04:43
Our intuition for coding depends on what we have seen and what we have experienced and understood in the past. So if you want to perceive Matplotlib as more intuitive, a good way is to understand how it works and so develop an intuition for Matplotlib.
05:02
Another thing that should be said at the very beginning: Matplotlib is actually a really old project. It's 20 years old. If you look at this time scale, it is not much younger than IPython, and it is actually older than NumPy and pandas. At the time when Matplotlib was created, there was no Plotly, no D3,
05:22
no other popular charting libraries that we use nowadays and take as a gold standard. What was around was gnuplot. There was MATLAB, which was actually an inspiration for Matplotlib. Now this is an advantage because such a long-running project can be very complex and can do a lot.
05:41
And it has also undergone a few major changes since its beginning, but it is also a disadvantage because now you have this super complex project which has to be backwards compatible. For new users, it becomes very difficult to see which of the options that are provided by Matplotlib
06:02
are the correct or the recommended ones, and which are, for some reason, especially bad. Most of the online teaching material is actually plagued by legacy code and by things that are discouraged and have been discouraged for the past few or several years,
06:20
and yet they are still being taught as the way of using Matplotlib. And so for the next few minutes, I want to explain what the main idea behind the architecture of Matplotlib is and how you can get the right intuition so that, for example, you can just go to the documentation
06:41
and develop a new chart yourself, or that you can find your way around on Stack Overflow and sift through the legacy code and pick the one that actually makes sense. So the main guiding idea is: I want you to imagine an artist with a canvas board.
07:00
Basically, we have a drawing engine, which is the canvas and the paintbrush. The paintbrush is called the renderer. And the backend in Matplotlib is something that draws. It can be static or interactive, it can be vector or pixel. You can also write your own backend. I mean, you could write a backend for, I don't know, a pancake printer if you wanted. Now, everything that is an element of a chart
07:24
is called an artist. Why? Because of this metaphor of an artist using a paintbrush on the canvas. Now, an artist can be a simple element of your chart, so like a simple geometrical primitive, but an artist can also be a compound element,
07:41
so something that contains a lot of geometry. An artist knows how to move the paintbrush. This is one important property of the artist. And every artist also shares other important properties. Every artist knows where it is and how to transform itself between different coordinate systems.
08:01
Every artist has an event listener, which handles what happens when you mouse over it or do stuff with it. And it also knows where to paint itself. It has some kind of a clip box. And the artist and the backend are completely separated. So the artist doesn't know,
08:20
it only knows how to move the paintbrush, but it doesn't know, for example, how to mix paints. And then the opposite is also true. The backend doesn't know how to optimize the layout or how to transform the artist. Those two features are completely separate. The way Matplotlib stores information
08:41
about the artists in a chart is that it stores it in a graph. So you have the topmost container, the Figure. Inside you have the Axes. And inside you have children artists. So you have the line or a patch or a rectangle which represents your data.
09:01
You have various text elements. You have the spines. You have the x and y axis, which contain ticks and labels. And each artist can also be saved to a variable. And you can use this to modify its appearance later on. So you can think that Matplotlib is composed of layers.
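To make the artist hierarchy described here more concrete, the following is a minimal sketch (not the code from the speaker's slides). It walks the tree from the figure down and then changes artists through saved variables. The helper name `walk` and the sample data are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.artist import Artist

fig, ax = plt.subplots()
# ax.plot returns a list of Line2D artists; keep a handle to the first one.
(line,) = ax.plot([0, 1, 2], [0, 1, 4], label="data")
title = ax.set_title("Artists all the way down")


def walk(artist, depth=0, max_depth=2):
    """Print the class names in the artist tree below a container artist."""
    print("  " * depth + type(artist).__name__)
    if depth < max_depth:
        for child in artist.get_children():
            walk(child, depth + 1, max_depth)


walk(fig)  # Figure at the top, then the Axes, then lines, spines, axis, text

# Modify the appearance later through the saved handles.
line.set_linewidth(3)
line.set_color("tab:orange")
title.set_fontweight("bold")
```

Every object printed by `walk` is an `Artist`, which is why the same `set_...` style of method is available on a line, a title, or a spine.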
09:21
At the bottom you have this backend, the drawing part. Then you have the artist layer, basically the components which constitute your chart. But what we typically do is import matplotlib.pyplot as plt. So where is pyplot? It's here. It's a scripting layer which contains functions
09:41
which are designed to look like MATLAB, and they create and modify a group of artists. Now, the really bad part about all of this is that this plt API, easy as it looks, constantly tries to find the latest active object and modifies only that, which leads to the following really weird behavior
10:02
if you think about it. Let's say we have just some data. We create two subplots and we want to draw something on each of the subplots. If I iterate over my subplots and I use this plt API, I get this. However, if I use an equivalent function and modify this axes object directly,
10:23
I have the behavior that I want. And these two APIs, they also have their own names at this point. So the right way to use Matplotlib and to stay sane is to create the figure and axes using this plt API because there is no guesswork involved.
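The behavior described here can be reproduced with a short sketch. It is an illustration under assumptions (made-up data, two subplots), not the slide code: the loop that calls `plt.plot` draws on whatever pyplot considers the current axes, while the loop that calls `ax.plot` draws on the axes it is iterating over.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

data = [1, 3, 2, 4]

# Implicit (pyplot) interface: plt.plot targets the "current" axes, which is
# the most recently created or activated one, not the loop variable.
fig_implicit, axes_implicit = plt.subplots(1, 2)
for ax in axes_implicit:
    plt.plot(data)

# Explicit (object-oriented) interface: call the method on the axes itself.
fig_explicit, axes_explicit = plt.subplots(1, 2)
for ax in axes_explicit:
    ax.plot(data)

# Count the lines that ended up on each subplot.
implicit_counts = [len(ax.lines) for ax in axes_implicit]
explicit_counts = [len(ax.lines) for ax in axes_explicit]
print("implicit:", implicit_counts)
print("explicit:", explicit_counts)
```

With the explicit calls each subplot receives exactly one line; with the implicit calls both lines pile up on a single subplot.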
10:43
And then use the object-oriented API to do everything else about the chart design and chart content because then you can modify those objects directly. And so we throw away almost all of the scripting layer. We stay with these layout elements here. And then if your intuition about what the code should look like
11:04
is different, if you just don't like how the artist API or the object-oriented API works, you can replace it with a different scripting layer. So for example, seaborn, pandas, plotnine, they all have very different APIs. But the good thing is because they are all based on Matplotlib,
11:22
they can be used to mix and match together inside one figure. So I can have a figure where I create one subplot with Seaborn, another subplot with Pandas, and third with pure Matplotlib, and it all interacts well together. Now, I want to show you some of the cool tricks
11:42
that I think should be better known because they will help you with designing beautiful complex visualizations. Again, every time we import pyplot as plt, we use only a little part of what is possible in Matplotlib. And then you will notice if you design complex charts,
12:02
you have to import many more functions from other submodules of Matplotlib. And those submodules contain geometric primitives. They contain different tools for optimizing object placement. They contain different markers,
12:20
tick locators and formatters, there is a lot of functionality to manipulate colors, and so on. And now for some tips and tricks. So I told you every artist is born with a coordinate system. Most of the time when we place a new artist on the chart, it gets drawn attached either
12:41
to the figure coordinate system or to the axes coordinate system or to the data coordinate system. However, you can also attach it to a different artist. What does it look like in practice? I'm going to show you this example plot, which I will modify later on. One thing that you see is that if we plot something,
13:02
it naturally gets plotted in data coordinates. So my data goes from minus two to two and from five to nine. And this is also what I see here. So I call plot with (2, 2), and then I have a point at this data point. Now I can do something else. I can tell my plotting function
13:21
to use the relative coordinates of the axes. And for example, like this, I can draw a line which goes from zero to 100% of the axes extent. And here I do it on each axes. This is super useful for watermarks, for example, or some different kinds of guiding lines. I can also do a different thing.
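A minimal sketch of the coordinate systems mentioned here, assuming two subplots and an illustrative "DRAFT" watermark that is not part of the talk. The `transform` keyword decides which coordinate system the numbers refer to: the default is data coordinates, `ax.transAxes` means fractions of the axes, and `fig.transFigure` means fractions of the figure.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2)
for ax in axes:
    # Default: data coordinates, so the point lands at x=2, y=2.
    ax.plot(2, 2, "o")
    # Axes coordinates: (0, 0) is the lower-left and (1, 1) the upper-right
    # corner of the axes, regardless of the data limits.
    (diagonal,) = ax.plot([0, 1], [0, 1], transform=ax.transAxes,
                          color="lightgray", linestyle="--")
    # A watermark-style label pinned to the centre of the axes.
    ax.text(0.5, 0.5, "DRAFT", transform=ax.transAxes,
            ha="center", va="center", alpha=0.2, fontsize=30)

# Figure coordinates work the same way, relative to the whole figure.
fig.text(0.99, 0.01, "figure-level note", transform=fig.transFigure,
         ha="right", va="bottom")
```

Because the diagonal and the watermark are expressed in axes fractions, they stay in place when the data limits change.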
13:42
So let's say I want to place a footnote to my legend, and then I can feed this legend object as my reference point to the function that creates my footnote, and then anchor a corner of one to the corner of the other in a very simple way.
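One way to sketch this legend-anchored footnote, assuming `Axes.annotate` is the function used (the transcript does not name it): `annotate` accepts an artist as `xycoords`, in which case `xy` is read as a fraction of that artist's bounding box. The footnote text and the 4-point offset are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], label="measurement")
legend = ax.legend(loc="upper left")

# xycoords=legend: xy is a fraction of the legend's bounding box, so (1, 0)
# is its lower-right corner. The text is nudged 4 points below that corner
# and right/top aligned, which pins its corner to the legend's corner.
footnote = ax.annotate(
    "source: hypothetical data",
    xy=(1, 0), xycoords=legend,
    xytext=(0, -4), textcoords="offset points",
    ha="right", va="top", fontsize=8,
)

fig.canvas.draw()  # positions are resolved at draw time
```

If the legend is moved to another corner with `loc`, the footnote follows it without any further changes.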
14:01
This is also very useful if you need some complex placement. I said that there is a lot of layout functionality, and one function which is not very well known is subplot_mosaic, and it is super useful. It lets you create a grid of subplots using an array of strings.
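A small sketch of `plt.subplot_mosaic`, with panel names ("overview", "trend", "detail") invented for illustration. Each inner list is one row of the grid, and a name that repeats across cells becomes a single axes spanning those cells.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical panel names; a repeated name spans several grid cells.
layout = [
    ["overview", "overview", "detail"],
    ["trend",    "trend",    "detail"],
]
fig, axes = plt.subplot_mosaic(layout, figsize=(8, 4))

# The result is a dictionary keyed by the names used in the layout.
axes["overview"].plot([0, 1, 2], [2, 1, 3])
axes["trend"].bar(["a", "b", "c"], [3, 5, 2])
axes["detail"].set_title("spans both rows")
```

Addressing panels by name rather than by grid index keeps the plotting code unchanged when the layout is rearranged.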
14:23
So basically you name your axes like this, and then subplot_mosaic knows that it has to put together those two subcharts. And then you can access all those subplots using a particular key,
14:40
because they are stored in a dictionary, which is really useful for making complicated layouts. Something that should be much better known is the ticker submodule of Matplotlib, because it lets you modify your tick positions in a very pleasant and effortless way.
15:03
For example, you can set the repetition of a tick with a particular interval, or you can set the fixed position of ticks without having to read every one of them, modify them, and then create your own arrays and feed them as ticks.
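As a rough illustration of `matplotlib.ticker`, the sketch below uses locators to place ticks and, since the same module also provides them, two formatters to label the ticks. The data and the "s" unit are assumptions.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib import ticker

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3, 4], [0.0, 0.2, 0.5, 0.7, 1.0])

# A major tick every 1 unit and a minor tick every 0.25 units on x.
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.xaxis.set_minor_locator(ticker.MultipleLocator(0.25))

# Ticks only at hand-picked positions on y.
ax.yaxis.set_major_locator(ticker.FixedLocator([0, 0.5, 1]))

# Formatters: fractions shown as percentages on y, a unit appended on x.
ax.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=1, decimals=0))
ax.xaxis.set_major_formatter(ticker.StrMethodFormatter("{x:.0f} s"))

fig.canvas.draw()
```

Locators and formatters are attached to the axis object, so they keep working when the data or the view limits change.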
15:21
The same module also contains formatters that let you format the tick labels, so you can convert them to percentages, or you can add some particular string or unit after them. There are many more, so I refer you to the documentation to check it out. It is super useful and makes your code much more readable
15:40
and compact and efficient. Another really cool thing is that you can store or use different style sheets in Matplotlib. If you don't like the default styling, you can use a style sheet which contains different directives to configure your plot graphically. You can use it for all the charts in the session
16:01
or only in the context of one single chart. There are a few style sheets which are pre-built, but you can also develop your own. You can store those stylings as a dictionary or in a separate file. One thing that people tend to forget,
16:21
especially when you are in the heat of designing visualizations for an exploratory analysis, is that visualization is also code. And here all kinds of best practices apply. My best practices come from my daily practice, so this exploratory visualization. I would be super interested to hear from you
16:41
whether you also have some tips and tricks that come from your own perspective. I'm gonna use the penguins data, which is a very well-known dataset, to show you a few tips and tricks on how you could structure your code more efficiently. One thing is to use loops. And this may sound obvious, but it's actually super important.
17:02
So if you want to cycle through subsets of the data, don't hard-code those subsets here, but use loops. If you plot repeatedly onto the same axes object, each plot just gets added to the same axes.
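A minimal sketch of the loop pattern described here. To stay self-contained it uses synthetic numbers instead of the real penguins dataset: only the species names mirror the dataset, and all values are made up.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the penguins data: the numbers are invented for
# illustration, only the species names mirror the real dataset.
rng = np.random.default_rng(0)
species_means = {"Adelie": (39, 190), "Chinstrap": (49, 196), "Gentoo": (47, 217)}
penguins = {
    name: (rng.normal(bill, 2.5, 50), rng.normal(flipper, 6, 50))
    for name, (bill, flipper) in species_means.items()
}

fig, ax = plt.subplots()
# One pass per subset; every call adds a new artist to the same axes.
for name, (bill_length, flipper_length) in penguins.items():
    ax.scatter(bill_length, flipper_length, label=name, alpha=0.7)

ax.set_xlabel("bill length (mm)")
ax.set_ylabel("flipper length (mm)")
ax.legend(title="species")
```

Adding a fourth subset only requires another entry in the data, not another copy of the plotting call.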
17:23
Use loops also in combination with zip and zip_longest. zip_longest is a variation of zip that comes from itertools and iterates until the longest iterable is exhausted. So if you want to make a visualization using a technique called small multiples, where you create a small chart for each subset of the data,
17:43
you can generate the subplots first, and then using zip_longest, you can iterate until your data runs out, and then you can remove all the leftover axes from your chart. What I also recommend to avoid is hard-coding
18:01
the styling of the charts. You can, for example, use dictionaries to store the styling of the charts. This way, if you decide actually you don't like orange anymore, you want to switch to, I don't know, color green, you can do it very cleanly like this. And then you can unpack those dictionaries here as arguments to the plotting functions.
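The two ideas from this part of the talk, small multiples with `zip_longest` and styling kept in dictionaries, can be combined in one sketch. The group names, the 2 x 3 grid and the style values are assumptions chosen so that one panel is left over.

```python
from itertools import zip_longest

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical subsets: five groups for a 2 x 3 grid, so one panel is spare.
groups = {f"group {i}": rng.normal(i, 1, 200) for i in range(1, 6)}

# Styling lives in one place; change the colour here and nowhere else.
hist_style = {"color": "tab:orange", "edgecolor": "white", "bins": 20}
title_style = {"fontsize": 10, "loc": "left"}

fig, axes = plt.subplots(2, 3, sharex=True, sharey=True, figsize=(9, 5))

# zip_longest keeps going until the longer iterable (here the axes) is
# exhausted and pads the shorter one with None.
for ax, item in zip_longest(axes.flat, groups.items()):
    if item is None:
        ax.remove()  # no data left for this panel
        continue
    name, values = item
    ax.hist(values, **hist_style)    # dictionary unpacked as keyword arguments
    ax.set_title(name, **title_style)
```

This sketch assumes there are at least as many panels as groups; with more groups than panels, `ax` would be the padded `None` and the grid would need to grow.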
18:24
It also makes sense to structure your code in a particular way. So here, for example, I put those two together, and I have one function which creates one single chart here. I have a function which does the layout,
18:42
and then I put this all together, and then I also keep the style separate. And the last point that I want to make is some tips on how to make better charts. In general, what you do not want is to let the software decide for you
19:00
what your chart should look like. And this is especially important when using Matplotlib because it makes very few design decisions for you. And it is dangerous to rely on the software or the software defaults to create a visualization because data visualization is a form of communication.
19:22
So even if you decide you have a data set and you want to use a bar chart to visualize it, actually how you create a bar chart and what do you put in it, what do you highlight, how do you order the bars, and so on, is very dependent on your target audience and actually the content of what you want to communicate.
19:42
If you want to read more, there is an interesting case study on the Storytelling with Data blog where they redo the same bar chart. For us, the take-home message is: do not get angry at Matplotlib that the charts don't look how you think they should look from the defaults.
20:02
Use style sheets, use your best design knowledge to create and shape your chart. Now, of course, this means you have to know what the chart should look like and you have to know all those best practices. So I want to share some resources with you
20:24
which help you get better at data visualization. Here also, I want to make you aware that I am hosting this awesome Matplotlib repository and I try to collect interesting tutorials and resources. So feel free to check it out. If you find a good tutorial, let me know.
20:41
All those resources are really, really useful. This one is about matplotlib specifically. This is an open source book. This is also an open source book about more scientific data visualization. Check them out. Matplotlib also has really cool cheat sheets
21:01
and handouts which are not very well known but actually they are a really good visual guide to finding your way around different functions. And also, matplotlib needs you because nobody can make a documentation better than the person that tries to figure something out.
21:22
And we have many projects going on at the moment where the documentation is being improved on many levels but we constantly need contributors. So if you feel like you would like to help, let me know, get in touch and we'll see how you could contribute. And this is pretty much everything on my side.
21:42
I think we have a few minutes for questions and I'll be also very happy to talk with you during the coffee break. Thank you. Thanks a lot, Teresa. That was an interesting insight into matplotlib
22:00
and the questions from the audience. There's one over there. Thanks, Teresa. It was very interesting. Can you comment a bit on the slide
22:20
where you had the function that plots a single plot and then another function that creates a dashboard? What is the output of the first function? Do you return the figure from the plot function and then aggregate them in the dashboard function? Or, like, can you comment a bit on the dashboard plot here? This is a good point. So what I'm doing here is
22:41
I am creating my figure and axes in my dashboard function. I could do it also outside, but for the purpose of this example, I create them inside this dashboard function. Now what I'm doing here, I'm cycling through each axes that I created and then I'm plotting something into each axes.
23:05
So this guy, plot single, does not create a new axes. It takes an existing axes and then modifies it. So this is why I don't have to return the axes. I could return it, but it gets modified inside the function. And here, what I typically recommend to do
23:25
is to return the figure and axes from this function. Because most of the time, you also want to adapt something in the next dashboard because if you create a function, it means you want to reuse it. Maybe you have like different facets of the data that you want to see.
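The structure discussed in this answer might look roughly like the sketch below. The function names `plot_single` and `plot_dashboard`, their signatures and the data are assumptions based on the description, not the code from the slide.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np


def plot_single(ax, values, title, style):
    """Draw one chart into an existing axes; nothing needs to be returned."""
    ax.plot(values, **style)
    ax.set_title(title)


def plot_dashboard(datasets, style):
    """Create one axes per dataset, fill them, and return figure and axes."""
    fig, axes = plt.subplots(1, len(datasets), squeeze=False,
                             figsize=(4 * len(datasets), 3))
    for ax, (title, values) in zip(axes.flat, datasets.items()):
        plot_single(ax, values, title, style)
    return fig, axes


# Style is kept separate from the plotting and layout functions.
line_style = {"color": "tab:green", "linewidth": 2}
rng = np.random.default_rng(2)
datasets = {name: rng.normal(size=30).cumsum() for name in ("north", "south")}

fig, axes = plot_dashboard(datasets, line_style)

# Because the figure and axes are returned, they can be adapted afterwards.
fig.suptitle("Dashboard")
axes[0, 0].set_ylabel("value")
```

Passing the axes into `plot_single` is what lets the same function be reused in any layout, including axes created by another library.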
23:41
And so once you have these guys, you can also modify them later on. For example, you can set the title for the whole dashboard and this you would do by modifying figure or the axis object. Are there any other questions over here?
24:02
We are using Matplotlib for maps. Is this a misuse? No, not at all.
24:20
No, I mean, there is this one guy who runs this. So many slides. Who runs this Twitter account Python Maps, and he makes maps only with Matplotlib. This is not a misuse. There are some functionalities. I think you have to merge it with a separate module,
24:45
but it's totally not a misuse. It's a very good choice. Over there. Over there? So it's more like a comment. So, just to add on to the question that the previous speaker,
25:03
previous person asked. So if you're using maps in Matplotlib, you could also think about using Cartopy, which integrates very well with Matplotlib, and you can do all kinds of coordinate transformations because with maps you want to sometimes do that. That works very well. And this is also an advantage of using such an established library
25:25
because you have a lot of custom tools to work with it. So it's very modular. There was a question over there. I've been using Matplotlib for runtime visualizations for machine data and stuff.
25:43
And in my experience, Matplotlib has been quite slow in runtime visualizations. So is there a way to improve the performance and speed? And you want to check out, again, going to the very end, you want to check out this book
26:01
because the author does a lot of high-performance visualization. He gives some tips. For example, if you use ax.plot or ax.scatter, for some cases one is like orders of magnitude faster than the other. So there is a lot you can check. You can think about using a different backend.
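A rough way to check the plot-versus-scatter point on your own machine is sketched below. It is a simple timing harness under assumptions (100,000 random points, Agg backend, the helper name `draw_time`), and the numbers it prints depend on the Matplotlib version, backend and hardware.

```python
import time

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
x, y = rng.normal(size=(2, 100_000))


def draw_time(draw):
    """Return the seconds needed to build and render one figure."""
    fig, ax = plt.subplots()
    start = time.perf_counter()
    draw(ax)
    fig.canvas.draw()
    elapsed = time.perf_counter() - start
    plt.close(fig)
    return elapsed


# scatter builds a collection that allows per-point size and colour.
t_scatter = draw_time(lambda ax: ax.scatter(x, y, s=4))
# plot with a marker and no line draws the same marker for every point.
t_plot = draw_time(lambda ax: ax.plot(x, y, "o", markersize=2, linestyle="none"))

print(f"scatter: {t_scatter:.3f} s, plot: {t_plot:.3f} s")
```

When every point shares the same size and colour, the marker-only `plot` call is usually the cheaper choice, but it is worth measuring for the data at hand.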
26:21
So there are also ways to optimize it. Sounds great. Question right next to you. You mentioned that the artists are arranged in a graph. Is it an arbitrary graph or is it something more constrained like a tree or a directed acyclic graph?
26:45
Oh, let's talk in the break about it because I would need to check out the docs. Yeah, that's the answer I can give you now.
27:01
There was a question here. Thanks for the talk. After the question, do you have a recommended way for using Matplotlib in an interactive context? For instance, if you want to publish stuff on the web because I mainly know it from my own use cases
27:21
for doing visualizations that are more offline. As far as I know, there is one web-compatible backend which is developed within Pyodide. We can talk about it in the break as well.
27:50
My question is a little bit similar. Do you have any experience or recommendation for using it in a more interactive way, like sliders or adding buttons to hide or show stuff
28:02
maybe in a notebook? Or is it then better to use something like Plotly, which renders to HTML and JavaScript? At this point, it really depends on what you want to do because you could use it, for example, together with Streamlit and then you just re-render your visualization
28:21
and then the whole UI is being taken care of by, for example, Streamlit. You can use it with Jupyter widgets. So if you are proficient with widgets, you can make them talk with Matplotlib. And then it depends on what you want to deliver at the very end
28:43
because for some cases it may be easier, for example, to use Plotly because it's a native JavaScript library in the end. Yeah, so you would have to kind of look at the pros and cons. If you have a particular use case in mind, we can also talk in the break.
29:03
Right, are there further questions? I don't see any hands raised. So yeah, let's enjoy our first coffee break then. Thanks a lot again. Thank you.