Kyran Dale - Data-visualisation with Python and Javascript: crafting a data-viz toolchain for the web
To accompany an upcoming O'Reilly book 'Data-visualisation with Python and Javascript: crafting a dataviz toolchain for the web', this talk aims to sketch out the toolchain by transforming some dry Wikipedia data (Nobel prize-winners) into a far more engaging and insightful web-visualisation. This transformative cycle uses Python big-hitters such as Scrapy, Pandas and Flask, the latter delivering data to Javascript's D3. While Python is fast becoming the go-to language for data-processing/science, the visual fruits of that labour hit the wall of the web, where there is only one first-class language, Javascript. To develop a data-viz toolchain for the modern world, where web-presentation is increasingly mandated, making Python and Javascript play nicely is fundamental. This talk aims to show that the perceived wall between the two languages is actually a thin, permeable membrane and that, with a bare minimum of web-dev, one can get on with programming seamlessly in both.
The usual formula here is that you don't generalise: you stay a specialist for long enough, you become a guru, and then you just turn up and they give you money, which is great. But anyway, I'm no guru. I give talks in London, England, and talk to people [inaudible].
Right now Python is obviously amazingly hot in the area of data processing, and JavaScript is amazingly hot just because it's JavaScript. [inaudible] visualisation libraries, so that suggests something. [inaudible] I used to be a research scientist [inaudible].
The great thing about writing a book is that it makes you ask questions like 'why?', which is always a hard one, and it makes you have bullet points, which is a really good way to focus on what the main points are [inaudible]. For me, when I was in science I spent a lot of my time writing wxPython applications for pretty much everything. I had beautiful visualisations that I was very impressed by, and I could never show them to anyone; it was just too much trouble. If I had done them as web [inaudible], I could have shared them with the world.
So, the first point: one language, and it's Python. [inaudible] initiatives, and I'm really impressed by them; they perform very clever gymnastics. But there is an elephant in the living room [inaudible]. People tend to pussyfoot around the fact that JavaScript programmers don't like to be told that theirs is just something to be compiled into. People use JavaScript because some of them actually quite like it. And with anything that compiles to JavaScript you are faced with the same problem: debugging. Debugging takes place in the browser, and it's much better in the browser, but you have to work out mapping files [inaudible], and in my experience that is pretty horrible. CoffeeScript came in here, but it didn't take.
[inaudible] This is a completely random, polemical bullet point which I will refuse to defend; I just stuck it in [inaudible]. The thing is, whatever happens these days there is propaganda against JavaScript. And yes, the standards committee is working on it and will get there in about a hundred years [inaudible]. But it turns out the two languages interoperate, and the whole point of this book is that they interoperate pretty well.
Both languages are pretty simple. A cheat sheet, quite frankly, gets you most of the way there. JavaScript really is not a difficult language. It has quirks, but you learn three or four things and gotchas, move on, and start programming. It's not C++, where you need three years' communion with [inaudible] to understand how to do the simplest thing right. I love coding in Python; I think there's a reason why [inaudible] it's so pleasant to read other people's programs. But JavaScript has first-class functional methods on arrays: you can do [inaudible], reduce and things like this, with anonymous functions being used as well [inaudible]. So sometimes it's actually a more pleasant environment than Python's, and I never, ever thought I would say that.
[inaudible] These figures are taken off a famous benchmarking site. With spectral norm, I believe, we're talking twenty to sixty-something, even two hundred times [inaudible]. They've got JavaScript running [inaudible] native in places [inaudible]; huge forces are involved in making it a very, very efficient, powerful language.
The other thing is syntax. It's so stupid when people worry about it. I love Python's whitespace indentation, because you spend so much time reading code, so that's great. But with data visualisation, and certainly with D3, [inaudible] declarative, functional paradigms are much more significant than any particular language. D3 uses a very particular declarative and functional style, and you would face the same problems using it in any language; you still have the same abstractions.
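The functional array style mentioned above has close counterparts in Python, which is part of why moving between the two languages is less jarring than it might look. The following is a minimal sketch, not taken from the talk; the records and field names are invented placeholders.

```python
from functools import reduce

# Invented placeholder records, not real laureate data.
winners = [
    {"name": "Winner One", "category": "Physics", "year": 1950},
    {"name": "Winner Two", "category": "Chemistry", "year": 1975},
    {"name": "Winner Three", "category": "Physics", "year": 2001},
]

# filter/map/reduce with anonymous functions, mirroring the JavaScript
# array methods of the same names.
physics = list(filter(lambda w: w["category"] == "Physics", winners))
names = list(map(lambda w: w["name"], physics))
latest_year = reduce(lambda latest, w: max(latest, w["year"]), winners, 0)

# The more idiomatic Python spelling of the same filter-then-map step.
idiomatic_names = [w["name"] for w in winners if w["category"] == "Physics"]
```

In JavaScript the same chain would be written with `filter`, `map` and `reduce` called as methods on the array itself, which is the style D3 code tends to lean on.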
So I guess we should be in a good place here; the two should be perfect complements. JavaScript [inaudible] hasn't got the libraries yet. Python has: if you're doing data visualisation, Python has amazing data-processing stacks and they're getting better all the time. But it hits this brick wall when it wants to express itself on the web. There are various solutions to that, but my feeling, having tried and played with them a lot, is that you are actually doing programming in a browser, and there is only one way to do that.
So why is there this [inaudible]? I think people [inaudible] are scared of web development: it's horrible, it's associated with all sorts of cruft, with families of frameworks and all this rubbish [inaudible]. So there is this big space between the Pythonistas and the [inaudible], between the mind-sets. My book, if anything, is there to suggest that [inaudible] you can just [inaudible] between the two of them. You create a seam that passes data; it's all about the data. At that point you're just doing programming. You've used a very, very small amount of conventional web development, which is getting smaller all the time. My position is that you need a little HTML skeleton and, to be honest, not much more [inaudible]. [inaudible] XHTML [inaudible] was more horrible than conventional HTML5, quite frankly.
JSON is obviously the data format. It's pretty good, you can express pretty much anything with it, and Python handles it [inaudible]. The great thing about Python as compared to, say, R or [inaudible] Mathematica environments is that you can just roll a server in Python in a few lines, and that's amazing. You can stay in the Python ecosystem to do everything, all the processing and the delivery, and you only need to leave it when you hand off your data to the web, at which point [inaudible]. You can think of it as being like desktop-based apps [inaudible], and then you press a button and suddenly it's on the web, which is incredible.
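The hand-off described above, where Python data crosses to the browser as JSON, can be sketched with the standard library alone. This is an illustrative assumption about the shape of the data, not the talk's actual code; the record and the `to_payload` helper are hypothetical.

```python
import json

def to_payload(records):
    """Serialise Python records to the JSON text a browser would receive."""
    return json.dumps(records, ensure_ascii=False, sort_keys=True)

# Invented placeholder record; None stands for a missing field.
records = [
    {"name": "Winner One", "category": "Physics", "year": 1950,
     "place_of_death": None},
]

payload = to_payload(records)   # Python None is written as JSON null
restored = json.loads(payload)  # what a consumer gets back after parsing
```

On the JavaScript side the same text would typically be parsed with `JSON.parse` or received already parsed by a data-loading helper, at which point the records are plain objects again.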
So I could have done an amazing visualisation for this talk, but I didn't have much time. I'm still very proud of this one, because I've ordered the numbers here in such a way that you can see patterns in the data [inaudible], to help you visually understand what's going on. This is the original, right? [inaudible]
The point is that when you start using D3 [inaudible] it forces you to [inaudible]. There is a particular D3 syntax, but everything else is programming. It looks like programming and it is programming, and it's pretty easy to do [inaudible]; it's powerful and expressive. The point is to minimise all that horrible cruft. You don't need LESS, you don't need [inaudible], all that rubbish. You can program [inaudible] JavaScript, with a few HTML tags as a backbone, and the rest of it is just programming, which is the point.
I just want to show D3, or rather why I'm so motivated by it. It's not the only game in town, but it's so much better than everything else that it overwhelms data visualisation in the browser [inaudible]. I remember when Mike Bostock was first developing it, and the reaction was, 'Why JavaScript? That's insane.' It is a powerful visualisation tool. It is one of the best implementations of the very profound grammar of graphics [inaudible]. It has a solid theoretical basis. And he did it in JavaScript, going against the grain, and using SVG, which at the time was [inaudible]: scalable vector graphics for the web. Now it's unthinkable that any browser wouldn't support pretty much the whole of SVG.
It's not just a charting library; in fact, I think it was about getting away from charts, from the conventional data expressions that we've been obliged to use because the software didn't let us change anything [inaudible], which is frustrating. The idea is that D3 is for building things. You can build chart libraries with it, and there are some out there [inaudible]. The book teaches you how to build [inaudible] and gives you the fundamentals [inaudible]. There will be other libraries, but at the moment D3 is predominant. It's mature; it's a ten- or fifteen-year project, and it shows.
So the idea of the book was to talk about that, making the point that data visualisation is essentially transformative: data is transformed into something that our [inaudible] visual cortex can easily absorb, as when we communicate with a picture. So I thought I would base the book on a transformation: transform Wikipedia's Nobel prize page [inaudible] into a modern, interactive visualisation, teach the whole process, and use all the amazing Python that's available in a not completely contrived way. Scrapy is a great way to scrape the pages [inaudible]. I know there's been talk here [inaudible]; once you get into it, it's not a huge [inaudible] and you can do amazing things. For cleaning there's Pandas, which we all know about, and it's a great way to clean data, and the data is always [inaudible], always. Then there's Matplotlib, with Seaborn, with which to explore it. And then, with Flask, in a few lines you can roll a RESTful API that your JavaScript web page can use. That's amazing, and you cannot get that in any other language that I'm aware of [inaudible]. So this is the starting point, as you can see.
This is how it works in Wikipedia: people have lovingly entered these names by hand [inaudible], so that's going to be an interesting challenge. The page goes by country, with these links out to the individuals [inaudible]. The idea is to scrape this page, then use it to get to the biography pages and get all the biographical data about the categories and the people, and turn it into something a bit more engaging, well, I think so [inaudible].
I'm limited by the resolution of this [inaudible]; I figured it would be huge, but it's 768, so it's almost [inaudible]. But this is something you can ask questions of. You can query the data. The point is discovering your own stories, your own narratives, and not being obliged to accept other people's. Generally, not much has happened in visualisation since Victorian times, when the newspapers did amazing visualisations in ink. These are visualisations in pixels, and we've reached the point where we can actually play with them and interact a little. That's a huge thing [inaudible] for communication; it's pretty big, honestly.
Right, so first, scraping. I'm not going to go into it in depth, just give you a feel for it. Scrapy has a learning curve, but once you get used to it you'll use it a lot. Here I'm using Chrome's element explorer. A great thing about learning JavaScript, if you don't know it, is that there are some very powerful debugging and exploratory tools built into modern browsers. Chrome's [inaudible] might be the best, and you might well be surprised how much you can do [inaudible] anything in Python; you can do almost anything, including profiling. In this case I'm using it to explore the structure of the page, looking for the XPaths, which are the syntax for identifying the bits I want, in this case the biographical details of the Nobel prize-winners and their little mini-bio picture.
I crawl with a spider in Scrapy [inaudible]. It deals with the asynchronous [inaudible] and other cleverness that you don't want to do yourself. For example, if you do naive scraping you'll probably get [inaudible] within twenty minutes [inaudible], and then you want to implement throttling, and that's a bit of an art form. Scrapy is great; it just does that work for you [inaudible], in this case to consume the images. At the end of this I'm left with a nice array of JSON objects [inaudible]. The job then is to identify and remove the anomalous fields, clean up as best we can, and then use Pandas to explore the data.
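The idea of using path expressions to pull names and links out of a page grouped by country can be sketched without Scrapy. The following uses the limited XPath subset in the standard library's `xml.etree.ElementTree` as a dependency-free stand-in for Scrapy's selectors; the page fragment, its structure, and the `extract_winners` helper are all invented for illustration and do not reproduce the real Wikipedia markup.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed stand-in for a "winners by country" page.
PAGE = """
<html><body>
  <h2>Country A</h2>
  <ol>
    <li><a href="/wiki/Winner_One">Winner One</a>, Physics, 1950</li>
    <li><a href="/wiki/Winner_Two">Winner Two</a>, Chemistry, 1975</li>
  </ol>
  <h2>Country B</h2>
  <ol>
    <li><a href="/wiki/Winner_Three">Winner Three</a>, Peace, 2001</li>
  </ol>
</body></html>
"""

def extract_winners(page):
    """Return one dict per linked name, tagged with the preceding country heading."""
    root = ET.fromstring(page.strip())
    winners = []
    country = None
    for element in root.find("body"):
        if element.tag == "h2":
            # Each heading names the country for the list that follows it.
            country = element.text
        elif element.tag == "ol":
            for link in element.findall("./li/a"):
                winners.append({
                    "name": link.text,
                    "link": link.get("href"),
                    "country": country,
                })
    return winners

winners = extract_winners(PAGE)
```

Real pages are rarely well-formed XML, which is one reason a scraping framework with a tolerant HTML parser, request scheduling and throttling is the practical choice for the actual crawl.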
So we've got our data: an array of JSON objects is what we're left with. In our case we have [inaudible] links to the image data and anything else you might be interested in [inaudible]. Cleaning isn't so interesting, but you load the data into a DataFrame and do a quick inspection. There are missing fields; not obviously missing, [inaudible] place of death. If you use the built-in describe method to describe the data, you can see that there are duplicates in the name field: there's a frequency of two for this guy, so that's a flag. And there are other things, such as 59 countries in total in the nationality field. This is the process: you're summarising what you're looking at, and that directs your attention.
Here is one example of using this to clean the data. The data was entered by a human being on Wikipedia, and we have Johannes Diderik van der Waals [inaudible], which is what happens when you get [inaudible] in data; it's a category error. [inaudible] In three lines you can find it, you can tag it, you can fix it, or you can throw it away if you don't like it. Pandas makes this very easy. You will still have dirty data in there, because that's the nature of the beast, but you've cleaned it pretty well. We have 858 winners, though only the ones that were recorded. There will probably still be somebody missing: usually a small country has missed the fact that it has a Nobel prize-winner, and no one else has noticed, so you miss two or three. That's the nature of Wikipedia. But with the clean data you then want to explore, and it is much more fun to explore your data; you're looking for stories to tell, you're looking for correlations.
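The summarise-then-fix loop described above (spot duplicated names, count missing fields, count distinct countries, then drop repeats) can be sketched on plain lists of dicts. This is a dependency-free stand-in: in the talk the work is done in a Pandas DataFrame, where methods such as `describe`, `duplicated` and `drop_duplicates` cover the same ground. The records and helper names below are invented.

```python
from collections import Counter

# Invented placeholder records with one duplicate and two missing values.
records = [
    {"name": "Winner One", "country": "Country A", "place_of_death": "City X"},
    {"name": "Winner Two", "country": "Country B", "place_of_death": None},
    {"name": "Winner One", "country": "Country A", "place_of_death": "City X"},
    {"name": "Winner Three", "country": "Country A", "place_of_death": ""},
]

def summarise(rows):
    """Flag duplicated names, count empty values per field, count countries."""
    name_counts = Counter(row["name"] for row in rows)
    duplicates = sorted(name for name, count in name_counts.items() if count > 1)
    missing = Counter(
        field
        for row in rows
        for field, value in row.items()
        if value in (None, "")
    )
    countries = len({row["country"] for row in rows})
    return {"duplicates": duplicates, "missing": dict(missing), "countries": countries}

def drop_duplicates(rows):
    """Keep the first record seen for each name."""
    seen = set()
    unique = []
    for row in rows:
        if row["name"] not in seen:
            seen.add(row["name"])
            unique.append(row)
    return unique

summary = summarise(records)
cleaned = drop_duplicates(records)
```

Deduplicating on name alone is a simplifying assumption; a repeated name can be legitimate, so a real clean-up would inspect each flagged record before discarding it.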
Visualisation is really about narratives. [inaudible] you're trying to explain something, to communicate a story; that's what human beings respond to, I think. [inaudible] This first one is really quite a dull narrative, quite predictable, but it's a story: the United States has a huge number of Nobel prizes relative to many others. [inaudible] per capita, and that's when it gets a bit more interesting.
You can get a more interesting story with a few more lines. This is breaking the countries into regions. Up here we define North America, Europe and Asia, just taking three of these countries each, and then we just plot. You see how easy it is [inaudible]. You can see the blue line: America's Nobel prize haul passes the European [inaudible] round about 1990-odd, and that's a story. America is shooting off, and this is all the post-war investment in American science, a huge thing after the atom bomb and the Manhattan Project [inaudible]. There's a huge story there.
[inaudible] the gender disparities in Nobel prizes; that's pretty egregious, right there. You can look at individual things like the distribution of the age at which the prize is won, which is quite interesting. If none of you has got the Nobel prize by 95, it's very unlikely. Around 60 is the sweet spot [inaudible]. This is more interesting: they're incredibly long-lived. That's partly because if they're selected at 60 there's a selection pressure [inaudible]. What's nice is using Seaborn's violin plot, which gives me, as an extension of a box plot, the distribution, in a few lines [inaudible].
But I think we can do this and be more interesting. You can plot longevity against the year in which they won, and you see [inaudible] population demographics changing over time, with people getting longer-lived. So here we have a linear regression with a confidence interval in a few lines using Seaborn [inaudible].
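The regional comparison described above comes down to two steps: map each country to a region, then take a running total of prizes per region over time. A minimal standard-library sketch follows; the talk does this with Pandas and a plot, and the region definitions, country names and win records here are invented placeholders.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical region definitions; real ones would list actual countries.
REGIONS = {
    "North America": {"Country A", "Country B"},
    "Europe": {"Country C", "Country D"},
}

# Invented (country, year) pairs, one per prize.
wins = [
    ("Country A", 1950), ("Country C", 1950), ("Country C", 1951),
    ("Country B", 1952), ("Country A", 1952), ("Country E", 1951),
]

def cumulative_by_region(wins, regions, years):
    """Return, per region, the running total of prizes for each year in order."""
    country_to_region = {
        country: region
        for region, countries in regions.items()
        for country in countries
    }
    per_year = {region: Counter() for region in regions}
    for country, year in wins:
        region = country_to_region.get(country)
        if region is not None:  # countries outside every region are skipped
            per_year[region][year] += 1
    return {
        region: list(accumulate(per_year[region][year] for year in years))
        for region in regions
    }

totals = cumulative_by_region(wins, REGIONS, range(1950, 1953))
```

Plotting each list in `totals` against the years gives the kind of crossing lines the talk reads a story from.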
Seaborn is lovely. It has extensions [inaudible] statistics [inaudible]; very much recommended. And here's another story [inaudible]. This is plotting the country in which they were born against the country in which they won the prize [inaudible]. These black spots here represent the exodus of essentially Jewish scientists from Europe [inaudible] World War One and World War Two, following serious anti-Semitism, and there's a story. There are other stories: Canadians going to America, [inaudible] some reason for that, [inaudible] a few Hungarians [inaudible] the break-up of [inaudible].
Once you've done all that, you want to imagine a visualisation. Probably less is more: you want to create a context in which people can find their own stories. [inaudible] Sometimes one is editorialising, but I think even the strongest editorialising should allow some alternative perspective; otherwise you're just saying, 'This is what you are interested in.' [inaudible] So you create something which maybe guides people. You can tell the story you want to, obliquely, but you can allow them to forage, and that's kind of the idea of the visualisation here.
Before you can do it you need to deliver your data and, as I mentioned, Flask is a fantastic way to do this. This is a Flask RESTful API; there's not an enormous amount of work required. [inaudible] MongoDB [inaudible]; it's not a lot longer for a standard SQL implementation [inaudible]. This is [inaudible], a MongoDB and SQL RESTful API roller, and you see it's four, five, six lines, at which point you can consume the data from JavaScript [inaudible] Mongo database. Of course you can take this further [inaudible]; it's very powerful, with lots and lots of bells and whistles. This is the basic version. You can see it tested [inaudible] and it produces some JSON [inaudible], which you can use with D3 to make something nice.
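The data-delivery step described above can be sketched without any third-party packages. The talk uses Flask backed by MongoDB; the sketch below swaps in a bare WSGI application (the interface Flask itself is built on) over an in-memory list, so that it runs with the standard library alone. The `/api/winners` route, the query parameters and the records are assumptions made for illustration.

```python
import json
from urllib.parse import parse_qs

# Invented placeholder records standing in for the database.
WINNERS = [
    {"name": "Winner One", "category": "Physics", "country": "Country A", "year": 1950},
    {"name": "Winner Two", "category": "Chemistry", "country": "Country B", "year": 1975},
    {"name": "Winner Three", "category": "Physics", "country": "Country B", "year": 2001},
]

def filter_winners(winners, query):
    """Keep records whose fields match every key=value pair in the query."""
    return [
        w for w in winners
        if all(str(w.get(key)) == value for key, value in query.items())
    ]

def app(environ, start_response):
    """WSGI app: GET /api/winners?category=Physics returns matching records as JSON."""
    if environ.get("PATH_INFO") != "/api/winners":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = {key: values[0] for key, values in params.items()}
    body = json.dumps(filter_winners(WINNERS, query)).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

def serve(port=8000):
    """Serve the app locally; call this manually, it blocks until interrupted."""
    from wsgiref.simple_server import make_server
    with make_server("localhost", port, app) as server:
        server.serve_forever()
```

With `serve()` running, a request to `http://localhost:8000/api/winners?category=Physics` from the command line or from JavaScript in the browser would return the two matching records as a JSON array.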
This is the transformative phase: you pass the data off to the browser. I know you can do stuff that doesn't involve passing data [inaudible], with pre-compiled JavaScript [inaudible], and obviously that has its place. [inaudible] But then you have this computer-generated code, and I'm sure all of you have nightmares about generated code. I remember using Delphi and a number of other things that produced billions of lines that you never understand, just to do the simplest thing. [inaudible] We want to control things at a low enough, expressive enough level, and it's hard to do that indirectly. That's my feeling at this point, but that's not to say the other approaches don't have their place.
So you build your visualisation with D3. [inaudible] I haven't got time to go through how now, but with D3 all of these are completely built from scratch. That's the other thing: with D3 you can control everything, and it's a real thrill to build your first chart and control all of the different elements. I'm sure you've had that feeling where you get a piece of software that is great as long as you're doing things its way; then you want something different and you can't, and you go through another cycle of suffering. With D3 you just program the changes.
Right, so here are a couple of stories to end on. Can anyone estimate the number of female physics prize-winners?
[Exchange with the audience, largely inaudible, about who the first one was and about physics and chemistry.] It's something of a surprise [inaudible]. First off, here is the big story.
So those are the female Nobel prize-winners, which, as you say, is a lot smaller [inaudible]. Let's do physics. [inaudible] Show me another female physics prize-winner. It's astonishing, right? So there's a story told. Let's switch this back and tell another story, about per capita. Let's change the winning metric a little bit [inaudible].
This big thing here is [inaudible]; they have a [inaudible] winner and a population of about 50 thousand, so you can imagine what their per capita rating is [inaudible]. But let's just do physics [inaudible]. That involves money and so on [inaudible], with research budgets. [inaudible] the Netherlands, the Scandinavians, Denmark, which all do incredibly well [inaudible] index, which I think is [inaudible]. Another story would be the economics prizes [inaudible].
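The two interactions shown in the demo, filtering by category and gender and switching the metric from absolute counts to per capita, can be sketched as two small functions. The visualisation itself does this in JavaScript in the browser; the Python below only illustrates the arithmetic, and the records, populations and the per-million scaling are assumptions.

```python
from collections import Counter

# Invented placeholder records and populations, not real figures.
winners = [
    {"name": "Winner One", "country": "Country A", "category": "Physics", "gender": "female"},
    {"name": "Winner Two", "country": "Country B", "category": "Physics", "gender": "male"},
    {"name": "Winner Three", "country": "Country B", "category": "Chemistry", "gender": "female"},
    {"name": "Winner Four", "country": "Country B", "category": "Physics", "gender": "male"},
]
population = {"Country A": 50_000, "Country B": 5_000_000}

def select(records, **criteria):
    """Keep records matching every keyword, e.g. category='Physics'."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]

def per_capita(records, population, per=1_000_000):
    """Prizes per `per` inhabitants, for countries with a known population."""
    counts = Counter(r["country"] for r in records if r["country"] in population)
    return {country: counts[country] * per / population[country] for country in counts}

female_physics = select(winners, category="Physics", gender="female")
rates = per_capita(winners, population)
```

A single prize in a country of 50 thousand people dominates any per capita ranking, which is why a metric switch like this changes the story so sharply.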
[inaudible] The idea is to learn how to build this, but the main thing is to allow people to find their own stories [inaudible]. That said, let's have a very quick look at what this started with, where it was slightly less easy to find stories, by individuals [inaudible]; that's the original, converted.
This is all the HTML it uses. All those tags are essentially a backbone that you flesh out programmatically using D3, in this case [inaudible]. It's not the most sophisticated visualisation, but it's multi-element and interactive, and this is not too onerous, I think, in terms of the web development involved [inaudible]; the rest is just programming [inaudible]. [inaudible] ECMAScript 6, which is [inaudible], has made big strides. JavaScript is moving quite fast and is improving [inaudible], but that's another issue. [inaudible] There are my script files; each one controls a component [inaudible]. And that's the loading of D3, which you can load off the web using CDNs, which is [inaudible].
To summarise: [inaudible] Python and JavaScript are complementary, and I think they work very well together [inaudible]. JavaScript [inaudible] data processing libraries, and it will take a while to reach Python's point. But I think Pythonistas should make friends with the other elephant [inaudible].
These are exciting times. This is a very exciting time: data is everywhere, and visualising data is communicating [inaudible]. [inaudible] This is Minard's famous visualisation of Napoleon's army and its march to Moscow. This is a major visualisation; it captured multi-dimensional data in a single frame. The big challenge now, I think, is capturing multi-dimensional data [inaudible]. We now have the tools [inaudible].
[inaudible] Python [inaudible] that was one of the first things I did in Python [inaudible].
Audience: My question is very practical. [inaudible] the book: have you got something on GitHub [inaudible]?
Speaker: It's a few months away [inaudible]. Absolutely, but [inaudible]. It's mostly done, and it will be on GitHub [inaudible] so I can receive feedback [inaudible].
Audience: Thanks. Do you have a favourite set of GIS libraries that you work with, mapping libraries, geospatial?
Speaker: Yes. [inaudible] Generally you can find a lot of the [inaudible], and actually you can often find TopoJSON files [inaudible].
Audience: Thanks. Just a quick question: do you recommend any method we can use to export the charts or the results to PDF format or something like that?
Speaker: [inaudible] I don't think [inaudible], but yes, you can do it [inaudible].
Audience: You spoke about [inaudible] Mongo. Thanks, very good [inaudible]. Have you considered also using MongoEngine, which is a more natural way to query Mongo from Python? There is also an extension [inaudible]. Maybe you could consider [inaudible].
Speaker: Absolutely. I know MongoEngine and recommend it. [inaudible]
Audience: And also for GIS [inaudible] Leaflet [inaudible]?
Speaker: Certainly Leaflet. [inaudible] In the book there is one example, and it's a great way to do mapping without having to [inaudible] D3. And yes, as you said, there are some very nice higher-level [inaudible]. Any more questions? Thanks.