Using NoSQL & HTML5 Libraries To Rapidly Generate Interactive Web Visualisations Of High-volume Spatio-temporal Data

Video in TIB AV-Portal: Using NoSQL & HTML5 Libraries To Rapidly Generate Interactive Web Visualisations Of High-volume Spatio-temporal Data

Formal Metadata

CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Subject Area
Twitter has developed over the past few years into a potent source of public opinion and comment. The service passed 500 million users in June 2012, collectively posting hundreds of millions of tweets each day, and several high-profile analyses of this data (such as the Twitter Political Index, which mapped sentiment across the US towards the 2012 presidential candidates over the course of their campaigns) have demonstrated its potential for insight and near-time customer feedback. Handling such large volumes and throughputs of data is a sizeable engineering challenge, however, and several commercial ventures (TweetReach, Tweet Archivist - many others) have sprung up specifically to deal with this complexity - at a cost. In addition, many existing solutions are unable to properly utilise the location data that is present in a significant proportion of tweets, losing out on the rich geographical context. This retrospective aims to demonstrate how an informed coupling of emerging open-source component technologies can be used to resolve the complex problems of i. large stored data volumes, ii. real-time streaming input, iii. concurrency of writes and iv. geographically querying and visualising results - with a minimal development outlay. Specifically, the construction of an open-source process to read, process, write, query and visualise streaming, geolocated Twitter data using the MongoDB NoSQL database and D3.js JavaScript library will be detailed, focusing on how MongoDB handles real-time spatial data (including spatial indexes & querying) and the unique features that make D3 so well-suited to visualising and exploring spatial data in the web browser.
[unclear] I realised how complicated the title was soon after I submitted it, so I'm just interested to know: in the audience, how many people can understand more than half of that title? That's good. [unclear] That means I'm either going to blow all of your minds or bore all of you, and I wonder which it will be today.
Anyway, I was tasked last year with looking at how we can better realise the location value from high-volume, high-velocity data sources. I'm from Ordnance Survey [unclear], and we make a lot of data products. One of them that I was investigating here is called Boundary-Line: a load of administrative borders for the whole of the UK, [unclear] quite useful, and it's supplied as a shapefile. Behind this was a more general question: are we supplying data in the right sort of formats, and how much do you need to do to get it into some of the new technologies? So if you're wondering why I was doing this at all, that was the reason for it.
The technologies that we wanted to investigate were those that were fast, flexible and affordable; I'm sure [unclear] nothing is really free [unclear]. The kind of default option, if you're looking for a high-volume, high-velocity data source, is Twitter data. Now, I wasn't really looking to do anything revolutionary with Twitter data; it was more about the technologies and the processes that went on behind it, because I'm sure most of you have seen more visualisations and technologies using Twitter data than you'd care to mention. But it is a really nice data source to use if you want to test a system with the kind of rapid stuff that comes out of these sorts of data sources.
For those who don't know, there are three main parts to a tweet: [unclear], some text and a time. Location isn't in all tweets. Estimates vary, but I've seen figures of around 10 to 20% being geolocated [unclear]. I'm sure that will go down as more people become privacy-aware, but for the moment you get those sorts of numbers. So what does that mean? It means that given an area covering Greater London, which is what I used as my sample, you'd expect to get between 5 and 20 tweets each second. That does vary quite a lot, and it adds up to about a million records each day. So we're not talking big data, but it is enough that you need a fairly decent system to record it and visualise it at the other end.
So how would I get that stuff onto a system? Well, I used Ruby. Ruby has
a module for every occasion. I realise these two statements kind of contradict each other, and for anyone who isn't a Rubyist here, "chunky bacon" probably doesn't mean much to you. But basically Ruby has a lot of really nice features that make writing code in it really enjoyable and very quick. People say it's written like the English language. That's not true; it has lots of weird punctuation, like all code does. But generally you can read a line of it and, if you read it out loud, it tends to make sense as a statement, which is nice. And there's a guide that has a lot of mentions of chunky bacon, which is good.
The modules I used were these. One was TweetStream, which is a module for connecting to the Twitter Streaming API: you basically give it your login details and the parameters you want to pass along, and it deals with maintaining the connection and sending tweets up to your application. It's quite nice. On the other end I used [name unclear], which is a very lightweight web framework, much, much simpler than Ruby on Rails. If you just want to get a basic website with responses going, it's really nice for doing that in just a few lines of code.
When tweets come through, they are supplied as JSON, as follows. This is simplified, as there's loads of metadata and stuff like that, and as you can see it is a live feed directly into this presentation. You get some coordinates if you're lucky, and you get a time. In the JSON, this is the core stuff we want to take from the tweet. We need these four main components: a user, some text, coordinates and a time. That's what we feed into our database.
I used MongoDB because it is JSON-friendly: you just pass that same document to it and it will create a document for you for that tweet. It has built-in geo capabilities. I'm not sure how many of you have used Mongo before, but especially since the more recent 2.4 update it is getting quite good for doing spatial stuff. It's not as fully fledged as PostGIS, but it will do a lot of the kind of ST_ operations, and it does spatial indexing as well. It also has a flexible schema, which in my opinion is the most useful thing about MongoDB and a lot of NoSQL technologies; I'll come on to that later.
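As an illustration of the step described above (not the speaker's actual code), the following Ruby sketch reduces a raw tweet to the four fields mentioned: user, text, coordinates and time. It assumes the tweet JSON follows the Twitter API v1.1 layout (`user.screen_name`, `text`, `coordinates`, `created_at`); the helper name `tweet_document` and the output field names are hypothetical. Storing the location as a GeoJSON point is what would let a MongoDB `2dsphere` index (added in version 2.4) cover it.

```ruby
require "json"
require "time"

# Timestamp layout used by Twitter API v1.1, e.g. "Thu Sep 19 10:05:00 +0000 2013".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Hypothetical helper: keep only the fields the talk mentions.
# Returns nil for tweets that carry no coordinates.
def tweet_document(raw_json)
  tweet = JSON.parse(raw_json)
  point = tweet["coordinates"]
  return nil if point.nil?

  {
    "user"       => tweet.dig("user", "screen_name"),
    "text"       => tweet["text"],
    # GeoJSON point: longitude first, then latitude.
    "location"   => { "type" => "Point", "coordinates" => point["coordinates"] },
    "created_at" => Time.strptime(tweet["created_at"], TWITTER_TIME_FORMAT).utc
  }
end
```

The returned hash is the kind of document that could be handed to a MongoDB driver's insert call as-is; tweets without a location are dropped here because the rest of the pipeline is spatial.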
All your points are now feeding into your MongoDB database, so how do you make sense of that stuff? Well, first I plotted the data points: I just streamed them out again onto the screen so you can see what you're looking at and the kind of patterns that emerge. I used D3 for that. I chose D3 because it is a web-based visualisation technology and it is responsive: you can make it respond to different web browsers and different resolutions. You're not talking about serving up images or anything like that; it is drawn on the fly. It has built-in support for JSON and GeoJSON, so you can fire that straight at it [unclear] without having to interpret it; it has built-in modules for that sort of thing. And again it has built-in geo, so you can do projections and you can do some spatial range stuff. A lot of the examples have had a lot of time spent on them, so if you're coming into this new, which I was, it's really nice because you can repurpose a lot of the tutorials that the guy who makes it, Mike Bostock, makes, and other people too. There's just really nice support.
That plot looks quite cool, with lots of stuff going on, but as an interpretive tool it's not really that great. "Oh, lots of people are tweeting in London": that's really interesting. What you really need to do is aggregate. You need to do some careful aggregation: you need to pick your attributes and aggregate the stuff up so that you can actually make sense of it. Just loading dots onto a screen doesn't make a lot of sense most of the time. So we're really talking about aggregating multiple entries in real time, which is quite a feat when you're getting this kind of 5 to 20 tweets a second, especially for someone who's new to this.
What you really need to do to get that to work at the other end, rather than sending out millions of points, is do all this stuff in the database, especially the spatial operations. So what I wanted to do was make sure that when a tweet came in, I didn't just know the point that it was at: I wanted to know the borough that it was in, [unclear] and the bigger sort of county it was in, and have all that stuff pre-done rather than having to do it on the fly when sending it up to the web page. I also did some keyword analysis on the tweet itself, on the text: I stripped out loads of stop words and stuff like that, so we have a nice clean list of terms. So how to implement these changes?
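The pre-computation and aggregation described above can be sketched in plain Ruby. This is an in-memory stand-in for illustration only: in the talk the spatial test ran inside MongoDB. The two square "boroughs", their names and all helper names are made up; the point-in-polygon test is the standard ray-casting method, and tweets are counted per borough per hour.

```ruby
# Made-up boundaries: closed rings of [x, y] pairs (first point repeated last).
BOROUGHS = {
  "Borough A" => [[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]],
  "Borough B" => [[2, 0], [4, 0], [4, 2], [2, 2], [2, 0]]
}.freeze

# Ray casting: count how many ring edges a horizontal ray from the point crosses.
# An odd number of crossings means the point is inside.
def inside?(point, ring)
  x, y = point
  inside = false
  ring.each_cons(2) do |(x1, y1), (x2, y2)|
    next unless (y1 > y) != (y2 > y)          # edge does not span the ray's y value
    x_at_y = x1 + (y - y1) * (x2 - x1).to_f / (y2 - y1)
    inside = !inside if x < x_at_y
  end
  inside
end

# Name of the first borough containing the point, or nil if none does.
def borough_for(point, boroughs)
  match = boroughs.find { |_name, ring| inside?(point, ring) }
  match && match.first
end

# Aggregate tweets into counts keyed by [borough, hour].
def counts_by_borough_and_hour(tweets, boroughs)
  tweets.each_with_object(Hash.new(0)) do |tweet, counts|
    borough = borough_for(tweet[:point], boroughs)
    next if borough.nil?
    hour = tweet[:time].strftime("%Y-%m-%d %H:00")
    counts[[borough, hour]] += 1
  end
end
```

Tagging each tweet with its borough once, at write time, means the web page only ever asks for small pre-aggregated counts instead of millions of raw points.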
We had made a database, we had our structure, and now we wanted to add to it. The nice thing about MongoDB is that if you want to add additional fields, you just add them; there is no predefined schema. For me that is really normal. [unclear] I have never known any other way of doing this. I had to make some relational database tables the other day and I couldn't understand why I needed to predefine what fields I wanted. I do understand there are performance benefits to doing that sort of thing [unclear], but I've spent all my time [unclear], and the fact that I can just add fields, take fields away and change what's in them whenever I like is really handy for that kind of changing development process.
So, as I said, I had a shapefile of some of our data that I needed to get into MongoDB. First I used another open-source tool, ogr2ogr, which is a really nice one-line format conversion tool. It also does coordinate transformation: because we insist on using British National Grid and a lot of other stuff doesn't accept that, I had to convert it across to WGS 84. It handles a huge range of spatial formats. Basically you just say the format you want, you say what your output file is called and what your input file is called, and that's it: it runs, and what comes out is ready for use. Really straightforward. There are loads of other options, and it's a well-supported tool that has been around for a while, but in its simplest essence it's really nice and straightforward.
Then what you do is a mongoimport, which is one of the command-line tools that comes when you install Mongo. You give it a database and stick the data in a collection, which is the equivalent of a table, [unclear] it just writes the whole thing in and it's ready to go. Then you build a couple of indexes, which again is very straightforward.
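A sketch of that conversion and import, under stated assumptions. The `ogr2ogr` options `-f`, `-s_srs` and `-t_srs` and the `mongoimport` options `--db`, `--collection` and `--file` are real, but the file, database and collection names are hypothetical, and the talk does not say which options were actually used. EPSG:27700 is British National Grid and EPSG:4326 is WGS 84. The transcript also does not say how the GeoJSON was shaped for import; splitting the FeatureCollection into one feature per line, as below, is one way to give `mongoimport` one document per boundary.

```ruby
require "json"

# Build (but do not run) the ogr2ogr call: shapefile in British National Grid
# to GeoJSON in WGS 84. Note that ogr2ogr takes the output name before the input.
def ogr2ogr_command(input_shp, output_geojson)
  ["ogr2ogr", "-f", "GeoJSON",
   "-s_srs", "EPSG:27700", "-t_srs", "EPSG:4326",
   output_geojson, input_shp]
end

# Build (but do not run) the mongoimport call for a file with one JSON document per line.
def mongoimport_command(db, collection, file)
  ["mongoimport", "--db", db, "--collection", collection, "--file", file]
end

# Turn a GeoJSON FeatureCollection string into newline-delimited features,
# so that each boundary becomes its own document on import.
def features_as_lines(feature_collection_json)
  JSON.parse(feature_collection_json)
      .fetch("features")
      .map { |feature| JSON.generate(feature) }
      .join("\n")
end
```

Each command array could be passed to `system(*command)` once the tools are installed; keeping them as arrays avoids shell quoting problems with file names.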
As I wrote each tweet into MongoDB, I did some intersection queries to figure out which borough it was in. That is very quick to write, and one of the nice things about Mongo is that you can stack up your writes: you don't have to wait for a response before you send another write. So when you're talking about quite variable amounts of data coming in, where sometimes you get one or two and sometimes you get 30, it's nice because it kind of balances that out over the minute, or whatever period you're writing into it.
The second thing was keyword extraction using Ruby. That was very simple; it wasn't any particularly complex natural language processing. It was basically just stripping things out [unclear], and it gives you a nice array of words at the other end to query. That again is nice because of MongoDB's multikey indexing: you can index an array within a document, so you can search within an array. So that's nice and fast.
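Both steps can be sketched in a few lines of Ruby, with the usual caveats. `$geoIntersects` and `$geometry` are real MongoDB query operators, but the field name `geometry`, the helper names and the tiny stop-word list are assumptions made for illustration; the talk does not give the actual query, word list or tokenising rules. The query is only built here as a hash, not sent to a database.

```ruby
# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = %w[a an and the is in on at to of for it this that i you].freeze

# Lower-case the text, drop URLs, split into word-like tokens,
# then remove stop words and duplicates.
def keywords(text)
  text.downcase
      .gsub(%r{https?://\S+}, " ")
      .scan(/[a-z0-9#@']+/)
      .reject { |word| STOP_WORDS.include?(word) }
      .uniq
end

# Query document asking which stored boundary intersects a point.
# Assumes each boundary document keeps its GeoJSON shape under "geometry".
def borough_query(lon, lat)
  {
    "geometry" => {
      "$geoIntersects" => {
        "$geometry" => { "type" => "Point", "coordinates" => [lon, lat] }
      }
    }
  }
end
```

Storing the keyword array on the tweet document is what makes the multikey behaviour useful: an ordinary MongoDB index on an array field indexes every element, so a search for a single word can use the index.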
So what did I learn from this? [unclear] I think this is Neil Armstrong's [unclear].
One of the things was: get a simple version of what you're doing working as soon as possible, and you quickly realise what needs changing. [unclear] You can spend a lot of time figuring out what you need to build. I find that with these sorts of technologies you can actually build something quite quickly, get a straw man up, and then say "yeah, this is awful, this isn't working". You can go through that process in a couple of hours and get something really quickly, and I find that a lot better for actually figuring out what you need to do and how you're going to go forwards.
Following on from that: be less protective of your own work. Make clear it is in development, just put it out to the people around you and get feedback on it. Certainly in the design world [unclear], but also in the development world, we can be a bit protective of our work before we think it's ready or finished. It's worth putting stuff out there in an unfinished form and getting that feedback in.
And to support that, pick technologies that are flexible, even if you take a performance hit. Now, I'm sure there are a lot of people in this room who would jump up and shout at me that MongoDB is really inefficient, or something along those lines. But for me it has a really nice interface, it's quick to work with and it's flexible, which is really nice for me. That said, at the end, if anyone wants to come up and tell me about any technologies I should use instead, [unclear].
If this is interesting, there are three resources that I recommend. For MongoDB, they have a MongoDB University now, which is a free course, or several free courses, that they run throughout the year. I did [unclear] and it was a really, really useful course. For Ruby, Codecademy: if you want to learn Ruby, Codecademy is a great way to do it through some interactive lessons, and again that's free. And Scott Murray is a really great D3 resource. Mike Bostock is the guy who wrote it, so he's also a great resource, and he will respond to things on Twitter [unclear]. Scott Murray wrote a book called Interactive Data Visualization for the Web, which is an O'Reilly book. I went through that last year and it was really, really helpful [unclear].
All the code is available on GitHub. I haven't particularly made it ready as a tutorial or anything, but feel free to go and have a look at it and play around with it [unclear]. My Twitter handle is J. Harrison [unclear] and my website is [unclear]. Just come and chat to me afterwards, and if you're brave enough to tell me about a better technology, then please do; I'm really interested [unclear]