This course has been run before. The first course of this kind was organized by political scientists in Trondheim in the summer of 2004. That was very early, and subsequently we've had doctoral courses or tutorials. Was it twenty years ago or fifteen years ago? It was fifteen years ago, in Trondheim in the summer of 2004, for political scientists, and organized in collaboration with the Peace Research Institute Oslo and conflict studies. This was at the time when we had no idea that anybody used the software we were writing.
So what I'll be doing initially, for the first half hour or so, is to talk about the background to where we are and where these two books came from: the first edition of our book from 2008, which is quite like the original course, then the second edition of the book from 2013. Later this morning I'll be explaining why we haven't got a third edition of the book, which is that the world has changed and we are under contract for a different book. Virgilio, who co-authored this one, has several books which have just been published on Bayesian statistics, and Edzer Pebesma and I are under contract for a different book, but it's not ready yet and we're not quite sure where we are.
So you're in a slightly difficult place, because those who have taken responsibility for making things happen over the last fifteen years are somewhat uncertain about where we are and where we should be going, but that will develop as we go. That's with regard to data representation and data handling. With regard to analysis, I'll be talking to you once we're off streaming later on about the choice of topics towards the end of the week. At the moment the end of the week is a bit uncertain, and I need feedback from you.
The format for all of the talks is identical, so that you get a copyright notice, CC BY-SA. I'm running the current R, 3.6.1, on Fedora 31, so on Linux. These are the packages which you need to have installed if you want to run the code for this morning, and the code can be downloaded from this link on GitHub. This is a zip file which contains the R script; in some cases there's just the R script, in some cases there's the script and some data files.
The schedule for the whole of the course: today we're looking at spatial data representation, which is something you may find unnecessary: "please just let us get to work". Sorry, we have to look at data representation, and it saves time in the longer run. You may know your data at the moment, but in two years' time you'll be on different projects and the data will be different. Data representation matters, and in particular the changes which are occurring in data representation. We'll go through to eleven o'clock and turn off the streaming, then we can talk for, say, half an hour, then we take a lunch break, then we can interact a little bit more before one o'clock and start again with streaming at 1:15.
Then we look at support, topology, and input/output, so those are the other elements. Tomorrow morning we're looking at coordinate reference systems, and then in the afternoon visualisation. In visualisation I'd like you to be fairly active and try things out. There is a lot which has been changing since the books were written, and much of it is quite exciting. Wednesday morning is then up for discussion: if nobody needs the interfaces with GIS, then we can choose something else. From then on there will be other choices depending on what your needs for analysis are, so we can substitute topics from then on. As I've said, there is project surgery on Wednesday after lunch. That means that I won't be talking to you, but you can talk to me about your projects. We'll be here in this room and we can develop things, and the same thing on Friday after lunch, with presentations towards the end of the afternoon, then on Saturday the presentations for those of you who haven't presented on Friday. And then we'll be done.
I was a little bit worried about my first line in the outline: why break stuff, and then, why not. There are some people who provide open source software who really enjoy breaking stuff. I don't. But sometimes you have to. So what I'll be doing in data representation, in input/output, in visualisation, and particularly in coordinate reference systems, is talking about things which have now outlived their usefulness but still have many users, which means: how do we manage the process of migration to more modern, robust and sustainable representations of the data? It's also useful to know that there's good communication between the various communities, and this includes the Python communities, which are also lively. I can provide links, if someone is interested, to an R and Python workshop
that I did in September in Luxembourg. There are new opportunities for visualisation, in particular the tmap package and mapview, and then there are challenges regarding the upstream software libraries, and I'll be talking about these. We're going to say something about spatial weights, spatial autocorrelation and spatial regression, if that's something that you find useful. I can talk about that because I'm responsible for those packages, spdep and spatialreg. And then we can choose other things later on as well.
So this morning, and we're now there: 9:15 to 9:45, the slide show. I have confirmation from outside that streaming is running, so at least I can relax. Streaming is running, thank you. We haven't really worked out a way to get feedback, but Slido was one of the possibilities, so it may be that some questions come through Slido as well, but for the time being this will be more or less straight provision of information from my side. If any concepts that I'm using are unknown to you, please put your hand up. At the moment I'm stuck completely inside a particular problem of data representation and I find it quite difficult to get out of it, so it may be that what I say to you, even though I'm speaking English, does not make any sense. If that is the case, put your hands up and say "could you repeat that in English, please", in which case I'll do my best. The fact that I'm at, say, one battery bar rather than five battery bars means that I may actually get lost in my own thoughts. If I do, wake me up.
Now if we step back, not fifteen years, not even twenty years, but if we step back twenty-five years,
many of us were teaching courses in spatial analysis; well, a small group of people were teaching courses in spatial analysis. Edzer Pebesma's work was in particular in geostatistics, writing an excellent standalone program, gstat, for geostatistical analysis. Others were working in the same field: Albrecht Gebhardt was teaching in Klagenfurt, and other people were trying to find software tools to teach with outside GIS. It is not that GIS was a bad thing, but most geographical information systems did not really provide the tools that you needed to do analysis. Partly this could be concerned with ways to do geostatistical analysis, hence gstat. Partly it could be concerned with ways to analyse point patterns, originally in S-PLUS, hence the splancs package, "spatial Lancaster", because it was at the University of Lancaster, and so on. There were people who needed software to do analysis or to do teaching.
It was also an advantage if the license fees for the teaching software were not unreasonable, as most universities had limited budgets for software, and going through the departmental and faculty committees to try to get more money to buy more software was difficult. gstat was open source from the beginning; splancs was made available free to license holders of S-PLUS, so that there was a community working, or rather there were dispersed individuals working who maybe knew each other. It was possible to write scripts for Arc/Info in AML, for ArcView, or in Visual Basic for ArcGIS, but that meant site licenses and dongles. I still have a postcard from a French ecologist by my screen saying, and this is from
a long time ago, this is from fifteen years ago, fourteen years ago, saying: I'm sitting here on an island in a river with my students, the batteries are still running, and we can do the analysis on site because we have open source software. Otherwise you had to have a dongle, which you attached to the parallel port of your laptop. Working in the field was clunky if you had to have licenses. If you were on an island in 2005 and you didn't have an internet connection, you couldn't run Arc/Info with a licence fed from a license manager on the internet. So there was a practical problem in teaching and field work which could be resolved by open source software. Site licenses, dongles: nowadays they seem completely meaningless. Why would anybody need anything like that? Everything is open source. But then it wasn't.
From late 1996, R became a viable alternative for teaching purposes, and several of us started saying: OK, why don't we try this? R began by using much of the syntax of the S language, which was commercially available, and a lot of universities did have site licenses for it, managed in a different way to the GIS licenses. R is licensed under the GNU General Public License, so that it's free to use, extend, distribute and so on, with modifications to the code restricted by the GPL, so that you can't sell a modified version without contributing back the modifications you made. That is slightly different from the Python ones, but it's the licence that R has.
In 1997 or 1998, or maybe at the end of 1996, an S+SpatialStats module was made available for S-PLUS, but it was licensed: you still needed an S-PLUS licence, and you needed a licence for the module.
There was also a meeting in Leicester in the UK, where quite a number of people working on exploratory spatial data analysis met and discussed all kinds of things: could we use Tcl/Tk? Tcl was a scripting language; at the time I was using awk as a scripting language in addition to C. Porting of S code was begun by Albrecht Gebhardt because he needed it for teaching, and it was then made available as soon as the R package mechanism matured, which is twenty years ago. Porting mattered because if you'd been running courses using S-PLUS and the different S-PLUS libraries which were available, sgeostat, splancs and so on, you could teach with them.
So you can imagine that you're dealing with master's courses, or what we would now call master's courses, or doctoral courses, giving introductions in applied spatial data analysis. There is a separate book by Bailey and Gatrell on interactive spatial data analysis from the mid-1990s, distributed with a diskette, which you had to run, not on Windows, you had to run it on DOS; it wouldn't run on Windows. So everybody who was teaching this kind of stuff was trying to work out how on earth we could get it to work and distribute it to our students without having to go around with these half-kilo dongles; well, not half a kilo, maybe a hundred grams, but still.
So the first packages on the Comprehensive R Archive Network from 1998 were akima, a port by Albrecht Gebhardt, with tripack, both available within S-PLUS and unfortunately not on open source licences, but needed by people who were doing spatial work, followed by ash and sgeostat
six months later, so about a semester later. You could see the clock of the semester ticking: Albrecht was getting stuff out to match the semesters. He also helped to port parts of the spatial package, which is part of Modern Applied Statistics with S. Albrecht and I were in contact, and we did a seriously mismanaged talk at the regional science conference in Vienna in 1998. We had a twenty-minute slot and we used forty-five, but people were tolerant in those times. You can have people who will turn off your beamer, but they didn't, because actually they were interested, they were sympathetic, and they gave us some feedback. So then we were able to show that in some cases, with packages which you had to download from our own FTP sites, you could do the teaching you needed to do, and if you needed to do research in the field, you could do this as well.
Then the S-PLUS version, as already mentioned, of splancs. I contacted Barry Rowlingson in 1997, but this only moved forward as late as September 1998, because of the amount of Fortran involved in porting the package. At that stage we started realizing that there was an issue, because we had an implementation of Ripley's K test for complete spatial randomness (call it that, it's not quite) for point patterns in the spatial package, and we had another one in the splancs package, and it would be really nice to be able to confirm that both of them came to the same results from the same data. OK, so we've got a standard test; now, are the two implementations the same or different? This was a question which arose very early. This is a quote from an email: "An issue I have thought about a little is whether at some stage Albrecht and I wouldn't integrate or harmonise the points and pairs objects in splancs, spatial and sgeostat; they aren't the same, but for users maybe they ought to appear to be so." So we were thinking about shared classes for representing spatial data, and it turned out that this was quite fruitful.
I stepped aside a little and worked with Markus Neteler on an interface to the GRASS GIS. GRASS GIS was originally public domain and became open source in the 1990s. It was written by the US Army for the military purpose of monitoring erosion on army ranges: if you drove your tanks along the contours of a hill, they created fewer gullies than if you drove them down the hill. So they were interested in modelling the erosion caused by army exercises on the ranges in the middle of the States. GRASS existed and still exists; 7.8.2 is now considered almost ready to be released, and release candidate two was published last month.
So I was working on interfacing R and GRASS, and using R to analyse quite large raster data sets from GRASS, at that point in time for classifying landscape types. The interface to GRASS has evolved, and interfaces to other GIS have also evolved, and if you want me to talk more about this, then we can look at them on Wednesday morning.
I'd already draw your attention to this book. If you need help, if you think "I don't know what to do, I'm new to this", then this is the book to go to. It's not the third edition of our book, although Robin Lovelace wrote a very well thought through review of the second edition of the book. He knew the book before, but he was thinking about how you teach people this stuff towards the end of the 2010s, and saying: this isn't going to work. The reviews of the first edition were: OK, this is tough stuff, but if I have a doctoral student who has to do it, then I give him this and tell him to chew until he's chewed all the way through. So in the 2000s people were expected to have considerable determination and independence of thought, and if they didn't understand a paragraph they were expected to reread it successively until they had understood it. In the 2010s this is not so much the case, and people perhaps expect more to be stroked. I'm not saying that this book isn't accurate and concise, but it is more reader friendly. It is also available online in addition to the printed edition; it was written as a bookdown project, and the link is provided in the references at the end of the slides. So if you're lost at the moment, this book will be your friend.
Going back to twenty years ago,
with work going on on different packages on the archive network, the archive network team, who were Kurt Hornik and Fritz Leisch in Vienna, said: we're going to have a meeting for R people in March 2001, can you come and give us a talk about the GRASS interface? So, scary.
I'd done another scary thing in 1986, when I went to the European Unix Users Group meeting to talk about spatial data analysis. I thought they were not going to be interested in this, but they put me in the plenary session with three or four hundred people, including the NeXT developers. NeXT was what came before the new Mac, so it was where Steve Jobs went when he wasn't friends with people. They were seriously intense people, and after my talk they were actually quite interested in what was going on, what people were doing with spatial data, why Unix was useful for this, and the modular approach to writing software. That was scary.
But then there was walking into this room with people about whom the only thing I knew was that they were all statisticians, and I'm a geographer, and they'd all written a good deal of software which I used on a daily basis. They were very kind, interested even, so that was fun, and they were extremely helpful to talk to, because you got all kinds of hints: well, had you thought of doing it that way, did you try that? The community was feeding back on the mailing list as well, which was very useful, and I knew all of them from the mailing list. We were fewer than seventy at this meeting in Vienna. So, unique insights? Yes, definitely.
A bit later the same year I was asked to go to Santa Barbara, to a workshop on software for spatial data run by Luc Anselin and Sergio Rey. Sergio is most involved in PySAL development, and Luc Anselin has been more involved in standalone programs; at the time this was SpaceStat, which unfortunately escaped his control. I was continuing to work on spatial econometrics, which was a narrow field.
During the second half of 2002 I thought about trying to do something at the next R meeting in Vienna. The next meeting was being organised and I'd been asked: could you have a paper session on spatial statistics? So OK, we can send out some emails, get some people to submit a paper, and see how that goes. I also thought that maybe we should have a workshop to discuss classes for spatial data. I contacted Edzer Pebesma because of his work with gstat, and he had at the time, coincidentally, at the end of 2002, been approached by the Netherlands environmental agency to write an interface between gstat and S-PLUS. And so, from an email from Edzer in November 2002, so now seventeen years ago, he mentioned: "I wonder whether I should start writing S classes. I'm afraid I should." I'm not sure whether he's grateful for the
insight, but since then he has done lots of other things, and certainly engagement with classes has been very fruitful. Virgilio Gómez-Rubio, my third co-author on the book, had developed two packages: RArcInfo, for an interface to old Arc/Info vector formats, and DCluster, for disease clustering. He was also committed to coming to the meeting, and then other people wanted to come to the meeting, or did come to the meeting, like Markus Neteler or Albrecht Gebhardt, and a number of other people, so that we were about a dozen. Nicholas Lewin-Koh, who would make contributions to maptools, also
said more or less the same kinds of things: "It is a lot of duplication of effort. I did notice, looking through people's packages, that there's a lot of duplication. My suggestion would be to propose a tree for spatial packages", which we did on SourceForge, "similar to Bioconductor, with a base spatial package which has S4", that was the new-style S classes, "classes and methods which are efficient and general". So we had, if you like, a mandate before we met, or around the time we met. What we did after the workshop was set up a collective repository on SourceForge and set up the R-sig-geo mailing list, and there are still three and a half thousand subscribers. That was the beginning of the sp package.
So we had a mandate for the development of the sp package. After discussions, we then met for coding meetings in Lancaster with Barry Rowlingson in 2004 and with Virgilio in Valencia in 2005, and we got sp onto the Comprehensive R Archive Network in April 2005. What we did when we went for writing the sp package, a package containing definitions of classes for spatial data, was to use new-style class representations for spatial objects, whether they were raster or vector, and they should behave like data frame objects. The sp package also contained visualization methods to make it easy to show those objects.
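The design idea described above, a spatial object that carries coordinates and a coordinate reference system but otherwise behaves like a table of attributes, can be sketched in a few lines. This is a language-neutral illustration written in Python rather than R, and it is not how sp is implemented: the class name `SpatialPointsTable`, its fields and its methods are all hypothetical, chosen only to show that subsetting rows keeps geometry and attributes in step.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class SpatialPointsTable:
    """Hypothetical points-with-attributes container (not the sp API)."""
    coords: List[Tuple[float, float]]   # one (x, y) pair per row
    data: Dict[str, list]               # attribute columns, one value per row
    crs: Optional[str] = None           # free-text CRS description; None = unknown

    def __post_init__(self) -> None:
        # Every attribute column must have exactly one value per point.
        n = len(self.coords)
        for name, column in self.data.items():
            if len(column) != n:
                raise ValueError(f"column {name!r} has {len(column)} rows, expected {n}")

    def __len__(self) -> int:
        return len(self.coords)

    def column(self, name: str) -> list:
        """Table-like access to one attribute column."""
        return self.data[name]

    def bbox(self) -> Tuple[float, float, float, float]:
        """Bounding box as (xmin, ymin, xmax, ymax)."""
        xs = [x for x, _ in self.coords]
        ys = [y for _, y in self.coords]
        return (min(xs), min(ys), max(xs), max(ys))

    def subset(self, rows: Sequence[int]) -> "SpatialPointsTable":
        """Row subsetting keeps coordinates, attributes and CRS together."""
        return SpatialPointsTable(
            coords=[self.coords[i] for i in rows],
            data={k: [v[i] for i in rows] for k, v in self.data.items()},
            crs=self.crs,
        )


# Toy data, made up for the example.
pts = SpatialPointsTable(
    coords=[(0.0, 0.0), (2.0, 1.0), (1.0, 3.0)],
    data={"id": ["a", "b", "c"], "value": [10, 20, 30]},
    crs="planar, metres (toy example)",
)
sub = pts.subset([0, 2])
print(len(sub), sub.column("id"), sub.bbox())
```

The point of the sketch is the contract, not the details: a user can treat the object as a table, and the geometry follows along without separate bookkeeping.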
One of the things that we were clear about was that we shouldn't oblige the authors of any other package to use those classes. If they wanted to use them, they could; if they didn't, we would provide coercion methods. That's a way of converting, say, a SpatialPoints object into a ppp object for point pattern analysis, so that you could move freely between representations. If a package didn't want to adopt the sp representation of data, fine, not a problem. We did talk about whether we should do this. Very early on we said we were not sure that our representation is the one size that fits all, so we won't go about it in that way. The spatstat package has grown considerably and does very well, and we keep the interface with spatstat up to date and completely current.
We have this year made a radical change, which is that up until the penultimate release of maptools you could convert points in geographical coordinates, that is in a decimal degree metric, into a ppp object for point pattern analysis. This is despite the fact that the spatstat package was designed to handle planar geometries, so that distance measurements should be Euclidean and not great circle. So I put a question to Adrian Baddeley at the plenary at the spatial statistics conference in Spain in July this year, to say: why can people still make this mistake? He said: I have this problem, I have students who have geographical coordinates and they coerce them to spatstat and carry on happily analysing their points, even though technically the results are rubbish. So we agreed that we would insert warnings or errors into the coercion, so that if it was known that your object was in geographical coordinates, then you get a slap in the face: don't do that. There is a way round, of course, which is to say that we don't know whether the data are in geographical or planar coordinates, in which case planar are assumed. But that's the user's choice. So now if the user goes in with data which is known to be in geographical coordinates, they get pushed back. That's the first radical intervention that we've made in fifteen years.
So there's quite a lot of continuity here; continuity is something that we see as mattering. And functions for accessing data from outside had been available in the maptools package.
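Why coercing geographical coordinates into a planar point pattern gives rubbish can be shown with a small calculation, together with a sketch of the kind of guard described above. This is an illustration in Python, not the maptools or spatstat code: `as_planar_pattern` and its `is_geographic` argument are hypothetical names, and the great-circle distance uses the haversine formula on a sphere of radius 6371 km, which is itself an approximation.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def euclidean(p: Point, q: Point) -> float:
    """Planar distance, in whatever units the coordinates are in."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def great_circle_km(p: Point, q: Point, radius_km: float = 6371.0) -> float:
    """Haversine distance between (lon, lat) points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = (math.radians(v) for v in (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def as_planar_pattern(coords: Sequence[Point],
                      is_geographic: Optional[bool]) -> List[Point]:
    """Hypothetical coercion guard.

    True  -> refuse: Euclidean distances on degrees are not meaningful.
    False -> coordinates are known to be planar, pass them through.
    None  -> status unknown; planar is assumed, which is the user's choice.
    """
    if is_geographic is True:
        raise ValueError("coordinates are geographical; project them before "
                         "treating them as a planar point pattern")
    return list(coords)


# Two pairs of points, each one degree of longitude apart.
equator = [(10.0, 0.0), (11.0, 0.0)]
north = [(10.0, 60.0), (11.0, 60.0)]

deg_equator = euclidean(*equator)      # distance "in degrees"
deg_north = euclidean(*north)
km_equator = great_circle_km(*equator)
km_north = great_circle_km(*north)
print(f"degrees: {deg_equator} vs {deg_north}")
print(f"km:      {km_equator:.1f} vs {km_north:.1f}")

planar = as_planar_pattern(equator, is_geographic=None)  # unknown: allowed
try:
    as_planar_pattern(north, is_geographic=True)         # known geographic: refused
except ValueError as err:
    print("refused:", err)
```

Treated as planar, the two pairs are the same distance apart; on the sphere the pair at 60 degrees north is about half as far apart as the pair on the equator, so any distance-based point pattern statistic computed on raw degrees is distorted.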
That worked for shapefiles; this was superseded by rgdal. But rgdal had originally been written simply to read raster.
I'm using some pronunciations which you may not be familiar with. I call it "see-ran", CRAN, the Comprehensive R Archive Network, because there's also CPAN, which is the Comprehensive Perl Archive Network, and CTAN, the Comprehensive TeX Archive Network, and some of the software actually used for CRAN was also used for CPAN. It was written in Perl; CRAN was written in Perl until quite recently. So that's possibly an unusual pronunciation. The other unusual pronunciation which I've just used is rgdal, and why am I calling it that? GDAL: you might read that as "gee-dal", and I'm reading it as "goo-dal". Now why am I doing that? Well, because one of the prime original contributors to the software library, Frank Warmerdam, always wanted it to be object oriented, so he always pronounced it as the geographical object oriented data abstraction library. They never really got to be very object oriented, but he kept pronouncing it that way because that was their original ambition. So when you hear me say "goo-dal", I mean what you might have pronounced as "gee-dal".
GDAL provided extensive access to raster data, and the original work there was done by Tim Keitt. Other parts of GDAL, for reading vector data, which is what was then called the OGR part of GDAL, were written separately by Barry Rowlingson, and Barry also contributed parts for dealing with coordinate reference systems and projections and transformations. This was before sp, and was then adapted to sp, so that rgdal, looking at raster data, could read into a spatial object and could write from a spatial object to raster, and could read into a spatial object from a vector file and could write out from a spatial object to a vector file. So all of these things were in place fairly early on, but in bits, and it took a little longer, so that was complete by 2007 or 2008.
Completing this involved using the external libraries GDAL and PROJ, and we've managed to keep everything running more or less consistently since then. This was then using the sp package to define the classes and the rgdal package to handle data input and output in either raster or vector representations. The final part of the framework arrived thanks to Colin Rundel, who participated in a 2010 Google Summer of Code project, and this led to the rgeos package, which permitted us to do spatial vector data handling, so that we could do topological operations on vector data.
However, at this stage the warning bells were ringing, because the internal representation of the vector geometries used by GDAL and GEOS was what are known as simple features representations. Simple features is an international standard which began to become important during the end of the 2000s, so by about the time we were done with writing the first book, the choices we made in terms of vector representation, and to a certain extent also raster representation, were beginning to show. We'd made choices, several of which are documented in the first edition of the book, and they were already beginning to show that the choices we made were not the only ones possible. The first edition was completed in
2007 and published in 2008; some of it was modified immediately prior to publication. The second edition of the book then involved a certain amount of modification: about twenty to thirty percent of the book was changed between the first and second edition, which came out in 2013. The first significant changes were the addition of the spacetime package and indeed the addition of rgeos, but beyond that there were not very substantial changes. So we started to realize that spatial data was not just the end of the road, because we needed to deal with time as well. Those of you who have used geographical information systems will be aware that time is something that they don't do very well, and we followed in the same line of thought with regard to handling time.
What we have done, however, in the preface of the first book, is to include a figure which showed the dependency tree, and in June or July 2008 you could still print all of the names of the packages which depended on sp. So we've got sp here, and there are the other packages which depended on sp, and the ones which are grey are the ones which the authors of the book maintained. So we maintained most of the ones which were there. Packages which I also maintain, like splancs, you don't see; it's not an sp package, it uses its own representation. So some people had begun to adopt this. By the time we got to the 2013 book, and I'm not even sure if I can find the right figure, we couldn't fit the graph on to the page. I have copies.
In 2014 Andrie de Vries did a cluster analysis using page ranks of packages on CRAN. This is a rerun from last month. This is the fourth largest cluster on CRAN. It's not a very big cluster, and I'm not exposing here how big it is, but if you scale the figures by the page ranks of the packages, then it gets big. And in a poster at the useR! meeting in Aalborg in 2015, someone who is now a Microsoft employee had discovered that spatial was the third, fourth or fifth cluster in terms of use of R. So at that stage we realized that we'd got ourselves into more trouble than we expected.
We thought we were doing something which was to enable teaching. The first time I visited Edzer and gave a talk, when he was still in Utrecht, I think that must have been in 2004, the point of writing the software continued to be the same as it had been eight years earlier, six years earlier: you write software so that you can teach stuff. The idea was that you have a book on spatial data analysis and you can teach that with the software, so students can not only read the theoretical definitions of the methods but can try them out on different data and can see what happens if you change the variogram, if you're fitting a variogram by eye, or what happens if you insert different values into the core functions. So we were still thinking that people are going to use this for teaching, and now we've realised that it's much worse than that. If we break stuff, so you change something in sp
or change something in one of the key packages like rgdal or rgeos, then you get lots of attention very quickly on the mailing lists; people get in touch and tell you things. So this is the rationale for where we go from here.
So how do we carry on without breaking things? I think what we hadn't realized was that there was a considerable appetite for doing stuff with spatial data out there. This has also been influenced, particularly in the last five to six years, by increasing access to data. If you look at the behaviour of the national mapping agencies, five years ago it was quite difficult to download anything. You might be able to download a picture of a topographical map, but you probably wouldn't be able to download any detailed data.
Ten years ago you would find a number of people who were exercising in the mountains who would have their own GPS, but a GPS was a fairly clunky thing and its batteries ran out after about an hour. Has anybody used one like that? So if you were going for a long hike you took lots of batteries if you needed the GPS. You remember this, so this is not just me making it up, and that's just ten years ago. At the doctoral course here in 2006 there was a fisheries researcher who was working on lobsters, and he didn't yet have his lobster tracking data, so he was using a handheld GPS around the campus here to generate some pilot data for his project: how he thought lobsters moved. It turned out later on that they didn't move like that. He was using acoustic tracking and he set out his triangulation points; the lobsters had a reflector glued on, which he had to remove again afterwards, and they were moving around on the bottom of the sea. He set out his data collection points, and the first thing the lobsters did was move out of range.
So his imagination of how far the lobsters could move on a given night didn't have anything to base itself on; there were no observations to give him a good idea of what the home range of a lobster was. Then you go to a course in 2008, 2010 and 2012, and the amount of data that people have access to has exploded and become very, very large. Whereas in 2006, if you had a hundred and fifty points, it was a big data set, now a hundred and fifty thousand is small, really. So things have happened quite rapidly with regard to access to data.
The way the data are configured isn't changing: spatial data is position data in two or three dimensions. We've got attribute data, and we've got metadata, which is connected to the position data. You could call spatial data map data, or you could call it GIS data. As for the use of sp and similar packages, we're not clearly aware that it has been used on other planets, but we do know that it's been used with regard to microbiological data, so it's been used for very small things as well, just treating them as 2D or 3D. As I mention in the script, GPS, the American military system, only lost the noise added to the signal in civilian use in 2000; one of the last decisions of the Clinton presidency was to remove the noise component.
So where are we today? Just to give you a few pictures to begin with. We're in the sf package for vector data, not the sp
package; I'll be pointing to the sp package when we get to it, and I'll be explaining what the sf package can do in a moment. But this is also using OpenStreetMap, OSM; there are some three-letter acronyms we won't get away from. Has anybody used OpenStreetMap? Has anybody consciously viewed OpenStreetMap data? Open a newspaper website. In the articles that they're running online, if they have a map, they may have Google, but OpenStreetMap is free. They may be using an interface to OpenStreetMap, and sometimes if you look at the copyright line at the bottom it says "OpenStreetMap contributors".
OpenStreetMap started being visible in terms of its usefulness in a worldwide setting after the Haitian earthquake, because there were no extensive digital maps, and volunteers in the field were recording GPS data and uploading it. So suddenly, from not having proper maps, you had maps where you could identify where the different needs for aid were recorded. But OpenStreetMap is not a hundred percent reliable, and the code which you might like to look at here gives you an example of this, because it is one of the sources for the Bergen light rail system. Did any of you use the light rail on the way in from the airport? It's much cheaper than the airport bus, especially for people over sixty-seven, because I get a reduced price, so I've stopped using taxis to the airport; I can get there just as quickly with the light rail. It has been running since 2017, but the first part was coded as light rail and the second part was coded as a
tram. So in downloading this data, what I've said here with regard to OpenStreetMap is that this is the query that I'm going to generate, and I want to generate it for the bounding box just for Bergen, Norway; I don't want light rail from everywhere. What I want to do is to query the railway features with the value light rail, extract the lines as sf simple features lines, and put them in this object. Then I'm going to get the trams. Then I have to remove some of the tram entries, which are bogus. Then, because the two data sets have different sets of headers, I take the intersection of them so that I can merge the two data sets together, and here I'm saving them as an RDS object.
So here I'm defining the area of interest. First I had to do a little exploration to find out which values worked: you have to look at the table, see which values are present, and guess that they may be the right ones. Looking a little further, I found that some were light rail and some were tram, and some of the trams are actually the museum tram in the centre of town, which doesn't run. So these are the ones being removed here, because some enthusiast has been around and made an extremely detailed map of the museum tram tracks in the centre of town, which don't run. And here we have it. Spatial vector data is points: we've got points that make lines, and then we can construct larger, more complex objects from these. The light rail tracks are vector data.
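The merging step described above (drop the bogus entries, keep only the columns both tables share, stack them, save the result) can be sketched in base R. This is a minimal sketch on toy data frames rather than the actual OpenStreetMap downloads; the column names, the values, and the rule that bogus tram entries are the unnamed ones are all assumptions made for illustration.

```r
# Toy stand-ins for the two downloads (hypothetical columns and values)
light_rail <- data.frame(osm_id = c("a1", "a2"),
                         name = c("Bybanen", "Bybanen"),
                         railway = "light_rail",
                         electrified = "yes",
                         stringsAsFactors = FALSE)
tram <- data.frame(osm_id = c("b1", "b2", "b3"),
                   name = c("Bybanen", NA, NA),
                   railway = "tram",
                   gauge = "1435",
                   stringsAsFactors = FALSE)

# Drop the bogus tram entries; here we ASSUME they are the unnamed ones
tram_ok <- tram[!is.na(tram$name), ]

# The two tables have different headers, so keep only the shared columns
common <- intersect(names(light_rail), names(tram_ok))

# Stack the two tables now that their columns agree
tracks <- rbind(light_rail[, common], tram_ok[, common])

# Save as a single R object for later sessions
rds_file <- tempfile(fileext = ".rds")
saveRDS(tracks, rds_file)
```

With real downloads the same `intersect()` and `rbind()` pattern should apply to sf objects as well, since they are data frames with a geometry column, but that has not been checked here.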
The points are stored as double precision floating point, and were downloaded from OpenStreetMap, from the cloud. And this is then where we are. What else has happened to the way that we handle spatial data? Both the tmap package and mapview provide interactive mapping; they provide it through leaflet, another package, which uses Leaflet JS, a JavaScript library, so there is layer upon layer, one above another. This means that instead of choosing this background we could choose this one, or we could choose an OpenStreetMap background, which takes a little longer to load, and we can of course zoom and pan and so on. So we can visit ourselves here. This is a standard overlay of the kind that you're used to from web maps.
The mapview package began, probably, in 2014, by Tim Appelhans. There's a series of seminars now called the OpenGeoHub seminars; the 2014 one was held in these two rooms here and was also streamed. At that stage we hadn't really realized that this was going to happen; I noticed that the package had been made available on CRAN, and e-mailed him and said: well, we're going to have the next seminar in Lancaster in 2015, could you drop by? So he dropped by, and he's a good community player and contributes a lot. It's really satisfying, perhaps that's the right word, to see a generation of people who are thirty or thirty-five years younger than I am, like Jakub Nowosad, like Robin Lovelace, like Tim, and lots of others, and Martijn Tennekes with tmap. So there are lots of other people contributing things, but they all build on an infrastructure which we have to maintain. So mapview is important, and it's also based on what we didn't have until very recently, which was access to the tiles that sit behind the interactive web maps. We can also take another example, and it's based on a package
which is by Robin Lovelace, stplanr, so this is for transport planning. As for what's going on here, if you want to replicate it, it's okay to do that, but I have blanked out my individual API key here, which is needed for the conversion from desire lines, the from and to in the transport data, to routing on the particular route network. And this isn't done properly; it's just a
picture. But again, what we're doing here is downloading the complete set of monthly comma-separated value files from the city bike system in Bergen. They've been downloaded and placed in a folder called bbs. You can download the same ones if you like; I don't think I made them available, I can't recall whether I made them available. So then I need to read in the trips, massage the trips and sum them. Quite a lot of the initial work is finding out which stations the trips go from and to; obviously the city bikes are taken from and handed in to the same hubs. And then some of them are moved: we don't have the data on movements of bikes where they have accumulated at one city centre hub and need to be moved back to places with no bikes. We don't have that data, but we know where the stations are.
We also know that one of the stations was actually in Oslo, because that was where the bikes were primed, so you get spurious movements across the whole of southern Norway, and those were not actually cycled; the bikes were just being moved. Actually, I'm fairly certain that these are the actual cycled trips and not trips made when bikes were moved by the trucks. So what we're doing here is summing the counts between each potential pair of stations. You've got the origin station and the destination station; this is the OD object, and the OD object is simply a table of the counts from and to for each pair of stations for which they exist, subtracting the ones where the bike was taken from and returned to the same station, because obviously then there's no desire line.
So using stplanr we want to create the OD lines, given the stations, where we know the geographical coordinates of the stations, and we know the flows, that is, the number of trips from which station to which station. Here we've got a table of about a hundred-something stations, and it's not a hundred by a hundred, so we haven't got ten thousand desire lines, because some of them would have been zero and some of them were from and to the same station, down the principal diagonal. So we have the lines, this one, which we could again zoom into. And here an alpha channel is used: the alpha channel and the width of the lines are used to indicate the size of the flow.
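The origin-destination table described above can be sketched in base R: count trips per pair of stations, drop the trips that start and end at the same station, and attach the station coordinates to get the end points of each desire line. This is only a sketch on made-up data; the column names (`start_station_id`, `end_station_id`, `lon`, `lat`) and all values are hypothetical. stplanr offers its own helper for turning such a table into line geometries, and the sketch deliberately does not reproduce it.

```r
# Toy trip records (hypothetical column names and values)
trips <- data.frame(
  start_station_id = c(1, 1, 2, 3, 1, 2, 3),
  end_station_id   = c(2, 2, 1, 3, 3, 1, 1)
)

# Toy station locations (hypothetical coordinates)
stations <- data.frame(station_id = 1:3,
                       lon = c(5.32, 5.33, 5.35),
                       lat = c(60.39, 60.38, 60.37))

# Trips returned to the station they started from have no desire line
trips <- trips[trips$start_station_id != trips$end_station_id, ]

# Count trips for each origin-destination pair that actually occurs
trips$n <- 1L
od <- aggregate(n ~ start_station_id + end_station_id, data = trips, FUN = sum)

# Attach the coordinates of both ends: one straight desire line per row
from <- stations[match(od$start_station_id, stations$station_id), c("lon", "lat")]
to   <- stations[match(od$end_station_id, stations$station_id), c("lon", "lat")]
desire <- data.frame(od,
                     from_lon = from$lon, from_lat = from$lat,
                     to_lon = to$lon, to_lat = to$lat)
```

Only pairs with at least one trip appear in `od`, which is why the real table has far fewer rows than the number of stations squared.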
And the closest hub here is actually a bit closer to the city than this is; you could walk, and it takes about fifteen minutes to get to the first one, fifteen or twenty, something like that. However, the stplanr package also provides a function called line2route, for which you need to have signed up to get an API key from CycleStreets, which is a UK website. Usually you use it like this: I have a bike, I'm standing here, I want to get to there, give me a route. But here we're giving them a subset of the ten thousand routes, and that takes a little longer, so this was pre-generated. You need to apply in advance to get the API key. Once you've got the API key you can go; if you were using it just once, from an application or something like that, then you'd be using the API key of the owner of the application, who will have developed a free application.
So here we get to something like this if we aggregate all of these cycle trips, assuming that the hub the bikes were handed back to was logical. On a sunny day in Bergen (there are sunny days in the summer and even in the spring) people will say "I just want to cycle around", so there is no real desire line; they are just cycling from somewhere, round in circles, and leaving the bike somewhere else. If you like that, that's okay. Most of this is fairly central in town.
You do begin now to see a certain density out around here. There are certain things going on here, but again partly assuming that we picked up the effective movements; still, we are starting to get the density of cycle movements. So those are the kinds of things which are going on.
The files actually went through to halfway through November; I could have gone further. And I didn't get October, which I should have got. So the size of the CSV files is about thirty megabytes. These are data sizes which are just not real compared to where we were twenty years ago: if you had thirty megabytes on your hard disk, that was a lot, and a two megabyte object in R in 1997 was too much. Even though you could do some quite hairy regressions and lots of modelling, the data sizes were much smaller than the ones we're familiar with now, where the data has been made available openly. So access to data is much greater, but it's fairly heterogeneous. You've seen in both of the examples that I've given you (and if you are looking at this for the first time, maybe some of you find that this looks scary) that what I'm having to do is a lot of data cleaning to get something which is even representable, because the data is provided by the data providers in ways which they feel are appropriate, and which for their purposes almost certainly are appropriate. They may not have thought a great deal beyond the data that they need internally, and those are the variables they need internally. Good.
But that leaves us with problems of advancing from the sp representation. So if we take the object that we have here, the light rail system: in some ways sp is something of a time warp. Seventeen years ago it made a great deal of sense to use the formal class system for representing spatial data. Many other implementation projects at the time used the same representation, Bioconductor in particular, which is an off-CRAN archive network with curated packages;
it's a very solid bioinformatics resource. They also choose to use, by and large, S4 classes, the formal classes; you define them ahead of time. And we can get from the sf representation, which we'll talk about in a moment, to sp by coercion: here we're coercing from the sf object to an sp object, and then we can look at the formal class, in which we can see that we have an object with four slots. It has a data slot; a lines slot, which contains the geometries; a bounding box slot; and a proj4string slot, which contains the coordinate reference system, which is also a formal class.
Then we can look at the way in which the geometries are represented. If we just start with the first one of these lines, so we're taking the lines slot and then looking at the first element of that list, we can see that this is a formal class Lines containing a formal class Line, and within the Line we've got a matrix of coordinates inside. But we know ahead of time that the coordinates are floating point. If you were moving between R and C code and you had integers as coordinates, then you would get a mess unless you checked, but in a formal class system you don't have to check, because the class would be invalid if somebody tried to insert integers as coordinates. They wouldn't be allowed to do so, so we would know ahead of time a lot about the way the data was structured. So the idea of having a formal representation was that we save time in interfacing with compiled languages outside R.
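The point about formal classes rejecting invalid content can be illustrated with a toy S4 class. This is a sketch of the idea only: `ToyLine` is a hypothetical class written for this example, not the definition of the Line class in sp, and its validity rules are assumptions chosen to mirror the argument above.

```r
library(methods)

# A toy formal class: one slot holding a coordinate matrix
setClass("ToyLine",
         representation(coords = "matrix"),
         validity = function(object) {
           # Integer coordinates are refused, so compiled code can rely on doubles
           if (!is.double(object@coords))
             return("coords must be double precision")
           if (ncol(object@coords) != 2L)
             return("coords must have two columns")
           TRUE
         })

# A valid object: a two-column matrix of doubles
ok <- new("ToyLine", coords = cbind(c(5.32, 5.33), c(60.39, 60.38)))

# The slots are known ahead of time
slots_found <- slotNames(ok)

# An invalid object: integer coordinates are caught when the object is created
bad <- tryCatch(new("ToyLine", coords = matrix(1:4, ncol = 2)),
                error = function(e) conditionMessage(e))
```

Because the check runs when the object is created, code that later passes `ok@coords` to a compiled routine does not need to test the storage mode again.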
Spatial raster data is in contrast to vector data, which is observed points; from the points you construct lines, and if you need to construct polygons you construct those from the lines: which lines, in which directions, make up a ring to construct a
polygon. With raster data, which so far we haven't got to, we could for example use the elevatr package. When we were working with sp, and even at the time of the second edition of the book, let's say 2012 or 2013, if you wanted satellite elevation data you had to download tiles of elevation data. You had to identify where you were going to get them from; many of them were available, sometimes with a login, so if you went to USGS you might have a login, and then they would send you an e-mail when the files you had requested had been packaged and were available for download. So this would be the typical system: you would go, possibly, to a web interface, you would choose the tiles that you wanted, and you would request them. Generally you would need to log in or have some kind of e-mail interaction: you would give your e-mail address, you'd be sent a challenge to reply whether you were a bot or a person, you'd reply that you were a person, and then you'd be sent a link from which you could download the data.
But this is now on AWS, with elevation data at different levels of resolution. In the elevatr package they do document the provenance of the data, when it was observed and what the quality of the data is; this is something one needs to keep an eye on when working with online data sources. In this case we're using the coerced sp version of the light rail tracks, so you get a bounding box around the light rail tracks, and that gets pushed out to the server, which says: okay, this is the area you want, this is the zoom level you want, and off we go.
And it will then go to the cloud and say: okay, this is what we need, this is what we pull in. When this is read in, it is read in as a RasterLayer from the raster package. The raster package builds on the raster representation in sp and also uses formal classes. The raster package was written around 2008, 2009 and 2010, and also uses S4 classes, the same as in sp. We did write in the second edition of our book that we really hoped that the raster book would come out soon. It still hasn't come out; it would be really useful. Robert Hijmans, who wrote the raster package, is very busy, and has done an awful lot of work in modernizing the package. You have to be able to do the other things you do, and he works on crops, so he has worked in the Philippines, worked on potatoes, worked on rice and things like that, so he is a working field scientist, and writing a book in addition is something that just hasn't happened yet.
But this is a formal class; we can look at its representation as a SpatialGridDataFrame, the sp representation with the data frame. Once again you see the data, but there aren't lines, as there were with the light rail; there's a grid, which defines the geometry. In this case the grid is quite simple, because it's just saying what the south-west grid cell centre point coordinates are, how many grid cells there are in each direction, and what the step, the cell size, is. So we've got the data, the data frame with the observations of the elevation, and the grid defining the geometry, the bounding box and the proj4string. And we can look at the slots; for the proj4string, we come back to this on Tuesday.
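The three pieces of information that define such a grid (the centre of the south-west cell, the cell size, and the number of cells in each direction) are enough to recover every cell centre and the bounding box. The base R sketch below shows the arithmetic; the numbers are hypothetical and the variable names are chosen for illustration rather than taken from any package.

```r
# Hypothetical grid definition
cellcentre_offset <- c(x = 297500, y = 6690500)  # centre of the south-west cell
cellsize          <- c(x = 100, y = 100)          # step in each direction
cells_dim         <- c(x = 4L, y = 3L)            # number of cells in each direction

# Cell-centre coordinates along each axis
xs <- cellcentre_offset[["x"]] + (seq_len(cells_dim[["x"]]) - 1) * cellsize[["x"]]
ys <- cellcentre_offset[["y"]] + (seq_len(cells_dim[["y"]]) - 1) * cellsize[["y"]]

# The bounding box extends half a cell beyond the outermost centres
bbox <- rbind(
  x = c(min = min(xs) - cellsize[["x"]] / 2, max = max(xs) + cellsize[["x"]] / 2),
  y = c(min = min(ys) - cellsize[["y"]] / 2, max = max(ys) + cellsize[["y"]] / 2)
)

# One row per cell; the attribute values would sit alongside in a data frame
centres <- expand.grid(x = xs, y = rev(ys))
```

Nothing else needs to be stored for a regular grid, which is why the raster geometry is so much more compact than the coordinate matrices needed for lines.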
I should call your attention to the note here: not updated for PROJ greater than or equal to 6. And we can display the object here. I probably should have changed the colour representation: here I was using topo colours and probably should have used terrain colours, so you're getting blue where it should be green or dark green. But you can see that, once again, we've got the data. As for the initial warnings, before we get to the warnings about PROJ, of which there are lots: what the first one was saying was that it's quite difficult to represent as much data as this; if you really want to show all of the pixels, then tell me to do that, but otherwise it will decimate the number of pixels being displayed.
One of the consequences of these PROJ warnings, and we'll be looking at them tomorrow morning, is that the data might perhaps have been offset in relation to where they should be in register on the web map. Yes?
Yes, with qualifications: there is a dggridR package which provides not only for hexagons but for a mixture of hexagons and pentagons to give complete global coverage. However, the current status with regard to whether you can treat those as a vector object or a raster object is unclear, for obvious reasons. The move from raster is most often to raster arrays, because they may have four dimensions: the x and y dimensions, the time dimension and the attribute dimension, so you could be measuring using different instruments on the satellite. We're not there yet, but dggridR is somewhere to look. Could we get back to that after we turn off the streaming at eleven o'clock? Then I can change my screen and look for the package, or you can look for the package yourselves. But there are a number of possibilities like that.
Okay, so the raster package has been widely adopted and is a fairly robust way of representing data. What raster does in particular is to say that, okay, the GDAL library, and rgdal, has
an opportunity: it offers the opportunity of reading not the whole raster but chunks of a raster, so you can decide which columns and rows of the raster you want to read, and you don't have to read the whole raster at the same time. So one of the things that raster permitted was to iterate across a large raster, to generate results from a larger raster which in 2008 you couldn't get into memory, because your memory was much smaller. Then you were looking at thirty-two bit systems, and you weren't really handling memory above a couple of gigabytes. This machine has sixteen, but sixteen is something I've only had recently; four gigabytes was much more typical, or two gigabytes or one gigabyte of memory, and being able to handle a big raster ten years ago was really hard. So that facility in raster was important, and it used rgdal to do that.
One of the things which has been absolutely crucial has been help from the CRAN administrators, in particular Professor Brian Ripley at the University of Oxford, who from very early on compiled all of the external dependencies for Windows and OS X himself on his own machine and made them available. So before the Windows binary version of rgdal was available on CRAN, he was providing it from his own server in Oxford, and this was extremely useful. In the sense of growing the use of the packages it was absolutely key, because it meant that lots and lots of students were using the stuff because they could install it, and they were installing it from Oxford rather than from CRAN, which hadn't then developed the capacity to do this. His sympathetic support continues to be very important, so that it is
the rule rather than the exception that if something is going to go wrong, then Brian Ripley will find it before we do. This is not just for spatial; he covers everybody's backs, like finding out that the Fortran compiler in Fedora 30 was more strict about standards than any previous version, which led to things falling apart for everybody. That may be a minority interest, but when you use R you can be sure that the fact that it's running properly is to quite a large extent down to Brian Ripley, from things like memory management on Windows, parts of which he wrote or modified himself, going back an awful long way.
Obviously there was a limited set of vector and raster drivers; some of them were not available, and others have been made available as time goes on. So when we were contacted by the Sentinel team at the Joint Research Centre, the European Union research institution, to add the JPEG 2000 driver to the Windows and OS X binaries, we found out how to do that using OpenJPEG. So we have, if you like, full and willing support, because there was interaction between the data providers and the people at CRAN who could help us with the libraries we needed to permit rgdal to handle all these kinds of things.
Okay, so questions arising. I've already mentioned the rgeos package. The rgeos package was fairly consistent in its use of simple features. The idea of simple features was that you define the hierarchy of classes theoretically, and it was a good idea if software implemented that hierarchy of classes and not some other hierarchy of classes. The hierarchy of classes which we implemented for vector data in sp was more based on the
then most used vector format, which is the shapefile, and the shapefile does not distinguish adequately between an internal ring and an external ring. A polygon will have an exterior ring, and if there's a hole in the polygon, that is an interior ring. The difference between the exterior ring and the interior ring in a shapefile is that they go in different directions: the coordinates go clockwise or anticlockwise to define whether they are exterior or interior. However, in simple features each polygon can only have one exterior ring. In a shapefile you can have an object which calls itself a polygon and which has got multiple exterior rings, like a collection of islands, but in simple features, in sf, you have to call this a multipolygon; you can't call it a polygon.
Our system was inconsistent in this way, so we could have messes. The first one was drawn to my attention by my brother in 2004, when he was trying to plot labour market data for Sheffield and found some enumeration districts disappearing. It turned out that we were plotting the enumeration districts in order by number, so A, B, C, D, and to get round that you needed to plot the biggest one first and then successively smaller ones, which might otherwise be overplotted by the big one afterwards. So there was a lot of mess caused by not using simple features. If we'd used simple features from the beginning, which we couldn't because it hadn't been defined, it hadn't been standardized, then everything would have been a lot simpler, but that simply wasn't available. So we need vector standards compliance.
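The ring-direction rule mentioned above can be checked with the signed area of a ring (the shoelace formula): the sign tells you whether the coordinates run clockwise or anticlockwise. The sketch below is base R on made-up rings; the function names are hypothetical, and it assumes the usual shapefile convention that exterior rings run clockwise and holes anticlockwise.

```r
# Signed area of a closed ring (first and last rows identical):
# negative for clockwise rings, positive for anticlockwise rings
signed_area <- function(ring) {
  x <- ring[, 1]
  y <- ring[, 2]
  n <- nrow(ring)
  sum(x[-n] * y[-1] - x[-1] * y[-n]) / 2
}

is_clockwise <- function(ring) signed_area(ring) < 0

# A square run clockwise (an exterior ring under the assumed convention)
exterior <- rbind(c(0, 0), c(0, 10), c(10, 10), c(10, 0), c(0, 0))

# A smaller square run anticlockwise (a hole under the assumed convention)
hole <- rbind(c(2, 2), c(4, 2), c(4, 4), c(2, 4), c(2, 2))

rings <- list(exterior = exterior, hole = hole)
areas <- sapply(rings, signed_area)

# Plotting order that avoids overplotting: largest ring first
plot_order <- names(sort(abs(areas), decreasing = TRUE))
```

Direction alone says that a ring is a hole, not which exterior ring it belongs to, which is the gap the simple features hierarchy closes by nesting interior rings inside exactly one polygon.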
JTS and GEOS (JTS is the Java original of GEOS) require simple feature compliant geometries, and to do this we had to create a crutch for sp Polygons objects to define which of the component rings were exterior and which were interior rings, and, if they were interior rings, to which of the exterior rings they belonged. So it was a mess. Spatio-temporal data also appeared.
in each should is essentially be obvious that all spatial data especially a temple anyway maybe you just have one observation for each point but but but each point is observed as a particular point in time we get back to the east more morning and it with the new with the. the him. the with the yum. the examination of corner from systems the indeed no joe did just that exists would like each observed point to be given the time stump when was this point observed. and is one of the city says to really enjoys working on iceland he says. you go back a week later and it's moved. because the earthquakes their tectonic movements is so that the landscape is his his his dancing and he says this is for a job that is is this is really fun so that if you're observing a g.p.s. point we need that time stump as well. it's not good enough just to have the the the position. and g.p.'s the observations come with the times time because time is what drives g.p.s.. so. so we realize that this the the setting was was was inadequate the original publication of the ice so which at that stage was closed was the wasn't access to it was in two thousand and four and work on international standards preceded from then on so that there's a paper by an article by your career he does.
There is also a longer work about these [author unclear], so we need to go back to simple features.
In terms of my presentation, I'm now somewhat behind my schedule, so I will break at eleven o'clock and carry on at one fifteen with whatever I haven't completed from the first section, because these foundations are quite important. How many of you are familiar with data frames? I'll give you some basic background on data frame objects in R. I assume that we will have completed this little bit first; simple features in R is where we're going to get to.
It sort of began afterwards. Edzer Pebesma and I were through with a special issue of the Journal of Statistical Software in 2015, so we had finished the book and then the special issue.
Then we took a shallow breath and asked what was going to hit us next, and yes, we needed to revisit the vectors, so we would look at doing simple features. Support was offered from 2016 by the newly started R Consortium.
And the key breakthrough was Hadley Wickham. Hadley Wickham was a programmer working at RStudio, previously working on ggplot and ggplot2, and he had declared that data needed to be tidy and data frames needed to be tidy. He had also said that list columns are not tidy. However, at the 2016 useR! conference at Stanford, Edzer and I were sitting at the side of the room, as we usually did, and suddenly we started listening to the plenary. Hadley had just declared that list columns were tidy. So it is OK. He didn't explain why, but he said: we've been trying in ggplot2 to draw maps, and we have problems with exactly this. So how do you create a tidy data frame where two of the variables are the x and y coordinates of the lines you want to draw, or the containers for fill colours? The way they had done it previously was simply by having a list of vectors of the x coordinates and vectors of the y coordinates, and whether the pen was supposed to move from x1, y1 to x2, y2 or to lift, so that you jump over to the next lot, which is the way it had been done in S-PLUS. But they were losing holes from the middle of the polygons when they were filling them, and there was no way to sort this out. He had come round to the point of view that you needed a richer data structure to handle the geometries. So list columns are tidy. So what is a data frame object? A data frame object in R, and elsewhere there is the same kind of structure, is a list object.
What's a list object? The list is one of the things behind much of the success of modern programming languages, or what would probably now be called standard programming languages. If I go back to when I started programming, which was in Fortran, there weren't structures like these: a list which you can grow, and where one element may point to the next, so you've got something which is not structured as a vector. Well, actually a list is a vector. In a regular vector all of the elements have to be of the same kind, but in a list a component can be another list, or it could be an integer, or a character string, or a floating point number. So a list is a very flexible tool. If you look at the output of fitting a regression in R, what is it? Of course it's a list. It has a class, but it's a list. So lists are prevalent; there are lots of them. Vectors are fairly simple, and I can give some references if people need them. Lists can be manipulated with single square brackets, and you can get at what's inside with double square brackets. So here we start with vectors of four different types: V1 is the integers one to three; V2 is the letters one to three, so a, b, c; V3 is the square root of V1, which is going to be a floating point number; and V4 is the square root of the negative of V1 as complex, which gives us a complex vector. We can make these into a list, and if we look at the structure, str is a function for showing structure, we can see what's inside: v1 is one to three and v2 is a, b, c.
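A sketch of the list being described, reconstructed from the spoken description rather than copied from the course script, so the object names V1 to V4, L and fit are assumptions. The regression uses the cars data set that ships with R, purely as an illustration.

```r
# Four vectors of different types
V1 <- 1:3                    # integer
V2 <- letters[1:3]           # character: "a" "b" "c"
V3 <- sqrt(V1)               # numeric (floating point)
V4 <- sqrt(as.complex(-V1))  # complex

# A list can hold components of different types
L <- list(v1 = V1, v2 = V2, v3 = V3, v4 = V4)
str(L)

# Single square brackets return a (shorter) list,
# double square brackets return the component itself
class(L["v3"])   # "list"
class(L[["v3"]]) # "numeric"

# A fitted regression is a list with a class attribute
fit <- lm(dist ~ speed, data = cars)
class(fit)   # "lm"
is.list(fit) # TRUE
```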
v3 is 1, 1.41, 1.73, and we can see the same in the complex vector. We can access the components either using double square brackets or using the dollar sign and the name of the list element. So we can wander around this list. But the list is the template for creating a data frame. How is a data frame different from any list? The only serious difference is that the list components have to be of the same length. The data frame is thought of as being a rectangular container for data, but it is a list. We can create it by using as.data.frame, coercing to a data frame, and here we see what happens to the classes of the objects. Where in the list we had integer, character, numeric, complex, what we've got here is integer, factor, numeric, complex. stringsAsFactors = TRUE is the default in R, as of the R 3.6.1 I am running here, and has been since S, because statisticians needed to treat categorical variables as statistically important variables which should be handled properly, and not simply as text. Treating text as categorical variables goes back to the beginning of S and has been inherited, so by default, if you read in or convert another object to a data frame, R will say: OK, you want me to handle this character string data, so I'm going to convert it to a factor. A factor is something like a hash table, which makes it easy to build dummy variables, for instance, in a model. It's possible to set the argument stringsAsFactors to FALSE, so not taking the default, in which case we get the representation we had before, but we now have a data frame rather than a list. The data frame can also be handled in other ways.
We could also extend two of these components and try to create a data frame. As a list this is quite fine with components of different lengths, but when we try to convert it to a data frame there is an error, and the error is: arguments imply differing number of rows. So it will stop us doing this. We now know the difference between a list and a data frame: the data frame is a list with components of the same length. We can access the elements in the same way that we could with the list, but we can also access them as though the data frame were a matrix. It is not a matrix, because it's a list, but a data frame is really a rectangular object, so we can access things by treating them as though they were elements of a matrix. There's a further point here, which is concerned with drop = FALSE or drop = TRUE. drop = TRUE has been the default since early S, so for a very, very long time. That is, if you subset a matrix, a data frame or an array of two or more dimensions, unused dimensions are dropped: if you have a three-dimensional array and subset it so that you're taking just one slice, it goes down to two dimensions; if you take just one vector it goes down to one dimension. That's drop = TRUE, which is the default. So if we ask what the structure is of this subset, which is looking at one element, the understanding is that we just want that single element, so drop is TRUE by default. If we set drop to FALSE we get a data frame with one row and one column. We could try to coerce the data frame to a matrix, but in this case we have a character variable, factor or character, in the data frame, so it will coerce all of the values to character, and this is a character matrix.
If we just take the two columns of the data frame which are numeric, the integer one and the numeric one, which was the third, leaving out the complex one, we get a numeric matrix. Obviously the length of the list L was four; we put four things into it, so its length is four. So what's the length of the data frame that we created from L? It's four, because it's a list of four components, the columns.
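The steps just described can be sketched as follows. This is a reconstruction under assumptions, not the course script: the names L, La, DF, DF_f and res are hypothetical. stringsAsFactors is passed explicitly in both calls because its default changed from TRUE to FALSE in R 4.0.0, so relying on the default gives different results in different R versions.

```r
L <- list(v1 = 1:3, v2 = letters[1:3], v3 = sqrt(1:3),
          v4 = sqrt(as.complex(-(1:3))))

# Coerce the list to a data frame, with and without factor conversion
DF_f <- as.data.frame(L, stringsAsFactors = TRUE)
DF   <- as.data.frame(L, stringsAsFactors = FALSE)
sapply(DF_f, class) # v2 is "factor"
sapply(DF, class)   # v2 stays "character"

# Components of differing lengths are fine in a list,
# but cannot be coerced to a data frame
La <- list(v1 = 1:3, v2 = letters[1:4])
res <- try(as.data.frame(La), silent = TRUE)
inherits(res, "try-error") # TRUE

# List-style and matrix-style access reach the same element
DF$v3[2]
DF[["v3"]][2]
DF[2, 3]
DF[2, "v3"]

# drop = TRUE (the default) returns the bare value,
# drop = FALSE keeps a data frame with one row and one column
str(DF[2, 3])
str(DF[2, 3, drop = FALSE])

# Coercion to a matrix: a character column forces a character matrix,
# the integer and numeric columns alone give a numeric matrix
is.character(as.matrix(DF))          # TRUE
is.numeric(as.matrix(DF[, c(1, 3)])) # TRUE

# Both are lists of four components
length(L)  # 4
length(DF) # 4
```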
What's the length of the data frame when we turned it into a matrix? It's twelve, because it's three times four: we've got four columns and three rows. Why should the matrix have a length? Because it's a vector. So the answer, for those who don't enjoy this level of abstraction, is that a matrix is a vector with a dim attribute of length two, and an array is a vector with a dim attribute of length two or more. This goes back to S, so it goes back a long way. Part of the reason is that the data are organised to make it easy to move the object from the S side to the compiled side. Fortran uses what is known as column-major representation of matrices, and the representation here is also column major; C uses row major, but that's even more abstract. So when you ask what the length of something is, you need to ask: what is the underlying representation of the data here? If the underlying representation is a list, it will be the length of the list. If we ask what the dim of the list is, it's NULL; it doesn't have a dim attribute. If we ask what the dim of the data frame is, it has one, but it doesn't need to.
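A short sketch of the point about length and dim, again with assumed object names (DF, M, v, a). It shows that a data frame reports a dim without storing one, and that a matrix is just a vector with a dim attribute filled column by column.

```r
DF <- data.frame(v1 = 1:3, v2 = letters[1:3], v3 = sqrt(1:3),
                 v4 = sqrt(as.complex(-(1:3))), stringsAsFactors = FALSE)
M <- as.matrix(DF)

length(DF) # 4: a list of four columns
length(M)  # 12: a vector of 3 * 4 elements

dim(list(1, 2))                   # NULL: a plain list has no dim
dim(DF)                           # 3 4, computed from row names and columns
attr(DF, "dim", exact = TRUE)     # NULL: no dim attribute is stored
attr(M, "dim")                    # 3 4: the matrix does store it

# Giving a vector a dim attribute turns it into a matrix,
# filled in column-major order
v <- 1:6
dim(v) <- c(3, 2)
is.matrix(v) # TRUE
v[, 2]       # 4 5 6

# Subsetting an array drops dimensions of extent one by default
a <- array(1:24, dim = c(2, 3, 4))
dim(a[, , 1]) # 2 3: one slice is a matrix
dim(a[1, 2, ]) # NULL: one vector
```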
It sort of pretends to have a dim attribute. The matrix has to have a dim attribute, otherwise it's a vector. So if we look at the coercion of the data frame into a matrix, we see that it's a character matrix of three rows and four columns, and it has a dimnames attribute. The row names of the data frame have also been modified as time has gone on. Originally all data frames had fully enumerated row names. If you had one to a thousand it didn't really matter; they took up a bit of space, but not very much. But if you've got one to a hundred million, it takes a bit of space and is perhaps irrelevant. So about ten years ago R decided that if the row names are just the integers from one to n, then we store a marker saying: generate them on the fly if you need to. Here you can change the names of the data frame. We can look at the attributes of the data frame: we have now changed the names to capital A, B, C and so on, it has a class, and it has row names one, two, three.
If we look at the matrix, we see a list of two attributes. The reason why they were not both shown before is that str sees that this is a matrix, so it encodes the information in the dim attribute in the description, and then only displays the other attribute which is present. But if we ask for the attributes of this object, then we can see both of them, the dim attribute and the dimnames attribute.
One of the possibilities is to address differing lengths by inserting missing values; missing values are NA, not available. I mention this because it's important.
I'll mention list columns now, and then we'll make a fresh start with sf. So what is a list column? Here what we're doing is adding an extra component to our data frame, an extra column, which is a list, and this list contains one floating point number, one character and one logical value. If we look at the structure of our data frame, we've got the data frame we had before, the same things are there, and we've got a list. Now, putting this into a regression and saying, OK, we want to regress A on E, is going to lead to mayhem, so don't do it. But list columns are valid; they've been valid since forever. It's not immediately obvious how to write them out to a comma-separated values file or a spreadsheet. In some settings that should be OK, in others it might not, but how do you know what formatting constraints to put on column E? You don't really. So there are things about list columns which are iffy, but list columns are completely legal, and that's where we go when we go to sf. As I said, at the useR! meeting, in a plenary, Hadley Wickham said that list columns are tidy, so from there on, off we go. I'll stop the streaming now.
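The last three points (attributes, padding with NA, and list columns) can be sketched together. The object names here are assumptions made for illustration, and a two-column data frame is used to keep the output short. .row_names_info() is a base R helper that returns a negative count when the row names are the compact automatic kind.

```r
DF <- data.frame(v1 = 1:3, v2 = letters[1:3], stringsAsFactors = FALSE)
names(DF) <- LETTERS[1:2]  # rename the columns to "A" and "B"

# A data frame stores names, class and row.names, but no dim
str(attributes(DF))
.row_names_info(DF) < 0    # TRUE: automatic row names, stored compactly

# The matrix stores dim and dimnames; str() folds dim into its
# header line, attributes() shows both
M <- as.matrix(DF)
str(M)
names(attributes(M))       # "dim" "dimnames"

# Differing lengths can be reconciled by padding with NA
V1a <- c(1:3, NA)
La <- list(v1 = V1a, v2 = letters[1:4], v3 = sqrt(V1a))
DFa <- as.data.frame(La, stringsAsFactors = FALSE)
nrow(DFa)                  # 4

# A list column: one component per row, each of a different type
DF$E <- list(d = 1, e = "1", f = TRUE)
str(DF)
is.list(DF$E)              # TRUE
sapply(DF$E, class)
```

The list column E is the same mechanism that sf relies on for its geometry column: each row holds an arbitrary R object rather than a single atomic value.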


