Bestand wählen

GeoCouch: A distributed multidimensional index

Zitierlink des Filmsegments
Embed Code

Automatisierte Medienanalyse

Erkannte Entitäten
fluoride um welcome everyone I think it's time to get started so it just wait until all people moved in and so this to becoming cast so all right so so uh what come to my talk about two year colleges distributed next them well good about me I'm for occur I most occurred in Erlang telescript rust Python I just love open source
I you coach full-time and other whistles produce of example chairs lens or met Crary pool of you have heard of some that carry before um I've said news so still not as an I credited them started at the false which in 2009 in Sydney and the root bed decided to retire the project so from today on its an abandoned project and I will send all also in the mid to the mating in this about alright but and let's get started with neutral talk I'm delighted will know what
what what would you use them dimensional index for about 3 years ago someone approached me and said he has accepted and light imagery from the European Space Agency and for him it was really about only storing the metadata about and not the actual images but it was already quite big it worked well with this post as database but he knew that with the new flu donjons in the coming years that he would get into trouble and he really looked into like he said to me in this be fast and asking me what do you mean with fast because that depends on the database what is fast and he said the crew should return in less than 60 seconds um it was surprising to me that it he got it that's smokers but basically so satellite imagery uh um is more exciting than I thought so of course you have 2 dimensions so that 2 dimensions are the area the images but then we have of the dimension which is time and the interesting thing about In this case of all the time is is not a point in time but its entire range because the 2nd guy takes the image and as it flies it has a start and end at that time but you will see this uh and they draw then you might expand it to more dimensions for example you wanted the diver fictions so set you guys can either fly ascending or descending our from pure and only tried this so might only care about us innings Sytos by Dimitri people all and finally the trick in number no if I was on certain tracks and you what might want to have a the 1 run from 1 a set in the light it is so this of a college based those who have been to previous talks from year that the force would use um I often really talked about most about what called phase of Scotch TB pitching coach be what's the difference between them and so on but this year I was confident culture phase the reason is that the distributed hard for due doesn't get work with Apache coach to be but I certainly plan that do this for a catchy cultures well because the picture petticoat you 2 . 0 will also have a distributed version of it so I retention committed for this as well um college phase we quick what is called a document database but to more of be more specific I call it distributed in-memory key-value store with persistence and indexing that's minibus words but what it boils down is to distribute it so you have fit the data are spread on several machines in memory means if you use college basis should have a lot of RAM and it's really good for no money for ideal cases you truly have your whole dataset in memory you on your Service you don't have to because there's also persistence um which is for 1 thing is if a cold can't hold your data set in in memory both fetch the stuff from this the by and of course also if server crashes and up again you want to have the data you uh good from this but we will mostly concentrate on the indexing in this talk obviously them 1 thing we I think is really finds is the usable power relations so you have a nice web interface and if a cluster and all you decide OK it's only have so many more people accessing my side I want to scale up you just add a few servers click on a few buttons click on OK and then it also distributes everything and then after the peak is still carries the servers again click 1 Batman out so this is basically what I think it's a big strength of of culture but let's get back to the indexing in then upcoming version of college this culture 3 . 0 we changed a lot of things in the background so previously the indexing happen from the phys understood it added the persisted and with the new words and we get it for TCP which means you can itemset that just got stored in memory can already be indexed it also opens the door for a bright future because the the currently the indexes are on the same nodes as the data this problem is that it is hashed efficient which means for the indexing when every query you cluster you need to ask every single node to get a response back and with the city there was up for in future versions we could make the indexing on separate machines so can them scale-out in Canada the your data and Unix but that's really in the future so I I guess support and won't happen soon but this for the future with eventual i mean the indexing is kind of like with in that materialized views in relational databases so that Don up that automatically you might have a trigger of n um so it is not that you use a document into college best and automatically in that in that and and that's in the index you could either trigger them automatically like in in certain in certain um intervals or you could say OK now I carried and cured you want the most recent data and then the next starts the next happens the charge function we have a method using these and what I call spatial index which is the multidimensional thing um and how this works and everything is what I talked about now space media account that of a so you get a feeling of hold feels like and what does and so on the so it's about a database to store things into a song and this might be 1 of those satellite imagery metadata documents we have some area which is shortened it a bit you have as a mention start date in and out for the image and all the other information so now you want to build this five-dimensional index the as a said is a simple task of function this function basic it's applied to every document in the database and the doctors exiguity at that's the document and you have a custom function coach space which is called the MIT and this basis has put this into the index you supply key and the value in the hands of index for example the start symbol with 2 dimensions I want to have a polygon in your index so that a courier on the area so you say you name it as a key the area and you find so we currently don't have about the value for now the now it of course you want it is should get more exciting so we won't have more dimensions as I mentioned them in the index over beginning the want to store the start and end at 1 imitation of this is being numbers but the cool thing about this is you can use the full power of JavaScript for example just transform your eyes so they into a Unix times so this is what this function does it looks complicated but the edits while chose works with states so the best API but this this just transforms it from a you in the eyes of a to it they do next time step and then at the bottom you see at the key you have again the area and as the 3rd and of that as the 3rd dimension you just ate the range f for a date the it
is the more complicated now say that direction you want to make again is number so I say it's ascending and 0 else and so on and the track number it is already an integer so you can just read that so you're done you have your five-dimensional index no by auditory to query it it's just a
simple HTTP API and the finest grade each in and range and this is the example I want to today's images from Oregon with from ascending satellites all what you do with this so the 1st recorded work the longitude and the attitude and the time is that the duties times and for today and you only want to have the ascending once for your start and the end of the query is 0 and you don't care about the track number so the knowledge space is like a wild card the n the result of the back with like this
so the beginning had the key demo and them which returns the key that was indexed with so we've emitted the geometry and what does the bigger automatically Cutlass a bounding box was in the index and then blows up the key and the celery at the richest have put anything put in in novel as you can see and you also get the original geometry M. another example would be you know interested in this is images from track number 1 to form so you don't care about the area so put just put in wild cards and you say I want everything from the beginning of the year till now so the have open range in the end I don't get cable the direction and Hollywood really care about is the track so just put it in the end and this yet would accept the return what what you wanna now I talked about what's
really cool of all the whole staff and this I think you need to it so um but for other databases that do two-dimensional indexing but normally it's only on points what I mean with it we store in one-dimensional Mr. appoint another range so let's take this as a tyrant they have some document at the bottom the document and it has to the time range 3 PM to 6 PM above it you see Crary's and you might think Rocha related with use just all the document twice a stored 1 time with the start date and 1 time with the end this works in a few cases so ago for fall onto the top of my 1st period spends the whole thing so if you would stored as 2 documents would just go get back the 2 documents that we could easily duplicated on current it also works if you just hit the beginning you work at the top we went back with sign same for the query at the end of the problem is this query at the top you would have accrued which is in between the range and if it's just all the start and the end you would get the results back but with my index to get the result that because we really curry on the whole range the implementation wise and is actually simpler than you might think it's not tube exciting bits justice of object and is just a generalized toward i in mentions ended them the interesting thing is in literature you hear about our mutations in was funny because just 2 days ago in a bar e I talk with someone about it and the number you normally read this and it's more than 6 dimensions it doesn't work well in 1 we both had haven't had a clue where this number comes from but where k readers always 6 so there must be 1 original paper that says after 6 images is slow then there the this event is called the cause of dimensionality so it means if if more dimensions in extracts get so complex that the overhead of creating it is more than just a simple scan through all the data and this is that the the ends of the octree but as I said if the number 6 dimensions as great always might be 10 dimensions and all this is something if to try out but there were grounds for example what you could do it similar to what for example it is listed search of also does is used just define a certain number of dimensions re between x and if you have additional interviews with a filter on just filtered out on curry time the this is something that is not implemented yet but I definitely want to implement it so this already proves that into really small set and then Crary in their filtering out after should pretty fast now of course we cannot have a performance them I say this probably every year in much you coach talks expected to be super law and I have an optimized idiot the so um you've read you benchmarks but I guess uh cultural lose probably um I'd doesn't want to work on it and this will be my primary focus for the for for the commute time and element the I quickly want go into it will what what ways yes to improve the performance so basically fog trees there 2 ways of creating Lester Pollack loading in what I call a single inserts which means either you know what it is upfront and you can just build during is out of it or is this a dynamically updated and you did it just comes in and you inserted air that into a tree problem with our trees is the optimum and loading of arkoses an NP-hard problem for those who are not computer scientists this basically just means is to get the optimal result would be a performance close you can't really get the optimal result the them in in the 4 so as a sensing but and so still inserts the problem with them is it depends on the insertion on solid so you have a data set that is as an input of point in Portland and a Francisco and now audience at 1st the points from Portland and then use it all the points from so Francisco the tree structure will be different from when you insert 1 from Portland 1 from services go what reported and so on so the result will be a different index so all secret high will be different spiraling as advantage you have an a priori knowledge of the data so you can focus on if you know that the performance would be better if I 1st insert the Portland data and then Cisco data just below like this so therefore all the way to go is barcoding Unix problem is that he of course need to know your data out from but normally if if analog occasion of course the user adds new stuff we need the than against us but this a solution to it and I come to the future
which is what I want to build them 0 sorry and and so the futurist doing only thereby live building stuff it's based on a paper so I want to base it on a paper called sort based very adaptive loading of our trees such as type this into google and you will find the paper and I won't go into much detail but it's really it's a really nice scientifically made paper um was a together with 1 of the creators of the our star-tree so it's really solid science and so the future is Ellis and Alice M. stands for lock stock to merge and what it means is you create the variance of data 1st offered in memory and then invited to this and then if you have it on disk but they don't you just there so when the atoms are just them this began B submitted create kind of on this like signifiers whenever you have too many of those just merge them again and build them up this means you can always use spike going for it and this should be really good for the performance and also the bio argue rhythms Fotis away simpler than the uh died in mimic what's for and um so there several key value stores that already do realism staff um this equal the upcoming sequel in for for example and that is a particular source code because I don't want to start from scratch there's no point of building and emissions scratch and another into it but the code base is more on this bit coupled with the other stuff so some a good fit than their stability b and this Roxy Roxy B is a level GP fault from them Facebook and the big difference is a level the B is developed in Google Source mode and rocks TVs will consult the difference is if a look at the Committee's reformers in that of the PCA just commits with release this means this release this so every few months there's release and that the Committee approximate can receive any comments you see the code review so it's written OpenCOM unit you can control could be chemical requests assist also so I would want to build my nature Carterette words and based on the uh books DVDs because what they also do is they more make the database as a framework so they have several implementations for the actual storage and they have order already 1 for in memory for all and is it can better just to pass those things and what I will try to do is I just replace it with them some spatial and excellent occupation I don't know if it really works out um I hope so and yet this is what I'm trying to do and finally um I have prepared um binaries but then found out yesterday that is about so fixed about but they are not really binaries yet and I just thought well if I discover about just 1 copy for my and 1 day before my talk is better you just put the scene from source tree like that that that the box and you want him to the binary so just go to this address it could this instruction hotter because the from source should be really simple if you can't build it come on the IRC channel or drop many e-mail I will help you because that's the logo which would be easy to build the whole thing and it should be just a matter of fact and make a new that all right thanks for your attention few sold in Christians and so
on I'm sure lies in the reason why you are approached the pumps area as need to build a multidimensional that's because you're basing of your core bring this into a key-value store which has 1 and X 2 key obser therefore you need to compare all the ball the nature of your query into the key as opposed to say was seen correlation very stable artist about any other index in which you have fielded structure each once you have latitude and longitude and x at time index and that track ID just simple matter of inertia so yet so um so that is the reason to like this so this also 1 1 that I'm not sure about the future is if what you say it's better if really built independent and in the CIS for data and then basically much curry time such as the 1 huge genetics um I think so I would say for of like if you want to pre-final finally 20 dimensions for Europe which will be better but this has has something I basically just want to try out it's just something I'm interested in and I think it could work well so and as a said the capable to base it on they even have a performance test with 10 dimensions and seems work well and Amber so you know so what i was we want to end up with is more like a small key-value store uh library which you can use for any database you want it of course will be tied in good with Koch face but could also use with the other database and this is just be the little BBU form one-dimensional index this is what I want for function and are there any specific criteria used for determining how to split internal nodes and so on for splitting nodes is so this is the basically what this this paper I am mention is about them it's so where does it at 1st sorts the data with the settling coffee and then a petitions the out the the the nodes in a smart way and was the few bottom-up and basically the petitioning as a states that in fixed and getting things as the smart move basically and what they do is they found a nice cost function which is pretty close to the real cost of Crary's and it's just like an incremental approach so you need to look at I think them like the maximum number of items to store and 1 leave square I think is is this is the number of items you the to a loop in book at and then you end up with them pretty optimal tree the the estimated that the bit about the keys of and the and didn't dimensions to be specified only as numbers or our geometries because the converted ask and there's Community directives this of electric 2 numbers is are is it admits strong requirement or something that the chain in concert so clearly so so so the limitation currently is sitting on it can store uh and numbers in the next and I would guess that post-date quite awhile like this and because you like the so numbers suffix and also if you 1st summer category so what's have something like you have certain colors for example which you can easily just transformed into integers and then because knowledge of colors who don't carry for something like I would have everything between red and blue but it just won't have all the red items all the blood and um I would want to do strings as well so for example that you can store names and then say I want everything from starting with OEMs going to P M is theoretically from the rhythms it's possible well known and with the bike loading each giving the easier because the problem is you would then need to have something a metric off how far apart are strings and there's something called edit distance and some of this little what it means we like if you start with the bias you need to figure out what for some the volume all 2 strings are so so so so it's not easy and I would expect it soon and um so plan for it supports all in the number so I would say but there is a great idea about it than I'm happy to here yeah did something quickly and what about the kneecap parameter interview and in a few years and that the say it is something and goal yet is culture-specific so so the difference of for those who are where all of them Apache coach to be is is culture-specific and it's does this contains things like the document ID for example some covers is those things are separate from the document any further questions all then things again the time
Demo <Programm>
Gesetz <Physik>
Vorzeichen <Mathematik>
Automatische Indexierung
Kategorie <Mathematik>
Gebäude <Mathematik>
Natürliche Sprache
Dienst <Informatik>
Dimension 3
Topologischer Vektorraum
Ordnung <Mathematik>
Stabilitätstheorie <Logik>
Automatische Handlungsplanung
Räumliche Anordnung
Überlagerung <Mathematik>
Demoszene <Programmierung>
Open Source
Virtuelle Maschine
Spannweite <Stochastik>
Weg <Topologie>
Reelle Zahl
Endogene Variable
Spezifisches Volumen
Verteilter Speicher
Gerichtete Menge
Open Source
Einfache Genauigkeit
MIDI <Musikelektronik>
Wort <Informatik>
Weg <Topologie>
Natürliche Zahl
Fortsetzung <Mathematik>
Element <Mathematik>
Einheit <Mathematik>
Wurzel <Mathematik>
Spannungsmessung <Mechanik>
Lineares Funktional
Theoretische Physik
Physikalischer Effekt
Globale Optimierung
Ideal <Mathematik>
Kommutator <Quantentheorie>
Verkettung <Informatik>
Funktion <Mathematik>
Automatische Indexierung
Ganze Zahl
Projektive Ebene
Overhead <Kommunikationstechnik>
Framework <Informatik>
Inverser Limes
Operations Research
Speicher <Informatik>
Bildgebendes Verfahren
Leistung <Physik>
NP-hartes Problem
Relationale Datenbank
Objekt <Kategorie>


Formale Metadaten

Titel GeoCouch: A distributed multidimensional index
Serientitel FOSS4G 2014 Portland
Autor Mische, Volker
Lizenz CC-Namensnennung 3.0 Deutschland:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen.
DOI 10.5446/31733
Herausgeber FOSS4G, Open Source Geospatial Foundation (OSGeo)
Erscheinungsjahr 2014
Sprache Englisch
Produzent FOSS4G
Open Source Geospatial Foundation (OSGeo)
Produktionsjahr 2014
Produktionsort Portland, Oregon, United States of America

Inhaltliche Metadaten

Fachgebiet Informatik
Abstract This talk is about GeoCouch, a spatial extension for Couchbase and Apache CouchDB. I will give a quick introduction into Couchbase and Apache CouchDB and why they might be the ideal databases for solving your problem.With GeoCouch it is not only possible to query your spatial data, but also your spatio-temporal, or even higher dimensions. For example you might want to find all parking lots within a certain area that offer free parking from 3-4 pm and have at least space for 100 cars.GeoCouch, Apache CouchDB and Couchbase are licensed unter the Apache 2.0 License.
Schlagwörter GeoCouch

Ähnliche Filme