GeoCouch: Operating multidimensional data at scale with Couchbase


Formal Metadata

Author
Mische, Volker
License
CC Attribution - NonCommercial - ShareAlike 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Publisher
FOSS4G, Open Source Geospatial Foundation (OSGeo)
Production Place
Seoul, South Korea

Content Metadata

Abstract
Couchbase is a distributed document-oriented NoSQL database. You store the data as JSON and then build indexes with simple JavaScript functions. This talk is about the multidimensional index capability of Couchbase. This means you can index not only geographic data (encoded as GeoJSON) but also any additional numeric attributes you like. Such a multidimensional query might be used in a car-sharing application. You would, e.g., query for all the cars in a certain area, but you are also interested in additional attributes. Let's say you want to display only cars that seat at least four people, or you want one with air-conditioning. Such attributes would be the additional dimensions. In this case it would be a 4-dimensional query: two dimensions for the location and two for the additional attributes. GeoHash is quite often used to implement a spatial index, but it has some limitations. A notable one is that you need to know the maximum range of your data up front, because it is a space-partitioning algorithm. That is good enough for purely geospatial data, but as soon as additional attributes like time are needed, it might become an issue. GeoCouch takes a more traditional approach, like PostGIS, and uses an R-tree, which is data-partitioning, so you don't need to know the extent up front. Another focus of this talk will be the operational strengths of Couchbase. One is the web interface, which makes administering clusters very easy, even when there is a failure. The other is that you can easily restart servers, e.g. when a Linux kernel upgrade is due, without any downtime for the whole cluster. The system stays operational and handles those upgrades gracefully. In the end you will have a good overview of why you really want to use multidimensional indexing for your remote sensing data or for the points of interest in your location-aware mobile app. GeoCouch is fully integrated into Couchbase; no additional setup is needed to get started.
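The contrast between the two index families can be made concrete with a small Python sketch (illustrative only, not GeoCouch code). A GeoHash-style key, shown here as a Z-order key that interleaves bits, has to normalize every coordinate against a range fixed in advance, so a value outside that range cannot be encoded at all. An R-tree node instead stores the minimum bounding rectangle of whatever it currently holds and simply grows it. The function names, the 16-bit precision and the guessed timestamp maximum are assumptions for illustration.

```python
from typing import List, Tuple

Box = List[Tuple[float, float]]  # one (min, max) pair per dimension


def zorder_key(point: List[float], bounds: Box, bits: int = 16) -> int:
    """Space partitioning (GeoHash/Z-order style): every dimension needs fixed bounds."""
    cells = []
    for value, (lo, hi) in zip(point, bounds):
        if not lo <= value <= hi:
            raise ValueError(f"{value} is outside the pre-declared range [{lo}, {hi}]")
        # Map the value onto a grid of 2**bits cells along this dimension.
        cells.append(int((value - lo) / (hi - lo) * (2**bits - 1)))
    # Interleave the bits of all dimensions, most significant bit first.
    key = 0
    for bit in range(bits - 1, -1, -1):
        for cell in cells:
            key = (key << 1) | ((cell >> bit) & 1)
    return key


def extend_mbr(mbr: Box, point: List[float]) -> Box:
    """Data partitioning (R-tree style): the bounding box simply grows to fit new data."""
    return [(min(lo, v), max(hi, v)) for (lo, hi), v in zip(mbr, point)]


# Longitude and latitude have natural bounds; a timestamp dimension needs a guessed maximum.
bounds = [(-180.0, 180.0), (-90.0, 90.0), (0.0, 2_000_000_000.0)]
print(zorder_key([127.0, 37.5, 1_400_000_000.0], bounds))
try:
    zorder_key([127.0, 37.5, 3_000_000_000.0], bounds)
except ValueError as err:
    print("space partitioning:", err)
mbr = [(127.0, 127.0), (37.5, 37.5), (1_400_000_000.0, 1_400_000_000.0)]
print(extend_mbr(mbr, [127.1, 37.6, 3_000_000_000.0]))  # box grows, nothing is rejected
```

The later timestamp is rejected by the space-partitioning key because it exceeds the guessed maximum, while the bounding box on the R-tree side just grows to include it.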
All source code from Couchbase is licensed under the Apache 2.0 License.
Links:
- Couchbase:
- Source code:
- GeoCouch:
All right, it's time to start my talk. My talk is about "GeoCouch: Operating multidimensional data at scale with Couchbase". It's an insanely complicated title, but by the end of the talk you will hopefully understand what it is about. I'm happy to be here.

First, a few words about me. I'm Volker, and I'm a developer. I mostly code in Erlang, JavaScript, Rust and Python, and I love open source. One of my most successful projects is GeoCouch, which is what I'm talking about today. I work for Couchbase, which is a bit confusing, because it's the company name as well as the product name. One important thing: it is not the same thing as Apache CouchDB. It's a similar database and has the same data model, but think of it this way: if you compare MySQL to PostgreSQL, it's like comparing Couchbase to CouchDB. Couchbase is widely adopted in the enterprise world, but hardly anyone in the open-source world is aware of it, although it's fully open source. It's licensed under the Apache License, so you can just take the code, compile it and run it as much as you want.

So what actually is Couchbase? I'm talking about the product, not the company. It is a NoSQL database, a so-called document-oriented one, which means you store your data as JSON documents. It can also store binary data, because it inherited some mechanics from its past as a persistence layer for memcached, which also stores just binary data. But in this talk we only consider JSON documents, because you would probably encode your geometries as GeoJSON anyway.

Many people ask: why should I use Couchbase when I can use PostGIS? Well, if your system works with PostGIS, just use it; don't switch to anything else. The strength of Couchbase shows when it comes to scaling up: when your data is so big that it has to be distributed across several servers, you look into alternatives, and then Couchbase might be your choice. Another thing that I think is really strong about Couchbase is the administration. There are several distributed databases these days, but the important question is how you can administer the database when something goes wrong. Together with this there is also a RESTful API, which makes it easy to build your own tools around it. The good news is that the web interface of Couchbase also uses the RESTful API. The advantage is that it really is the API, not something bolted on so that it sounds cool; it's really used, so you can be sure it has been tested, and you can use the tools you know.

Now the spatial features, which are what I wrote for Couchbase. On the indexing side you can index GeoJSON, so you can have polygons, geometry collections, lines and so on. On the query side it's only bounding boxes, or multidimensional volumes, and that's the common case. I currently don't support any other queries. The plan is of course to support things like nearest-neighbor queries and [unclear] in the future, but that is also a problem, because people use Couchbase more as a data store: they really store their data in it and then do the processing on top of that.

On top of that there is the multidimensional part, which I'll go into a bit deeper because it's quite cool. It means you can not only store your geometry, which might be an area, a polygon, but also additional attributes, for example a date, which gives you spatio-temporal data. That is still kind of normal these days, but even more often you have something like categories. For example, if your use case is "give me all the buildings in this area which have [unclear] and which have a certain height", you would have a four-dimensional query, and this is what you can do with GeoCouch. The core database is kept really small, as opposed to PostGIS, where you do a lot of the processing within the database; if you want to do some processing, you do it on top, not directly in the database.

All right, now for the demo. This is the major part of the talk, and it will be exciting, because demos always fail. So let's see.
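The four-dimensional query just described can be pictured with a minimal Python sketch. Every indexed item becomes a box with one (min, max) interval per dimension: a polygon contributes its bounding box for the first two, and a plain value such as a height becomes an interval of zero width. A query is another box, and an item matches when the intervals overlap in every dimension. This is a linear scan for clarity; GeoCouch uses an R-tree so it does not have to look at every item. The buildings, the "floors" attribute and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def bbox(polygon: List[Tuple[float, float]]) -> Tuple[Interval, Interval]:
    """Bounding box of a polygon ring as (longitude interval, latitude interval)."""
    lons = [x for x, _ in polygon]
    lats = [y for _, y in polygon]
    return (min(lons), max(lons)), (min(lats), max(lats))


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def box_query(index: Dict[str, List[Interval]], query: List[Interval]) -> List[str]:
    """Ids whose box overlaps the query box in every dimension (linear scan)."""
    return sorted(
        doc_id
        for doc_id, key in index.items()
        if all(overlaps(k, q) for k, q in zip(key, query))
    )


# Hypothetical buildings: footprint polygon, height in metres, number of floors.
footprint_a = [(126.97, 37.56), (126.98, 37.56), (126.98, 37.57), (126.97, 37.57)]
footprint_b = [(127.05, 37.50), (127.06, 37.50), (127.06, 37.51), (127.05, 37.51)]
index = {
    "a": [*bbox(footprint_a), (120.0, 120.0), (30.0, 30.0)],
    "b": [*bbox(footprint_b), (15.0, 15.0), (4.0, 4.0)],
}
# Four dimensions: an area, a height of at least 50 m, any number of floors.
inf = float("inf")
query = [(126.9, 127.1), (37.4, 37.6), (50.0, inf), (-inf, inf)]
print(box_query(index, query))  # ['a']
```

Because only boxes are compared, a polygon whose bounding box overlaps the query is returned even if the polygon itself lies outside the queried area; an exact geometry test would have to run on top of the index.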
I've already started a Couchbase cluster. What you can see here are four windows; they stand for four instances running locally on my machine, but you can imagine them as four separate servers. Green means they're running fine and everything's good. What I want to show is how easy it is to manage things, and what happens when things go wrong. Right now everything is running well, so let's take a look at the web interface: this is the administration console of Couchbase. [unclear] All right. I'll show you how to add another server. I now have four running, and I want to add another one. [unclear] The map application is just showing the data.

When I prepared the talk, I was really impressed by how good the Open Data Portal of Seoul is. They have huge amounts of data; it's really great. It was a bit hard for me because it was in Korean, but I figured it out. I looked for a dataset, and this is what I found: road works within Seoul. So I request it. It takes some time, but most of that time is just OpenLayers rendering the data. On the side you can see the number of features: we have about 15 thousand polygons here. This is my dataset, and I think it is about roads that were repaired: it shows the area that was repaired, when the work started, when it stopped, what pavement was used, and so on. As you can see here, the cluster currently has four nodes.

So what I do is add a fifth server. I can do this in the web interface. [unclear] The server is added, but it is not really included in the cluster yet, because you could also add, for example, five servers at a time. Then I do an operation called rebalance, which means reshuffling the data that is currently on the servers so that it is again equally distributed across all five. Now we have five here, and if I query again, you will see it's again the full dataset, so nothing changed with this operation. There are five nodes now.

Now we come to the problem. The easiest way to show it is to just kill one. Boom. Now it's red, which means one server is down; in the Admin Console you can also see that one server is down. So now let's see what happens when we query again.
You won't get all the data back, and the reason is that one server is down. This is obviously not what you want; you don't want to lose data. The good news is that the servers are configured so that there is always a replica of the data. So what the administrator needs to do is click on "Fail Over", which means: "I'm aware this node is broken, please use the replicas." I need to confirm.
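The failover step can be illustrated with a toy model in Python: each partition has an active copy on one node and one replica on another, and failing over a node promotes the replica of every partition that was active there. The round-robin placement, the single replica and the node names are simplifying assumptions, not Couchbase's internal data structures.

```python
from typing import Dict, List, Tuple

# partition id -> (active node, replica node); "" means no replica
Layout = Dict[int, Tuple[str, str]]


def place(partitions: int, nodes: List[str]) -> Layout:
    """Put each partition's replica on the node after its active node (round robin)."""
    k = len(nodes)
    return {p: (nodes[p % k], nodes[(p + 1) % k]) for p in range(partitions)}


def fail_over(layout: Layout, failed: str) -> Layout:
    """Promote replicas for partitions whose active copy lived on the failed node."""
    new_layout: Layout = {}
    for p, (active, replica) in layout.items():
        if active == failed:
            new_layout[p] = (replica, "")  # replica becomes active; no replica left
        elif replica == failed:
            new_layout[p] = (active, "")  # active copy survives but lost its replica
        else:
            new_layout[p] = (active, replica)
    return new_layout


nodes = ["node1", "node2", "node3", "node4", "node5"]
layout = fail_over(place(1024, nodes), "node3")
# Every partition still has an active copy on a healthy node, so all data stays readable.
assert all(active != "node3" for active, _ in layout.values())
unprotected = sum(1 for _, replica in layout.values() if not replica)
print(unprotected)  # 410
```

In this model every partition is still readable after the failover, but 410 of the 1024 partitions are left without a replica, which is one reason to rebalance afterwards.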
And now if I query again, you will see that we have the full dataset again, because the replicas have been activated. The problem is that the data might no longer be evenly distributed across the cluster, so what you do is rebalance again to make sure it is distributed again. And as you see, nothing changes.
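Rebalancing, used in the demo both after adding a node and after the failover, can be pictured as spreading a fixed set of partitions evenly across the current nodes. Couchbase splits each bucket into vBuckets (1024 on most platforms); the Python sketch below reuses that number but applies a simple quota-based reassignment of its own, not Couchbase's actual rebalance algorithm. Node names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def rebalance(assignment: Dict[int, str], nodes: List[str]) -> Dict[int, str]:
    """Reassign partitions so every node owns floor(n/k) or ceil(n/k) of them.

    A partition stays on its current node while that node is under its quota;
    partitions on removed or over-full nodes are moved to nodes with spare room.
    """
    total, k = len(assignment), len(nodes)
    base, extra = divmod(total, k)
    # The first `extra` nodes may hold one partition more than the others.
    quota = {node: base + (1 if i < extra else 0) for i, node in enumerate(nodes)}
    result: Dict[int, str] = {}
    homeless: List[int] = []
    kept: Counter = Counter()
    for partition in sorted(assignment):
        owner = assignment[partition]
        if owner in quota and kept[owner] < quota[owner]:
            result[partition] = owner
            kept[owner] += 1
        else:
            homeless.append(partition)
    for node in nodes:
        while kept[node] < quota[node]:
            result[homeless.pop()] = node
            kept[node] += 1
    return result


old_nodes = ["node1", "node2", "node3", "node4"]
before = {p: old_nodes[p % 4] for p in range(1024)}  # 256 partitions per node
after = rebalance(before, old_nodes + ["node5"])
print(sorted(Counter(after.values()).values()))  # [204, 205, 205, 205, 205]
moved = sum(before[p] != after[p] for p in before)
print(moved)  # 204
```

Going from four to five nodes, each old node keeps 205 of its 256 partitions and only the 204 surplus partitions move to the new node.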
It's still working. Fine, thank you. So this was about things going wrong. What can also happen is that you take a server out, but as a planned shutdown. One reason why you would want to do that is, for example, a kernel upgrade on the server. In that case you click on "Remove", and then the server gets taken out of the cluster: you rebalance, do the maintenance work on the server, and add it back again. A huge customer of ours, [name unclear], which handles about 80% of the flight services worldwide, does exactly this. They told me that they have a large number of servers and really do rolling upgrades: they rebalance while the system is running, they don't see any additional peaks, and everything runs smoothly. When the maintenance is done, they rebalance again, and during the rebalance operation everything's fine, so they don't have any downtime at all, or any peaks.

So this was about fault tolerance and what happens when things go wrong. What I want to show you now is how the querying works. [unclear] All of these queries are four-dimensional. If you don't fill in a bound, it's a wildcard, so basically an empty query returns everything. The first two dimensions are longitude and latitude, then we have the date, and the last one is interesting, because the original wording here was Korean. I think it's something like "pavement"; if anyone knows the English word, just shout. It's something like the type of concrete that was used. There are a few categories: in the dataset they are numbered 1 through 4.

So now I query the full dataset just to make sure everything works. That's not very exciting. Now, for example, on the longitude I ask only for the data from 127 onwards and leave everything else open. So, what happens?
It's basically like half of the city. Normally, of course, you would query with a bounding box, so you fill in the full thing, also the latitude, and you will see that this is just like a normal bounding-box query on two-dimensional data. I can see it's only 6 thousand items now. Now let's add the date. For example, let's say you want the data from the street works from 2008 onwards. And of course you can do things like asking only for the data up to 2009: you use the wildcards again, and you see all the data until 2009. Last but not least is the pavement type. As I said, it's a category. With the common spatial indexes you would need to [unclear] them somehow, but categories are normally just integers, so you can query them directly; I've mapped them to integers. So let's say I only want to see the roads with category 2 pavement. Not that exciting, but there is a difference: all those with category 3, and then category 1. Hopefully, yes.
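The queries in the demo give a lower and an upper bound per dimension and leave some of them open. A minimal Python sketch of those semantics: None stands for an open (wildcard) bound, a category value is an interval of zero width, and a construction period is an interval that matches when it overlaps the queried span. The road records are hypothetical. To our knowledge, Couchbase Server's spatial views accept such bounds as JSON arrays in `start_range` and `end_range` query parameters, with `null` meaning open; `spatial_query_string` only builds that string and is an assumption to check against the documentation of your server version.

```python
import json
from typing import Optional, Sequence, Tuple
from urllib.parse import urlencode

Bound = Optional[float]  # None = open bound (wildcard)


def matches(key: Sequence[Tuple[float, float]], start: Sequence[Bound], end: Sequence[Bound]) -> bool:
    """True if every key interval overlaps [start_i, end_i]; None leaves that side open."""
    for (lo, hi), s, e in zip(key, start, end):
        if s is not None and hi < s:
            return False
        if e is not None and lo > e:
            return False
    return True


def spatial_query_string(start: Sequence[Bound], end: Sequence[Bound]) -> str:
    """ASSUMPTION: bounds are passed as JSON arrays named start_range and end_range."""
    return urlencode({"start_range": json.dumps(list(start)), "end_range": json.dumps(list(end))})


# Hypothetical road works: longitude, latitude (bounding box), construction years, category.
roads = {
    "r1": [(126.95, 126.96), (37.55, 37.56), (2007, 2008), (2, 2)],
    "r2": [(127.02, 127.03), (37.50, 37.51), (2010, 2011), (3, 3)],
}
# Longitude from 127 on and construction from 2008 on; latitude and category open.
start, end = [127.0, None, 2008, None], [None, None, None, None]
print([r for r, key in roads.items() if matches(key, start, end)])  # ['r2']
print(spatial_query_string(start, end))
```

Restricting a category works the same way: setting both bounds of the fourth dimension to 2 keeps only the roads with category 2 pavement.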
All right, so this was about querying with multiple dimensions. I could also go to 5, 6 or 7 dimensions, [unclear], but I haven't tried that yet. Now I will show you how to build up the index.

This is how you create the index, and it's the super scary part when you see it for the first time. You write a JavaScript function, and you can really use the full language. The function is applied to every document: every JSON document in the database is an input of the function. The important part is that it's applied to every document, but it isn't run on every query, because that would be super slow: if you have 5 million items, running JavaScript for every one of them on every query would take far too long. It's run once, and whenever you update or add a document, the index is updated and only the changes run through the function.

What you always have to do is this if statement, which checks for all the attributes that the function uses, because if you have an error in the function, that document won't end up in the index. This way you make sure everything's fine and you don't get any exceptions. So that's the scary part: it's just checking for the properties. Let me change the font size. Now an interesting part, the construction time. In addition to bounding boxes you can also index time ranges, not only a single point in time, because the construction wasn't done at one point in time; the construction itself spanned a time range. So I just use an integer for the start and one for the end. Then I add another if statement, which just says that the beginning of the construction should be smaller than its end, since it can only finish after it has started. But data often just contains errors; this dataset had some dates wrong, and this way they are filtered out and simply don't end up in the index. Then I use the paving, which was the category. And finally, I call emit.
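Couchbase map functions are written in JavaScript; to keep the examples in one language, here is the logic of the function described above rendered in Python. The property names (`geometry`, `construction_start`, `construction_end`, `pavement`) are hypothetical, since the talk does not show the dataset's field names. Timestamps are assumed to be integers, and the area uses the planar shoelace formula in coordinate units, which only approximates the real area. `emit` is simulated by collecting (key, value) pairs; in a real view it is Couchbase's built-in function.

```python
from typing import Any, Dict, List, Tuple

Emitted = List[Tuple[list, float]]


def shoelace_area(ring: List[List[float]]) -> float:
    """Planar area of a closed GeoJSON ring, in squared coordinate units."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def map_road_work(doc: Dict[str, Any]) -> Emitted:
    """Python rendering of the view's map logic; returns what it would emit."""
    emitted: Emitted = []

    def emit(key: list, value: float) -> None:  # stand-in for Couchbase's built-in emit()
        emitted.append((key, value))

    # 1. Check every attribute the function uses, so a bad document can't raise an error.
    required = ("geometry", "construction_start", "construction_end", "pavement")
    if any(doc.get(name) is None for name in required):
        return emitted
    geometry = doc["geometry"]
    if not isinstance(geometry, dict) or geometry.get("type") != "Polygon":
        return emitted
    start, end = doc["construction_start"], doc["construction_end"]
    # 2. Skip broken records: a construction cannot end before it starts.
    if start > end:
        return emitted
    # 3. Key: geometry (its bounding box gives two dimensions), the time range, the category.
    #    Value: the polygon's area, e.g. to display when a feature is clicked.
    emit([geometry, [start, end], doc["pavement"]], shoelace_area(geometry["coordinates"][0]))
    return emitted


doc = {
    "geometry": {"type": "Polygon", "coordinates": [[[0, 0], [2, 0], [2, 1], [0, 1], [0, 0]]]},
    "construction_start": 1199145600,  # 2008-01-01 UTC
    "construction_end": 1230768000,  # 2009-01-01 UTC
    "pavement": 2,
}
print(map_road_work(doc)[0][1])  # 2.0
print(map_road_work({**doc, "construction_end": 0}))  # []
```

A record whose dates are reversed emits nothing, so it silently stays out of the index instead of raising an error.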
The emit function is a custom function from Couchbase that means "put this into the index". It has two parameters: the first one is an array, the second one is the value. Every item of the array is a dimension, except, in this case, for the first one: if you put in a geometry, its bounding box is taken automatically, which makes two dimensions out of it. The third dimension is then the construction time, and the fourth dimension is the paving. As the value I've emitted the area of the polygon, so in the application you could, for example, click on a feature and see its size. All right, let's move on to questions. Even though this talk was about Couchbase, the questions don't have to be about Couchbase; we can also talk about Apache CouchDB or whatever you like.

Question: You said there is a good API. When you're managing the cluster, like you demonstrated when there are problems, are there good ways to manage failovers and adding nodes through an API, in an automated fashion?

Answer: Yes, obviously no human has to sit there and fix the server. For example, the web interface, if you haven't seen it, also uses the HTTP API, so when I clicked "Fail Over", it's the same thing as a call with curl. What you can then also do, of course, is watch the cluster and say: when I find out that a node isn't responding anymore, I want to do a failover, for example. Does that answer the question?

Question: Yes. So you might have something as simple as a cron job, or some monitoring system. Do you have any monitoring tools already in place?

Answer: We don't have any monitoring tools, but we have a built-in feature called auto-failover. For this case, when something goes down in the cluster, it fails over automatically, but that's very limited, because I think it only supports failing over a single node. Normally, if something goes wrong, if something goes down, you really want to have full control over what you do next. [unclear] I think it checks every 20 seconds or so whether a node is missing, and if it is, it fails it over. But with auto-failover, I think it's important that it's just one node. There is also a feature that sends you an e-mail notification, for example, if something goes wrong in the cluster, and then you deal with it yourself. Monitoring tools are also often very specific to how you handle failures. [unclear] Thank you.

Question: I was wondering whether it extends to more advanced geometry representations, like polygons, curves, maybe 3D. Which geometries can you store?

Answer: Every GeoJSON geometry type that exists: points, linestrings, polygons, multipoints, multilinestrings, multipolygons and geometry collections. But they are indexed only by their bounding boxes, so I don't do any intersection tests. A good example would be Norway: its bounding box is huge, and if you query within that bounding box, you get Norway back even if the polygon itself doesn't intersect your query. [unclear] This should of course be done, because people really want proper intersections and so on. But storing works.

OK, thank you. Any further questions? All right, thank you.
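The answer about automation can be illustrated with Couchbase's cluster REST API, which the web console itself uses. The Python sketch below builds, but does not send, the two requests the demo triggered through the UI: a failover of one node, and a rebalance that ejects a node, which is the API counterpart of "Remove" followed by "Rebalance". To our knowledge, `/controller/failOver` (form field `otpNode`) and `/controller/rebalance` (`knownNodes`, `ejectedNodes`) are the documented endpoints; verify them against your server version. The host, port 8091, credentials and `ns_1@...` node names are placeholders.

```python
import base64
from typing import List
from urllib.parse import urlencode
from urllib.request import Request

# Placeholders: adjust the host, credentials and node names for a real cluster.
BASE_URL = "http://localhost:8091"
USER, PASSWORD = "Administrator", "password"


def _post(path: str, form: dict) -> Request:
    """Build an authenticated form POST; send it with urllib.request.urlopen(req)."""
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    return Request(
        BASE_URL + path,
        data=urlencode(form).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def failover_request(otp_node: str) -> Request:
    """Same action as clicking 'Fail Over' for one node in the web console."""
    return _post("/controller/failOver", {"otpNode": otp_node})


def rebalance_request(known: List[str], ejected: List[str]) -> Request:
    """Rebalance across `known` nodes, removing those in `ejected` (e.g. for maintenance)."""
    return _post(
        "/controller/rebalance",
        {"knownNodes": ",".join(known), "ejectedNodes": ",".join(ejected)},
    )


nodes = ["ns_1@10.0.0.1", "ns_1@10.0.0.2", "ns_1@10.0.0.3"]
req = rebalance_request(nodes, ["ns_1@10.0.0.3"])
print(req.get_method(), req.full_url)
```

A cron job or monitoring script could pass `failover_request(...)` to `urlopen` once it has decided that a node is down; the built-in auto-failover mentioned in the answer covers the simple single-node case.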

