An Rtree index for RocksDB
Formal Metadata
Title 
An Rtree index for RocksDB

Title of Series  
Part Number 
8

Number of Parts 
193

Author 

License 
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
Identifiers 

Publisher 

Release Date 
2016

Language 
English

Content Metadata
Subject Area  
Abstract 
This talk is about implementing a Rtree on top of RocksDB, an LSMtree (logstructured mergetree) based keyvalue store. It will give an introduction about how RocksDB works and why LSMtrees are such a perfect fit to build an Rtree index on top of them. Finally there will be a deep dive into the actual Rtree implementation. RocksDB is a popular keyvalue store by Facebook (based on Google's LevelDB). It supports key and range lookups and basic spatial indexing based on GeoHash, but not an Rtree implementation which makes multidimensional queries possible. Those queries can combine things like location and time, but also any other property that can be represented as a numeric value, such as categories. This makes it possible to query e.g. for all flats with a certain size in a specific area that are not older than a few years and have a balcony.

00:00
Software developer
3 (number)
Database
Valueadded network
00:22
Implementation
Pay television
Open source
Sequel
Multiplication sign
Range (statistics)
3 (number)
Price index
Database
Computer font
Network topology
Website
Extension (kinesiology)
Physical system
Computer architecture
Forcing (mathematics)
Projective plane
Interactive television
Data storage device
Range (statistics)
Bit
Type theory
Computer animation
Query language
Point cloud
Key (cryptography)
Quicksort
Block (periodic table)
02:02
Presentation of a group
Source code
Execution unit
Range (statistics)
Zoom lens
Curve
Database
Price index
Water vapor
Neuroinformatik
Medical imaging
Personal digital assistant
Core dump
Set (mathematics)
Finitary relation
Local ring
Physical system
Theory of relativity
Block (periodic table)
Software developer
Staff (military)
Bit
Arithmetic mean
Data management
Process (computing)
Symmetry (physics)
Phase transition
Order (biology)
Quicksort
Curve fitting
Spacetime
Point (geometry)
Sine
Addition
Open source
Computer file
Apache Cassandra
Division (mathematics)
3 (number)
Control flow
Number
Goodness of fit
Latent heat
Profil (magazine)
Googol
Ring (mathematics)
Energy level
Data structure
Form (programming)
Key (cryptography)
Interface (computing)
State of matter
Ultraviolet photoelectron spectroscopy
Cartesian coordinate system
Symbol table
Graphical user interface
Word
Uniform resource locator
Spring (hydrology)
Software
Personal digital assistant
Network topology
Universe (mathematics)
Table (information)
Family
Window
Greatest element
Building
Code
State of matter
Multiplication sign
1 (number)
Set (mathematics)
Function (mathematics)
Mereology
Arm
Cartesian coordinate system
Permutation
Facebook
Heegaard splitting
Order (biology)
Mathematics
Semiconductor memory
Cuboid
Endliche Modelltheorie
Pairwise comparison
Chisquared distribution
Curve
Linear regression
Real number
Point (geometry)
Floating point
Open source
Fitness function
Data storage device
Range (statistics)
Twodimensional space
Open set
Fluid statics
Computer science
MiniDisc
output
Website
Right angle
Block (periodic table)
Freeware
Data structure
Resultant
Electric current
Functional (mathematics)
Freeware
Floating point
Vector potential
Bit
Revision control
Network topology
String (computer science)
Spacetime
Integer
output
Quicksort
Patch (Unix)
Database
Division (mathematics)
Number
Computer animation
Apache Cassandra
Key (cryptography)
Cuboid
Local ring
18:42
Electronic data interchange
Computer animation
State diagram
Meeting/Interview
Multiplication sign
19:01
Curve
Complex (psychology)
Building
Electronic data interchange
State diagram
Weight
View (database)
3 (number)
Set (mathematics)
Benchmark
Dimensional analysis
Neuroinformatik
Query language
Data structure
20:14
Execution unit
Computer font
Equaliser (mathematics)
Projective plane
1 (number)
Set (mathematics)
Database
Mereology
Benchmark
Rule of inference
Product (business)
Expected value
Process (computing)
Lecture/Conference
Meeting/Interview
Computer worm
Proxy server
Form (programming)
Scalable Coherent Interface
21:22
Computer animation
Lecture/Conference
3 (number)
00:08
we and good
00:10
afternoon so welcome to the user session I think it's going to be very interesting and we're gonna yourself developments about spatial databases so even a about a
00:22
different and no sequel tidings and also a little bit of a cloud architecture so I started the 1st speaker for Commission would that there's no way that a lot of introduce the intervention really so always being collaborating in a lot of open source projects but is probably best known by and spatial extension for a college city so I'm going to give daunting and not be here sitting controlling that that thank you type very everyone and I'm happy to be here and it's kind of a premium because I spoke in the last 5 force which is about you couch this time I won't speak about you coach but pretty similar and the talk will be the technical but still of some sort of fun I think so like interactions so I skip this and go straight to the topic so it's about an outline of the implementation for exterior and I start with the rocks to be because I guess not everyone is aware what is so rocks be is to use clearly storm and kid stored is a system request can store data and what they always support is something called key lookups which means you store for example is a billing system and then you store your invoices and therefore reference ID and whenever you want to get the um the voice of the system you just as the reference ID and get it back
01:55
many keyvalue stores than all of them but rocksteady does support range queries so let's say you store whether and you store
02:03
every day the temperature you can say give me all that images from the past week and those committed sources are often a building block for bigger database systems so for example if a huge dissipative systems and I think it might be slow molecular that stores sexually and of course the question is well them and make plenty of committed those all that y is rocks to be so great it's the real consoles restrict with real open source I mean they don't just publish the source code and don't care about the comes unity but what they do is they free open development so you can even see via all internal code reviews when they do changes you see all the changes they're doing and end you can also contribute yourself so even so you can make a change and it they were murdered it can even evidence that breaks this system which I did in the past week but they since then just revert the change that was defined again and so there and they have continued as they have in rules and but also companies and was what was the most surprising to me was there last year and they this started their contributions from Microsoft and what they did they make sure the rest of the world we well and was working well it's not about on about compiling which basically most produce care about but it's also report the performance so for example units the changes that cause performance regressions windows Microsoft makes sure to fix those issues played performs well again we note and as I said is a building block for future databases so and it can already be used as a vacant for mice trail from 1 would be and a couch phase known as a full text search system and there's also pluggable so that you can also use rocks abusive maker and fast and why is it so fast so the main thing about this why gives such a performance is the others as in the stands for like blockstructured merged 3 and it was originally described by a paper in 1996 and at least as far as I know it kind of got a bit forgotten for 10 years and 1 occurrence is then in there will be table paper with described that they used this data structure for the database and then also Apache Cassandra them I took up the system and this was originally created by perform Facebook then in 2011 Google published that will be which is again a key value store and they use it it within Chrome for their and offline source stuff and then face came and for lose ability being created rocks together so you can see rocks to be as an improved version of level and it as as a previously the promise that 1 of the caudal but also in the community so previously the level DB development was in that open is getting better but in the beginning said wasn't that all all right so what is a low started which the data structures I mentioned and the portal paper talks about its managing treelike structures who will do something else so therefore became because sentinels rocks to be used the thing called is table in a sense for sort of strings which means you have flat files and within the file you always only store the key and the value and next key and the value of the prefix by the size of so can easily read it and and you can that also rule right of 35 and this so you really that on this and but there's more and is fairly examples so this example I will and you see potential as a stable so there are still in the just small boxes and we wanted data in there In this case to keep it simple and the keys Iines reductase numbers and we don't care about the value so far I had the number 8 and let's see if it fits in yes if it's in so we had on the edges so that everything fine no with another number and always find out all well the the 1st assessed was full so we just figured out and sort the results because as a further assistance for sorted strings so we sort the result and try to merge with the next level the next level is empty it fits in without of society next we insert another number of 5 fits into the slowly get means another thing it does is it in the sorted no it's interesting because the we try to merge and again it doesn't fit and sort the results again tried to merge with the next level it fits in and red so with a true so I think you get the idea this this whole heart goes on and now if we were like that simple complicated to just get some sort of thing it's step forward to simpler so what are the advantages and did avantages before you become into the details about it is and then the key part is that the data is in this in in the steps in between is always sort and emerging such for the data is very fast and efficient by getting an example and so after all those examples to hopefully get you like why this might be a faster system and other systems so in this case we have again those numbers and now we merge all levels of the and from the input so the top part is that it's the 1 level the lower part is the other level and we want to merge them as you can see that they already sorted useful there always sorted no inwards and so we compare those numbers it's a 1 a form and we just take the smaller 1 of those and is isn't it and this is our results the small value now we can move on on the input and compare the other 2 numbers so we now compare the 3 of the 4 we use a smaller that is the output and we move on so now we have the form 5 this time from the other input the value smaller we take into the result we move on the bottom and especially how you move on you always compare the values used a smaller 1 and give further as you can see this kind of a screening process so you don't really need if you implement it you don't need much memory you can really just move on instrument fashion and efficiently read all the data and also stored on disk all right now you this is this merging superficial what we also get is that those files never change so once you've merge the data with a static dataset instead he did such that it is always great for building up in extraction and in our case you build a about trees and Pollack loading such an offer from bottom up this way faster than inserting into data dynamically with no in place updates which is good because if they my divided the software you don't all right the data but it's just some data might be wrong at the end of the file but not in the bill president overall but the fact as I said the sorting is very fast and of course which I haven't shown is if you delete data you need something process but during this continues merging all the time you can just clean up the staff during this merchants so now we finally come to the DuPont of it because this was this is easy to show with this number and you can easily measure helpless 4 numbers because of flowing afraid of but the difficult thing is how do you sort twodimensional data and this thing called spacefillingcurves there are many
10:23
different end 1 1 of those is the set water fling curve um and I use it in this case because it's a very efficient 1 is used to compute it's easier than others but it's still a FIL usually quite results so in this example we see we have some data skid around some space and but for now we don't really get there we just look at the space that yet so the question we not ask is what flows sort all is it have left to right is at the bottom is it's so different and presents question we look at 1st only the space so if just some empty space and there some data in some now we divide the space and you might know already get did here so now we have 4 quadrants and we draw a line in and now it also can see where the name comes from those ordered because as we know draw those design we we can easily navigable squadrons and have a lot of so if we now look at but they don't always based you can see that you can derive an ordering so in our case is where a is from the location where a is smaller than the c is smaller than the B and this way you can easily sold geospatial data but now what happens if you add another same quadrant that you could of course it well it's is the same as the sine qua resource d equals p but this is not very useful but what you can do is and that's another nice thing about this is of a coffee Split the squad and again the same thing and I can see that the is smaller than b with specific interest OK now we come to my family examples and because so this is already really like and some computer science stuff it was so word that so far was more less an introduction into computer science data structures and so for those who are into the spaces of 40 wasn't that exciting but I come to thing which is interesting because it is a problem that I haven't thought of when I wanted to create and that Eliza much was a lot because recently I think 3 years ago there was a paper from that a university in California and a date and paper on as symmetries and solves the problem and it's just mentioned the small sentence in the permutation and this really mentioned the paper but I think it's a great idea and I want to represent them and what they're doing because the problem is how do you merge that if so that the thing how do you merged profiles I 1st again describe the problem so you have 2 sets and what merchant you can now of course them mistreated previously but the problem is now they have only a local sort order so can say within those 2 datasets novel that sorted but how what is the relation between those data sets is a is 1 of the oldest speed because than it's hard to tell 1 solution would be to just create another space around all your data and sorted again but this obviously is super inefficient because he would always need to resolve the data that's not what you want to go then so the and this is Stacy water were set so that phonemes the and the sorted there the locally and you would need a lot of free competition the solution is to to divide all the spaces the model the space is infinite how do you divide an infinite space the good thing about computers is almost nothing is infinite and so this solution um to this end and I'm pretty said they did the presentation only a week ago because at the Cold Spring I was talking to someone about their floating point numbers and so on and so on I will probably change it and won't use cities 64 bit floating point numbers for wide geospatial data structures but it would work the same also if you would use an integer so fixed point in In some ways of work the same because numbers on a computer sound emitted by a certain sites so In this case the the space is pretty huge but you just take the whole 64 or bit range for floating point numbers and keep dividing the space and so now we have in this whole space like the full numbers the computer can handle with 64 bits with the data somewhere it's so small like it is but a point but it's like there's a lot of in there but it's still smaller kindly see just somewhere there suggest the need to divide the space that occur the and look at it so a case in the 1st quadrant but can't help with and that it out sorted but what you do is you just look at the squadron divided now as the 1st quadrant so keep dividing and keeps you in and so now we assumed in the fall couldn't be so we can't we see that it is the sole so small interfaces still so big so we keep doing this and in the end we assume so closely and that we know on the final division we have a sort of a so the idea behind this you look at the full space and kind of zoom into your data and but it's all data has the same said curve you can now easily merge those 2 data sets together and so this was all about their friends and all that stuff 1 reason is that I thought I would I would call this within the in a week and other developers as takes a week it's probably it reporting command and I've been 2 months and i it kind of works but not really good and so the current state is the thing I know I have a good honest enough from time books to be which turns all the intro away more more complex than I thought the it is if you can predict the so I can spend that much time on it and there's some code already uploaded but so it should compile but not to kind really carried I think really and I really want to get to be going and the good news is that always the question grasp it is is this whole thing a good idea why you doesn't say story to it and the thing is I was at the approximately made up last year in the US and they they are asked a few core developers and 1 of also said that yes he things it is a good yes tried and you will see how how it turns out so for the future this obviously would be if my changes will be merged upstream with trucks to be and it would have been supported by them and you could store spatial still a good would be I can convince those who want to be caught him that they in these change some API so I can just make an and on and don't need to change the customer for folks to be still OK would be it an easy to maintain for let's say I need some internal functions that they don't want expose publicly I could just calling them and have a symbol for it was the worst cases they don't care about it all I need to have stable but well that's OK as well that's a then it is like then all right so that's all I have from our you can trigger the cold or if you have any any ideas on that is how was off the problem we want to help let me know intentions in so
18:49
thank you very much for this interesting talk uh and all we have time for a few questions actually we have considered 10 minutes but then there's
18:59
questions yeah OK so the
19:09
question was a repeated so everyone also with stream coherent and the question was is also looking to the Hilbert curve and if I did if they were performance used and I did the reason why
19:22
you look into the curve only is that there is a paper about and query adaptive so it's contrary adaptive Saudi by loading was something and they visited the work and they compare the center curve with the view that the Hilbert curve and they said that and that Hilbert curve craze of data structure can also but it's not worth the performance of a because of Hilbert preface weight or complex a computer and the other reason is that if you go into higher dimensions the set article is still very simple to compute and because that's really comedies once you get into more than 4 dimensions that's in the building reason for for those gain any more questions now I just like to allow us to do at some benchmarks
20:16
of the decelerations on themselves so I haven't done any benchmarking because it's it's how the working at and of possibly come but I
20:26
would so from my and what I think is like the SVM is and stuff with that part is so well suited for outfits and dimensional data I would expect it is this probably the fastest thing you can get it might not be super fast but I don't think you can get much faster if it's 40 implemented was it works out because rule proxy B I think that's a really good job on the form side and the benefit is also wide use of books to be as a set is a set of equal unity so even they can expect future performance improvements and with a given for free because if they prove something the and so I had to so with person is also really run it faced productions of not like simply project from someone would really there was part of the infrastructure really on drugs cities Australia uh while working the database OK so if there are
21:24
no more questions let's settling speaker once small