MySQL & NoSQL - Best of both worlds

Video in TIB AV-Portal: MySQL & NoSQL - Best of both worlds

Formal Metadata

MySQL & NoSQL - Best of both worlds
Alternative Title
Mysql And Friends - Mysql Nosql
Title of Series
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
Production Year

OK, we'll get started with the last session of the day. I'll try to keep it reasonably short so people can get on and hit the bottle, travel home, whatever you want to do. I'm Andrew Morgan; I'm part of the MySQL team at Oracle, and my focus there is on anything to do with high availability: MySQL Replication, MySQL Fabric and MySQL Cluster, which is one of the technologies we're going to look at.
The topic of my talk is SQL and NoSQL: is it possible to get the best of both worlds? People typically think they need to decide either to use a NoSQL store or to use SQL, and I'm going to try to demonstrate that you can actually get the best of both worlds: the attributes of NoSQL stores that people like, without giving up on a relational SQL database.
This isn't going to be new information to most people, but to remind you of a couple of pieces of terminology for the different kinds of NoSQL stores. The first and simplest type is the key-value store, and it is exactly what it says: I've got a key, which may be an integer, and I want to store some data against it, and that value may be a string. So you go to the key, you store a value against it, and you retrieve the value in the same way.
Similar but a bit more sophisticated is the document store. It's very similar in that I've got a key and I want to store data against it, but in this case the data can be richer: typically it would be a JSON document, so similar to XML. If you compare it to a relational database, you have things called collections, which are like tables, but what's different is that every row in the table could have a different schema when you're using a document store. So it's very simple to use and set up because you don't have to define the schema, but obviously it can also be chaotic, as you don't have a schema telling you, or enforcing, what goes into the tables.
And then graph databases: I'm not going to delve into these much today, as this is something more specialized. You store data at the nodes and on the relationships between the nodes, so that's more of a special case.
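The difference between the first two store types can be illustrated with a minimal Python sketch. Plain dictionaries stand in for the stores here; the collection name `users`, the field names and the helper `fields_in_use` are illustrative assumptions, not part of any real product's API.

```python
import json

# Key-value store: an opaque value is stored against a key.
kv_store = {}
kv_store[42] = "some string value"   # store a value against an integer key
value = kv_store[42]                 # retrieve it with the same key

# Document store: a "collection" plays the role of a table, but each
# document (row) may carry its own set of fields; nothing enforces a schema.
users = {}  # hypothetical collection, keyed by document id
users["u1"] = {"name": "Fred", "url": "http://example.com/fred"}
users["u2"] = {"name": "Joe", "photo": "joe.png", "tags": ["a", "b"]}


def fields_in_use(collection):
    """Return the set of field names seen across all documents."""
    names = set()
    for doc in collection.values():
        names.update(doc.keys())
    return names


if __name__ == "__main__":
    print(json.dumps(users["u2"]))
    print(sorted(fields_in_use(users)))
```

The two documents in `users` share only the `name` field, which is the flexibility (and the potential chaos) described above.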
On the left-hand side here we can see some of the attributes that people like about NoSQL stores. Simple access patterns: applications very often just have some data and want to store it and then retrieve it, so nothing like joins, foreign keys, et cetera. People who use NoSQL stores usually accept that they have to compromise on data consistency in order to get the best possible performance, so you'll hear terms like eventual consistency: basically, when you write some data and someone else reads it, they may see something different from what you wrote. For some applications that's OK; for some it's not. An ad hoc data format: you don't have to define the schema ahead of time. And they tend to be very simple to operate, so getting up and running is very quick and you don't have to set millions of configuration parameters.
Conversely, what's still good about relational databases? You can have much richer queries: you can do joins across scores of tables, you have foreign keys, and you've got the safety of transactions, so your data is safe and you're going to read back what you've written. You have well-defined schemas, so you don't have the chaos of a document store, where the application could store completely random data in each document. And because they've been around a lot longer, you tend to have a much richer set of tools. So as we go through this scorecard, I'm going to try to convince you that with MySQL, and in particular MySQL Cluster, you can get all of those advantages of a NoSQL store while still having the benefits of a relational SQL database.
Just a slight diversion before we get into the technology, to show a couple of the problems that people very often hit when they think a NoSQL store is the right answer for them. I borrowed this from Sarah Mei, who has presented on a lot of this. She's one of the committers for the Diaspora social networking software, and for that project they were going with a document store; this is the process they went through before they eventually reverted and decided they had to go with a relational store instead.
This is a simplified view of the timeline: you have a user, the user has many friends, those friends have many posts, the posts have comments, etc. With the relational model we all know how to implement this: I create some tables, I have some foreign keys between those tables, and it fits in very nicely. But then people think: if I just want to render the timeline, I'm having to make all these trips to the database and do all of these joins, so it's inefficient; I'm having to read the data from lots of different places and lots of rows, and it's a pain. It would be better if I just stored the whole thing as a single data entity and read it all in one go. That's where the document store came in, because they could model the timeline as a single JSON document, and it was very efficient to read: with a single read you got the entire timeline. So that idea seemed to fit the document model well, where each document can have different pieces of data in it and each document is self-contained: you fetch the entire document and it's got everything you need inside it. But it's more complicated than that, because all of these entities shaded in red are actually people, and those people are themselves users; those users have their own sets of friends, their own timelines, etc.
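As a rough sketch of the relational version of that timeline, the following uses Python's built-in `sqlite3` module as a stand-in for any SQL database. The table names, column names and sample rows are assumptions made for illustration; they are not taken from the Diaspora schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE friendships (
    user_id   INTEGER NOT NULL REFERENCES users(id),
    friend_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (user_id, friend_id));
CREATE TABLE posts (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES users(id),
    body      TEXT NOT NULL);
CREATE TABLE comments (
    id        INTEGER PRIMARY KEY,
    post_id   INTEGER NOT NULL REFERENCES posts(id),
    author_id INTEGER NOT NULL REFERENCES users(id),
    body      TEXT NOT NULL);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "Me"), (2, "Fred"), (3, "Joe")])
conn.executemany("INSERT INTO friendships VALUES (?, ?)", [(1, 2), (1, 3)])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(10, 2, "Hello from Fred"), (11, 3, "Hello from Joe")])
conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?)",
                 [(100, 10, 3, "Nice post")])


def timeline(user_id):
    """Friends' posts with their comments: one query, several joins."""
    return conn.execute("""
        SELECT author.name, p.body, commenter.name, c.body
        FROM friendships f
        JOIN posts p          ON p.author_id = f.friend_id
        JOIN users author     ON author.id = p.author_id
        LEFT JOIN comments c  ON c.post_id = p.id
        LEFT JOIN users commenter ON commenter.id = c.author_id
        WHERE f.user_id = ?
        ORDER BY p.id, c.id
    """, (user_id,)).fetchall()


if __name__ == "__main__":
    for row in timeline(1):
        print(row)
```

Each person is stored once in `users`; posts and comments only refer to them by id, and the database checks those references.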
You may have other data that you store against them, say the URL of their home page, or you may have their photo. If you follow this through with the document model, you should be storing that data against every one of these red boxes, so basically you have the same data duplicated many, many times over in your document model. The next step shows what the JSON ends up looking like: what you really have to do, if you're going to do it in a proper document way, is expand Fred, and as well as including his name you include his URL, his photo, all of his posts, etc. So you end up with a lot of duplication of data and a huge amount of storage, and bear in mind that with some of the biggest document stores, to get any kind of performance out of them you have to have your entire working data set in memory, so when you duplicate data over and over that becomes very expensive.
Of course, you don't implement it like that. What people end up doing, rather than storing the users within the document, is storing a key that gets them to the user. Remember, though, that this is effectively the application implementing its own foreign keys and joins, but it has no foreign keys or joins in the database to help, and no transactions to help either, so it gets quite messy.
And then the final thing, just to make it a bit more ugly: in a document store, if we go back to how it looks here, you're actually consuming memory for each of these, effectively, column names. For every attribute, every time you store it you're storing the string "comments" as well as the actual ID of the person making the comment, so that's again very inefficient on memory. What people end up doing is going further: instead of having meaningful attribute names they just use very short ones. Then suddenly you're looking at this and thinking: do I really have that simple data model?
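The three problems just described (duplication when embedding, hand-written joins when referencing, and the cost of repeating field names) can be seen in a small Python sketch. All names and data below are hypothetical, and plain dictionaries stand in for the document store.

```python
import json

fred = {"name": "Fred", "url": "http://example.com/fred", "photo": "fred.png"}

# Option 1: embed the full person wherever they appear (heavy duplication).
embedded_timeline = {
    "user": "Me",
    "posts": [
        {"author": fred, "body": "First post",
         "comments": [{"author": fred, "body": "Replying to myself"}]},
        {"author": fred, "body": "Second post", "comments": []},
    ],
}

# Option 2: store only a key and keep the person in a separate collection.
users = {"u2": fred}
referenced_timeline = {
    "user": "Me",
    "posts": [
        {"author_id": "u2", "body": "First post",
         "comments": [{"author_id": "u2", "body": "Replying to myself"}]},
        {"author_id": "u2", "body": "Second post", "comments": []},
    ],
}


def resolve_authors(timeline, users):
    """Application-level 'join': nothing in the store validates the keys."""
    return [
        {
            "author": users[post["author_id"]]["name"],
            "body": post["body"],
            "comments": [
                {"author": users[c["author_id"]]["name"], "body": c["body"]}
                for c in post["comments"]
            ],
        }
        for post in timeline["posts"]
    ]


def copies_of(text, document):
    """Count how many times a string value is serialized in a document."""
    return json.dumps(document).count(json.dumps(text))


# Field names are stored with every document, so shortening them saves space
# at the cost of readability.
verbose = json.dumps({"comments": [], "author_id": "u2"})
terse = json.dumps({"c": [], "a": "u2"})

if __name__ == "__main__":
    print(copies_of(fred["url"], embedded_timeline), "copies when embedded")
    print(copies_of(fred["url"], referenced_timeline), "copies when referenced")
    print(len(verbose), "bytes verbose vs", len(terse), "bytes terse")
```

With references, a missing or stale `author_id` only shows up as a `KeyError` inside `resolve_authors`, which is the kind of check a foreign key would otherwise do in the database.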
Simple access patterns and the other things they were after had gone, and that's why that project then went and ripped out the document store and put a relational database back in.
OK, so MySQL Cluster: this is the database that probably best meets the attributes people want from NoSQL stores. It's real-time: by default all data is stored in memory. It's very scalable for both reads and writes. It has extremely high availability: five nines, so roughly five minutes of outage a year, and that includes maintenance operations, for example software upgrades, replacing hardware, changing the schema. There are lots of ways of getting to the data: you've got both SQL and NoSQL APIs. And, as with the rest of MySQL, it's open source, and it can run on commodity hardware, so there's no
need for shared storage. The architecture is quite different from MySQL with InnoDB, for example. With MySQL Cluster you have MySQL Servers, but they're only there to give you the SQL interface; the data is actually stored down in these data nodes, so no matter how you're accessing the data you always go to the same place and you see the same copy of the data. When you write a row, it's written to both of the data nodes in a pair, so if you lose one of those nodes it's not a problem, because the data has been synchronously written to the second. What that means for scalability is that you add more data nodes as you need more storage, and more MySQL Servers as you need more throughput, so it's a very simple, online scale-out story for Cluster. All of the partitioning of the data is completely transparent to the application; it doesn't have to do it.
In terms of high availability, every one of those processes that's now shaded in red could fail simultaneously and you've still got access to all of your data. So with all those processes failed you're still reading and writing data: you just need one surviving data node from each node group and one surviving API node to be able to get to the data.
Performance: we all know how to scale read performance with MySQL: you add lots of read slaves and you can scale out reads as much as you like. Where it gets tricky is when you want to scale writes, and that's something MySQL Cluster is very good at. I'm not going to go through the details of the benchmark; it was done a year and a couple of releases ago, scaling out on commodity hardware, and with 30 servers, so 30 data nodes, we were able to scale to 1.2 billion updates a minute. So if anyone ever tells you they have to go to a NoSQL data store because that's the only way they can get the write performance, just ask them what they're doing that needs more than a billion updates a minute.
Ease of use: this one comes down to a large number of tools. I'm not going to have time to go into the details of all the tools available, but we have, for example, a nice auto-installer. It's browser-based: you tell it which machines you want Cluster to run on, and the installer will go and query the machines, find out what resources there are, come up with a nice configuration, and then automatically deploy the database cluster. And all of the regular MySQL tools, MySQL Workbench for example, work with Cluster as well, so you've got a rich set of tools.
So at this point I think we're scoring pretty well against what you need from NoSQL databases: scalable, high-performance, highly available and very easy to use. It could be simpler, but it's got a lot better, especially with the auto-installer. And then the bonus is still being able to do joins: you can have a join that spans all of the data nodes, so a single join can cover your entire data set. You have ACID transactions, and again those ACID transactions can cross data nodes; it doesn't matter how the data is partitioned, the transactions still hold. The other attribute is being able to get to the data without using SQL. We love SQL, and there are lots of applications where it fits very well, but there are also a lot of applications where SQL isn't what people want.
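To make the data-node layout and the availability rule described above more concrete, here is a toy Python model. It is only a sketch under stated assumptions: the class `ToyCluster`, its methods and the CRC32-based partitioning are invented for illustration and say nothing about how MySQL Cluster actually hashes rows or handles node recovery. The last lines simply restate the benchmark figure per second and per data node.

```python
import zlib


class ToyCluster:
    """Toy model: data nodes are arranged in node groups of `replicas` nodes."""

    def __init__(self, node_groups=2, replicas=2):
        # storage[g][r] is the data held by replica r of node group g
        self.storage = [[{} for _ in range(replicas)]
                        for _ in range(node_groups)]
        self.alive = [[True] * replicas for _ in range(node_groups)]

    def _group_for(self, key):
        # Stand-in hash partitioning, hidden from the caller (an assumption).
        return zlib.crc32(str(key).encode()) % len(self.storage)

    def write(self, key, row):
        g = self._group_for(key)
        if not any(self.alive[g]):
            raise RuntimeError(f"node group {g} has no surviving data node")
        # The row goes to every surviving node in the group before returning.
        for r, up in enumerate(self.alive[g]):
            if up:
                self.storage[g][r][key] = row

    def read(self, key):
        g = self._group_for(key)
        for r, up in enumerate(self.alive[g]):
            if up:
                return self.storage[g][r].get(key)
        raise RuntimeError(f"node group {g} has no surviving data node")

    def fail(self, group, replica):
        self.alive[group][replica] = False

    def available(self):
        # All data is reachable if each node group has one surviving node.
        return all(any(group) for group in self.alive)


# Benchmark figure from the talk, restated per second and per data node.
UPDATES_PER_MINUTE = 1_200_000_000
DATA_NODES = 30
UPDATES_PER_SECOND = UPDATES_PER_MINUTE // 60
UPDATES_PER_SECOND_PER_NODE = UPDATES_PER_SECOND / DATA_NODES

if __name__ == "__main__":
    cluster = ToyCluster()
    cluster.write("maidenhead", {"postcode": "SL6"})
    cluster.fail(0, 0)
    cluster.fail(1, 1)  # one node lost in each group: still available
    print(cluster.available(), cluster.read("maidenhead"))
    print(UPDATES_PER_SECOND, round(UPDATES_PER_SECOND_PER_NODE))
```

The model leaves out API nodes, transactions and resynchronization of a restarted node; it only shows why losing one node per group does not lose data, while losing a whole group does.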
They want a simple way of getting to the data: for example, if you've got a Java application, you'd probably like to store the Java object rather than turning it into a set of relational tables. So we have a bunch of APIs. For getting to the data nodes, the wire protocol is wrapped in something we call the NDB API, a C++ library, so whenever anyone gets data from the data nodes it always goes through the C++ API and then over the wire to the data nodes. If you're using SQL, you have the MySQL Server, which uses that library, and then you have all the regular connectors sitting on top of that. If you're using Java, we have something called ClusterJ; if you're using JavaScript on Node.js, we have an API for that as well; and we also have a Memcached API. All of these different access methods are getting at the exact same data, and you can mix and match: for example, you could provision a user using SQL and then query that user's data using an HTTP request, and they're all going to the same data.
An example of one of those APIs is the Memcached API. Here we're doing it in the simplest way: if you can't be bothered defining any kind of schema, you can have a key, which in this case is the town, Maidenhead, and a value to store against it, which is the postal code. You just make that call to Memcached, and Memcached behind the scenes will write that to a MySQL Cluster table, and you basically have one large table that contains all of the keys and all of the values. So that's schemaless, if that's what you want. Or, if you prefer, you can set up some metadata and effectively say: whenever I see a key that's got the prefix "town", that's going to map onto this table, the key is going to be this column in the table and the value is going to be that column. So you can map the Memcached API onto an existing schema: you've built out the relational schema behind the scenes, but you get really quick Memcached key-value store reads and writes.
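The prefix-to-table mapping described above can be sketched in a few lines of Python. This is a toy model only: `sqlite3` stands in for the cluster, and `PREFIX_MAP`, `kv_set`, `kv_get`, the table names and the key format `town:<name>` are all assumptions for illustration rather than the actual configuration format or API of the MySQL Cluster Memcached interface.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE towns (town TEXT PRIMARY KEY, postcode TEXT);
CREATE TABLE generic_kv (k TEXT PRIMARY KEY, v TEXT);  -- schemaless fallback
""")

# Hypothetical mapping metadata: key prefix -> (table, key column, value column)
PREFIX_MAP = {"town:": ("towns", "town", "postcode")}
DEFAULT_TARGET = ("generic_kv", "k", "v")


def _route(key):
    """Pick the table/columns for a key and strip any matching prefix."""
    for prefix, target in PREFIX_MAP.items():
        if key.startswith(prefix):
            return target, key[len(prefix):]
    return DEFAULT_TARGET, key


def kv_set(key, value):
    (table, key_col, value_col), bare_key = _route(key)
    # Identifiers come from the trusted mapping above, never from user input.
    conn.execute(
        f"INSERT OR REPLACE INTO {table} ({key_col}, {value_col}) VALUES (?, ?)",
        (bare_key, value),
    )


def kv_get(key):
    (table, key_col, value_col), bare_key = _route(key)
    row = conn.execute(
        f"SELECT {value_col} FROM {table} WHERE {key_col} = ?", (bare_key,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    kv_set("town:maidenhead", "SL6")  # illustrative value
    print(kv_get("town:maidenhead"))
    # The same row is visible through SQL, because both paths share one table.
    print(conn.execute("SELECT town, postcode FROM towns").fetchall())
```

Keys without a known prefix fall through to the single generic key-value table, which corresponds to the schemaless mode mentioned first.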
One of the nice things about using these NoSQL APIs with MySQL Cluster is that you still have all of the safety of the relational database. For example, you still have ACID transactions and you still have foreign keys. This is an example where we're trying to delete a row, but we've got a foreign key defined on the underlying table. Even though we're going through JavaScript in Node.js and there's no SQL involved, we'll still enforce the foreign key, because foreign keys are implemented down in the data nodes and not in the MySQL Server. So you can have the APIs that you want, and you don't have to convert things into relational tables from the application's perspective, but you still have the power and the security of having the full relational database behind the scenes.
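The point that the constraint lives in the data layer, so every access path inherits it, can be shown with a small conceptual sketch in Python. Everything here is hypothetical: `DataLayer`, the two front-end functions and the county/town tables are stand-ins invented for illustration and do not correspond to the NDB API, ClusterJ or the Node.js connector.

```python
class ForeignKeyViolation(Exception):
    pass


class DataLayer:
    """Toy stand-in for the data nodes: the constraint is enforced here."""

    def __init__(self):
        self.counties = set()   # parent table
        self.towns = {}         # child table: town -> county (foreign key)

    def insert_county(self, name):
        self.counties.add(name)

    def insert_town(self, town, county):
        if county not in self.counties:
            raise ForeignKeyViolation(f"unknown county {county!r}")
        self.towns[town] = county

    def delete_county(self, name):
        if name in self.towns.values():
            raise ForeignKeyViolation(f"county {name!r} still has towns")
        self.counties.discard(name)


def object_api_delete(layer, county):
    """Hypothetical object-style front end: returns True if deleted."""
    try:
        layer.delete_county(county)
        return True
    except ForeignKeyViolation:
        return False


def kv_api_delete(layer, key):
    """Hypothetical key-value front end: returns a status string."""
    try:
        layer.delete_county(key)
        return "DELETED"
    except ForeignKeyViolation:
        return "NOT_DELETED"


if __name__ == "__main__":
    layer = DataLayer()
    layer.insert_county("Berkshire")
    layer.insert_town("Maidenhead", "Berkshire")  # illustrative data
    print(object_api_delete(layer, "Berkshire"))  # refused by the data layer
    print(kv_api_delete(layer, "Berkshire"))      # refused by the data layer
```

Neither front end contains any constraint logic of its own; both are refused for the same reason, which mirrors the behaviour described for the non-SQL APIs.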
That's it. There are some resources here: my blog is clusterdb.com, so there's a lot of stuff on MySQL Cluster there, and we've got a forum that's dedicated to MySQL Cluster. If you want to find out lots and lots about these issues, there's a brand-new MySQL Cluster training course from Oracle University as well, so that's something that will cover it. Any questions before we break? No? OK, thank you very much.