MySQL & NoSQL - Best of both worlds

Speech transcript
OK, we'll get started with the last session of the day. I'll try and keep it reasonably short so people can get on and hit the bottle, travel home, whatever you do. I'm Andrew Morgan, I'm part of the MySQL team at Oracle, and my focus there is on anything to do with high availability: MySQL replication, MySQL Fabric and MySQL Cluster, which is one of the technologies we're going to look at.

OK, so the topic of my talk is SQL and NoSQL: is it possible to get the best of both worlds? People typically think they need to decide: do I use a NoSQL store or do I use SQL? I'm going to try and demonstrate that you can actually get the best of both worlds, getting the attributes of NoSQL stores that people like without giving up on a relational SQL database.

This is not going to be new information to most people, but just to remind people of a couple of pieces of terminology for the different kinds of NoSQL stores. The first and simplest type is the key-value store, and the key-value store is exactly what it says. I have a key, which may be an integer, and I want to store some data against it; that value may be a string. So you've got a key, you store a value against it, and you retrieve the value using the same key.

Similar, but a bit more sophisticated, is the document store. It's very similar in that I've got a key and I want to store data against it, but in this case the data can be richer: typically it would be a JSON document, so similar to XML. If you compare it to a relational database, you have things called collections, which are like tables, but what's different is that every row in the table could have a different schema when you're using a document store. So it's very simple to use and set up because you don't have to define the schema, but obviously it can also be chaotic, as you don't have a schema telling you, or enforcing, what goes into the tables.

And then graph databases: I'm not going to delve into these much today. This is something more specialized, where you store data at the nodes and on the relationships between the nodes, so that's a bit more of a special case.
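The difference between the first two store types can be sketched with plain Python dictionaries. This is only an illustration of the data shapes described in the talk, not any particular product's API; the collection name, keys and field names are assumptions made up for the example.

```python
import json

# Key-value store: an opaque value stored against a key.
kv_store = {}
kv_store[42] = "some string value"   # store a value against the key
value = kv_store[42]                 # retrieve it with the same key

# Document store: a "collection" of JSON documents keyed by id.
# Unlike rows in a table, each document can have its own set of fields.
users = {}  # hypothetical collection
users["u1"] = json.dumps({"name": "Fred", "url": "http://example.com/fred"})
users["u2"] = json.dumps({"name": "Jane", "photo": "jane.png", "tags": ["a", "b"]})


def fields(collection, doc_id):
    """Return the sorted attribute names of one stored document."""
    return sorted(json.loads(collection[doc_id]).keys())


print(value)
print(fields(users, "u1"))
print(fields(users, "u2"))
```

Nothing stops the two documents from having different attributes, which is both the convenience and the potential chaos mentioned above.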
On the left-hand side here we can see some of the attributes that people like about NoSQL stores. Simple access patterns: applications very often just say, I've got this data and I want to store it, and I want to retrieve the data, so nothing like joins, foreign keys, et cetera. People who use NoSQL stores usually accept that they have to compromise on data consistency in order to get the best possible performance, and so you'll hear terms like eventual consistency: basically, when you write some data and someone else reads that data, they may see something different from what you wrote. For some applications that's OK, for some it's not. Ad hoc data format: you don't have to define the schema ahead of time. And they tend to be very simple to operate, so getting up and running is very quick and you don't have to set millions of configuration parameters, etc.

Conversely, what's still good about relational databases? You can have much richer queries, with joins across scores of tables. You have foreign keys. You've got the safety of transactions, so your data is safe and you're going to read back what you've written. You have well-defined schemas, so you don't have the chaos of a document store where the application could store completely random data in each document. And because they've been around a lot longer, you tend to have a much richer set of tools. So as we go through, there's a scorecard, and I'm going to try and convince you that with MySQL, and in particular MySQL Cluster, you can get all of those advantages of a NoSQL store while still having the benefits of a relational SQL database.

Just a slight diversion before we get into the technology, pointing out one of the issues that people very often hit when they think that a NoSQL store is the right answer for them. This is going to show a couple of problems that people tend to hit. I borrowed this from Sarah Mei; I attended a presentation where she presented on a lot of this. She's one of the committers for the Diaspora social networking software, and for that project they started out going with a document store. This is the process they went through, and then they eventually reverted and decided they had to go with a relational store instead.

So this is a simplified view of the timeline, where you have a user; a user has many friends, those friends have many posts, the posts have comments, etc. With the relational model we all know how to implement this: I'll create some tables, I'll have some foreign keys between those tables, and it fits in very nicely. But then people think: yeah, but if I just want to render the timeline, I'm having to make all these trips to the database, I'm having to do all of these joins, so it's inefficient, and I'm having to read the data from lots of different places and lots of rows. It's a pain. It would be better if I just stored the whole thing as a single data entity and read it all in one go. And that's why they went with the document store, because they could model that as a single JSON document, so it was very efficient to read the timeline: with a single read you got the entire timeline. So that idea seemed to fit the document model well, where each document can have different pieces of data in it and each document is self-contained: you fetch the entire document and it's got everything you need inside it.

But of course it's more complicated than that. All of these entities shaded in red are actually people, and those people are themselves users. Those users have their own set of friends, they have their own timelines, etc.
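The single-document timeline described above can be sketched with plain Python dictionaries. The sketch also shows what follows once the people in the timeline are full users in their own right: either their details are copied into every document that mentions them, or only their keys are stored and the application has to resolve them itself. All collection names, ids and field values here are illustrative assumptions, not data from the talk.

```python
import json

users = {  # hypothetical "users" collection, keyed by user id
    "u1": {"name": "Jane", "url": "http://example.com/jane"},
    "u2": {"name": "Fred", "url": "http://example.com/fred"},
}

# Embedded: one self-contained document, but Fred's profile is copied
# everywhere he appears.
embedded = {
    "owner": users["u1"],
    "posts": [{
        "author": users["u2"],
        "text": "Hello",
        "comments": [{"author": users["u2"], "text": "Anyone there?"}],
    }],
}

# Referenced: only user keys are stored in the timeline document.
referenced = {
    "owner": "u1",
    "posts": [{
        "author": "u2",
        "text": "Hello",
        "comments": [{"author": "u2", "text": "Anyone there?"}],
    }],
}


def render(timeline, user_collection):
    """Application-side 'join': look up every referenced user by key."""
    lines = []
    for post in timeline["posts"]:
        lines.append(f'{user_collection[post["author"]]["name"]}: {post["text"]}')
        for comment in post["comments"]:
            lines.append(f'  {user_collection[comment["author"]]["name"]}: {comment["text"]}')
    return lines


print(len(json.dumps(embedded)), len(json.dumps(referenced)))
print(render(referenced, users))
```

The referenced form is smaller, but `render` is doing by hand what a join would do, and nothing in this toy store prevents a user from being removed while timelines still point at that key.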
You may have other data that you store against them, say the URL of their home page; you may have their photo. So if you follow this through with the document model, you should be storing that data against every one of these red boxes, so basically you have the same data duplicated many, many times over in your document model. The next step is showing what the JSON would look like: what you really have to do, if you're going to do it in a proper document way, is actually expand Fred, and as well as including his name you include his URL, his photo, all of his posts, etc. So you end up with a lot of duplication of data and a huge amount of storage. And bear in mind that with some of the biggest document stores, to get any kind of performance out of them you have to have your entire working data set in memory, so obviously when you duplicate data over and over, that becomes very expensive.

But of course you don't implement it like that. What people end up doing is, rather than storing the users within the document, they store a key that gets them to the user. And bear in mind here that this is effectively the application implementing its own foreign keys and joins, but there are no foreign keys or joins in the database to help, and there are no transactions to help, so it gets quite messy.

And then the final thing, just to make it a bit more ugly: if we go back to how it looks here, in a document store you're actually consuming memory for each of these, effectively, column names. For every attribute, every time you store it, you're storing the string "comments" as well as the actual ID of the person making the comment, and so that's again very inefficient in memory. So what people end up doing is going further: instead of having meaningful attribute names, they substitute very short ones. And then suddenly you're looking at this and thinking: do I really have that simple data model, simple access patterns? The answer is often no, and that's why that project then went and ripped the document store out and put a relational database in.

OK, so, MySQL Cluster. This is the database that probably best meets the attributes people want from NoSQL stores. It's real-time: by default all data is stored in memory. It's very scalable for both reads and writes. It has extremely high availability, so five-nines availability, less than 5 minutes of outage a year, and that includes maintenance operations, for example software upgrades, replacing hardware, changing the schema, etc. There are two ways of getting at the data: you've got both SQL and NoSQL APIs. And like the rest of MySQL it's open source, and it can run on commodity hardware, so there's no need for shared storage.

The architecture is quite different from MySQL with InnoDB, for example. With MySQL Cluster you have MySQL Servers, but they're only there to give you the SQL interface. The data is actually stored down in these data nodes, so no matter how you're accessing the data you always go to the same place and you see the same copy of the data. Within the data nodes, when you write a row it's written to both of the data nodes in a pair, so if you lose one of those nodes it's not a problem, because the data has been synchronously written to the second.

What that means is that we get scalability: as you need more storage you add more data nodes, and as you need more throughput you add more MySQL Servers. So it's a very simple, online scale-out story for Cluster, and all of the partitioning of the data is completely transparent to the application; it doesn't have to do it.

In terms of high availability, every one of those processes that's now shaded in red could fail simultaneously and you've still got access to all of the data. So all those processes fail and you're still reading and writing data. You just need one surviving data node from each node group and one surviving API node to be able to get at the data.

Performance: we all know how to scale read performance with MySQL. You add lots of read slaves and you can scale out reads as much as you like. Where it gets tricky is when you want to scale writes, and that's something MySQL Cluster is very good at. I'm not going to go through the details of the benchmark, but this is a benchmark that was done a year and a couple of releases ago, scaling out on commodity hardware, and with 30 servers, so 30 data nodes, we were able to scale to 1.2 billion updates a minute. So if anyone ever tells you that they have to go to a NoSQL data store because that's the only way they can get the write performance, then just ask them: what are you doing that you need more than a billion updates a minute?

Usability: I'm not going to have the time to go into the details of all the tools available. But we have, for example, a nice auto-installer. It's a browser-based installer: you tell it which machines you want Cluster to run on, and the installer will go and query those machines, find out what resources there are, come up with a nice configuration, and then automatically deploy the database cluster. And then all of the regular MySQL tools, MySQL Workbench for example, work with this as well, so you've got a rich set of tools.

So at this point I think we're scoring pretty well against what you need from NoSQL databases: scalable, high-performance, highly available, and very easy to use. It could be simpler, but it's got a lot better, especially with the auto-installer. And then there's the bonus of still being able to do joins, so you can have a join that spans all of the data nodes, a single join covering your entire data set. You have ACID transactions, where again those ACID transactions can cross data nodes, so it doesn't matter how the data is partitioned.

The other attribute is being able to get at the data without using SQL. We love SQL, and there are lots of applications where it fits very well, but there are also a lot of applications where people want a simple way of getting to the data. For example, if you've got a Java application, you'd probably like to store the Java object rather than turning it into a set of relational tables. So we have a bunch of APIs. For getting to the data nodes, the wire protocol is implemented in something that we call the NDB API client library, so whenever anyone is getting data from the data nodes it always goes through the C++ API and then over the wire to the data nodes. If you're using SQL, then you have the MySQL Server that uses that library, and then you have all the regular connectors sitting on top of that. But if you're using Java, then we have something called ClusterJ; if you're using JavaScript on Node.js, then we have an API for that as well; and we also have a Memcached API. All these different access methods are getting at the exact same data, and you can mix and match. So, for example, you could provision a user using SQL and then query that user's data using an HTTP request, as they're all going to the same data.

As an example, one of those APIs is the Memcached API. Here we're doing it in the simplest way: if you can't be bothered defining any kind of schema, you can have a key, which in this case is the town, Maidenhead, and a value to store against it, which is the postal code. You just make that call to Memcached, and Memcached behind the scenes will write that to a MySQL Cluster table, and you basically have one large table that contains all of the keys and all of the values. So it's schemaless, if that's what you want. Or, if you prefer, you can set up some metadata and effectively say that whenever I see a key that's got the prefix "town", that's going to map onto this table, the key is going to be this column in the table, and the value is going to be that column. So you can map that Memcached API onto an existing schema, and so have the relational schema behind the scenes but still get really quick Memcached key-value store reads and writes.

One of the nice things about using these NoSQL APIs with MySQL Cluster is that you still have all of the safety of the relational database. For example, you still have ACID transactions, and you still have foreign keys. This is an example where we're trying to delete a row, but we've got a foreign key defined on the underlying table. So even though we're doing it in JavaScript in Node.js and there's no SQL involved, we'll still enforce the foreign key, because the foreign keys are implemented down in the data nodes and not in the MySQL Server. So you can have the APIs that you want, and you don't have to convert things into relational tables from the application's perspective, but you still have the power and the security of having the full relational database behind the scenes.
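The idea of mapping key-value calls onto an existing relational schema, with constraints still enforced underneath, can be sketched in a few lines of Python. This is a rough model only: `sqlite3` is used purely as a stand-in for the shared data layer, and it is not MySQL Cluster or its Memcached API. The `PREFIX_MAP` structure, the `town:` key format with a colon separator, the `residents` table and the value `SL6` are all assumptions invented for the illustration.

```python
import sqlite3

# Stand-in data layer: an in-memory SQLite database with a foreign key.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
db.execute("CREATE TABLE towns (town TEXT PRIMARY KEY, postcode TEXT)")
db.execute(
    "CREATE TABLE residents ("
    " name TEXT PRIMARY KEY,"
    " town TEXT REFERENCES towns(town))"
)

# Hypothetical metadata: key prefix -> (table, key column, value column).
PREFIX_MAP = {"town:": ("towns", "town", "postcode")}


def _resolve(key):
    """Split a prefixed key into its mapped table, columns and bare key."""
    for prefix, (table, key_col, val_col) in PREFIX_MAP.items():
        if key.startswith(prefix):
            return table, key_col, val_col, key[len(prefix):]
    raise KeyError(f"no mapping for key {key!r}")


def kv_set(key, value):
    """Key-value style write that lands in the mapped table."""
    table, key_col, val_col, k = _resolve(key)
    # Identifiers come from the trusted mapping; values are bound parameters.
    cur = db.execute(f"UPDATE {table} SET {val_col} = ? WHERE {key_col} = ?", (value, k))
    if cur.rowcount == 0:
        db.execute(f"INSERT INTO {table} ({key_col}, {val_col}) VALUES (?, ?)", (k, value))


def kv_get(key):
    """Key-value style read; returns None when the key is absent."""
    table, key_col, val_col, k = _resolve(key)
    row = db.execute(f"SELECT {val_col} FROM {table} WHERE {key_col} = ?", (k,)).fetchone()
    return row[0] if row else None


def kv_delete(key):
    """Key-value style delete; the data layer may still reject it."""
    table, key_col, _, k = _resolve(key)
    db.execute(f"DELETE FROM {table} WHERE {key_col} = ?", (k,))


kv_set("town:Maidenhead", "SL6")                                   # key-value write
print(db.execute("SELECT town, postcode FROM towns").fetchall())   # same row seen via SQL
db.execute("INSERT INTO residents VALUES ('Fred', 'Maidenhead')")
try:
    kv_delete("town:Maidenhead")   # a resident still references this town
except sqlite3.IntegrityError as exc:
    print("delete rejected:", exc)
print(kv_get("town:Maidenhead"))
```

One difference from the talk is worth keeping in mind: in this sketch the key-value functions issue SQL underneath, whereas the talk describes NoSQL APIs that reach the data nodes without going through SQL at all. The point being illustrated is only that the constraint lives with the data, so the access path does not get to bypass it.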
So those are the advantages. There are some next-step resources here. My blog is clusterdb.com, so there's a lot of stuff on MySQL Cluster on there, and we've got a forum that's dedicated to MySQL Cluster. And if you want to find out lots and lots more about this, there's a brand new MySQL Cluster training course from Oracle University as well. So that's everything I was going to cover. Any questions before we break? Thank you very much.

Metadata

Formal metadata

Title MySQL & NoSQL - Best of both worlds
Alternative title Mysql And Friends - Mysql Nosql
Series title FOSDEM 2015
Author Morgan, Andrew
License CC Attribution 2.0 Belgium:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided you credit the author/rights holder in the manner they specify.
DOI 10.5446/34443
Publisher FOSDEM VZW
Publication year 2016
Language English
Production year 2015

Content metadata

Subject area Computer science
