Storing Non-Scalar Data
This is a modal window.
The media could not be loaded, either because the server or network failed or because the format is not supported.
Formal Metadata
Title |
| |
Title of Series | ||
Number of Parts | 95 | |
Author | ||
License | CC Attribution 4.0 International: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/32284 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
| |
Keywords |
FrOSCon 201763 / 95
4
8
9
15
20
22
23
24
25
27
29
32
36
37
38
39
40
45
46
47
48
49
50
51
53
54
59
63
64
65
74
75
76
79
83
84
86
87
88
89
91
92
93
94
95
00:00
Open sourceFreewareDebuggerTwitterExtension (kinesiology)Scalar fieldString (computer science)Boolean algebraObject (grammar)Array data structurePoint (geometry)Data structurePurchasingSpeech synthesisRelational databaseField (computer science)WebsiteBitQuicksortRight angleSingle-precision floating-point formatObject (grammar)DatabaseArray data structureMappingPoint (geometry)Descriptive statisticsLine (geometry)CASE <Informatik>Scalar fieldElectronic mailing listNumberBoolean algebraCodeString (computer science)XMLComputer animation
01:25
Source codeElectronic program guideWeightVolumeData modelSingle-precision floating-point formatProduct (business)Web pageFormal languageDimensional analysisContent (media)ScalabilityBenchmarkDatabaseData typeTable (information)Data storage deviceRelational databaseJava appletBit rateDatabaseLink (knot theory)QuicksortMultiplication signLattice (order)CASE <Informatik>Entropie <Informationstheorie>Set (mathematics)ScalabilityLibrary catalogPresentation of a groupCategory of beingRight angleHigh availabilityMetropolitan area networkProduct (business)Query languageWeb pagePattern languageScalar fieldDifferent (Kate Ryan album)Sheaf (mathematics)BitField (computer science)Cellular automatonNumberBenchmarkAttribute grammarData typeComputer animation
03:55
DatabaseData typeOperations researchScalar fieldBinary fileString (computer science)Key (cryptography)Electronic mailing listSet (mathematics)Hash functionNormal (geometry)Element (mathematics)Query languageIntegerImplementationElasticity (physics)Complex (psychology)Open sourceLatent heatData structureFormal languageDevice driverGastropod shellUtility softwareLibrary (computing)Object (grammar)MiniDiscTime zoneUniqueness quantificationArray data structureClient (computing)EmailRevision controlContent (media)Event horizonOracleData typeDifferent (Kate Ryan album)Relational databaseDatabaseProcess (computing)Error messageGoodness of fitGroup actionMereologyCASE <Informatik>Content (media)BitLocal ringKey (cryptography)Open sourceInformationSet (mathematics)Category of beingString (computer science)Library catalogLatent heatNumberProduct (business)SpacetimeMultiplication signEndliche ModelltheorieExtension (kinesiology)Electronic mailing listCommunications protocolHash functionOrder (biology)Demo (music)Array data structureScalar fieldRevision controlException handlingData storage deviceFormal languageReading (process)FlagInteractive televisionService (economics)Operator (mathematics)Device driverWritingBit rateTime zoneQuicksortConnected spaceObject (grammar)Particle systemMoment (mathematics)Insertion lossSource codeLogicConfidence intervalTable (information)Speech synthesisRight angleState of matterWordRepresentational state transferDemosceneDistanceVideo gameAdditionCombinational logicCache (computing)Ultraviolet photoelectron spectroscopyElement (mathematics)Cellular automatonData structureSystem callPattern languageDialectAuthorizationSymmetry (physics)Field (computer science)MultisetMiniDiscJava applet1 (number)MultiplicationPoint (geometry)Binary codeComplex (psychology)Query languageComputer animation
13:44
EmailTime zoneContent (media)Data typeSearch engine (computing)Field (computer science)Subject indexingObject (grammar)Interface (computing)Elasticity (physics)Normal (geometry)TheoryForm (programming)DatabaseRelational databaseString (computer science)Data storage deviceLatent heatOperator (mathematics)Table (information)Query languageExtension (kinesiology)Similarity (geometry)Operator (mathematics)Office suiteCategory of beingLaptopRelational databaseData typeNear-ringStandard deviationReal-time operating systemMultiplication signData storage deviceSubject indexingElasticity (physics)InformationField (computer science)QuicksortWordLatent heatExtension (kinesiology)Set (mathematics)Key (cryptography)Beschränktheit <Mathematik>Goodness of fitMereologyRepresentational state transferWebsiteOrder (biology)Group actionInteractive televisionReading (process)DatabaseDefault (computer science)MappingCASE <Informatik>Object (grammar)Level (video gaming)Point (geometry)Right angleStreaming mediaOpen setProduct (business)Classical physicsDescriptive statisticsArtistic renderingRevision controlLibrary catalogFunctional (mathematics)String (computer science)Uniform resource locatorBitSlide ruleConfiguration spaceCuboidQuery languageMathematicsSoftwareSchmelze <Betrieb>MaizeSearch engine (computing)Library (computing)Table (information)Source codeMetreFocus (optics)Graph coloringNumbering schemeSocial classNumberDemosceneJava appletStaff (military)Image resolutionDispersion (chemistry)Statement (computer science)Scaling (geometry)Rule of inferenceEscape characterFehlererkennungElectronic mailing listSingle-precision floating-point formatForm (programming)Bit rateInternetworkingP-valueParsingDifferent (Kate Ryan album)Wage labourFormal languageComputer animation
22:59
Set (mathematics)Hash functionElectronic mailing listRange (statistics)Query languageDefault (computer science)Key (cryptography)View (database)Operator (mathematics)Axonometric projectionRegular graphLocal GroupComplex (psychology)Search engine (computing)E-textContent (media)Data typeBootingTable (information)QuicksortDatabaseQuery languageLocal ringDefault (computer science)Array data structureCuboidField (computer science)View (database)ResultantLatent heatGeometryCASE <Informatik>Electronic mailing listCombinational logicBit rateFinite differenceHash functionSubject indexingGroup actionKey (cryptography)Operator (mathematics)FluxRegulärer Ausdruck <Textverarbeitung>Selectivity (electronic)Set (mathematics)DialectProjective planeEqualiser (mathematics)Multiplication signInsertion lossRight angleMatching (graph theory)Category of beingStatisticsMereologyArrow of timeCountingScaling (geometry)Object (grammar)Table (information)Different (Kate Ryan album)Doubling the cubeBoolean algebraRootSquare numberRelational databaseData typeProduct (business)BitFrequencyMoment (mathematics)Series (mathematics)Service (economics)Message passingFormal grammarExpressionAreaDirection (geometry)Analytic setGreen's functionBeta functionScalar fieldData storage deviceComputer animation
29:48
Subject indexingData structureView (database)Function (mathematics)Level (video gaming)Key (cryptography)Query languageSingle-precision floating-point formatUniqueness quantificationAbelian categoryField (computer science)Array data structureE-textData typeTable (information)Glass floatMultiplicationTable (information)Key (cryptography)Query languageVolume (thermodynamics)Subject indexingSpacetimeRight angleFunctional (mathematics)Field (computer science)Medical imagingCategory of beingLevel (video gaming)DampingWordSet (mathematics)QuicksortUniqueness quantificationDefault (computer science)MereologyView (database)IntegerPointer (computer programming)Multiplication signRelational databaseFunction (mathematics)Operator (mathematics)Rule of inferenceElasticity (physics)Heegaard splittingMatching (graph theory)Data typeEqualiser (mathematics)Data structureBoolean algebraRevision controlMoment (mathematics)BitNeuroinformatikDatabaseOrder (biology)Data storage deviceFormal languageCASE <Informatik>NumberAdditionLatent heatString (computer science)Semiconductor memoryArray data structureExtension (kinesiology)Source codeMathematical analysisNatural languageDataflowNetwork topologyConfiguration spaceCuboidOffice suiteDialectComputer animation
37:30
Condition numberDatabaseACIDDifferent (Kate Ryan album)Data typeWordComputer programmingScripting languageRelational databaseMultiplication signDatabaseTrailBit rateAtomic numberClient (computing)Computer animation
38:11
Binomial coefficientOperations researchAtomic numberIntegerQueue (abstract data type)Electronic mailing listRevision controlDemo (music)EmailTime zoneContent (media)Data typeOperator (mathematics)Sinc functionComputer multitaskingDatabaseRelational databaseStandard deviationField (computer science)Category of beingCommunications protocolOperator (mathematics)Key (cryptography)Local ringSet (mathematics)Array data structureString (computer science)NumberCartesian coordinate systemData storage deviceRight angleRelational databaseContext awarenessNumbering schemeAtomic numberRevision controlQuicksortCombinational logicParameter (computer programming)2 (number)Replication (computing)Program slicingQueue (abstract data type)DatabaseMultiplication signTrailLatent heatInsertion lossInformationSlide ruleSimilarity (geometry)BitRouter (computing)Sound effectArithmetic meanForm (programming)Computer animation
43:42
DatabaseRelational databaseStandard deviationQuery languageError messageString (computer science)IntegerFile formatData storage deviceUniform resource locatorQuicksortRelational databaseField (computer science)Different (Kate Ryan album)Cartesian coordinate systemProcess (computing)Coma BerenicesSystem callNormal (geometry)Natural numberPresentation of a groupProduct (business)Library catalogCASE <Informatik>Data typeForm (programming)Right angleElasticity (physics)Pattern languageSubject indexingFitness functionDatabaseString (computer science)Latent heatFunctional (mathematics)BitValidity (statistics)Set (mathematics)FreewareInteractive televisionPlotterDistortion (mathematics)WebsiteArray data structureDiscounts and allowancesStatement (computer science)Particle systemSource codeMessage passingKey (cryptography)Numbering schemeSheaf (mathematics)Multiplication signConnected spaceConfiguration spaceMatching (graph theory)WaveLimit (category theory)Moment (mathematics)Local ringComputer animation
49:13
Query languageOpen sourceFreewareDifferent (Kate Ryan album)Data typeBackupDatabaseCore dumpSlide ruleFunctional (mathematics)Physical systemRight angleUniform resource locatorElasticity (physics)Parameter (computer programming)Streaming mediaMiniDiscSet (mathematics)CASE <Informatik>String (computer science)Reading (process)Food energyRule of inferenceSemiconductor memoryMultiplication signVarianceProgram slicingSemantics (computer science)Moment (mathematics)Array data structureLecture/Conference
51:52
Slide ruleFreewareOpen sourceEvent horizonINTEGRALPresentation of a groupSlide ruleElectronic mailing listComputer animationLecture/Conference
Transcript: English(auto-generated)
00:07
Welcome this morning. I'm Derek. I'm Dutch British long story and most people tend to know me because of PHP stuff. I wrote some things for that. I won't be talking about that whatsoever, but there might be an example that uses PHP
00:22
code in one of these slides. I like maps, I like beer, I like whiskey and you'll see those items coming back in the examples that I'm using. It's a bit, well besides the maps, the beer and the whiskey, it's a bit too early for that if you ask me. All right, so what are we going to talk about this morning? I'm going to talk about non-scalar data and first of all I'd like to sort of define
00:42
what I mean by this. Traditionally what you would store in a database you have like fields and you have values and those values tend to be a single value, right? So there can be strings or numbers or booleans or whatever thing you come up with there. But then you have non-scalar things which are things like arrays and nested objects, arrays of objects and so
01:01
on and so on. I'm sure some of you have in the past stored a comma-separated list of values in the database field at some point, right? Just raise your hand, yep. And there's people not paying attention or lying, always the case. So yeah, this is the thing, right? The non-scalar data aspect you see actually show up quite a bit more and in many relational databases you couldn't
01:23
traditionally do anything with that. So a few examples here, if you have an article you want to store some tags with that, right? Well you want to say that this article is about Java and PHP and databases then. So initially what you'd have to do in a relational database is you have to have this table that defines all your tags, then you have to define the table that stores
01:44
all your articles and then you have this link table that links the articles to the tags, right? So you get three tables for doing that. But that's not particularly very efficient because you need to do like a two table join any time you share an article. Now other examples where this becomes sort of important if you have things like a set of properties that you need to
02:03
store with something. Traditional example here is using like a big product catalog, right? So if you have a product catalog, most products will have a name in it, they will have a price in it, right? So those are the fields that are the same for almost all things. That wasn't me.
02:25
Nope, that wasn't me. I don't know. We'll keep an eye on that. So yes, you have this set of properties that are always going to be the same for a product and you have a set of properties that are going to be different. Like for books you have an altar and the number of pages, right?
02:41
Whereas for whiskey you'll have how strong it is and where it's from and how heavy it is for shipping and things and so on and so on. So there's a very different set of properties. Traditionally in a relational database how you would store those is you would store them all in very complex pattern called EAV which is entity attribute value
03:02
relationships. It works well but it is not very easy to find back things in the database when you look at it yourself and it's kind of a complicated thing where you need to multi-table joins to get all your properties out of it. Unless of course you normalize it but that's something in relational database you traditionally don't do, right?
03:22
So those use cases here are the sort of data things we're going to have a look at but not with books. So what this presentation is not about, I'm not going to talk much about scalability or high availability or benchmarks or I don't have the time to go in that covering all the mentioned technologies here because I probably need a whole day for that. I only have 57 minutes left.
03:44
All right, so the things we're going to look at is database type for non-scalar data. We're going to look at some query and manipulating data sections and we have a bit of a recap at the end. Let's get started with that. First of all, let's have a look at different database types. Now, you're probably familiar
04:01
with relational databases, MySQL, PostgreSQL, Oracle, IBM, DB2, MSSQL, one of those, right? Probably should know. But in the last, say, what, 15, 20 years, a whole bunch of other different types of databases that are also on the market, often put under the Gnome or NoSQL, which I don't
04:22
think is a particularly good name for them but basically non-relational database is probably a better way of describing that. So there's three groups I want to look at. There's many model groups. If you look on Wikipedia, there's probably 80 different categories that people come up with. That's, I mean, overdoing it a little bit, I think. But to look at the most simple thing that you
04:44
have here is usually a key value store. And a key value store works really simple. They tend to be used for caches because they are so simple. Lookups and storage is only one specific key. And you can't do operations
05:00
or anything else besides this key. A good example for this is Redis. And in addition, I forgot to mention point three here, is that the values that you associate with those keys, they don't have to be just single values, right? They can be lists or hashes, but I'll go back into those in a moment. So the keys
05:21
and values are binary safe strings. The values are safe strings, but the strings do a little bit more than just strings sometimes. But besides the single, the scalar value of strings, they also can store hashes, lists, sets, sorted sets, and some others, of which I'll show you a few in a moment. The interaction of it goes through a Redis CLI,
05:42
or a PRadis PRadis, which is a PHP extension to talk to Redis. The Redis protocol is particularly quite simple. It is still a binary protocol, so you do need a little bit of logic to talk to it. But the Redis CLI tool that comes with it is very simple, and I'll have an example of that in a moment. And the normal
06:02
strings are sort of special, though, because you can treat them as numbers, and I will then use them as numbers instead of just pure strings. So let's have a look at one of those data types. As I said before, in key-value storage, the only operations you can do on a key, right? So if you want to store, say, properties of a specific product, in this
06:24
case, the whiskey Glenfiddich, and I want to do something with the property tags, what I actually construct myself is this key. Convention dictates that you use a colon in between the different things in here. So we have the product category whiskey here, we have the name or the slug that we have here, Glenfiddich12, and then we have the
06:42
property tags. So you build this key up yourself because, as being a key-value store, there's no other way of doing that, really. There's no multi-column keys. And then with specific operators, you can add tags to this. So the first one, we do sadd, stands for set add, we add the tag fruity. And then the next one, we add vanilla to it, and then we add fruity once
07:03
more. But because there's a set of data, and the set only contains the same values and not duplicate values, we're not fruity again. I won't do anything with it because it's already part of the set. And then if you want to do things with spaces in there, you need to use double quotes, right? Then when you do a
07:21
query, you can do many different queries on sets. You can test whether the set contains a member. So when we set, well, if there's a tag PT set on this whiskey, return one or zero, and of course, because we hadn't set it, we get a zero back out of it, right? So it means false, not part of it. And of course, you can query all the members of the set. It is
07:42
probably wise if you sit on that side of the room. I'm just warning you. So yeah, those are the sets. Now, an additional thing to this is hashes, and they work in a little bit of a slightly different way because in a hash, you have key value pairs that you
08:01
store with a property. So to go back to the example that I have about a product catalog, if I want to store multiple properties, what I'm doing here, I'm storing for the whiskey Ben Nevis 19. I store in the props field. I will store the key value combination distillery Ben Nevis. So it says basically the distillery for this whiskey is Ben
08:21
Nevis. You can also do HM set, which sets for hash multi sets. Basically, I'm setting two properties here, right? The first key value pairs region Scotland Highlands, and then the second one is H19, and that's how you specify the command line. Of course, when you put things in a database, you can get things out of it again. So
08:41
you get the hashes out of it again by using hash get all, which gives you all the key value pairs that are in there in a array, starting with one. I don't know particularly why they chose that, but that's how that works. And of course, you can get values for specific properties as well. I also mentioned here that although H, we
09:00
stored like 19 as a number, it comes back as a string because the values are always going to be strings, with some exceptions. So that's what Nevis does. As an example of a key value store, and there's other examples of course as well. So slightly more complex data stores are the document data stores. They have often more richer data types. You
09:20
can often do operations on both keys and values. In many cases, when you see documentation for these document stores, all the examples will show you the data objects in JSON. That does not mean that those things are actually stored as JSON in the database, but it's often a way for communicating it, or at least visualizing the data that goes in and out of it. Examples of this
09:42
group are the ones that Lisa will be showing here. It's MongoDB, CouchDB, and Elasticsearch. But of course, there are dozens more than these three here. All right, so let's have a quick look at MongoDB. It's an open source product, just like all the others really, so it doesn't really particularly need mentioning. The documentation is also using
10:00
JSON, but it stores the something called Bson, which is binary JSON on disk. The interaction with those documents is often through language-specific data structures. If you are using Java, you build the documents of a Java builder kind of pattern. If you're using PHP, you can use PHP arrays and
10:20
objects to store the data, and the driver will automatically convert us to this Bson layer under the hood. It's not something you have to take care of yourself. The interaction, however, is a bit more complicated than with Redis because it is now a binary protocol, so you need an extension for languages to interact with that. In PHP, it's called MongoDB, but
10:41
there's drivers for any other language you can think of as well. The interaction is a bit more tricky, but it makes it easier to deal with from your language because it's a natural way of dealing with data in there. The documents are a bit more complex now because we have key-value combinations. At
11:03
least in MongoDB, there's always this underscore default, which is your primary key. It's immutable, so you can't change that. Then you have the whole bunch of properties, and I just want to point out a few. Like the words one is an array of words, and that is the native data type in there. It's an array of words that you can do
11:20
queries against even. Then batches is a bit more complicated as well because it's an array of objects again. Not only can you store just an array of things, you can store an array of those nested documents. This is basically the same about all the other document data source. They store something very similar, as you'll see in a moment. This is, I
11:43
think, the only PHP example that I have, but it should illustrate enough how this sort of works. If you want to insert something, you create a connection to a database and a table name, which in MongoDB we call collections. We insert two documents. As you can see, they're just simple two
12:00
PHP arrays that we insert. We treat them as key value pairs, and then with insert one, we insert one document, and with insert many, we do many. It's not particularly very difficult. In all the languages, it works very similar ways. Having a look at CouchDB, it's also an open source Apache product. The documents are JSON objects, and that is also
12:21
what you talk over the wire. CouchDB exposes a REST API, which you can talk to with curl, or if you want to use an HTTP peak line, there's things like Guzzle and all the things. It's not a binary protocol, simple, very much a REST API, which is handy because you don't have to write specific language drivers, but then it's also not a
12:41
binary protocol, so you end up having more data over the wire. That's a little different trade-off they have made. To have a look at the documents, as you can see, this is exactly the same document with one exception, is that every document, so having this unique ID, but I also have this underscore ref stands for revision, which is important. You'll see
13:01
later when we update documents. This is something that is generated by the database for you. The number one intersect increases every time you update the document, and then you have a hash describing the contents of the documents. That's CouchDB. Inserting in it is not very hard either. I forgot to
13:21
mention that both MongoDB and CouchDB, you don't necessarily have to create a database, they'll just get created for you without having to do anything, which is kind of handy. That's what I'm doing here. I'm posting in this demo collection. I'm posting with a unique key, which is in this case Derek at localhost as my underscore ID value, basically. I'm telling
13:40
that this is JSON, and then I give it a JSON string containing information. You can see here that the underscore ID field is not in here because you don't have to set that, and the return value that you got back includes this revision number, which is important to know. I'll show you why later. Elasticsearch, again, is a very different
14:00
thing. It is not so much a database, it is more like a full-text search engine, but it is also sort of a database because you can store documents in them. They use fancy words like near real-time text search, and basically what it means is that as a database, if you store new documents in it, it takes some time for the index to parse
14:20
this document and add them to the full-text search index. That is often really fast, but it isn't instantaneously, whereas in other databases, when you store something, you have it immediately available through the indexes. That's slightly different here. It is based on Lucene, just like Solar is, and Lucene is, well, I think originally a Java-based storage engine
14:42
that knows how to store full-text search information, basically, and as many other tools like Elasticsearch have been built on top of that, and probably one of the selling points of Elasticsearch is that it is very easy to cluster. Actually, it is so easy to cluster that you sometimes end up doing it without wanting to do it. About a year and a half ago, I was speaking in a user group
15:01
in Amsterdam, and I was borrowing the afternoon office of somebody, and I was giving a slightly different talk, but it involves Elasticsearch, and it connected my laptop to the network, and it started replicating somebody else's data, because they were also playing with Elasticsearch. I mean, it wasn't important data, but it started replicating it. Luckily,
15:20
they have now changed their default configuration, so that doesn't do that out of the box anymore. Sorry? I think it's changed. Yeah, it's changed now, yeah, absolutely. Yeah, you now need to configure your cluster key, I think they call that. That's not a problem anymore, but that time I thought that was quite funny. But it still illustrates that it is very easy to cluster, because you just start all the nodes
15:40
up in a network with the same configuration name, and it'll just start replicating, so that's really handy. So they don't, as I said, because it's not a traditional database, they don't really call them database and tables, they tend to call them indexes and types, and the type is basically a collection where there's lots of common fields, although you don't necessarily need to have the common fields, because they are being
16:00
those call solutions. Again, you interact with this with JSON objects, you can do it with the REST interface, or with some helper extensions or libraries, in this case there's it's Elastic Search PHP, which is something that the people from Elastic Search have written, and there's, it's Elastic Search documents,
16:21
do you see any differences? Nope, I don't either, so that's good. And you can insert us in, yeah, again, with cURL or REST API back into the database. The cURL URL is a more complicated, but other than that it's practically
16:40
the same thing, right, there's not much difference, yeah. Right. So one thing that Elastic Search doesn't really do is again, that doesn't mean, what it doesn't really do is update documents, anytime you do updates, it basically replaces the documents, and has three index the whole thing itself. right, so
17:00
we've spoken a bit about key-value storage, and we've spoken a bit about document data storage, but we also want to have a quick look at relational databases, because I thought all those documents stores with all those properties that can get kind of handy, and we want to have this functionality in a relational database, because lots of people already use them, right, it makes sort of sense to to out some of the properties from the document stores and
17:20
key-value stores as part of in this case where I'm showing you is MySQL and PostgreSQL. Of course, all the database will have different things, but let me just talk about these two. So, the first one to to have a look at has is MySQL, which has support for a JSON type. I call this very basic, because the interaction with it are very different
17:40
from what PostgreSQL allows you to do. They're also working on something called a MySQL document store, which in my opinion is not a great name, because the only thing it really allows you to do is it's like CRUD operation, create, remove, delete, delete. No.
18:00
Read. That's the that's what the air stunt but it's not both remove and delete, because that be a bit silly. so it's it's a very simple layer built on top of MySQL that allows you to interact like it as it was like a a store. I don't think it's been released yet, but this field goes really quickly, so
18:20
the slides might be out of dates a little bit, but I don't think it's been in a this has been in a release yet. So this JSON type, the interaction with this is through strings containing JSON, which is a little bit different than than what this the the rest API's for couch TV and elastic searches because you still
18:43
and the storage again is better than just storing the JSON data because that's kind of pointless to search and things do it right and the manipulation is with specific SQL operators they are usually luckily part of the standard so at least between MySQL and PostgreSQL are practically the same and
19:00
one of the properties that my phone and website said that the field order in these in the JSON type for the
20:45
for the hstore field I think I think the value can be in h store so you can use the key and put an h store value in there again that's probably true but I did not try that yes I did not try that okay I'll give that a shot
21:00
later I tried that already okay good okay I'll play with that um let's pretend we have only the simple set of two particular queries against that and what they style the sensor there's often a
21:20
field that is often not queried although you can index this for properties that are not necessarily part of every document so for example as I showed with the product catalog it could be those extra properties I do quite a lot of stuff with open stream map stuff and open stream map uses post classical and hood quite
21:42
a description of objects in there that create a style of the map but you can't necessarily store all the tags that objects can have because it's an unbound set of tags and the way how the rendering of those maps work is that for every tag that you put in their schema
22:00
in the rendering schema you have a specific column and of course you can't have an unbounded set of columns right
22:43
um and yeah we have now this richer well you might remember this this json document from before right because it's practically the same that i've inserted in both elastic search and couch b and mongat b right so it looks like this
23:00
so after inserting data we need to sort of query data so course database are sort of useful to get data back out of again um and yeah i already sort of showed you how to do with redis right you can in stats you can check for whether a key value is part of the set you can retrieve all the members
23:21
of a key the hashes you have h get all to get all the information out of key or you can do h and gets which you get a set of keys out of the hash and i didn't show you the lists but there's also like lists that you can set ranges from that you want to retrieve and stuff and that's all the data types too but there's
23:40
many more like this so that's how you query there or of course you can request the whole property by by its key so how to be queries are primarily don't through the underscore d field which is the default now it is possible to set up secondary index and secondary keys as well but you
24:00
do that to something called MapReduce so you need to define those views upfront it's clever enough that it doesn't have to recalculate a view every time you insert a document because it stores the keys any time you insert or update a document no doesn't do that it doesn't incremental update of the index once you carry it huh so the insert is
24:20
cheap right the first query is slow huh it updates incrementally the view i did not know that well thanks for that um so yeah if you do a look up on that you tend to do it only a primary key now i have to say couch to be is not the database in our best in here but that's she'd be all right
24:40
and mongo to be allows for doing queries without having setting up secondary keys up front so you can do those queries right away this is a simple query that you look at which is basically what we find i won't find won't again that wasn't me find out all the check-ins where the there we go
25:00
region slug matches Scotland island and a rating is larger or greater than three and we only want to store the whiskey a rating and age fields so there's this equality match there's this match with a query operator and there's many more than just gte stands for greater than or equal and there's a projection basically what it says is the following SQL that's what it sounds
25:20
like select risky rating and age from check-ins where regions lucky for Scotland isla and the rating is large or equal than to three smooth liking me this morning
25:44
another way of doing queries is something called an aggregation pipeline where you run different operations on a whole collection in a go so in this case what we're doing is well we want to find all the documents where the region flux start with Scotland it's a regular expression operator and then
26:00
we do group by by region but so basically this says find me all the whiskies in the region Scotland and then group them by specific sub region so something you get back out of this okay I said those things already and then you get something back like this so you have the id fields which is the name of
26:20
the region and then it has an array of the whiskeys that come back out of that so it's a simple group by that you do here and then of course in like a relational database it actually just returns the array here instead of doing a single record per single record per return basically all right so elastic
26:40
search has two different ways of doing queries you can do it on this primary key or you can do this on a field value combinations that's a simple way so here we're searching in users count 16 and you get a result back out of that but where elastic search really shines is of course it's full
27:00
text search capabilities right so you can construct queries like the following like it's a Boolean query and they have all the queries as well like just geo queries and whole bunch of others and it says the documents that I find should match the city equals pandering and and a match whiskey should be sherry and in this case they both need to match but
27:20
you can construct queries where there's an an or query or a Boolean not query and it's so much richer than I can show you here which is really really good for building full text search index like having a search box saying will give me all the documents all the products that match these keywords and stuff and so on um it
27:41
yeah that works really well for that querying JSON objects and MySQL I found a bit complicated because there's new SQL operators that I didn't really know upfront so if we have this table where we store the name of a whiskey and the properties in this case we store properties age cast strength and ABV
28:00
or colored so the last one isn't shared among the documents we can construct queries say select name and age from the whiskeys where cast strength equals true and that's what you get out of it right so cast strength in this one is true and what we want to get out
28:20
of it is the name of the whiskey which is there and then the age as a value I'm not sure why it showed only single arrow here expected to show to really in any case what I want to that is the root of your document then
28:41
you have the field name and if you have nested field names you can use another dot and then another field name to match against these squares the difference between arrow and double arrow is that this escapes the unescapes the json value whereas the single arrow does not in this case that doesn't show difference because it's scale value
29:00
but otherwise you would have seen that all right so the h story yeah you can also query a slightly small example here so what you can do is get a value for a key you can again use this arrow operator so find me for every document in the table find me whether the is admin field the value of these admin fields
29:20
and then you get that back you can find all the unique use account with again using finding from the h doc d underscore d fields which is there got local host where the count unique is larger than ten so then you find the documents there so you can do queries on these on these fields quite long you can also
29:40
check whether keys exists which of course you sort of need now and you can match against key value pairs as well so let's have a look at indexes so redis is primary only looks at the key the only keys you index that isn't quite true because
30:00
the data structures that you store for each keys like sets and lists they are also optimized in its in memory base index so that is sort of an additional index to it but isn't something that is exposed so this data structures can provide your song with additional indexing encounter be indexes
30:20
as I said there's by default you have the primary key being index and then you have the secondary keys that you can construct with MapReduce MapReduce is a basically way for every document that you have it will emit something key value pairs which you then can use to do queries again yeah it's kind
30:40
of tricky to explain but it is sort of how it works the output you get back from that is also sort of defined by the view because you define the sort order in there as well you construct a new document and the original document yeah it's or a modified document yes
31:01
so what i've done here is so those those secondary indexes you store in something i believe called a design document as part of the database and then you can do queries against those views so that's why the word design pops up when you use the query so i've only showed you really the map function basically what i'm doing here for every
31:21
risky documents that i have i emit something on based on the region slug and then a slug of the document when i do a query against this then against this design document in docs i do the view by region because that is what i named
31:43
because i admitted that and then the value erin because that's the one that comes out of the document and you do queries against that works quite well but you need to think up front very clearly how you on which bits of data you don't do good queries on which is something you don't really
32:00
need with a relational database with the caveat of course that if you do a query on something Think of it as a postgres computational index. I'll get back to the MySQL version of those in a moment.
32:21
So in MySQL you can't sort it's a different different word starting with an in MongoDB you can do indexes as I said already there on the on on the underscore default because that's unique primary key so that's on there you can set up of course indexes on on keys as well
32:40
as you can set indexes on nested fields you can do that in couch to be in the fields as well of course in PostgreSQL you can do this as well and what is kind of useful is that if you set an index on a key that has an array of values you can actually do an equality match against word so you can say
33:00
find me all the documents where words matches cadence hats for example and then
33:34
space but that isn't necessarily the best way of splitting up a string into words depending on what your subject
33:40
matter is you can set up a language and language rules then define how words are being stamped so if you want to in English you want to have the word walks and walk met at the same time you configure a stammer for English that allows you to then to match those words directly on top of each other you
34:00
can do stop words that are not part of your index so words like the and uh and and the kind of pointers in the in the index right because it's not something you're ever going to look up on and they take up lots of space in the index because it yourself some you can boost specific fields to make them more important at sets right
34:28
documents after you inserted them and post classical you have something called the gin index you can think about gin as a solution later if you want to i'd just like the image
34:40
so there are two types of indexes for jason b fields like key value indexes or value indexes actually i don't think of an example showing you how to create the second one but it shouldn't matter much so again we're inserting this whiskey in here we have a key value match after we create this index
35:00
and what a gin index in very simple words does it looks at all the key and value pairs and store them in an index so whenever you match against either of them it can find back the documents out of this in a quite a simple way what it doesn't really allow you to do is create an index on specific sub fields you
35:20
can only index the whole data type which is different than what CouchDB and MongoDB and to less extent elastic search too but in most cases it's pretty good right again you see this and present I don't know I don't know the name that operator but you can match against key value pairs again in here you
35:41
can also check whether just like in MongoDB where I showed you that the words have a value called album in that right so we have the words here where you have Glen album 25 so the gin index is a bit better than just collecting all the days like a full search index but it does this kind of things yeah good
36:08
from what I believe so yes that is the case yeah so I could probably build an index with a functional index where you yes a jason number cast into an integer
36:20
and put that as a key into an index yes so you could use you can do that but as I said I can't talk about all the functionalities in your databases but yeah so the annoying thing is that my scope doesn't do any sort of this whatsoever the only way how you can
36:44
to virtual generated columns so if you want to index on say the abv the outcome by volume what you have to do is alter my table whiskey other column abv float generated always being always there as properties abv so that plucks out this abv
37:00
field out of the properties the jason fields create a virtual column out of that and then you can put an index on this virtual column to index on whiskey abv which is the value i plucked out of it even though i hadn't stored that in the document itself it's not
37:22
you you can make this a materialized view but i don't believe you have to actually so yeah that is currently how my skull does that so the last thing i want to talk about quickly is manipulating data and the important word there in any database that you want to make sure that no
37:40
two programs or scripts or threats at the same time can update data traditionally relational databases have like this asset compliance kind of thing that is not something that no skull solutions often do but have different ways of of making sure that you don't update data incorrectly so
38:01
important words to remember is that you should never retrieve a very dangerous thing to do you get really loopy so how Redis does this it has like special operators that you can run so if this is the start of my document that starts so that we
38:20
have just properties again where the age is 19 you can run h inc by which increases the value of a specific property by the number that you specify so what this says is increase the property age in this key by one and here
38:40
Redis does something more than just strings because it knows how to add one to the string 19 and then you get it back as a string 20 no you do not get one that be mean I don't think it does hexadecimal actually so that that is how Redis does it it has specific operators to operate on this date and
39:00
basically doing a set add is a very similar thing that I showed you before right you can also do queues like using l push and l pop and that push things on the front of a array pop them off at the start of an array so you can implement the kind of queuing protocol with that as all all using this atomic
39:20
operation operations yeah yeah one more in couch to be it is say we already had this document with the key Derek at local center and we would do we want to store this again it tells you no can't do that because we already have a document with this key
39:40
which is kind of annoying because how would I update this document right because if I want to update I need to store it with the key again and so the way how you can do that is
40:00
this is that if you do this at the same time from two different threats only one is going to work you also know which one is going to work because it will tell you either the new revision number or it tells you the you got a conflict because the document that you want to store with a revision number you
40:20
thought it was it's no longer there because the first threat has completed can decide what to do out of that right it can decide to restore the document potentially probably overwriting the original save that has been made or just something
40:40
more complicated where it compares the documents and see which values have been updated and issued that as a new document and the revision is used by from couch before application yes sure yes yes the yes yes okay yep in a similar way when replication has its internal things
41:00
that work sort of in the same way so i should mention that oh this my second php example i forgot tonight it doesn't particularly matter basically what i'm doing here with find one is i find a document there car then update the data locally i said the steps
41:20
made key the key two thousand seventeen zero two zero two and then i alt seven thousand one twenty four and then i update it again with a document this is not an atomic operation right never ever do something like this so what you can do instead is you can use update operator so what this example says is that we're
41:40
going to find all the documents a person equals derek r which is hopefully unique and then we're going to increment the value that we have stored in steps made dot two zero one seven zero two zero two which is the dot operator just like in the jason field tips that i've showed you before is array deferencing
42:00
operator and we are adding seven thousand one hundred twenty four to that so this is one of the atomic operators that sort of like radis head with the h in combine right it's similarly um also if you want to the sets in to the field
42:20
tags we then add the value of the tag and again you don't get duplicates in it because it's a set you can also do is you can do push which is very similar to add the set but it doesn't care ready of duplicate values so what i'm doing here so i'm all in two tags mongo to be a next debug for example and then i have an extra argument to this which is called slice
42:41
minus two it basically says only keep the last two in the array so this sort of allows you to keep only the last 10 tags that i've stored and things like that as an atomic operator and of course most of
43:04
in a relation database you of course aware that you have to create your scheme up front right in most of the no skull solutions you don't really do that you don't really do that by configuring it in the database you do that by thinking about is really really hard and and then think about it once more
43:21
to make sure that the application knows how to deal with the data which is store my data in the most efficient way right you you create a schema according that you have no data duplication and so on and so now
43:40
I realize that that doesn't always work very doesn't always work as well as it as expected to do so you do denormalization now no skull solutions or document data stores you don't tend to do that what you tend to do is you create your
44:03
that often means that you end up storing some duplicate data but that's all right and you shouldn't be too afraid of that and it's it's a common I wouldn't call practice but it's not a bad thing anymore it's important that your
44:20
application can interact with your data in the most efficient way not necessarily how efficient is to store your data and you end up making trade-offs here right because some some database schemas are going to be really good for retrieving data really fast but it sometimes can be more cumbersome to update because you have this duplicated data and you'd have to update it in multiple
44:40
locations so this is something that you need to think of when designing a schema how that works now to look a little bit at schema validation is that most no skull solutions will not do anything for that it tends to be pushed to the application and that is also true for the relational databases
45:00
where you've aged or JSON JCB I'm sure that PostgreSQL has some functionality that you can enforce a little bit more on it but it isn't going to be as rich as enforcing that on on normal schemas in normal relational databases and in no skull databases is generally also not possible to do many people use ODM kind of patterns on top of
45:20
that to deal with this data I just want to show you MongoDB because it does have something like this it's like it you can actually configure per collection a way to enforce a schema this is a very basic way of doing it what you can control as that specific fields need to be of specific data types
45:40
or fit in specific values I couldn't find that but I believe you when you say that
46:02
it's a very basic way of doing this still I mean you can't for example say that we only want these fields in here you can't exclude fields from being stored in the data base and stuff like and so on and so but hopefully that's coming in the next release of MongoDB 3 6 where you it likely have full support for JSON schema
46:20
which is something I'm going to expect to see coming and post classical and on some of the other no solutions as all because it's such a nicer way of defining a schema for sort of free form data which is I think
46:42
the way non-scalar data is stored in the other database is quite different right in Redis it is stored in only sets or hatches in the document data stores it tends to be JSON documents in traditionally relational databases you have fields where you store JSON documents and you tend to interact with
47:00
that as they were strings all of those solutions are going to be valid all depending on what you want to do with your data right so if you already heavily invested in relational database like MySQL and PostgreSQL adding these JSON or JSON V or H store type is a valid way of doing things if you want a full text search index want to
47:20
search on natural data elastic search is going to you application would interact with this data like the products catalog that I showed you before a
47:40
document data store is going to be your best bet of picking there's lots of different things that you need to think by picking the right solution and it is a very difficult thing to say which one is best for which use case because it all depends on which data you're storing and how you interact with your
48:25
data really do updates of documents you can only replace the whole document for example whereas with both Redis and Relational database you can tend to plot or update specific fields but if you want to update adjacent b type field then
48:40
because this is a single field it is more difficult to update specific values of the sub fields in there although you can do things like that the process com so so yeah it is a very complicated things and what and what i hope you to show you here is more what all the different things are out there and sort of the difference between them
49:01
more as a as a hint for you to go look at them in more detail because as i said started presentation if i want to talk about all these things i probably need about eight hours which i don't have so
49:29
i don't believe they are and if they are they are going to be very minimal said that i would definitely expect maria to be to be further along adding
49:40
more of these features than my skull would be because yeah that's my feeling on this but i have no specific evidence to us okay
50:09
yeah yeah
50:22
but mongodb has the same thing right it has mongo done yeah comes with the database yeah elastic search has a tool to dump it no you can snap yeah but mongodb has that for example sorry yes arguments don't really work well for the for the streams
50:43
clearly you need to think about backups but there are different tools do that in a different way i only know how mongodb does this really well here because i'm most familiar with that there is a mongodb dump tool it comes yeah
51:00
but there are other ways of doing it as well right because the the no skill solutions tend to be more distributed systems what is also a common thing to see is that you have an extra node that is there just taking the backup and having backups of a distributed set is often less necessary not saying that you shouldn't do it you should always
51:20
have a backup but it's less necessary of doing as a dump to disk there's functionality for doing that but it's not what is made for right so you need to take those things into account or some that's
51:40
all about what I want to say about that anything else no all right well in that case I have one more slide that says thank you very much uh I'm going to upload the slides to this URL I will probably
52:00
some of the research things that I found while making a presentation if you have any questions also feel free to contact me I'm more than happy to answer questions or integrate comments if you have them in the presentation as on so with that's that thank you very much and enjoy the rest of your day