Storing Non-Scalar Data

Video in TIB AV-Portal: Storing Non-Scalar Data

Formal Metadata

Storing Non-Scalar Data
Title of Series
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
In this presentation we will look at storing complex data in a single field. Many NoSQL solutions are built around this (such as Redis' lists, sets and hashes, or MongoDB's and CouchDB's records), and many relational databases now also support storing complex data in a single field through specific data types (such as PostgreSQL's JSONB or hstore, or MySQL's JSON). Each database engine supports different things and handles these data types in different ways. In this session we compare the different approaches to storage, indexing and interaction with these data types in different databases.
Keywords Databases

Welcome this morning. I'm Derick, I'm Dutch [unclear], long story. People tend to know me for my work on PHP itself and Xdebug, and I will not talk about that whatsoever, though there might be an example with PHP. Besides that, I like maps, I like beer and I like whisky, and you'll see those items coming back in the examples that I'm using.
So what I am going to talk about is non-scalar data, and first of all I'd like to sort of define what I mean by this. Traditionally, what you'd store in a database is fields that have values, and those values tend to be a single value. They can be strings, numbers, booleans, whatever you come up with. But then we have non-scalar things, which are things like arrays, objects, arrays of objects and so on. I'm sure you have in the past stored a comma-separated list of values in a database at some point, right? [unclear] Non-scalar data is something you see showing up quite a bit more, and in many relational databases it is something you couldn't traditionally do anything with.
A few examples here. If you have an article and you want to store tags with it, where you say this article is about Java and PHP and databases, then traditionally in a relational database you'd have a table that defines all the tags, then a table that stores all the articles, and then a linking table that links the articles to the tags. So you get three tables for doing that, and that's not particularly efficient, because you need to do a multi-table join any time you want to read them.
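A minimal sketch of the three-table layout just described, using Python's built-in sqlite3 module as a stand-in for any relational database. The table names, column names and sample rows are assumptions made for illustration; they are not taken from the slides.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    -- Linking table: one row per (article, tag) combination.
    CREATE TABLE article_tag (
        article_id INTEGER REFERENCES article(id),
        tag_id INTEGER REFERENCES tag(id),
        PRIMARY KEY (article_id, tag_id)
    );
    INSERT INTO article VALUES (1, 'Storing Non-Scalar Data');
    INSERT INTO tag VALUES (1, 'java'), (2, 'php'), (3, 'databases');
    INSERT INTO article_tag VALUES (1, 1), (1, 2), (1, 3);
""")


def tags_for(article_id):
    """Reading the tags of one article already needs a join across all three tables."""
    rows = conn.execute(
        """SELECT t.name
           FROM article a
           JOIN article_tag lnk ON lnk.article_id = a.id
           JOIN tag t ON t.id = lnk.tag_id
           WHERE a.id = ?
           ORDER BY t.name""",
        (article_id,),
    ).fetchall()
    return [name for (name,) in rows]


article_tags = tags_for(1)
```

With a non-scalar field, the same information could live in a single array value on the article itself, which is the idea the rest of the talk explores.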
Another example where this becomes sort of important is if you have a set of properties that you need to store with something. The traditional example here is a big product catalog. In a product catalog most products have a name and a price, so those are fields that are the same for almost all of them. [unclear] So there is a set of properties that is always going to be the same, but there is also a set of properties that is going to be different. For books you have an author and the number of pages, whereas for whisky you'll have a distillery, how strong it is, how heavy it is for shipping, and so on: a very different set of properties. Traditionally, in relational databases this was stored in a fairly complex pattern called EAV, entity-attribute-value. It works, but it is not very easy to find things back in the database, and if you look at it yourself it is kind of a complicated thing: you need multi-table joins to get all the properties of a product.
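To make the entity-attribute-value pattern concrete, here is a small sketch, again using sqlite3 as a stand-in relational database. The schema, product names and attribute names are hypothetical. Note that every attribute value ends up stored as text, and that finding a product by one of its properties needs a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Fields shared by every product live in the main table.
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    -- Everything else becomes one row per (entity, attribute, value).
    CREATE TABLE product_attribute (
        product_id INTEGER REFERENCES product(id),
        attribute TEXT,
        value TEXT,
        PRIMARY KEY (product_id, attribute)
    );
    INSERT INTO product VALUES (1, 'Some book', 12.5), (2, 'Some whisky', 45.0);
    INSERT INTO product_attribute VALUES
        (1, 'author', 'A. Writer'), (1, 'pages', '320'),
        (2, 'distillery', 'Example Distillery'), (2, 'strength', '46');
""")


def load_product(product_id):
    """Reassemble one product from the fixed columns plus its attribute rows."""
    name, price = conn.execute(
        "SELECT name, price FROM product WHERE id = ?", (product_id,)
    ).fetchone()
    props = dict(conn.execute(
        "SELECT attribute, value FROM product_attribute WHERE product_id = ?",
        (product_id,),
    ))
    return {"name": name, "price": price, "properties": props}


def find_by_property(attribute, value):
    """Looking a product up by a property requires joining the two tables."""
    rows = conn.execute(
        """SELECT p.name
           FROM product p
           JOIN product_attribute pa ON pa.product_id = p.id
           WHERE pa.attribute = ? AND pa.value = ?""",
        (attribute, value),
    ).fetchall()
    return [name for (name,) in rows]


book = load_product(1)
whisky = load_product(2)
```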
[unclear] So those use cases are the sort of data that I'm going to have a look at.
But there is a lot that this presentation is not about. I'm not going to talk much about scalability, high availability or benchmarks; I don't have the time to go into that. I'm also not covering all of the mentioned technologies in depth, because I'd probably need all day for that and I only have 57 minutes left. The things we are going to look at are database types, non-scalar data types, querying and manipulating data [unclear], and then we have a bit of a recap.
Let's start then. First of all, let's have a look at different database types. You are probably familiar with the relational type of databases: MySQL, PostgreSQL, Oracle, IBM DB2 and MS SQL. All of those are the traditional ones.
In the last, say, 15 to 20 years a whole bunch of other types of database have also started showing up, [unclear] known as NoSQL, which I don't think is a particularly good name for them; "non-relational databases" is probably better at describing them.
There are three groups I want to look at. There are many smaller groups, [unclear] something like 18 different categories, but that is overdoing it a little bit, I think. The most simple thing that you have is usually a key-value store. A key-value store works really simply. They tend to be used for caches because they are so simple: lookups and stores are only done on one specific key, and you can't do operations on anything else besides the key. A good example of this is Redis. In addition, I have to mention for Redis that the values you associate with keys don't have to be just single values; they can be lists or hashes, and I'll come back to that.
The keys in Redis are binary-safe strings, and the values are the same binary-safe strings. [unclear] Besides single scalar string values, it can also store hashes, lists, sets and sorted sets, a few of which I'll show you in a few moments. Interaction goes through a library, for example phpredis, which is a PHP extension to talk to Redis. The Redis protocol is pretty simple, but it is still a binary protocol, so you do need a little bit of logic to talk to it. There is also the Redis CLI tool, which gives you a very simple way to talk to it. The normal strings are sort of special as well, because you can treat them as numbers, and Redis will then use them as numbers instead of just pure strings.
Let's have a look at one of the data types, sets. As I said before, in a key-value store you can only do operations on a key. So if I want to store properties of a specific product, in this case a whisky [unclear], and I want to do something with the property "tags", I have to construct the key myself. Convention dictates that you use a colon between the different parts: here we have the product category, whisky, then the name [unclear], and then the property, tags. You build this key yourself because, as it is a key-value store, there is no other way of doing it; there are no multi-column keys. Then there are specific operators. You can add tags to the set, and the command for that is SADD, set add. We add the tag "fruity", then the next one [unclear], and then "fruity" once more. Because this is a set, and a set only contains unique values and no duplicates, the last one won't do anything, because the value is already part of the set.
Then you can do a few different queries. You can test whether the set contains a member: is the tag "peaty" set for this whisky? That returns 1 or 0. In this case we get a 0, which means false: it is not part of the set. And of course you can query all the members of the set, which is probably not wise if your set is large; I'm just warning you. So those are the sets.
An additional thing to this is hashes. They work in a similar but slightly different way, because in a hash you store key-value pairs. To go back to the product catalog example, if I want to store multiple properties, what I'm doing here is HSET: for the whisky Ben Nevis 19, in the "props" field, I store the key "distillery" with the value "Ben Nevis", which basically says that the distillery for this whisky is Ben Nevis. You can also do HMSET, which is hash multi-set, basically setting two properties at once: the first key-value pair is the region [unclear], and the second one is the age, 19. Of course, if we put things in the database we can get things out again. You get the hashes out by using HGETALL, which gives you all the key-value pairs that are in there [unclear]. And of course you can get values for specific properties as well. I should mention here that although we stored 19 as a number, it comes back as a string; those are always going to be strings, with some exceptions.
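The set and hash behaviour described above can be modelled in a few lines of Python. This is only an in-memory stand-in that mimics the reply shapes of the Redis commands named in the talk (SADD, SISMEMBER, SMEMBERS, HSET, HGET, HGETALL); it does not talk to a Redis server, and the class name, key parts and sample values are made up for illustration.

```python
class TinyKeyValueStore:
    """In-memory stand-in that mimics the replies of a few Redis commands."""

    def __init__(self):
        self._data = {}

    def sadd(self, key, *members):
        target = self._data.setdefault(key, set())
        before = len(target)
        target.update(str(m) for m in members)
        return len(target) - before  # number of members actually added

    def sismember(self, key, member):
        return 1 if str(member) in self._data.get(key, set()) else 0

    def smembers(self, key):
        return set(self._data.get(key, set()))

    def hset(self, key, field, value):
        target = self._data.setdefault(key, {})
        is_new = field not in target
        target[field] = str(value)  # values are kept as strings
        return 1 if is_new else 0

    def hget(self, key, field):
        return self._data.get(key, {}).get(field)

    def hgetall(self, key):
        return dict(self._data.get(key, {}))


def make_key(*parts):
    """Build a key by joining its parts with colons, as the convention suggests."""
    return ":".join(parts)


store = TinyKeyValueStore()

tags_key = make_key("product", "whisky", "example-malt", "tags")
added_first = store.sadd(tags_key, "fruity")
added_second = store.sadd(tags_key, "nutty")
added_again = store.sadd(tags_key, "fruity")  # duplicate, so nothing is added

props_key = make_key("product", "whisky", "example-malt", "props")
store.hset(props_key, "distillery", "Example Distillery")
store.hset(props_key, "age", 19)  # stored as a number by the caller...
age = store.hget(props_key, "age")  # ...but read back as a string
```

The point of the model is the two behaviours the talk highlights: adding a duplicate member to a set is a no-op, and a number written into a hash comes back as a string.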
So that is Redis, as an example of a key-value store. There are other examples, of course. Slightly more complex data stores are document stores. They have richer data types, and you can often do operations on both keys and values. In many cases, when you see documentation for these document stores, all the examples will show you the data objects as JSON. That does not mean that those things are actually stored as JSON in the database; it is just a way of communicating or visualizing the data that goes in. Examples of this group, the ones I'll be showing, are MongoDB, CouchDB and Elasticsearch, but of course there are more than these three.
The first one we look at is MongoDB. It's an open-source product, just like all the others [unclear]. The documentation uses JSON, but it stores the documents in something called BSON, which is binary JSON. The interaction with those documents is through language-specific data structures. If you are using Java you build [unclear] objects; using PHP you can use PHP arrays and objects to store data, and the driver will automatically convert these to BSON, so that is not something you have to take care of yourself. The interaction, however, is a bit more complicated than with Redis, because the protocol is not as simple [unclear], so you need extensions for languages to interact with it; in PHP it is called mongodb. But there are drivers for any language you can think of. So the interaction is a bit trickier, but it is easy to deal with from your language, because it's a natural way of dealing with data.
The documents are a bit more complex now, because we have key-value combinations. In MongoDB at least, there is always this _id field, which is the primary key; it is immutable, so you can't change it. And then there is a whole bunch of properties. I want to point out a few: "words" is an array of words, and that is a native data type; it's an array of words that you can do queries against. And then there is one that is a bit more complicated, because it's an array of objects, the check-ins. So not only can you store an array of scalars, you can also store an array of objects. This is basically the same for all the other document stores. So how do you store something in MongoDB? This is, I think, the only PHP example that I have, but it should illustrate how it works. To insert something, you create a connection to a database and a table name, which in MongoDB we call a collection, and we insert two documents. As you can see, they are just simple PHP arrays that you insert, written as key-value pairs. [unclear] In all the other languages it works very similarly.
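As a rough illustration of the document shape described above, here is a Python dictionary with an array of scalars and an array of objects. The field names and values are hypothetical, since the slide itself is not part of the transcript, and JSON is used here only as a way to visualize and round-trip the structure; as the talk notes, MongoDB itself stores BSON.

```python
import json

# Hypothetical document: every field name and value here is made up.
document = {
    "_id": "example-malt",
    "name": "Example Malt",
    "age": 19,
    "words": ["fruity", "nutty"],  # an array of scalar values
    "checkins": [                  # an array of objects
        {"user": "alice", "rating": 4},
        {"user": "bob", "rating": 3},
    ],
}

# Serialize to JSON text and back to show that the nested structure survives.
wire_form = json.dumps(document)
restored = json.loads(wire_form)

# Nested values are reached with ordinary language-level indexing.
ratings = [checkin["rating"] for checkin in restored["checkins"]]
```

A real driver would accept a structure like this directly and take care of the conversion to the storage format.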
Then a look at CouchDB. It's also open source, an Apache product. The documents are JSON objects, and that is also what you talk over the wire. CouchDB exposes a REST API, which you can talk to with curl on the command line [unclear]. So there is no binary protocol; it's a simple REST service, which is handy because you don't have to write specific language drivers. But because it's not a small binary protocol, you end up sending more data over the wire. So this is a different trade-off that was made.
Let's have a look at the documents. As you can see, this is exactly the same document; the only exception is that every document, besides having this unique _id, also has this _rev, which stands for revision. That is important, as we'll see later when we update documents. It is something that is generated by the database for you: the number at the start increases every time you update the document, and then you have a hash describing the contents of the document. Now inserting into CouchDB. I forgot to mention that with both MongoDB and CouchDB you don't necessarily have to create a database first; it just gets created for you without you having to do anything. So what I'm doing here is POSTing to this demo collection, posting to the URL, which in this case is [unclear] at localhost, basically telling it that this is JSON, and then I give it the JSON string containing the information. You can see here that the _id is already filled in [unclear]. The return value that you get back includes this revision number, which is important to know; I'll show you why later.
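The revision scheme can be sketched with a small in-memory model. This is not CouchDB's actual implementation: the class name is hypothetical, and the digest is simply a SHA-256 over the JSON text, chosen as a stand-in for whatever hash the database computes. What the model does reflect is the "number-hash" shape of the revision described above, and the fact that CouchDB rejects an update that does not carry the document's current revision.

```python
import hashlib
import json


class Conflict(Exception):
    """Raised when an update does not carry the current revision."""


class TinyRevisionedStore:
    """In-memory model of documents that carry a 'generation-digest' revision."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current["_rev"] if current else None
        if rev != current_rev:
            raise Conflict(f"expected revision {current_rev!r}, got {rev!r}")
        # The leading number goes up by one on every successful write.
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        # Stand-in digest of the contents (assumption, not CouchDB's algorithm).
        text = json.dumps(body, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(text).hexdigest()[:32]
        new_rev = f"{generation}-{digest}"
        self._docs[doc_id] = {**body, "_id": doc_id, "_rev": new_rev}
        return {"ok": True, "id": doc_id, "rev": new_rev}


store = TinyRevisionedStore()
created = store.put("example-malt", {"name": "Example Malt", "age": 19})
updated = store.put(
    "example-malt", {"name": "Example Malt", "age": 20}, rev=created["rev"]
)

# Re-using the old revision models a client working from stale data.
try:
    store.put("example-malt", {"name": "Example Malt", "age": 21}, rev=created["rev"])
    stale_update_rejected = False
except Conflict:
    stale_update_rejected = True
```

This is why the revision in the return value matters: a client has to hold on to it to make its next update.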
Elasticsearch, again, is a very different thing. It is not so much a database; it is more like a full-text search engine. But it is also sort of a database, because you can store documents in it. They use fancy words like "near real-time text search", and basically all it means is that if you store new documents in it, it takes some time for the indexer to parse the document and add it to the full-text search index. That is often fast, but it isn't instantaneous, as it is in other databases, where if you store something it is immediately available for reading. So that is slightly different here. It is based on Lucene, just like Solr. Lucene is [unclear] a Java-based storage engine that knows how to store full-text search information, basically, and many other tools like Elasticsearch are built on top of that.
Probably one of the selling points of Elasticsearch is that it is very easy to cluster. Actually, it was so easy to cluster that it sometimes ended up doing what you didn't want it to do. [unclear] I was speaking at a conference [unclear], and in the afternoon, in an office for somebody, I was giving a slightly different talk involving Elasticsearch. I connected my laptop to the network and it started replicating someone else's data, because they were also playing with Elasticsearch: [unclear] importing data, and from this side replicating. [Audience exchange, unclear] Sorry, yes, they changed that, I have to admit: you now need to configure your cluster [unclear], so it doesn't do that out of the box anymore. But it does still illustrate that it is very easy to cluster, because you just start all the nodes on the network with a single configuration and they start replicating.
They don't use the terms of traditional databases. They don't really have databases and tables; they tend to have indexes and types, and a type is basically a collection of documents with lots of common fields.
You don't necessarily need to have common fields, though, because these are NoSQL solutions. Again, you interact with this with JSON objects, and you can use the REST interface or one of its helper libraries, in this case Elasticsearch-PHP, which is something that the people from Elastic support, and there are libraries for several other languages as well.
[unclear] You can insert the document, yet again, with curl through the REST API into the database. The curl URL is a little bit more complicated, but overall it's practically the same thing; there's not much difference. One thing that Elasticsearch doesn't really do [unclear] is updates: any time you do an update, it basically replaces the document and has to re-index the whole thing.
So I have spoken about key-value stores and I have spoken about document data stores, but we also want to have a quick look at relational databases. All those properties are kind of handy, and we want to have some of them in relational databases too. Because people already use those, it makes sort of sense to take some of the properties from the document stores and key-value stores and make them part of the relational database. In this case I'm showing you MySQL and PostgreSQL. Of course other databases will have different things, but I'll just talk about these two.
The first one I want to look at is MySQL, which has support for a JSON type. I call this very basic, because the interaction with it is very different from PostgreSQL's [unclear]. They are also working on something called the MySQL Document Store, which in my opinion is not a great name, because the only thing it really allows you to do is CRUD-like operations [unclear]. It is a very simple layer built on top of MySQL that allows you to interact with it as if it was a NoSQL database. I don't think it has been released yet, but this field moves really quickly, so my slides might be out of date a little bit. [unclear]
With this JSON type, the interaction is through strings containing JSON, which is a little bit different from what you do with the REST APIs for CouchDB and Elasticsearch, because you still need to construct this as part of your SQL statements. The storage, again, is better than just storing the JSON as text, because that would be kind of pointless [unclear]. Manipulation is through specific SQL operators. They are [unclear] part of the standard, so at least between MySQL and PostgreSQL they are practically the same. One of the properties that MySQL has is that the field order in the JSON type isn't guaranteed, which is sort of the case with the other document stores as well, but they tend not to do anything bad with it, so in general you should be all right.
Just a quick look at how to store something like this in MySQL. If you look at the MySQL command line, you create a table "users" with a [unclear] field, which we give the JSON type. Then you can insert this JSON document. As you can see, there are single quotes around it [unclear], and then all the fields in there. So you are inserting this string as a JSON document, which means you need to be careful about escaping, because SQL itself needs you to make sure the escaping is correct. For example, if you had a quote in one of the fields here, you need to make sure to escape it, because it's part of a string.
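One common way to sidestep the escaping problem is to let a JSON library produce the string and pass it to the database as a bound parameter instead of splicing it into the SQL text. The sketch below uses sqlite3 with a plain TEXT column purely as a stand-in for a MySQL table with a JSON column; the table layout, the column name `doc` and the sample user are assumptions, and the placeholder syntax differs between database drivers.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# TEXT column as a stand-in for a JSON-typed column.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, doc TEXT)")

# The apostrophe in the name is exactly the kind of character that breaks
# a hand-written, single-quoted SQL string.
user = {"name": "Pat O'Example", "likes": ["maps", "beer", "whisky"]}

# json.dumps handles JSON-level escaping; the bound parameter handles
# SQL-level quoting, so nothing has to be escaped by hand.
conn.execute("INSERT INTO users (doc) VALUES (?)", (json.dumps(user),))

(stored,) = conn.execute("SELECT doc FROM users").fetchone()
round_tripped = json.loads(stored)
```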
PostgreSQL has a few different data types. There is one called hstore, which is a very simple key-value kind of data type. There is also a JSON type, but I won't talk about it, because they basically don't recommend it anymore and JSONB is much better anyway, so there's no need to use the older one [unclear]. The query syntax and the index support are quite a bit better than MySQL's [unclear].
Let's look at hstore first. hstore is, again, a column type, which is a list of key-value pairs, and both are strings, so it doesn't support richer data types here than just scalar values. [Audience exchange, unclear] That's probably true, but I have not tried to look at that [unclear]. Let's pretend we have only this simple set of documents. What is important here is that even though you store numbers and booleans, they are still strings when it comes to querying against them. [unclear] You can index this. It is useful for properties that are not necessarily part of every document; in a product catalog, for example, those could be the extra properties and tags. OpenStreetMap uses this quite a lot: OpenStreetMap uses PostgreSQL quite a lot for storing map data, so there are all sorts of descriptions of objects in there. But you can't necessarily store all the tags that objects can have in columns, because it's an unbounded set of tags. The way the rendering of those maps works is that for every tag that you use in the rendering scheme you have a specific column, and of course you can't have an unbounded set of columns. So the properties describing an object that are not necessary for styling, the ones that are hardly ever used, get put into an hstore. You can still do something with them, [unclear] performance, but it allows you to store all the things that you want. So that is sort of the use case for that.
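The "everything is a string" caveat has a practical consequence when comparing values. The snippet below illustrates it in plain Python with a dictionary standing in for an hstore value; the keys and values are hypothetical. In a real query the equivalent step would be casting the extracted text to a number before comparing, which is not shown here.

```python
# hstore-like value: keys and values are all text, even "numbers" and "booleans".
props = {"age": "19", "cask_strength": "true"}

# Comparing as text goes character by character: "1" sorts before "3",
# so "19" is considered smaller than "3".
older_than_three_as_text = props["age"] > "3"

# Converting to a number first gives the comparison that was intended.
older_than_three_as_number = int(props["age"]) > 3

# A "boolean" is likewise just the text "true", not a real boolean value.
is_real_boolean = props["cask_strength"] is True
```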
The JSONB type is similar to MySQL's JSON type but also quite a bit better because, unlike with MySQL, you can index it, and it supports richer data types than hstore because it supports all the JSON data types.
The column type is called JSONB, and we now have this richer document. You might remember this JSON document from before, because it is practically the same one that I inserted into Elasticsearch, CouchDB and MongoDB. So it looks like this.
After inserting data, we need to query it; databases are only sort of useful if you can get data back out of them again.
I already sort of showed you how that works with Redis. With sets you can check whether a value is part of a set, or you can retrieve all the members for a key. With hashes you have HGETALL to get all the information for a key, or you can do HMGET, with which you get a set of keys out of the hash. I didn't show you this, but there are also lists, for which you can set the ranges that you want to retrieve. And that is not all the data types; there are many more like this. So that's how you query there [unclear].
underscore the fields and which is the default now it is possible to set up secondary index and secondary keys as well which to something called MapReduce so you need to define those upfront it's clever enough that it doesn't have to recalculate a few every time inserted Okun because the stores skis anytime you consider a beta documents and notice that the government of the day was very I is she right the 1st very slow and I you I did not know that not thanks for that and so yeah if you look up on that you tend to that only a primary key now I have to say guy be is molten data is another thing that let's assume be a right among them to be for doing various without having setting up secondary cues the from so you can squeeze I when it is is a simple query that you look at uh which is basically all your final 1 find world again in and find all the checkins where d regions like matches Scotland Ireland and a rating is larger or greater than 3 and we only want to store the risky a rating and H fields so this is a quality match there's is match for the query operator and has many more than just GTE centigrade great analytical and there's a projection basic mother says is the following a style that's all that's like select whiskey rating and age from check-ins where we just like the full scope discovered either and a rating is larger or equal than 2 3 would liking this morning the street thinking of dozens of people in the the height of it and the if you had there is also a another way of doing queries is something called aggregation pipeline where you run different operations on a whole collection goes in this case what we're doing is all we want to find all the documents where the regions let's start the skull that direct expression operator and then we do by by region is so basically this says find me all the whiskeys in the region Scotland and then group what then by specific subregion so something about out of this uh a set of things on and then you get something 
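To make the find-with-projection and the group-by-region queries described above easier to picture, here is a minimal pure-Python sketch. It does not use MongoDB or its driver; `find` and `group_by_region` are hypothetical helpers, and the check-in documents are invented sample data.

```python
import re

# Invented sample data, shaped like the check-ins described in the talk.
checkins = [
    {"whisky": "Alpha", "region": "Scotland/Islay", "rating": 4, "age": 10},
    {"whisky": "Bravo", "region": "Scotland/Speyside", "rating": 2, "age": 12},
    {"whisky": "Charlie", "region": "Ireland", "rating": 3, "age": 8},
    {"whisky": "Delta", "region": "Japan", "rating": 5, "age": 15},
    {"whisky": "Echo", "region": "Scotland/Islay", "rating": 5, "age": 16},
]


def find(collection, region_pattern, min_rating, fields):
    """Filter on a region regex and a minimum rating, then project fields."""
    regex = re.compile(region_pattern)
    return [
        {field: doc[field] for field in fields if field in doc}
        for doc in collection
        if regex.search(doc.get("region", "")) and doc.get("rating", 0) >= min_rating
    ]


def group_by_region(collection, region_prefix):
    """Keep documents whose region starts with the prefix, grouped per region."""
    groups = {}
    for doc in collection:
        if doc.get("region", "").startswith(region_prefix):
            groups.setdefault(doc["region"], []).append(doc["whisky"])
    return [{"_id": region, "whiskies": names} for region, names in sorted(groups.items())]


matches = find(checkins, r"Scotland|Ireland", 3, ["whisky", "rating", "age"])
groups = group_by_region(checkins, "Scotland")
```

The grouped result has one entry per region, each with an array of whisky names, rather than one row per whisky.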
You then get something back like this: you have the _id field, which is the name of the region, and then it has an array of whiskies that come back. So it is a simple GROUP BY, like you would do in a relational database, except that it returns an array instead of a single record for each whisky.

Elasticsearch has two different ways of doing queries. You can look things up by primary key, or you can use field and value combinations in a simple way. Here we are searching for a user count of 16; that is a key-value pair, the count has to match 16, and you get a result back out of that. But where Elasticsearch really shines is of course its full-text search capabilities. You can construct queries like the following: it is a Boolean query that says that the document I find should match on one field and should match on the whisky field (the exact values are unclear in the recording). In this case both need to match, but you can construct queries with AND, OR and NOT, and much richer ones than I can show you here. That is really good for building full-text search, like having a search box that says "give me all the documents, all the products, that match these keywords". It works really well for that.

Querying JSON objects in MySQL I found a bit complicated, because there are new operators that I didn't really know up front. So we have this table where we store the name of the whisky and its properties, in this case three properties: age, cask strength and ABV. The last one isn't shared among all the documents. We can construct a query that says: select name and age from whiskies where cask strength equals true, and this is what you get out of it: the name of the whisky, and then the age as a value. I'm not sure why it showed only a single result; I expected it to show two. In any case, what I want to point out is that the way you match against sub-fields is by using the dollar and a dot: the dollar is the root of the document, and if you have nested fields you can use another dot and the name of the field to match against. The difference between the single arrow and the double arrow is that the double arrow unquotes the JSON value, whereas the single arrow does not. In this case that doesn't show a difference because it is a scalar value, but otherwise you would have seen it.

For hstore you can also query; a slightly smaller example here. What you can do is get a value for a key. You can again use this arrow operator: for every document in the table, find me the value of the field with that name. And you can find all the rows where the user count is larger than 10, again by using the arrow to pull the value out of the hstore field, and it finds the document. So you can query on these hstore fields quite well. You can also check whether keys exist, which of course you sort of need, and you can match against key-value pairs as well.
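The single-arrow versus double-arrow distinction described for MySQL above can be illustrated without a database. In this sketch the functions `arrow` and `double_arrow` are hypothetical stand-ins, not MySQL APIs: one returns the value still encoded as JSON text, the other returns strings unquoted. The document contents are invented.

```python
import json

# Invented document, standing in for one row's JSON column.
document = json.loads('{"name": "Example", "region": "Islay", "age": 19}')


def arrow(doc, key):
    """Like the single arrow: return the value still encoded as JSON text."""
    return json.dumps(doc[key])


def double_arrow(doc, key):
    """Like the double arrow: strings come back unquoted, other scalars unchanged."""
    value = doc[key]
    return value if isinstance(value, str) else json.dumps(value)


quoted_region = arrow(document, "region")            # keeps the JSON quotes
unquoted_region = double_arrow(document, "region")   # plain text
quoted_age = arrow(document, "age")
unquoted_age = double_arrow(document, "age")
```

For the numeric age both forms look the same, which matches the remark that a number does not show the difference; a string value does.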
So let's have a look at indexes. Redis primarily only looks at the key; the key is the only index. That isn't quite true, because the data structures that you store for each key, like sets and lists, are also optimized. It is an in-memory index, so that is sort of an additional index too, but it isn't something that is exposed. So those data structures can provide some additional indexing.

In CouchDB, as I said, by default you have the primary key _id index, and then you have secondary keys that you can construct with MapReduce. MapReduce is basically where, for every document that you have, it will emit some key-value pairs, which you can then run queries against. It is kind of tricky to explain, but that is sort of how it works. The output you get back is also defined by the view, because you define the sort order in there as well, and you construct a new document in there yourself. [Audience question, inaudible.] Yes, sort of.

The secondary indexes you store in something called a design document, as part of the database, and then you can run queries against those views. That is why the word "design" pops up in the query. I only show the map function here. Basically what I'm doing is: for every whisky document that I have, I emit the region as the key, together with the document. Then I run a query against this design document, against the view by_region, because that is what is in this design document, and you get a result out of it: you get all the rows for the requested region key, you get the _id field, and then the value that I emitted. You can run queries against that and it works quite well, but you need to think up front very clearly about which bits of data you want to query on. That is something you don't need to do with a relational database, with the caveat of course that if you query on something that you don't have an index on, it is going to be slow.
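The idea of a view built from a map function can be sketched in a few lines of Python. This is only a model of the concept described above: real CouchDB map functions are stored in a design document and are not written like this, and `map_by_region`, `build_view` and `query_view` are hypothetical names with invented documents.

```python
def map_by_region(doc):
    """Map function: emit (key, value) pairs for one document."""
    if "region" in doc:
        yield doc["region"], doc["name"]


def build_view(docs, map_function):
    """Run the map function over every document and keep the rows sorted by key."""
    rows = []
    for doc in docs:
        for key, value in map_function(doc):
            rows.append({"id": doc["_id"], "key": key, "value": value})
    rows.sort(key=lambda row: (row["key"], row["id"]))
    return rows


def query_view(rows, key):
    """Return the rows whose emitted key equals the requested key."""
    return [row for row in rows if row["key"] == key]


# Invented documents; the last one emits nothing because it has no region.
docs = [
    {"_id": "w1", "name": "Alpha", "region": "scotland/islands"},
    {"_id": "w2", "name": "Bravo", "region": "scotland/speyside"},
    {"_id": "w3", "name": "Charlie", "region": "scotland/islands"},
    {"_id": "u1", "type": "user"},
]

view = build_view(docs, map_by_region)
islands = query_view(view, "scotland/islands")
```

Because the rows are computed from what the map function emits, only keys that were emitted up front can be queried, which is the planning cost mentioned in the talk.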
That is basically the case for everything that you don't have an index on, in any database. I'll come back to the MySQL version of this in a moment. [Inaudible.]

In MongoDB, with indexes, as I said already, by default there is only one, on the _id field, because that is the unique primary key, so that one is always there. You can of course set indexes on other keys as well, and you can set indexes on nested fields, like the user count. [Inaudible.] What is kind of useful is that if you set an index on a key that has an array of values, you can still do an equality match against it. So you can say: find me all the documents where words matches a single word, for example, and it will still find that. So that's kind of handy.
Elasticsearch indexes can get very complicated if you want them to, because this is a very different database. The indexes you configure are full-text search indexes, and they are meant for querying natural language, so they have lots of index functionality for language analysis. You can set up different ways in which strings, or texts, need to be separated. Splitting on whitespace is not necessarily the best way of splitting a string into words, depending on what the subject matter is. You also configure the language, and the language rules then define how words are stemmed. If your text is in English, you want the words "walks" and "walk" to map to the same canonical form, so you configure a stemmer for English, which allows you to match those words directly on top of each other.

You can also configure stop words, which never become part of the index. Words like "the", "a" and "and", prepositions and articles, are not something you are going to look up on, and they take up space in the index, so you exclude those. You can also boost specific fields to make them more important, et cetera, et cetera. There is much more in there. Indexes are, as I mentioned before, built asynchronously, so it sometimes takes a bit of time before you can find documents after you have inserted them.
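The analysis steps mentioned above (splitting text into words, dropping stop words, stemming) can be sketched as follows. This is a deliberately naive illustration, not how Elasticsearch implements analysis: the stop-word list is a small invented sample and `toy_stem` is a crude suffix stripper, not a real stemming algorithm.

```python
import re

# Small invented stop-word list; real analyzers ship much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "in"}


def toy_stem(word):
    """Very naive stemmer: strip a few English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Lowercase, split into word tokens, drop stop words, stem the rest."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [toy_stem(token) for token in tokens if token not in STOP_WORDS]


indexed_terms = analyze("The dog walks and walked in the park")
query_terms = analyze("walk")
```

Both "walks" and "walked" end up as the same term as the query "walk", so a lookup on the stemmed term finds the sentence.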
In PostgreSQL you have something called a GIN index. [Partly inaudible.] There are two types of indexes for JSONB fields: key-value indexes or value indexes. I don't think I have an example showing you how to create the second one, but it shouldn't matter much. So again we are inserting this whisky in here, and we have a key-value match after we create the index. What a GIN index does, in very simple words, is look at all the key and value pairs and store them, so whenever you match against it, it can find the documents out of this, in quite a simple way. What it doesn't really allow you to do is create an index on specific sub-fields; you can only index the whole field, which is different from what CouchDB and MongoDB, and to a lesser extent Elasticsearch, do. But in most cases that is pretty good.

And again you see this @> sign; I don't know the name of that operator, but you can match against key-value pairs again in here. You can also check, just like I showed you in MongoDB, whether the words field has a given value in its array; we have two words in the array here. So the GIN index is a bit better than just collecting all of this like a full-text search index, but [inaudible], from what I believe. [Audience question, inaudible.] Yes, that is the case. As I said, I can't talk about all the functions of the databases in the time that I have.
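The "look at all the key and value pairs and store them" description above can be modelled as a small inverted index. This is a conceptual sketch only and makes no claim about how PostgreSQL lays out a GIN index internally; `build_index` and `contains` are hypothetical helpers and the rows are invented. Array elements are indexed one by one so that a match on a single word inside an array works.

```python
from collections import defaultdict


def build_index(rows):
    """Map every top-level (key, value) pair to the ids of rows containing it."""
    index = defaultdict(set)
    for row_id, doc in rows.items():
        for key, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for item in values:
                index[(key, item)].add(row_id)
    return index


def contains(index, wanted):
    """Ids of rows that contain every (key, value) pair in `wanted`."""
    id_sets = [index.get(pair, set()) for pair in wanted.items()]
    return set.intersection(*id_sets) if id_sets else set()


# Invented rows: row id -> document stored in the JSON column.
rows = {
    1: {"name": "Alpha", "age": 25, "words": ["glen", "25"]},
    2: {"name": "Bravo", "age": 12, "words": ["glen", "12"]},
    3: {"name": "Charlie", "age": 25, "words": ["isle"]},
}

index = build_index(rows)
aged_25 = contains(index, {"age": 25})
aged_25_glen = contains(index, {"age": 25, "words": "glen"})
```

A containment query becomes a set intersection over the stored pairs, so no row has to be scanned.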
The annoying thing is that MySQL doesn't do any of this. The only way you can index those JSON documents at the moment (as I said, this field moves quickly) is through generated columns, so virtual generated columns. If you want to index on ABV, being alcohol by volume, what you have to do is alter the table and add a column, abv, a float, "generated always as" the ABV value from properties. So it picks this one field out of the properties JSON field and creates a virtual column out of that, and then you can put an index on this virtual column. That is what I'm doing here: we create this ABV index on the whisky table's abv column, which is the value that was pulled out, even though I haven't stored it separately in the table. You can make it a materialized column, but I don't believe you have to. So that is the current state in MySQL.

The last thing I want to talk about quickly is manipulating data. The important word in any database here is atomicity: you want to make sure that no two programs, scripts or threads can update data incorrectly at the same time. Relational databases tend to have this ACID compliance kind of thing, and that is not something that NoSQL solutions often do, but they have different ways of making sure that you don't update data incorrectly. The important thing to remember is that you should never retrieve a document, update it in your client and then store it back again. That is a very dangerous thing to do.
Redis has special operators that you can run. So if this is my document, where we have this properties hash again with the age being 19, you can run HINCRBY, which increases the value of a specific property by an amount that you specify. What this says is: increase the age property in this key by 1. Here Redis does something more than just strings, because it knows how to add 1 to the string "19" and store it back as the string "20". [Inaudible.] I don't think it does hexadecimal or anything like that. So Redis has specific operators that operate on the data. SADD, which I showed you before, is a very similar thing, and you can also use things like LPUSH, LPOP and RPUSH to push to the front of an array or pop from the start of an array, so you can implement a kind of queue protocol with it, all using these atomic operations. [Audience question, inaudible.] Yes, almost.
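The difference between a client-side read-modify-write and an atomic increment on a string value can be shown with a small in-memory stand-in. `TinyHashStore` is a hypothetical class, not a Redis client; its lock only models the fact that one increment completes before the next one starts.

```python
import threading


class TinyHashStore:
    """In-memory stand-in for a hash whose field values are strings."""

    def __init__(self):
        self._fields = {}
        self._lock = threading.Lock()

    def hset(self, field, value):
        with self._lock:
            self._fields[field] = str(value)

    def hget(self, field):
        with self._lock:
            return self._fields.get(field)

    def hincrby(self, field, amount):
        """Parse the stored string as an integer, add, store back as a string."""
        with self._lock:
            new_value = int(self._fields.get(field, "0")) + amount
            self._fields[field] = str(new_value)
            return new_value


store = TinyHashStore()
store.hset("age", "19")

# Unsafe pattern: two clients both read "19", then both write back 20.
seen_by_a = int(store.hget("age"))
seen_by_b = int(store.hget("age"))
store.hset("age", seen_by_a + 1)
store.hset("age", seen_by_b + 1)
lost_update_result = store.hget("age")  # one of the two increments is lost

# Safe pattern: let the store do the increment itself.
store.hset("age", "19")
store.hincrby("age", 1)
store.hincrby("age", 1)
atomic_result = store.hget("age")
```

With the interleaved reads, two increments only move the value by one; with the store-side increment both are applied.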
In CouchDB, say we already have this document with the key derickr in there, and we want to store it again. It tells you no, you can't do that, because we already have a document with this key. That is kind of annoying, because how would I update a document when I have data that I need to store? The way you can do that is where this thing called _rev comes into play. What you need to specify for an update to work is the _rev field that you pulled out previously. The cool thing about it is that if you do this at the same time from two different threads, only one is going to work, and you also know which one, because it will either tell you the new revision number, or it tells you that you get a conflict, because the document with the revision number that you pulled last is no longer there: the first thread has completed and stored a new revision. So this is how it handles conflicts instead of messing up data, and then the application can decide what to do about it. It can decide to store the document again, potentially overriding the original update that has been made, or do something more complicated where it compares the documents, sees which fields have been updated, and issues that as a new document. [Audience comment, inaudible.] Yes, replication internally handles things in sort of the same way.

I should mention that for MongoDB this is my second PHP example; the language doesn't particularly matter. Basically what I'm doing here with findOne is that I find the document where the person is derickr, then I update the data locally: in steps_made I add 7,242 to the key for 2017-02-02, and then I update the document again. This is a non-atomic operation. Never ever do something like this.
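To make the revision check from the CouchDB part above concrete, here is a minimal sketch of optimistic concurrency. `TinyRevisionStore` and `Conflict` are hypothetical names, and plain integers are used as revision numbers purely for simplicity; this is not CouchDB's actual revision format or API.

```python
class Conflict(Exception):
    """Raised when an update is based on a stale (or missing) revision."""


class TinyRevisionStore:
    """In-memory store that only accepts updates carrying the current revision."""

    def __init__(self):
        self._docs = {}  # doc id -> (revision number, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            raise Conflict(f"{doc_id}: expected revision {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = TinyRevisionStore()
first_rev = store.put("derickr", {"steps": 0})

# Two writers fetch the same revision and both try to update.
rev_a, doc_a = store.get("derickr")
rev_b, doc_b = store.get("derickr")
doc_a["steps"] = 100
doc_b["steps"] = 200

winning_rev = store.put("derickr", doc_a, rev=rev_a)
try:
    store.put("derickr", doc_b, rev=rev_b)
    second_writer_conflicted = False
except Conflict:
    second_writer_conflicted = True
```

The second writer is told about the conflict instead of silently overwriting the first update, and the application can then decide whether to re-fetch, merge or overwrite.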
What you can do instead is use an update operator. What this example says is: find all the documents where person equals derickr, which is hopefully unique, and then increment the value that we have stored under steps_made, dot, the key for 2017-02-02 (the dot operator, just like in the JSON path that I showed you before, is a nested dereferencing operator), and add 7,242 to that. So this is one of the atomic operators, similar to how Redis has HINCRBY.

If you want to do sets in MongoDB, you can do the same thing. There is $addToSet, which is like Redis's SADD: it goes to the tags field and adds the value to it, and again you don't get duplicates in it because it is a set. You can also use $push, which is very similar to $addToSet, but it doesn't guarantee unique values. What I'm doing here is pushing two tags, for example, and then I have an extra argument to this, which is called $slice with minus 2, which basically says: only keep the last two. This allows you to keep only the last N items stored, and things like that, as an atomic operation. Of course MongoDB has very many more of those update operators, but again, I can only talk about so much in the time I have.
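The semantics of the set-style add and the push-with-slice described above can be sketched on a plain Python dict. These helpers are hypothetical and run on the client, so they only model what the operators do to an array; they say nothing about atomicity, which in the talk comes from the database applying the operator itself. Only a negative slice is modelled.

```python
def add_to_set(doc, field, value):
    """Append the value only if it is not already in the array."""
    items = doc.setdefault(field, [])
    if value not in items:
        items.append(value)


def push(doc, field, values, slice_=None):
    """Append values; a negative slice_ keeps only the last abs(slice_) items."""
    items = doc.setdefault(field, [])
    items.extend(values)
    if slice_ is not None and slice_ < 0:
        del items[:slice_]


# Invented document and tag names.
doc = {"person": "example", "tags": ["a"]}

add_to_set(doc, "tags", "a")   # already present, nothing changes
add_to_set(doc, "tags", "b")
tags_after_set_adds = list(doc["tags"])

push(doc, "tags", ["c", "d"], slice_=-2)   # keep only the last two
tags_after_sliced_push = list(doc["tags"])

push(doc, "tags", ["d"])       # push allows duplicates
tags_after_plain_push = list(doc["tags"])
```

Keeping only the tail of the array on every push is what gives the "last N items" behaviour.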
The last thing I want to touch on is that in a relational database you are of course aware that you have to create a schema up front. In most of the NoSQL solutions you don't really do that by configuring it in the database; you do that by thinking about it, and thinking about it more, and making sure that the application knows how to deal with the data. That is a very different thing from a relational database, where you always start out with: how can I store my data in the most efficient way? You create a schema in such a way that you have no data duplication, and so on. Now, I realise that this doesn't always work as well as expected, hence denormalization.

NoSQL solutions for document storage don't tend to do that. What you tend to do is create your schema depending on how your application interacts with the data, so that determines how you store your data. That also means that you end up storing some duplicate data, but that is all right and you shouldn't be too afraid of that. It is a common practice and it is not a bad thing anymore. It is important that the application can interact with your data in the most efficient way, not necessarily how efficient it is to store your data. You end up making trade-offs, because some database schemas are going to be really good for retrieving data really fast, but can be more cumbersome to update, because you have duplicated data that has to be updated in multiple locations. So this is something that you think about when designing a schema.

Now, to look a little bit at schema validation: most NoSQL solutions will not do anything for that; it tends to be pushed to the application. That is also true for the relational databases with hstore and JSON fields. I'm pretty sure that PostgreSQL has some functionality with which you can enforce it a little bit, but it isn't going to be as rich as what a normal schema enforces in a normal relational database. [Unclear] it is also not possible to do. So many people use ORMs and that kind of pattern on top of that to deal with the data.

I want to show you MongoDB, because it does have something like this: you can configure, per collection, a way to enforce a schema. This is a very basic way of doing it. What you can control is that specific fields need to be of specific data types, or fit specific values. [Audience exchange, largely inaudible.] OK, I couldn't find it. It is a very basic way of doing this. You can't, for example, say "only allow these fields" and prevent extra fields from being stored in the database, and things like that. But hopefully that is coming in the next release, MongoDB 3.6, where you will likely have full support for JSON Schema, which is something I would expect to see coming in PostgreSQL and some of the other solutions as well, because it is actually a nicer way of defining a schema for sort of free-form data. I think that is a useful thing to have.
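The kind of basic validation described above (a field must have a given type, or fit a set of allowed values, while extra fields are not rejected) can be sketched at the application level. The rule format, the `validate` helper and the sample fields are all invented for illustration and are not MongoDB's validator syntax; skipping fields that are absent is an assumption of this sketch.

```python
# Invented rules: expected type per field, optionally a set of allowed values.
RULES = {
    "name": {"type": str},
    "age": {"type": int},
    "region": {"type": str, "allowed": {"Scotland", "Ireland"}},
}


def validate(doc, rules):
    """Return a list of problems; an empty list means the document passes."""
    errors = []
    for field, rule in rules.items():
        if field not in doc:
            continue  # assumption: this basic check does not require fields
        value = doc[field]
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}")
        elif "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{field}: value not allowed")
    return errors


good = validate({"name": "Alpha", "age": 10, "region": "Scotland"}, RULES)
bad = validate({"name": "Alpha", "age": "ten", "region": "Japan", "extra": 1}, RULES)
```

Note that the unknown field "extra" passes unnoticed, which mirrors the limitation mentioned in the talk that extra fields cannot be prevented with this basic mechanism.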
before we have some time for some questions yes the scalar data is stored in the all the database is quite different rights and renders it distorted only set so hatches in the document a distorted intensely Jason documents in regard traditionally relational databases you have fields where stored Jason document tend to interact with that as their strengths although solutions are going to be all the pending only but if you want to do which you data so if you already have invested in relation database like most composers on and Jayson or Jayson view sort type is a valid way of doing things and if you want a full text search and that want to certain natural data message such as to that thing if you want a very fast guassian key-value store a key like red is is going to be a thing if you have them need for storing more complicated documents it is sort of how he applications interact with this data like the public's Catholic I showed you before a a document source going to be your new best that all picking there's lots of different things that you could think of by picking the right solution and it is a very difficult thing to say which 1 is best for which use case because it all depends on which day story and how he interpreted at the 2nd give you a a real if you do this then you have to big discount of statements particles is such a vast array of them and that you really need to think about how you how you this kind of stuff but the thing about all sorts of the former to data in how we interact with this up how you want to find data back and disagree G. 
indexability comes back out of its and how 0 and then you want to data data-manipulation site and as an example it Elastic search doesn't really do updates of documents you can only replace all document for example where it's the boat Radisson relation database you can tends to plot of a specific fields but if you want date adjacent the type field then because this is a single field it is more difficult to abate specific values all the sub fields in that although you can do things like that post tho as so yeah it is a very complicated things and most of all I hope you to show you here is more than what all the different things are out there and sort of the difference between them and more as a as a hint for you to go look at them in more detail because as I said started presentation held to talk about all these things are problem in about 8 hours and which I don't have so with that sense are there any more questions yeah here present
[Audience question: are there differences between MySQL's and MariaDB's JSON type?] I don't believe there are, and if there are, they are going to be very minimal. That said, I would definitely expect MariaDB to follow along with the features that MySQL gets. That is my feeling, but I have no specific evidence.

[Further audience questions and parts of the answers are inaudible.]

You clearly need to think about backups, but different tools do that in different ways, and I only know how to do this for the one I'm most familiar with, which is MongoDB. There is a tool for it that comes with the database, but there are other ways of doing it as well, because NoSQL solutions tend to be more distributed systems. A common thing to see is that you have an extra node that is there just for taking a backup. Having backups in a distributed setup is often less necessary. I'm not saying that you shouldn't do it, you should always have a backup, but it is less necessary because you are going to have multiple copies of the data in the first place. But with things like Redis, because this all runs in memory, there is not really a way of doing that; it has some functionality for it, but that is not what it is made for. So you need to take those things into account. [Inaudible.]

Anything else? No? In that case I have one more slide, and that says: thank you very much. I'm going to upload the slides to this URL, which I will do in the next few days. Not only can you find the slides at this URL, I also have a list of resources there that points to some of the research material that I found while making the presentation. If you have any questions, feel free to contact me; I'm more than happy to answer questions or to integrate comments into the presentation if you have them. So that's that. Thank you very much, and enjoy the rest of your day.