pg shard: Shard and Scale Out PostgreSQL

Video in TIB AV-Portal: pg shard: Shard and Scale Out PostgreSQL

Formal Metadata

pg shard: Shard and Scale Out PostgreSQL
Alternative Title
pg shard: Shard and scale out PostgreSQL
Title of Series
Number of Parts
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date
Production Place
Ottawa, Canada

Content Metadata

Subject Area
PostgreSQL extension to scale out real-time reads and writes pg shard is an open source sharding extension for PostgreSQL. It shards PostgreSQL tables for horizontal scale, and replicates them for high availability. The extension also seamlessly distributes SQL statements, without requiring any changes to the application layer. pg shard addresses many NoSQL use-cases, and becomes more powerful with the new JSONB data type. Further, the extension leverages the rich analytic capabilities in PostgreSQL, and enables real-time analytics for big data sets. In this talk, we first summarize challenges in distributed systems associated with scaling out databases. We then describe "logical sharding", and discuss how it helps overcome these challenges. Next, we show how pg shard uses hook APIs, such as the planner and executor hooks, to make PostgreSQL a powerful distributed database. We then cover example customer use-cases, and conclude with a futuristic demo: a distributed table with JSONB fields, backed by a dynamically changing row and columnar store. pg shard is an open source sharding extension for PostgreSQL. It shards PostgreSQL tables for horizontal scale, and replicates them for high availability. The extension also seamlessly distributes SQL statements, without requiring any changes to the application layer. pg shard addresses many NoSQL use-cases, and becomes more powerful with the new JSONB data type. Further, the extension leverages the rich analytic capabilities in PostgreSQL, and enables real-time analytics for big data sets. In this talk, we first summarize challenges in distributed systems: dynamically scaling a cluster when new machines are added or old ones fail, and distributed consistency semantics in the face of failures. We then describe "logical sharding", and show how it helps overcome these challenges. We also discuss this idea's application to Postgres. Next, we show how pg shard uses hook APIs, such as the planner and executor hooks, to make PostgreSQL a powerful distributed database. We then cover example customer use-cases, and conclude with a futuristic demo: a distributed table with JSONB fields, backed by a dynamically changing row and columnar store.
Slide rule Order (biology) Context awareness Computer animation Expression Petersen graph Personal area network Coma Berenices Resultant Systems engineering
Greatest element Dynamical system Distribution (mathematics) Scaling (geometry) Equaliser (mathematics) Multiplication sign Demo (music) Virtual machine Set (mathematics) Real-time operating system Computer font Metadata Product (business) Social class Routing Scaling (geometry) Stress (mechanics) Planning Density of states Word Process (computing) Computer animation Personal digital assistant Query language Logic
Functional (mathematics) Building Dynamical system System call Scaling (geometry) Equaliser (mathematics) Decision theory Archaeological field survey Virtual machine Materialization (paranormal) Real-time operating system Replication (computing) Mereology Neuroinformatik Radio-frequency identification Semiconductor memory Single-precision floating-point format Ring (mathematics) Set (mathematics) Energy level Extension (kinesiology) Physical system Boss Corporation Scaling (geometry) Wechselseitige Information Real number Stress (mechanics) Real-time operating system Physicalism Database Cartesian coordinate system Limit (category theory) Particle system Word Computer animation Personal digital assistant Logic Query language Order (biology) American Physical Society Website Riemann hypothesis Energy level Simulation Library (computing)
Point (geometry) Unitäre Gruppe Slide rule Asynchronous Transfer Mode Information Multiplication sign Execution unit Physical law Virtual machine 3 (number) Heat transfer Cartesian coordinate system Mathematical model Dimensional analysis Field (computer science) Computer animation Software Personal digital assistant Operator (mathematics) Cycle (graph theory) Table (information) Partition (number theory) Asynchronous Transfer Mode
Asynchronous Transfer Mode Server (computing) Distribution (mathematics) Virtual machine Water vapor Replication (computing) Scalability Metadata Theory Number Goodness of fit Roundness (object) Different (Kate Ryan album) Operator (mathematics) Single-precision floating-point format File system Energy level Diagram Damping Endliche Modelltheorie Extension (kinesiology) Partition (number theory) Physical system Mapping Server (computing) Bit Database Instance (computer science) Cartesian coordinate system Variable (mathematics) Entire function Word Computer animation Personal digital assistant Query language Right angle Finite-state machine Table (information)
Distribution (mathematics) Equaliser (mathematics) Correspondence (mathematics) Range (statistics) Numbering scheme Mechanism design Core dump Finitary relation Special functions Physical system Social class Constraint (mathematics) Electronic mailing list Drop (liquid) Instance (computer science) Partition (number theory) Category of being Process (computing) Befehlsprozessor Hash function Order (biology) Phase transition Multiplication table Point (geometry) Sequel Computer file Maxima and minima Data storage device Drop (liquid) Regular graph Event horizon Rule of inference Number Subject indexing Authorization Distribution (mathematics) Scaling (geometry) Information Key (cryptography) Server (computing) Chemical equation State of matter Line (geometry) Cartesian coordinate system Limit (category theory) System call Word Personal digital assistant Query language Optics Statement (computer science) Utility software Table (information) State of matter Multiplication sign Set (mathematics) Design by contract Insertion loss Replication (computing) Mereology Usability Formal language Subset Mathematics Exclusive or Bit rate Spherical cap Phase transition Hash function Logic Data conversion Extension (kinesiology) Partition (number theory) Covering space Parsing Logical constant Distributive property Data storage device Physicalism Connected space output Configuration space RWE Dea Right angle Resultant Laptop Asynchronous Transfer Mode Functional (mathematics) Identifiability Table (information) Divisor Virtual machine Metadata Regular graph Field extension Integrated development environment User-defined function Haar measure Consistency Expression Planning Database Particle system Subject indexing Event horizon Computer animation Logic Blog
Metropolitan area network Standard error Asynchronous Transfer Mode User-defined function Divisor Consistency Data storage device Parallel port Bit Cartesian coordinate system Replication (computing) Connected space Number Order (biology) Computer animation Query language Logic Personal digital assistant Order (biology) Touch typing Representation (politics) Right angle Resultant Physical system
Metropolitan area network Asynchronous Transfer Mode Context awareness State of matter Surface State of matter Ext functor Insertion loss Client (computing) Mereology Metadata Neuroinformatik Single-precision floating-point format Computer animation Personal digital assistant Different (Kate Ryan album) Query language Set (mathematics) Pattern language Right angle Pattern language Selectivity (electronic) Resultant Row (database)
Group action INTEGRAL Interior (topology) State of matter Mountain pass Weight Multiplication sign Direction (geometry) Range (statistics) Insertion loss Mereology Replication (computing) Semantics (computer science) Type theory Befehlsprozessor Single-precision floating-point format Multiplication Logical constant Feedback Sampling (statistics) Partition (number theory) Type theory Befehlsprozessor Website Linear map Filter <Stochastik> Point (geometry) Game controller Server (computing) Sequel Transformation (genetics) Disintegration Data recovery Event horizon Metadata Field (computer science) Product (business) Number Revision control Frequency Term (mathematics) Operator (mathematics) Touch typing Computer hardware MiniDisc Dependent and independent variables Multiplication Distribution (mathematics) Server (computing) Projective plane Cartesian coordinate system Single-precision floating-point format Computer animation Query language Personal digital assistant Blog Object (grammar) Table (information)
Standard deviation Greatest element Group action Structural load State of matter View (database) Archaeological field survey 1 (number) Insertion loss Open set Disk read-and-write head Finitary relation Special functions Extension (kinesiology) Descriptive statistics Social class Physical system Metropolitan area network Distributive property Electronic mailing list Message passing Process (computing) Phase transition Order (biology) Right angle Resultant Slide rule Service (economics) Table (information) Sequel Real number Virtual machine Electronic mailing list Event horizon Metadata Value-added network Power (physics) Product (business) Revision control Field extension Integrated development environment Data type Standard deviation Distribution (mathematics) Scaling (geometry) Mathematical analysis Database Particle system Word Event horizon Computer animation Personal digital assistant Internet forum Video game Table (information) Business cluster
Metropolitan area network Email Server (computing) Table (information) Moment (mathematics) Ext functor Insertion loss Electronic mailing list Digital signal Event horizon Metadata Value-added network Event horizon Malware Benchmark Computer animation Radio-frequency identification Data acquisition Table (information) Data type
Point (geometry) Email Identifiability Table (information) Multiplication sign Storage area network Benchmark Radio-frequency identification Data acquisition Newton's law of universal gravitation Metropolitan area network Webcast Lucas sequence Digital signal Port scanner Word Event horizon Malware Computer animation Raster graphics Universe (mathematics) Social class Personal area network Figurate number Form (programming) Curve fitting
Scripting language Metropolitan area network Email Table (information) File format Wrapper (data mining) Server (computing) Bit Automorphism Benchmark Computer animation Data acquisition Quicksort Pressure Table (information) Message passing Resultant
Email Functional (mathematics) Table (information) Linear regression Maxima and minima Cloud computing Portable communications device Theory Value-added network Flow separation Operator (mathematics) Data acquisition Information Statement (computer science) Data compression Metropolitan area network Server (computing) Real number Forcing (mathematics) Parallel port Database transaction Lattice (order) Message passing Event horizon Computer animation Pauli exclusion principle Different (Kate Ryan album) Software testing Summierbarkeit Table (information) Writing Row (database)
Email Regulärer Ausdruck <Textverarbeitung> Table (information) Linear regression Insertion loss Electronic mailing list Mass Storage area network Value-added network Mixture model Workload Inference Order (biology) Flow separation Information Metropolitan area network Server (computing) Client (computing) Group action Process (computing) Computer animation Order (biology) Video game Software testing Table (information) Form (programming) Resultant
Pulse (signal processing) Group action Multiplication sign Execution unit Client (computing) Replication (computing) Mereology Medical imaging Bit rate Core dump Extension (kinesiology) Error message Physical system God Scripting language Metropolitan area network Block (periodic table) File format Physicalism Database transaction Bit Mereology Category of being Exterior algebra Process (computing) Linker (computing) Telecommunication Arithmetic progression Resultant Functional (mathematics) Sequel Virtual machine Online help Mass Regular graph Rule of inference Graph coloring Metadata Number Revision control Causality Medizinische Informatik Associative property Data type Dependent and independent variables Information Military base Server (computing) Consistency Forcing (mathematics) Weight Client (computing) Database Line (geometry) Cartesian coordinate system Human migration Uniform resource locator Computer animation Query language Personal digital assistant Table (information)
the but they are I know you find 1 of the following results to state of prior to sigh to was assaulted will perform in the distributed systems engineering team at Amazon . com today I'm going to talk about you shot sharding scaling expression for I hear about 35 slides and stuff fairly technical talk so if you're questions please feel free to interrupt you for start I have just 1 slide to put things into context as a
quick question prior to this for column your you have heard of sight the and how many of you have heard of PG shots and just to clarify to on PG shot . 2 separate products that complement each other's PG shot targets real-time reads and writes use case so the high-throughput reads and writes use cases in other words it targets the most equal use cases and sigh to is more applicable when you have big data sets and you want to analyze that data in real time so you can think of to still as your massively parallel processing based on both sides of to RPG shock shod and replicate the top in exactly the same way they share the same metadata and that's why the 2 products are fully compatible with each other and also Cyprus TV doesn't for stressful that extends through the plan and execute approach so that's the context like about the 2 products knowledgeable Talk
Outline 1st unworkable by motivating the shot I'm going to 1st talk about the use cases repeating shot is applicable then I'll think about the data how does the shot they the data in the clusters how does the class of dynamic scale out as you add new machines into it and I was a data replicated next 1 will talk about the execution logic the competition the what happens when you send the query to the PG shot clusters how do you not that clear what happens when there are failures during the bottom of the created the last time to preserve the shock for PG shots product roadmap and Mark was going to draw cool y all sum them up and then there uh and this is the talk outline let's start with the motivation and why PG shot where did it come from we initially started by
saying situs only sports particles so you take your data you hope your data and then we had 1 customer who came in and said I like to insert my data in real time reset site to still currently only supports by and they were isn't this course Chris equal can't I just like my library uh to do this as so their entire there'll be a lot of libraries we had the 2nd customer who came in us the same thing and why the 3rd customer we got the hand to the logical step back to and started talking to existing post physical uses you could say that survey of the use cases where PG shot was applicable and what people were doing when they started learning into the mission limitations or single single-pulse critical database we sought to the MSE walls going sharding at the application level so this is you take your data and application that manages the shopping on the replication it is obviously a lot of effort for the and users you need to understand distributed systems need to think through what happens failures in the distributed systems and the 2nd thing that we saw was people took their dating post-critical you normalize that dumped it and I wouldn't almost equal database and of course when you do that you're giving or a lot of the functionality that course critical provides but that we started asking if you had ideal course critical scaling solutions what are your top 2 3 richest of course materials for the top 10 things it makes up a long list of you're curious what are the most important things for the end users and he had to answers 1st people said I don't want to think about how to build was like clusters when I add a new machines and similarly I don't want to think about feel handling in the clusters that should just work the 2nd 1 was you want things to be simple I don't want to set up multiple components and configure them also if I'm using critical 1 . 3 it should just work with false critical line quickly if I want to assume the order 1 . 4 it should just work with that and we took all of
that and and entitled to these 2 architectural decisions the 1st decision dynamic scaling is what I want to talk about next they're we use the concept of logical shots to make scaling up and failure handling for the simplest a decision we leverage for stress equals extension APS mutual capacitance planned and executed they built to read and data from this or memory in other words boss Chris equals executor operate by pulling in the data but if you're building a distributed database you want to take that created transform that created a pusher computations to the worker nodes so your distributed query planned and executed need to be fundamentally different than those apostasy equal and also fully cooperate with false equals logic and will cover this part right after the logical parts let's start by
looking at our looking at scaling up the clusters 1st I'm going to talk about how partitioning has been done in the past let's say you have
3 nodes are you partition your dataset into these 3 a 3rd of the base of a small ball of 3rd would cost more to and so forth and the partitioning dimension and here's time what it could really be anything and the partitioning method could really be anything that ideas they say you have 12 terabytes of data and I'm putting 4 terabytes of it in 1 of the more info table there's 4 terabytes school to another table or to any ideas how this could introduce problems where you want to scale lot when you add a new machine into the clusters In yes so let's say you have an emotion and your clusters now that you the new machine unitary because that's the point of adding a new machine overseas and what you're going to have to do is to transform a large data sets so you're transferring of terabytes of data over the network and then this is already good network and the transfer itself is going to take hours and not only that you need to coordinate the transfer from old 1 mole to animal feed and make sure that they complete and while you're transferring the data because the of the operation you may have failures in the transfer operation itself so getting the cycling this slide is a lot of work and not only it's a lot of work it's also minus the work because all the time it takes to transfer this data now that we've seen the scaling issue that's also take a look at all the others are handled in traditional partition you can also it this is what you see in the in the you know well suited for each partition there if you had hash partitioning so you harsh partition customer I. D. you need to rehash we now introduce application into this picture this is the replication and you're using exactly because in the set up so mode 4 is an exact replica of an old 1 world finds an exact replica of more to let's see what happens when we have a failures so we have a temporary failure in old 1 what will happen is more for will start taking more than twice the in this case your clusters throughput and latency is bottleneck toward for and in the cluster exposes simple big deal but imagine the case where you have a hundred questions in your clusters and you had the failures that 100 wishing cluster is know what to make on that field questions replica so you don't get much out of having 100 which is in your cluster was which will become the bottleneck also if this small doesn't come back up if it's a permanent failures if you need to review the case for terabytes of data from a law that might already be the bottleneck any ideas on how to resolve this issue no place you you want to what you are dynamic model of your units here comes logical
shopping I believe will do Distributed File System HDFS was the 1st solution to introduce this at the system level age difference doesn't partition the data into knowledge it partitions them into a much smaller shots in this case there 512 makes each that number would always configured figures and that in this diagram we have the Beineke Scaling Out case we introduce a new node number 4 into the clusters and then you want to dynamically scale up of the and this is easy your shoddy those adjustments to ensure number 4 for a long long shot number 5 from what I showed with 6 from and if you're a failure during the chances that that's OK because these are small instances you can just restart operation from scratch In other words the rebalancing operations that we want to perform they become much easier much more flexible the 2nd benefit comes with the replication case In this example PG shot at looking at your using a round robin policy and ideas more to molds are exact replicas of each other necessary fail number 1 again in this case shot was long small so shots accessible all outlawed small 6 and so forth so we have a temporary failures you evenly distributing the water across state machines in the cluster each model take all of what 20 per cent more all and if you if this failure a permanent once again the application becomes much easier because you're not he replicating from a single machine you're effectively the implicating from the entire cluster bits and pieces of the charge and the and not so impressive right you only have 6 machines but emerging at 100 machines in your costs in that case failures are much more common and then these alterations will become much easier quick checkpoint question harm you have more of what these problems with traditional partition before world Health there was a good thing and that's
how that clusters maps all over to PG sure this is the example clusters an example PG shot clusters in here we have the work of these were criminals are exact false critical databases there is nothing special about this worker uh and then we have our metadata up the metadata servers variable like so again oppose physical database we're glad you create extension PG shot that guy becomes a Metadata Server that's where you create your distributed table batteries and queries and then there's some theories formed out the Metadata Server can also be called the coordinated or the muscle and then you that's like the guy that we talk to me want to increase and Metadata Server
is the authority on this metadata metadata is time is typically on the order of megabytes so you can easily copy around or even reconstructed other metadata changes on you and 1 of the things happen once you create a new table and as a result you create new shots true you rebuild schools jobs in the cluster the previous balancing example even when you have the feeling I had to develop story of the new machines on you have to rebalance and 3 when you fail right into a particle a charged ambient optic that shows that the to reflect on health and the challenges that are keeping this metadata consistent in the face of failures to make a set of data example concrete a piece piece sequel command line on 1 of the metadata tables here's all the left hand side we have identifiers for shots and on the right hand side we have the highest local ranges that correspond to each shot and as an example of query comes into the system say for a table that's partition customer I. D. PG shots hashes the customer I. using false critical such functions and it's a hash value it found something this metadata to find the hash range that correspond to the hash value and then picks up the finds the shot or shots that correspond to the query again let's say in this case we hash the customer I. D. and we got a value minus 1 . 6 billion lookup lines 1 . 6 billion the stable going all the way down its and I that creates 4 thousand per shot 10 thousand 6 and forward the request the particle machine that's the state of the master node for the worker nodes they are regular post basically instances again nothing special about the worker nodes and the workers want shot placement corresponds to 1 post-critical tables if you log into a worker node blog like slashdot you'll see multiple tables and to ensure that the table names are need P. shock politically extends the table name behind the covers so if your table's name is click on the sporting events Irish iodide is a thousand and 1 the corresponding shots tables name will be clicked on the score events on the score a thousand 1 on the work which 1 thing to keep in mind is that you create your index currently unconstrained definitions before distribute your tables so that wraps up the part on how we lay out the dating PG shots let's talk about the logic the really thinking about the shot is that's flexible hours between usability bicycle functionality coverage after talking to pollsters users we found that they just sort a simple called to get them rolling at the shot is just that it's a drop in post-classical extension mistake rate expression it's there you don't start running single commands against a distributed tables in that sense you don't have to aid in the special functions or anything special to create your data a lot Chevalier stores out there is a conversion already had the motion of distribution baked into their API so sorry about you and value and you have actually carrying forward the request rest equalizer language doesn't have the concept of distribution built into it and that's the bells you to strike here we want sport a subset of sequel the fire requiring any changes to application he a simple example that sets up the shot out the distributed table you go and you say create expression Peter shot you create table customer being more moles regular nothing special post sequel table they're not table will soon become shall table the muscle and to do that walked master create distributed table user-defined functions which designates this customer is stable as a distributed table you also specify the partitioning column in that user defined function value Paul must've create workers shards and what this guy does is the function call extra work Reynolds and then it he plays the table scheme of all the work promotes the index definitions the constraint definitions I increase the shards and they're replicas on the war criminals there to finalize the list of this function 16 is the number of shots you want to have in your cluster clusters this number could be a thousand this film could be 10 thousand it depends on the resources and the size of your clusters and to is the replication factor so if your application factor of 2 in case 1 of the replicas fails that table and state is still available the and when the whole into the planning and execution stage of yes sure that's true so if you have say 100 CPU cost won't last that existing customers are using so you have a 100 CPU cores in your cluster its input to set that number 12 thousand that we'll be able to scale up efficiently like by a factor of 10 x in all we have to figure out connection with what in
almost all of the other you are the master node already has that information the metadata so when you create a new philosophy at workers shots that's when that metadata the shot the shock placement of they think it's filled in of the 1 or there is a PG work called but so there's a config file yes it is that a lot very low the what do do we do this in the hands the people at some point there is if you have a million have a million tables in your clusters and there is a limitation we had 16 in there because the 1st thing people do is they set this up on their laptops so and then they have to databases and that's the value everything if they if you had like a million in their and they would have a million tables on the worker nodes so it's so the typical math that readers not you on a scale typically to 10 x 100 cost of thousand of thousand cause like and tying this into the planning and execution part I don't want any of you are familiar with the internal CAP apples this provides but 1 of the wonderful things that gives us this works I have about a dozen of them with these folks you can take over any part of the commons lifetime on the ground you general switching books together to use more than 1 and started flexible there's more strict contract about what you have to do with them this also means you need to be very careful when overwriting them because of the lack of the cold facts but in summary the whole KPI is he was the 1st class way off looking into the system and cooperating with he is always you are planning phase and just as a side note you can have regular tables and distributed tables within the same database for example you can 1 large a song the table does distributed and of those also regular tables or within the same database and taking a step back a query comes in false critical passes the query we then check if that queries for distributed table if it's for a local table we defer to stress if it's distributed you find the partition key the outlier partition pruning and find shots that are involved in the query we don't take the past greedy or queries and use the same the past logic as imports physical properties to generate lexical state he is a detailed example planning MPEG shot and insert query comes into the cluster post-critical passes it redundant into the feathers and see that the customer really stable is a distributed table are we separate articles on the hash partition column in this case there is only 1 it's not 1 customized equals h and then we apply post physicals hashing function and chat the range that that hashed value corresponds to the antique shop shot involved in the creating for this we use the same course exclusion mechanism as imports this equal and we're done there passed that create the the parts that clearly using prosperous rule which not c to get back the sequel statement or statements to send back those to forward to the worker nodes this step is important because on a worker node you may have like you won't have multiple shots so we need to identify which are want right to any questions on how this planning piece words and then there is a part
of that actually execution you have to be a bit careful here because we have a distributed system parallel requests are going wrong we do have a full consistency on rights which means if you write something you're going to see it when you read it back there is none of this potential to read from a still shop and it advice their results as you do in some of the document stores replicas need to be visited in order in order for the constraints toward and if you the replica right side see during the modification you consider the query a successful if I to applicants failed you markets that at the thought of as the shot the Indian act he was unusual
representation of that and research query comes into the western world we determine that chart 6 is the got shot that we want to hit and in this case the replication factors to so we need to touch tools and in huge collections are possibly or a duopoly user session 1 connection cash they actually do similar to post this safety W and the understand this query to all the molds about have that have those replicas here workable number 3 has failed what number was succeeded so you consider the query as a success and the master node is debatable keeping for the failure it March shot number 6 on them 3 actually on here all the metadata as inactive and I shot No. 6 more standard error in order to restore the replication factor and to do this repair user currently calls a user-defined function the single so that logic
is very similar to the research we find the shards which quoting the result refined shot pushed on the computation get there is also give them back to the users we can dynamically failover to all the replicas while that's happening this will be invisible to the eye and users and 1 difference between the insert is failures currently do not modify the metadata state this is the part that's different than in in search of a use case we're targeting here with a single shot is the common key-value access patterns so we have J so only records building rights and I looking them up
here is an example of the query comes in the maximal upset the where clause is says OK this is shot No. 3 last request to a worker node 1 let's say worker 1 and an intermittent failures and master node finds another worker was shot 3 still in the context of the same great because the client was allowing it to its size agreed to their guesses also give them and use them back to the client and 1 difference between the surface and this is the case is the shot state the metadata and the rest of the world isn't marked because we created so this covers the single-shot select use cases and then there's the
multi-shot select what do you shot those for the multi shot is it pushes down the filters and projections history semantic data to the master node those aggregations on the master node and new operations transformations on the rest of node the part about like fully distribution taking query are fully distributed that site Stevie's rhythm what does have starts to be what linear transformations and will handle distributed joints at what's new I guess the big new parties PG shot is already in production at the start of politics it's has been released in the summer version 1 . 1 was released in April and part of the actually the primary reason we're here is looking for your feedback if you're interested in PG shot if there is 1 or 2 or 3 features that would make you use PG shop please get in touch with us we have the project on the top there are 4 rooms on the time we issues or e-mail us and tell us say if I P shattered range partitioning if it was easier to do fully automated recovery I would use it or I need more sequel coverage please communicate with us in terms of the new parts again version 1 . 1 was released in pairs shoddy period that integration center project which follows 4 to 5 times faster in search and 1 thing to keep in mind is when you're doing research you want to do these control the because each insert is issuing a network-on-chip and none of would number about fixes thanks to Georgia's interior I may be helpful open by George and the other half were often like existing customers going into production he shot 1 . 2 will be released in July it has support for partitioning Michael was the types this is in case you want to partition Y 2 columns are the the 1 like 1 example is you have the idea and use an idea by you want to partition by 2 columns people are currently looking at doing it using composite types and 1 . 2 will have that features it has distributed copies certain aggregate push stumps to the responsibilities and again what fixes that's 1
. 2 what's next for 2 . as it stands PG shots skills not this guy and if you're working the CPU-intensive the single Metadata Server becomes a bottleneck CPU between 20 to 40 changes per 2nd depending what he's doing your hardware this is because the metadata server does the passing on the object and the Sharp 2 point all will resolve that it will go fully distributed and it will fully distributed metadata that we have 4 proposals on the table and we discuss thundering Don conference in the session uncle for state 1 of these designs involves y direction replicating the metadata of the metadata updates to all the moles this approach works best when you're capturing immutable events that is you have a lot of research but not many updates or deletes and again tricky thing there is how do you handle failures like when the sample and all these guys talk to each other all those failures and the 2nd is discussed this period in that it handles concurrent updates and deletes well but requires more work in this design we have multiple replication groups and each created gets why applications primary of course for this 2nd approach to be simple for them and uses the primary and secondary field urologic needs to be robust and automated and we'll talk more about the pros and cons of these designs in our technical blog in upcoming reached to summarize the
shot is a simple shopping extension for apostasy because it uses many small logical shots to make moving around the data faster and easier you can buy this scale up by adding machines or handle more failures thanks to these many small logical shots its standard sequel all special functions and it's the standard was physical extension the sacred text actually create table distribute Table II get rolling and it goes great with Jason we think Jason with you shot may cause physical the most interesting if not the best possible database you 1st while there let's say 2 questions and then go to the will that there was also to get on the move along you get a warning message yes so the warning would be to select on this guy download failed Sinai you'd like that would be not so it's communicated to the news of today are on the the right that
will again be a warning message like for purposes of PG shot yeah it will for the the inactive ones goes for the selected a there's state change for the other you watch see that in the state in the metadata states but it will it will be there and wanted the the use of you can't today people typically use are distributed table with other tables as they do on operational distributed table typically creates a temporary table or view on the metadata survey of the master node and then the join those results to get the linear if if that's something you'd like I don't like if there is a particle use case that you're interested in please feel free to join the discussion last question under agenda the questions after markers then we'll move on from words that are like this it's possible that the meta data services like metadata itself is tiny and so we folder with and in that sense it is if it gets the millions and not give you shot but we have users again quarter using situs TV with a bottom you insurers and then they're doing fine I would expect that hold to present in our problem typically on the order like puzzle machines if they get there be more than happy and I'm very very could be done as in power for the doesn't see I skip that or in the slide you chant copied the metadata either using PG back up or you could have on PBS like Babe actually we could take a back up or with are you can even the create that metadata from the class itself in 2 . 0 0 that will be resolved because it will be fully distributed in a room in my life so it was really like you you can only use so I I think the description of the teachers again open Jason table with dynamically changing growing polymers or I think have scales and so on so we do the analysis you want them you know you this is the 1st future yeah yeah right so I want to do something else the real thing and you know this is such that up so we use the data to the last month of the events of the out of the way it's set it's about 40 gigabytes industry cluster of you find with nodes right so that we were looking at the matter so there's no conceivable and you just want the list of events induced on behalf of publishers and so that's the goal is to have some idea sometimes inside and then there's some of his view of the version of the loss of community next I'm so this table is managed by the shot his distributed so any group you on this on these people goes into like the expected to find you now I said this table with great precision so that we that don't support repetition but we kind sort so the phrase that you get a new dataset of ordering so what we have here is that there's a bunch assured that actually is the back of your head out the year is system real production is not 1 at all within the charts so that in the years about solving this system only works for a hundred 5 years but it will basically users with the current time or at least something I actually what will land that chart and it's all when wrote the job so we need a new 1 and you so I don't think that you probably so
these people are able to handle 1 the users myself because there's something called a lot of stuff in the moment something probably
search really lots of so these are like to show that so this is 1 of the last but 1 of the things you do wish that registers and events and like all means so what has happened is we insert table on server actually we have 1 or more nodes
how many of shots this this notion that at all and so they were these about 300 chirplets and the fight and then that my metadata correctly
and so here's shopping
identifier 104 . 3 that is really catching all users so now have
1 worker nodes also uses it for this 1 university but yeah I have I have 5 rose up to this point in time so there is user due the shirt placement of to another so
isolated from there no words just Israel label like can also start with that people like the figures but I was just to you on I have some the so this will be you know those by events from out of the web casting and a transparent way he was not worried about which is shorter than the white nodes and they had to
be useful as if it was a immediately bit messy result Jason be itemized worries and thing I wanted show you is that the review of having a dynamically changing growing always so there's this sort of usage and knowledge India wrapper which lets you store post-proceedings tables called the format and compress the data actually quite well so i is greater or I wrote a script that actually goes through all the shots and then run the pressures for it so
basically like creates on table which is still easy start uses the existing data there's another place you so I I actually do this was
all the most dates so let's say I wanted to rest of they are among of May 94 and that we reject how big is that right now so it's been we worry because we're going to have both foreign tables and greater tables so this sums the Portable's records in there they were not in the kind of by the and so on all the tables are so where is the 1st thing is very much like the workers from the parallel and oppression functions so that can happen now what will this this is
also the master I can and still run that's an interesting story I can still write operations while impression the compression transactional data table and just replace it on wire but I was at the meeting I set up by the force of that might wrong so this is actually read gets based weakening messages from EEG shared passport humanitarian so that's that's that's all willing and this is all you have a theory about 17 million so you can store from them you have to use now so my compression just from the fact
that the hands of the red hair also really fast defined on that side of the table it went down from Monday's device about 7 sunny didn't the rest all the rest of the pack additionally again like it was said not so for archival purposes works quite well we selected for all the data probably rather than doing what we got switched features rather that I might like might want to run for analytical workloads and then actually present quite frequently but I didn't think just 1 of because I also could you about slashed in in the snow William idea about what year so now of quantitative alert so we have a mixture called the role of a story old guys coloring process in the eyes of the world based on the fact that so notice that you trying to I also promised to general somewhere I actually mean you
uh server during the past 3 well on the simple way now and in future stop working now so what happens if you use something like this in the shot was like you get the mass here and you will get a result right so try to connect to the worker the loss of life about what you're reading in order to use the result of the world so 1 of the worker still still matches you know this guy on inference is worth actually yeah that's the thing that could have led to a more just randomly pick numbered has that that the the know that I get the say that
all of the of the you will get an error message here that you'll get an error message will say that I can't call complete the core it's a results unless you have really that's like earlier this year and the year by white line that goes you something like our 1 of course you know that yeah yeah yeah so what do you think that 1 of the we so the bases like you can sort of nicely with them and that way so what you directly have you is this progress but that's all the shots 1st and others the matters is that it's a bit like this it things that you properties and the not weight of the catecholamines easily from the removal of the mass of Earth and workers was probably the number of columns what there work around the what beach and provide you with that scripts and then I guess if it's important to you can be scheduled prioritize for version 2 . this so shards Jerusalem be dynamically changing color rules stored and unfairly if all at the same time we have to thank the response it is clear from the work on the of the work you like that of the of the kind of work or it's blue and that's a good question I would be close the communication between the master node on the worker knowledge Polish sequel and so as long as the the past a sequel query is not degenerative all the us will not compatible with what the guy would they would be comparable I guess I listed clients I expect on from so . 2 124 I would expect them to be compatible with with this in mind . 3 1 . 3 of the last of here we can see that works you know what's going on here and there but I'm not sure I have had no interest if if if you want to go there there will be any other questions all sorts of if you want to if you what do did you think of others the so this is the current go around the world so that it was only able to be placed on each the shock doesn't cytosine investigation we will we don't want to you know about is very hard process where you have a distributed system you get what you don't know what you what your you want cost you for want the following you have to go all the way around in these units are much less because the interest rate used 1 and they have case is easy to handle the question is what happens when you have failures in your cost like around ensure consistency you also being available this is 1 of the 1st of all it's not supported Ronald ideas for enriching it fully distributed this to have wide it like migration replicate the metadata not not so the design proposals with the 2nd 1 is a it's basically using say physical similar replication and sharding interapplication groups so you could tell which makes the whole lot deeply because you that if there is a and I did is we take out the health information all told the metadata and on the only thing that remains is the short the short placements which is very easy to replicate but let's say you have the metadata here and it could really be cached image and you have of primary secondary set up for you have this you have this and on shelves are here so you sharding into replication groups and the queries will have the chance it's either here here there so the application will manage the shards the health of the shores of the test on on you have to walk the are distributing the database you're handling the causes associates stuff differently in the sense that the metadata without the how the formation may take up to help information it's very easy to keep that metadata consistent and here's the cluster the application will be managing this primary and secondary if this guy fails this these motions in here will be responsible for picking the new primary you be the closest and when the query comes into the system you got the creator god application dependent on application handles not query itself there is a 3rd alternative all this extension would just work on Amazon RDS if I was of course supports the shot extension so it's not something you'd like to see at least of the ground like it will just work it just goes on this already takes care of the higher reliability to your location where you have the higher part this is very easy to resolve and would just work on ideas of the US were you know what you want to fly the you might that on the end of the of the of the year and this in in we not you cannot in this so you can look to distribute transactions that touched the same shot but unlike within that machine and then distribute transactions that span across machines but so use cases not intending to handle like the financial transaction use this started transaction today that money from 1 account you could to your account to commit the transaction allow a constant on different machines are not handling it MPEG shot at all talking about the of so could you repeat the question the block of North Korea some of the special pair of all well there is nothing special what like Paul worker nodes are regular pulse physical tables the thing and you may want to do that is if you want to do the shoddy pair function you can create extension PG shot and that discharge repair function if you don't need a shot at the function you can just use force mystical databases but and they will become available worker nodes I think we're out of time I think you