Five Sharding Data Models and Which is Right
Formal Metadata

Title: Five Sharding Data Models and Which is Right
Title of Series: RailsConf 2018 (part 58 of 88)
Number of Parts: 88
License: CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/37302 (DOI)
Production Year: 2018
Production Place: Pittsburgh
Transcript: English (auto-generated)
00:13
In the interest of time, I think we'll go ahead and get rolling. Hi, everyone. I'm Craig. I work at Citus Data. If you're not familiar, we turned Postgres
00:21
into a sharded distributed database. I won't talk too much about Citus in the talk, but we have a booth in the expo hall. So if you're curious, I'm happy to answer questions about Citus afterwards. So I'm generally going to talk about sharding and data modeling. There are a lot of mixed opinions on sharding, I've found. And it's pretty well proven that it scales.
00:41
But how you kind of model your data up front is probably the single biggest factor in whether you succeed or fail with it. So first, a little bit of background on who I am. I curate Postgres Weekly. So if you're a Postgres fan, is anyone here not using Postgres or not a fan? A couple of hands, OK.
01:00
I'll convince you afterwards. But I curate Postgres Weekly. It's similar to like Ruby Weekly and other weekly newsletters that go out. Targeted more towards app dev than DBA, so hopefully you find something good in there. I run our database as a service at Citus. And previously, I was at Heroku running Heroku Postgres for about five years. So happy to kind of answer questions on that as well,
01:20
having a good bit of experience there. So what is sharding? There's been a good bit of talk this week, I think, already. Like, DHH was kind of proud of how they didn't shard. And yet they kind of split things into smaller pieces by using different databases. Rails 6, we're getting some support for additional databases. But it's not really necessarily sharding. It's, hey, there's multiple databases
01:41
that Rails is aware of. And now you get to figure out what to do with them. Sharding, by definition, according to Wikipedia, and this is a pretty clear definition, it's just the practice of breaking your database into smaller parts. Usually this is for performance. This could be for write performance, read performance. You take your pick. But it's usually because of performance.
02:01
The nice thing about sharding is breaking your database up from one huge database to a bunch of smaller parts. Everything is small again. Everything's faster. Things like vacuum and indexing are all faster. So by sharding, everything is small and easy to work with again. Now it's very well proven at scale. Google, Instagram, Salesforce, take your pick. Any large company at scale has gone down this path.
02:22
You don't necessarily have to be at their scale to find the benefits of it. I've seen it start as early as 10 gigs of data. More commonly, it's 100 gigs, a terabyte, that level. So first, what is a shard? I think this is the most common misconception I see when talking to people about this. We'll get into the data modeling in a little bit.
02:40
But first, what is actually a shard and how should you approach it? So there's a physical VM. There's an instance on Amazon or a Postgres cluster. A lot of times, people associate, hey, I've got two different VMs. Are these shards? Actually, no. So the idea of a shard, you're going to have multiple shards existing within one VM.
03:01
So there's a separation between nodes and running multiple Postgres instances on it. So if you've got a Postgres database, if you've got multiple tables, schemas, or Postgres databases themselves, all those are technically shards. And this is pretty important early on so that you don't tie yourself to the exact hardware you have. The idea with sharding is you
03:21
want to create a bunch of smaller chunks that you can easily move them around. So you can move them between different nodes, scale the hardware up, scale it down, add nodes, and move the tables across. This makes your life way easier. So starting with this, defining the right shard count up front is pretty key. So to kind of visualize it, if I've got two different physical instances
03:42
and I want to create 32 shards, under the covers I might create a table like events. And I'm going to call it events underscore 1, events underscore 2, 3, 4, et cetera. I'm going to create a bunch of shards on one server first. So if I'm using tables, I'm going to create events underscore 1, underscore 3, underscore 5 on this first
04:01
node, and then the even ones over on the other node. Really simple, but what's nice is then when I go from two nodes to four nodes, I'm just going to take half of those shards and move them. And so I'm going to take them, copy them. You can do this using things like logical replication. You can create a read replica, copy them over, and then
04:22
update metadata, that sort of thing. The actual process of moving them, you need to have some utility and mechanism for. But overall, it's more manageable than, hey, if you've got a shard per node, splitting that up inside the same table is a lot harder. So up front, you're going to create a larger number of shards.
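To make that layout concrete, here's a rough sketch of creating many small shards up front in plain Postgres. The events table, the count of 32, and the naming convention are illustrative, not what Citus itself does under the hood.

```sql
-- One logical events table, 32 physical shards named events_1 .. events_32.
-- Odd-numbered shards could live on one node and even-numbered ones on another;
-- rebalancing then means copying whole shard tables between nodes.
CREATE TABLE events (
    id         bigint,
    user_id    bigint,
    payload    jsonb,
    created_at timestamptz
);

DO $$
BEGIN
  FOR i IN 1..32 LOOP
    EXECUTE format('CREATE TABLE IF NOT EXISTS events_%s (LIKE events INCLUDING ALL)', i);
  END LOOP;
END
$$;
```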
04:40
So on to the five data models. We're going to walk through each of these in a little bit of, is this your data model? Here's the things to think about. Here's the tips and ways to rationalize through it. Here's the ways to rule it out. The number one thing I try to do is actually rule out a data model saying, this won't work, and try to find the reasons why instead of forcing it in. When you force it in, usually you've
05:01
had a bad sharding experience. If you've ever tried sharding and it just failed horribly, you probably tried to force a square peg into a round hole. But first, a bit on the approaches. So most people, when they think of sharding,
05:22
I think there's two different approaches. And one of them is much more common at scale. The other is one that commonly gets talked about and I see implemented more, but it isn't necessarily sharding. In a sense, it is, because you're breaking things up into smaller parts. But it's not the same scalable approach that a Salesforce, a Google, someone like that would use.
05:40
So the first one, hash. What you're going to do is, as soon as something comes in, you're going to hash that ID that you're sharding on. You're going to define a range of shards. So in this case, like our example before, we've got 32 different shards. And we're going to split the resulting numbers of that hash value up into 32 different buckets and keep some metadata table saying, this resulting value is here.
06:01
It goes into this shard. And then we're done. Any questions on that? So it's a little bit more than that, but it's actually not too much more at a basic level. So you're going to hash based on some ID. Like if you're using Postgres, there's an internal hashing function you can use. It exists for all the data types.
06:20
So if you've got text, it can hash it. If it's got an integer, it can hash it. You can roll your own, too. You don't have to get crazy on this. The hashing function you use will not determine the success or failure of does it work for you. You're going to define your shards up front. This is really important. You want to go larger than you expect
06:41
to go in number of nodes. So if you think, hey, at our heyday when we're at Google scale, we're going to run 1,000 servers. That's probably an OK number to go with 1,000 shards or 2,000, something like that. You also probably don't want to go with two. You'll outgrow two nodes really, really fast. So some medium based on how large you think you're going to grow is a good start.
07:02
You can go and split up shards later. That's a lot more work. Once you go through the sharding process, you want to go ahead and buy yourself two years, three years, five years, 10 years, and not have to go back and redo a bunch of work there. So as I mentioned, two is bad. Two million is also bad. Now you've got tables that are so small
07:22
that they're just sitting there, empty. To run a migration on those is now costly. In production, I tend to like things like 128, 256. You also don't have to grow in factors of two, but it's usually really nice and easy and works pretty well.
07:43
So a common misconception is that people just route the values. So you have user ID one, and you say, hey, I'm going to take users one through 10 and put them in shard number one. Then I'm going to have shard number two, and users 11 through 20 are here. So that's a huge problem. What you find is your oldest customers
08:01
are the ones with the most data. They've been around the longest, and now they're all saturated together. By hashing things up front, you naturally distribute your data pretty evenly. Yes, you may have a hotspot, but it's a much, much better way of naturally distributing the data up front. So if you take the user ID one, take the hash value of that in Postgres, it's 46,000.
08:23
The hash value of two, as you can see, like 27,000. And what I'm going to do is have a table that I say, split up the full range of hash values. And between 26,000 and 28,000, that's shard number 13. And I'm going to have a little lookup table either in my application or my metadata in my database.
08:43
If you're using Citus, you don't have to think about this. It's in there for you. But it's the same idea, where this shard, all the resulting hash values live in this range. So you've kind of naturally evened out the skew of your data. It also makes for, as things grow and get older, new ones kind of redistribute. So it works out really nice.
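Here's a rough sketch of that hash-and-lookup idea using Postgres' internal hash functions. The shard_map table is illustrative (Citus keeps the equivalent metadata for you), and the exact hash values you get will differ from the ones quoted above.

```sql
-- Postgres ships internal hash functions for its data types:
SELECT hashint4(1);          -- hash an integer shard key
SELECT hashtext('user-1');   -- hash a text shard key

-- Illustrative metadata: each shard owns a contiguous range of hash values.
CREATE TABLE shard_map (
    shard_id int PRIMARY KEY,
    hash_min int NOT NULL,
    hash_max int NOT NULL
);

-- Routing a row for user_id = 1: find the shard whose range contains the hashed key.
SELECT shard_id
FROM shard_map
WHERE hashint4(1) BETWEEN hash_min AND hash_max;
```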
09:06
So onto range, I see a lot more people implement this than actually true hash-based partitioning. It's really common in time series. So what you've got to do is ensure you have a new bucket. So let's say you're doing range partitioning, and you want to keep each week of data by itself.
09:21
Maybe it's an ad network, and you want to count impressions by week. I don't need to join across other weeks. I can do aggregations differently, that sort of thing. So you're going to create a new table for every week of data that comes in. Here, it is pretty straightforward. You create the new range, route to the right bucket, make sure you keep creating the new ranges. I'll talk a bit about this on the time series data
09:41
modeling. But that's an important part, and you're done. So range is a little more straightforward. If you've done it before, you've probably done it this way. It works just like you'd expect. All right, questions? All right, so there's actually quite a bit more. So that's the basic approach. Those two approaches apply pretty broadly to sharding.
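Before moving on to the data models, here's a minimal sketch of the range approach just described, in the pre-Postgres-10 style: one child table per week with a CHECK constraint so the planner can skip buckets that can't match. The table name, dates, and routing are all illustrative.

```sql
CREATE TABLE impressions (
    ad_id   bigint,
    seen_at timestamptz NOT NULL
);

-- One bucket per week; keep creating these before the data arrives.
CREATE TABLE impressions_2018_w16 (
    CHECK (seen_at >= DATE '2018-04-16' AND seen_at < DATE '2018-04-23')
) INHERITS (impressions);

-- The application (or a trigger) routes each row into the right bucket:
INSERT INTO impressions_2018_w16 VALUES (42, '2018-04-18 10:00');
```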
10:03
Now, the more interesting part is the data model. If you don't get the data model right, you're going to have a lot of problems. You're going to be trying to join data that's across different nodes. Moving data across the wire is really slow. We haven't fixed that in databases. So it's a matter of how can you co-locate data based on your application?
10:21
So first, geography. So the big question here is, is there a really clear line for a geographical boundary? And this is a 99%-of-the-time thing. It doesn't have to hold 100% of the time, and it may not. One time, I was getting picked up at SFO airport in an Uber,
10:42
heading home. The driver I was talking to told me he'd recently picked someone up at SFO airport. It was the day before Thanksgiving, and they said, I can't go see those people. Can you drive me back to LA? Not kidding. He went from San Francisco to LA in an Uber because he didn't want to see his family. Now, normally, I think of Uber as having confined region
11:03
boundaries. Most Uber drivers are not going to drive from San Francisco to LA, so a trip's always in San Francisco. This is usually a pretty good regional boundary. There's a lot of services that are good examples for this. So income by state, health care has some defined geographic
11:21
boundaries. Grocery delivery services like Instacart, those sort of things. If there's a really clear boundary that you don't go out of, this can work pretty well. There are some bad examples, though. So while that's the exception case with an Uber ride from San Francisco to LA, most of their data
11:41
works just fine. And you can deal with those exception cases as they come. Bad examples are when one side of the data has one geography, and the other side of it has an entirely different geography. So text messages go from one area code to another. I grew up in an age where once I moved, I didn't get a landline. I just kept my cell phone.
12:01
So I still have an Alabama phone number out in California, and I don't actually pay attention to the area codes. So it's not localized anymore. When your data always spans geographic boundaries, that's usually a problem. Do you join across geographic boundaries? And so the key here is join, not aggregating.
12:20
So you can aggregate pretty easily across geographic boundaries. Roll-ups work pretty well that way. But when you're joining from data in Florida to California, you're going to have to move that over the wire. So what you need to find is, how much are you joining across those geographical boundaries, and are they really clear? Does data easily move from one place to another?
12:42
So for something like Instacart, where I have my home grocery store, it's OK if I move and they change my geography. That's a one-time operation. That's not happening constantly. At large scale, yeah, it's happening constantly. But it's not like every user is doing that constantly. So if you have something that's happening less than a few hundred thousand times per day,
13:01
that's fine for it to move from one spot to the other. So a few more specifics. You want to find the right mix here, especially from granular versus broad. State, I mentioned, is kind of an OK example for some things. But now you've only got 50 states. And to distribute that, what happens
13:20
when you've got California or Texas, which are huge, and you've got a bunch of other states that are really small? Doing something like zip code might actually be better if you can divide things that way, because your data skew is a little bit better. So again, some common use cases here.
13:43
If you're Instacart, Shipt, Uber, Lyft, these use cases work really well. You should know if you're in a go-to-market where you start in one geography, then expand to the other, then the other, then the other, this model could work really well for you. Now, in the real world, I see this happen a bit
14:01
with range sharding, saying like, oh, California gets their own database. Texas gets their own database, that sort of thing. This is a problem because of some of that data distribution. Now you've got 50 different databases to manage. You haven't done the proper work of hashing things as they come in. So what you still want to do is hash those IDs.
14:20
So give each state an ID, or you could actually hash the text value. You could hash the text value of California and do that same thing and create buckets. If it's states, that's probably, again, a bad example, but if it's zip codes, I can create 32 buckets and distribute all the zip codes across those 32 buckets pretty easily. So you still want to go through that hash-based sharding to make this work.
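A quick sketch of what hashing the geographic key looks like, using zip codes and Postgres' internal hashtext(). The bucket count of 32 and the sample values are arbitrary.

```sql
-- Distribute zip codes across 32 hash buckets rather than one shard per state.
-- "& 31" keeps the low five bits of the hash, i.e. one of 32 buckets (0-31).
SELECT zip, hashtext(zip) & 31 AS bucket
FROM (VALUES ('94103'), ('35401'), ('15213')) AS t(zip);
```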
14:41
Otherwise, you're going to run into pains as you start to scale. So that's one. Multi-tenant. This is my favorite probably because it's the easiest. And I won't say sharding is ever completely easy, but it's the one that's pretty straightforward.
15:00
If you have a SaaS kind of B2B business where each customer's data is their own, like salesforce.com, I can't look at my competitors' leads. I only get to see my leads, my opportunities. They are sharding by customer and isolating each customer's data; it's a pretty straightforward process. What is the data distribution? This is the number one thing that'll probably rule out
15:21
the multi-tenant if you have a SaaS business. If one tenant is half of your data, well, obviously, this is going to help for the other half, but it's not going to help for that really large one. I don't usually get worried when one customer is 5% or 10% of the data, there's still some good headroom there and there's some options there.
15:40
But looking at your data distribution, it is really key to know if you're going to be successful or not. So again, common use cases here, SaaS B2B, I see a bunch of CRMs, marketing automation, any kind of SaaS where their data is their own. Even in some cases, banks make sense here. Even though it's a B2C business,
16:02
usually when I'm looking at my finances, I'm not sharing that with anyone and I hope they're not. So is the data isolated to that customer? In which case, it's a pretty clear line. So a few guidelines for this. And you can start this really, really early. When you're just starting out with your first Rails model or your second Rails model,
16:23
put your tenant ID or customer ID on every table. It's going to make things like joins, foreign keys, all of that massively easy. You're not going to have to go back and backfill that later. So yes, you're denormalizing, which yes, in theory, is bad. In reality, it's going to make things much, much easier when you do need to distribute things.
16:42
Ensure your primary keys and foreign keys are composite ones. Rails support for this is OK. There's a few ways to kind of work around it. It doesn't just drop in out of the box, but it works pretty well. And I've seen it work at scale. So here, basically, you're saying my customer ID plus this typical primary key that's a serial on some table
17:03
is my composite primary key together. You get the same referential integrity. It also will give you some nice gains in terms of performance. Usually, by indexing that way, it's more targeted, and Postgres could take a different plan in that case. And then enforce tenant ID on all queries.
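Here's a minimal sketch of what those composite keys look like at the SQL level, with Salesforce-style table names; the column names are illustrative, and the same pattern applies to whatever your Rails models generate.

```sql
CREATE TABLE accounts (
    org_id bigint NOT NULL,            -- the tenant, on every table
    id     bigserial,
    name   text,
    PRIMARY KEY (org_id, id)           -- composite primary key
);

CREATE TABLE opportunities (
    org_id     bigint NOT NULL,
    id         bigserial,
    account_id bigint NOT NULL,
    amount     numeric,
    PRIMARY KEY (org_id, id),
    -- composite foreign key keeps referential integrity tenant-local
    FOREIGN KEY (org_id, account_id) REFERENCES accounts (org_id, id)
);

-- Every query carries the tenant, so it can always be routed to a single shard:
SELECT * FROM opportunities WHERE org_id = 42 AND account_id = 7;
```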
17:22
So by going with this model, if you make sure that, hey, you've got a where clause where this customer ID equals foo on every single query, you know there's no data leakage. There's actually a couple of gems for this. The one I most recommend is Active Record Multitenant, which just has a little decorator that you put, and then it applies it
17:41
on every single query, every model, takes care of it for you. So if you're going down the path in Rails, I would recommend it. So a quick look at a little simplified example of Salesforce schema. So you've got your leads table, accounts, and opportunity table.
18:00
Here, you're going to change it just a little bit and add org ID onto every single one of these on every table that you have. And then as you have your keys, you're going to have a primary key there that's consisting of both of those. You do this up front early on, it'll save you a lot of headache later. When you try to actually come back and change and add these columns later when you've got a 10 terabyte table,
18:22
that's going to take a long time, and it's not going to be very fun. Also, you're going to find you probably have anomalies in your data. So a lot of people are like, oh, no, no, this isn't possible with our data. I've seen when they started to backfill that stuff. It's like, well, I'm pretty sure you have two records for this customer. Which one is right?
18:41
And you never know, and you just have to randomly delete one. True story, it really happens. So a few warnings. If you're using schemas, which I see a lot of people do, with one schema per customer or one database per customer,
19:01
be careful. So this works great when you're at 10 customers. It works OK at 100 customers. When you're running a Rails migration and you're running it for 10,000 customers and you have to run that across 10,000 different schemas, that thing that used to take a few seconds now takes a few hours and a few days. I've actually heard of cases where schema migrations took days to run.
19:21
And migrations never fail, right? So you never have to go back and clean anything up. So you also then have to worry about connection limits, because you're grabbing a new connection or setting the schema on that search path each time. Now if you've got 10,000 customers, you've got a database with 10,000 connections,
19:42
guess what? Usually Postgres doesn't like that many connections. So I would say, in general, think twice about going one schema or one database per customer unless you know you're only going to have 10 customers. If you have 10 customers, absolutely go that approach if you're never going to scale beyond that, which is valid for some businesses if you have a really, really high price
20:01
point for certain high end customers. All right, so entity ID. So here, entity ID is interesting and kind of a hard one to describe from a textbook definition. But it's something more granular than that default hierarchy. And here what you want to do is actually
20:21
optimize for parallelizing your workload. And you don't want one query to run against a single shard. You want a query to run against every shard. And here what you're trying to do is optimize for performance of parallelization. And this is more common in use cases like ad networks, a lot of kind of log monitoring tools,
20:41
that sort of thing where you've got a massive amount of data. So you do want to join where you can, but you want to say, hey, where can my join be pushed down to these tables and spread out? So again, optimizing for parallelism, less for your data and memory, less for your standard kind of typical SaaS OLTP app.
21:05
So a good example, web analytics. Here it's really common to shard by visitor ID or session ID. Both of these work really well. And because when I'm looking at things, I'm either looking at an aggregation of sessions or visitors that happened, or I'm looking at it for one specific user,
21:22
that I'm not kind of joining of, hey, give me user A that did something against user B. Like if you're joining across those, that's where it gets complicated. Most web analytics tools are just doing aggregations, right? And those roll up pretty well. So here are some things to think about. SQL is going to be more limited.
21:40
You can't have the full set of window functions and CTEs as easily, because to do those sort of things, you've got to bring back data to a single node and resort, reorder it, which is really expensive. If you're bringing back a terabyte of data over the wire, that's going to be slow. Nothing's going to change that. You really kind of want to think in a Hadoop MapReduce style.
22:00
Can you actually split this job up into a bunch of smaller parts, run them locally, do pre-aggregation, and bring that back to a central point? If you can express your application and workload in that sense, this will work really, really well for you. And it actually scales really beautifully. I've seen cases of, versus a single-node Postgres, like 100x performance in these cases,
22:22
if it's what you need. So a few examples, like count star, right? If I've got 32 shards, I can do 32 smaller counts, bring those back, do the aggregation. Average, also really easy. I can do sum and count, bring those back, do the final math.
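To make the pushdown idea concrete, here's roughly how count and average decompose across shards. The shard and staging table names are made up, and a system like Citus does this rewriting for you.

```sql
-- Step 1: each shard computes partial aggregates locally
-- (repeated for sessions_1 .. sessions_32; sessions_13 is just one example shard):
SELECT count(*) AS cnt, sum(duration) AS total
FROM sessions_13;

-- Step 2: the coordinator combines the partials
-- (collected here into an assumed staging table partials(cnt, total)):
SELECT sum(cnt)              AS global_count,
       sum(total) / sum(cnt) AS global_avg
FROM partials;
```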
22:43
Median, that's a little bit harder. Now this gets cool, though. Medians, window functions, all these things are very cool and very powerful. So for count distinct, there's an algorithm, HyperLogLog. Go read the paper.
23:01
I think it's out of Google. It's really, really fun. I also love just saying HyperLogLog. It's like probabilistic approximate distincts. That's really, really close. And you can do things like union them, say give me users that I saw on Monday but I didn't see on Wednesday based on this data type. Really compact and efficient in terms of storage as well.
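A sketch of HyperLogLog in Postgres, assuming the postgresql-hll extension (the hll type and functions below come from that extension; the page_views table and its columns are made up):

```sql
CREATE EXTENSION hll;

CREATE TABLE daily_uniques (
    day   date PRIMARY KEY,
    users hll
);

-- Roll raw events up into one HyperLogLog sketch per day
-- (assumes page_views(day date, user_id bigint)):
INSERT INTO daily_uniques
SELECT day, hll_add_agg(hll_hash_bigint(user_id))
FROM page_views
GROUP BY day;

-- Approximate distinct users for a single day:
SELECT day, hll_cardinality(users) FROM daily_uniques;

-- Sketches union cheaply, e.g. approximate distinct users over a whole week:
SELECT hll_cardinality(hll_union_agg(users))
FROM daily_uniques
WHERE day BETWEEN DATE '2018-04-16' AND DATE '2018-04-22';
```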
23:24
So for ordered lists, there's top N or top K. It's really common in Elasticsearch. We actually just released a top N extension at Citus, I want to say like a month ago. So if you need top N for your Postgres database, give it a look. Median, there's things like T Digest and HDR.
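Going back to the top-N piece for a moment, here's a sketch assuming the Citus topn extension. The function names (topn_add_agg to build a sketch, topn to unpack it) are my recollection of its API, so treat them as an assumption and check the extension's docs.

```sql
CREATE EXTENSION topn;

-- Build a compact sketch of the most frequent referrers and unpack the top ten
-- (page_views and referrer are illustrative names):
SELECT (topn(topn_add_agg(referrer), 10)).*
FROM page_views;
```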
23:42
So there's probabilistic data algorithms that get you really, really close here. They're not perfect, so if you need exact things, these are not going to work. But if you do go down this path and need some of these things, give any of these a look. All right, so a graph database. So this one, you'll really know
24:02
if you need a graph database. The most common use cases are social networking and then actually fraud detection. And here you're looking for a high number of connections. It's a very, very different approach. Like Craig posted a photo, which might be some value within my database. Daniel liked that, which is an edge.
24:22
And then Will posted a comment on it. So they're composed of essentially objects and associations. And you can think of it as two ways. There's things that can occur just once, and there's things that can happen over and over and over, like comments. I can only like a thing one time. Those edges are those things that can only occur once,
24:41
and the other objects can reoccur in different ways. So you can think of liked as true or false. Sharding within a graph database, the most common approach is you're going to write the data twice. So you can say, oh, do I shard on objects or do I shard on the associations?
25:01
And the answer is yes. What you're typically going to be doing is querying based on different things. So for a news feed, you're going to query on maybe associations. On other pieces, you're going to query the details about the objects, or vice versa, depending on what you're doing.
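Here's a rough relational sketch of that objects-and-associations layout, written twice so each direction of a query stays on a single shard. This is only in the spirit of the approach described here; all table and column names are illustrative.

```sql
CREATE TABLE objects (
    id    bigint PRIMARY KEY,   -- shard by id: "give me the details of this photo"
    otype text,
    data  jsonb
);

-- The same association stored twice, keyed from each end:
CREATE TABLE assoc_by_src (
    src_id bigint, atype text, dst_id bigint, created_at timestamptz,
    PRIMARY KEY (src_id, atype, dst_id)   -- shard by src_id: "what did Craig like?"
);

CREATE TABLE assoc_by_dst (
    dst_id bigint, atype text, src_id bigint, created_at timestamptz,
    PRIMARY KEY (dst_id, atype, src_id)   -- shard by dst_id: "who liked this photo?"
);
```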
25:24
That's the high level of it. If you're doing a graph database, or using a graph database, there's a newer one in Postgres, AgensGraph. I don't know anything about it. I haven't tried it yet. I've looked at it, and it looks promising. The most common one out here is Neo4j. But I would actually recommend reading this paper.
25:43
So probably the most sophisticated graph database that is heavily sharded is Facebook's. It's not open source. It's not public. It's called TAO. This walks through a lot of it, including some of the sharding principles. It gets pretty deep pretty fast, and the applications are very, very
26:01
different from a relational database, or even like text search databases. The short is you're going to shard it both ways based on your associations and your objects. And I believe Neo4j has some built in tools for that. So probably just leverage those.
26:21
Time series. This is actually real data from Hacker News. I won't say Hacker News is a perfect source for anything, but if you do read it, there's good content there at times. And this is the data from their Who's Hiring on Hacker News.
26:41
It's screen scraped from years, years back. What I love is Postgres is crushing everything. It's clearly the better database. Also interesting is most things are relational there. As cool as NoSQL is and everything, the primary load of most applications is still a relational database.
27:01
And I don't think that's going away anytime soon. All right, so time series. A lot of people think, hey, I have time on my data. I have a created at on all of these columns or on all these tables. Let's just shard by created at. Or I have analytics data.
27:20
So time series it is, right? Because it has a time based dimension. Time series, you can always do that, but it really depends on how you're accessing the data. Are you frequently joining things that span across time dimensions? Are you pruning off old data? One of the biggest benefits to time based sharding
27:41
is getting rid of the old data. It's not just to make the recent data fast. That is one piece, but it's also to get rid of the old data. So are you always querying time? Are you querying a subset of the data? And then do you remove old data? These three things, if you do all three of these, then time series is actually a really good fit for you.
28:01
If you do one of these, it's a little more questionable. If you don't do any of these, then you don't have a time series problem at all. So if you're querying over long ranges always, if you're always querying two years of data, it's not going to help you that much. If you're not removing the old data, if you're not querying on time at all,
28:21
hopefully you haven't thought about this at all. So this is really good. And this kind of really varies. You can make it work, but I've seen a lot of people go down this path and run into various pains along the way. So with time series, you're going to do more range partitioning than you are hash-based sharding.
28:41
As data comes in, you're going to create a new range for that new bucket. It could be daily, weekly, monthly, yearly. You get to pick. And you're basically going to split that up. So the key step is to define your ranges. Make sure you set up enough in advance. The number of times I've seen people run a script once and say, I'm going to create all of my buckets right now.
29:03
They create two years worth of buckets. They think they'll set a reminder, and they'll totally remember before they run out. Two years later, production is down. And they're like, well, crap. We should have automated this when we started. Make sure you're also deleting old data.
29:22
Dropping the old partitions keeps things snappy. Otherwise, it's still just growing and growing and growing. And you could do all of this with just some nice indexing without having to do all the extra elaborate work. So I don't know if any of you are in the Postgres 10 talk. Postgres 10 got native partitioning, time partitioning.
29:40
But it's got a few rough edges. So walking through it here, you'd come in, create a table, and you'd create, say, partition by range and the column you want to partition your data by. So this is the initial setup. Then you're going to come in and create tables for each of these partitions.
30:01
And you're going to say it's from this value to this value. And you're going to do this for every single one. You usually want to schedule this as a job to just keep running. If you don't have the table, it's going to error. You're just going to lose the data. You're going to want to create your indexes up front. If you come in later and just
30:22
want to create index on that initial measurement table, it's not going to do anything for you. So you've got to do this on each specific table. So as you add new indexes, you need to go back and index the old ones, or you need to automate this in some way. And then you're going to create a trigger. And a trigger is basically going to run automatically
30:41
when you insert data into that measurement table. It's going to say, let me do some analysis. And if my range is between this and this, go here. If it's between here and here, go here. So I am very excited about this being in Postgres 10. I wouldn't call this native partitioning personally.
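Here's what that walkthrough looks like in SQL with Postgres 10's declarative partitioning. The measurements table, column names, and weekly ranges are illustrative.

```sql
CREATE TABLE measurements (
    device_id bigint,
    reading   double precision,
    logged_at timestamptz NOT NULL
) PARTITION BY RANGE (logged_at);

-- One partition per week, created ahead of the data arriving:
CREATE TABLE measurements_2018_w16 PARTITION OF measurements
    FOR VALUES FROM ('2018-04-16') TO ('2018-04-23');

-- In Postgres 10 indexes live on each partition, not on the parent:
CREATE INDEX ON measurements_2018_w16 (logged_at);

-- Retiring old data is just dropping a partition:
DROP TABLE IF EXISTS measurements_2018_w10;
```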
31:00
It's still got a little ways to go, I think, in terms of user experience. So a few tips on that. There is a Postgres extension that smooths out all of the rough edges called pg_partman. pg_partman will automatically create new partitions. As you run a create index, it'll create on all of them. You can schedule it to automatically drop old ones.
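A sketch of pg_partman managing the partitions from the previous snippet. The calls below reflect my understanding of the pg_partman 3.x/4.x API (create_parent, run_maintenance, and the part_config retention settings), so verify against its documentation before relying on them.

```sql
CREATE SCHEMA partman;
CREATE EXTENSION pg_partman SCHEMA partman;

-- Register the parent table; pg_partman pre-creates weekly partitions ahead of time.
SELECT partman.create_parent('public.measurements', 'logged_at', 'native', 'weekly');

-- Run this from cron (or pg_cron) to keep creating new partitions:
SELECT partman.run_maintenance();

-- Keep three months of data and drop anything older automatically:
UPDATE partman.part_config
SET retention = '3 months', retention_keep_table = false
WHERE parent_table = 'public.measurements';
```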
31:21
If you're doing Postgres 10 partitioning, I would not use it without pg_partman. Postgres 11 should be much better and start to remove some of that need. But I would highly recommend using pg_partman. So that was really fast. A little high level, but a little bit of detail
31:41
and hopefully tips. And going through each of them, my biggest feedback would be sharding is not always easy. But if you need sharding, doing it earlier, getting the best practices in, if you're in that multi-tenant model, adding your tenant ID
32:01
to every table, it's going to save you a whole lot of time later. So starting to do the groundwork now is a good bit easier. Identifying the right approach is really key. And I would say try to rule out approaches. Don't try to force it. Be like, oh, we have time series data, so let's go with that path.
32:21
I would try to see where it will break first off. It's not going to be 100%. It's going to fit 98%. If you can get to that 98%, you're in a good spot. I think sharding was really, really rough a number of years ago, and there's a lot of tools, both with Rails, libraries, extensions to make it easier,
32:40
and native things that have come in Postgres. So it's gotten a lot easier than it was years ago. So again, it used to be a lot more painful. I wouldn't call it fun. It's not my idea of a Friday night. But it will scale. Once you do it, you don't have to worry for years about scaling, essentially.
33:01
All right, cool. I think that's it, and I might actually have a couple of minutes for questions. All right, cool. Thank you. Thank you. Thank you.