Shootout at the PAAS Corral
Formal Metadata

Number of Parts: 29
License: CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/19146 (DOI)
Production Place: Ottawa, Canada
PGCon 2015 (23 / 29)
Transcript: English (auto-generated)
00:03
Welcome to Shootout at the Platform as a Service Corral. I didn't necessarily want to spell out the acronym for that; it would be ugly. I'm going to be talking here about Postgres performance on public clouds, using different public clouds and comparing them.
00:23
So if you thought it was going to be something else, you still have time to get to another talk. Otherwise, here we go. So, one of the things that has been going on in the world of hosting stuff is that it's a bit wild west right now, if you hadn't noticed.
00:44
There's dozens of different providers. They all provide very different things. Stuff is sometimes inexpensive and sometimes not. Service providers can go away overnight and they engage in vicious competitive sales wars of trying to undercut each other.
01:03
So it's very wild west, hence the theme. As a result, in my consulting practice, clients come to us and ask us to make recommendations both on how they should be hosted on the cloud and on whether they should be hosted on the cloud at all.
01:24
I'm not really going to address the second question here, but I'm going to address the first. First, how they should be hosted on the cloud, which cloud, how it should be configured. So this started me down the road of doing some benchmarking against running Postgres on various public clouds.
01:42
And that's what this is. First, I want to give a couple of thanks. The other thing that kicked this off was meeting Ruben Rubiore of manageacloud.com. They actually supply a unified API for deploying cloud servers on six or seven different public clouds. And so they did a bunch of the benchmarking in here so that I didn't have to write to all of the different cloud APIs
02:03
because they've already done that. I also want to thank Heroku and AWS who've given me some free cloud time so far. I may have some free cloud time later coming from other providers in terms of getting this going.
02:21
So we have our magnificent seven right here in terms of what I've tested so far: Rackspace, RDS, DigitalOcean, EC2, Google Compute Engine, Heroku. And then one mystery person who I will be going over at the end of the talk.
02:42
So, we start out with the area where I have the most familiarity, which is Amazon Web Services. This is where we do most of our cloud hosting; most of our cloud-hosted clients are on some variant of Amazon Web Services. But Amazon Web Services is such a large ecosystem that it includes various providers who have built on top of the AWS infrastructure.
03:05
And so the three options we'll be covering here are our gunslinger, which is the roll your own option on EC2. The rancher, which is RDS, and the dandy, which is Heroku. There is another option, another major option for Postgres on EC2, which is Enterprise DB.
03:25
They have their own cloud management layer on top of AWS. I have not tested it yet. Not that much free time, that is all I can say. Maybe in the future. So, because these are all AWS options, they have some things in common.
03:42
One is a comprehensive API that is not necessarily easy to learn or easy to use, but it does cover absolutely everything. Anything you can do on AWS or its subsidiary services can be done through a Web Services API.
04:00
And in some cases, certain things can only be done through the Web Services API. And that's not necessarily true with all of the other clouds. With some of the other clouds, you have to do certain things through their GUI tool, etc., which is a little irritating if you're managing a lot of servers. The other thing is that with AWS, you get a larger global distribution than all but one of the other clouds, with, you know, data centers in different regions. This is true regardless of which of the AWS-based platforms you use. The other big thing that you get is lots and lots of extra stuff that you can use together with your AWS-hosted Postgres instances, if it's useful to you.
04:43
So things like S3 long-term storage, or other weird things, or Elastic Beanstalk. I took this screenshot a while ago. The one I'm excited about now is the Amazon Container Service, because they do a lot of Docker stuff. And a bunch of other things, and caching, and all of these other things that they have packaged as rental services for you to use.
05:05
That can be used very easily with anything that's hosted on AWS, whether it's EC2 or Heroku or whatever. So let's talk about EC2. You know, the other amazing thing about this is that this was the only option two years ago, right? Well, two and a half now. Before Heroku launched, this was really sort of the only option, right?
05:26
If you're hosting Postgres on a public cloud, then you're building your own Postgres server on EC2. It is still a major option. We still do a lot of it. So basically, here's the idea, here's your gunslinger thing, right? You create an instance, you install PostgreSQL on it, you configure that PostgreSQL, and now you're running Postgres in the cloud.
05:44
That's pretty much it. Now, this is sort of our little comparison sheet, our sort of character sheet for the gunslinger here, right? So this is platform as a service, as opposed to database as a service, which we'll be going into later on.
06:02
Administration is do-it-yourself. High availability is do-it-yourself. Versions are whatever you want to install. You want to install 9.5 Devel, go for it. Extensions, anything you want to install, again, extra features, really nothing other than what the AWS framework provides. And the price is relatively cheap compared to a lot of other options.
06:25
Now, let's talk a little bit about AWS. We have a lot of experience on AWS because we run so much stuff on it. One of the things that Amazon does is give you a heck of a lot of different options in terms of different sort of cloud instance sizes and that sort of thing.
06:42
It can be a little hard to navigate. It's also hard because it changes all the time, as in, like, I updated the slide this morning, because the instances that are available have changed in the last couple of weeks. But to give you an idea of the narrowing-down in terms of what we deploy on: for smaller databases we often deploy on M3 general purpose instances.
07:02
That's Amazon's general purpose class with a sort of equal balance of CPU, RAM, and I.O. and network I.O. In theory, if you had a database that was extremely CPU intensive but not particularly large, it would make sense to put it on one of their C series compute optimized ones.
07:23
In practice, I've never done that. I've just never come across that particular database. What we end up using most of the time is the R series, which maximizes the amount of memory you have available. Because caching, preferably your entire database in RAM, and I'll talk about that in a minute, is
07:41
really important when running on a public cloud because I.O. latencies are so very much higher. And then they have now a couple of different storage optimized things. I for fast SSD optimized storage, D for really massive magnetic volumes for people who have really really large databases and for data warehousing.
08:04
Although by and large, public clouds are not necessarily the best choice for data warehousing applications because of I.O. bandwidth issues. So, general tips. If you don't know what to use and your database is not that big, go ahead and use an M series. And the important thing is all storage on public clouds is shared network storage of some kind.
08:32
There are different options in AWS, and I'll talk to you about that, but it is all effectively shared storage. And as a result, the storage latencies you're looking at, if you're used to your own hardware with SSDs or
08:46
hard drives and a RAID card inside the machine, the storage latencies are going to be an order of magnitude larger. Yeah, in reality. In terms of doing individual database writes. And so the difference between the data you're looking for is in RAM and the data you're looking
09:05
for is on disk is a much larger difference than it is if you're running on your own hardware. Depends on how many reads you're doing. So, this becomes more important than it is if you're doing it on your own hardware.
09:28
So, there's some storage types available on Amazon. The classic recommended for databases is what they call provisioned IOPS. And this is where AWS guarantees that you will get a certain number of writes or reads per second.
09:47
This is available in whatever volume size you want to define, up to 16 terabytes, I think, is the limit for EBS. So, whatever volume types you want to define. There's a bunch of other, this is Amazon's elastic block store network thing.
10:04
They have a bunch of other features in it; particularly useful for database servers is the ability to create a coherent snapshot of the volume, which can be used as part of a backup and disaster recovery strategy. The better deal these days, and what I'm doing increasingly, frankly, for cost reasons, is general purpose (GP2) storage, which used to be useful only for low-performance or very bursty loads. With GP2, AWS has been guaranteeing a certain number of IOPS based on the size of the GP2 volume, and so suddenly it makes sense to allocate a 2 terabyte GP2 volume and get 6,000 IOPS out of that, instead of getting provisioned IOPS.
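As a rough sketch of that sizing rule: at the time of the talk, gp2 volumes were rated at a baseline of 3 IOPS per provisioned gigabyte, floored at 100 and capped at 10,000 IOPS (current AWS limits differ), which a quick calculation illustrates:

```python
# Baseline IOPS rule for EBS gp2 volumes circa 2015: 3 IOPS per GB,
# with a floor of 100 and (at the time) a cap of 10,000 IOPS.
# Current AWS limits are different; this only illustrates the talk's math.

def gp2_baseline_iops(volume_gb: int) -> int:
    """Baseline IOPS for a gp2 volume of the given size in GB."""
    return max(100, min(3 * volume_gb, 10_000))

print(gp2_baseline_iops(2000))  # a 2 TB volume -> 6000 baseline IOPS
```

This is why a 2 terabyte gp2 volume hits the 6,000 IOPS figure mentioned above.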
10:50
And that's what we're doing on new instances that we deploy, and I'll show you the cost difference there. Now mind you, Amazon does have what's called instance storage, which is local storage very close to the instance. It's much lower latency than EBS, but it's also risky. As in, you'd better have a more sophisticated DR plan involving multiple replicas and continuous backup to S3, et cetera, if you're doing this, because the instance storage can go away. It can go away not only if the instance goes away, but also if there's a restart for maintenance reasons, and a bunch of other things can happen to instance storage. It's not in any way guaranteed. But, you know, if you've got some of these issues where latency is a major issue and you're willing to deal with the extra redundancy, it can be an option for you.
11:43
I did not benchmark this in any of the benchmarks you're about to see. Well, in one of them, actually I did. But in most of them, no. Now one thing to understand about this, all of your storage ratings on AWS
12:02
and actually the other public clouds in general are measured in terms of IOPS. And it's important to understand that IOPS and throughput are not the same thing. IOPS is how many operations can you do per second up to a fairly limited size of operation depending on what storage it is. And the result can be a couple things.
12:21
First of all, on AWS IOPS, it's not just a guarantee, it's also a limit. I don't know what the AWS guys do to engineer this, but I've been fairly impressed at how they can stay within 10% of the target IOPS plus or minus, like on a really consistent basis. So don't think that if you're getting 5,000 IOPS that means 5,000 or more.
12:41
No, it means you are within 4,900 and 5,100 all the time. And the other thing is, if you're doing operations that involve a heavy degree of random access, then each access is going to be an IOP. The classic example of this is an index lookup with a nested loop: a nested loop join in Postgres where you're querying an index repeatedly. In that case, every single row becomes its own IOP, and then your IOP rate is the number of rows per second you can read. Well, 1,000 IOPS sounds like a lot, but if you have a big database, 1,000 rows a second is not a lot.
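To put rough numbers on that nested-loop scenario (a back-of-envelope sketch, assuming one random I/O per uncached row):

```python
# Back-of-envelope: if every row fetched by a nested-loop index scan
# costs one random I/O, the row rate is bounded by the volume's IOPS,
# regardless of how much raw bandwidth the storage has.

def worst_case_seconds(rows: int, iops: int) -> float:
    """Time to fetch `rows` uncached rows at one I/O operation per row."""
    return rows / iops

# 1,000 provisioned IOPS caps you at about 1,000 rows per second:
print(worst_case_seconds(1_000_000, 1000))  # 1000.0 seconds for a million rows
```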
13:21
That's a pretty slow query. So keep that in mind when allocating some of these things. Other stuff you want to set up: we are talking about a public cloud. Instances are ephemeral; they can go away, they can be redefined, they can have all kinds of problems. Redundancy is a lot more important. So if you're on AWS, you want to be doing WAL-E to S3.
13:42
If you're on a different public cloud, you want to be doing some other redundancy option. The replication to a second server is really not something we regard as optional. As in, every client that we set up on a public cloud has both continuous backup and replication to at least one instance.
14:02
Because you both need to be able to failover fast and you need to be able to recover from more complicated disasters. That's a good idea in general, but if you're doing your own hardware, you can kind of put off one of those two things. You can say, okay, well I've got backup going so I'm not going to do replication right now. Or I've got replication going so I'm not going to do backup right now, I'll do that next quarter.
14:23
You should be doing it from five minutes after you set up the instance if you're on a public cloud on AWS. Monitoring to look for instance failure should seem obvious, but apparently it's not. And you are sharing that network with lots and lots of other people.
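One common way to implement the continuous-backup-to-S3 advice above is WAL-E. A minimal postgresql.conf sketch, assuming WAL-E is installed and its S3 credentials live under /etc/wal-e.d/env (both assumptions, not details from the talk):

```
# postgresql.conf (sketch, PostgreSQL 9.3/9.4 era)
wal_level = archive        # 'replica' on 9.6 and later
archive_mode = on
archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'
```

A base backup would then be pushed periodically with `wal-e backup-push $PGDATA`.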
14:42
SSL, again, not optional. Use SSL for everything. Lock down your pg_hba.conf so that only people within your Amazon VPC can connect to your servers. Be very security conscious, because you are sharing hardware with everybody else, and you do not want to be the next news item about sites being hacked.
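A minimal sketch of that pg_hba.conf lockdown; the 10.0.0.0/16 address is a placeholder for your actual VPC CIDR, not a value from the talk:

```
# pg_hba.conf -- allow only SSL connections from inside the VPC
# TYPE     DATABASE  USER  ADDRESS       METHOD
hostssl    all       all   10.0.0.0/16   md5
# With no plain 'host' lines, non-SSL and outside connections are rejected.
```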
15:06
So, here was my basic setup for a lot of the benchmarks. So I started out with an EC2-based sizing thing because I was trying to provide a group of sizes that I could afford to run a whole bunch of benchmarks on. And that I could kind of match up between the different clouds with a couple of caveats.
15:24
So this is the sort of small one where it's a sort of cheap one-off database. As in, you're just getting started, it's a new development project, whatever, there's not a lot of demand on it. And so this is on Amazon, it's the M3 medium, one core, about four gigabytes of RAM.
15:40
We did 40 gigabytes of storage with a thousand provisioned IOPS. Like I said, all the storage here, by the way, you'll see, is provisioned IOPS, because I was still comparing the difference between GP2 and provisioned IOPS in terms of real performance, so I wanted to go with what I already trusted. Now, the larger instance that we tested: R3 double extra large, which is eight cores and about 60 gigabytes of RAM.
16:08
200 gigabytes of storage, 4,000 provisioned IOPS. And that's really actually more of a medium size in terms of what's available, but it's our larger size for testing.
16:21
And then everything else is kind of comparable to that. Now, I was going to talk a little bit about pricing. This is the only stuff you'll see about pricing in this presentation. I've previously given other versions of this talk that have concentrated more on cost performance. Because that was what our clients wanted to know when I started doing this. The problem is that since I first put together these benchmarks in February, costs for several public clouds have changed multiple times.
16:47
So I've taken most of the costs out; understand that even the costs I'm presenting to you now are probably already wrong. But it's the kind of computation that you can actually do. So, the junior gunslinger: in terms of the way that we set it up, we've got an instance at $36.50 per month.
17:01
We've got provisioned IOPS, another $100 a month. We're archiving to an S3 archive, which is dirt cheap because we're not actually using very much storage. And then, of course, we have a replica that has the same setup. And so that comes to about $280 a month, plus the miscellaneous charges for transferring data out and all kinds of other things.
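The arithmetic behind that figure, using the prices quoted above (a sketch that reads the instance cost as $36.50/month and ignores the small S3 and data-transfer charges; as the talk notes, these prices were already stale):

```python
# Monthly cost sketch for the "junior gunslinger" setup described above:
# a primary and an identically configured replica, each with an
# m3.medium instance and 40 GB of provisioned-IOPS storage.

instance_per_month = 36.50   # m3.medium, on-demand (2015 price from the talk)
piops_per_month = 100.00     # 40 GB volume with 1,000 provisioned IOPS
nodes = 2                    # primary plus one replica

total = nodes * (instance_per_month + piops_per_month)
print(f"${total:.2f}/month")  # close to the ~$280 quoted, before S3/transfer
```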
17:23
But like I said, these change all the time. I mean, for example, if I do another round of this, I'm going to be using GP2, which decreases our block storage cost. And that actually goes down. Now, if you were really trying to do this on the cheap, you really could do this on the cheap. Because if you do GP2, you could decide not to have a replica and rely entirely on continuous backup.
17:44
And then that would actually bring you down to about $75 a month, plus miscellaneous tax, title, license, and fees. So now, the senior gunslinger: this is the way I had it configured at the time. Again, if you actually reconfigured this with some of the current pricing and GP2, you could do it a little cheaper.
18:04
By the way, AWS also offers an option, and I believe some of the other providers do as well, where you can pay for a large block of instance time, like prepaying for a year of usage, and then the price per month gets a lot cheaper.
18:21
And these are the on-demand prices that I'm representing here. And that's true for all providers. Some providers don't have non-on-demand prices, some do. So, that's our configuration for AWS. I'm not providing benchmarks yet. So, now our more staid group is the rancher, the relational database service.
18:44
And the main reason to use the relational database service is, I like AWS, but I don't want to do all this instance management. This is too complicated and that sort of thing. I actually want to just use the AWS API and have it deploy my database for me. So, this is what's known as database as a service. You're not dealing with an operating system, you're not dealing with system configuration,
19:03
you're just dealing with the cloud API plus port 5432, and that is your whole interface to the database. I also call it SCDBA: somebody else is the DBA. They take care of uptime, backups, configuration, PostgreSQL updates.
19:22
This is a really good option if you don't actually have any full-time ops people in the company. And for whatever reason, you don't want to hire our company to manage it for you. There are some downsides to this, though.
19:40
You don't get all of your choices of everything in the Postgres universe you might ever want to do. Only certain versions are available, only certain Postgres extensions are available. You can't do weird sort of offbeat configuration things that on a regular server would require command line access.
20:01
Security, meaning what kind of security setups you can use, is limited, because database-as-a-service hosts are not going to allow you to touch pg_hba.conf. And it's going to cost a little bit more, because you are paying someone else to be your DBA, even on a percentage basis distributed across a lot of instances. So, here's again our little character sheet here.
20:24
Cloud type is database. Administration is mostly automatic. There are some things that you still want to do yourself. postgresql.conf is available, and you often want to tweak a couple of things in there. Depending on your HA setup, it might or might not be fully automatic.
20:43
RDS has this option called multi-availability zone redundancy. And that is automated and it's part of the platform for redundancy. It has some performance drawbacks though. So even when using RDS, some of our clients actually set up regular replication
21:01
and have us do the redundancy for them. So, versions available currently are 9.3 and 9.4. There's about two dozen extensions or so that are available. There aren't any particular extra features for Postgres specifically.
21:21
And the price is moderate. So, now we're talking about multi-availability zone. So, this is basically Amazon's own internal synchronous block-based, file-based, something like that, replication. It's not Postgres replication. It's their own storage replication to another node with automated failover and an uptime guarantee.
21:44
And in terms of availability, that works great. There are some major performance costs associated with it. I'll show you in the graphs. Now, the other AWS option is our dandy. That's Heroku. Heroku is: I just want to develop.
22:01
The cloud should handle 100% of administration. Great option if you are a solo developer or if you're a part of a shop that has a large group of developers and really absolutely no ops people. So, this is, again, database as a service. Administration is fully automatic.
22:21
In fact, if you wanted to get involved in administration, you really couldn't. Their high availability thing is replication plus point-in-time recovery that they manage and guarantee and restore and that sort of thing through your various options. Versions available in 9.3, 9.4, they tend to put up Postgres betas on the cloud for people to use
22:42
before they're available. Again, around two dozen extensions. There are several extra features that I'll mention. Pricing is relatively high on this, because they're providing you with the most extra stuff and the most complete "you never administer or even think about administering the database" option.
23:01
And what I mean by extra stuff, our sort of bling here is, first of all, if you've got a big Git-based development shop, that's what their workflow is designed around. They use a combination of Git and Rake, which is a Ruby tool, to manage stuff, like even to deploy database instances, rather than other forms of API.
23:23
So, if you like that workflow, it's great. If you don't like that workflow, it's a little awkward. The other nice thing that they have that I haven't seen from anybody else is this thing called Dataclips. And these are basically HTTP-accessible materialized views that you can set up through their API and then make available to your customers or your other users or whatever.
23:45
And they've really simplified replication to be comprehensible to everybody on your team by calling replicas "followers" and making it a point-and-click or single-API-command operation to set up new followers. But the big feature for Heroku is this.
24:02
So I was benchmarking on Heroku, and I set up a large instance to benchmark on. And in less than 24 hours, I got this email. And I checked, and the person who actually emailed me this had no contact with the dev team who knew I was doing benchmarking. So this is something that they normally send people, which is that their administration goes beyond just the automated administration
24:22
to the point where if you are a larger customer, they have help and advice available. Now, they actually have a much more limited set of options. There's five database sizes, three levels of high availability, and that's it. You know, matrix of 15 options, those are your options. So the two sizes we used are small, which is pretty much the equivalent of the M3 medium,
24:45
although it's not actually an M3 medium, and a large standard-6, which is 60 gigabytes of RAM, 8 cores like the other one. Now, a few other clouds. Now we're getting off of AWS and moving into the wide world of platform and database as a service,
25:05
mostly platform as a service. So Rackspace, our businessman here. So the main reason to use the Rackspace cloud, as far as I can tell, is: I have a lot of servers at Rackspace, and I want to branch out into cloud-based stuff.
25:20
They do have a public cloud and that sort of thing. It's platform as a service again. Administration is do-it-yourself plus Rackspace's support. Keep in mind that Rackspace has no specific support for PostgreSQL. So if you have a problem with Linux, their support is really good. If you have a problem with Postgres, you're on your own.
25:44
The high availability is going to do it yourself. Because you're installing everything yourself, everything's available. The main extra is that Rackspace has regular rental servers and cloud servers available in the same network in the same data center.
26:03
And so you can do this what they call hybrid cloud thing. We have some things that are cloud deployed servers and some things that are regular servers and they're sort of mixed even in the same application. And pricing is sort of moderate for this. Now, by the way, with the Rackspace support, they have this fanatical support thing and they have an interesting pricing thing in the cloud
26:22
because signing up for their extra support is actually not optional. Meaning that your first cloud instance with Rackspace costs a lot because you have to sign up with a support program. But then it's not incremental with additional instances. One of the other things is Rackspace's block storage option is weird.
26:47
Primarily their machines use instance storage and that's all we use in our benchmarks. Block storage is only available with their larger cloud instances. It's not available at all with the smaller ones. At least in early March when I ran these, that was true.
27:04
I haven't checked to see if that's changed. On top of which, it's kind of weirdly not a unified API for the block storage so we ended up doing all of our tests on instance storage because we couldn't make it work otherwise. So this was our sizing.
27:22
Now one of the things we actually had to deviate from is on the small instance, Rackspace didn't offer anything with one core. So you will see higher performance on their small instance in my performance graphs because we have four cores available which is more than any other cloud at that size. Now here's our drifter. I call him the drifter not because Google is a drifter
27:41
but because they seem to be aiming at poaching everybody else's customers. You'll see why in a minute. This basically says, I want an EC2-like platform only much cheaper. I want the global distribution. I want a lot of the options. I want the uptime guarantee. And I want it for less money.
28:01
This is platform as a service. Everything is do it yourself. Install it yourself. The only extra really is that Google was the first one to offer integration with Docker and Linux containers for any of you who are doing the container based thing. But they don't have anything specifically for Postgres. And the price is very cheap.
28:22
These are the instance sizes that we chose. Fairly analogous to the AWS instance sizes that we had available. Now, here's the kid. DigitalOcean. I call him the kid because he seems to be aimed mostly at independent developers in terms of who they market to.
28:40
This basically says I went cheap, simple, and fast. And nothing else. So, it's platform as a service, do it yourself, everything like that. No extras. Oh my god, cheap. This is by far our cheapest option in the public clouds in terms of instance size for amount of money. Now, there's a problem with the kid, which is the kid tends to get shot up a lot.
29:06
So, DigitalOcean does not have any kind of a network block store option. It's all instance storage. Nothing is durable. There is no long-term redundant storage for doing backups. So, we've got a client on DigitalOcean, but we're still backing up to Amazon S3
29:23
because DigitalOcean doesn't have an option for that. No high availability features of any kind. So, these are the sizes we picked. We're pretty good at getting close to analogous to the other instance sizes. There are more clouds out there.
29:42
The EDB thing, OpenShift, Joyent, Azure, Cloud Foundry. Given lots of free time and lots of funding, I would test all of these. In reality, we'll see how it works out. Because they're all interesting. For example, Microsoft I think has added a specific sort of Postgres support option now.
30:03
So, I kind of want to test that, but I haven't been able to. So, let's do a shootout with the ones that we have. We already got six of them, right? So, now what I used for the shootout was pgbench. Now, pgbench has some advantages. It ships with Postgres; it installs with the Postgres contrib package. It's a microbenchmark with a very simple bank-transaction workload.
30:23
Really fast for setup and teardown relative to other benchmarks. The drawback is it's really limited in what portions of Postgres it tests, and what portions of the platform it tests. It doesn't do complex queries. Purely random data access, and an unrealistic balance of work: it's too reliant on single-row write speed in terms of its balance.
30:44
Not very tunable; honestly, I would say not tunable at all. And on one of the Amazon instances I got, I did some testing against an untuned instance. So, default postgresql.conf versus what I regarded as a tuned postgresql.conf. And you can see the tremendous difference in performance between the two options.
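By way of illustration, a basic tune-up of the kind being compared might look like the fragment below. These values are assumptions for an 8-core, 15 GB instance, not the actual settings used in the benchmarks:

```shell
# Hedged sketch of "tuned vs. untuned": the stock postgresql.conf versus a
# conventional tune-up. Values are assumptions, not the benchmark settings.
cat > tuned.conf <<'EOF'
shared_buffers = 4GB               # 9.3 default is only 128MB
effective_cache_size = 10GB
work_mem = 64MB
maintenance_work_mem = 1GB
checkpoint_segments = 32           # 9.3 setting; max_wal_size in 9.5 and later
checkpoint_completion_target = 0.9
wal_buffers = 16MB
EOF
grep -c '=' tuned.conf             # prints 7
```

Even a rough fragment like this is enough to show the tuned/untuned gap on a write-heavy pgbench run, which is the comparison being made here.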
31:05
So, this is actually one of our problems with Pgbench. Pgbench is because of its sort of purely random single row access workload, it's really not responsive to Postgres tuning in most ways at all. I thought at least increasing checkpoint segments would make a significant difference,
31:21
but they did not. There actually is a difference between these two; it's about half a percent. So, here was the sizing, and then I used three different sizings for pgbench. Memory read-write: database at 50 percent of main RAM, write transactions. Memory read-only: 50 percent of RAM, read-only transactions.
31:42
And then disk read-write, where the database was somewhere between 150 and 300 percent of RAM in size. And doing read-write transactions. So, you'll look at these later on the slides. This is just if anybody decides they want to recreate my results, these are the Pgbench commands that I'm doing in terms of those benchmarks.
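The slide with the actual commands isn't reproduced in this transcript, so here is a hedged reconstruction. The scale factors, client counts, durations, and host name are all assumptions, and the commands are echoed as a dry run rather than executed:

```shell
#!/bin/sh
# Hypothetical reconstruction of the three pgbench workloads described above.
# pgbench runs on a second instance of the same size, so -h points at the
# database instance and network I/O is part of what gets measured.

DBHOST=${DBHOST:-db.example.internal}   # placeholder host name
SCALE_MEM=100     # ~1.5 GB data set: roughly 50% of RAM on the small instance
SCALE_DISK=1000   # ~15 GB data set: 150-300% of RAM, forcing disk access

# Initial load: the single-row-insert bulk load whose duration is also timed.
echo pgbench -i -s "$SCALE_MEM" -h "$DBHOST" -U postgres bench

# In-memory read-write run.
echo pgbench -c 8 -j 2 -T 600 -h "$DBHOST" -U postgres bench

# In-memory read-only run (-S selects the SELECT-only built-in script).
echo pgbench -c 8 -j 2 -T 600 -S -h "$DBHOST" -U postgres bench
```

Drop the `echo` prefixes to actually run them; the disk read-write variant is the same read-write command against a database initialized with `$SCALE_DISK`.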
32:03
So, now we can actually get two metrics out of pgbench. Most people know the transactions-per-second score that you get out of pgbench, which is your overall throughput for the run. And this measures multiple things depending on which configuration you're running: write speed, contention, all these other things.
32:21
The other thing that I actually decided that I wanted to measure, partly because of who originally requested some of this comparison, was that if you actually look at the initial load time in Pgbench, you see how fast the instance is to do a badly engineered bulk load. As in badly engineered, you're doing lots of single row inserts, right?
32:42
But frankly, a lot of customers are doing, that's how they are doing their ETL, is to do a lot of single row inserts. So, it's nice to know what the speed is there. And it was interesting because there actually is a fair amount of variability in how long that initial database build takes. Also, index building time, which then gives you sort of large memory operation CPU.
33:04
So, other conditions. All these tests were done in Postgres 9.3, either 9.3.5 or 9.3.6, depending on both timing and which cloud we were on. Unfortunately, we were unable to do the same OS on all of these.
33:22
There was at least one cloud where CentOS 7 didn't want to boot, for reasons we didn't really understand. So, the benchmarks are a mix of Ubuntu 14.04 and CentOS 7. And also, these are two instance tests, yeah?
33:43
Yeah, well, so Ubuntu 14.04, at least for the platform as a service ones. Oh, and by the way, for database as a service, we don't know what OS they're running. So, but for platform as a service, Ubuntu 14.04 would have been 3.13. I'd have to check on what the CentOS would have been, yeah.
34:06
So, whatever CentOS 7 would have been in February or March. And also, by the way, these are two instance tests. We're not running pgbench in the same instance that Postgres is running on. It's running on another instance of the same size. So, we are actually testing network IO here as well.
34:22
And as a matter of fact, in a lot of cases, that's mainly what we're testing. Yes, within the same zone, except that some database as a service platforms don't tell us what zone we're running in, and therefore we have to guess. And then run it many, many, many, many, many times.
34:40
Go through our entire ammunition store. And why do we have to run it many times? Well, to explain that, I'm going to show you a box plot. Well, actually not this kind of box plot, this kind of box plot. This is actually the range of scores, you know, sorted for one of our size runs. I think this might have been the RDS large run.
35:01
I'm not sure. I actually looked at individual scores. But you can see that there's a pretty substantial amount of variability between individual runs. And each run was, by the way, create a new set of instances, do the performance run, shut the instances down. Create a new set of instances, do the performance run, shut the instances down. So, we're getting different instances each time.
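Reducing those many runs to stable numbers is mostly a matter of summary statistics; here is a self-contained sketch of the kind of median-and-percentile reduction used for the reported scores, on made-up TPS values, using a nearest-rank percentile:

```shell
#!/bin/sh
# Reduce a set of per-run pgbench TPS scores to a median and a 10th
# percentile. The file and the scores in it are made up for illustration.

cat > tps_scores.txt <<'EOF'
412 655 391 702 588 433 97 610 645 501
EOF

tr ' ' '\n' < tps_scores.txt | sort -n | awk '
  { v[NR] = $1 }
  END {
    median = v[int((NR + 1) / 2)]
    p10    = v[int(NR * 0.10) + 1]    # nearest-rank 10th percentile
    printf "median=%s p10=%s\n", median, p10
  }'
# prints: median=501 p10=391
```

Notice how the one outlier run (97 TPS) barely moves the median but would wreck a simple average, which is exactly why the median is the headline number here.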
35:24
And you can actually see that compared with the minimum score and the maximum score, that's actually like a factor of 10 difference. Now, some of that is the sort of randomness of the PGBench workload, but a lot of that is the randomness of the actual instance capabilities you're getting.
35:42
The machines are not always the same. The number of other people who are doing busy things on that same physical machine is definitely not the same. The number of other people on the network and how busy they are is not the same. All of these things affect your real throughput. And for that reason, for the scores I'm going to present for benchmarking,
36:03
for the load time, where larger is worse, I'm going to give you the median score and the 90% score. And for TPS, where you want more, I'm going to give you the median score and the 10% score,
36:21
because generally in the database performance biz, we're looking for, our target for response times is 90% return within x. And so that's what we're sort of shooting for. So, when the smoke clears, time for a whole bunch of graphs. Now, some caveats here. First of all, I mentioned there's some compatibility problems between clouds.
36:41
We're not measuring the same things. With Rackspace and DigitalOcean, we have instance storage. Only the database-as-a-service options have fully automated HA, so it's not really the same. Instance sizes are not identical, instance OSes are not identical. So there's a lot of variability in here outside of the tests that we're running.
37:03
Also, prices, instances, different things that are available have changed over the last three months, and will continue to change. One of the things I'll actually mention in a second, but the other thing is, in previous versions of this, I was actually including some cost comparisons, and I've discovered that pricing for the public clouds
37:22
is way too much in flux for me to include in any of this, because it's generally out of date, even if I update it the week before the conference, it's generally out of date by the time I present it. Better not to present it at all. So, overall, this is a work in progress, and more to think about sort of how you would measure it. Now, I'm going to have one cost slide just to show you
37:41
one thing that's actually going on that has not quite changed, which is, when I was presenting a whole set of cost slides, one of the things people noticed here is that we've got, Google and DigitalOcean are substantially different in pricing profile from the other options that we have. And I believe, and people I've talked to in the cloud biz believe,
38:02
that this is because they are currently providing services below cost as part of a major expansion move, which means that if you are cost sensitive, there are great options right now, but might not always be. Now, so let's actually look at some of these things. So, here's our small in-memory instance,
38:21
and we're going to look at load time here. So, how long did it take to load that initial small in-memory database, about a gig and a half in size? And, obviously, shorter is better, right? And so, we noticed a couple of things. One is, DigitalOcean here is a lot faster. Why? Well, because they're only using instance storage.
38:42
And we're doing individual inserts, so latency matters. And because they're only doing instance storage, they're a little bit faster. So, you're getting faster performance there at the cost of reliability. Now, the other thing you notice here is, the RDS is significantly slower for this. And so, the first thing I did when I saw this was, I emailed Grant, and I'm like,
39:01
what the hell is going on here? So, there's a couple of things that are going on here. One is that RDS, if you're doing automated backups, which I strongly recommend, that means they turn on archiving from the get-go, and the archiving is done on the same, using the same IOPS allocation that is available for the database in general.
39:22
Whereas, when I set up my own EC2 instance, I am using a different channel to do archiving, so it doesn't come out of my IOPS allocation. So, it's not exactly comparable. The other thing is, Grant pointed out that they had checksums turned on. And for that reason, there was some overhead on that.
39:43
So, I was like, okay. One of the things I just did recently, between the last iteration of this talk and now, was to say, okay, let's actually find a relatively stable instance, which I do by going on a fishing expedition, where I keep starting and stopping instances until I find one that performs above what I would expect for that particular profile, and does it consistently.
40:01
And that generally means that I've managed to grab the first instance on a machine. And then, I can actually do a set of stable tests on that one. And so, I did this on the machine, and it turns out that, and this is loading a large database, that there actually was, depending on the size of the database, a 12 to 25 percent difference in load time,
40:21
depending on whether or not you had checksums turned on. So, that is part of that difference. And because I would recommend turning on checksums in general. Now, one of the things I'd say is, well, hey, if it has as much of a difference in load time, what's the difference like in throughput? Well, it turns out the difference in throughput was much smaller.
40:43
So, these are three different benchmarks that I did in terms of what was the difference in throughput with checksums enabled. So, for the throughput, for the general mixed workload, we're only talking about a two to three percent impact. And so, what this says to me is, hey, checksums prevent a certain class of database corruption,
41:02
which is really important when you're on network storage. And we're only talking about a two to three percent impact, unless your workload mainly consists of ETL. So, just turn them on. This does mean, you know, initializing the database with checksums for a new instance. So, enable your checksums.
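Since checksums in these versions can only be chosen when the cluster is initialized, enabling them looks something like this. The data directory path is hypothetical, and the commands are echoed as a dry run rather than executed:

```shell
#!/bin/sh
# Data checksums are an initdb-time choice in 9.3/9.4; they cannot be
# switched on later. RDS enables them for you; on a self-managed instance
# you pass --data-checksums yourself.

PGDATA=/var/lib/postgresql/9.3/main     # hypothetical data directory
echo initdb --data-checksums -D "$PGDATA"

# On a running 9.3+ cluster, this read-only setting confirms the choice:
echo psql -Atc 'SHOW data_checksums'
```

Drop the `echo` prefixes to actually run them on a fresh instance.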
41:20
So, let's look at that load time. So, here's our load time for small on-disk. This is the on-disk, larger-than-memory case. Not really different from what we were seeing before. Large in-memory load time: all of a sudden, Google Compute Engine sort of ramped up here. And what we discovered was that that was because we didn't really understand how GCE allocated IOPS.
41:40
I would like to go and redo these tests because now we do understand it. Basically, we were getting an instance that was capped at 500 IOPS. And that really affects some of the other tests that we did on it. Oops, this slide was supposed to be gone. So, let's skip over those.
42:01
Because those are the price-cost comparison ones. They're no longer valid. So, now let's actually look at some performance benchmarks. So, this is transactions per second, small in-memory read-write. Larger is better. So, we've got some differences here. Now, again,
42:20
So, for DigitalOcean and Rackspace, we're getting that sort of instance storage advantage in comparison. So, again, performance at the cost of reliability. Now, for the rest of it, these are pretty similar: RDS, GCE, EC2.
42:41
But we've got a couple of other differences here. Heroku is actually, in the small instance, really outperforming other small instances. And going back and forth and asking the Heroku team some questions, well, I figured out why this is. Their small instances are actually larger Amazon instances that they have subdivided. So, there is a tremendous opportunity, one I was able to repeatedly make use of,
43:00
to get extra performance by being a bad neighbor. By using a lot more resources in the machine than I was necessarily entitled to. So, if that's your strategy, then you can actually, you know, Heroku can be a good performance option for that. The other thing was, in my tests, and Grant argues with me about this, but in my tests, having multi-availability zone turned on for the read-write test
43:25
resulted in something like a 30 to 40 percent throughput drop. Basically, because multi-AZ redundancy is synchronous replication. And synchronous replication always adds a substantial amount of network latency to writes. And for that reason, affects your throughput.
43:42
At the same thread level. My point is that you can increase the number of workers and get the same throughput. Now, doing the small in-memory read-only, we actually see some different things. Rackspace is higher in this one because we've got more cores. Same thing, DigitalOcean has one extra core.
44:02
And that has a big effect on small read-only. These are actually fairly similar. We got some extra variability in Heroku, again, because of the bad neighbor problem. And in this, actually, here was a weird thing that Grant has not so far been able to explain. And I did extra test runs because of this. For some bizarre reason, I was getting slightly higher throughput in the read-only test
44:22
with multi-availability zone turned on. And I never figured out why this was. It's only slightly higher, but it showed up over the course of 11 or 12 test runs. So it's not statistically insignificant. Running low on time, so let's actually get through some of the rest of the benchmarks.
44:42
Small on-disk read-write here. And now here you're seeing, again, instance storage dominates versus block storage. Block storage tends to equal out performance over the... Oh, and you can see actually here over the whole set. Now we're large in memory, so this is our large instance.
45:03
And you see that the large instance performance is actually fairly different from the small instance performance. Because the different clouds actually treat very different classes of machines very differently. The small instance was actually a lot more consistent than the large instance. So the in-memory read-write... I was actually kind of, because I know how they're designed, a little surprised at the performance.
45:25
This was actually the second run. By the way, I've been working with the cloud providers who I know on this. So I've been emailing Grant's team and they've been changing stuff and telling me stuff and I've been changing the benchmarks. I emailed the Heroku guys because our initial benchmark was way below this. And they actually, based on my results, made some changes to the configuration of the Heroku instances.
45:43
Not just for me, but for everybody. And I retested and performance got much better on this. And so here we're seeing Heroku and RDS are performing better. Partly, I think, because of tuning by those respective teams, by the Heroku team and the RDS team.
46:02
On GCE, we were hitting our 500 IOPS cap. Rackspace: this is the first time that, all of a sudden, an IOPS cap shows up, which it didn't on the small instance. So I really don't know what's going on with Rackspace storage. But notice that the 90% and the median are almost the same. I mean, the whole test run was almost flat, which meant I was really being limited by storage.
46:23
And so I don't really understand how Rackspace cloud storage works. Now, this is the read-only in memory, which is a very different profile, if you notice. This is dominated partly by, again, Rackspace, we have some extra cores because they favor CPU over other resources.
46:41
I got, weirdly, a lot more variability on the RDS tests on this one than I did on the other types of instances. I don't know if that was just random luck of the draw. If I go back for another testing run, I'll see if that shows up. Otherwise, fairly similar, because here we're largely hitting Postgres performance limits.
47:02
Or network performance limits, and the networks are not that different between the different instances. Compared to other kinds of limits. Large on disk. Again, here you can see this is we're being limited by disk performance. Because of the 500 IOPS portion, we couldn't actually make the test complete on GCE or Rackspace.
47:25
And RDS here, this is the sort of extreme version of the difference between multi-availability zone and regular. Oops, dammit, I thought I'd deleted that one earlier. So, let's finish up with our mystery guest, and then if we have any time, I will take questions. So, here's our mystery guest. Who is our seventh guy?
47:43
Well, this is the gambler. And this is what I call running with scissors mode. Because there was a discussion that we got into online about that sort of thing about could Postgres be run in memory if you didn't care about redundancy or reliability. Well, yes, you can, actually. So, I did a whole bunch of settings to basically eliminate as much disk access as I possibly could from Postgres.
48:06
So, no bgwriter, as little WAL as possible, turn off synchronous commit, turn off full-page writes, increase WAL buffer size, increase shared buffer size so that Postgres would cache as much as possible. And not touch the OS file system cache, but the OS might decide to flush on its own.
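Written out as a postgresql.conf fragment, the settings just listed might look like this; the specific values are guesses, not the ones from the slide:

```shell
# Sketch of a "running with scissors" configuration: trade away all crash
# safety for speed. Values are assumptions for a 9.3-era large instance.
cat > scissors.conf <<'EOF'
fsync = off                      # no durability guarantees at all
synchronous_commit = off
full_page_writes = off
wal_buffers = 64MB               # big WAL buffers, flushed rarely
wal_writer_delay = 10000ms       # maximum allowed delay
bgwriter_lru_maxpages = 0        # effectively disable the background writer
checkpoint_segments = 64         # 9.3 setting; max_wal_size in 9.5 and later
checkpoint_timeout = 1h          # checkpoint as rarely as possible
shared_buffers = 8GB             # cache as much as possible in Postgres
EOF
grep -c '=' scissors.conf        # prints 9
```

A crash with this configuration can corrupt the database, which is why it only makes sense for disposable replicas or purely ephemeral data.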
48:25
And so, I just ended up with some changes here. So, this is on the small instances. So, one of the first things I tried was actually two different configurations. One is imagining that you were doing a running with scissors database, but it was a replica, and that's the reason why you were willing to run it like this. And the second was if it was a purely ephemeral service where you could also make the tables unlogged so that they weren't logging to disk.
48:47
Well, one of the things that you see is that actually the unlogged version, at least on the small instance, is not significantly faster for load time. Which was a bit of a surprise to me. Turns out, if you're not fsyncing the WAL, if you're just writing to the file system cache, at least on a small instance, it doesn't really make that much of a difference.
49:07
But, you can see loading is way faster. Again, here, shorter is better, as you would expect in an ephemeral instance. Now, TPS did not actually improve on the small instance as much as I was expecting.
49:22
Part of it is that I did a bunch of test runs for this, and it pretty much hewed to around 600 and change. And I think what's happening is, I'm hitting a limit either on the amount of transactions I can pump out of pgbench on a small instance, or, more likely, hitting network I/O limits.
49:42
In terms of that, because I was checking the Postgres instance on the database side, and it was not out of resources. Now, in-memory read-only, you wouldn't expect this to be a lot faster; you'd expect it to be a little bit faster. And it was a little bit faster.
50:01
Now, the large instance I felt was better, so I didn't bother to do the unlogged comparison because I didn't think it would make a difference. But I did scissors, you know, here. Now, in the large instance, it was much faster. Partly because I think I wasn't actually running out of resources on the pgbench machine the way that I was for the small instance.
50:21
And so you can see it's, I don't know, 200-300% faster in the running with scissors mode for read-write. And for read-only, a little bit faster in those terms. So, an option, if you have lots of replicas that you can afford to replace, and you've got lots of load coming in,
50:40
is to actually, even if they're only processing read-only load, you will get a boost by running them in running-with-scissors mode. So, a few other things. I would like to do more tests. I've been writing a new benchmark called pgjsonbench to benchmark JSON databases. I'd like to also do DVD Store, which is a little bit more complicated transactional benchmark than pgbench.
51:03
So that I wouldn't be testing just a couple axes of performance, but a lot more variety of stuff. I want to test a whole bunch of other clouds. And I'd really like somebody else to collaborate on a project to actually provide better visualizations for this. You know, in terms of providing charts and that sort of thing.
51:23
In terms of, I want to measure not just TPS, but latency on individual transactions, maybe do some time versus latency graphs as well. And I just don't have time to do all of that programming. So, if you can get involved, that would be terrific. And I'm happy to take questions until the next speaker wants to set up and kick me out.
51:44
So, questions?
52:02
Okay. Emily, when I was specifically looking at this in early March-April, I was looking at their block storage options, like their instructions for them, the documentation, and they said block storage is only available on instances of size X and above. They have actually been cutting back on their commitment to cloud, I think, and focusing more on what they call hardware as a service.
52:24
So, it might be that they actually have less options now than they had last year. Anyone else?
52:42
Yes, we have them, but they're very complicated. Because there's multiple axes of configuration that you can take, right? Like if you're on AWS, it's what instance size is it, how is it configured, what kind of storage do you have, and how much of it do you have.
53:01
And that ends up being, you know, a sort of complicated axis of recommendations. It would also be nice to expand that, you know, because I only benchmarked two sizes. Because it's like, how many things can I benchmark? The couple of things that I actually did, you know, discover, like I said, is it's worth going fishing for a better instance.
53:22
Never, if you're going to be putting something into production, never do it on the first instance you're given. Get an instance, run some sort of synthetic benchmark on it, check that you haven't gotten the instance that's got too many neighbors or is on a six-year-old machine or has some other problem. Because the variability in performance between instances, even on the same cloud, is actually quite substantial.
53:47
And if you're good at this sort of fishing, you can sometimes get an instance that's better than the median in the pool. And that's a real bonus, because you're paying the same amount regardless of how good the actual instance is.
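That fishing procedure could be automated along these lines; the threshold is an assumption, and `run_quick_pgbench` is a hypothetical helper stubbed out here (a real one would provision an instance and run a short pgbench against it):

```shell
#!/bin/sh
# Hypothetical instance-fishing step: benchmark a fresh instance and
# keep it only if it clears a performance threshold.

THRESHOLD=550      # assumed acceptable median TPS for this instance size

run_quick_pgbench() {
  # Stub. A real version would create an instance, run something like
  # "pgbench -c 8 -T 60" against it, and report the resulting TPS.
  echo 600
}

score=$(run_quick_pgbench)
if [ "$score" -ge "$THRESHOLD" ]; then
  echo "keep this instance"
else
  echo "terminate it and fish again"
fi
```

Wrapped in a loop over your cloud CLI's create/destroy commands, this is the "keep starting and stopping instances" procedure described earlier in the talk.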
54:03
Other questions? Okay, well thank you very much.