We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Up And Down Again: A Migration's Tale

00:00

Formal Metadata

Title
Up And Down Again: A Migration's Tale
Title of Series
Number of Parts
88
Author
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language
Producer
Production Year2018
Production PlacePittsburgh

Content Metadata

Subject Area
Genre
Abstract
You run rake db:migrate and rake db:schema:load regularly, but what do they actually do? How does rake db:rollback automatically reverse migrations and why can't it reverse all of them? How can you teach these tasks new tricks to support additional database constructs? We'll answer all of this and more as we explore the world of schema management in Rails. You will leave this talk with a deep understanding of how Rails manages schema, a better idea of its pitfalls, and ready to bend it to your will.
38
Thumbnail
07:23
57
Thumbnail
05:41
60
64
Interior (topology)Customer relationship managementFeedbackEmailDiagramComputer animationLecture/Conference
Moment (mathematics)Software developerOpen sourceClient (computing)TwitterProduct (business)EmailXMLUML
Customer relationship managementCustomer relationship managementDatabaseSoftware developerPhysical systemImplementationQuicksortSoftware frameworkOpen sourceWeb 2.0Pattern languageCodeRevision control
Extension (kinesiology)TimestampPrice indexUniqueness quantificationMethodenbankRow (database)Standard deviationError messageInheritance (object-oriented programming)MathematicsDifferent (Kate Ryan album)AdditionSocial classSystem callRevision controlError messagePoisson-KlammerTable (information)Subject indexingInstance (computer science)Point (geometry)Multiplication signSound effectHuman migrationRow (database)CausalityLogical constantLine (geometry)Direction (geometry)Inheritance (object-oriented programming)Entire functionMethodenbankBit
Price indexUniqueness quantificationTimestampHuman migrationStatement (computer science)MathematicsConfidence intervalSemantics (computer science)Revision controlCASE <Informatik>Arithmetic meanTable (information)
Statement (computer science)Subject indexingCartesian coordinate systemPrice indexSource codeKey (cryptography)Revision controlDefault (computer science)MathematicsMoment (mathematics)Human migration
Price indexString (computer science)Uniqueness quantificationTimestampTable (information)Key (cryptography)Relational databaseField (computer science)MereologyDifferent (Kate Ryan album)Table (information)Line (geometry)Statement (computer science)Category of beingLattice (order)Doubling the cubeType theoryRevision controlDirection (geometry)INTEGRALSubject indexingNumberSerial portProcess (computing)Key (cryptography)Computer animation
Interface (computing)Pattern languageClient (computing)Social classStatement (computer science)Texture mappingType theoryRadio-frequency identificationExtension (kinesiology)Software design patternClient (computing)Interface (computing)Social classRun time (program lifecycle phase)Human migrationPoint (geometry)DatabaseAdaptive behaviorConnected spaceGroup actionDependent and independent variablesCASE <Informatik>Type theoryString (computer science)Table (information)Statement (computer science)Cartesian coordinate systemFormal languagePhysical system1 (number)Latent heatPattern languageRow (database)Lecture/Conference
Radio-frequency identificationRevision controlASCIIOrder (biology)Computer fileHuman migrationRevision controlMultiplication signFunction (mathematics)File systemElectronic mailing listResultantProduct (business)Order (biology)Insertion lossComputer fileStatement (computer science)TimestampComputer forensicsRight angleIntegerCodeTable (information)Row (database)MereologyWeightBitBranch (computer science)Computer animation
String (computer science)1 (number)MathematicsFreewareFlow separationProcess (computing)Figurate numberComputer animation
FreewareBlock (periodic table)Radio-frequency identificationPrice indexInversion (music)String (computer science)Row (database)Inverse problemDefault (computer science)Relational databaseSystem callInverter (logic gate)Table (information)Row (database)Type theoryArray data structureRevision controlMathematicsContext awarenessOrder (biology)State of matterInstance (computer science)Shape (magazine)InformationIrreversibler ProzessParameter (computer programming)Inversion (music)Different (Kate Ryan album)Drop (liquid)Connected spaceMultiplication signCorrespondence (mathematics)Reverse engineeringStatement (computer science)Subject indexingString (computer science)Default (computer science)Error messageRelational database
String (computer science)EmailExtension (kinesiology)DatabaseMathematicsDifferent (Kate Ryan album)State of matterSubject indexingEntire functionOcean currentEndliche ModelltheorieComputer animation
Core dumpTable (information)Extension (kinesiology)EmailString (computer science)ExistenceCore dumpDependent and independent variablesComputer fileBlock (periodic table)Connected spaceExistenceHuman migrationSubject indexingEmailMereologyInstance (computer science)Source codeProcess (computing)Key (cryptography)DatabaseOrder (biology)Table (information)Flow separationRow (database)Extension (kinesiology)State of matterSocial classStructural loadDampingStatement (computer science)QuicksortElectronic mailing listFunction (mathematics)Line (geometry)CodeBitComputer animationLecture/Conference
Human migrationHuman migrationLikelihood functionSoftware developerDatabaseComputer animation
Statement (computer science)Human migrationSocial classString (computer science)Radio-frequency identificationLikelihood functionHuman migrationDatabaseBlock (periodic table)Social classStatement (computer science)Endliche ModelltheorieMereologyArithmetic meanCodeState of matterRow (database)2 (number)Table (information)Query languageOrder (biology)CausalityMultiplication signOcean currentSet (mathematics)
Extension (kinesiology)SubsetKey (cryptography)DatabaseHuman migrationSubsetMultiplication signContext awarenessType theoryState of matterStatement (computer science)Computer programmingLimit (category theory)Key (cryptography)WordData compression
Subject indexingRegulärer Ausdruck <Textverarbeitung>Subject indexingExpressionSystem callFunctional (mathematics)ResultantComputer fileDatabaseFile formatMultiplication signCodeQuicksortOrder (biology)Hyperbolischer RaumStatement (computer science)Row (database)Computer animation
Multiplication signCartesian coordinate systemExtension (kinesiology)View (database)Key (cryptography)DatabaseComputer animation
View (database)DatabaseStatement (computer science)Radio-frequency identificationInversion (music)Message passingMethodenbankTable (information)Inheritance (object-oriented programming)Revision controlView (database)Order (biology)Statement (computer science)Table (information)Multiplication signFormal languageFrustrationPatch (Unix)Row (database)Inverter (logic gate)AuthorizationRevision controlSystem callProcess (computing)Parameter (computer programming)HookingQuicksortDatabaseReverse engineeringBitState of matterDrop (liquid)ImplementationSubject indexingHuman migrationPoint (geometry)CodeCartesian coordinate systemLatent heatFigurate number
DatabaseLimit (category theory)CodeCartesian coordinate systemHuman migrationRevision controlRule of inferenceComputer animation
Event horizonLevel (video gaming)AbstractionTimestampAbstractionSubsetStatement (computer science)Default (computer science)Table (information)CodeDatabaseSoftware developerBasis <Mathematik>CausalityFunctional (mathematics)State of matterSound effectComputer animation
Multiplication signFunctional (mathematics)Hydraulic jumpData structureDatabaseExtension (kinesiology)Right angleComputer animation
EmailComa BerenicesContent (media)XML
Block (periodic table)Data typeComputer animation
Transcript: English(auto-generated)
Welcome to Up and Down Again, A Migration's Tale, where together we're gonna journey into the inner workings of Rails schema management. My name is Derek Pryor.
If you have feedback or questions about this talk as it's ongoing, you can tweet me at DerekPryor, email me at Derek at thoughtbot.com, or afterwards you can come up and introduce yourself, ask me questions, tell me what you thought about the talk and tell me how your conference has been going. I'd love to talk to as many of you as possible. So I currently work as a development director at Thoughtbot where I've been working now for five years.
I wanna thank, just take a quick moment to thank Thoughtbot for the remarkable opportunity to do things like this and to contribute to open source and to contribute to our local communities. It's one of the things I value most about working there. A lot of people use our gems and are familiar with Thoughtbot but aren't particularly sure what it is we actually do. So at Thoughtbot, we work with our clients to build great digital products.
If you wanna learn more about what that means to work with us as a client or what it means to work with me as a coworker in any of our six locations, then please come see me afterwards or send me an email or tweet me. So we're here today to talk about schema management in Rails. The first version with what we'd recognize as migration support came in Rails 0.10.1
which was released towards the end of 2005. As far as I've been able to tell, it's actually the first open source implementation of any sort of conventional schema management system that allowed developers to progress their database schema along with the code that required that database schema. And that's a pattern that became so immediately useful that it's hard to imagine
a web framework that doesn't allow for such a thing these days. So I really wanted to dive in and see what makes it tick. So our agenda for today, the first thing we're gonna do is look at what makes up the migration DSL, what actually happens when we apply or revert a migration, how schema.rb gets generated and what it's used for,
and finally where some of this falls short and how we might overcome that. So let's dive into the anatomy of a migration. Here's a pretty common looking migration to create a table, our post table which is gonna have five columns or so with some restrictions and things like that. The intrigue for me really starts right here on line one with a relatively recent addition to migrations
which I think came in Rails 5. The reason why it's interesting is it looks like we're inheriting from an instance of an array which I can pretty much assure you is not code you've ever written anywhere else in Ruby or Rails. So I wanted to know what's going on under the hood here. So the way to answer that is to crack open the migration class itself
and we can see that that defines the brackets method. So we're not actually indexing into an array, we're calling this method with the version that we passed to it. And that method hands off to compatibility.find. And that compatibility.find call isn't an active record.find call, rather it's a call to the compatibility module method find. And what that does is take the version that you pass in,
turn it into a string, do a little bit of error handling, and then spit out a constant. So basically it's a way to turn this here into this much longer form. So the brackets are clearly shorter and maybe nicer to type, but I have a suspicion it's really only done for aesthetics reasons
because how many of you are actually typing an entire migration class by hand? Usually you generate those and you get the class name and things like that by hand. But nonetheless, that's how we arrive at what we're inheriting from. So what does that class actually look like under the hood? This is it. So it's empty.
So we've gone an awfully long way already and all we're doing is essentially inheriting from migration directly. So why don't we just inherit from migration like we used to back in Rails 4.2 and forever before that? The answer is that Rails won't let us. The inherited callback that you see here gets defined after we define that v5.2 constant.
And what that does is it has the effect of any time from that point on when somebody inherits from migration, we check the direct superclass and say if the direct superclass is migration directly, then raise because that's not supported anymore. So why was this change made? Why is it not supported anymore? It might help to answer that question by looking at a slightly different example
where instead of inheriting from 5.2, we're inheriting from 5.1. So that's gonna change the class that we're actually inheriting from. If we look up its compatibility class, we can see that it looks like this. So these classes are essentially compatibility shims. They allow us to see how migration behavior changes between releases. They encapsulate how a migration behaved
up to and including that particular version. So here we can see that 5.2 must have changed how postgres handles change column. And similarly, 5.2 must have changed how MySQL handles create table. And if we wanted to see how that was done, we could dive into the code and run a get blame on it, or we could read the code and see if we could figure it out.
There's lots of different ways we could see, but this is a nice place to come in and see what exactly has changed between the versions. All of this work is done. It's a lot of work that wasn't done previously, and all of it was added in the name of having stable migrations, which I think is a cause for celebration because having stable migrations weren't always the case. What it does is gives us the confidence
that the meaning of our migrations don't change when we upgrade Rails versions. So it used to be that if you had reason to revert an old migration after having updated Rails versions, or for whatever reason reapply an old migration, you may not end up with the same schema that you had when you originally wrote that migration. So this change was introduced to fix that problem.
So let's keep going. Here's our change method. We're gonna talk about the semantics of change itself a little bit later, but let's dive into its body. So this is our create table statement. Create table is what is known internally as a schema statement. If you look up the documentations, if you look up the documentation for schema statements, you'll see a bunch of things here. This is a truncated screenshot of that.
And you'll see lots of familiar things like create table, add column, add index, remove index, things like that. But if you look at the documentation, you may also see some things that might be unfamiliar to you, things like change column null, change column default. There's lots of things that you can do in migrations that you might not be aware of, so I do suggest when you get a moment to look at these statements and see
if perhaps you were doing something the long way when there was a nice easy Rails helper for you to do it. So each line of this migration is packed with a decent amount of meaning, and it seems like each recent version of Rails has added the ability to pack more meaning, particularly into those belongs to type lines where you're referencing another table.
So for example, this first line here has, depending on how you count them, up to four-ish jobs. So the jobs we're doing in that one line, we're adding a non-nullable user ID to create tables. So we're saying, hey, this table needs a user ID, and it also cannot be null. That's our null false part. We're adding an index to the user ID column,
so we wanna be able to do quick lookups on that field or quick joins. That used to be an additional statement in your migration. Now we can just add it as a property to the line. We wanna add a foreign key to the users table so we maintain referential integrity to say, like, if there's an ID here in our user ID column, it must point to a valid ID in our users table. Otherwise, we have bad data.
And this, again, is something that you used to do separately, but now you can do inside of your create table statement directly. So when we run this entire migration, we run rake db migrate with this migration, Postgres will spit out SQL that looks like this. MySQL will give us something different.
And they differ in many ways. The things that jump out at me immediately are things like column types. So here on MySQL, we can see that the ID column is of type big int, whereas if we jump back to Postgres, we can see that it's big serial. I don't need to know that as a Rails developer, right? That concept is compressed for us. There's other things in here as well,
like the number of statements differs. MySQL, we can do this in what is known as one SQL DDL statement. I believe DDL stands for Data Definition Language. All right, Sean's telling me I got it right. And in Postgres, we actually end up executing five because Postgres allows us to create the table and the foreign key constraint, but then we have to create the indexes separately.
But again, something I don't need to know as a Rails developer. It just happens for me automatically. Quoting styles differ. So Postgres uses what you might expect, double quotes, or maybe you expected single quotes, but it uses double quotes, and MySQL uses back ticks. Who knows why? So the reason why,
the reason we're able, the way we're able to get differing SQL out of the same Ruby is through what is known as the adapter pattern. So the adapter pattern converts the interface of a class into another interface that clients expect. Adapter lets classes work together that couldn't otherwise because of incompatible interfaces. And that quote is from the Design Patterns book, which is a pretty famous book and often called The Gang of Four Books.
It had four authors. It's a book that I like to pretend to have read and quote from often. So what it basically means in this case is we have a group of external dependencies, and ideally, each of these external dependencies would provide a consistent interface, but unfortunately, they don't. That is the role of our connection adapters.
Our connection adapters give us that consistent interface to talk to each one of these things. So we start with the abstract adapter, which defines all of the ways that Rails wants to talk to each one of these interfaces. And you'll see the abstract adapter actually defined in the Rails source. You'll also see things like the PostgreSQL adapter, the MySQL2 adapter, the SQLite3 adapter, any other adapters that Rails supports.
And those adapters actually go about implementing the specifics of the abstract adapter. So the abstract adapter is the interface that these have to adhere to. This is actually a really common pattern throughout Rails. You'll see it in things like action cable, active job, et cetera. And it should also be something you should get familiar with in your own applications. Really, it's a useful pattern anytime
that you have to communicate with an external dependency, and that may be another system that you control or it may be something like S3. And the reason why I find it useful, even if I'm only talking to one other thing, is it allows me to define the language that I want to talk to that thing in a way that makes sense to my application. And then inside the particular adapter,
I can translate that into what it means to talk to, for example, S3. So our connection adapters here have many responsibilities. First, we reflect on the schema, right? Which tables are defined? What columns are on those tables? Which columns are indexed? How do we generate a create table statement on this database, for example?
If we say we want a string in a migration, what does that mean on this database? So in MySQL, you'll see that that's a VARCAR 255 by default, and on Postgres, you'll see that it's character-varying. Again, something I don't need to know, as somebody who writes Ruby, is taken care of for me by the people who manage these connection adapters. Does this particular database support foreign keys?
Does it support the JSON type? Does it support save points in transactions? These types of things can influence not only how migrations work, but how active record statements get run at runtime. And quoting, another example that we saw earlier. There's other responsibilities as well, but these are the ones that jumped out at me.
So by passing that Ruby through the, by passing the Ruby through these adapters, that's how we generate the SQL that we saw and actually change our database in a way that makes sense to the underlying database. So I wanna talk for a minute about how we actually go about running those migrations. So of course, we apply a migration with rails db migrate, but how does Rails know which migrations will run?
To answer that, I wanna start by looking at the file name that gets generated when you create a migration. The thing on the left, the part on the left, the integer on the left, you might recognize as an integer version of a timestamp, and the part on the right is the name that you pass to the generator, or the name of your class. Rails looks at this a little bit more specifically.
The thing on the left is called the version, and the thing on the right is the name. The version is what's important, and it doesn't have to be an integer version of a timestamp. It just has to be unique and orderable. And you can see how Rails uses this version if you trace through the SQL that gets run when you run rake db migrate, which I've done. The first thing you'll see that gets run is this SQL statement here
where we're selecting all of the version, we're selecting the version column from the schema migrations table. The schema migrations table is not something that you manage yourself. It's not gonna be in your schema.rb file. It's something that Rails creates on demand when it runs migrations for the first time. So Rails takes this list of versions, and those are the versions that it has recorded
as having been run on this database. And then it looks at the list of versions from the files that are in your migration path on the file system, and it diffs the two, and it says, any that exist on the file system that do not exist in this migrations table are what need to be run, and will run them in the order of that version column. After it successfully applies one of the migrations,
it runs this insert to record that it's been run, and that's how we prevent running a migration twice. You can get insight into how this works with this handy rake task, or Rails task, I guess we're calling them now. If you run rails db migrate status, you'll see output that looks like this. So it's essentially that cross reference between the file system and the versions
of your schema migrations in your schema migrations table. Anything that has a status of down is what is going to run the very next time you run rake db migrate. It's interesting occasionally to run this command on your production instance, because it can return interesting results. So here in the lower right, you see on the last row, rather,
you see that this migration version, we have a record of this migration version being applied against this database, but we don't know what the migration name is, because we couldn't find the corresponding file. So there's a few different reasons why you might see something like this. If you see a list of no files at the beginning of your migration history, generally that means somebody came along and decided that the weight of all these previous migrations
was just too much for this code base to bear, and decided to delete them or maybe roll them all up or something, but didn't know to clean out the migrations, the schema migrations table of those versions. Not that you would really need to, except for the accuracy of this command here, I guess. You'll also see them at the end, which you see here, and that happens frequently when you're developing locally.
Generally that happens to me when I was working on a branch that had a migration, and then I quickly jump back to master or another branch that doesn't have a record of that migration, and you'll see this no file thing. They tend to get resolved as you merge code back into master. When you see them in the middle, that's really where you have the forensics exercise to go through. What I've found is generally, but not always,
the reason why the migration was removed is that somebody tried to run old migrations and they no longer applied, maybe because we didn't have that compatibility layer or any other reason that we'll talk about later, and so they decided instead of fixing it, they would just delete the migration, because who needs that anyway? So what about when we go in the opposite direction,
we roll back a migration? If you define your migration with separate up and down methods, it's really no different than the process that we discussed earlier, going through the adapter and generating SQL. Where it gets interesting is if you define your migration with a change method. So here this migration, we've decided that we no longer wanna have separate first name and last name columns, instead we'd rather have one name column.
So we wrote it with change to add our new column and then remove the two old ones. Now the idea of change is that Rails should be able to figure out what the down migration should be from how up looks. So the idea is that we get down for free. If I learned anything from my high school economics teacher, shout out to Mrs. Buchanan,
it's that there's no such thing as a free lunch. So I wanted to know how, I wanna know who's paying for lunch here. So the answer is that what Rails calls the command recorder is what's paying for lunch. And so you can see the shape of the command recorder here. It starts, its instance state is just a list of commands. And it provides a single method called record,
which just shovels on the command onto that array. So when we define, when we roll back a migration that has been written with that change method, the very first thing that happens is we actually run that migration. Not against the adapter to generate SQL like we talked about before, but against the command recorder to record the commands that it ran.
So in order for that to work, we actually have to have corresponding, we have to have corresponding versions of each schema statement that we want to be reversible. So here you can see just a couple of them, create table, and then we just record that create table was called with these arguments. Same for add index, same for basically everything. These are actually metaprogrammed in under the hood. And then when it comes time
to actually invert the meaning, we have to have a corresponding invert underscore version of that method. So here you see invert create table, invert add index, invert rename table. The top two are what Rails calls straight reversions because they have the same arguments in the same block if provided, but just change the method name. So create table becomes drop table, add index becomes remove index.
The last one is an example of something that does something a little different. So the inversion of a rename table is still rename table, we just swap the order of the arguments. So if we jump back to our migration for a second, does anybody know what will happen if I try and roll back this migration as it's written? Anybody other than Sean? So what's gonna happen is we'll get
an irreversible migration error because remove column can only be automatically inverted if it's given a type. So SQL doesn't need to know the type to remove a column, but it does need to know the type to add a column. So here we haven't given it enough information because the column is no longer around for Rails to say like, hey, what was the type of this column before?
Because it's gone. So we have to overcome this by tacking the type on to the end, which seems a little weird because remove column didn't need to know the type, we're only doing it because we need to provide this extra context because we're defining this in a change method. Nonetheless, we've now defined a reversible change. And so if we run rake db rollback,
we're gonna execute this migration against the command recorder. If we stopped right there, just as it executed against the command recorder and asked the command recorder for its internal state, that commands array, we would see this. So it's an array of arrays of arrays or something. But we have our call to add column and our two calls to remove column. What happens next is Rails says,
okay, give me that in reverse because we have to do the operations in reverse and also invert each one of the steps. And we invert each one of those steps by calling the corresponding invert add column, invert remove column, et cetera. So what we end up with is this. So we have our two add columns because we need to add back our first name and last name. And then we remove our new column, the name column.
So now we can execute this list of commands, not against the command recorder, but against the connection and the adapter to generate SQL and actually invert our migration. Ta-da. But remember, there's no such thing as a free lunch. So what if that column that we, one of those columns that we removed in the original migration had a null false constraint
or a default value associated with it or it defined a foreign key, something like that. We still need, as we saw kind of when we tacked that string on there, we still need to actually think about how we'd go about writing that down method if we were writing them separately when we're writing change. So we don't really get it for free because it's not magic.
The only way to tell that you've actually successfully written the change migration properly that I know of is to migrate your database and then roll it back and then check for a difference in your schema. So that's how applying and reverting a migration works. Let's talk about how schema.rb gets generated. This is what schema.rb looks like.
It's essentially a single migration that contains the entire current database state. And you can load up the entire current database state on a fresh database by running db schema load, which will load this file and run it basically just as if it were a migration. My daily use for it, and as I was talking to a lot of people for this talk, their daily use for it as well, is as a reference for what columns exist on this model
or when you're writing a query, can I expect this thing to perform reasonably? Are there indexes on the columns that I'm using here? But how is it actually generated? Well, we know it just generated automatically run migrations or if we run the db schema dump command, but what's responsible for doing that under the hood? That's the ActiveRecord schema dumper.
This is what it looks like. We start the process by calling this dump class method here and if you notice the dump class method actually asks the connection to create an instance of the schema dumper. That's another example of the adapter pattern at work here. And that allows PostgreSQL to have a separate connection schema dumper than like SQLite would, for example.
So once we have an instance, we call the dump method on that instance and dump is broken down into four parts. It's easiest to see what each of these four parts does by looking at how it gets reflected in the generated schema. So here's our schema again. The header is just that top part, the comment at the very top and then the ActiveRecord schema defined block.
Extensions, if you're running Postgres, you'll see something like this and you may see other extensions like H store, PG crypto, whatever else you might be using. On any database other than Postgres, this is a no op, doesn't output anything. Tables is what actually does all the work. So this gives me a list of all the tables, a list of all the columns on the tables,
a list of indexes on the tables and then finally a list of all the foreign keys between the tables. It's important that those foreign keys come last and separately rather than in line because the tables have to exist in order to create a foreign key between the two. And then trailer does the all important work of writing the end to make this valid Ruby.
So essentially the schema recorder is Ruby code that inspects the current state of your SQL schema and writes executable Ruby code that will generate that schema again, hopefully. So the schema file attempts to justify its existence beyond a reference with this giant comma block, comment block at the top of the file.
It says a few interesting things in there that I wanted to highlight. First it claims that it's the authoritative source for your database schema. So that makes me think a little bit because it strikes me as aspirational at best. The database is actually what's authoritative. If you've ever written execute inside of your, if you've ever written execute inside of your migration
and done any sort of DDL statement manually, then chances are pretty good actually that this schema file is not authoritative. But it may be, so hey, okay. It goes on to say that if you wanted to recreate a new instance of this database that you should be using db schema load to do so, not running all migrations from scratch because the latter, running those migrations from scratch,
is a flawed and unsustainable approach. The more migrations you amass, the slower it'll run and the greater likelihood for issues. So that of course made me think as well. So let's examine those claims. The first is that it's just gonna be too slow. I've never actually seen schema migrations on an empty database be slow.
Maybe somebody out there works at GitHub with like years and years and years of hundreds of developers of migrations and can tell me that it is a problem. But I think for most of us it's really not an issue. I'm gonna throw it out because it's not a particularly compelling argument. What about greater likelihood for issues? What do they mean by issues here? I think what they mean is what I refer to as migration rot.
So migration rot has two primary causes historically. One is that the meaning of our schema statements might change over time. And the second is that we're using external dependencies in our migrations. Now that first one is largely lessened or maybe even eliminated by that compatibility layer that I talked about earlier. So let's throw that part out. What about migration?
So migration rot is largely caused by the latter, this external dependencies use. What I mean by that is, well, let's look at an example I guess. We're gonna revisit our previous name migration and between adding the column and removing the two other columns, we're gonna have this block here in the middle which says, hey, on the way up, update the user's name to concatenate their first name and last name column.
So what happens if we change the, let's say we change the user model to author, right? This migration can no longer apply because it's gonna run against the current Ruby code that you have. We'd actually have to check out old code where the model was called user in order to successfully run this.
We can get around this by switching to use SQL instead, right? So we do a single SQL update statement here. This is also gonna be much faster on large sets of data because we're not gonna run 10,000 updates to update users. We're just gonna do it in one SQL query. Another common technique for getting around this
that you might see is people will create active record classes in line, right? So they will create a user's class or they will create a user class because they know at the time this migration runs, the table is called user and so it'll be okay to do that. The key really is that you wanna eliminate a dependency on anything other than the state of the database when you run this migration.
So mind your dependencies is great advice for programming in general and if possible, you wanna depend, in the context of migrations, you wanna depend on nothing but Rails, the Rails schema statements itself because we now have those great compatibility layers and the state of the database. So migration rot is an example of particular shortcoming that can come up
but there's some more so let's dive into those. So like I said, migration rot, this can be mitigated by mining your dependencies or maybe rolling up your migrations into your old migrations if you're that type of person or just saying I'm never gonna run old migrations, I'm always gonna use schema.rb, that's always good enough for me, whatever.
Some people say that migrations tend to further remove you from the underlying database, right? This is the counterargument to what DHH was calling conceptual compression in his keynote and to that, I actually think it is really useful to know SQL and to know it well but I don't think it's particularly interesting or useful to know DDL SQL
because when I need to generate a table, I can look it up, that's a pretty quick thing to look up, it's not a concept really, it's just a what words do I have to type to make it happen. So another common shortcoming cited is that the migration DSL only has support for a limited subset of features and that's what actually rings true for me
and that's what I wanna take a quicker look at. So for example, foreign keys are really useful in databases and we've been using them for decades but Rails didn't add support until 4.2 so I'm glad we have support but that was an awful long time to go without support for foreign keys. Expression indexes, the ability to index the result of a function call on a column or something like that. Those weren't added until Rails 5.0.
Database functions, we couldn't do those, we still can't do those. Triggers, nope. Database views, nope. So of course we can do these things because we can always execute whatever SQL we want in our migration. So that's a bit hyperbolic for me to say they're not supported. But once we do this, they won't appear
in that schema.rb file so we've kind of othered ourself, right? Our code is now somehow different. If we cared to have them dumped into some sort of record that we could load on a database later without running all the migrations, we'd have to do this which we can change our schema format to SQL and say instead of trying to make my schema into Ruby, just ask Postgres to dump the SQL please.
The problem with this, every time I've had to do it is that it feels un-Rails-like. There's always a resistance like, oh, I really kind of like schema.rb. Like we're not gonna get, so if we start writing like execute statements to create functions, we're no longer gonna get it down for free. We're gonna have to manage that ourselves. In order to do that, we're gonna have to keep track of like what was the old SQL so that I can run that on the way down
to restore it and things like that. So what if we tried to provide some of those missing behaviors in the most Rails-like manner possible? I wanted to do this so I started to dig in. And the unfortunate thing is that there is no official API to do this. That of course makes me sad, but it's never stopped us before.
So gems that do this have existed for a long time. Foreigner allowed me to use foreign keys on Rails applications for a really long time before it was essentially merged into Rail. Schema Plus still exists as a collection of things that can add various extensions to your database. And so I wanted to create one that did database views
and what I would consider to be like the most Rails-like way possible. So together with a friend of mine, Caleb Thompson, we set out to add support for database views in the most Rails-y way possible. And we did that by developing a thing we call Scenic. So because we wanted to be the most Rails-y way possible we knew we were going to need, for example, methods to call in our migrations.
So we were gonna need schema statements. We wanted down for free, so we were gonna need a command recorder. And we wanted to keep using schema.rb so we knew we were gonna need to enhance the schema dumper in some way. So our schema statements just look like this. And the implementations here are interesting but less blank because this isn't a talk about Scenic itself.
I'm happy to talk your ear off about that later if you have questions. But essentially all you need to know is that we define a handful of schema statements that you see defined here. Okay, so that's our schema statements. Now we need to enhance the command recorder because we want down for free. So here's our command recorder. We know it needs to respond
to each one of those schema statements. So we define create view, drop view, and update view to record those. And then we define invert versions of those. So invert create view becomes invert drop view. That's just one of those straight inversions. And the other two do a little bit something interesting because they have to basically inspect the state of the arguments that were passed into them
to figure out if it's reversible and then do some sort of swapping on the arguments to handle that. But that wasn't too bad. That was pretty straightforward all things told. And then we got to the schema dumper. So in order to enhance the schema dumper the way we wanted to, we had to override the tables method. So we say hey, when you go to dump the tables,
go ahead and do that, but then also dump these views. And so we had to define what does views mean? And this was an absolute bear to write. It went through several versions and we've got it pretty good now. But it was certainly difficult and would be a talk unto itself. So how do we wire all this up?
Well the first step is that we need a rail tie to hook into the rails initialization process. So rail ties get run every time you initialize rails and our rail tie says hey, anytime active record gets loaded, call this scenic.load method. And scenic.load does three things. It hooks up our schema statements, it hooks up our command recorder,
and it hooks up the schema dumping that we want. So those are three monkey patches. And sometimes those monkey patches conflict in interesting ways with other monkey patches for things like this. And that leaves authors to figure out how to work together because we both need to monkey patch the same thing. But nonetheless, this all does work.
You can write an update view statement in this little schema statement language that we've developed here. But there are some frustrations with the approach. Every time I think like okay, we've got this just right, interesting things happen. Like the first thing that bit us pretty early on was like hey, views can depend on other views. So the order in which you dump them matters.
And oh, when you try and update a view that is depended on by other views, you first have to drop those views and then update the view and then recreate the view. And so wouldn't it be nice if scenic automatically did that for you? Well yeah, it would, but now we gotta figure out how to do that. What about materialized views? So we support materialized views because they're basically cached tables
and you can index them and get really great performance. But when you update a materialized view, you basically wipe out the table and its indexes and then you're left to try and remember to apply those indexes again. So wouldn't it be nice if we automatically did that? So okay, we had to try and figure out how to do that. And it turns out through this process, there was a lot of frustration. And to me it turns out that what I thought,
I thought Rails didn't support database views for example because there was some sort of conceptual or I don't know, some sort of resistance to them as a concept. Note that application engineers didn't need to know about that concept. What I think actually is that maybe there's some of that at play, but also that Rails has kind of picked the low-hanging fruit here.
These problems are pretty tricky and they're very database specific. So all of this work led me at one point to say what if we didn't do any of this? What if we got rid of all of this code and we used SQL migrations instead? We've described, what we've described is a lot of Rails code. It's a lot of code that's running in your application to support you.
And now we're starting to hit the limitations of that approach. We want to take more advantage of what our database can do for us, but we're just hitting that wall. And so I really thought about this and I was like, look, SQL migrations could look almost exactly the same, right? There'd be a version and there'd be an up.SQL and a down.SQL, and maybe even if we wanted to keep all that code around, they could even coexist alongside Ruby migrations.
And so the more I thought about this, the more I realized that really, maybe you've heard of this process, of this 80-20 rule before, and really that, and in preparing this talk, I discovered that it actually has a name. It's called the Pareto Principle. And that states that for many events, roughly 80% of the effects come from 20% of the causes.
So in our example, the small subset of database functionality exposed via migrations, which is probably more than 20%, but let's call it 20% for now, that's our 20%. And that actually accounts for a large portion of what we need on a day-to-day basis or even month-to-month or application-to-application basis from SQL DDL. There's actually a lot of value
in the higher-level abstraction that Rails provides. Earlier, when I was showing the SQL, I was talking about how, look at all the stuff I don't need to know, right? In the DHH parlance, look at all the code we're not writing. And we could go even further with what's provided here. How many out there have written a migration to create a new table, pushed it to production, started gathering data, and then realized
you forgot to include timestamps on your table? And so now you have to decide, okay, I guess I'm gonna create those timestamps and then mark everything as created at and updated at now. Well, with this higher-level abstraction, with Ruby migrations, and particularly with our compatibility layer, we could conceivably solve that problem
by just making timestamps default with create table statements. And that's just one example of something that we wouldn't be able to do with SQL, but that I believe Rails developers would actually appreciate. So, Ruby migrations, it turns out, I think they're actually the right solution for Rails. There'll always be room for extensions like Scenic and Foreigner and Schema Plus
and all those other things. But it is kind of hard to envision them providing as good an experience as what Rails currently offers in the migration DSL. The one thing I would say is that maybe we need to become a little more comfortable with making that jump to structure.SQL when it comes time to rely on
some more advanced database functionality. So, that's all I have for you today. As I said, if you have any feedback, I'd love to talk to you afterwards, or if you want to tweet me, you can tweet me at Derek Pryor. If you like the technical content of this talk, I host a podcast every week with my friend Sean Griffin here where we talk about things just like this. So, you can check that out at bikeshed.fm.
And if you're interested in learning more about thotbot, then please come see me afterwards as well. Thanks a lot.