From Tables to Documents — Changing Your Database Mindset
Formal Metadata
Number of Parts | 69
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/67333 (DOI)
Berlin Buzzwords 2021, 33 / 69
Transcript: English (auto-generated)
00:07
Welcome everyone. I don't know about you, but I've been eating a lot of comfort food and watching a lot of comfort TV lately. My favorite comfort TV is the show Parks and Recreation. One of the things that makes the show so amazing is the characters.
00:25
I want to introduce you to Ron Swanson. This is Ron. He's an old school guy. He's a bit set in his ways. He likes his privacy and he likes to stay kind of off the grid. Now, in season six, episode fourteen, Ron discovers Yelp.
00:44
He loves the idea of being able to review places he's been. However, Yelp is way too on the grid for Ron. So Ron uses the Internet to look up the physical addresses of where he wants to snail mail his reviews. And then he pulls out his big old typewriter and he starts typing his reviews. Now, Ron writes some pretty great reviews.
01:08
Here's one of my favorites. Dear frozen yogurt, you are the celery of desserts. Be ice cream or be nothing. Zero stars.
01:20
Now, this is a pretty great review, but I see three problems with this approach. Number one, snail mail is way slower than posting a review to Yelp where it will be instantly available. Number two, the businesses he's reviewing may never open the review because they may just assume it's junk mail. And number three, no one else is going to benefit from the review.
01:44
I don't know about you, but I live for these kinds of reviews online. I love to see this sort of thing on Amazon. Now, Ron was inspired by Yelp and he saw the value in the technology, but he didn't change his old school mindset in order to really get the value out of it.
02:01
And this is what we see sometimes as people move from tabular databases to document databases. They see the value of document databases and are inspired by the technology, but they bring with them their old mindsets so they don't get the full value of document databases. Now, I don't want this to happen to you. I want to see you be really successful as you work with document databases.
02:28
So before we dive in, let me introduce myself. My name is Lauren Schaefer. Now, I took a database course in college that was all about best practices for relational databases. I began my career at IBM, where I spent eight years as a software engineer. For a lot of that time, I used DB2.
02:47
Toward the end of my time there, my team started getting flexibility in what database we wanted to use, so I started trying out NoSQL databases. To be honest with you, I didn't really get the hype. Without a doubt, NoSQL databases were easy to get started using.
03:03
But I brought with me my relational database mindset, so I kept thinking about my data in rows and columns, even though I wasn't using rows and columns anymore. It worked, but it wasn't great. I joined MongoDB about a year and a half ago, and I've worked through the process of changing my mindset in how I think about storing data.
03:25
And I'm so happy I did, because it's really easy to work with data in the apps that I build now. So today I'm going to share with you what I've learned as I've gone on the journey of moving from tables to documents. When I say tables, I mean a tabular or relational database. For example, you might have experience using something like MySQL or Oracle.
03:47
We'll dive into what document databases are in just a few minutes. So if you're here, I'm going to assume one, you have experience with relational databases, and two, you have minimal to no experience with document databases.
04:01
So today we're going to go on a mental journey from tables to documents. I'm going to kick things off by mapping the terms and concepts you're familiar with in tabular databases to similar terms and concepts in document databases. Then I'll talk about the four major advantages of document databases. And finally, we'll wrap up with the three key ways you need to change your mindset when you move from tables to documents.
04:26
All right, so let's jump in with terms and concepts. Let's talk about documents. No, I'm not talking about Word documents. I'm talking about JSON documents.
04:42
JSON stands for JavaScript Object Notation. If you've used any of the C family of programming languages such as C, C#, Go, Java, JavaScript, PHP, or Python, documents will probably feel pretty comfortable to you.
05:00
Documents typically store information about one object, as well as any information related to that object. Every document begins and ends with curly braces. Inside of those curly braces are field value pairs. The great thing about documents is they can be incredibly rich.
05:23
Values can be a variety of types, including strings, numbers, arrays, dates, timestamps, or even objects. So you can have objects within a document. And you'll see what that looks like in just a moment. Now, when people talk about document databases, they'll often use the term non-relational.
05:42
But that doesn't mean document databases don't store relationships. Okay, that was a double negative. Stick with me. Document databases store relationships really well. It's just different than the way relational databases do. So let's walk through an example of how you would model the same data in a relational tabular database versus a document database.
06:06
So let's say we need to store information about a user named Leslie. So let's begin with her contact information. In a relational database, we'll create a table named users. We can create columns for each piece of contact information we need to store.
06:22
So as you can see, we've got columns for first name, last name, cell phone number, and city. To ensure we have a unique way to identify each row, we'll include an ID column. Now let's store that same information in a document. We can create a new document for Leslie where we'll add field value pairs for each piece of contact information we need to store.
06:47
As you can see, we have field/value pairs for first name, last name, cell, and city. We'll use _id to uniquely identify each document. We'll store this document in a collection named users.
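As a rough sketch of the mapping just described, here is the same contact information as a table row and as a document. The field names follow the talk; the values (last name, phone number, city) are illustrative placeholders, since the transcript does not show the slide contents.

```python
import json

# Illustrative values only: the talk names the fields, not their contents.
columns = ("id", "first_name", "last_name", "cell", "city")
leslie_row = (1, "Leslie", "Knope", "555-0100", "Pawnee")  # one row in a users table

# The same data as a document: field/value pairs inside curly braces,
# with _id as the unique identifier.
leslie_doc = {
    "_id": 1,
    "first_name": "Leslie",
    "last_name": "Knope",
    "cell": "555-0100",
    "city": "Pawnee",
}

# A row plus its column names carries the same information as the document.
row_as_doc = dict(zip(columns, leslie_row))
row_as_doc["_id"] = row_as_doc.pop("id")

print(json.dumps(leslie_doc, indent=2))
```

In Python, a document is simply a dictionary, which is why the two forms convert into each other so easily.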
07:04
So now that we've stored Leslie's contact information, let's store the coordinates of her current location. When using a relational database, we'll need to split the latitude and longitude between two columns. Document databases support arrays so we can store the latitude and longitude together in a single field.
07:24
We'll call that field location. We're successfully storing Leslie's contact information and current location. Now let's store her hobbies. When using a relational database, we could choose to add more columns to the users table.
07:43
However, since a single user could have many hobbies, meaning we need to represent a one-to-many relationship, we're more likely to create a separate table just for hobbies. Each row in the table will contain information about one hobby for one user. When we need to retrieve Leslie's hobbies, we're going to join the users table and our new hobbies table together.
08:08
Since document databases support arrays, we can simply add a new field named hobbies to our existing document for Leslie. The array can contain as many or as few hobbies as we need.
08:21
When we need to retrieve Leslie's hobbies, we don't need to do an expensive join to bring the data together. We simply retrieve her document from the users collection and we're good to go. So let's say we also need to store Leslie's job history. How are we going to do that? Well, just as we did with hobbies, we're likely to create a separate table just for job history information.
08:46
Each row in the table will contain information about one job for one user. So far, we've used arrays to store geolocation data and a list of strings. Arrays can contain values of any type, including objects.
09:03
So let's create an object for each job Leslie has held and store those objects in an array. As you can see, we have a job history field that stores an array. Inside of that array, you have an object for when she was the deputy director, an object for when she was a city councilor,
09:21
and an object for when she was director of the National Park Service's Midwest Branch. Now that we've decided how we'll store information about our users in both tables and documents, let's store information about Ron. Now, Ron will have almost all the same information as Leslie.
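A document with the location array, the hobbies array, and the array of job objects described above might look roughly like the following. The coordinates, the hobbies, and the inner field name `title` are assumptions made for illustration; only the three job titles come from the talk.

```python
# A sketch of Leslie's document with embedded arrays and objects.
leslie = {
    "_id": 1,
    "first_name": "Leslie",
    "location": [39.2, -86.5],  # placeholder pair; agree on an order (lat/long or long/lat)
    "hobbies": ["scrapbooking", "eating waffles"],  # hypothetical hobbies
    "job_history": [
        # Each job is its own object inside the array.
        {"title": "Deputy Director"},
        {"title": "City Councilor"},
        {"title": "Director, National Park Service, Midwest Branch"},
    ],
}

# Everything about Leslie comes back in one read; no join is needed.
titles = [job["title"] for job in leslie["job_history"]]
print(titles)
```

The one-to-many relationships (user to hobbies, user to jobs) live inside the document instead of in separate tables.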
09:41
However, as I mentioned earlier, Ron likes to stay off the grid. So he's not going to be storing his location in the system. Let's begin by examining how we would store Ron's information in the same tables that we use for Leslie's. When using a relational database, we are required to input a value for every cell in the table.
10:02
So we'll represent Ron's lack of location data with null. The problem with using null is that it's unclear whether the data does not exist or the data is just unknown. So many people actually discourage the use of null. In document databases, we have the option of representing Ron's lack of location data in two ways.
10:23
So one way is we can omit the location field from the document. The second way is we can set the location to null. Best practices suggest that we omit the location field to save space. You can choose if you want omitted fields and fields set to null to represent different things in your applications.
10:42
You've got a little extra flexibility there. Ron has some hobbies and job history, so let's add his information to those tables. And we can add that information to his document as well. So the structure of Ron's document looks pretty similar to Leslie's at this point.
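The two ways of representing Ron's missing location can be sketched as below. The meanings attached to "omitted" and "null" here are one possible application-level convention, not something the database imposes.

```python
# Option 1: omit the field entirely (the approach the talk recommends to save space).
ron_omitted = {"_id": 2, "first_name": "Ron"}

# Option 2: keep the field and set it to null (None in Python).
ron_null = {"_id": 2, "first_name": "Ron", "location": None}


def location_status(doc):
    """Hypothetical convention: omitted means 'not stored', null means 'unknown'."""
    if "location" not in doc:
        return "not stored"
    if doc["location"] is None:
        return "unknown"
    return "known"


print(location_status(ron_omitted))
print(location_status(ron_null))
```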
11:03
So let's say we're feeling pretty good about our data models and we decide to launch our apps using them. Then requirements change. We discover we need to store information about a new user, Lauren. Lauren is a fourth grade student who Ron teaches about government.
11:21
We need to store a lot of the same information about Lauren as we did with Leslie and Ron. Things like her first name, last name, city and hobbies. However, Lauren doesn't have a cell phone, location data or job history.
11:41
We also discover that we need to store a new piece of information, her school. So let's begin by storing Lauren's information in the tables as they already exist. We can create a new document for Lauren and include the data we have for her in it.
12:00
Now let's talk about how to store information about Lauren's school in our tables. We've got two options. We can choose to add a column to the existing users table or we can create a new table named schools. So let's say we choose to add a column named school to the users table. Depending on our access rights to the database, we may need to talk to the DBA and do a little convincing to get them to add the column.
12:27
Maybe we have to do a little begging to get our DBA to add the column. Maybe we have to do a little bribing. Maybe we need to bring them their favorite donut. Or maybe we need to bring our manager along to really pressure the DBA into agreeing.
12:42
If our DBA agrees, the database will likely need to be taken down. The school column will need to be added. Null values will be stored in every row in the users table where a user does not have a school. And then the database will need to be brought back up. It's doable. It just can be, you know, a little painful. Now let's talk about how to store Lauren's school in documents.
13:04
We can simply add a new field named school to Lauren's document. We do not need to make any modifications to Leslie's document or Ron's document when we add the new school field to Lauren's document. Document databases have a flexible schema, so every document in a collection does not need to have the same fields.
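A small sketch of what a flexible schema means in practice: three documents in the same collection, only one of which has a school field. The collection is modeled as a plain Python list, and the school name and phone numbers are made up for the example.

```python
# Three documents in one "users" collection, each with a different set of fields.
users = [
    {"_id": 1, "first_name": "Leslie", "cell": "555-0100"},
    {"_id": 2, "first_name": "Ron", "cell": "555-0101"},
    {"_id": 3, "first_name": "Lauren", "school": "Pawnee Elementary"},  # hypothetical school
]

# Adding school to Lauren's document did not touch Leslie's or Ron's.
with_school = [u["first_name"] for u in users if "school" in u]

# The collection as a whole has no single fixed list of columns.
all_fields = sorted(set().union(*(u.keys() for u in users)))

print(with_school)
print(all_fields)
```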
13:27
Now, for those of you with years of experience using relational databases, you might be starting to panic at the idea of a flexible schema. I know I started to panic a little when I was introduced to the idea. Don't panic. This flexibility can be hugely valuable as your application's requirements evolve and change.
13:47
Also, some document databases provide schema validation so you can lock down your schema as much or as little as you'd like when you're ready. Now that we're starting to get the idea of how tables and documents are similar and different,
14:01
let's do some explicit term mapping. On the left side of the screen, you'll see tabular or relational database terms, and on the right side of the screen, you'll see document database terms. So first up, we saw this a little bit in our earlier example: a row maps to a document or, depending on how you've normalized your data,
14:24
rows from multiple tables could map to a single document. A column maps roughly to a field. In a relational database, groups of rows are stored in tables, and in a document database,
14:42
groups of documents are stored in collections. So tables map to collections. The next few terms will probably feel pretty comfortable to those of you with relational database backgrounds as the terminology is basically the same between the two. Just like you store groups of tables in a relational database, you store groups of collections in a document database.
15:07
Indexes are fairly similar between the two. Indexes help speed up your read queries. Views are pretty similar in both.
15:20
Now, there are a few different ways to handle joins in document databases. The general recommendation is that if you have related information that you would typically put in a separate table in a relational database, you should actually just embed that information in a single document when working in a document database. The rule of thumb, you're going to hear me say it multiple times today, is that data that is accessed together should be stored together.
15:48
So let me just say it again right now. Data that is accessed together should be stored together. So if you'll be frequently accessing information together that you would have put in separate tables, you should likely just embed it in a document.
16:03
Depending on the document database you're using, there are other options for joins as well. Some databases support references between the documents, similar to how you would use a foreign key. Some databases also have special operations to support a left outer join. These options for joins get specific to the database you're using, so I'm not going to go any deeper into it here today.
16:27
All right. Finally, let's talk about ACID transactions. Transactions group database operations together so they either all succeed or none succeed. Now, if you did some research online about relational databases versus document databases before coming
16:43
here today, you probably saw something about document databases not supporting transactions listed as a major drawback. If you care about data integrity, and really, who doesn't, that's a pretty scary-sounding drawback.
17:00
Now, some document databases support ACID transactions while others do not. In relational databases, we call these multi-record ACID transactions. In document databases, we call these multi-document ACID transactions. However, when you model your data for document databases, you'll find that most of the time, you don't actually need to use a transaction.
17:25
Now, I'll explain why that is at the end of the presentation. So what I want you to know now is don't get freaked out if you're looking at drawbacks of document databases and see no transactions listed. Some document databases support transactions, but chances are good that either way, you won't actually need them.
17:45
So to wrap up this section, I created this term mapping summary for you. It's way too much information for you to read now, but go ahead, take a screenshot, feel free to tweet it. You can print it, hang it up at your desk, whatever you want to do.
18:01
But before we go on, I want you to internalize the first three term mappings. A row maps to a document. A column maps to a field. And a table maps to a collection. All right, so now that we have an understanding of the terms and concepts, let's talk about the four major advantages of document databases.
18:24
Let's jump back to Ron for a moment. Ron loves his typewriter, and unless you convince him otherwise, there is no way he's going to change his mindset. Maybe that's you. Maybe you've been successful with relational databases. You're comfortable with them. Maybe they aren't always the easiest, but you've learned how to get the job done.
18:45
So let's answer the question, is it worth changing your mindset from tables to documents? I'm going to walk you through the four advantages of using document databases. So here they are, in no particular order. Let's begin by talking about scaling.
19:01
Let's say Ron has an app where he sells the chairs he makes in his workshop. He's using a relational database behind the scenes. When one of Ron's chairs is nominated for an award from the Indiana Fine Woodworking Association, orders start pouring in. He desperately needs to scale his database.
19:20
Now typically, relational databases scale vertically. When his database becomes too big for its server, he has to migrate it to a larger server. When Ron's chair appears in Bloosh magazine, his chairs go viral and he needs to migrate to an even larger server. Now there are a few key problems with vertical scaling.
19:42
Large servers tend to be more expensive than two smaller servers with the same total capacity. Large servers may not be available due to cost limitations, cloud provider limitations, and sometimes even technology limitations. Also, migrating to a larger server may require application downtime.
20:04
Alright, let's take another example. Let's say Tom has an app for his restaurant called Tom's Bistro. He uses a document database behind the scenes. After a feature in the Pawnee Journal, his restaurant surges in popularity and he needs more space for his database.
20:20
Since Tom is using a document database, he has the flexibility to scale horizontally through sharding. Sharding is a method for distributing data across multiple servers. When his database exceeds the capacity of its current server, he can begin sharding and split it over two servers. When the shelter-in-place orders hit Pawnee and everyone has to order food online, Tom can continue to add more servers.
20:47
The advantage is that these new servers don't need to be big expensive machines. They can be cheaper commodity hardware. Plus, no downtime is required. So with document databases, you can scale cheaper.
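To give a feel for what distributing data across servers involves, here is a toy sketch of hash-based routing: each document's key is hashed and the hash picks a shard. This is an illustration of the idea only; the function name `shard_for` is hypothetical, and real document databases use more sophisticated schemes (for example, key ranges and automatic rebalancing) than a simple modulo.

```python
import hashlib


def shard_for(shard_key, shard_count):
    """Map a key to a shard number in [0, shard_count). Toy scheme for illustration."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


# Hypothetical orders for Tom's Bistro, split over two servers.
orders = [{"_id": i, "item": "burger"} for i in range(1000)]
shards = {n: [] for n in range(2)}
for order in orders:
    shards[shard_for(order["_id"], 2)].append(order)

for n, docs in shards.items():
    print(f"shard {n}: {len(docs)} orders")
```

Because the routing depends only on the key, any application server can work out which shard holds a given order without asking the others.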
21:02
As the size of your database grows, you can scale horizontally. Alright, query faster. Your queries will typically be faster with document databases. Let's examine why. Even in our simple example that we worked through earlier, where we modeled Leslie's data
21:22
in both tables and documents, we saw that her information was spread across three tables. Whenever we want to query for Leslie's information, we're going to have to join those three tables together. Now on these three small tables, the join is going to be very fast. However, as the tables grow and our queries become more complex, joining tables together can become very expensive.
21:46
So recall that rule of thumb I talked about earlier. Data that is accessed together should be stored together. When you follow this rule of thumb, most queries will not require you to join any data together.
22:01
Continuing with our earlier example, if we want to retrieve Leslie's information from a document database, we can simply query for a single document in the users collection. As a result, our query will be very fast. As our collections grow larger, we don't have to worry about queries slowing down as long as we are using indexes and continuing to follow that rule of thumb: data that is accessed together should be stored together.
22:28
Alright, pivot easier. Requirements change. It's a regular part of the software development lifecycle. Sometimes the changes are simple and require only a few tweaks to the user interface, but sometimes changes go all the way down to the database.
22:45
Let's think back to our earlier example, where we needed to update our database to store information about Lauren's school. To add a new school column in our relational database, we're going to have to alter the users table. Executing the ALTER TABLE command could take a couple of hours, depending on how much data is in the table.
23:04
The performance of our application could be decreased while the table is being altered, and we may need to schedule downtime for our application. Now let's examine how we can do something similar in document databases. When our requirements change and we need to begin storing the name of a user's school in a user document, we can simply begin doing so.
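The relational half of this comparison can be sketched with Python's built-in sqlite3 module, used here as a stand-in for the MySQL database in the talk. SQLite adds a column almost instantly on a tiny table, so this shows only the shape of the change (every existing row gains a NULL), not the cost on a large production table. The school name is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany(
    "INSERT INTO users (id, first_name) VALUES (?, ?)",
    [(1, "Leslie"), (2, "Ron")],
)

# The schema change applies to the whole table, not just to Lauren's row.
conn.execute("ALTER TABLE users ADD COLUMN school TEXT")
conn.execute(
    "INSERT INTO users (id, first_name, school) VALUES (?, ?, ?)",
    (3, "Lauren", "Pawnee Elementary"),  # hypothetical school name
)

rows = conn.execute("SELECT first_name, school FROM users ORDER BY id").fetchall()
print(rows)  # Leslie and Ron now carry a NULL (None) in the new column
conn.close()
```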
23:26
We can choose if and when to update existing documents in the collection. If we had implemented schema validation, we would have the option of applying the validation to all inserts and updates, or only to inserts and updates to documents that already meet the schema requirements.
23:43
We would also have the choice of throwing an error or a warning if a validation rule is violated. With document databases, you can easily change the shape of your data as your app evolves. Finally, you can program faster.
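The validation choices described above could look roughly like this if the document database is MongoDB, which exposes them as `validationLevel` ("strict" or "moderate") and `validationAction` ("error" or "warn"). The required fields and the collection name are assumptions for the example, and the command is only built as a dictionary here because sending it needs a running server.

```python
# Hypothetical schema: which fields are required is an assumption for this sketch.
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["first_name", "last_name"],
        "properties": {
            "first_name": {"bsonType": "string"},
            "last_name": {"bsonType": "string"},
            "school": {"bsonType": "string"},
        },
    }
}

coll_mod_command = {
    "collMod": "users",
    "validator": user_validator,
    # "strict": validate all inserts and updates.
    # "moderate": skip updates to existing documents that do not already meet the schema.
    "validationLevel": "moderate",
    # "error": reject the write. "warn": allow it and log a warning.
    "validationAction": "warn",
}

# With PyMongo and a live server, this could be sent with something like:
#   db.command(coll_mod_command)
print(coll_mod_command["validationLevel"], coll_mod_command["validationAction"])
```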
24:01
To be honest with you, this advantage is one of the biggest surprises to me. I figured that it didn't really matter what you used as your backend database, the code that interacts with it would be basically the same. I was wrong. Documents map to data structures in most popular programming languages.
24:21
Now this sounds like such a simple thing, but it can make a huge difference when you're writing code. A friend encouraged me to test this out, so I did. I implemented the code to retrieve and update user profile information. Now my code has some simplifications in it to enable me to focus on the interactions with the database rather than the user interface.
24:42
I also limited the user profile information to just contact information and hobbies. So let's walk through my implementation. I used MySQL for my relational database and MongoDB as my document database. Now I wrote the code in Python, but don't worry if you're not familiar with Python. I'll walk you through it step by step.
25:02
The concepts will be applicable no matter what your programming language of choice is. So before I show you the code, let's do a quick refresh of the data we're going to retrieve and update. On the left, we have a table for users and a table for hobbies. On the right, we have a document for our user that contains both contact information and hobbies.
25:26
So let's begin with that typical top of the file stuff. We're going to import what we need, connect to the database, and declare our variables. I'm going to simplify things by hardcoding the user ID of the user whose profile we will be retrieving rather than pulling it dynamically from the front-end code.
25:43
Now if we look at the MongoDB code, it's basically the same. Importing, connecting, and setting the new user ID variable. So now that we have our database connections ready, let's use them to retrieve our user profile information. We'll store the profile information in a Python dictionary.
26:04
Dictionaries are a common data structure in Python and provide an easy way to work with your data. So let's begin by implementing the code for MySQL. Since the user profile information is spread across the users table and the hobbies table, we need to join them in our query. So we'll say, select everything from users left join hobbies.
26:25
We can use prepared statements to ensure our data stays safe. We'll set the values in our prepared statement, execute the query, and fetch our result. Because we joined the users and hobbies tables, we have a result for each hobby this user has.
26:43
To retrieve all the hobbies, we need to iterate the cursor. We'll append each hobby to a new hobbies array, and then add the hobbies array to our user dictionary. Alright, let's implement that same functionality for MongoDB. Since we stored all of the user profile information in the user document, we don't have to do any joins.
27:04
We can simply retrieve a single document in our collection. Here is where that big advantage that documents map to data structures in most popular programming languages comes in. I don't have to do any work to get my data into an easy to work with Python dictionary.
27:21
I get all of the results in a Python dictionary automatically. And I don't need to manipulate the results. I'm done. Our user dictionaries are now pretty similar in both pieces of code. The one exception is how we're storing location data, but this is really a pretty minor difference.
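The transcript does not include the code from the slides, so the following is a reconstruction of the retrieval step under stated assumptions: sqlite3 stands in for MySQL, a plain dictionary keyed by `_id` stands in for the MongoDB collection, and all data values are placeholders.

```python
import sqlite3

user_id = 1  # hardcoded, as in the talk

# --- Relational side (sqlite3 as a stand-in for MySQL) ---
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
                    cell TEXT, city TEXT);
CREATE TABLE hobbies (id INTEGER PRIMARY KEY, user_id INTEGER, hobby TEXT);
INSERT INTO users VALUES (1, 'Leslie', 'Knope', '555-0100', 'Pawnee');
INSERT INTO hobbies (user_id, hobby) VALUES (1, 'scrapbooking'), (1, 'eating waffles');
""")

# Parameterized query: the ? placeholder keeps user input out of the SQL text.
cursor = conn.execute(
    "SELECT users.*, hobbies.hobby FROM users "
    "LEFT JOIN hobbies ON users.id = hobbies.user_id "
    "WHERE users.id = ? ORDER BY hobbies.id",
    (user_id,),
)

user = {}
hobbies = []
for row in cursor:  # the join yields one row per hobby
    if not user:
        user = {k: row[k] for k in ("first_name", "last_name", "cell", "city")}
    if row["hobby"] is not None:
        hobbies.append(row["hobby"])
user["hobbies"] = hobbies
conn.close()

# --- Document side (a dict keyed by _id as a stand-in for a collection) ---
users_collection = {
    1: {"_id": 1, "first_name": "Leslie", "last_name": "Knope", "cell": "555-0100",
        "city": "Pawnee", "hobbies": ["scrapbooking", "eating waffles"]},
}
# With PyMongo this would be a single call such as collection.find_one({"_id": user_id}).
user_doc = users_collection[user_id]

print(user)
print(user_doc)
```

The relational path has to reassemble the hobbies list by hand, while the document already arrives in the shape the application wants.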
27:41
Now that we've retrieved the user's profile information, we'd likely send that information back up the stack to the front end UI code. When Leslie views her profile information in our application, she may discover she needs to update her profile information. The front end UI code would send that updated information in a Python dictionary to the Python files we've been writing.
28:06
So to simulate Leslie updating her profile information, I'm just going to manually update the Python dictionary myself for both MySQL and MongoDB. These updates are basically the same with the exception of how the location data is being stored.
28:23
Alright, so now that our user dictionary is updated, let's push that updated information down to our database. So let's begin with MySQL. First, we need to update the information that is stored in the users table. We'll say update users, set first name equal to this string, last name equal to this string, cell equal to this string, all the way through.
28:44
Then we'll set the values of all those strings. Is this complicated? No, not really. Is this error prone? Yep. You have to make sure you get every string in just the right spot. Now, to be honest with you, I did not unit test this code. I know, I know. So it's possible I've made a mistake without realizing it.
29:03
This is a pretty simple example, so I probably got it right. But SQL queries can get really long, so you can imagine this getting much more complicated. Next, we need to update our hobbies. For simplicity, we'll delete any existing hobbies in the hobbies table for this user.
29:20
And then we'll insert the new hobbies into the hobbies table. Now, let's update the user profile information in MongoDB. Since the user's profile information is stored in a single document, we only have to do a single update. So once again, we're going to benefit from documents mapping to data structures in most popular programming languages.
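The MySQL update steps just described (update the users row, delete the old hobbies, insert the new ones) could be sketched as follows. Again `sqlite3` stands in for MySQL, and the schema, sample data, and `update_user_profile` helper are assumptions for illustration only.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT,
                        last_name TEXT, cell TEXT);
    CREATE TABLE hobbies (id INTEGER PRIMARY KEY, user_id INTEGER, hobby TEXT);
    INSERT INTO users VALUES (1, 'Leslie', 'Example', '555-0100');
    INSERT INTO hobbies (user_id, hobby) VALUES (1, 'scrapbooking');
""")


def update_user_profile(conn, user):
    """Write an updated user dictionary back to both tables."""
    # Positional placeholders: the tuple order must match the SET clause
    # exactly, which is where mistakes tend to creep in.
    conn.execute(
        "UPDATE users SET first_name = ?, last_name = ?, cell = ? WHERE id = ?",
        (user["first_name"], user["last_name"], user["cell"], user["id"]),
    )
    # For simplicity, replace the hobbies wholesale: delete, then re-insert.
    conn.execute("DELETE FROM hobbies WHERE user_id = ?", (user["id"],))
    conn.executemany(
        "INSERT INTO hobbies (user_id, hobby) VALUES (?, ?)",
        [(user["id"], hobby) for hobby in user["hobbies"]],
    )
    conn.commit()


updated_user = {
    "id": 1,
    "first_name": "Leslie",
    "last_name": "Example",
    "cell": "555-0199",
    "hobbies": ["scrapbooking", "hiking"],
}
update_user_profile(conn, updated_user)
```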
29:41
We can send our user Python dictionary when we call update one, which significantly simplifies our code. In this example, we wrote 27 lines of code to interact with our data in MySQL, and two lines of code to interact with our data in MongoDB.
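A sketch of the single-document update. `InMemoryCollection` is a hypothetical stand-in that supports only equality filters and the `$set` operator, so the example runs without a server; PyMongo collections offer an `update_one(filter, update)` method of the same shape, though it returns a result object rather than the integer used here. The sketch leaves `_id` out of the `$set` payload as a conservative choice.

```python
import copy


class InMemoryCollection:
    """Hypothetical stand-in for a document collection ($set updates only)."""

    def __init__(self, documents):
        self._documents = [copy.deepcopy(doc) for doc in documents]

    def _matches(self, doc, query):
        return all(doc.get(key) == value for key, value in query.items())

    def find_one(self, query):
        for doc in self._documents:
            if self._matches(doc, query):
                return copy.deepcopy(doc)
        return None

    def update_one(self, query, update):
        # Apply the $set fields to the first matching document.
        for doc in self._documents:
            if self._matches(doc, query):
                doc.update(copy.deepcopy(update.get("$set", {})))
                return 1  # number of documents matched (simplification)
        return 0


users = InMemoryCollection([
    {"_id": 1, "first_name": "Leslie", "cell": "555-0100",
     "hobbies": ["scrapbooking"]}
])

# Simulate the front end handing back an edited dictionary.
user = users.find_one({"_id": 1})
user["cell"] = "555-0199"
user["hobbies"].append("hiking")

# The whole dictionary goes back in one update; no per-column SQL.
fields = {key: value for key, value in user.items() if key != "_id"}
matched = users.update_one({"_id": user["_id"]}, {"$set": fields})
```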
30:02
Now, fewer lines of code does not always indicate better code. But in this case, we can probably agree that fewer lines of code will likely lead to easier maintenance and fewer bugs. This example was relatively simple with small queries. Imagine how much bigger the differences would be for larger, more complex queries.
30:23
Documents mapping to data structures in most popular programming languages can be a huge advantage in terms of time to write, debug, and maintain code. So is it worth changing your mindset and learning how to work with a document database? I think so.
30:40
With a document database, you can scale cheaper, query faster, pivot easier, and program faster. All right, let's move on to our final section. Let's talk about the three key ways to change your mindset as you move from tables to documents.
31:01
People who pick up a document database and try to use it as a relational database are the ones who typically struggle and fail. You can't keep doing things in the same way. So let's talk about three key ways to change your mindset. First up, embrace document diversity. Let's think back to when we modeled documents for Leslie, Ron, and Lauren.
31:24
We saw that not all documents in a collection need to have the same fields. Now, for those of us with relational database backgrounds, this is going to feel really uncomfortable and probably a little odd at first. I promise it will be okay. Embrace document diversity.
31:41
It gives us so much flexibility and power to model our data. In fact, there's a schema design pattern that specifically focuses on documents not having the same fields. It's called the polymorphic pattern. We use the polymorphic pattern when documents in a collection are of similar but not identical structures.
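To make "similar but not identical structures" concrete, here is a small sketch using plain Python dictionaries as documents. The specific fields each person has are invented for illustration, and `contact_summary` is a hypothetical helper showing how application code can tolerate missing fields.

```python
# Three documents that could live in one collection. They share some
# fields but not all (the field choices are made up for this sketch).
users = [
    {"_id": 1, "first_name": "Leslie", "cell": "555-0100",
     "hobbies": ["scrapbooking", "eating waffles"]},
    {"_id": 2, "first_name": "Ron", "cell": "555-0101"},
    {"_id": 3, "first_name": "Lauren", "school": "Example University"},
]


def contact_summary(doc):
    """Build a one-line summary, using defaults for fields a document lacks."""
    name = doc["first_name"]
    cell = doc.get("cell", "no cell on file")
    hobbies = ", ".join(doc.get("hobbies", [])) or "no hobbies listed"
    return f"{name}: {cell}; {hobbies}"


for doc in users:
    print(contact_summary(doc))
```

The design choice is that absent fields are simply absent, rather than stored as empty columns, and the reading code decides what a missing field means.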
32:02
So embrace document diversity. Resist the urge to force all of your documents to have identical structures just because it's what you've always done. Second way to change your mindset: data that is accessed together should be stored together. We've probably all heard over and over that you should normalize your data, but why?
32:24
Historically, there are a couple different reasons for this. One of them is that when relational databases became popular, disk space was extremely expensive. Financially, it made sense to reduce data duplication and save disk space. As you can see from this chart, and you probably know from buying flash
32:41
drives, if you even buy those anymore, the price of storage has dramatically decreased. Our phones, tablets, laptops, and flash drives have more storage capacity today than they did even five to ten years ago for a fraction of the cost. When was the last time you deleted a photo? I mean, I can't even remember when I did.
33:01
I keep even the really horribly unflattering photos. And I currently back up all my photos on two external hard drives and multiple cloud services. Storage is super cheap. Storage has become so cheap that we've seen a shift in the cost of software development.
33:20
30 to 40 years ago, storage was a huge cost in software development, and developers were relatively cheap. Today, the costs have flipped. Storage is a small cost of software development, and developers are expensive. Instead of optimizing for storage, we need to optimize for developers' time and productivity.
33:42
Now as a developer, I like this shift. I want to be able to focus on implementing business logic and iterate quickly. Those are the things that matter to the business and move developers' careers forward. I don't want to be dragged down by data storage specifics. Think back to the earlier example where I coded retrieving and updating a user's profile information.
34:03
Even in that simple example, I was able to write fewer lines of code and move quicker when I used a document database. So leverage embedding. Consider how you can use objects, arrays, and arrays of objects to store your data together.
34:22
Resist that temptation to break up your data. Data that is accessed together should be stored together. If you end up repeating data in your database, that's okay, especially if you won't be updating it very often. Alright, last way to change your mindset.
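As an illustration of embedding, the sketch below shows one document that uses an object, an array, and an array of objects. All field names and values are hypothetical; in a relational design each of the three embedded pieces would likely be its own table.

```python
# One user document with everything that is read together stored together.
user = {
    "_id": 1,
    "first_name": "Leslie",
    # Embedded object
    "location": {"city": "Example City", "country": "Exampleland"},
    # Embedded array
    "hobbies": ["scrapbooking", "eating waffles"],
    # Embedded array of objects
    "job_history": [
        {"title": "Intern", "year_started": 2001},
        {"title": "Director", "year_started": 2009},
    ],
}

# Everything is reachable from the single document, with no joins.
city = user["location"]["city"]
latest_job = max(user["job_history"], key=lambda job: job["year_started"])
print(city, latest_job["title"], len(user["hobbies"]))
```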
34:40
Tread carefully with transactions. Some document databases support ACID transactions, while others don't. But here's the thing. If you're modeling your data for the document model, you're probably not going to need a transaction. In fact, relying on transactions is a bad design smell.
35:00
Okay, why is that? Well, this really builds on our first two points in this section. First, not all documents need to have the same fields. Perhaps you're breaking up data between multiple collections because it's not all of identical structure. If that's the only reason you've broken the data up, you can probably put it back together in a single collection. Second, data that is accessed together should be stored together.
35:23
If you're following this principle, you actually won't need to use transactions. Some use cases call for transactions. Most do not. If you find yourself frequently using transactions, take a look at your data model and consider if you need to restructure it.
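One way to see why stored-together data rarely needs a transaction: when a profile is split across tables, the update touches several rows and must be wrapped in a transaction to stay consistent, whereas in MongoDB a write to a single document is atomic on its own. The sketch below shows the relational side only, with `sqlite3` standing in for MySQL; the schema and the `replace_profile` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE hobbies (id INTEGER PRIMARY KEY, user_id INTEGER,
                          hobby TEXT NOT NULL);
    INSERT INTO users VALUES (1, 'Leslie');
    INSERT INTO hobbies (user_id, hobby) VALUES (1, 'scrapbooking');
    INSERT INTO hobbies (user_id, hobby) VALUES (1, 'eating waffles');
""")


def replace_profile(conn, user):
    """Update both tables as one unit of work."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if any statement raises.
    with conn:
        conn.execute(
            "UPDATE users SET first_name = ? WHERE id = ?",
            (user["first_name"], user["id"]),
        )
        conn.execute("DELETE FROM hobbies WHERE user_id = ?", (user["id"],))
        conn.executemany(
            "INSERT INTO hobbies (user_id, hobby) VALUES (?, ?)",
            [(user["id"], hobby) for hobby in user["hobbies"]],
        )


# A deliberately invalid update: the None hobby violates NOT NULL, so the
# whole unit of work is rolled back, including the name change.
bad_update = {"id": 1, "first_name": "Changed", "hobbies": ["hiking", None]}
try:
    replace_profile(conn, bad_update)
    failed = False
except sqlite3.IntegrityError:
    failed = True
```

Without the transaction, the failed insert would have left the user renamed and with their old hobbies deleted.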
35:40
Alright, so to summarize the three key ways you need to change your mindset. Embrace document diversity. Data that is accessed together should be stored together. Tread carefully with transactions. Alright, let's wrap up this presentation. Today we went on a mental journey from tables to documents.
36:01
We began by mapping terms and concepts from tables to documents. The three that are most important are those first three. Rows map to documents, columns map to fields, tables map to collections. Then we talked about the four major advantages of documents.
36:21
You can scale cheaper because you can scale horizontally. You can query faster because you aren't having to do expensive joins to bring your data together. You can pivot easier because you have a flexible schema. And finally, you can program faster because documents map to objects in most popular programming languages.
36:42
Alright, we wrapped things up with the three key ways you need to change your mindset. First, embrace document diversity. Not all of your documents need to be the same shape, and that's perfectly okay. Second, data that is accessed together should be stored together. And third, tread carefully with transactions.
37:02
Relying on transactions when you're using a document database is a bad design smell. So if I had to sum this presentation up in one idea, I would say this. Don't be Ron Swanson. I mean in this particular case, because Ron Swanson is amazing in so many other ways. But don't be Ron Swanson.
37:21
Change your mindset and get the full value of document databases.