Get creative and squeeze performance out of SQLite

License: CC Attribution 3.0 Unported
Transcript: English (auto-generated)
Hopefully the talk's not going to be so boring that you'll actually need the cup of coffee. So yeah, as he said, my name's Jason Feinstein, I'm a software engineer at Bandcamp. If you're not familiar with what Bandcamp is, we make it really easy for fans to connect directly with and support the artists that they really love. So we treat music as art, not content,
and we tie the success of the business to the success of the artists that we serve. So far, fans have paid artists nearly a quarter of a billion dollars and more than five million dollars just in the past 30 days. When I was putting together the slides for this presentation, I realised that you need a little bit more than just the creativity
to get the most out of a tool like SQLite. There's something that another Jason said that I think applies. He said, we should leverage our science and our technology together with our creativity and our curiosity to solve the world's problems. Apps aren't necessarily big problems for the world, but it is a problem when your app is slow.
And to be more accurate for the title of the presentation, it's a little bit late to change the description on DroidCon, but I think a more accurate title would be Get Creative and Curious to Squeeze Some Performance from SQLite. So why should you worry about data performance?
Sometimes it can be just enough to want your performance to be good so that your life is a little easier as a developer. Other times, especially when it comes to server-side dev, it can reduce a lot of costs by delaying the need to upgrade your servers or even allowing you to use fewer servers for longer. But as mobile developers, and then also sometimes even as front-end web devs,
the most important reason, at least as far as I'm concerned and as far as the systems lead at Bandcamp is concerned, is: don't punish your best users. It probably seems obvious, but sometimes it can get overlooked. Your best users are often the users that are helping the most to pay the bills at the company and to pay your paycheck.
They either use the app the most and they see the most ads, or they've purchased the most content. And on Bandcamp, if you're a fan of music, you can create what we call a fan account. When you've got one, anything you buy on the site gets added to your music collection, and you can follow other people whose music tastes you think are good,
and see what they're buying, and it's a great way to show the world how awesome your taste in music is. Also, your entire collection is streamable in our app. But one of our best users, his name's Max, he's from Denmark. He runs this really popular blog called Metal Bandcamp, and if you're a fan of metal, you should definitely check it out. It's really good.
Max buys just about everything he covers on his blog, and he's been blogging regularly for many, many years. He also has a Bandcamp fan account, so he's got a huge collection at this point. And this is what Max's Bandcamp collection looks like on the web. He's purchased, I don't know if you can see, almost 2,000 albums and tracks.
That's so much information that the page would time out, until a couple of months ago when we did a bit of work on the web. So we were punishing one of our best customers. He even has a blog named Metal Bandcamp. Because his collection is so big, because he's been such a good customer, the page wasn't loading.
So we spent a lot of effort on that a few months ago, got it working great, and it now loads in under a second. But currently, the app that's in the Play Store right now is even worse. The app's collection information... So when you log in, we download the descriptions for all the albums and tracks
and the bands that created those albums and tracks, as well as some other metadata and other important things to make the experience good in the app. We download that all and store it in SQLite. And the current thing that's available in production doesn't like Max's collection either.
And at best, what happens is, if he's got a really good phone, then it's going to just be slow. When it comes time to search your collection on device or playing back the music, if you have a shuffled playlist, shuffling, you know... So 2,000 albums, if you say there's 10 tracks per album,
there's going to be 20,000 tracks. That's a lot of stuff to keep in memory. And so the worst case scenario is you have out-of-memory crashes. And that is totally not the right way to treat one of our best customers and one of our best users. We fixed it on the web. We're going to be fixing it shortly in the app.
And this talk covers a lot of the things that I learned while trying to help Max have a better experience using Bandcamp. And just to drive the point home again, I'll reiterate, listen to Lee and don't punish your best users. Make the experience for them really good and the experience for the smaller, newer users is going to be great.
And hopefully they'll also grow into being big contributors to the community that you have for your app or your business. SQLite shouldn't be scary and neither should SQL. And I'm going to throw those words together, SQL and SQLite, this whole time; it's going to be really confusing. But hopefully you'll forgive me.
My goal is going to be to give you the tools you need in this talk to feel comfortable writing really high-quality and performant SQLite queries and statements. SQL is fairly easy to understand. And once you've overcome the initial hurdles of accepting its limitations, you can sometimes be lulled into a sense of complacency.
So as app devs, sometimes we tend to think, well, we don't need to put a whole lot of thought into it because it's a native library, it'll be really fast. And yeah, it's really fast, but you can make it even faster by doing a little bit of thinking about what you're trying to accomplish. There's a couple of tools that are built into SQLite that we'll talk about
that will make your life easier when it comes time to optimise some queries. And then lastly, given the fact that a SQLite database is a single file, and given that the databases have really good backwards and forwards compatibility between SQLite versions, you can do some pretty fun stuff with that. So we'll talk about that later.
Pulling data down from the cloud to the device when your user first logs in or at regular intervals is really common in a lot of apps. And the time it takes to log in and get going with the app should be as short as possible. For some fitness apps or other interesting sensor-based, maybe even IoT things,
you might need to be able to manage a lot of data points. And almost always, to make your data useful, regardless of what the data is, you'll need to be able to show it to the user via the UI and often you'll want to do that in response to interaction from the users. And you don't want them to have to sit around and wait.
So, is SQLite heavyweight? It's actually a very lightweight library. I think it's in the realm of 100 kilobytes or something. But it was created in 2000 by D. Richard Hipp, who implemented it while he was writing software for guided missile destroyers on a contract for the US Navy. And with that as the first application of SQLite
and other mission-critical applications that have come since, like this being used in aircraft systems and in-car computers, it has had the need to be really performant and absolutely rock solid. It's got tens of thousands of test cases and there are dozens of languages which have bindings to SQLite.
So it's the most widely used database system in the world, thanks in part to the fact that it's on every single Android and iOS device in the world. In order to understand it and get the most from it, there's a few things we're going to talk about. We'll talk about prepared statements, what they are, what the virtual machine is, how indexes work,
transactions and why you should almost always use them, especially when you're writing to your database, and we'll talk about that again, that fact that databases are kept in a single file. So the virtual machine, the VDBE, the virtual database engine, this was totally new to me when I was doing my research
for this talk, for my blog posts and for work. I always just kind of took the term prepared statement for granted, but what a prepared statement really is, is a bytecode program. It's SQLite taking your query, parsing it, turning it into bytecode and then storing that,
and you can reuse it. Also, just like you might expect with other machines, it's got registers, and SQLite uses them to store values from the database for different operations; they're also where the parameters for your prepared statements are kept. And that's what those question marks really are. Again, I always thought that the question mark things
that you'd put in queries were just a way to, like a good way to avoid SQL injection attacks. And well, yeah, they're useful for that. They're also useful for SQLite in making it so that you can reuse these prepared statements and keep them as compiled programs.
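To make that concrete, here's a minimal sketch of reusing one compiled statement on Android. The bands table and its columns are assumptions for illustration, not the talk's actual code:

```kotlin
import android.database.sqlite.SQLiteDatabase

// A minimal sketch, assuming a `bands(name, location)` table exists.
fun insertBands(db: SQLiteDatabase, bands: List<Pair<String, String>>) {
    // compileStatement() parses the SQL once into a VDBE bytecode program;
    // each iteration below only rebinds the `?` registers and re-runs it.
    val stmt = db.compileStatement("INSERT INTO bands (name, location) VALUES (?, ?)")
    for ((name, location) in bands) {
        stmt.bindString(1, name)       // bind parameters are 1-indexed
        stmt.bindString(2, location)
        stmt.executeInsert()
        stmt.clearBindings()
    }
    stmt.close()
}
```

The point of the sketch: binding parameters lets the already-compiled program be reused, whereas building a fresh SQL string every time forces a re-parse and re-compile on each insert.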
To see the bytecode your statements would compile into, there's a keyword called explain. And we're going to talk about how to use it. Start with a really simple example. The statement's just trying to concatenate world onto hello and then select that as the greeting. If we throw explain in front of it,
you can see that the actual byte code to make that happen is what comes out, and we'll step through it. So first, we get the init opcode, and that tells us with parameter two that we're going to jump to line four, where we are taking the string hello, it's also got a space in there,
using the string eight opcode and storing hello in register two. Then we take world, throw it in register three, jump back up to line one, concatenate register three onto register two and store that in register one, and return it as the result row.
And then we're done. Here's a little bit more realistic example, and a little bit closer to my heart. We'll use explain to see the bytecode that SQLite will generate for this query. It's a select statement that's going to look for bands in a particular location
and then find their albums and then return the albums sorted by the band's name and the title of the album. And it's a beast. There's a lot more going on here. You don't need to write all this down. Well, I'll put the slides online and stuff like that, but you could also end up doing this yourself, just using the explain keyword and setting up the schema that we'll talk about a little bit later.
But in short, after the init statement, what happens is SQLite is going and grabbing all the data out of the database and storing it in a temporary table that it'll use to then sort the data for that order by that we had, where we were sorting by band name and album title. Next, that temporary table is iterated through
and the data is output using that result row operation. There are a lot of other operations that are used in this statement's prepared bytecode. And I'm not going to talk about them all here, but if you go to sqlite.org/opcode,
you can see all the different opcodes that are in the VDBE for SQLite. It's pretty interesting. There's over 100 of them, 100 different operations that can be used. And also, one thing to point out is that the opcodes and the operations
and the parameters that they use can vary slightly from version to version with SQLite, but in general, you're pretty good looking at what's available on the documentation, even though the version on Android, even on O, is quite a bit older than what's available to download via Homebrew or whatever.
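If you want to poke at the greeting example yourself, here's a sketch of running EXPLAIN from code; it assumes db is an open SQLiteDatabase handle, and the opcode listing in the comments is representative output from one SQLite version, not exact for all of them:

```kotlin
// A sketch: EXPLAIN returns ordinary rows, so rawQuery() works fine.
db.rawQuery("EXPLAIN SELECT 'Hello, ' || 'World' AS greeting", null).use { c ->
    while (c.moveToNext()) {
        // Columns are addr, opcode, p1..p5, comment.
        println((0 until c.columnCount).joinToString("\t") { c.getString(it) ?: "" })
    }
}
// Representative output (addresses and opcodes vary by SQLite version):
//   0  Init       0  4  0              jump to addr 4
//   1  Concat     3  2  1              r[1] = r[2] || r[3]
//   2  ResultRow  1  1  0              emit r[1] as the result row
//   3  Halt       0  0  0
//   4  String8    0  2  0  'Hello, '   store the string in r[2]
//   5  String8    0  3  0  'World'     store the string in r[3]
//   6  Goto       0  1  0              jump back to addr 1
```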
Between when SQLite parses your SQL and actually generates that bytecode, it uses a set of pretty advanced algorithms called the query planner to figure out the most optimal way of executing the statement, given the current schema you've got in the database. There's some valuable information you can get from SQLite
about that query planning if you use explain query plan in front of your query or your statement. You should consider using it whenever, you know, your queries are slow, when you're suspicious that you might have, you know, a bunch of joins or something, and there's something that you could do with maybe creating a new index or something.
If that might help, and also if you're just interested, like if you're just really curious what's going on. So for the next set of explanations, we'll use this pretty simple schema as the example. There's two tables, albums and bands. Albums have an ID, a title, and they're owned by a band.
And bands have an ID, a name, and a location. You might already see some issues with the schema, but we'll get to that. And let's look at this query. It's very similar to the other one that we were talking about, where we were getting those albums and bands from a particular location and sorting them,
but we're going to set that parameter, we're going to hard-code it to Berlin, because that's where we are. Here's what the explain query plan looks like. If you look at that first row, you can see that SQLite has...
It has to scan the entire bands table just to find bands in Berlin. That should be a red flag to you. Whenever you see a SCAN TABLE on what's probably the core of your query, it's something to watch out for. And when you see that, you might want to consider creating an index. So, we'll talk about how indexes work.
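Before getting into how indexes work, here's a hedged reconstruction of the schema and the plan check described above, so you can reproduce it yourself; the exact column names are assumptions based on the talk, and the plan wording varies by SQLite version:

```kotlin
// Assumes `db` is an open SQLiteDatabase. Schema as described in the talk:
db.execSQL("CREATE TABLE bands (id INTEGER, name TEXT, location TEXT)")
db.execSQL("CREATE TABLE albums (id INTEGER, title TEXT, band_id INTEGER)")

db.rawQuery(
    """EXPLAIN QUERY PLAN
       SELECT albums.title, bands.name
       FROM albums JOIN bands ON bands.id = albums.band_id
       WHERE bands.location = 'Berlin'
       ORDER BY bands.name, albums.title""",
    null
).use { c ->
    while (c.moveToNext()) println(c.getString(c.columnCount - 1)) // the "detail" column
}
// With no indexes, the details read something like:
//   SCAN TABLE bands          <- the red flag: a full linear scan
//   SEARCH TABLE albums ...   (or another SCAN, depending on version)
//   USE TEMP B-TREE FOR ORDER BY
```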
Tables are stored in the database file, and they need at least some kind of order to be stored there, because they're actually written to disk. And the table's primary key is what provides that order. But usually it's pretty simplistic, and if you don't specify one, SQLite will give you one. It's called row ID. So, if you ever want to see what the row IDs are for your table,
if you haven't specified a primary key, you can say, select row ID, comma, star, from my table, and you'll be able to see that they're an auto-incrementing integer there. You can create new indexes using create index, and you can specify any subset of the columns of the table for the index, and you can also specify ordering on those columns.
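Spelled out, those two statements look like this; the table and column names are the same assumptions as before, and band_locname is the index the talk creates in a moment:

```kotlin
// See the implicit rowid on a table with no declared primary key:
db.rawQuery("SELECT rowid, * FROM bands", null).use { c ->
    // rowid comes back as an auto-assigned integer key
}
// Any subset of columns, each with its own sort order:
db.execSQL("CREATE INDEX band_locname ON bands (location ASC, name ASC)")
```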
And the indexes are used to... Or when you say create index, it's going to build essentially another table, but this time the data in it is going to be those columns that you specified and that primary key, and all the data will be sorted by the sort order of your index.
Before you create one, let's look again at that query. The full table scan on the Bands table, when looking for the Bands based in Berlin, we have that scan, it's going to be slow. Maybe we should put an index on it.
But just to make sure, let's talk about what happens when you don't have an index. Let's do a little bit simpler example where we're just selecting the band information where the location is Berlin, and we don't have an index yet; keep that in mind. We've got a large table of bands, and the records really aren't in any particular order, so without the index, we just kind of have to go linearly, one by one.
Oh, that's Michael Jackson, he's in Los Angeles, that's not Berlin. Madonna, LA, again, not Berlin. Led Zeppelin's in London, nope. AC/DC, that's Sydney, no, that's not right. Oh, there's Rammstein, that's in Berlin. OK, we can return that one, then we have to keep going.
ABBA, Stockholm, no. And then you just have to iterate through the entire table until you find all the Berlin-based bands. That can be really slow, really slow. As I said before, indexes let us avoid doing linear scans, and instead they provide a separate ordering of the data
where, because it's ordered, SQLite can use binary search. So, how many people here have heard of binary search? Ah, cool, I don't need to explain it. And big O, performance notation, cool. OK, so, for those of you who aren't familiar with big O,
it's just a way of describing the performance of an algorithm, and it's usually based on the size of the data. So, a linear scan, because it has to go one by one by one, through the whole collection of data, has order of n, because it's going to have some multiple of n operations it has to do to make it through the data. Binary search, because it's going to be looking in the middle, and if that's too big, it's going to look towards the beginning,
and it's going to keep doing this, and it ends up cutting the search space in half every time, you can prove, and I'm not going to do that here, because this isn't a computer science research academic conference, you can prove that the order of binary search is in the log realm.
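The one-line version of the argument he's skipping: each probe halves the remaining candidates, so after k probes only n/2^k remain, and the search stops when that reaches one:

```latex
\frac{n}{2^k} = 1
\quad\Longrightarrow\quad
k = \log_2 n
\quad\Longrightarrow\quad
\text{binary search is } O(\log n)
```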
So, let's create an index called band_locname on the bands table using the location and the name. And this is kind of what the index would look like. It doesn't really look like this in the file, but it's pretty similar. So, it stores the location and name of the bands in sorted order,
and it also has that primary key. So, once we've found what we're looking for, we have the primary key, and then we can go again, take that primary key, and look in the table itself for any extra information that we need for the query. So, here's that query, and we're just asking for bands from Berlin. Before we made the index, the query planner could only,
the only option it had was to scan linearly through the table. But after we created the index, we can see from the output here that it's going to use SEARCH TABLE bands USING INDEX band_locname. Let's kind of step through how the search would work. Since everybody here, or most people here, are familiar with binary search,
we won't spend a whole lot of time on it, but... So, here's our query and the index, and the search will start in the middle of the data. Assuming we've got a ton of bands in the table, that's going to be somewhere where those blue dots are. It's kind of hard to see in the colour here, but... We'll assume it's in the middle of the data.
It says, okay, wherever we are, we're looking for Berlin. That's way towards the beginning, so it's going to jump, and it's going to jump a few times, and maybe it lands on Liverpool. And since L still comes after B, we're still going to have to look a little earlier, so let's cut the search space in half again and look in the middle. There we go, we found...
We found Rammstein in Berlin. And from there, we know, okay, we're in the Berlin section. We can just iterate from here to find all of the Berlin-based bands. So it'll pick up The Real McCoy. But if you look at the query, we don't have all the information that we need to satisfy that query.
The index only has the location and the name, but we've also got the band ID, which is different from row ID here. So, SQLite has to perform another set of binary searches to grab the rest of the data.
And here's kind of what that would look like. So, if you look back here, we've got two rows, one with ID 7 and another one with ID 9. And here we're back at the table itself. Let's look at that first one, at row 7. And since the row ID is the primary key,
we know that the table's data is sorted as well by that primary key, so it can just do a binary search here. And it's going to bounce around a little bit until it finds the data, and it'll output all the things that we wanted it to output from there. So we'll get ID 7, the name Rammstein, and location Berlin.
And it'll do that again for ID 9. But the important takeaway here is that binary search is fast. And in fact, you could have multiple of those binary searches like we would have to do with an index that doesn't cover everything that we need, so you have to go to the table.
Even if you have multiple binary searches, as long as you don't have like an unbounded multiple of it, it's always like way faster than linear. And if you look at this graph here, this poorly made graph that I made, just to get the point across, red is linear, green is log, and if you've got a lot of stuff, so you're further over to the right,
log just blows linear out of the water. In some situations, you can even remove that second batch of binary searches that it has to do. And the situation when you can do that is called having a covering index.
So when your index has all the columns that you're asking for, it's called a covering index, and SQLite's smart enough to skip that second set of binary searches and just return stuff straight from the index. But there's some downsides, especially like, you know, when you think about this, you might think, well, okay, I need to put indexes on everything for all my queries,
and then it'll be all really fast. But there are some costs to indexes that you have to worry about too. Each index on a table is a copy of the data that it's indexing. So if you have a giant index that's covering the exact sort order for what you're asking for, you're going to have an entire copy of your table or stuff like that.
And if you have several different queries and you try to do that, you're going to just have way too much data on disk. So just keep that in mind. Indexes are not as useful without having what's called high cardinality, where all that means is that the data needs to have a lot of variance.
So if you think about what binary search does, if you have like three different types of data but you have 10,000 items, it's not going to get a lot out of binary search, even if those three different types of data are sorted because you still have 10,000 items. And it's going to have to, once it hits what it's looking for,
iterate through those sections. And also, insert, update, delete. Anything that's going to write to the database is going to take a little bit longer because you'll have to update the indexes. So now you know how indexes work. Here's the version of the schema that should help the query planner
choose a more optimal way to execute the query from earlier. And we'll talk about what we added here. We specified the ID column of both tables as primary keys. So now we can drop that row ID and it's going to use the ID from the tables as the primary key. And we can just assume that's going to be okay
because we know our backend and we know that all albums and bands have unique IDs in their own spaces. And we've also created two indexes. One index is on the albums table using the band ID and the other index is on the bands table using both the location and the name columns. So here's our query again.
And when we run explain query plan before and after, here's the difference in the output. Looking at the first row, you can see that the band_locname index that we created is helping a lot. We've got... Not only do we have a search instead of a scan, we have a covering index.
So it doesn't have to do the second look into the bands table to get the extra information that we're asking for. And the order by part of the query benefits quite a bit from that index as well because the index itself is ordered and we're asking for first albums ordered by band name
and secondly ordered by title of the album. That first part of the sort can be handled completely by the index and the second part is what we still need, a temporary B-tree or a temporary table that was created at query time to do that part. So we've cut out a bit of work for sorting.
And you can read more about explain query plan on the official documentation, how the query planner itself works and how indexes work also on the documentation. And then I made a post about this on Medium that's pretty similar to what I talked about here, a little bit more detail.
But for a short break, let's do something Android-specific. Up to now, we've been mostly talking about stuff that applies pretty much broadly to wherever you're going to use SQLite. But since it's DroidCon, we should at least do something Android-specific. And so consider using a database cursor
instead of loading all your data into memory for a RecyclerViewAdapter. There's a big asterisk there and we'll get to that in a minute. For the cursor-based RecyclerViewAdapter, there's a few things you need to do. So you'll set up your cursor in something you call from either onStart or onCreateView.
When you need to return the item count, you'll use cursor.getCount. It's pretty straightforward stuff. In order to get data for onBindViewHolder, you'll just use cursor.moveToPosition and then ask for the data that you need from the cursor. And then don't forget to tear down the cursor and free up the memory in onStop.
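A minimal sketch of what that adapter could look like, assuming a bands cursor and a plain TextView row; this is illustrative, not the talk's actual code:

```kotlin
import android.database.Cursor
import android.view.ViewGroup
import android.widget.TextView
import androidx.recyclerview.widget.RecyclerView

class BandViewHolder(val text: TextView) : RecyclerView.ViewHolder(text)

// The cursor is created in onStart/onCreateView and handed in here.
class BandAdapter(private var cursor: Cursor?) : RecyclerView.Adapter<BandViewHolder>() {

    override fun onCreateViewHolder(parent: ViewGroup, viewType: Int) =
        BandViewHolder(TextView(parent.context))

    override fun getItemCount(): Int = cursor?.count ?: 0

    override fun onBindViewHolder(holder: BandViewHolder, position: Int) {
        val c = cursor ?: return
        c.moveToPosition(position)   // note: this touches the database on the main thread
        holder.text.text = c.getString(c.getColumnIndexOrThrow("name"))
    }

    // Call from onStop to close the cursor and free its window memory.
    fun close() {
        cursor?.close()
        cursor = null
        notifyDataSetChanged()
    }
}
```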
I can't really see it, but I bet a lot of you are starting to get antsy, a feeling of ants crawling on your skin, because I'm saying you should use the database on the main thread. So it's something you just need to keep in mind as a trade-off.
RecyclerView adapters need to be able to respond on the main thread, which means that your query will have to run on the main thread, and it should be blazing fast. You'll want to make an informed decision about whether you feel comfortable accessing the database on the main thread;
it's up to your situation. And like with most of the topics that we've covered at this conference, there's something to keep in mind. It depends. But I think it's always better to drop a few frames than it is to crash, because you don't have enough memory for all the data. So if we go back to Max from Metal Bandcamp,
if he doesn't have the newest, best flagship device, his app is just going to crash because of the current implementation that we have, or at least the current version that's in the Play Store. And I think it's okay for him, when he's flinging through 2,000 albums, for there to just be a few dropped frames
as we go and grab another page of data from the database. I think that's better than not being able to use the app at all. And the next step, we'll go back to SQLite proper stuff. What should you do to make it so that you can insert as much data as possible,
as quickly as possible? There's two main tips that I have for you. The first is you should always wrap a group of statements, especially writing statements, in a transaction. And you'll want to consider potentially using bulk inserts.
Here's some code that will iterate through a bunch of bands that we've received from the server, and it just inserts them one by one, calling db.execSQL and passing one record at a time, without a transaction wrapping the whole thing. This is really slow, and we'll get to why in a little bit.
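The original slide code isn't in the transcript, so here's a hedged reconstruction of the slow version he's describing:

```kotlin
import android.database.sqlite.SQLiteDatabase

data class Band(val id: Long, val name: String, val location: String)

// One execSQL() per record, with no wrapping transaction, so every
// insert silently gets its own implicit transaction (see below).
fun insertBandsSlow(db: SQLiteDatabase, bands: List<Band>) {
    for (band in bands) {
        db.execSQL(
            "INSERT INTO bands (id, name, location) VALUES (?, ?, ?)",
            arrayOf<Any>(band.id, band.name, band.location)
        )
    }
}
```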
If you just add three more lines... And keep in mind, for the sake of brevity, this code isn't catching exceptions and failing the transaction, or anything like that. But it's still fairly simple. If you add these three lines, it becomes way faster.
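And the transaction-wrapped version, again as a sketch using the same hypothetical Band class, with the same caveat the talk gives about exception handling being omitted:

```kotlin
// The same loop plus the three extra lines; a real version would put
// setTransactionSuccessful()/endTransaction() in a try/finally.
fun insertBandsFast(db: SQLiteDatabase, bands: List<Band>) {
    db.beginTransaction()
    for (band in bands) {
        db.execSQL(
            "INSERT INTO bands (id, name, location) VALUES (?, ?, ?)",
            arrayOf<Any>(band.id, band.name, band.location)
        )
    }
    db.setTransactionSuccessful()
    db.endTransaction()
}
```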
Like, ridiculously a lot faster. And I charted this out. This chart is not super detailed. You can find more on my blog, and I'll show a link with that later. But the difference is huge. The dark bars that you see on the graph, they're where we didn't have a wrapping transaction,
so these bars on the left. And the bars on the right are where you do have a wrapping transaction. It's pretty easy to see that as the data gets big, you benefit even more from having that transaction wrapping. And... But so why is the difference so big? What's going on under the hood that is causing this disparity?
Well, from the documentation, no changes can be made to the database except within a transaction. Any command that changes the database will automatically start a transaction if one is not already in effect. Automatically started transactions are committed when the last query finishes.
So what this means is that without a wrapping transaction, all of our inserts are given their own transactions when they're executed. And a transaction is a pretty heavy thing, because it has to actually write the data to the disk and ensure that the data was written correctly. So it's a lot of extra steps for each query if you don't really need them.
So if you can reduce the number of transactions, you can cut out a lot of the weight of individual inserts. And all you have to do to do that is, like we said, add those three lines: you wrap it in db.beginTransaction(), mark the transaction successful with setTransactionSuccessful(), and then close it with endTransaction(). Another thing to keep in mind when inserting a lot of data into the database
is to consider using batch inserts. You can see in this query up here that I've got multiple values that I'm passing. I've got... So the Bands table has ID, name and location, and we've got several question marks wrapped in parentheses.
But there's some problems with this that you have to watch out for, just keep in mind. If you look at the source code for SQLite, you're limited to 999 question mark variables in a statement. So you might have to... Like, you might not be able to insert everything at once. You'll probably need to split things up and make sure you stay under that 999.
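Here's a sketch of a batched insert that respects that limit; 999 is the default in the SQLite builds the talk refers to, and newer builds raise it, so treat the constant as an assumption:

```kotlin
// Batches rows into multi-VALUES inserts while keeping the total number
// of `?` placeholders under the host-variable limit.
fun bulkInsertBands(db: SQLiteDatabase, bands: List<Band>) {
    val columnsPerRow = 3                    // id, name, location
    val rowsPerBatch = 999 / columnsPerRow   // 333 rows -> 999 placeholders
    db.beginTransaction()
    try {
        for (batch in bands.chunked(rowsPerBatch)) {
            val placeholders = batch.joinToString(",") { "(?, ?, ?)" }
            val args = batch.flatMap { listOf<Any>(it.id, it.name, it.location) }
            db.execSQL(
                "INSERT INTO bands (id, name, location) VALUES $placeholders",
                args.toTypedArray()
            )
        }
        db.setTransactionSuccessful()
    } finally {
        db.endTransaction()
    }
}
```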
Here's some more notes on bulk inserts. It's not a huge improvement like we saw when going from not wrapping with a transaction to wrapping with a transaction. But it's still pretty noticeable. You want to watch out not to exceed those 999 variables.
And on older versions of Android, the SQLite library will actually limit the number of records that you can insert at once, and this is because of something under the hood with how older SQLite would actually go about doing a bulk insert. So, again, more stuff to just watch out for.
And you can find out more about the experiments I did on insertions at my blog on Medium. There are actual numbers there, and you can download the application I used to run the benchmarks. Lastly, let's talk really briefly
about how SQLite databases being kept in single files can mean you can get really creative and inventive when you're syncing data from the cloud. So: Content-Type: application/x-sqlite3. Because SQLite databases are stored in a file, you might even want to consider serving them up from an API.
Here's a really simple retrofit service that defines one call and uses the accept header to ask for a SQLite database file as the response format. And even though I'm not going to go into how you generate the database on the server, I can tell you that I've done it with Python,
and it's fairly straightforward. You create a database, keep it in a temporary file, and then feed it over the wire as the response body. And then in a background thread, if you haven't already synced from the server, you can call the service, copy the data to a file, and then you can just use SQLiteDatabase.openOrCreateDatabase directly with the file.
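Here's a sketch of that flow; the endpoint, media type, and naming are assumptions for illustration, not Bandcamp's actual API:

```kotlin
import android.database.sqlite.SQLiteDatabase
import okhttp3.ResponseBody
import retrofit2.Call
import retrofit2.http.GET
import retrofit2.http.Headers
import java.io.File

interface CollectionService {
    // Ask the server for a SQLite database file as the response format.
    @Headers("Accept: application/x-sqlite3")
    @GET("collection.db")
    fun fetchCollection(): Call<ResponseBody>
}

// Run on a background thread: copy the body to a file, then open it.
fun syncAndOpen(service: CollectionService, dbFile: File): SQLiteDatabase {
    service.fetchCollection().execute().body()!!.byteStream().use { input ->
        dbFile.outputStream().use { output -> input.copyTo(output) }
    }
    return SQLiteDatabase.openOrCreateDatabase(dbFile, null)
}
```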
And then you can start running queries on it. So you've already synced: you've pulled all your data down, and you're accessing it just like anything else you would access with another SQLite database that you generated on the device. There are some caveats to this. Managing migrations and how you go about changing the schema
can be a little bit more complex. You need to make sure you keep the application schema version and the server's schema version in sync, and so you want to version the API as well. If you want to allow the user to change the data, you might want to consider adding a second database.
So you pull down one big database, and that will show everything. And then say you wanted them to be able to edit a certain field on that. It might be a better idea, in my mind, to have a second database that has the same structure, but you put the writes in that second database. And then when you show them on the UI, you'd select from both and then overlay the writes on top of the original version.
But also, that can be a little bit difficult, because you have to do that in code. It's not like MySQL, where if you have multiple databases in the same MySQL instance, you can do joins across databases. With SQLite, you're going to have to do that in code, because that's how SQLite works: they're separate actual databases, and there's no way to communicate between them directly, at least.
And also, this method of fetching and opening a database, it's not quite yet possible with Room, as far as I know. So, if you're a big fan of Room, keep that in mind. And wrapping up, always... I'm going to repeat this again. Always keep your biggest and best users in mind.
Make that the focus of what you're doing with... Keep that in mind when you're thinking about performance. And then get curious with Explain query plan and profile your code in general. So, hopefully, you've got a few extra tools in your toolbox
when it comes to looking at persistence and performance. And thanks to some of the things I talked about today, things like fast insertions, optimizing our queries, and even that cursor-based RecyclerView adapter for our main collection view, Max from Metal Bandcamp,
he's going to soon be able to actually use the app, which is going to be fantastic. Thanks, or... Thanks, Jason, for your presentation. Just as a reminder, at 4:45,
there's going to be Eric Cochran talking about upgrading to Moshi. At the moment, if anyone has questions for Jason, it would be nice if you could come to the microphone. Sure. We are developing map services for iOS and Android,
and we had a problem recently. We are fetching 100,000 roads from a database, an internal binary database, and we have implemented a system in which you can overwrite these road attributes. And we had the problem that, to check if a modification has been applied to any of the roads,
we had to check if the index of the road exists in the database. And each check took, not very much, about 15 microseconds. But if you count 100,000 times and multiply that by 15 microseconds, it was slower to check whether the modifications were made than to render the whole map.
So is there a solution to check very quickly if an index even exists in the database? I think that might involve some extra things that you'd probably need to do outside of the database, but we can talk about it a bit afterwards. Off the top of my head, I can't really think of anything right now. Anything else?
Cool. Thanks. Okay, so see you. See you after the coffee break.