We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Reporting on Rails - ActiveRecord and ROLAP Working Together

00:00

Formal Metadata

Title
Reporting on Rails - ActiveRecord and ROLAP Working Together
Title of Series
Part Number
64
Number of Parts
86
Author
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
It'll happen eventually. Someone will come down with a feature request for your app to "create dashboards and reporting on our data". So how do you go about doing it? What parts of your database should you start thinking about differently? What is "reporting" anyway? Is ActiveRecord enough to pull this off? Let's go on a journey through the world of Relational Online Analytical Processing (ROLAP) and see how this can apply to Rails. We'll also look at database considerations and finish with looking at a light DSL that works with ActiveRecord to help make your data dance.
35
Business reportingSoftware developerCustomer relationship managementMenu (computing)Charge carrierSystem callSeries (mathematics)BuildingSystem administratorClient (computing)Staff (military)Read-only memoryGeneric programmingArray data structureInformationLatent heatComputerUsabilityFrequencyMobile WebLine (geometry)Process (computing)Analytic setCalculationComa BerenicesPlanningRelational databaseFunction (mathematics)AerodynamicsFinitary relationTable (information)Standard deviationDatabaseHausdorff dimensionDigital filterMeasurementMetric systemHierarchyData modelLocal GroupAttribute grammarFinite-state machineSoftware developerPhysical systemAnalytic setInformationObject (grammar)Table (information)Set (mathematics)BitPoint (geometry)Filter <Stochastik>Standard ModelWordMultiplication signData warehouseRelational databaseProcess (computing)Term (mathematics)Line (geometry)Ocean currentHierarchyGraph (mathematics)Service (economics)Order (biology)NeuroinformatikBusiness reportingQuery languageType theoryDatabaseMathematical optimizationSystem callDimensional analysisSemiconductor memoryCharge carrierProgram slicingGroup actionArray data structureStandard deviationRow (database)Sound effectProduct (business)Mobile appEndliche ModelltheorieBasis <Mathematik>Key (cryptography)MathematicsClient (computing)ResultantLevel (video gaming)Lattice (order)Local ringNormal (geometry)Software frameworkHash functionCustomer relationship managementCubeForm (programming)System administratorRight angleTouch typingSequelEnterprise architectureSource codeAverageTheoryCASE <Informatik>DivisorCall centreStaff (military)Computer animation
Wireless LANData modelDrill commandsHausdorff dimensionAttribute grammarFinitary relationDimensional analysisHierarchyInformationTerm (mathematics)Business reportingDigital filterMeasurementTotal S.A.Maß <Mathematik>Database transactionNumberMaxima and minimaQuery languageFrequencyClient (computing)Metric systemRow (database)Floating pointLocal GroupComputer configurationSoftware design patternData warehouseKey (cryptography)Charge carrierAbelian categoryLine (geometry)Function (mathematics)Price indexTable (information)CubeRow (database)Standard deviationRelational databaseDimensional analysisGroup actionDifferent (Kate Ryan album)InformationCASE <Informatik>CodeHierarchyWireless LANMereologyProgram slicingPiExistenceUser interfacePie chartResultantCountingException handlingCharge carrierEndliche ModelltheorieData structureSubject indexingSummierbarkeitQuery languageMetric systemLevel (video gaming)Filter <Stochastik>Line (geometry)Order (biology)Regular graphDatabasePoint (geometry)Hydraulic jumpFlow separationFrequencyBusiness reportingNumberMobile WebTotal S.A.Disk read-and-write headMeasurementEquivalence relationTerm (mathematics)Functional (mathematics)ExtrapolationGraph coloringType theoryAverageConfiguration spaceState of matterCausalityService (economics)Context awarenessSequelLattice (group)Sound effectView (database)AxiomArithmetic meanPhysical systemLatent heatProduct (business)Special unitary groupWater vaporComputer animation
InformationLatent heatQuery languageLocal GroupMaxima and minimaTerm (mathematics)Metric systemControl flowComputer configurationSequelDigital filterDimensional analysisEndliche ModelltheorieDatabaseHierarchyHausdorff dimensionLine (geometry)Charge carrierFinitary relationDefault (computer science)Data modelDatabaseProper mapNumberLink (knot theory)Maxima and minimaInformationRelational databasePoint (geometry)Goodness of fitEndliche ModelltheorieTerm (mathematics)Dimensional analysisCustomer relationship managementAttribute grammarRow (database)Query languageWeightLine (geometry)Hash functionTable (information)Default (computer science)HierarchyGroup actionObject (grammar)SequelSound effectBitForm (programming)Software developerTheorySocial classPhysical systemoutputBusiness reportingOpen sourceMedical imagingCharge carrierQuicksortMultiplication signNormal (geometry)MereologyCountingUser interfaceWater vaporStandard deviationGraph (mathematics)LogicVideo gameFilter <Stochastik>Library (computing)Drop (liquid)Grass (card game)Limit (category theory)Entire functionGraph (mathematics)Complex (psychology)Mobile appConfiguration spaceMilitary baseRegular graphExistenceException handlingElectronic mailing listLatent heatOrder (biology)Fitness functionMathematicsComputer animation
Data modelDimensional analysisMetric systemMeasurementCharge carrierDigital filteroutputExplosionBuildingLine (geometry)Total S.A.SummierbarkeitLocal GroupLattice (order)Reading (process)DatabasePrice indexCache (computing)Key (cryptography)Table (information)Query languageMathematical optimizationFlow separationBusiness reportingSlide ruleTwitteroutputWritingDatabaseFilter <Stochastik>TwitterDimensional analysisBusiness reportingPlug-in (computing)Replication (computing)Client (computing)Cache (computing)SynchronizationLine (geometry)Query languageInformation2 (number)Charge carrierGroup actionSequelRule of inferenceCountingPoint (geometry)Hydraulic jumpMobile appFlow separationReading (process)Row (database)Object (grammar)Service (economics)Hash functionMetric systemResultantVector potentialPiSubject indexingIdentifiabilityEndliche ModelltheorieDefault (computer science)Proper mapParameter (computer programming)Power (physics)MeasurementMereologyLambda calculusForm (programming)Interface (computing)Table (information)Mathematical optimizationProcess (computing)Game controllerPie chartSoftware repositoryMotion captureSlide ruleSign (mathematics)Cartesian coordinate systemConnected spaceValidity (statistics)Correlation and dependenceTotal S.A.BitRead-only memoryAreaStability theoryEmailTendonNumberGenderRight angleFunction (mathematics)Computer animation
Coma BerenicesJSONXML
Transcript: English(auto-generated)
All right, let's get this party started. So this is reporting on Rails, active record,
and our OLAP working together. My name is Tony. I'm senior developer at Moby in Indiana. We manage the corporate cell phone accounts for Fortune 500 companies, everywhere from bill optimization to device procurement, including tier one tech support. I work specifically on the billing and reporting team
where I basically work and massage with the one million plus devices that we have under management plus all of the billing data that goes back several years. What I essentially do is take information from various sources, carrier billing, tickets from our call center, orders placed from carriers,
shove them into Postgres, sprinkle some magic on top, and out pops pretty charts and graphs that our clients love. Yeah. So to better explain what reporting is,
the best I can do is use my current experience with Moby. Way back in time around 2012, ownership came to the dev team and uttered the dreaded D word. They wanted a bunch of dashboards to report on all the data that we have. Not just one dashboard, dashboards
for our client administrators, dashboards for our internal support staff, and dashboards for ownership to actually see how the company is doing. And oh yeah, they want user defined filtering. You can slice and dice by just about any point of data that you want and various other bits of scope creep that came up over the years. So where do we begin on this?
Well, a couple of notes that the dev team laid out in front that we don't want to be bloat in our result set. That means we don't want actual active record objects coming back from the database. Active record is kind of large, and we don't really need all that data that's coming back, which means less memory.
What we really want is generic uniform data coming back from whatever system we build. This means a plain old bunch of arrays with rows of hashes with the information that we want. That way we can take that, call to JSON on it, shove it into the flavor of the month's JavaScript front
and framework for charts, and we'll be on our way. So what is reporting anyway? Well, I'm going to assume most of us are apps run against a relational database. A relational database holds data. Data is completely worthless to you and me. Humans can't really work with data.
Computers work with data. What humans can work with is information. What reporting and analytics does on a high level is take data, convert it to information so humans can actually make decisions out of it. And more importantly, reporting answers questions. And this is probably the most important step in preparing your app to generate dashboards is to figure
out up front what questions you want to answer. That means going to all the primary users and ask what do you want to know? What information do you need? Because if you don't get that and you just start randomly throwing queries against Postgres or MySQL, you'll be throwing spaghetti at the wall
and you probably won't end up with a result or a product that people really can use because it doesn't answer the questions that they care about. So here's some examples of questions for each of the three stakeholder groups for MOBI reporting. And we want to answer all of these plus more effectively.
So where do we go about doing that? Well, fortunately, there's an industry standard term called OLAP, online analytical processing. That is built for data warehousing and analytics. Now, commonly with the more traditional OLAP products,
everything is rolled up into memory in the form of data cubes which all the information is pre-sliced and diced, pre-grouped together, everything up front so you can run queries on it very fast. And OLAP commonly deals with aggregates, counts, max, mins, averages.
You don't really deal commonly with the individual rows from the data. You care about the grand picture. However, OLAP, you commonly use OLAP with TACON to Oracle products, MySQL products. There's a bunch of enterprise-y stuff. And when you think enterprise, you think money
and we didn't have money. So there's got to be a better way. Are OLAP to the rescue? This is relational online analytical processing. This is OLAP that runs with SQL, which is what our normal database talks with. It also allows for dynamic queries
to be generated on the fly. So we can get any information we want out very quickly with just some setup. And what's nice about our OLAP as well is that we can work with both our historical data, which is all the billing information, and our transactional data, which is the support tickets
that come through to our local support center. The ever-changing lines of service that change on a daily basis. That's transactional. It changes all the time. So our OLAP can work with both, while OLAP is more designed for stuff that happens, historical information, stuff that happens in the past.
Once it's set, you don't touch it anymore. So like anything enterprise-y, there is a crap-ton of terminology that comes with it. Now, we're going to go through all of these. And what's nice about OLAP is that you can relate any OLAP terminology to SQL. And you can also relate it to something in Rails.
So we're going to use that as examples to build up the vocabulary. Because when you work with reporting, you want to think in OLAP terms, not just simple SQL. The first term is a fact model, also known as a fact table. This is the starting point to get information
out of your, out of the database. In SQL land, this is the from clause. This is the primary table that has the information that you want. In Rails, this is a standard model. So by looking at questions that our users want to answer, we can extrapolate out pretty easily what a fact model is. In this case, the support tickets table
and our lines table. A dimension is a way to take your data and slice and dice it into various chunks. Commonly, these are relations to other tables in the database. So foreign keys linking to other tables.
It can also be columns that live directly on the fact model as well. In Moby land, we have cost center living on our lines table. You can group by cost center to get a report out of that. Or a line of service has a carrier. We have a carriers table. You can link to that to get information out that way.
So a dimension in SQL land is always a group by. And if you have a, if you want to jump to another table, it's also a join. In Rails, this is a standard column or a has one or belongs to relationship. So when you look at questions you want to answer for reporting, I always look for the word by.
I want something, I want to sum up something by something. So in this case, we have support tickets by type. So we're going to group by type. We have active lines of service grouped by carrier. So we're going to join against the carriers table and group by carrier. Next up is a dimension hierarchy.
This is a way to go up and down a hierarchy of information of your dimension. The most common example is dates. So when you think about I want a group of stuff, a group of orders grouped by date place. You just group by the day. But I want to go up a level and I say once,
I want everything grouped by the past five months. Everything from the quarter. Everything from the year. And the hierarchy basically is a structure you build to go up and down from more general groups of data to more specific groups. In Mobyland, we have devices. A device has a model number, a manufacturer,
an operating system, and wireless technology. That's another example of hierarchy where you can go up and down your groups of data. A dimension member, or I prefer calling them dimension labels, is actual information that when you look at a result of a report that you can work with.
So lines has a carrier ID on it. We could easily group by that. But when you look at it in a table or a pie chart, a human can't make sense of that. So instead, a label that you use and shove into the pie chart is the, in that case, the name of the carrier.
If the dimension lives on the actual table, so for example, cost center, the cost center is the same thing you group by and also the label. Next up is filters. This is not really an OLAPI term. This is because OLAP is commonly set up for once you pre-build
and slice and dice your data, you don't really have much maneuverability unless you want to build a completely different data cube to group by or filter by something else. Because we're working with our OLAP, we can use the where clause in SQL to further shrink the data set down or get more specific information.
So in SQL, your filter is your where clause. In Rails, this is your, the where method, standard active record scopes, or if you use the RANSAC gym, that results in a where clause as well, which can come in pretty handy. A measure in OLAP terms is basically the aggregate,
your average sum, max, min, count, pretty much any aggregate based function that your database can provide. And what you commonly do is the measure is also the column that you plug into the function. So I want the sum of the total charges for an order.
That's the measure. It's the sum and the column total charges. Count is obviously the exception because you don't really count on a column. You most commonly use count star in SQL land. Then finally we have the metric.
This is the report. The metric is just a fancy way of asking a, is the end result of the question that you want. So in SQL, this is the whole query. This whole damn thing. Rails, it's all of active record shoved together and executed to the database. So using our examples, the entire question can be the
metric or part of the question can also be a metric. What you can do with our OLAP is start with a very simple and specific metric that you want to ask and then tack on dimensions and more filters later. So you can think of the concept of having a bank of pre-built simple metrics that through maybe a user
interface or just through your code through configuration, you can start tacking on more stuff. So you have a base case and you can expand that out however you want. So I know that's a lot. Here's everything shoved together. Hopefully the colors stand out. But we have a complete question.
Whole thing in this case is the metric. You have the sum for your measure. Mobile charges would be, we'll extrapolate the table out of that, that's your fact model or fact table. Last billing period is your filter and your grouping by cost center, so that is your dimension. And a metric can have as many filters and as many dimensions as you want.
Just keep in mind, the more you tack on, obviously the more complex your query is. But this is essentially all of our OLAP in a nutshell with the SQL equivalent and the Rails equivalent. So that's pretty much the industry level of it. How do you go about implementing it?
Well, first your data has to actually be organized in a way that is conducive to reporting. The most common way is called star schema. There's another set up called snowflake schema which is basically star schema plus plus. I prefer star schema because it's simpler, it's more direct and it's kind of easier to visualize in your head.
The idea is if you can take all your tables and chart them out on a graph or link them together, you have fact models or fact tables in the center of it. And branching off of that is every dimension that you could possibly run against the fact table. And again, a dimension can be a simple relation
or it can be a column on the fact table. But the end result is when you map them all together, it looks like a star. Now a little gotcha and this is pretty much set up for, you know, standard SQL is that it's really hard to report on has-many relationships. It is possible.
You can throw the magical distinct keyword in the front of your query. But if you're on Postgres specifically, that can easily result in invalid SQL just because of the way Postgres works. The reason why it doesn't work with has-minis very well is because when you are aggregating and joining, you effectively get multiple rows back, duplicate rows back
and then you're summing against those duplicate rows and then your numbers are off. So avoid wanting to report on has-many relationships as much as possible. The other way around this is to use subqueries which is hella slow and I don't recommend that. So using Moby as an example, here's three fact tables
that we have identified, support ticket, a line of service, and a row on a bill in various dimensions that they can dimension off of. And again, some of these are actual relations to other tables and some of them are actual columns on the database, on the fact model. Notice in line, wow, that actually almost works.
We have a carrier and a carrier account. In Moby, a carrier account has a carrier. So why not just, why is carrier a dimension on line and why does line have a separate relation for that? Well, you want to avoid doing multiple jumps as much as possible with our OLAP. The more joins you do, the slower the result will be.
So what we do in Moby is denormalize a lot of stuff that we want to group by and put them on directly on the fact model. Now we can get around, we could get around this with a has-one-through and effectively have a carrier relationship directly on line. The SQL can be generated just fine, but again, we're resulting in a double join at that point.
Also, something interesting to know is create a date and the build date for these two fact tables aren't actual date columns. They're actually separate tables to another relation. This is called a date dimension. And the idea behind this is, especially for like a example,
a warehouse of sales information, you want information broken down by year, by quarter, by weekday, it's hard to do that with databases, with a common, with a regular date column and have it fast. You can tell MySQL and Postgres take this date column and give me the weekday out and group by that,
but it has to be done on the fly. And you can't, and it doesn't use the standard index if you just slap it on the date column. You can make standardized indexes for that, but again, Postgres still has to, Postgres and MySQL have to calculate those values on the fly. Instead, you have a separate table with a row for every day
that you effectively care about. So in case of Moby, it's beginning of Moby's existence to 15 years from now, just to cover all of our bases. And instead of having a date column on support tickets, we actually have a created at ID that links to a date dimension. And so what we can do then is say,
give me all the support tickets broken down by quarter. So what we do is then we join against the date dimensions table, group by the quarter column, that's the label, and now we can easily report on that. This also allows you to, since each row in the date dimension already has the various parts of the date broken down,
you effectively have a very complex hierarchy that you can go up and down the data as you see fit. So great. ActiveRecord can do all that, right? Why is Tony up here with his free ticket to RailsConf? Well, it is true that ActiveRecord does have,
does provide all the information needed to actually construct our OLAP queries. It can do a join, it can do a grouping, you can ask it for all of the, you can reflect on all the relations. You can ask for all the attributes on the model. And you can select out very specific columns using the pluck method.
However, it does have some limitations. There is really no good way to group by the non-aggregate columns programmatically without manually putting this in. This is a specific gotcha for Postgres. MySQL doesn't have this problem, you can cheat, but with Postgres, if you have an aggregate
in your select clause in the non-aggregate columns, you must include those columns in the group by, otherwise it considers it invalid SQL, because it technically is. And so you have to make sure you balance the select and the group by, and ActiveRecord can't really do that.
In fact, it's mostly built for the count method, the maximum method, the minimum method. You just plug in one column, and that is the number you get back. You don't get the grouping, you don't get the nice dimension label with that. And there's also not a good way to just have your models be described in our OLAPI terms.
Like yes, you have a has one, you have a has belongs two, but there's no way to actually say that these are dimensions. You can't actually just go out and list all of them without iterating through every possible relation in the table, plus the standard attributes. And there's no really good way to just store pre-made queries very well.
Yes, ActiveRecord for and later does lazy loading, so you can start tacking on a select, you can override the from, you can put on a where, and it won't execute it until you actually need to iterate over it. But there's no way to just grab something real quick and then start tacking on stuff pretty quickly.
So what could we do? Well, we could hard code all our queries. That's great, except good luck trying to define custom where clauses and custom joins, because that was part of the requirements we had. We could build a courier ourselves, which is actually what we ended up doing.
However, for commonly I would think a normal Rails developer would just start tacking on extra methods to ActiveRecord to say, you know, give me all the dimensions, and effectively dirty up the entire class at that point, which we really don't want. Or it could switch to a SQL. There's a, that's a great gem, it's a good replacement
for ActiveRecord, there's a much better way to build and define very complex queries on the fly compared to ActiveRecord. However, that ship has sailed quite a long time ago, and I don't think management was really up for us to rewrite the entire app. So we had to do another way.
And so what we did is we defined our own library for reporting. What I did was recently I extracted out most of the non-Mobi logic, sort of clean-roomed a lot of the stuff and implemented it in a open-source gem.
I call it active reporting because I'm terrible at naming. But basically this provides a DSL-like system so you can tell your app various stuff in our OLAPI terms. It's a very lightweight DSL. And what it does, it uses ActiveRecord
and it asks ActiveRecord for bits of information about the database and about all the tables, tells it how to build the query for you, and it just executes it directly on the database. And instead of ActiveRecord objects coming back, you just get an array of hashes at that point. Very simple, lightweight, and small data set back
with the information that you actually want. And it actually, and it doesn't really dirty up ActiveRecord too much. I think it adds one method, maybe two at this point. I would say it's mostly production-ready. Video games got in the way for me to build a demo app. But 0.1.1 is out.
DSL's pretty much in a good spot. Documentation is pretty much what I need at this point. But this is effectively how it works. For every model you have in your app, you have a fact model to go with it. If you want it to be reported on effectively.
I'm calling this a fact model instead of a fact table because we're modeling how it's used within our OLAP. And because Rails is convention over configuration, the idea is you have your ActiveRecord model name, fact model as your entire class, and it'll just node a link to the proper model.
And there's obviously a way to override it if you're so inclined. But the idea is all the reporting stuff gets shoved into these classes and not your regular models at that point. So with your fact model, you then define the dimensions that you want to work with. Now why are we whitelisting all this stuff instead
of just saying, hey, ActiveRecord model, give me all of your relations, anything that has one or belongs to we can mention by, anything that's an attribute that we could probably group by, let's just use that. Well, what if we want user input, a user interface where we can change the charts and graphs on the fly.
Instead of in Moby I want all of my lines grouped by carrier, I want them instead grouped by carrier account. And maybe we'll have a dropdown saying I want to change this report altogether, change the dimension. So what we can do then is each fact model you can ask it for what dimensions it can work with.
And the gem knows if it needs to join against another table because it's a relation or if it's an actual attribute, just group by that. You can also define the hierarchy in the actual default dimension label. So if a dimension is a relation, the gem will assume
that the default label will be a name. So line joins to carrier, carrier table has a name, we use that as the label or we can override that if we see fit. And then we can also define the hierarchy which effectively makes more dimensions. And so we can have a nice line graph of orders over time, and I want to see it by date, now I want it by month,
now I want it by year, now I want it by quarter. So this allows you to set up a hierarchy to drill up and drill down your data as needed. Dimension filters, again these are just where clauses and the fact model can then whitelist stuff that you can filter by. And again, why are we whitelisting this stuff?
Because this can be possible user input. Scopes on a active record model are just glorified class methods, what's also a class method, delete all and destroy all. So we don't want to just blindly allow any input being coming in from a form to call methods that are not really safe.
So instead we whitelist on our fact models what a user or whoever is building the report can actually filter stuff by. And this can be done by just listing out pre-built scopes from the model. So it'll just whitelist those. You can define your own dimension filter using the same scope DSL, so throwing in a lambda with an input if you're so inclined.
This allows you to not have to tack on all your known filters to your model. If you don't need to use them in the rest of the app, you can throw them just all in the reporting. So that keeps your models slimmer. Or if you happen to have the Ransack gem loaded up, you can whitelist various Ransack calls as well. The other benefit of to specifying dimension filters manually
is now you have effectively full control or mostly full control over what the where clauses will be in the report. Active record isn't always the smartest in building optimal sequel on the fly. So if you can control that to maybe force using specific
indexes or, you know, maybe force a union instead of an or, you can do that. So that's the setup. Here's actually, here's the actual execution. The gem has a concept of a metric. And again, a metric is the question you want to answer.
You build a metric by giving it a name as the first argument. You tell it the fact model you want the metric to be based off of. Then you can pass in dimensions, filters, you can set the aggregate, it defaults to sum. You can set the measure of the actual column that you want to sum, max, min on.
It defaults to value. I think you can override that as well. But this builds an object that holds the question and all the information that it needs to reach out to the fact models, to then reach out to active record to get all the information and build the query that you actually want to run. And then finally, you shove it into a report object.
A report is effectively just a glorified courier that takes the metric and says build the sequel, then active record base, dot connection, dot execute, sequel go, yay, here's your very basic result set back. Now why are these separate objects? Well, as I mentioned in a previous slide,
a metric can be a very simple question you want to answer and then tack on more stuff. The active reporting report object will allow you to take a metric and then merge in user input from the interface to say I want to tack on, you know, the carrier dimension, or no,
I want to change it to something else. Here's my form of all my filters for these reports. Take that hash, shove it in there. It'll go through the wireless and apply the where clause dynamically at that part. This is the power of our OLAP, again, because you can define a where clause. You can define pretty much anything on the fly as long
as it'll result in proper sequel and get data back on it. So we built those two objects. This is the resulting sequel. You have the select clause is very specific to what you want. We are summing on the total charges column. The gem will give you whatever the metric name is
as the aggregate result column. We are dimensioning on carrier. So we are going to then grab the carrier label, which is the name. The gem has a nicety that you can turn off where you can get the identifier back.
So if you want to build filters on the fly, embed the identifier column of the dimension in a pie graph, you click the pie graph and then your filters magically update to have sales impress potential clients, which they never use the feature later anyway. But anyway, we build us the clause from the fact table.
We then have to join on our dimension. We apply our dimension filter, our where clause, and then we finally do the group by because we're in Postgres and we have to be valid query. And the end result for that, we called out run on it
and we get back an array of simple hashes. Then we call to JSON on it or whatever. We can massage it later with another service object if we so want to and then spit out a pretty chart, a table, a large number, whatever we want to do. And that's effectively all the gem does because, again,
reporting isn't about getting a table back of rows. We just want back aggregates of actual information that we care about. And so finally, just some pro tips in general for databases if you want to do reporting
on them or any form of way of getting information out. As mentioned before, try to avoid double jumps as much as possible with your queries. Sometimes denormalizing is a valid solution. It's much easier in Moby to ask, give me all my active lines by carrier because carrier ID is directly on the table.
We don't need to do a double jump at that point. And we just keep the carrier count and carrier in sync from a very simple, an active record callback or you can even just use a database trigger if you're so inclined. You can also cheat around some has-minis by implementing counter caches,
both the built-in counter cache plug-in or the, or just manually generated pre-built counts through background jobs or whatever processes you want. That way you have some data pre-built and pre-setup for you. So you can easily aggregate against that. Also index wisely, if you missed the previous talk about some database optimizations
with indexes before this, look that up. It was pretty informative about when to index, when not to index, and current gotchas with that. But the common rule is if you're going to dimension by something, that's a foreign key, you might as well index that, would probably help.
If you're indexing, if you have common filters that you're going to be filtering by a lot, like I said, you can, you whitelist the filters that you allow users to actually filter by. You can then use that to determine, I'm probably going to need to index these columns or these groups of columns. And use explain analyze as much as possible.
This query is taking 50 seconds. Why is it taking 50 seconds? Well, we have tools that our database provides that tells you exactly what the database is doing. And oh, I missed an index. Now it's suddenly half a second. Yay. So use the tools handy to optimize queries because this is still SQL.
It's still a regular database. All we're doing is just dynamically building a query to run. Also as you grow and you start to outgrow, you know, you go from a small app to a medium app to a large app. Look into read only replication slaves for databases. Anything reporting related, have it hit the read only because you're not doing writes.
And your master can take the day-to-day operations at that point. Or if you're in Postgres, look into sharding or even schema separation if you're a multi-tenant app. That way you have physically less data for the database to work with per client. So you only have, you only have slow queries for a giant, for your biggest clients.
And your rest of the clients that are smaller don't really have to take a hit. And you can focus on optimizing that one client as opposed to having to take a bunch of unhappy users at that point. So that's about all the rambling I have. Jim's on GitHub and released on Ruby Jim's as well.
Copy the slides if you're so inclined. I have a GitHub repo called show and tell. That's where I put all the talks I've been doing. I don't Twitter much. I follow people. But if you want to. I'm on Twitter. Questions, comments, hate mail, death threats, anybody?
Okay. We're done early.