
Data Warehouses and Multi-Dimensional Data Analysis

Speech Transcript
Hi, I'm Raimonds Simanovskis and I come from far, far away, from Latvia. "Where is Latvia?" That was the question asked by many Americans three years ago during the London Olympic Games, when the US beach volleyball team was knocked out by Latvians. There were many questions on Twitter: where is Latvia, and do they even have beaches? People thought we had just vampires and castles and stuff like that. So I wanted to start with a short geography lesson. First, according to some movies, vampires live here, but Latvia is actually located across the Atlantic Ocean, in Northern or Eastern Europe, as you see there. And according to some other movies the vampire stuff originated in Transylvania, but that's more to the south of Europe, so that's not us. We do have a 500 kilometre long beach (if you didn't know, that's 310.686 miles), so we have a lot of beaches, and that's why we were that good when we played beach volleyball.
Now back to our topic: data warehouses and multi-dimensional data analysis. Imagine we are building a Rails application which will track product sales to our customers. We have several models in our Rails application: customers, which have many orders; each order is placed on a particular date and can contain several order items; each order item contains the price and quantity of the product that was bought; and products also belong to product classes. So this is our simple Rails application, and as we heard in another talk that we should use PostgreSQL, we designed and stored everything in Postgres. This is our database schema, with customers, orders, order_items, products and product_classes tables.

So we are proud Rails developers happily working on our application, and then one day our CEO comes and asks us a question: what were the total sales amounts in California in Q1 last year, by product families? OK, we will find it out. Let's look at our database schema: where do we store amounts? We have an order_items table with an amount column, so we should probably start with that one. And as we like Rails conventions, we will write everything in Ruby, so we start with OrderItem.sum(:amount). The next question is "in California": where do we have this geography? We have it in the customers table, therefore we need to join the order_items table to customers and add the condition that the customer's country is USA and the state is California. Then "in Q1 2014": where do we have this time information? It's the order date in the orders table, so we join the orders table and add conditions. We could translate this condition to "date is between the 1st of January and the 31st of March", but we would like to stick to the original criteria, so we extract the year from the date and extract the quarter from the date (we use Postgres-specific functions for that) and check that it is 2014, first quarter. Finally, we need to group by product families, which means that now we also need to join products and product_classes, group by product family and get the sum of the amount. So we finally got the answer, but it is probably not the shortest Rails query ever. We can take a look at it.
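A minimal sketch of what this chained ActiveRecord query might look like, assuming the associations from the models above; the order_date, country, state and product_family column names (and the 'CA' value) are assumptions about the schema:

```ruby
# Sketch: total sales amount in California in Q1 2014 by product family.
# Uses Postgres-specific EXTRACT functions, as described in the talk.
OrderItem.
  joins(order: :customer).
  joins(product: :product_class).
  where(customers: { country: 'USA', state: 'CA' }).
  where("EXTRACT(YEAR FROM orders.order_date) = ? AND " \
        "EXTRACT(QUARTER FROM orders.order_date) = ?", 2014, 1).
  group("product_classes.product_family").
  sum(:amount)
# => { "Food" => 12345.67, "Drink" => 2345.89, ... }
```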
This is what we wrote in Ruby, and this is the generated SQL; in Ruby we wrote it a little bit shorter, but not by much, and we could also have written it directly in SQL. We present the result to our CEO, and then he asks the next question: and the sales cost? Well, we could write a separate query, but that wouldn't perform as well, so we modify our query. Unfortunately, with ActiveRecord relations we can't take the sum of several columns at once, so we need to write some tricky stuff: explicitly select the product families, the sum of the sales amount and the sum of the sales cost, and then map the result attributes. Then the CEO continues to ask questions: and the unique customers count? OK, we can also add a distinct count of the customer IDs and return that as well. But we start to worry: these are ad-hoc questions, and every 15 minutes he will call us and we will have to write a new query. It would be better if we could somehow teach users to write these queries by themselves.
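A hedged sketch of that multi-column aggregation, selecting explicit SQL aggregates because ActiveRecord's sum handles only one column; the cost column and the alias names are assumptions:

```ruby
# Sketch: several aggregates in one query, read back as mapped attributes.
results = OrderItem.
  joins(order: :customer).
  joins(product: :product_class).
  where(customers: { country: 'USA', state: 'CA' }).
  where("EXTRACT(YEAR FROM orders.order_date) = ? AND " \
        "EXTRACT(QUARTER FROM orders.order_date) = ?", 2014, 1).
  group("product_classes.product_family").
  select("product_classes.product_family",
         "SUM(order_items.amount) AS sales_amount",
         "SUM(order_items.cost)   AS sales_cost",
         "COUNT(DISTINCT customers.id) AS customers_count")

results.map { |r| [r.product_family, r.sales_amount, r.sales_cost, r.customers_count] }
```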
We once tried it: we explained how easy it is to write everything in the Rails console and get the result. Unfortunately, business users didn't understand that, so something was not quite right with this approach as well. Meanwhile, the business is doing pretty well, the amount of orders and order items is growing, and we notice that when we need to do aggregated queries on large data volumes it becomes slow. For example, we copied some production data to a local computer and got some 6 million rows in the order_items table, and when we had no conditions and just wanted to aggregate the sales amount, the sales cost and the number of unique customers, it took 25 seconds, which is not quite good enough for ad-hoc queries. So we ask some consultants what to do, and some consultants come and tell us that SQL is bad, that we should do NoSQL or introduce some Hadoop cluster and write MapReduce jobs, which will cover everything we need. But probably there are also those of us who still like SQL, and there is a better solution than that. Let's return to some classics.
Already 20 years ago the first edition of this book was written: The Data Warehouse Toolkit by Ralph Kimball. I would definitely recommend it to anyone here, as it is a solid foundation, and one of the topics of this book is dimensional modeling. What are the main objectives of dimensional modeling? I'm quoting this book: we need to deliver data that is understandable and usable to the business users, and we need to deliver fast query performance. So how do we do this dimensional modeling?

When doing dimensional modeling, we need to identify the terms that we see in these business questions, these analytical questions, and model our data structures based on them. Let's look again at this question: what were the total sales amounts in California in Q1 2014 by product families? The first thing we will always notice is that there are some so-called facts or measures: some numeric measure that we would like to aggregate. And then there are the dimensions by which we aggregate, which we can also identify in these questions: we have California, which is a kind of customer or region dimension; then we see some time dimension; and we see some product or product family dimension. So just by talking with our business users we can identify which are the facts and which are the dimensions that we are going to use, and these dimensional modeling techniques suggest that we model our so-called data warehouse accordingly.
In the data warehouse we store the same data, but organized according to the dimensions and facts that we see in these queries. The typical database schema that is used for this is the so-called star schema: most often we will see one table in the center and a lot of tables with foreign keys linked to this central table, so it looks like a star. These are the fact and dimension tables. Let's start from the center: this is the fact table, and we use the naming convention of an f_ prefix, so f_sales for the sales data. The fact table contains foreign keys to the other dimensions, like customer, product and time, and then the numeric measures we would like to analyze, like sales quantity, amount and cost. It is linked to the dimension tables, for which we use a d_ prefix naming convention. Here is the customers dimension, where we see all the customer attributes. Then there are some special dimensions like the time dimension: instead of extracting the year or quarter dynamically during our queries, we want to pre-calculate them, so for each date that appears in our sales facts we create a corresponding time dimension record with a time ID as well as the pre-calculated year, quarter and month, both as integers and as strings which can be presented to the users.
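A sketch of such a star schema as a Rails migration, assuming a separate dwh database schema and the f_/d_ prefix convention from the talk; the exact columns are illustrative, not the talk's literal schema:

```ruby
class CreateDwhStarSchema < ActiveRecord::Migration
  def up
    execute "CREATE SCHEMA IF NOT EXISTS dwh"

    # Dimension tables (d_ prefix) hold descriptive attributes.
    create_table "dwh.d_customers" do |t|
      t.string :country, :state, :city, :name
      t.string :gender
      t.date   :birth_date
    end

    create_table "dwh.d_products" do |t|
      t.string :product_family, :product_name
    end

    # time_id encodes the date as YYYYMMDD (see the ETL convention below).
    create_table "dwh.d_time", id: false do |t|
      t.integer :time_id
      t.date    :the_date
      t.integer :year, :quarter, :month
      t.string  :quarter_name, :month_name
    end

    # Fact table (f_ prefix): foreign keys to dimensions plus numeric measures.
    create_table "dwh.f_sales", id: false do |t|
      t.integer :customer_id, :product_id, :time_id
      t.decimal :sales_quantity, :sales_amount, :sales_cost
    end
  end

  def down
    execute "DROP SCHEMA dwh CASCADE"
  end
end
```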
Sometimes we don't have a single star schema; sometimes we have so-called snowflake schemas, where some dimensions, like customers or products in our case, are linked further to some classifier, class or category dimensions, like product classes in this case. If we have a lot of these, then our database schema looks something like a unique snowflake.

We will store these tables in a separate database schema, or it could even be a separate database if we want that for performance reasons. How would we manage it from our Rails application? We create corresponding Rails models on top of these fact and dimension tables, so we would have a sales fact, a customer dimension, a time dimension, a product dimension and so on. As these live in a separate database schema, we need to regularly populate this data warehouse schema with the data from our transactional tables. The simplest case would be to just regularly repopulate the whole schema: truncate the existing customers dimension table, for example, then select from our transactional schema and insert all the necessary fields into our dimension table.
The time dimension we need to generate dynamically: we select all the unique order dates that appear, calculate which year and quarter each belongs to, and store the calculated values in our time dimension table. Finally, we need to load all the facts: in this case we select the data from the orders and order_items tables, extract the sales quantity, sales amount and sales cost, and store the corresponding foreign key values to the dimension tables. One thing you can see here: to simplify the time dimension ID generation, we use the convention that the time ID is generated as four year digits, then two month digits and two day digits, so we always understand what time an ID refers to.
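A minimal sketch of this truncate-and-reload ETL, assuming the schema sketched above; the time_id follows the YYYYMMDD convention just described:

```ruby
conn = ActiveRecord::Base.connection

# Repopulate the customers dimension straight from the transactional table.
conn.execute "TRUNCATE dwh.d_customers"
conn.execute <<-SQL
  INSERT INTO dwh.d_customers (id, country, state, city, name, gender, birth_date)
  SELECT id, country, state, city, name, gender, birth_date FROM customers
SQL

# Generate the time dimension from the unique order dates.
conn.execute "TRUNCATE dwh.d_time"
conn.execute <<-SQL
  INSERT INTO dwh.d_time (time_id, the_date, year, quarter, month)
  SELECT DISTINCT
    EXTRACT(YEAR FROM order_date)::int * 10000 +
    EXTRACT(MONTH FROM order_date)::int * 100 +
    EXTRACT(DAY FROM order_date)::int,
    order_date,
    EXTRACT(YEAR FROM order_date)::int,
    EXTRACT(QUARTER FROM order_date)::int,
    EXTRACT(MONTH FROM order_date)::int
  FROM orders
SQL

# Load the facts with foreign keys to the dimension tables.
conn.execute "TRUNCATE dwh.f_sales"
conn.execute <<-SQL
  INSERT INTO dwh.f_sales (customer_id, product_id, time_id,
                           sales_quantity, sales_amount, sales_cost)
  SELECT o.customer_id, oi.product_id,
         EXTRACT(YEAR FROM o.order_date)::int * 10000 +
         EXTRACT(MONTH FROM o.order_date)::int * 100 +
         EXTRACT(DAY FROM o.order_date)::int,
         oi.quantity, oi.amount, oi.cost
  FROM orders o JOIN order_items oi ON oi.order_id = o.id
SQL
```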
If we now return to the original question, how would we answer it? All our queries become more standardized: we always start from the sales fact table, join the corresponding dimensions (customers, products, product classes, time), specify conditions on the dimension tables (we want just USA and California, year 2014, quarter 1), group by product families and take the sum. It probably isn't much shorter than the original query, but at least it is more standardized, and we always know how to approach these analytical queries.
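A sketch of that standardized star-schema query, here with product_family denormalized into d_products as in the migration sketch above rather than a separately snowflaked product classes table:

```ruby
# Sketch: start from the fact table, join the dimensions, filter and group
# on dimension attributes.
ActiveRecord::Base.connection.select_all <<-SQL
  SELECT p.product_family, SUM(f.sales_amount) AS sales_amount
  FROM dwh.f_sales f
  JOIN dwh.d_customers c ON f.customer_id = c.id
  JOIN dwh.d_products  p ON f.product_id  = p.id
  JOIN dwh.d_time      t ON f.time_id     = t.time_id
  WHERE c.country = 'USA' AND c.state = 'CA'
    AND t.year = 2014 AND t.quarter = 1
  GROUP BY p.product_family
SQL
```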
Still, we probably wouldn't teach our users to write these queries directly, and we are still limited by this two-dimensional table model: we store everything in standard two-dimensional tables. A much better abstraction for these analytical queries is the multi-dimensional data model. Imagine that we have a multi-dimensional data cube. We can easily picture three dimensions, but imagine a multi-dimensional cube with an arbitrary number of dimensions, where in the intersections of dimension values we store the measures which correspond to those particular dimension members. In our case, imagine we have a sales cube with customer, product and time dimensions, and at the intersection for each particular customer, product and time period we store the sales quantity, sales amount, sales cost and unique customers count.

Some dimensions might be just detailed lists of values, but other dimensions can have hierarchies with several hierarchy levels. For example, in the customers dimension, in addition to the detailed customers level, we have all customers together, which we can expand to individual countries, countries to states, states to cities, and cities to individual customers. In the case of the time dimension we could even have several hierarchies: sometimes we want to do the reporting by year, quarter, month and individual day, and sometimes we want to do weekly reporting, where the same dates are grouped together by weeks and then by the years they belong to.

There are special technologies that are better suited for this and that use this multi-dimensional data model. They are typically called OLAP technologies, where OLAP stands for On-Line Analytical Processing, as opposed to traditional OLTP systems, which do On-Line Transaction Processing. These OLAP technologies concentrate on how to do analytical queries efficiently. There are several commercial as well as open-source technologies for this, and one of the most popular open-source OLAP engines is the Mondrian engine by Pentaho. It is a Java library where you need to write XML to define the schemas. Well, we Rubyists don't like Java and XML so much, so a couple of years ago I created mondrian-olap, a JRuby gem which embeds the Mondrian OLAP Java engine and creates a nice Ruby DSL around it, so that you can use it from plain Ruby. So let's introduce mondrian-olap in our application.
The first thing we need to define is the Mondrian schema, where we map the dimensions and measures that our users will use, and which represent these business terms, to the fact and dimension tables and columns where the data are stored. Let's look at an example. We define a Sales cube; the Sales cube will use the fact table f_sales. Then we define the dimensions: we have a customer dimension with its foreign key, which uses the customers dimension table in the data warehouse schema, and we specify which levels we want to use in this dimension and in which particular columns they are stored. We define the product dimension and the time dimension as well. Finally, we also describe the measures that we will use in our schema, like sales quantity, sales amount and sales cost, which use the sum aggregator; then we have a customers count measure, which does a distinct count on the customer_id in our sales fact table to get the unique count of customers for a particular query, so there we use a different type of aggregator.
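A sketch of such a schema definition in the mondrian-olap Ruby DSL, following the style of the gem's README; the dwh table and column names are assumptions carried over from the earlier sketches:

```ruby
require "mondrian-olap"

schema = Mondrian::OLAP::Schema.define do
  cube 'Sales' do
    table 'f_sales', schema: 'dwh'
    dimension 'Customers', foreign_key: 'customer_id' do
      hierarchy has_all: true, all_member_name: 'All Customers', primary_key: 'id' do
        table 'd_customers', schema: 'dwh'
        level 'Country', column: 'country'
        level 'State',   column: 'state'
        level 'City',    column: 'city'
        level 'Name',    column: 'name'
      end
    end
    dimension 'Product', foreign_key: 'product_id' do
      hierarchy has_all: true, primary_key: 'id' do
        table 'd_products', schema: 'dwh'
        level 'Product Family', column: 'product_family'
        level 'Product Name',   column: 'product_name'
      end
    end
    dimension 'Time', foreign_key: 'time_id', type: 'TimeDimension' do
      hierarchy has_all: false, primary_key: 'time_id' do
        table 'd_time', schema: 'dwh'
        level 'Year',    column: 'year',    type: 'Numeric', level_type: 'TimeYears'
        level 'Quarter', column: 'quarter', type: 'Numeric', level_type: 'TimeQuarters'
        level 'Month',   column: 'month',   type: 'Numeric', level_type: 'TimeMonths'
      end
    end
    measure 'Sales Quantity', column: 'sales_quantity', aggregator: 'sum'
    measure 'Sales Amount',   column: 'sales_amount',   aggregator: 'sum'
    measure 'Sales Cost',     column: 'sales_cost',     aggregator: 'sum'
    # Different aggregator type for the unique customers count:
    measure 'Customers Count', column: 'customer_id', aggregator: 'distinct-count'
  end
end
```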
Now, when we look at the same question again, how could we get the results using mondrian-olap? It is very simple and nice; it is almost a direct translation of the question into our query. We say: from the Sales cube, on columns (as the column headings) we want the sales amount; on rows we want to put all product families, so we take all members from the product family level; and we put on the filter that we want just USA and California from the customers dimension and quarter Q1 of 2014 from the time dimension. And we get the result. There are no technical implementation details in the query; they are hidden, defined once in this Mondrian schema.
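A sketch of the connection and the query in the mondrian-olap query builder style; the connection parameters are placeholders, and the exact member names depend on the values stored in the dimension tables:

```ruby
olap = Mondrian::OLAP::Connection.create(
  driver:   'postgresql',
  host:     'localhost',
  database: 'sales_dwh',
  username: 'dwh_user',
  password: 'secret',
  schema:   schema
)

result = olap.from('Sales').
  columns('[Measures].[Sales Amount]').
  rows('[Product].[Product Family].Members').
  where('[Customers].[USA].[CA]', '[Time].[2014].[1]'). # year 2014, quarter 1
  execute

result.column_names # column headings
result.row_names    # row headings
result.values       # cell values
```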
The Mondrian engine, like several other OLAP tools, internally uses the MDX query language, which is one of the most popular query languages for OLAP tools and looks a little bit similar to SQL, but not quite. The mondrian-olap Ruby gem does the translation from this query builder syntax to the MDX query language, which is then executed, and as a result we get a result object which we can ask for the column headings, the row headings and the cell values.

There are several other benefits offered by the Mondrian engine. If we execute some large MDX query where we do not do any filtering (again, I tested it on some 6 million rows in the fact table), the initial large query will take some 21 seconds, but if we execute the same query a second time it is executed in 10 milliseconds, because the Mondrian engine caches the results in its multi-dimensional data cube model. It doesn't cache the queries themselves; it caches the actual cell results. When we do a new query, it analyzes: OK, we already have these data cube cells cached, but we don't have these ones, and for those it generates the corresponding SQL statements to populate the data. As in these analytical solutions we don't need up-to-the-second information, we typically just regularly repopulate our data warehouse schema with the data, and while it stays unchanged Mondrian can cache the results, so if many users ask the same thing, the results will come back very fast.

An additional benefit is that we can now much more easily introduce additional dimensions based on additional data attributes that we need. For example, in the customers table we had a gender column which stored 'F' or 'M' as the values for female or male, and we want to add an additional gender dimension to our schema. We can easily create a new gender dimension mapped to the gender column of the customers table; in addition, for the users we want to decode that F means Female and M means Male, so we can add a name expression which will be used for generating the names of the dimension members. Then we can use this dimension in the same way as any other in all queries.

We can go even further with dynamically calculated dimensions. For example, we have a birth date for all our customers and we would like to analyze the sales by customer age, split into several intervals, for example less than 20 years, 20 to 30 years, 30 to 40 and so on. As we have just the birth date, we need to calculate this dynamically, so we can define a new age interval dimension where we specify a more complex expression: an SQL expression which dynamically calculates the difference between the birth date and the current date and then, based on the interval, outputs either "less than 20 years", "20 to 30 years" and so on. So we dynamically generate a new dimension with these values, and whenever we run a query it will be up to date, based on the current time.
Finally, one more benefit of the Mondrian engine is that we can also define calculation formulas: calculated measures like profit, which is sales amount minus sales cost, or margin percentage, which is profit divided by sales amount, where we can specify a format string so that it uses percentage formatting. As a result we can query these calculated measures in the same way as stored measures and get the results back properly formatted. In these calculation formulas there is almost everything you can do in Excel; there are corresponding functions in MDX as well, so you can do a lot of more advanced calculations there.
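A sketch of what such calculated measures might look like in the mondrian-olap schema DSL; the formulas are MDX expressions, and the exact DSL keywords are my recollection of the gem's conventions, so treat them as assumptions:

```ruby
schema = Mondrian::OLAP::Schema.define do
  cube 'Sales' do
    # ... table, dimensions and stored measures as defined earlier ...
    calculated_member 'Profit',
      dimension: 'Measures',
      formula: '[Measures].[Sales Amount] - [Measures].[Sales Cost]',
      format_string: '#,##0.00'
    calculated_member 'Margin %',
      dimension: 'Measures',
      formula: '[Measures].[Profit] / [Measures].[Sales Amount]',
      format_string: '0.00%' # percentage formatting applied to the result
  end
end
```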
As a result, this data model allows us to create much better user interfaces for ad-hoc queries by users. We don't want users to always have to write these queries by themselves, and the objects they work with here are the same ones our customers use when asking their questions. This is just an example from eazyBI, a business intelligence application that we are building, where we provide a graphical user interface in which users can say: we want this dimension on columns, this dimension on rows, filter by these dimensions, and then they see the results in tables and charts. So this data model is much better suited for doing these ad-hoc queries.

OK, let's switch to a couple of other topics. We discussed how to do the queries, but now let's go back to the ETL process. We talked about the three-letter acronym SQL; let's talk about another three-letter acronym, ETL, which means extract, transform, load. In the simplest case, as we saw, we can populate our data warehouse just from the operational, transactional tables in our database, but quite often we need many different data sources for a data warehouse: some are stored in our transactional databases, some come from external sources as CSV files or from REST APIs. The process is: we extract the information from the sources; then we need to transform it properly, parse different data formats, maybe unify and standardize the data, get them to use the same primary and foreign keys, et cetera, which is the transformation step; and finally we load it into our data warehouse.

There are several Ruby tools for doing ETL. One was done by Square, and there is one new gem I want to mention, Kiba, which we could use for doing this ETL process and which is oriented towards row-based extraction, transformation and loading. In this example from the README, you can write some reusable methods that do some data parsing; then you define a source as a Ruby class (the source might be a database or something like that); then you can chain several transformations, describing in this DSL how you would like to transform the data; and finally you load the data into the destination.
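A sketch in the style of the Kiba README; CsvSource and SqlDestination are hypothetical classes you would implement yourself, since Kiba itself only provides the source/transform/destination DSL:

```ruby
require 'kiba'

# Reusable parsing helper, callable from the transform blocks below.
def parse_amount(value)
  value.to_f.round(2)
end

job = Kiba.parse do
  # Source: a Ruby class yielding rows (here, hypothetically, from a CSV file).
  source CsvSource, 'external_orders.csv'

  # Chained row-by-row transformations.
  transform do |row|
    row[:amount] = parse_amount(row[:amount])
    row
  end

  transform do |row|
    row[:country] = row[:country].to_s.upcase
    row
  end

  # Destination: load the transformed rows into the data warehouse.
  destination SqlDestination, table: 'dwh.f_sales'
end

Kiba.run(job)
```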
One more thing I wanted to mention: if you do complex transformations, then unfortunately Ruby is not the fastest programming language, and if you need to process hundreds of thousands or millions of rows it might be slow. If we still want to stick with Ruby, maybe we should do it in parallel, and I recommend taking a look, for example, at the concurrent-ruby gem, which provides several abstractions. One of them is very well suited for this: the thread pool. The idea of a thread pool is that we create a thread pool of fixed or varying size and push jobs to it; when a job completes, it gives back some results, which can then be processed by the next thread pool. This can suit an ETL process very well. We might have an extraction thread pool which fetches the data from external REST APIs: if it is, for example, a paginated REST API, it is much faster to fetch the pages in parallel instead of fetching page one, then the next one and so on; in terms of total clock time it is much faster to fetch the first 10 pages in parallel, then the next 10 pages. Then, if we need to do complex transformations of the data, we can do the transformations in parallel threads as well. But there is one caveat: only if you use JRuby can you use all your processor cores. If you try to do it in MRI, then unfortunately just one thread can run Ruby code at a time, or you need to start several processes which run in parallel.

Let's look at a very simple example. Initially we had a single-threaded ETL process where we selected the unique dates from orders and then inserted them into our time dimension table; now let's make it multi-threaded. In this example we first create a fixed thread pool with a default size of 4, then we select all the unique dates, and then we push each insert to this thread pool, and the threads do the insertion. Please note: if you are using multiple threads, always explicitly check out a connection from the ActiveRecord connection pool, because otherwise new database connections will be checked out automatically for the new threads, and if you do not give them back you will run out of database connections and the application will shut down.
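A minimal sketch of that multi-threaded time dimension load with concurrent-ruby, including the explicit connection checkout via with_connection; the table and column names are the assumptions used in the earlier sketches:

```ruby
require 'concurrent'

pool = Concurrent::FixedThreadPool.new(4)

dates = Order.distinct.pluck(:order_date)

dates.each do |date|
  pool.post do
    # Check out a connection and return it to the pool when the block ends,
    # so the threads don't exhaust the ActiveRecord connection pool.
    ActiveRecord::Base.connection_pool.with_connection do |conn|
      time_id = date.strftime('%Y%m%d').to_i # YYYYMMDD convention
      conn.execute <<-SQL
        INSERT INTO dwh.d_time (time_id, the_date, year, quarter, month)
        VALUES (#{time_id}, '#{date}', #{date.year},
                #{(date.month - 1) / 3 + 1}, #{date.month})
      SQL
    end
  end
end

pool.shutdown
pool.wait_for_termination
```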
In this same case, in some benchmarks run locally, I managed to cut the total clock time for loading the data in half. But please be aware that if you increase this thread pool size even more, you might start to get worse results, because in this case we are still inserting all the data into the same Postgres table, and Postgres might start to do some locking and slow down the process if we try to do insertions into the same table from too many parallel threads. So please do benchmark; if you use JRuby, there are good standard Java tools for that, like VisualVM and Java Mission Control. And by the way, you don't need to migrate your whole application to JRuby; you can use it just for your data warehouse project, where you populate the data and then do the queries.

Finally, I want to give a short overview of traditional versus analytical relational databases. Most of us, when working with SQL databases, think of the traditional databases which are optimized for transaction processing, like MySQL or PostgreSQL or Microsoft SQL Server or Oracle. They can deal with large tables, and they are optimized for doing many small transactions: inserting, updating and selecting small result sets. But as we saw, if we try to do aggregations over millions of records, they are not the best technology for that. There is a different set of SQL relational databases which are optimized for analytical processing. For example, one of the pioneers is the open-source database MonetDB; there are several commercial databases like HP Vertica or Infobright, which also have community editions that you can use up to some significant data size or with some limited features; and if you are using Amazon Web Services, Amazon provides Amazon Redshift, which is also a scalable SQL database optimized for analytical queries.

What is the main magic trick that these databases use for analytical queries? They mostly use a different data storage layout. If we look at the traditional databases, they mostly use row-based storage, which means that if we have a table and a row in that table, then physically all the columns of this row are stored together in the same file blocks. When we need, for example, to sum some numeric amount, as in our sum of the sales amount, the database needs to read practically the whole table, because it needs the sales amount from this row, and this row, and this row, and therefore it is slow. What most of these analytical databases do is use columnar storage: from the logical perspective we are still using tables with rows, but the physical storage is organized by columns. In the same example, all the values of one column are stored together, then all the values of the next column are stored together. The main benefit is that if we need to do a sum or a count of one column over all records, the values are all stored together and we can read them much more quickly. The other benefit is that, especially in these data warehouse sales fact tables, we also have a lot of repeating values, for example the foreign keys, or repeating classifier values if we store them directly. When they are all stored together, they can be compressed much more effectively, and therefore these analytical databases also achieve better compression of the data. The major drawback is that individual transactions are much slower when using columnar storage: if you insert rows one by one into these analytical databases, it will be much slower than in traditional transactional databases, and the same goes for updating rows one by one. With a columnar analytical database you typically prepare the data that you would like to have there and then do a bulk import of the whole table, or a bulk import of just the changes, which is much more efficient.

I also made a simple example on my local machine. As I said, I had generated this sales fact table with 6 million rows, and I wrote a query which just does the aggregation of all the sales amounts, sales costs and the distinct count of customer IDs over all 6 million rows, grouped by product families. When I ran it on PostgreSQL it took approximately 80 seconds on my local machine. Then, in a virtual machine, I installed HP Vertica, without doing any specific optimization there, and the first query I ran took about 9 seconds, because it just needed to load and cache the data in memory, but each repeated query took just 1.5 seconds on exactly the same amount of data. So I got about 10 times faster performance. In reality you probably won't get 10 times better performance all the time, but in some studies of real customer data, 3 to 5 times improvements on query speeds are quite often reported for these kinds of aggregation queries. I did the testing also on Amazon Redshift and got similar results on the same dataset.

So my very unsophisticated, non-scientific recommendation on when to consider this: if you have less than a million rows in your fact tables, then you probably won't see any big difference. If you get to 10 million, then complex queries will get slower in PostgreSQL or MySQL, and if it gets to 100 million, then you won't be able to manage these aggregation queries in realistic time. So when you already have 10 million or more records in your fact table, then for analytical queries you might need to consider the specialized analytical columnar databases.

To recap what we covered: problems with analytical queries using traditional approaches, dimensional modeling and the star schema, OLAP, ETL, and analytical columnar databases. Thank you very much for your attention. All these examples are posted on GitHub on my rsim profile; there is a sales demo application, so you can find there what I showed, and later my slides will be published as well. Thank you very much, and I have a couple of minutes for questions. Thank you.

Metadata

Formal Metadata

Title Data Warehouses and Multi-Dimensional Data Analysis
Series Title RailsConf 2015
Part 72
Number of Parts 94
Author Simanovskis, Raimonds
License CC Attribution - ShareAlike 3.0 Unported:
You may use, modify and reproduce the work or its content in modified or unmodified form for any legal and non-commercial purpose, and distribute and make it publicly available, provided you credit the author/rights holder in the manner they specify and pass on the work or content, including modified versions, only under the terms of this license.
DOI 10.5446/30655
Publisher Confreaks, LLC
Publication Year 2015
Language English

Content Metadata

Subject Area Computer Science
Abstract Typical Rails applications have database schemas that are designed for on-line transaction processing. But when the data volumes grow then they are not well suited for effective data analysis. You probably need a data warehouse and specialized data analysis tools for that. This presentation will cover * an introduction to a data warehouse and multi-dimensional schema design, * comparison of traditional and analytical databases, * extraction, transformation and load (ETL) of data, * On-Line Analytical Processing (OLAP) tools, Mondrian OLAP engine in particular and how to use it from Ruby.
