DWs in Practice (03.02.2011)
Formal Metadata
Part Number: 13
Number of Parts: 13
License: CC Attribution - NonCommercial 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/334 (DOI)
Production Year: 2011
Production Place: Braunschweig
Transcript: English (auto-generated)
00:01
Welcome, everyone, to the lecture on data warehousing and data mining. This is our last installment, so next week there will be no lecture. And for this last lecture I thought I would bring a little bit of practice into the theory.
00:21
And I have invited Toma Vohinske, who heads the German office of the Adastra company, one of the leading companies for data cleansing and data warehousing, working with a lot of big customers such as Volkswagen.
00:40
And I am very excited to hear a little bit about the practical implications of data warehousing, how it is actually used, and where the real problems in the field are. So welcome, Toma Vohinske. Okay, thank you very much. Am I speaking loudly enough?
01:01
Okay, so, a few words about me. I got my education at the Technical University of Sofia in Bulgaria; that was in 1994. How many years ago? I don't count anymore.
01:22
As for my experience: in the first years, as usual, I worked as a programmer and database developer; for the past 12 years mainly as a BI and data warehouse consultant, business analyst and, of course, project manager,
01:41
and for maybe the last five years I have been the local manager, the manager here for Germany. My experience is mainly in the banking sector and automotive, but also insurance, aircraft production here in Germany, and some other industries.
02:01
A few words about Adastra. Our focus is data management and information management. Under this terminology we understand, of course, data integration, data warehousing, business intelligence, and also master data management and data quality.
02:21
Of course, we also do application development. I see that on this slide some of the words are in German. Currently we have over 700 employees all over the world. Okay, the number maybe does not sound so impressive, but compared to the other big consulting companies,
02:43
and if you look only at this area, data management, even the other big names do not have more than 1,000 specialists worldwide in this area. Our headquarters is in Toronto, Canada; we have big offices in Eastern Europe, such as the Czech Republic and Slovakia,
03:02
which work mainly for local customers. We also have offices here in Germany and in Great Britain, and in Bulgaria there is our nearshore center. Here you see again the offices in the rectangles,
03:23
but also the countries where our consultants are doing business. Usually we are working for customers in the Western world, but also in the Eastern part of the world, where Western companies do business.
03:40
Within the Adastra group there are also other entities, like Ataccama, a pure software development company specializing in data quality and master data management. We have a business consulting part specializing in the banking sector,
04:03
credit scoring, credit risk issues, money laundering and fraud detection. And Reporters is a specialized entity for business performance management with strong expertise in Cognos and Cognos Planning. So I will start with a case study.
04:23
Actually, this is the main part, what I suppose you would like to hear. I heard that you know a lot about data warehousing and the database modeling techniques, like star and snowflake schemata, and that you know the reference architecture,
04:42
what the layers are in a typical data warehouse. If you have any questions or think that something is not clear, just ask, do not hesitate. So I will talk first about one very big data warehouse system at one of the biggest German banks.
05:04
This is a huge project, maybe one of the biggest data warehouses in Germany with the technology that is used; somebody says maybe it is even the biggest with these technologies worldwide. The project was initiated somewhere in 2000,
05:21
so in the year 2000, with the first development in 2002 and the first production releases in 2003. The business case is about credit risk, Basel II, and regulatory reporting. Within this bank, the user groups of the data warehouse
05:42
come from internal risk reporting and external risk reporting, the people who report to the regulatory institutions; in Germany this is the so-called BaFin or the central bank of Germany. There are different users from retail banking, different users from investment banking,
06:03
internal audit, of course, and external auditors. So this is first the initial architecture, and it is simplified. As usual for each data warehouse, there are a lot of source systems
06:23
delivering data, usually in different formats. The sources here are using different systems: some use relational databases, some are applications on the mainframe or newly developed Java-based and C-based applications.
06:42
But the first rule was that each source system delivers flat files. That means the data warehouse should not care whether the source system runs on Oracle or DB2 or a mainframe. There was a clearly defined standard for delivering flat files.
07:00
So there are two types of staging areas, because there were different requirements and challenges. For example, this part of the sources comes from the main banking unit; that means there is a big bank acting as a retail bank
07:21
in Germany, but this bank is also the head office for other legal entities, like the investment banking arm with the biggest part of its business in the UK, for example, or other smaller banks all over Europe. And what was interesting: for the central banking unit, the source systems deliver
07:41
just raw data without any calculated measures. That is why, within the data warehouse, there should be an engine calculating the first measures in the area of credit risk. On the other side, the other legal entities, because they are standalone entities that also have to report to their local central banks,
08:03
delivered the same type of information, but with already pre-calculated measures or facts. Typically, in a data warehouse, or at least in theory, all of the calculations are in the central data warehouse,
08:22
but in a very complex environment, when the business departments are huge departments and maybe there's one department only calculating one part of the risk measures, there is another department calculating
08:41
another part of the risk measures. These departments have their own IT people who are very close to the people who define the logic. For example, there is very complex logic for how to calculate the expected loss of one credit or how to calculate the risk-weighted assets.
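(To make this concrete: under Basel II the expected loss of a single credit is usually the probability of default times the loss given default times the exposure at default. A minimal SQL sketch, with invented table and column names, of how such an engine might express it:)

  -- Hypothetical sketch: expected loss per account, EL = PD * LGD * EAD
  SELECT account_id,
         pd * lgd * ead AS expected_loss
  FROM   credit_exposure;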
09:02
So the raw data comes from the sources, and sources means credit accounts, savings accounts, credit cards; from the investment bank come totally different types of business entities.
09:21
So maybe there are first calculations here within the data warehouse, and after that the data are sent to the business departments where they really make the complex calculations. They can calibrate this, do a lot of front-end analysis, and having calculated it correctly, push the results back to the data warehouse or push them to the next layer,
09:42
where further calculations are also done. So everything gets back to the central data warehouse, and of course, depending on the user group, there are different data marts. And what usually happens in a big bank: somebody in the top management
10:00
receives regular reports, one report from internal risk reporting and other reports from the totally independent department for external reporting, but they are supposed to report the same numbers. They report, for example, the total risk exposure for the bank, divided into private customers and business customers,
10:21
and suddenly these different reports come in. It was supposed that they should deliver the same measures, but the top managers say: the internal reporting reports 10 billion euros, the other reports 12 billion euros, where is the truth?
10:40
And the reality was that all these different departments had their own data warehouses and calculated almost the same facts in their own ways. This is the typical, maybe you have learned it, silo approach: each department implements its own application.
11:01
So usually, in our time, a data mart consolidation often happens, because the top manager says: guys, we cannot go on this way, our numbers should be equal, no matter whether we report to our internal controllers or to the external authorities.
11:20
Of course, the reality is a little bit more complex. Over time the IT practitioners and the business people said: okay, the pressure is big, new requirements are constantly coming. And the big deadline was the year 2008,
11:41
because all banks in Germany had to start reporting according to the new Basel II regulations in 2008. And somewhere in 2007, or early 2007, they said: ooh, but our original architecture is somehow not prepared for this. Suddenly somebody said:
12:01
okay, we are receiving the same type of data, for example the same credit accounts, but from very different sources, so before we start with the calculations we should first make an initial consolidation to put everything into one table. So somebody said: okay,
12:21
let's make a new risk and financial consolidation layer. This is a typical staging area where you have one entity per delivery object; that means no matter whether you receive credit account data from five sources, you put all five delivery files into five tables,
12:42
but before you start calculating you should consolidate this data and really make one entity, to keep it simple. So somewhere about one year before the deadline, a new layer was introduced. Suddenly the people who had started to design
13:00
the data marts for the statutory reporting and the internal reporting realized that feeding these data marts had become totally complex. Here you have the consolidation layer, which is also consolidated, but its view is more
13:22
oriented towards how things work in the source systems. In the data marts you also need consolidation, but there the view is already what is needed for the analysis. And having first the delivered raw data and after that many other engines delivering result data, you get, let me say, parallel tables with the same granularity,
13:42
but each engine is delivering just new facts. Usually in a data warehouse, by default, by the rules, people use inserts and not updates, because with millions of records it would be too expensive to take the calculated facts and update
14:00
the already existing tables. They said: we must load the data very fast, and that is why, for example, for credit exposures we suddenly have five versions of credit exposures, with the same granularity and the same volume of data, but each table has some additional fields and measures.
14:21
Somebody said: okay, we also need one additional layer. Let's first consolidate this chaotic data, and after that we feed the data marts. So a new layer appeared. This is the reality in one big, complex data warehouse.
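(A minimal sketch of such a consolidation layer, with invented table names: several per-source delivery tables for the same entity are unified into one consolidated table, remembering which source each row came from.)

  -- Hypothetical consolidation: one entity built from several delivery tables
  INSERT INTO cons_credit_account (source_system, account_id, customer_id, balance)
  SELECT 'RETAIL',     account_id, customer_id, balance FROM stg_credit_account_retail
  UNION ALL
  SELECT 'INVESTMENT', account_id, customer_id, balance FROM stg_credit_account_invest
  UNION ALL
  SELECT 'SUBSIDIARY', account_id, customer_id, balance FROM stg_credit_account_subsid;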
14:42
Let's go further. What kinds of technologies are used in this data warehouse? In the first place, for the data integration, it is Informatica PowerCenter. I don't know whether you know the name,
15:01
but this is one of the leading vendors when it comes to ETL or data integration software. As a database, IBM DB2 Universal Database is used, a very powerful product, not as widely used as the others,
15:21
like Oracle, for example; the market leader is Oracle. For the business analysis, three BI tools were used, and this is another part of the reality. In the first place was Business Objects, but after that came Cognos,
15:41
and some other departments used yet another tool. And this is part of the reality in a big data warehousing system. You see, the IBM DB2 database or Informatica, these are among the market leaders, but what happens and what really causes
16:02
immense problems is that the database is not available. And this happens even in the hottest phase, when the data should be delivered within two days and reported to the Bundesbank. And this is not only the reality here, but also in real IT life,
16:21
even at maybe the biggest companies all over the world. And then these three BI tools, that is another reality in the real world. Usually the software vendors are very keen to sell; they have very experienced sales managers.
16:40
They go to the customers and say: hey guys, you need these tools, we are the best. Of course they are approaching the business departments, one by one. One business department buys, for example, Business Objects, another Cognos, a third business department buys MicroStrategy. At some point in time everything has to go to IT and be managed by IT people. And they say: okay,
17:01
now we have to train people in all these tools. There are political issues, maybe even war sometimes, over which tool should be used as the primary tool. But on the other side,
17:21
there is also no perfect BI tool where you could say: okay, we can do all of the tasks, we can cover all of the requirements only with SAP Business Objects. All of these are really great tools, I cannot say anything negative. Of course, for test management there was special software, HP TestDirector, where the test manager could define the test plans
17:47
and where, during the test phase, the business departments could register the defects which should be fixed. As an operating system, it is simply IBM AIX, a UNIX.
18:03
And of course there are a lot of shell scripts running all over the data warehouse. It is not only Informatica or Business Objects or the database; there are a lot of shell scripts. A small part, a small task, was data quality management.
18:20
It was implemented with Informatica Data Quality. And of course many different small, or sometimes big, modules were implemented with Java or C++. So in reality, for very complex calculations, maybe the best solution is still to develop an application
18:41
with C++ or Java, or C# today. So what were the challenges in this project? First of all, there are special regulatory requirements for the data management and the development process.
19:04
There are requirements about the availability of the historical data. The general rule is: what was reported officially must by law stay frozen. That means if you see two months later that there was some calculation problem
19:22
or that some of the data was not correct, you cannot just go and change the data and say: okay, now I have the right data. What is reported stays frozen. And this simply adds complications with the versioning. The law says: okay, you reported some problematic data.
19:42
This has to remain frozen, because you know that, especially nowadays, many big bank managers could be sued when the bank gets into bad condition. The auditors go to the bank and start to analyze the raw data:
20:00
why did this bank report this, why did this bank report that the risk exposure was only 10 billion euros when the risk exposure was 15 billion euros. So everything must, by law, be frozen. But on the other side, they say: okay, now you have changed this calculation engine; please start with this old data and recalculate
20:21
to see what the real exposure should have been for that time in the past. So the data processing must also be auditable; there is a high level of formality. This is typical for banks, because in another kind of company, like an automotive producer or some other industry,
20:43
the only controlling authorities are maybe the top management or maybe the internal auditors. But in banking, also by law, the auditors look at how the development processes are organized,
21:00
how the data flows. I will come back to this. And the documentation is very complex. Typically these are big projects, and actually this is maybe good for us as consulting companies: the banks have the biggest budgets for data warehousing systems because they must invest in this area.
21:22
A telecommunications company, or Volkswagen, or Daimler, invests when they have enough profit; they start to invest in some analysis to optimize the business, but when everything is going down, when there is a crisis, they say: okay, we can stop for two years. So they do not invest,
21:40
but the banks must invest regardless of whether it is a crisis time or not; there are regulations, and they say: if you want to be a bank, you must implement everything as it should be. So, data historization and data versioning: as I said, maybe you have learned what a slowly changing dimension type 2 is.
22:01
I don't know about the MBA students, but yes, they should also know this. Historization is typical in the customer table. The customer lived until yesterday in Frankfurt; now he is living in Berlin. For analysis purposes, the record for this customer is historized.
22:21
That means you now have two records: one with the old address, valid until yesterday, and the new one starting today. But versioning is this: in the old record, valid until yesterday, this customer was recorded as living in Frankfurt, but that was false; in reality he was living in Berlin.
22:42
This is not historization, because the data was simply wrong; you should correct it. But as I said, the old reported data must stay frozen. So what is left? You cannot just go and update the wrong record, setting the town to Berlin.
23:02
You just have to create a new record, and this is a new version. And this has implications, especially for the data marts, for the star schemas. This is where maybe I can show it... So typically you have fact tables
23:21
with a lot of dimensions. And one of the dimensions, for example, is the customers. And this brings us to another very important rule in data warehouse work: the developer should avoid updates and deletes.
23:46
Everything should be done mainly with inserts, because updates and deletes are very expensive operations. Usually you need active indexes to make the right updates; you first have to look up the right records
24:02
and then update them. If it is only about one or two records, that is not a big problem, but in the data warehouse your customer dimension could have, let me say, 10 million records. These are the so-called very large dimensions.
24:23
Instead of implementing some complex logic for the versioning and historization to update these dimensions on a daily basis, practice shows that the best approach is often to do a full load.
24:40
You have your source data, you know which data is right and which is false, and what is left to do is just to apply the right filter. You filter the sources and load, as a bulk load, for example, only the right data. Of course you create a new version, and you do not delete the old data.
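(A minimal sketch of this insert-only pattern, with an invented customer dimension: validity dates carry the historization, a version number carries the corrections, and neither a move nor a correction ever updates an existing row.)

  -- Hypothetical customer dimension: both history and corrections are insert-only
  CREATE TABLE dim_customer (
    customer_sk  BIGINT       NOT NULL,  -- surrogate key
    customer_id  VARCHAR(20)  NOT NULL,  -- business key from the source
    city         VARCHAR(50),
    valid_from   DATE         NOT NULL,
    valid_to     DATE         NOT NULL,  -- 9999-12-31 marks the current row
    version_no   INTEGER      NOT NULL   -- raised only when wrong data is corrected
  );

  -- Historization: the customer really moved, so the old row stays and a new one is added
  INSERT INTO dim_customer
  VALUES (1002, 'C4711', 'Berlin', DATE '2011-02-03', DATE '9999-12-31', 1);

  -- Versioning: the old row was simply wrong; it stays frozen as reported,
  -- and a corrected copy is inserted with a higher version number
  INSERT INTO dim_customer
  VALUES (1003, 'C4711', 'Berlin', DATE '2010-01-01', DATE '2011-02-02', 2);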
25:02
High complexity: the complexity is not only that the logic is complex, but also that you have an immense number of source systems, often almost the same systems. Sometimes, for the same entity, let me say accounts, we have two or three different types of primary keys.
25:22
So how can you manage this? The real dimensional model is not exactly the textbook one. Of course you introduce surrogate keys, but your loading processes become a little more complex because, depending on the sources, you have different lookup tables to help find
25:41
whether you already have this record or not. High political pressure: of course the deadlines are fixed. The law says that from the 1st of January 2008 you have to report according to these new standards; there is no postponing.
26:01
Of course there is also the internal pressure, the standing fights between IT and business: business always said, we need these changes tomorrow; IT said, we can deliver this in six months because we have release planning. And this matters especially in such a big project, where the budgets are over 100 million euros.
26:23
So you can imagine the pressure on the IT managers and the pressure on the business people; everybody is working under this pressure. So, back to these issues,
26:40
these challenges: robustness, stability, auditability, the ability to be audited. For example, the typical question when the auditors come: you have your loading; in the best case it could be once a month, but typically the frequency
27:01
is going down to every day. Sometimes you have several deliveries for the same day, because the first load just crashed, and because deleting the wrong data is an expensive operation, you do not care that there is wrong or inconsistent data sitting there. You just restart the whole process and load the data again.
27:21
So for some days there are several versions for the same business day. And when the auditors come, they want to know: okay, which is the right data with which you calculated the reported results? Or restartability: as I said, you load a hundred tables,
27:41
most of them with hundreds of millions of records, and the whole process takes maybe one day or something like that. And suddenly, in the middle of the process, the system crashes. What do you do? Of course you now have inconsistent data in the database. It would be a very complex
28:00
and very costly exercise to go and start deleting the data which is not complete, to first make the database clean and only then restart. This would be very expensive. So people just forget about this data; it stays in the database and you start the process again. And this has implications for the whole physical data model.
28:21
That means you must have metadata. You know what metadata is: data about the data. This data does not come from the source systems. There are additional columns on each business entity table: a load number and the business day, of course. And of course you introduce new additional entities,
28:43
which you define once, in several tables. This is so-called static metadata: it is defined once and stays largely stable over time. For example, you define all of your sources, all of your target tables, all of the jobs, for example
29:01
loading credit accounts, loading securities, the applications. And then there is dynamic metadata: these are the run statistics, what exactly the jobs did. Here is now an example of these tables.
29:21
The first table is a table with the real business data, what is coming from the source system. For example, these are accounts: you have account ID, customer ID and balance. But for the purposes of the data warehouse you add two additional columns, business day and load number.
29:41
And what you can see: as I said, for example, we started our job to load the data for the last day of the year, but suddenly the system crashed and only the first record was loaded; let me say you do not yet see the second and the third records.
30:02
That is why we have a load number here, and in another table this load number is the primary key. When the system crashes, of course, somebody sets the status to failed. We restart the system and do not care anymore that there is such a dirty record.
30:21
We do not care about it; we start the whole run again. At the end we have a clean data set, these are our real two records, and they have load number 16. That means the primary key for this table is now the account ID plus the load number and maybe plus the business day. And you see here, for this load number the status is okay
30:42
and the loaded record count is in the millions of rows. And this is just an example; some other job is running with ID 17, and that one is in status running. So this makes the whole story a little bit more complex, and in reality this table has not only these five columns.
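(A minimal sketch of this load-control pattern, with invented names and reduced to a few columns: one row per load run in the dynamic metadata table, the two technical columns on the business table, and reports reading only the load marked as OK, so a crashed run can simply be repeated without cleaning up first.)

  -- Hypothetical dynamic metadata: one row per load run
  CREATE TABLE etl_load (
    load_number    INTEGER     NOT NULL PRIMARY KEY,
    business_day   DATE        NOT NULL,
    status         VARCHAR(10) NOT NULL,   -- 'RUNNING', 'FAILED' or 'OK'
    loaded_records BIGINT
  );

  -- Hypothetical business table with the two technical columns added
  CREATE TABLE account_balance (
    account_id   VARCHAR(20)   NOT NULL,
    customer_id  VARCHAR(20)   NOT NULL,
    balance      DECIMAL(15,2),
    business_day DATE          NOT NULL,
    load_number  INTEGER       NOT NULL,
    PRIMARY KEY (account_id, business_day, load_number)
  );

  -- Reports read only the rows belonging to the successful load of a business day
  SELECT a.*
  FROM   account_balance a
  JOIN   etl_load l ON l.load_number = a.load_number
  WHERE  l.business_day = DATE '2010-12-31'
    AND  l.status = 'OK';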
31:05
In this case, in this bank, the table had something like 30 or 40 columns with all kinds of different statistics. Also the other metadata tables, like sources and targets, everything was very complex
31:23
and very complex to maintain during development, because most of this metadata, the static metadata, is maintained manually. You have Excel sheets, and okay, if you have enough time you can develop some application. But through this metadata you provide auditability
31:44
and you also provide traceability. If somebody comes and says: okay, in our final data mart tables we also have this load number and we know that we reported exactly the data with load number 16, then if the auditors come two years later and say,
32:03
let's see what you reported, they can start with the data marts and trace back, step by step, exactly which data was used for these reports. And again, for the restartability: when the system crashes, we just do not care that there is a dirty record, we simply start loading again.
32:22
Of course, modern databases also have means for deleting in bulk mode. So some day this record will be deleted, but not immediately; maybe once a month
32:41
there will be another job, independent from the whole load process, that starts and looks into this table: I have load number 15, this one failed, nobody needs this data because it was not reported. And the modern databases like DB2, like Oracle, provide the operation truncate,
33:00
or, in DB2, you can define the table as a cluster, meaning that your cluster is built on the load number and the business day. When you say delete load number 15, the database knows that all of the 5 million records
33:20
with load number 15 are stored in the same place and simply marks this space as free. Nothing is physically removed; from now on this space is just free and can be reused.
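(A minimal sketch of such an independent cleanup job, reusing the invented tables from above: it looks for failed load numbers and removes their rows in bulk; if the table is clustered or partitioned by load number, the database can give the space back cheaply instead of deleting row by row.)

  -- Hypothetical cleanup job, run e.g. once a month, independent of the load process
  DELETE FROM account_balance
  WHERE  load_number IN (SELECT load_number
                         FROM   etl_load
                         WHERE  status = 'FAILED');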
33:50
So these are the main issues: you work with huge amounts of data and your load processes are very complex. The whole process, that means the most important
34:03
reporting data are the data from the last day of the month. At this bank the whole calculation process took, or is still taking, about 12 days. That means: today is the 3rd of February,
34:22
and they are now running the calculation for the 31st of January. The whole process will be ready around the 10th or 12th working day of the following month. Do you understand what I am saying? Am I talking too fast?
34:47
So, we have 20 minutes left. Actually, this was all about this huge data warehouse. As I said, it was implemented with Informatica, and many people say this was one of the biggest
35:02
Informatica projects worldwide, with, let me say, something like more than 1,000 mappings. A mapping is the smallest piece of programming code for transforming data from one or more tables into other tables; it is like a function in the programming world.
35:26
The reality in other types of companies, like Volkswagen or Daimler: they also have such data warehouses, but they do not have this strong political pressure, nor this complexity, and the communication
35:41
between the IT and the business departments is not so tense. When the IT people have arguments and say, okay, we cannot deliver this by that date, let's make a smaller release and deliver it in the next release, then everything goes more easily.
36:07
So I will just say a few words about data quality. Adastra is specialized in this area. We have our own know-how and methodology for what we understand by data quality management,
36:22
what should be done, and for our customers: what is master data management, what is data governance? We combine these three words, these buzzwords, three in one; the overarching term is data management.
36:41
But we have 20 minutes more.
37:01
So why do you need data quality management? Typically these issues arise in data warehousing systems, because in an operational system the particular operator who is dealing with a particular record,
37:23
if he sees that some of the data is wrong or something is missing, can fix it on the way. Some customer comes into the bank office to do something, and suddenly both of them realize something is wrong, and they can update it.
37:40
It is not a huge issue; there is no need for special measures. But in the data warehousing world, people are analyzing: based on the geography or the region data, based on the marital status,
38:01
based on whether the customer is female or male, based on the part of the town. For example, when you calculate credit risk measures or credit scoring, it is important, when the customer comes to the bank and asks for a credit,
38:21
that the credit employee knows whether this customer will pay the money back. And of course, in the modern world there is so-called operational BI; that means collecting the data from this customer, name, birth date, marital status,
38:43
and where he is living. Automatically the bank employee gets a score: for people of this age who are living in this part of this big city, the probability that this person will not pay back the credit, that you will have a problem, is 20%.
39:02
And this is already big. This has an impact on what the interest rate will be and on the amount the customer has to bring himself. So in order for this whole analysis to be reliable, all of the data must have good quality. That means even
39:21
small details are important: the part of the town, the marital status, female or male. For the normal bank employee this is maybe not important, but when people start to analyze, to build data mining models,
39:41
it is very important to know what the data quality is. Are all of the fields really filled, or are some of them null? Maybe these fields are not important for the operational business in the application; for some fields there are no constraints checking whether they are null or not null.
40:01
But for the people who do analysis they are important, and that is usually where people start with data quality: first you have to know the data. What is usually missed in a big data warehouse is the profiling stage. You have a new delivery object
40:22
or several delivery objects, or you start to integrate new source systems, and the first job is to get to know the data. We do this with profiling; in the first place we start by understanding the data. After that we validate the data
40:41
and see whether we can define some business rules, or what they call the business rules, for validation. After the data validation you can define, for your data warehouse, some data quality rules:
41:02
how to clean the data, what should be done if some fields in some records are missing, whether you should use default values; and if part of the source systems does not deliver these fields but they are delivered by other systems and you have to enrich the data, what should the rules be?
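(A minimal sketch of what such cleansing rules can look like, with invented names: a default value for a missing marital status, and enrichment of a missing birth date from another source system that does deliver it, applied in one pass during consolidation so the pattern stays insert-only.)

  -- Hypothetical cleansing during consolidation: default values and enrichment in one pass
  INSERT INTO cons_customer (customer_id, marital_status, birth_date)
  SELECT s.customer_id,
         COALESCE(s.marital_status, 'UNKNOWN') AS marital_status,  -- default value rule
         COALESCE(s.birth_date, o.birth_date)  AS birth_date       -- enrichment from another system
  FROM   stg_customer s
  LEFT JOIN other_source_customer o
         ON o.customer_id = s.customer_id;     -- assumes one row per customer in the other source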
41:21
And who is responsible for these data quality issues? Usually: who is the data owner? In this way we come to data governance, and for data governance the company should have established processes: who is the owner of the information, who should analyze what,
41:41
what the steering committee should be. Because the most difficult thing is to say to a source system: you deliver dirty data, please change this because we cannot calculate. Of course this is a totally different department, a totally different manager; they have their budget for the whole year planned for something else, and now somebody comes,
42:01
somebody from the business analysis department, and says: your data is dirty, please change this. But if this system is a 30-year-old system, what is the reality? In Germany and in North America you have core banking systems still running on mainframes, developed 20 or 30 years ago. There is nobody who can change them.
42:20
And even if you could change them, it would be very, very costly. That means you have to implement the data cleansing in the data warehouse. Unfortunately I have not brought it, but there was a very good video which we showed at one of our conferences;
42:41
the case was a small municipality in the United States. One of the employees of the municipality made an error when entering the value of a house, which should have been 500,000 US dollars.
43:03
Somehow she put three additional zeros at the end, and suddenly this house was worth 500 million US dollars. But this is a big town, so, let me say, the total value of all the houses is maybe several billion US dollars,
43:22
and nobody just notices the difference when an extra 500 million US dollars suddenly appears. The employee made this error and nobody noticed it.
43:42
When they calculated the total value of the houses in this area, they said: oh, the prices have increased, we will now get additional taxes. And they made their planning; based on this price increase they said, for the next year we will get
44:00
10 million US dollars more. They planned to make some infrastructure changes, to build a new kindergarten, to renovate a school. And suddenly the lady who owns this house received a notice to pay about 1 million
44:20
US dollars in tax. And this was crazy; she started to sue the municipality. Of course it took maybe years, and what happened in the end: the municipality had planned money which they will never get, because this lady, with a house worth 500,000 US dollars,
44:40
cannot pay 1 million in taxes. This was a disaster. And to avoid this kind of disaster, you have data quality measures, especially for such systems. Before you start to calculate, there are special monitoring approaches. You receive the daily data,
45:00
and if you see some peaks, say yesterday the total amount was 10 billion US dollars and today it is 11, while in the past year you only ever saw very small deviations, something like a change of about 100,000 to 200,000 US dollars on a daily basis,
45:21
and suddenly there is a big amount, then somebody should see this somehow. That is why there are special reports to monitor this, and when the system detects it, the corresponding people should see it in their reporting.
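(A minimal sketch of such a monitoring query, with invented names: it compares each day's delivered total against the previous day and flags deliveries whose jump exceeds a threshold, so that somebody has to confirm whether the change is real or a data error. The date arithmetic here is DB2-style.)

  -- Hypothetical plausibility check: flag unusually large day-over-day changes
  SELECT t.business_day,
         t.total_exposure,
         p.total_exposure                    AS previous_total,
         t.total_exposure - p.total_exposure AS delta
  FROM   daily_exposure_total t
  JOIN   daily_exposure_total p
         ON p.business_day = t.business_day - 1 DAY
  WHERE  ABS(t.total_exposure - p.total_exposure) > 100000000;  -- threshold to be agreed with the business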
45:41
The first question to ask is simply: is this true? Is it possible? And the right people should check it and say: oh, this is an error. This is the typical data quality problem. And here are typical results from data profiling. When you start to analyze,
46:02
a typical data profiling tool gives you some statistics about, let me say, one field from one table; here it is of type varchar. Maybe the next example shows more. You get statistics
46:21
for this field of type double: what are the values of this field? Are there nulls, and how many? What are the typical values, what are the averages, what are the variances?
46:41
And sometimes you get very interesting information. You analyze a field for which the null check is very important, and suddenly you see, for example for male or female, that out of 5 million customers 5% are male, 5% are female, and the rest is null. That means you can do nothing with this data.
47:00
But you realize this at an early stage and can start to do something — for example go to the source systems and request a change. This is how we do data profiling: when you start working with a new data source,
47:23
you get production data and manage the security issues. If these are customer data, the IT people are usually not allowed to know the real names; it can happen that among these customers there are employees of the bank, and by law you are not allowed to have access to these data.
47:42
That means the first step is always anonymization (a small sketch of this follows below). Then you run the profiling — it can be done with a tool, or with simple SQL statements, although nowadays there are very good tools. You analyze the results, identify outliers, and get the data stewards,
48:02
who are usually from the business department, to validate this information and say: this is an error, and this is what we can do about it. You can define new master data, or you can define a rule for how to update the field or which default value to use. Okay, for female or male it will be difficult to define a default value.
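The anonymization step mentioned above, as a minimal sketch with made-up table and column names; real projects use dedicated masking tools, but the idea is simply to replace the identifying fields before IT staff profile the data:

  -- Copy the data into a profiling schema, replacing real names with a surrogate;
  -- the analytical columns stay untouched.
  INSERT INTO customer_anon (customer_id, customer_name, birth_year, gender, balance)
  SELECT customer_id,
         'CUST_' || customer_id,   -- surrogate instead of the real name
         birth_year,
         gender,
         balance
  FROM   customer;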
48:29
So I will skip that. What is left? I can talk a little bit about how the projects are organized — what the roles are,
48:42
and how the projects run: the system development life cycle, the phases of a project. Of course there is an initial phase. Somebody says: oh, we need this analysis.
49:03
So far we have not done customer segmentation for the marketing. First the customers should be segmented, to know how many young people, how many old people, how many live in North Germany, how many in South Germany (in the end this is a simple grouping — see the sketch below).
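A minimal sketch of such a segmentation query, assuming a hypothetical customer table with birth_date and region columns; the age cut-off is made up:

  -- Count customers per region and age band.
  SELECT region,
         CASE WHEN birth_date > DATE '1981-01-01' THEN 'young'
              ELSE 'old' END AS age_band,
         COUNT(*)            AS customers
  FROM   customer
  GROUP  BY region,
            CASE WHEN birth_date > DATE '1981-01-01' THEN 'young'
                 ELSE 'old' END;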
49:20
Let's start such a project. Of course, the next step is high-level requirements analysis, then data analysis and profiling: what are the raw data, what is the reality actually, can we do this analysis at all, or should we first start changing the source systems? Then data modeling, and so on. Let us go quickly through these different phases.
49:46
During initiation and planning, the main deliverable of this stage is a rough high-level project plan, together with some status reports, but there are also a lot of meetings,
50:01
usually with the higher-level managers, who actually decide whether there is budget for this. At this stage there is some cost estimation. The next phase is the detailed requirements analysis: somebody has said, okay, we can do this, let's do it,
50:21
we will plan the budget; now let's start with the real analysis of what we need and what the requirements should be. The previous steps were more for managers and business department people; here the data analysts come in,
50:42
the people between IT and business who can do, for example, the data profiling. If you have a good profiling tool, maybe it can be used by somebody from the business department; if you don't, you need somebody who knows SQL and can go to the real raw data and say
51:00
whether the data are there — whether we have all of the necessary data. Then data modeling: the idea in data warehouse work is that during data modeling you first create the logical model — the star or snowflake — but the real physical model is a totally different thing.
51:22
The data modeling is also done by people who sit between the business department and IT. When you have your models defined, you know what will be analyzed, you know what the information will be, you know your target models,
51:42
you know your source models, and you can define the calculations between the two. Usually it is the people in data warehousing who define these mappings: you have a target field or target table, you have a source table or tables, and these are the calculations in between,
52:02
and the best way is to write this down in tabular form. It is usually recommended to describe the mappings in a form that is understandable for the business people but also for the IT people; pure free text alone does not work (a small example follows below).
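A minimal illustration of such a mapping, with made-up field names, and the kind of load statement a data integration developer would derive from it:

  Target field              Source field(s)              Transformation rule
  fact_sales.customer_key   orders.cust_no               look up the surrogate key in dim_customer
  fact_sales.amount_eur     orders.amount, orders.ccy    convert to EUR with the daily rate

  -- The derived load statement (a sketch, assuming these tables exist):
  INSERT INTO fact_sales (customer_key, amount_eur)
  SELECT d.customer_key,
         o.amount * r.rate_to_eur
  FROM   orders o
  JOIN   dim_customer d ON d.cust_no   = o.cust_no
  JOIN   fx_rates     r ON r.ccy       = o.ccy
                       AND r.rate_date = o.order_date;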
52:23
Of course, if it is a totally new project, you need somebody to define the architecture. First you need a high-level architecture: what kind of layers do you need, how many layers — of course you start as simply as possible.
52:41
What could the database be? What is the tradition in this company? Are there reference projects or running projects, and could we reuse them? That is the typical question, and it is what is recommended: first see whether you can reuse something. Somewhere, somebody has already built a data warehouse — for example you are in the marketing department, but the risk people already have something.
53:02
Can you reuse it? This is the first question, and it is part of the solution architecture. Okay, the IT people prefer to build everything new from scratch, but the business people say: we already paid for similar stuff,
53:22
please go and look at what the other department is doing and check whether it is reusable or not. Then you have the ETL process design and development — the data integration.
53:41
This is one of the big parts of data warehouse development — usually the biggest part is the back-end development, the data integration processes. It takes a lot of time, and the people involved are engaged
54:01
during almost the whole life cycle. In the early stages you start with design and analysis, but you always start with prototypes: you always check whether your idea will work at all. And the developers should stay until the very end,
54:23
because there are test phases and they should be ready to fix whatever needs fixing. What usually happens in big companies is that when they are ready with one development phase, they already start working on the next release. And usually you have separate
54:41
back-end developers and front-end (reporting) developers, and usually the front-end part is relatively small. As I said, the big part is the back end, the development of the data integration processes. In most cases
55:02
this is nowadays done with an ETL tool like Informatica, or DataStage from IBM, or Data Integrator from Business Objects; Oracle also has tools. But there are also many projects, including huge ones, where the ETL is done with SQL scripts.
55:20
For the reporting development you usually have reporting tools like Business Objects or MicroStrategy, and there you also have two layers. The first layer — in Business Objects it is the so-called universe — is a metadata layer:
55:42
you know the business requirements, you know the physical models, and you define the link between them in this metadata layer. Usually the IT people just create a few complex reports or initial test reports, and after that the real reporting development
56:00
is done by the business departments themselves. That is why they are using BI tools. Quality assurance, especially in the finance sector, is a very important part and is very strictly controlled by the authorities. It usually starts from the very beginning: you have a standard for how to write a high-level design.
56:23
Usually you are also pushed to use so-called CMMI approaches, where it is defined for each phase of the project what the documentation should be. Of course this is very costly, but it is a must. You have standards for data modeling and design.
56:44
You check against these, and the quality assurance team checks whether the standards are followed. So during the quality assurance phase you control, on one side, the quality of the whole process, and on the other side there is the testing of the results.
57:03
The testing of the results also starts during the development and design phases: at that time, business and IT agree what the test plans for the product should be. During the development phase the developers work,
57:22
and the quality assurance team works in parallel, usually generating test data — in most cases artificial data based on the business rules. During the tests they of course work closely with the developers
57:41
and the business department. In the test phases the test people load the test data into the database, the processes are started, and they analyze — in most cases automatically — whether the results match: what are the actual results, and what are the expected results?
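A minimal sketch of such an automated comparison, assuming hypothetical tables fact_sales (the actual load result) and expected_fact_sales (prepared by the test team); any returned row is a deviation from the expectation:

  -- Rows that were loaded but not expected ...
  SELECT * FROM fact_sales
  MINUS                       -- EXCEPT in DB2 / standard SQL
  SELECT * FROM expected_fact_sales;

  -- ... and rows that were expected but are missing.
  SELECT * FROM expected_fact_sales
  MINUS
  SELECT * FROM fact_sales;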
58:06
Then there are integration tests: at some point you should test whether the whole system works together — whether the version of Informatica is compatible with the current version of the database,
58:24
whether the UNIX machine and the database can cope with the amount of data. During this integration test you try to load the system as heavily as possible, and it works best with real data.
58:42
At some point there is also the user acceptance test, where, based either on artificial data or on part of the real data, the end users check against the expectations they defined
59:00
and either say: okay, we accept this, or: we do not accept it. That is why it is very important to plan all the phases so that you have a real time buffer: if you see that something is not really finished, you should have time to fix it, to deploy again, to install again.
59:22
Deployment is also a very important part, which in small projects is sometimes neglected. In a small project you say: okay, we developed on some test machine, everything is okay, let's go live — and you just copy all the source code to the production machine.
59:40
And what usually happens is that you forget something. You tested the whole system, but on the test machine; you start it in production and it does not work, because somebody forgot to copy something from the test machine to the production machine. That is why deployment is also a
01:00:00
mission-critical part. Usually these are independent teams, specially trained to control first the deployment on the test machine. They say: what you deploy on the test machine must be the same package — you give us this package, and we deploy exactly this package to the production machine.
01:00:23
Project teams — maybe this is the interesting part for you, and it is also what I know from my experience of running interviews, especially with people
01:00:43
coming fresh from university. Of course, in big projects you have a project manager. When I am talking about projects, I forgot to say that a consulting company usually works on projects for its customers.
01:01:02
We are not a software company with our own product life cycle; in most cases we work at customer sites — of course there are also offshore or nearshore parts — but usually, in the big companies, the projects belong to the customers. They have the budgets, they build a team, and you usually have a project manager.
01:01:22
The project manager is responsible for the project: first, that it is planned in the right way and that the budget is provided. That means he has to go to the other managers or to the business department, say: we need this budget, and try to convince them why.
01:01:40
Of course he should manage the whole team, but the communication skills are very important, and also the willingness to take responsibility — to be accountable. For such people it is very important to go to the top manager, to the business people, and say honestly: people, we have a problem.
01:02:02
This is sometimes the difficult part. You see that the team is behind schedule, but somehow the managers are afraid to say: yes, we are behind schedule, let's see what we can do.
01:02:21
These are special requirements on the soft skills of the project manager: this person should be open in communication and, when there is a problem, say at an early stage: maybe we will not keep this deadline, let's see what we can do. Then the business analysts — these are the people who sit
01:02:40
between the business departments and IT. They should really be able to run interviews and ask the right questions, but also to describe the information in a way that works for the business department — because the business has to confirm that these are the requirements —
01:03:02
and also for the IT people, so that they understand what should be done. The requirements here are very good communication, but also good analytical thinking. Data architects are usually experienced
01:03:21
IT practitioners. They of course talk a lot with the business department and with the source systems people to define what the real data are and what the data models should be. They should know the different types of data modeling: normalized, denormalized, snowflake, star schema.
01:03:45
Solution architects — as I said, this belongs to the early research phase — are the people who define the system as a whole: the layers, and also the technologies. So this should also be
01:04:02
somebody who is experienced. When we talk about the project team, this does not mean you need one physical person for each of these roles. Usually one physical person covers two or three of the roles. For example, somebody could be the solution architect
01:04:23
and also, in the next stage, the data integration designer or the lead data integration developer. On the other side, the big part of the data warehouse is the data integration, so you have a lot of data integration developers who really work with the corresponding ETL tool;
01:04:45
by default, in data warehouse work these should be very good database specialists — people who really understand the data, can analyze it very quickly, and know the tools and the ETL concepts.
01:05:02
Quality analysts or testers are people who must very quickly understand the business cases and the test plans, and define the right test data covering all of the cases. For example, you should have positive and negative tests: you define test data
01:05:21
for a rule that says some field must be filled, and you generate data where this field is not filled, to see how the system reacts (see the sketch below). This is also a special kind of work; special training and skills are needed for it.
01:05:41
DBAs are the more technical people. They should have very good knowledge of the particular database, whether it is DB2 or Oracle; they usually maintain the DDL scripts, and they also define the physical data model.
01:06:01
For example, they define the index policies and decide where materialized views should be used for performance optimization. What should be done in the database if you insert a lot of data into a table that has an index on it, while at the same time
01:06:21
somebody is reading from this table — what should the solution be? This is also a very important part of the team. Report developers are people who must also have good communication skills, because they deal with the people working in the business departments
01:06:42
and develop the complex reports, so they should have some understanding of what the business is about. Now I will say something about us:
01:07:02
we at Adastra, as a consulting company, also need smart people. We recruit a lot of junior people — people coming straight from the universities.
01:07:26
What do we provide? The typical intention of young people in Germany used to be to start in one company and stay there
01:07:41
for a lifetime. Maybe this is no longer the case with you, but at least a few years ago everybody was looking for lifetime employment. Nowadays lifetime employment is not so important, because you can start in a big company and after three years
01:08:01
they say: we are moving half of the company to India — and you are on the street. What is most important is lifetime employability. You start somewhere, and what matters is what you are learning there: can you apply it independently
01:08:20
of whether you are still working in this company, or in this industry? And this is what we provide. The typical people who come to us say: I want to work with the top technologies, on something that is independent of the crisis.
01:08:42
And if you are also looking for a great organizational culture: we are, in the true meaning of the word, a multicultural company. A big part of the company works in Canada, which is a typical immigrant country.
01:09:01
We have people — also here in Germany — from all over the world: Africa, Asia, the Far East, Eastern Europe, but also South America and North America.
01:09:26
I don't know how much time we have left. Let's talk very quickly about the trends. You can see the issues
01:09:41
the CIOs are dealing with nowadays. From the business perspective, in the first places are process improvement and cost optimization; on the IT side, the technical topics, there are the current buzzwords like virtualization, cloud computing, web services, social networks,
01:10:01
but business intelligence is there as a constant — cloud computing is maybe not older than two or three years, but business intelligence is constantly present, and now also mobile technologies. What we see is that business intelligence is a relative constant among the top-priority issues
01:10:22
of the top IT managers. And our business is really spread across all industries.
01:10:41
Of course, the companies with the highest budgets are the banks; especially the ones in investment banking are now making the highest profits, and they are maybe the companies that really invest in cutting-edge computational technologies. For example, you may have heard
01:11:01
of the American private investment companies playing on the stock exchanges all over the world — but they are not playing with real persons, they are playing with computer systems. They buy the most expensive computers and invest in the most expensive software.
01:11:23
This software has to analyze, in real time, the statistics from the past, maybe from past years, to predict how a price will develop in the next seconds and to decide whether or not to buy a stock. In any case, the whole banking sector
01:11:40
will continue to invest. There was a big crisis, so of course there are new laws, and this is welcome for us, the IT people. The insurance companies also have a lot of regulations they must conform to. Telecommunications companies are typical users of data warehouses;
01:12:02
they usually have the biggest amounts of data, because they collect data on the call level — imagine, just in Germany, within one day there are maybe several hundred million calls, and you have all the statistical data and you have to analyze it.
01:12:22
The same goes for retail companies like Metro or Rewe in Germany or Walmart in the US: they also have data warehouses with huge amounts of data. Typically, in telecommunications and retail, the data warehouses are not so complex and the data are not as varied as in banking,
01:12:42
but they really deal with huge amounts of data. In other sectors — for example automotive — the main use of data warehouses is the supply chain: how can we optimize the supply chain of a global company?
01:13:00
For example Volkswagen: they have something like over 100 billion euros of revenue a year, and of that, 80 or 90 billion euros go to suppliers all over the world — hundreds of thousands of suppliers
01:13:20
delivering billions of parts all over the world. The question is how you can control this, how you can optimize this, and how you can tell, when some purchaser has negotiated and the company is making savings, whether this is because the purchaser was very good
01:13:41
and negotiated good prices, or because the prices on the market went down. At some point the company starts controlling and calculating exactly this: what are our purchasers doing all over the world, what is the scoring, what is the quality of the suppliers?
01:14:05
In the past, the typical traditional users of business intelligence were the top managers, or some small group of analysts. But these user groups are moving further and further down.
01:14:21
Now we are talking about operational BI. As I said earlier: one ordinary employee — maybe he is not highly educated, but when he has to grant a credit, he should automatically get the credit score for this type of customer; this is calculated somewhere in a data warehouse and sent back to the operational systems,
01:14:40
the so-called closed loop. Our business is also multidisciplinary: most of the job is done by IT people, but it is not just programming, and it is not just management. It is business analysis,
01:15:00
quality assurance, data quality, front-end development, data integration. You see all these terms that appeared in the past years, and of course the vendors are trying to play with them.
01:15:24
Now the buzzwords are mobile BI, or operational BI, and each year new such words keep coming — and for you the question is typically: where should you go, in which direction?
01:15:41
It is not as complex as it seems to be. Well — leading-edge technologies; let me go to another part and say a few words about that. You know
01:16:05
what a star schema is, and I suppose you also know what ROLAP (relational OLAP) and MOLAP are — but what are the tendencies? Traditionally, the data warehouse
01:16:22
or the data marts are developed with star schemas. What usually happens is that the fact table grows and grows, and some day it reaches, let me say,
01:16:42
one billion records. The typical dimensions — accounts, types, or factories — may have 100 or 300 rows,
01:17:03
but you also have customers, and the customer dimension can have 100 million records. The traditional data marts — still broadly used — are based on relational databases and implemented with star schemas,
01:17:21
whether on Oracle or DB2, and the typical problems are performance problems. This is what the business users mostly complain about: the reports are too slow. And of course the database producers started to invest in optimizing their software.
01:17:41
For example, Oracle introduced different types of indexes, the so-called bitmap indexes, and they introduced so-called materialized views. Do you know what a materialized view is?
01:18:01
Everybody? So: you have your fact table, let me say one billion records, and you have dimensions — customer, region, date/time — with the customer dimension at 100 million records. But the typical analysis is, for example,
01:18:23
that the people who analyze do not care about analyzing particular persons. Usually you have a hierarchy in the customer dimension — towns, regions, or another classification like male or female — and they want to analyze, let's say, what we have sold
01:18:43
in the region of North Germany. If you have a denormalized dimension — this is the typical star schema — you have the hierarchy in there:
01:19:02
towns, regions, regions within the country, continents, and maybe also other business regions across the world. The typical BI tool will then generate — for example, if you have to analyze the amounts sold
01:19:23
in North or South Germany — a select statement joining these huge tables: one billion records with 100 million records. And this will be very slow, because these are your physical data.
01:19:41
Although what you actually need is just two kinds of values: North Germany or South Germany. So what people usually did in earlier times: you go and define
01:20:02
an aggregated table — facts by region — which has, let me say, 5 million records instead of one billion. And you say: okay guys,
01:20:23
you now have one additional table; you can point your report directly at this table and it will be very fast. Okay — and the next day there is another requirement, another aggregation level, so the DBA goes and creates a new aggregated table, and so on and so on; it grows bigger and bigger.
01:20:41
At some point the end users say: but now we have a hundred tables here, we cannot manage this. So what did Oracle do? They said: there is this aggregation — we will make the database intelligent. And they did make the database intelligent: when the front-end tool
01:21:03
sends the select statement on the fact table and the dimension table, with a GROUP BY, the database says: oh, but I already have aggregated data — why should I group and calculate again when I can go there directly? This is the so-called query rewrite. The database is intelligent
01:21:21
and rewrites the query: the query goes to this small table and gets the data back — because in the end the users usually have maybe 10, 20 or 100 rows on their reports, not more — and this takes milliseconds. This was the investment of the database vendors.
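A minimal Oracle-style sketch of this mechanism (table and view names are made up): the report still queries the detail star schema, and the optimizer answers it from the small aggregate.

  -- Pre-aggregated sales per region, kept by the database and usable for query rewrite.
  CREATE MATERIALIZED VIEW mv_sales_by_region
    ENABLE QUERY REWRITE AS
    SELECT c.region, SUM(f.amount) AS amount
    FROM   fact_sales f
    JOIN   dim_customer c ON c.customer_key = f.customer_key
    GROUP  BY c.region;

  -- The BI tool still sends the detail query; the optimizer rewrites it
  -- against mv_sales_by_region instead of scanning the billion-row fact table.
  SELECT c.region, SUM(f.amount)
  FROM   fact_sales f
  JOIN   dim_customer c ON c.customer_key = f.customer_key
  WHERE  c.region IN ('North Germany', 'South Germany')
  GROUP  BY c.region;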
01:21:42
What is the problem? The complexity of the database becomes very high. Even most database administrators do not know these features; even the consultants of the software producer were not always up to date about them. That is why
01:22:04
there appeared the so-called
01:22:22
MOLAP databases, where you really have cubes and the data are already stored pre-aggregated.
01:22:42
This is now the know-how of the database itself: you do not need a DBA to analyze which aggregates you need — which is also an issue: should we have five pre-aggregated materialized views, or ten? The most important thing these MOLAP tools bring
01:23:01
is that the database is intelligent enough to decide by itself what to pre-aggregate across all dimensions, groups and hierarchies. Of course you have the means to control this, because on the other side it costs space, and loading these cubes,
01:23:21
calculating all the aggregations, also takes time, so you can control it. Typical software vendors in this area: Oracle Essbase is a typical multidimensional database; IBM bought Cognos, and Cognos had bought another company
01:23:40
producing the so-called TM1 database, which is also a truly multidimensional database. But as I said, this also has its costs: you need additional knowledge. You already must have the relational DBMS know-how;
01:24:03
now, for your data warehouse, you also need somebody who understands this technology, and you have to take into account that the calculation time can be immense — loading such a cube can take maybe days. Another tendency is so-called in-memory BI.
01:24:23
This again comes from the performance issue that the reports are too slow. There are analysis tools that take the data and do the whole work in RAM.
01:24:40
For example — as before, we had one billion fact records and dimensions with 10 or 100 million customers. Nowadays memory is very cheap: my notebook has five gigabytes of RAM,
01:25:00
you can easily buy notebooks with eight gigabytes, and a workstation with sixteen costs maybe 2,000 or 3,000 euros, which is nothing. So what these tools do during the analysis
01:25:21
is load the whole data set into RAM, and from that point on it is very, very fast, because you no longer have the very expensive disk access operations. Typical software producers: QlikView, from a small company in Sweden, has become very popular lately — Volkswagen started to use QlikView as a tool.
01:25:44
Of course SAP BusinessObjects announced that they will also go in this direction and offer in-memory BI analysis. But typically the reality is that the amount of data will grow and grow and grow.
01:26:02
Here is a small example —
01:26:37
just a small sample.
01:26:46
It is an example with the counters for power energy use. Let me say you have one town with one million customers or households,
01:27:01
each has such a counter, and the company reads the data once a month: that is one million records per month. But now the technology is very advanced, everything happens wirelessly, and the companies could get this data on an hourly basis.
01:27:22
And usually you could have not only one such counter in the house, but several. Take the same example: one million customers, but everybody has 10 devices in his house and the company reads them once every minute —
01:27:45
then you get monthly — how much is it? — over 400 billion records (the arithmetic is sketched below). That is how the data volumes are changing.
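To make the order of magnitude concrete, a back-of-the-envelope calculation with rounded figures (one million households is the assumption from the example above):

  1,000,000 households x 1 meter x 1 reading per month = 1 million records per month
  1,000,000 households x 10 devices x 60 min x 24 h x 30 days = 432,000 million readings per month,
  i.e. roughly 430 billion records per month.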
01:28:02
There was another example here — this is what we now see all over the world.
01:28:22
The statistic is that in the next three years, for example, the world will generate as much data as was generated in the past 400 to 4,000 years. Because it is not only this example with the power counters, the power usage: you now have RFIDs when you buy a ticket,
01:28:45
for example for the stadium, and everybody knows who is entering the stadium. Maybe the train tickets will have such RFIDs, so everybody will know where a person is; there are readers everywhere, and these readers generate statistical data,
01:29:03
statistical data. Okay, this gets collected, but some day somebody will say: let's optimize the business based on this data — and this is a huge amount of data. And how is the industry addressing this issue? With the so-called data warehouse appliances.
01:29:32
This slide belongs to the leading-edge technologies part.
01:29:41
Data warehouse appliances — we are coming back to the performance problem. One of the trends for solving it could be MOLAP databases, another in-memory BI, but another solution is data warehouse appliances: you have a box, and this box is a database machine.
01:30:03
In this case, on the picture, it is Netezza, one producer of such databases. Another big producer is Teradata — traditionally the most used database for very, very large databases, usually at the telcos and the retail companies.
01:30:20
What they say is: we sell you a total solution; the hardware is specially designed for this database. Oracle also has their solution, called Exadata. They say the disks are so intelligent that the storage itself can resolve the WHERE clause of the SQL query —
01:30:40
the filtering is done at the storage level. And here is the tendency: if you have such powerful machines, and the vendor really says that no matter how stupid the SQL you send to the machine, the performance is always better, then we get the first cases among our customers
01:31:01
where a customer says: but why do I need a data mart? I will just have a central data warehouse, everything in third normal form. Why should I develop a data mart with a star or snowflake schema? I have a very powerful database, and no matter how stupid the select statement is, the performance will always be good enough.
01:31:23
So this is another tendency. So far in Germany it is still not so widely used, except for the traditional cases with Teradata in the telecommunications companies. But many companies are checking
01:31:41
whether this could be a solution — whether they really could save themselves developing a data mart by just having such a database box and running the queries on it. With this I will finish my presentation.
01:32:05
So — information management at Adastra. If you want to go in this direction, information management is a great choice, and the best choice is of course to come to Adastra. We are a very dynamic organization.
01:32:22
You can really start getting your first experience with cutting-edge technology in great areas like data integration, data quality, master data management or BI. And we also have a special program
01:32:40
for students in their last year, or those who have just finished their studies: we send such people from the German universities to Canada to get their first experience there, and to work in areas where we in Germany do not yet have the experience.
01:33:01
Actually, yes — one final thing before I head in the wrong direction: some advice. What is my experience
01:33:21
from running interviews with people like you? Even at the university I had the experience that a student in informatics, a master's student in informatics, said: oh, I don't want to program, I don't want to do SQL, I don't want to work with databases, this sounds too technical.
01:33:40
But I asked: what are you studying? You finished informatics — why are you afraid of doing this? Somebody else said: during my studies I did an internship semester and I was the project lead, because the professor gave me this role,
01:34:01
and I did it very well, and I want to be a project manager right away, from my first job. It is not so easy. You should not be afraid of programming. You are young people. In Germany I will personally retire at 67;
01:34:22
I am now 42. You will maybe be allowed to retire at 70. That means if you are now 25 years old, you have at least 40 years ahead of you to gain experience. To try, right away,
01:34:40
right now, to be a project manager, and to fail, be disappointed and fall into depression — it is not worth it. Just start with programming, start with something technical. You should not be ashamed of doing database work or of being an ETL developer. You can collect experience for two or three years.
01:35:01
If somebody among you has the skills to be a project manager, this will be recognized very quickly. We do not wait for somebody to say: I want to be a project manager — such a person will simply receive the task of leading. Of course, not everybody has these skills.
01:35:20
For somebody else it is better to go deep into the technology and become an expert in the database area or in ETL, or to be a technical consultant. Somebody else goes in the direction of business analyst: he communicates very easily, understands what people need and can describe it very well, but says: I don't want to be a manager,
01:35:40
because I don't want to have to deal with other people. So my advice is really: start by doing the simple and smart thing — do something technical. Programming is really a lot of fun. You see what the real world is, you see how your colleagues behave, what the problems are. You are one personality;
01:36:00
the colleague sitting next to you is a totally different personality. With this experience you can later be a much better manager, because you know what the reality is. Starting directly as a manager only to be disappointed — I can give you a negative example of that. Even in this bank, we suddenly needed to make one of our junior people a team lead for part of the team,
01:36:22
just helping the project manager. Suddenly the project was in a hot phase: people had to do overtime and work on the weekends.
01:36:40
And we had a consultant from our Canadian office, and she made a big scandal — she complained to the company owners that she was being forced to work on Sundays. At the same time, she belongs to a particular Christian denomination in Canada,
01:37:00
and for her it is simply not allowed to work on Sunday. It is not about some labor law; it is her own internal law. She complained and made a big scandal. And this young guy
01:37:21
could not handle it, because all he said was: we must work, we must work, because we are behind schedule. There are such cases. Instead of going out and being disappointed yourself, first collect the experience and see what the reality is. With that experience you can become a very good senior consultant.
01:37:42
You can then define a good concept. For example, it is really an illusion that you can come straight from university and do strategy consulting — that you go to some top manager and say: you need to build your data warehouse like this, and you need a budget of several hundred million euros,
01:38:01
and he will believe you. What happens with such people is that they end up just doing the PowerPoints, writing up material whose content was defined by somebody experienced. You think you are doing strategy consulting, but what you are really doing is assisting — writing the PowerPoint slides.
01:38:23
This is the reality: when it comes to consulting, the top managers want somebody experienced, who really knows the real world, what the problems can be, what the risks are. Here is what also usually happens
01:38:41
with junior, inexperienced people. You get your first tasks — okay, you have a master's degree, you are supposed to be good. But typically there is this fear of disappointing the manager. You receive a task that should be done in five days.
01:39:02
Five days — you say: okay, I can do this. The five days pass, the manager comes, and you are not ready — although you already knew on the first day that you could not cope with this timetable. The typical rule is that within the first 20% of the time
01:39:22
you already have the feeling whether you are going to make it or not. Then simply go and communicate: I think I cannot keep this deadline — what can we do? Should we take another resource, should somebody help me, or should we give the task to somebody more experienced? This is what is expected; this is the right thing to do.
01:39:43
That means you should be proactive. Do not just wait — typically the juniors just wait for somebody to ask: are you ready or not? Go and say: I am ready, or: I cannot be ready in five days — but do not wait until the fifth day. And learn SQL. This is typical:
01:40:02
luckily we also have people who studied here in Braunschweig, and one of them is one of our best guys. But we have had experience with people coming from other universities; they had their lectures
01:40:22
on SQL and databases, relational databases — and since we also have a test that everybody must take, we saw people who really could not understand a simple select statement and could not differentiate between WHERE and HAVING. These are really simple things.
01:40:41
And I can say, really, from my personal experience: I was in this situation myself. I also studied databases and some SQL at university; after that I spent two years doing only C++ programming, and then I decided to apply for a job at a company in Bulgaria that was an Oracle specialist.
01:41:02
They gave me a test, and I failed it because of the SQL — I simply had not made the effort to go and re-read the simple stuff. Within a few hours you can pick it up; if you don't have books, on the internet
01:41:21
there are thousands of examples. Really learn it, because in data warehousing and business intelligence this is the basics. If you don't know what aggregation is, how you group, how you filter the groups, you are really lost there. Okay, of course it is not that we only employ people
01:41:41
who already know everything — we look at the potential, and we know this can be learned quickly. But it is really disappointing when somebody, even with a master's degree, mixes up WHERE and HAVING, or cannot write a WHERE for some balance greater than 1000 or 2000.
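For illustration, the kind of simple statement meant here, on a hypothetical account table — WHERE filters rows before grouping, HAVING filters the groups after aggregation:

  -- Customers whose total balance over all their accounts exceeds 2000,
  -- counting only accounts with a balance greater than 1000.
  SELECT customer_id, SUM(balance) AS total_balance
  FROM   account
  WHERE  balance > 1000          -- row filter, applied before grouping
  GROUP  BY customer_id
  HAVING SUM(balance) > 2000;    -- group filter, applied after aggregation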
01:42:03
That is just my recommendation, my advice. Okay, I am actually done — we took more time than planned. I hope it was interesting. Thank you very much.
01:42:45
I was wondering which of them has the worst data and which of them has the nicest?
Actually, this is independent of the industry. The best data are maybe at a mid-sized
01:43:01
insurance company or bank that has only one source system. If the bank or the insurance company is not so big and has only one SAP solution, or one other solution that covers all of the cases, then you have a really fully integrated environment:
01:43:21
you have only one source system and you get all the data from there. Typically the data are clean, because even if you have problems, they can be fixed very, very quickly. The dirty data problems happen in the big companies, when they have hundreds of systems, hundreds of independent subsidiaries — companies like Volkswagen,
01:43:42
which keep buying and buying: now it is Porsche, tomorrow maybe something else. They are integrating the systems, but they cannot do it all at once; it happens over the years, if at all. And it is simply normal that if you take the same type of data from different legal entities,
01:44:02
they have different standards, and when they developed their systems they had different requirements, so it is natural that you have dirty data there. But you cannot say today that there is an industry where the data are the best. Of course, banking is very strictly controlled; they must do this, they are investing in such projects
01:44:21
in data quality and master data management — but even there it is not clean. You cannot change the operational systems; you do this in the data warehousing systems, and in the operational world the data remain, I think, not so good.