
From telemetry data to CSVs with Python, Spark and Azure Databricks

Formal Metadata

Title
From telemetry data to CSVs with Python, Spark and Azure Databricks
Title of Series
Number of Parts
115
Author
Contributors
License
CC Attribution - NonCommercial - ShareAlike 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Tenova is an engineering company that works alongside client-partners to design and develop innovative technologies and services that improve their business, creating solutions that help metals and mining companies reduce costs, save energy, limit environmental impact and improve working conditions for their employees.
In the context of Industry 4.0, Tenova equips each piece of equipment with a field gateway, named Tenova Edge, which collects telemetry data, performs edge analytics with AI models and sends data to the Tenova Platform (hosted on Microsoft Azure) for further processing.
To develop analytics solutions, data scientists and process engineers need the data in a manageable format. Furthermore, continuous retraining of AI models is necessary to guarantee high performance and reliable results. For all of these reasons, we needed to implement an ETL solution that transforms the raw data into formats ready for analysis and retraining.
In particular, the key requirement was to convert the JSON Lines files coming from the field into CSV files ready to be used. The CSV files have to satisfy the following conditions:
- each file contains the data for a single device
- there is only one file per device per day
- each file has a midnight row containing, for each cell, the value recorded at midnight or the last value of the previous day (SPOILER: here is where the fun happens!)
For this purpose, we have implemented a series of Databricks notebooks, run daily by Azure Data Factory, that leverage PySpark and pandas to turn the raw JSON Lines files into nicely formatted CSVs.
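
The midnight-row condition described in the abstract can be illustrated with a minimal pandas sketch. It is not the speakers' implementation: the record fields (`device`, `timestamp`, `variable`, `value`), the sample data and the helper name `daily_table` are all assumptions made for illustration, and the real notebooks also use PySpark, which is left out here to keep the example self-contained.

```python
import io

import pandas as pd

# Hypothetical JSON Lines payload; the field names are assumptions.
RAW = """\
{"device": "furnace-1", "timestamp": "2021-07-27T22:15:00", "variable": "temp", "value": 1510.0}
{"device": "furnace-1", "timestamp": "2021-07-27T23:40:00", "variable": "pressure", "value": 2.1}
{"device": "furnace-1", "timestamp": "2021-07-28T00:00:00", "variable": "temp", "value": 1498.0}
{"device": "furnace-1", "timestamp": "2021-07-28T06:30:00", "variable": "pressure", "value": 2.4}
{"device": "furnace-1", "timestamp": "2021-07-28T09:00:00", "variable": "temp", "value": 1502.0}
"""


def daily_table(raw_jsonl: str, device: str, day: str) -> pd.DataFrame:
    """Build the one-device, one-day table with its midnight row (hypothetical helper)."""
    records = pd.read_json(io.StringIO(raw_jsonl), lines=True)
    records = records[records["device"] == device].copy()
    records["timestamp"] = pd.to_datetime(records["timestamp"])

    # One row per timestamp, one column per variable.
    wide = records.pivot_table(
        index="timestamp", columns="variable", values="value", aggfunc="last"
    )

    midnight = pd.Timestamp(day)
    next_midnight = midnight + pd.Timedelta(days=1)

    # Make sure a midnight row exists, then fill each of its cells with the
    # value recorded at midnight or, failing that, the last earlier value.
    wide = wide.reindex(wide.index.union(pd.DatetimeIndex([midnight])))
    wide.loc[midnight] = wide.ffill().loc[midnight]

    # Keep only the rows that belong to the requested day.
    in_day = (wide.index >= midnight) & (wide.index < next_midnight)
    return wide.loc[in_day]


table = daily_table(RAW, "furnace-1", "2021-07-28")
# One CSV per device per day; here it is rendered as text instead of a file.
csv_text = table.to_csv(index_label="timestamp")
print(csv_text)
```

In this sample the midnight row takes `temp` from the reading recorded exactly at midnight and `pressure` from the last reading of the previous day. Rows later in the day are left as recorded, since the abstract only specifies how the midnight row is filled.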
Transcript: English (auto-generated)