We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

ETL pipeline to achieve reliability at scale

Formale Metadaten

Titel
ETL pipeline to achieve reliability at scale
Serientitel
Anzahl der Teile
132
Autor
Lizenz
CC-Namensnennung - keine kommerzielle Nutzung - Weitergabe unter gleichen Bedingungen 3.0 Unported:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen und nicht-kommerziellen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen und das Werk bzw. diesen Inhalt auch in veränderter Form nur unter den Bedingungen dieser Lizenz weitergeben
Identifikatoren
Herausgeber
Erscheinungsjahr
Sprache

Inhaltliche Metadaten

Fachgebiet
Genre
Abstract
In an online betting exchange, thousands of money related transactions are generated per minute. This data flow transforms a common and, in general, tedious task such as accounting into an interesting big data engineering problem. At Smarkets, accounting reports serve two main purposes: housekeeping of our financial operations and documentation for the relevant regulation authorities. In both cases, reliability and accuracy are crucial in the final result. The fact that these reports are generated daily, the need to cope with failure when retrieving data from previous days, and the fast growing transaction volume obsoleted the original accounting system and required a new pipeline that could scale. This talk presents the ETL pipeline designed to meet the constraints highlighted above, and explains the motivations behind the tech stack chosen for the job, which includes Python3, Luigi and Spark among others. These topics will be covered by describing the main technical problems solved with our design: - Fault tolerance and reliability, i.e ability to identify faulty steps and only rerun those instead of the whole pipeline. - Fast input/output. - Fast computations.