Data pipelines with Celery: modular, signal-driven and manageable

Formal Metadata

Title
Data pipelines with Celery: modular, signal-driven and manageable
Title of Series
Number of Parts
131
Author
Contributors
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Writing pipelines for processing large datasets has its challenges: processing data within an acceptable time frame, dealing with unreliable and rate-limited APIs, and handling unexpected failures that can leave data incomplete. In this talk we'll discuss how to design and implement modular, efficient, and manageable workflows with Celery, Redis, and signal-based triggering. We'll begin by exploring the motivation behind segmenting pipelines into smaller, more manageable ones. Segmentation simplifies development, enhances fault tolerance, and improves modularity, making it easier to test and debug each component. By leveraging Redis as a data store and Celery's signals, we introduce self-triggering (or looped) pipelines that efficiently manage data batches within API rate limits and system resource constraints. We will look at an example of how we did things in the past using periodic tasks, and at how the new approach simplifies the workflow and increases our data throughput and completeness. The same mechanism also makes it easy to trigger follow-up pipelines that persist and report results, which allows analysis of and insight into the processed data. This can help us tackle inaccuracies and optimise data handling in budget-sensitive environments. The talk offers attendees a perspective on designing data pipelines in Celery that they may not have seen before, and shares techniques they can use to build more effective and maintainable data pipelines in their own projects.
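The self-triggering (looped) pipeline idea from the abstract can be illustrated with a minimal, library-free sketch. It is not the speaker's implementation: the in-memory `pending` deque stands in for a Redis list, `task_queue` and `worker_loop` stand in for a Celery broker and worker, and the `on_success` decorator stands in for a receiver connected to Celery's `task_success` signal. All names and the batch size are hypothetical.

```python
from collections import deque

BATCH_SIZE = 3  # hypothetical per-run limit, e.g. chosen to fit an API rate limit

pending = deque(range(1, 9))  # stand-in for a Redis list of record IDs to process
processed = []                # stand-in for persisted results
task_queue = deque()          # stand-in for the message broker
success_handlers = []         # stand-in for receivers of a task-success signal
run_count = 0


def delay(task):
    """Queue a task for later execution (loosely mirrors calling .delay())."""
    task_queue.append(task)


def on_success(handler):
    """Register a handler to be called after any task finishes successfully."""
    success_handlers.append(handler)
    return handler


def process_batch():
    """Take at most BATCH_SIZE records from the store and process them."""
    global run_count
    run_count += 1
    size = min(BATCH_SIZE, len(pending))
    batch = [pending.popleft() for _ in range(size)]
    processed.extend(batch)
    return len(batch)


@on_success
def retrigger(task, result):
    """Queue the same task again while records remain; otherwise the loop ends."""
    if pending:
        delay(task)


def worker_loop():
    """Run queued tasks one by one and fire success handlers after each."""
    while task_queue:
        task = task_queue.popleft()
        result = task()
        for handler in success_handlers:
            handler(task, result)


delay(process_batch)  # a single initial trigger starts the loop
worker_loop()
```

With eight records and a batch size of three, the task runs three times and then stops on its own because the store is empty. In a real deployment the re-trigger would presumably be scheduled with a delay so that consecutive batches stay within the API rate limit; that detail is omitted here.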
Transcript: English (auto-generated)