
Distributed Workflows with Flowy


Transcript
Hello everyone, thank you for joining me. My name is Sever, and this presentation is about a library I'm working on called Flowy, which makes it easy to model and run distributed workflows. There will be a Q&A section at the end, and if there is not enough time you can also stop me during the talk with questions if anything is unclear. OK, so let's get started.

We'll start by discussing a bit what a workflow is, then I'll show you a quick demo and spend the next part of the presentation trying to explain what happened during the demo. The term workflow is used in many different contexts, but for our purposes a distributed workflow is some kind of complex process composed of a mixture of independent and interdependent units of work called tasks. Usually workflows are handled with DAGs, which stands for directed acyclic graphs: think dependency graphs between the tasks, modeled using some domain-specific language. Or they are handled with ad-hoc code, like when you have a job queue: what you really try to accomplish is one entire workflow, so the tasks in the job queue not only do some work but also schedule the next steps that should happen in the workflow. Neither of those provides a good solution. A DAG description is static, so you usually cannot have anything dynamic happening there, and the ad-hoc job-queue approach tends to create code that is hard to maintain, because the entire workflow logic is spread across all the tasks that are part of the workflow. Another problem with the ad-hoc approach is that it is usually very hard to synchronize tasks: if you want a task to start only after other tasks have finished, that is usually pretty hard to do. Flowy takes a different approach to the workflow modeling problem: it uses single-threaded Python code and something I call gradual concurrency inference.

Here is an example of a video processing workflow. At the top we have some input data, in our case two URLs, for the video and the subtitles. The workflow processes this data: it overlays the subtitles on the video and encodes it into some target formats, but it also tries to find chapters, cut points in the video, extracts thumbnails from them, and analyzes the subtitles to target some ads for this video. The interesting thing here, something that cannot easily be done with DAGs, is the part where the thumbnails are extracted: this is dynamic, and the number of thumbnail extraction tasks varies based on the video. This is where you need some flexibility. Next I would like to show you how this workflow is implemented in Flowy, and then I'll try to explain what really happened there.
All right, so here are the activities, already defined. In this case I'm using some stubs: you can see all of them have a time.sleep call, just to simulate that they're doing something. They are regular Python functions; there is nothing special about them. They just get some input data, do some processing, and output the result. This is similar to what you would write for any regular job queue. And this is the workflow code, the code that implements the workflow we saw earlier. Again, it's regular Python code that just calls the task functions. There is one funny thing about it: it is a closure, and we are not importing the task functions themselves. This is a kind of dependency injection, and we will see later why it is useful. Other than that, the workflow function is regular Python code. I'm going to demonstrate that there is nothing special about it by running this code: I import all the tasks and the workflow function, pass the tasks to the workflow closure, and then call the closure with the input data. This runs the workflow code sequentially.
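The slides themselves are not captured in this transcript, so here is a minimal sketch of what the demo's task stubs and workflow closure might look like. All names and bodies here are reconstructions for illustration, not the actual demo code:

```python
import time

# Task stubs: regular Python functions that just sleep to simulate work.
def download(url):
    time.sleep(1)
    return 'video-data'

def find_chapters(video):
    time.sleep(2)
    return [10, 55, 130]                 # cut points, in seconds

def extract_thumbnail(video, chapter):
    time.sleep(1)
    return 'thumb@%d' % chapter

def encode(video, fmt):
    time.sleep(3)
    return 'encoded-as-' + fmt

# The workflow is a closure: the tasks are injected, not imported.
# This dependency injection is what later lets Flowy substitute proxies.
def make_workflow(download, find_chapters, extract_thumbnail, encode):
    def workflow(video_url, subtitle_url):    # subtitle handling omitted
        video = download(video_url)
        chapters = find_chapters(video)
        # Dynamic fan-out: the number of thumbnail tasks depends on how
        # many chapters were found -- hard to express in a static DAG.
        thumbs = [extract_thumbnail(video, c) for c in chapters]
        return encode(video, 'webm'), thumbs
    return workflow

# Called as plain Python, everything runs sequentially.
wf = make_workflow(download, find_chapters, extract_thumbnail, encode)
print(wf('http://example.com/video', 'http://example.com/subs'))
```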
Most of the time is spent in the sleeps, so it will take a while because of the time.sleep calls forcing each task to sleep. And hopefully... yeah, that's what happens.
Hmm, something seems to be wrong; let me try an alternative. It should work: it is just regular Python, so there is no reason for it not to run. Anyway, that is not the interesting part. Running that code takes about 10 seconds, because of all the sleeps and because everything happens in sequence.
The interesting part is being able to run the same code as a workflow and have all the possible concurrency happen automatically. Let me try to do that. OK.
It went much faster, about 2 seconds. The reason is that all the tasks that could be executed in parallel were executed at the same time, as we can see in the diagram that was generated: each arrow represents a dependency between two tasks, and a lot of tasks were executing at the same time. Next, I'll try to explain how that works and why this version ran so much faster than the sequential one.
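For reference, running the same closure through the local backend might look roughly like this. The names LocalWorkflow, conf_activity, and run are assumptions based on my recollection of Flowy's documentation at the time and may differ; check the project's docs:

```python
# A sketch of running the demo workflow through Flowy's local backend.
from flowy import LocalWorkflow

lw = LocalWorkflow(make_workflow)
# Map each injected dependency name to the task function that fulfils it.
lw.conf_activity('download', download)
lw.conf_activity('find_chapters', find_chapters)
lw.conf_activity('extract_thumbnail', extract_thumbnail)
lw.conf_activity('encode', encode)

# Tasks whose dependencies are satisfied run in parallel, so the total
# time is roughly the critical path, not the sum of all the sleeps.
result = lw.run('http://example.com/video', 'http://example.com/subs')
print(result)
```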
In order to understand what happened during the demo, I have to talk about workflow engines. We begin with a simple task queue: we have all the tasks that we want executed in a queue, and workers are polling tasks from the queue and running them. As I said, when you implement a workflow this way, there must be some additional code in the tasks that schedules other tasks when they finish, so besides the data processing they are doing, the tasks also generate other tasks. This is not very good, because the workflow logic gets spread around, and, as I said, it is also very hard to synchronize between different tasks.

Another idea is not to have the tasks generate other tasks, but to introduce a special type of task called a decision. Instead of doing data processing, a decision only schedules other tasks in the queue, so it acts as a kind of orchestrator. As we can see here, there is an arrow from the storage to the worker because the decision reads data from the data store to get a snapshot of the workflow history and the workflow state; based on that state, on all the tasks that have finished, it comes up with the tasks that must be executed next. But this solution is still not very good, because you can have concurrency problems: if two tasks finish right after one another, you can get two decisions scheduled, and if those are executed in parallel by two workers, they will generate duplicate tasks in the queue. So this is not a perfect solution either.

We improve it one more step: the queues need to be managed in such a way that all the decisions for a particular workflow execution happen in sequence, and for this we introduce another layer that ensures this serialization. We also want to add some kind of time tracking, so that we know how much time a worker has spent running a task and can declare a task as timed out if a certain amount of time passes without it making any progress.

This is not something new: this kind of workflow engine is implemented and provided by Amazon as SWF, Simple Workflow, as a service. It is also available through an open-source alternative project with the same API that Amazon has, and there is another engine similar to this in the works that I know of. And there is the local backend we used earlier in the demo: the local backend creates the engine and the workers on a single machine, runs them only for the duration of the workflow, and then everything gets destroyed.

This was the workflow code in the demo, and hopefully by now you get a sense that this code will run multiple times: every time a decision is to be made for this workflow, to make progress, this code is executed again. So if I were to put a print statement in there and run the workflow, I would see a lot of print messages.
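To make the replay idea concrete, here is a toy model, illustrative only and not Flowy's internals, of how re-executing the same workflow code with injected proxies yields scheduling decisions:

```python
# Toy replay loop: the same workflow code is re-executed on every
# decision, with finished results filled in and everything else left
# as a placeholder.
PLACEHOLDER = object()

def make_proxy(name, results, to_schedule):
    def proxy(*args):
        if name in results:              # task already finished
            return results[name]
        if PLACEHOLDER not in args:      # all dependencies satisfied
            to_schedule.append((name, args))
        return PLACEHOLDER
    return proxy

def decide(workflow, task_names, results):
    to_schedule = []
    proxies = {n: make_proxy(n, results, to_schedule) for n in task_names}
    final = workflow(**proxies)()        # replay the workflow from the top
    return to_schedule, final

# Example workflow: b depends on a's result; c is independent.
def wf(a, b, c):
    def run():
        return b(a(1)), c(2)
    return run

# First decision: nothing finished yet -> schedules a(1) and c(2).
print(decide(wf, ['a', 'b', 'c'], {}))
# Later decision: a and c finished -> b becomes schedulable.
print(decide(wf, ['a', 'b', 'c'], {'a': 10, 'c': 4}))
```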
I mentioned dependency injection earlier and why it is needed: the reason is that Flowy injects proxies instead of the task functions. The proxies are callable and act just as a task would, but they are special. When a proxy is called, the call itself is non-blocking, so it returns very fast, and the return value of the proxy is a task result. A task result can have three different types. It can be a placeholder, when we don't have a value for that task yet, perhaps because the task is currently running. It can be a success, if the task completed successfully and we do have a value for it. Or it can be an error, if for some reason the task failed. The other thing a proxy call does is look at its arguments and try to find other task results among them: if any of the arguments is a placeholder, the current activity or task cannot be scheduled yet, because it has dependencies that are not yet satisfied. In this way the results of previous proxy calls are traced through the entire workflow.
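As a sketch, the three result states could be modeled like this; the class names are illustrations of the concepts just described, not Flowy's actual types:

```python
# Illustrative model of the three task-result states.
class TaskResult:
    """Returned immediately by every non-blocking proxy call."""

class Placeholder(TaskResult):
    """No value yet: the task is still running or not yet scheduled."""

class Success(TaskResult):
    def __init__(self, value):
        self.value = value           # the task finished and produced a value

class Error(TaskResult):
    def __init__(self, reason):
        self.reason = reason         # the task failed for some reason

def is_schedulable(args):
    # A proxy call schedules its task only if no argument is still a
    # placeholder, i.e. only if all dependencies are satisfied.
    return not any(isinstance(a, Placeholder) for a in args)
```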
We can see it here: when the first task is called for the first time in a workflow execution, that particular task is scheduled and its result is a placeholder, because we don't have a value for it yet. The calls for find_chapters and encode won't schedule any activities, because they have a placeholder among their arguments, meaning they have unsatisfied dependencies, and their results will also be placeholders. So what this does is actually build the DAG dynamically at runtime, by tracing the results of proxy calls into the arguments of other proxy calls. Finally, a workflow finishes its execution when the return value contains no placeholders, meaning that all the activities or tasks needed to compose the final result have finished. As you can see here, this is true even for data structures: here we have a tuple with task results inside it, and this continues to work; the same goes for dictionaries or lists, those get picked up too. You can use any kind of data structure for the return value, as long as it can be JSON-serialized, because that is what is used for serialization.

There are a couple of important things to keep in mind when you're writing a workflow. Basically, you want all the decision executions to have the same execution path through your code for the same workflow instance. This usually means you have to use pure functions in your workflow; if you want some kind of side effects, either send those values in through the workflow's input data, or have dedicated activities or tasks for them.

The other thing you can do with a task result is use it as a plain Python value, like we see here: I'm squaring two numbers and then adding them together. When this happens, if any of the values involved is a placeholder, meaning there is no result for it yet, a special exception is raised that interrupts the execution of the workflow function. In effect, this acts as a barrier in your workflow: nothing gets past it until the values for both of the results involved are available. It also means that code after this point that could be concurrent can't be detected as such, so you should make sure to access values as late as possible to get the greatest concurrency. A similar thing happens in the video encoding example, where we iterate over the chapters found in the video: the iteration acts as a barrier, but since it is at the bottom it doesn't affect the rest of the code, so you may not even have noticed it.

Another example is a situation like this one: I'm squaring two numbers, and then I may want to do some additional computation on each of them. It is not clear in what order the if conditions should be written, because if the squaring of b is the first to finish, the conditional on the a value will still have to wait until the result for a is available before the workflow can make progress. No matter how I reorder the code, there will always be a case where the workflow cannot make progress even though other values are available.
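Here is a sketch of the barrier and the ordering problem just described, written as plain workflow-style code; square stands for an injected task proxy, and the names are illustrative:

```python
def make_square_workflow(square):
    def workflow(x, y):
        a = square(x)
        b = square(y)
        # Using task results as plain Python values forces evaluation: if
        # either value is still a placeholder, a special exception suspends
        # this replay until the value is available -- an implicit barrier.
        # The ordering problem: whichever condition is written first forces
        # a wait on that value, even when the other is already available.
        if a > 10:           # waits on a first ...
            a = square(a)
        if b > 10:           # ... even if b finished long ago
            b = square(b)
        return a + b
    return workflow

# As plain Python it still runs sequentially:
print(make_square_workflow(lambda n: n * n)(3, 4))   # 9 + 256 = 265
```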
This is a problem, but it can be solved with something called a sub-workflow. Here I factored the code that processes each number out into a separate sub-workflow, and in the main workflow I use the sub-workflows just as I would use a regular task. This way they can all make progress in parallel, and when both are finished I can sum them and return the result. Sub-workflows are a great way to do more complex things that you couldn't do without them. Another thing to notice: in the main workflow I didn't have to do anything special to use the sub-workflows; they are used just like regular tasks.
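A sketch of that refactoring, with the per-number processing pulled into its own sub-workflow; all names here are illustrative:

```python
def make_process_one(square):
    # Per-number processing factored into its own sub-workflow.
    def process_one(n):
        v = square(n)
        if v > 10:           # the barrier now only waits on *this* value
            v = square(v)
        return v
    return process_one

def make_main(process_one):
    def main(x, y):
        a = process_one(x)   # invoked like a regular task ...
        b = process_one(y)   # ... so both can make progress independently
        return a + b         # final barrier: waits for both sub-workflows
    return main

# Plain-Python check of the same code:
process_one = make_process_one(lambda n: n * n)
print(make_main(process_one)(3, 4))    # 9 + 256 = 265
```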
For error handling, you might expect the code to look something like the usual Python pattern, with a try/except around the task call. But that is not possible here: as I said earlier, the proxy call is non-blocking, so you cannot catch the exception at the call site. The place where you actually have to write the try clause is where the value is used, because only at that point do we force the evaluation of the result, and only at that point do we know for sure whether the computation was successful or not. This looks a bit strange, and I don't like it too much, so there is a better way of doing it, using the wait function that comes with Flowy. What wait does is dereference the proxy's task result, just as doing an operation on it would, and the name is a reminder that it acts as a barrier: nothing past this point will run, or be detected as potentially concurrent, until this value is available.

But maybe you don't want to use a value in the workflow itself; you just want to pass it from one task to another. How do you deal with errors then? What happens is that errors propagate: when you pass an error in the arguments of another proxy call, that call will also return an error, so errors propagate from one task to the next. And if the result value you return from the workflow contains an error, the workflow itself fails. So you cannot dodge errors: you have to deal with them, or you can ignore them by not making them part of the final result, in which case you get a warning message that some errors were not picked up or handled by your code.
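A sketch of what that looks like; wait is named in the talk, but the import path and the exception handling details here are assumptions, so check Flowy's documentation for the exact error type:

```python
# Error handling happens where the value is forced, not at the call site.
from flowy import wait

def make_workflow(risky_task):
    def workflow(x):
        result = risky_task(x)       # non-blocking: no exception raised here
        try:
            value = wait(result)     # barrier: forces evaluation of the result
        except Exception:            # ideally catch the specific task-error type
            value = None             # the task failed; decide what to do
        return value
    return workflow
```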
Workflows can also scale by using the other backends I mentioned earlier, like Amazon SWF. When you want to scale, basically nothing changes in the workflow: you still use the same code. There is some additional configuration you have to do, but that happens outside of the code. When you run the workflow across multiple machines in a distributed system, there can be all kinds of failures, and the execution timeouts you can set add to the fault tolerance. There is also a new type of error you can get when you scale: the timeout error, which is a subclass of the task error we saw earlier, so you can have special handling for timeouts. There are automatic retry mechanisms in place, and you can also trap timeouts and handle them as you wish.

There is also the notion of heartbeats. A heartbeat is a callable that a task can call; calling it sends a message to the backend saying that the current task is still making progress. It also returns a boolean value, which the task can use to find out whether it has already timed out, in which case it can abandon its execution, because even if it finished successfully its result would be rejected by the backend. Related to this, you should aim to write tasks in such a way that they can run multiple times, because of the failures that can happen and the retries. The tasks, or activities (I use the two terms to mean mostly the same thing), can also be implemented in other languages, so you can use Flowy only for the orchestration and workflow modeling, the engine and the decision logic. There are some restrictions on the size of the data that can be passed as input to a task and on the result size.

When you scale to multiple machines, you have workers that run continuously, not like the local backend where they ran only for the duration of the workflow, and those workers are single-threaded and single-process, so if you want more of them on a machine you have to use your own process manager to start them and make sure they stay alive. Another thing to watch is the history getting too large: the decisions must use the workflow execution history and the workflow state to make their decisions, and the history, and with it the data that has to be transferred, can grow exponentially. You can limit that by using sub-workflows: a sub-workflow appears as only a single entity in its parent's history, so by using sub-workflows in a smart way you can keep the history and the data transfer small.

Because of the fault tolerance built in, you can also scale down. For example, all the workers could die at some point in time; after a while, when they come back online, the workflow progress is not lost. You may lose the progress of specific tasks, but the workflow progress itself won't be lost. This is very useful for workflows that take a very long time to run; I think the maximum duration for an Amazon workflow is something like one year, so this can be very useful in some situations. You can also scale up very easily: just start new machines, and they will connect to the queues and start pulling tasks that need to be executed.

Thank you, that was all. If you have questions, I think now is a good time.
Q: How does it compare with Celery? A: Celery is a distributed task queue, or job queue, and what Flowy does is a bit different: here you have orchestration of the tasks. If you have many tasks and you want them to cooperate in a certain way, with dependencies between them and data passed between them, you can do that by writing single-threaded code, and from that single-threaded code the dependency graph is inferred for you; Flowy makes sure the tasks are scheduled in the correct order and get the data they need. So I would use Celery for one-off jobs, like sending an e-mail, but not for hundreds of jobs that are somehow interdependent. Celery also has Canvas, which is more or less similar, but there you define your workflow topology upfront; you can't do it in the dynamic way you can with single-threaded code, where you can have conditionals and loops and so on.

Q: What asynchronous library do you use? A: I don't think I'm using any asynchronous library. For the local backend I'm using the concurrent.futures module to implement the workers; there is no asynchronous library involved.

Q: In your example workflow, one of the tasks returns a list of chapter points, and that gets fed into something that does thumbnails for the chapters. Does that essentially block until every single chapter is found? Or would it be possible, maybe with changes, to support generator functions, so you could start building the thumbnail of the first chapter while the task is still finding the later chapters? A: Yes, here it will block: any code below that line won't be executed until we have the chapters, because find_chapters returns a list, which is a single result, and we cannot get partial results from a task.
We have to wait until the entire result is available; anything below that point is blocked until then. Usually this isn't such a big problem, because there are ways to write the code so that it doesn't become one. And if it is a problem, you can create a sub-workflow: I could have a sub-workflow that finds the chapters and does the thumbnail generation, and then call that sub-workflow from here, so it runs in parallel with the rest of the code.

Q: Just to follow up on this example: could you start processing immediately, or are you waiting for the video encoding to finish? A: Well, in this example, all the tasks that can be executed in parallel will be executed in parallel, so the actual execution topology will look exactly like this diagram. That's why the workflow duration was about 2 seconds instead of 11 or so.

Host: Time for one last question.
Q: Kind of a repeat of the previous one; it was a good point about find_chapters returning a list that is then fed into the thumbnail function. Could you convert the chapter-finding task into something like a generator, so it returns the next chapter each time, and the first chapter can flow into the system while the rest are still being found? A: Yes, you could have a task that only finds the first chapter and returns it, and then call the task again and it will resume from that point; you can actually pass the last chapter back in and find the next one, and this way you can solve the problem if you want to. It really depends on how you write your code. The one rule you have to remember is: when you try to access a value in the workflow, it will block until the value is available. That's basically the only thing you need to know. Anything below that point won't be detected and cannot be concurrent, and that can be solved with sub-workflows.

Host: Thank you very much for the talk. A: Thank you.

Metadata

Formal Metadata

Title Distributed Workflows with Flowy
Series Title EuroPython 2015
Part 21
Number of Parts 173
Author Banesiu, Sever
License CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use, modify, and reproduce, distribute, and make publicly available the work or its contents in unchanged or changed form for any legal, non-commercial purpose, provided that you credit the author/rights holder in the manner specified by them and pass on the work or its contents, including in changed form, only under the terms of this license
DOI 10.5446/20202
Publisher EuroPython
Publication Year 2015
Language English
Production Place Bilbao, Euskadi, Spain

Content Metadata

Subject Area Computer Science
Abstract Sever Banesiu - Distributed Workflows with Flowy This presentation introduces Flowy, a library for building and running distributed, asynchronous workflows built on top of different backends (such as Amazon’s SWF). Flowy does away with the spaghetti code that often crops up from orchestrating complex workflows. It is ideal for applications that do multi-phased batch processing, media encoding, long-running tasks, and/or background processing. We'll start by discussing Flowy's unique execution model and see how different execution topologies can be implemented on top of it. During the talk we'll run and visualize workflows using a local backend. We'll then take a look at what it takes to scale beyond a single machine by using an external service like SWF.
Keywords EuroPython Conference
EP 2015
EuroPython 2015
