
Streaming: Why should I care?


Speech Transcript
Yeah, thank you. Good morning and welcome to my talk, "Streaming: why should I care?". My name is Christian Trebing. I'm working at Blue Yonder, a German company doing machine learning: tailored solutions predicting what our customers need and what they should order. So that's my background, and now let's start with the talk.
What can you expect in this talk? First, I will give you some motivation for why it makes sense to look at streaming at all (that's also the title of my talk), which will give you some background on why I asked that question in the first place. Then I will give an introduction to streaming and its basics, and show you how you can use it in Python. Finally, I'll show some challenges that might await you when you go down this road, or maybe swim down the stream, so that you are prepared to tackle them.
OK, let's have a look at the situation we faced as a team for quite some time. We have a data processing application; as I already said, we're doing machine learning, and we get lots of data from our customers. We had a quite monolithic application: we get input data from the customer, typically in large XML files; we need to validate that data and make sure it has the right quality to go into the machine learning; we do the machine learning, which takes a lot of input data and writes lots of output data; and then we have to serve an interface so that the customer can query the results of the machine learning. This was quite fine, and the results are really good, but we had some issues with operating and extending it. We have one big database in between, where we store the data for all these boxes; we're using SQLAlchemy to access it from Python and Alembic for database upgrades. All of this works fine, but if you have operated an application of this kind you might know some of the pains that come with it.

For example: with several teams developing this application, there are dependencies, because we all depend on this one database. The customer desperately wants a new feature in the machine learning, and the machine learning team says: great, that's no big deal, we just have to change our query a little bit and write one more field to the output tables. But another team is also working on the database: they're refactoring, they're changing the data validation, and now those changes conflict. "Just give us two weeks, then we're finished." So the machine learning team can only say: well, now we have to wait. That's bad, but there's nothing we can do about it. That's one challenge you might face.

Another one is testing. With one big application it's always hard to establish clear boundaries and a solid testing strategy; you can try really hard, but this is at least not an architecture that supports you in it. I also liked the "testing in layers" talk from Monday, where you see all these nice little boxes and how you can separate them into layers. For sure you can do that with such a monolithic application, but it's just hard to do.
Then I went to EuroPython, already two years ago, in 2015, and I heard many talks about microservices. I really liked it; it was a great idea. It solves these issues: you split the monolithic application into several boxes, and you can handle the boxes separately. Each has its own data, each has its own upgrades, and the teams can develop independently. That's great, and I had lots of discussions after those talks about how we could use it for ourselves. But as I said, we have a data processing application, and for data processing we have to process lots of data. What we saw was that we cannot use this model in its purest form: we would have to transfer a lot of data between the services, and we would have to store lots of data in each service, since each service needs much of the data. So, as nice as it looked, unfortunately it was nothing for us. I still think microservices are great; there were lots of case studies, but they were for more transactional applications, and for us it just didn't fit.

Then, at other conferences, many people were talking about streaming, and here at EuroPython I have already heard many talks about it too. It's great to see, but most applications of it are in financial services, for stock markets, where they have lots of data and have to process it really fast, or in online advertising, where they have to process clickstreams and logs with really low latency. Those are the traditional domains of stream processing, and we are not in them. So what possibilities do we have, standing in the middle? We couldn't do pure microservices, and streaming seemed to be the domain of other people: interesting, but maybe nothing for us.

Fortunately, during a different project I came into contact with streaming platforms, and suddenly it came to my mind that this might be fine for us as well. We don't have millisecond clickstreams to process, but many things that might look like side effects of streaming, beyond the few-milliseconds processing, are really, really good, and we can use them to solve the issues we have. That's my reason for giving this talk: maybe you are in a similar situation, and I hope this will help you too and give you some ideas on how you can improve your applications.
So let's do an introduction to streaming; I'll give you a basic idea of what it is. My background, as I said, is a database-centric application, and therefore I want to compare the two worlds and show where the differences are.
Let's look at databases and streams. I heard a talk by Martin Kleppmann, which he gave at a conference in 2014, called "Turning the Database Inside Out", and I thought it was a very good way of thinking about databases and streams, because essentially a database and a stream contain the same information. A database internally has a changelog, and this changelog tells you what changed in your database tables. This changelog is essentially a stream, and from it you can reconstruct the database content at every point in time. Look at the example: the first entry says "change the row with key A to the value 1", and then you see that the table has one row with key A and value 1. Then the next entry comes in, "change row B to the value 5", and you see it in the table. Then comes "change row C to the value 3", and you see that in the table as well. Then it gets more interesting, because you have updates to existing rows: A gets the value 8, then A gets the value 4, and C gets the value 2. At each point in time you have a consistent table in your database. That's the basic similarity between tables and streams; databases use it internally, for example for replicating to different nodes.

But why does this matter for us? The most interesting thing in my situation was that the different services we have can be in different states. We don't have a dependency on one single state, because each service can have a different offset within that stream. Take this example: we have service 1, which is at offset 3, and its table says A is 1, B is 5, C is 3. We have a different service that is already further ahead, and there A is 4, B is 5 and C is 2. This is totally fine: both services are in a consistent state, and if service 1 catches up to the same offset, it will have the same information.

So the interesting thing is that you can have several services operating at different speeds, and this is exactly what we need for our application. At one point in time we might get lots of new data from the customer and write lots of data into that stream, and we have services that process it fast and services that process it more slowly; as long as they are able to catch up, that's totally fine. You can also add new services to the structure, which then process the stream from the beginning. You just have more control and more possibilities for scaling your services: not all services need to run at the same speed, and you can design for your needs. Some services are fine being slower, for example one producing some reporting, whereas on the other hand you want to answer the customer fast.
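To make that concrete, here is a minimal sketch (plain Python, not from the talk, using the values from the example above) of how replaying a changelog reconstructs the table at any offset:

```python
# A changelog stream: each entry assigns a new value to a key.
changelog = [
    ("A", 1),   # offset 1
    ("B", 5),   # offset 2
    ("C", 3),   # offset 3
    ("A", 8),   # offset 4
    ("A", 4),   # offset 5
    ("C", 2),   # offset 6
]

def table_at(offset):
    """Reconstruct the table after consuming `offset` changelog entries."""
    table = {}
    for key, value in changelog[:offset]:
        table[key] = value  # later entries overwrite earlier ones
    return table

print(table_at(3))  # {'A': 1, 'B': 5, 'C': 3} -- the service at offset 3
print(table_at(6))  # {'A': 4, 'B': 5, 'C': 2} -- the service further ahead
```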
OK, so we have that possibility, but how can we use it? As I said, some services might be fast and some might be slow; how can we influence the speed of our services? Well, we can program better, we can just improve our code, but that has a limited effect; sometimes we need more. Here one idea comes into play that is also very helpful in streaming: you can partition your streams. What partitioning makes sense is always a decision based on your business domain. For example, we get sales streams with information about what was sold, at which location and in which quantity, and we can use the location for partitioning our stream. Let's say we have a sales stream and three locations: Rimini, Bilbao (last year's Python conference) and Karlsruhe, where our company is based. We sell spaghetti and other products, at different times and in different quantities. Now we can have one process that works on all three partitions; that's fine, and it's maybe the best place to start. But when we see that we need to get faster, we have the possibility to split this up and introduce one or two new processes, and they can work in parallel on the different stream partitions: each partition can be handled by a different process. This sales stream example will stay with us for the rest of the presentation.
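With Kafka specifically, you typically get such partitioning by giving each record a key: records with the same key always land in the same partition and keep their relative order. A minimal sketch with the confluent-kafka client (the topic name and values are just the example data from above):

```python
from confluent_kafka import Producer

producer = Producer({'bootstrap.servers': 'localhost:9092'})

# The message key determines the partition, so all sales records for one
# location end up in the same partition and keep their relative order.
for location, item, quantity in [('Rimini', 'spaghetti', 10),
                                 ('Bilbao', 'spaghetti', 3),
                                 ('Karlsruhe', 'spaghetti', 7)]:
    producer.produce('sales-input', key=location,
                     value='{}:{}'.format(item, quantity))

producer.flush()  # block until all buffered messages are delivered
```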
How does it look now? We no longer have the database-centric picture, and not the microservices one either: we have the streaming platform in the middle, and the services can be at different offsets. So what did we gain? We have clearer boundaries for our services, and we can deploy, develop and operate them independently; the data lives mostly in the streaming platform. You might ask yourself why the processors still hold some data; I'll answer that later. And what about the database schema changes I talked about at the beginning, when the data validation team needs an update and the machine learning team does too? I'll save that question for later as well. So what we gained: independent development and upgrades, and more options for scalability. Did we get rid of databases completely? We have the streaming platform in between, but think of those data bubbles inside the services; I'll come back to that.

For me, one important question remained: this all sounds so great, it seems to solve so many problems, it sounds like magic. And magic is always something that makes me suspicious: maybe there's something I don't see, maybe new problems arise that I don't know of at the moment. And indeed, it is not magic, it is a trade-off. Databases are powerful: you have many guarantees given to you by the database, the ACID consistency guarantees, the transaction guarantees, the SQL language, which is such a powerful way to retrieve the data stored in the database. You can do nearly everything. But as we have seen, this comes at a price: we depend on one single state, and scaling a database is hard. So we have to ask: do we need all the things the database gives us in our applications, and which of them do we actually use?

With streaming we don't have the ACID guarantees anymore. What we have is an ordering per stream partition: the streaming platform guarantees that when we feed entries into a stream, each service that retrieves these entries will retrieve them in the same order. That might seem a small thing compared to the ACID guarantees, but it's fascinating what you can construct from it; given this ordering constraint on several streams, you are able to build many of the guarantees that you need. And we don't have SQL queries anymore: it is not possible to query a stream. This is something you really have to get into your head when thinking about it. You might be used to SQL, where you can query for any row with any value, or do a join on it; that is not possible here. The stream just flows through your processor, and your only possibility is to keep state yourself, to remember what the last value was. You have to decide whether you can live with that or not, and what mechanisms you employ to help with it. At least I feel better now: I know it's not magic that might come back at me at the worst moment. It's a conscious decision, and you can see the trade-off and judge whether it's good or bad for you.
OK, so much for the theory. Now, we are using Python; our company really loves to use Python, and we write most of our services in it. So how can we do all that in Python?
Let me take a step back. Apache Kafka is a streaming platform. It is not Python (it's implemented on the JVM), but we'll get to Python soon. Kafka is an example of a streaming platform: you can have producers that put data into the platform, consumers that retrieve data from it, and stream processors that take data, wrangle it, and put it back onto different streams. You can also have connectors to databases, to get data in and out. It's a really good streaming platform, it's used by many people, it's very scalable and battle-proven, so it's something you can build on.

There are also Kafka clients in Python; just yesterday we heard from others who are using them. There is pykafka, there is kafka-python, and there is the confluent-kafka client: those are the three I've seen, and other people have already done very nice comparisons of them that you can study in detail if you want. The interesting differences: pykafka and kafka-python are both written completely in Python; pykafka has a very pythonic interface, while kafka-python stays closer to the Java client's interface. The confluent-kafka client is not pure Python, it sits on top of the C library librdkafka, but it is the most performant of the three. We decided to use the confluent-kafka client. It has many configuration options, and it's really worth looking at them, because you can use them for performance tuning. At first I was a little surprised when I used the client and it seemed slow, but it was just my testing setup, where I had very few records, and some default settings are not meant for so few records: there is some buffering in there, and if you reduce it you get very low latency even in such a test setup.
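The talk doesn't name the exact options that were tuned; one plausible illustration, assuming librdkafka's producer buffering setting queue.buffering.max.ms, looks like this:

```python
from confluent_kafka import Producer

# librdkafka buffers outgoing messages briefly so it can batch them for
# throughput. In a test setup with only a handful of records, that buffering
# dominates the measured latency, so shrinking it makes the client look
# fast again for such tiny workloads.
producer = Producer({
    'bootstrap.servers': 'localhost:9092',
    'queue.buffering.max.ms': 1,  # buffer for at most 1 ms before sending
})
```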
So let's see how we can use these clients. The first thing: we have to produce something. I configure the producer with the bootstrap server, which is my Kafka node, on the default port 9092. I have some data, say ten units sold in Rimini on Monday, and we want to put it onto a stream called sales-input; we use JSON to serialize it.

Then we want to consume that. The consumer also needs to know the Kafka nodes to connect to, and it has some further settings, of which the topic config might be the most interesting for you: it tells the consumer whether to read only the new values appearing on the stream, or to process the stream from the beginning. By default you just see new values, but especially for testing it's very helpful to start at the beginning. We subscribe our consumer to the same topic, sales-input, and then the most important part: we poll the consumer, we check that this worked and that we actually received something, and then we just print the received message, and it shows up as the string we put in. Great, this already works.

As you noticed, we used JSON as the serialization format. That's fine for a starting point, but when you are working with many teams it's always good to have something well defined. With my old databases, the schema was defined by the database, so that everyone knew what was in there; for your streaming applications, too, you can use more rigid schemas instead of just putting JSON onto the stream.
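Reconstructed from the description above (the topic name and record contents are my best guess at what the slides showed), a minimal produce/consume round trip with confluent-kafka and JSON might look like this:

```python
import json
from confluent_kafka import Producer, Consumer

# --- producing ---
producer = Producer({'bootstrap.servers': 'localhost:9092'})  # 9092 is Kafka's default port
record = {'item': 'spaghetti', 'location': 'Rimini', 'quantity': 10, 'day': 'Monday'}
producer.produce('sales-input', value=json.dumps(record))
producer.flush()  # block until delivery

# --- consuming ---
consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'sales-printer',
    # Topic config: read the stream from the beginning instead of only new
    # values; especially helpful for testing.
    'default.topic.config': {'auto.offset.reset': 'earliest'},
})
consumer.subscribe(['sales-input'])

msg = consumer.poll(timeout=10.0)
if msg is not None and msg.error() is None:
    print(json.loads(msg.value().decode('utf-8')))  # the record we put in
consumer.close()
```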
We decided to go for Apache Avro. With Avro you define the schema first: you give data types, you define fields; there are many, many possibilities for defining a schema. It is also very compact: it doesn't just write out text in full, it has a compact encoding, which is very good if you want to save some space. But what excited me most about Apache Avro is that it also defines schema evolution. This means you can enhance a schema, for example add new fields, and Avro defines the criteria for how you may do that. For new fields you always have to give a default value, because that ensures that processors at an older state can still use the data: if a service reads data written by an older service, the default is applied, and if it reads data written by a newer service, the field is already there with a value. This is really great, and it solves one issue I promised to answer ten minutes ago: different teams wanting to enhance the schema. With Avro they can do that in a compatible way, different versions of records can coexist in the stream, and you don't have to reprocess all entries in the stream, as we would have done with a database upgrade.
How does that look in Python? Here we have the schema defined: it has a name, it says it's a record with many fields, and we state what the fields are and which types they have. We have strings, we have quantities as ints, and many other types are available. And as I said, you can declare a field optional by giving it a default value.
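A schema along those lines might look like this (my reconstruction: the field names follow the sales example, and the optional discount field with a default is invented to show the evolution rules):

```python
from confluent_kafka import avro

value_schema = avro.loads("""
{
    "namespace": "example.sales",
    "type": "record",
    "name": "Sale",
    "fields": [
        {"name": "item",     "type": "string"},
        {"name": "location", "type": "string"},
        {"name": "quantity", "type": "int"},
        {"name": "discount", "type": ["null", "double"], "default": null}
    ]
}
""")
```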
Now we use a different producer and consumer: the AvroProducer and the AvroConsumer. You can see that we additionally have to give a schema registry URL. The schema registry is where the schemas are registered, and every service can retrieve all versions of a schema from it. The registry also checks that you only make compatible changes, which is great: as soon as you try to register a new schema version that is not compatible, it raises an exception, and you know it right at that point. The other things are mainly the same: we pass our Avro schemas for the key and the value, and we no longer encode the data as JSON ourselves but hand it directly to the AvroProducer, which uses the Avro schema to encode it and writes it to the stream.
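A sketch of the producing side, reusing the value_schema from above (addresses are placeholders; the schema registry conventionally listens on port 8081):

```python
from confluent_kafka.avro import AvroProducer

producer = AvroProducer({
    'bootstrap.servers': 'localhost:9092',
    # Every service fetches schema versions from the registry; the registry
    # also rejects incompatible schema changes with an exception.
    'schema.registry.url': 'http://localhost:8081',
}, default_value_schema=value_schema)

# The record is a plain dict; the producer encodes it with the Avro schema
# and writes it to the stream.
producer.produce(topic='sales-input',
                 value={'item': 'spaghetti', 'location': 'Rimini',
                        'quantity': 10, 'discount': None})
producer.flush()
```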
For the consumer, the main new thing is again that we give it the schema registry URL, so that it knows how to interpret what is on the stream, and we can directly access the decoded values of the message. These examples are mainly adapted from the confluent-kafka client documentation, so you can also have a look there; take the time, there are some more explanations. These are the basics of how to write to and read from a stream.
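And the consuming side, again as a sketch with placeholder addresses:

```python
from confluent_kafka.avro import AvroConsumer

consumer = AvroConsumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'sales-reader',
    'schema.registry.url': 'http://localhost:8081',  # needed to decode messages
})
consumer.subscribe(['sales-input'])

msg = consumer.poll(timeout=10.0)
if msg is not None and msg.error() is None:
    sale = msg.value()  # already decoded into a dict using the Avro schema
    print(sale['item'], sale['quantity'])
consumer.close()
```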
So what do we do with all that? Let's have a look at data validation; remember, that was the second box we had in the architecture. What do we want to do? We have the sales input, and we need to check whether it is correct. We separate the valid and the invalid records, the same way as in the Cinderella fairy tale, where she has to separate the good peas from the bad peas; that's what we want to do here as well. Very basically, we just poll the new records. Let's say we have a function which checks whether a sales record is valid: whether the location is valid, whether the quantity is a number, whatever checks you can think of. If the record is valid, we write it to a new stream, sales-validated; if not, we write it to a new stream for the invalid records (call it sales-invalid) and let other processes handle that information, for example to answer the customer that he sent us invalid input.
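A sketch of such a validation processor, reusing the Avro consumer and producer from above; the is_valid checks and the known_locations set are hypothetical stand-ins for the real business rules:

```python
known_locations = {'Rimini', 'Bilbao', 'Karlsruhe'}  # from the location master data

def is_valid(sale):
    """Hypothetical business checks: known location, sensible quantity, ..."""
    return sale['location'] in known_locations and sale['quantity'] > 0

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error() is not None:
        continue
    sale = msg.value()
    # Good peas into the pot, bad peas into the crop: route every record
    # either to the validated stream or to the stream of invalid records.
    topic = 'sales-validated' if is_valid(sale) else 'sales-invalid'
    producer.produce(topic=topic, value=sale)
```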
A very interesting thing about streaming is that you can add additional processors in a very easy way. Let's say we want some new functionality, some more monitoring or reporting, and we write that monitoring or reporting output to a new stream. Or we have a new validation logic that we just want to try out: we don't want to put it directly into production, but we want to write its results to a different topic to check whether it works. Back when we had the database-centric application, we would have had to track the processing state: for every record in our sales table we would have needed to know whether it had been processed or not. A second validation service would have been really hard, because it would have to know "this record has been validated by that service, but not yet by me"; we would have had to introduce new fields for that. Here that is not the case: each service knows on its own how far it has processed, so it can work independently. That might or might not sound interesting, but as soon as you have tried it, it is really fun to work with, because it makes things much easier, especially when you are developing and want to try things out. So these are the basics of using streaming.
Now let's come to the challenges. As I said, we do machine learning, and machine learning, especially the training, is not something you do in a streaming fashion. There are online learning setups where that may be different, but for us, doing sales forecasts the way we do, training is something we run in batch. So how can these batch-like processes work within our streaming setup?
The machine learning application needs to get all of the data for the training in one batch, or maybe in several if it works in a partitioned way, but that is still a lot of data at once. So how do we get the input data? Remember, you can't query a stream; you can't ask it for the current values of everything. Somewhere we need to save the data. We have several options: we can keep it in the memory of our service; we can use a separate serving database, which doesn't need to be as powerful as the database of our monolith, a smaller one will do; or we can use a blob store, which is also just cheaper than a database. We save the data there, and then the machine learning application can use it.

Yes, this duplicates data, and it may feel wrong at first, especially if you come from a nicely normalized database schema, but that's a price you have to pay: we get several advantages in return, and we then have to live with the duplication. A way of thinking I found very helpful for reasoning about this is to differentiate between the write path and the read path. In our old, database-centric setup we had a relatively short write path: we put the validated data into the database, and the validation was finished. When we then start our machine learning, a machine learning query has to fetch all this data with a very big join statement and feed it into the training; that is the read path, and that query is something that really puts the database under pressure. You have to size the database so that it can work through such a big query in a relatively short time frame.

Now compare this to how we do it with streaming. The write path is longer, because the data validation writes to topics, and as soon as new data appears on a topic we can already begin the joining: we join the sales data with the location data, with the product information, and maybe with further data, and write the result to the blob store, where it sits and waits until the machine learning starts. We have lost normalization, because the data is now duplicated, but we have gained a big operational advantage: the write path is executed immediately, as soon as the data arrives, and when the machine learning starts it doesn't have to run that big join that puts the database under pressure; it consumes the data in exactly the format it needs.

How would such a thing look? As I said, we have the location data, the product data and the sales data. Locations and products are master data; that's not high-volume data, so we can keep it in the service, like a table, and join that additional information onto the sales stream. The joined data we append to a file, and this file sits there and waits until the machine learning starts. So this is one possibility for coping with the challenge that the machine learning is still batch.
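A sketch of such a joining service (topic and field names are illustrative, and the consumer is the one from the earlier sketches): the low-volume master data is kept in memory, every sales record is enriched with it, and the result is appended to a file that the batch training later reads:

```python
import json

locations = {}  # location id -> master data; low volume, kept in memory
products = {}   # product id  -> master data

consumer.subscribe(['locations', 'products', 'sales-validated'])

# The write path: join as soon as data arrives and append the enriched
# records to a file (or blob store); the training job later just reads
# this file, and no big database join is needed.
with open('training-data.jsonl', 'a') as out:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None or msg.error() is not None:
            continue
        record = msg.value()
        if msg.topic() == 'locations':
            locations[record['id']] = record
        elif msg.topic() == 'products':
            products[record['id']] = record
        else:  # a sales record: join it with the master data
            record['location_data'] = locations.get(record['location'])
            record['product_data'] = products.get(record['item'])
            out.write(json.dumps(record) + '\n')
```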
You might have noticed that I just said the location and product master data are kept in the processor. So there is state that we hold in a processor.
And state, as you might know, is the nightmare of every distributed system, because it's hard to handle. What state do we need? For pure streaming, the data could just rush through, but in our scenario, for example, we need to know the master data so that we can join that information onto the records; that's the data you want to join with. There are also other things that might be more subtle: when you do time-window processing, say you want to aggregate an incoming stream and know the sum every five minutes, then you need to keep the data of the last five minutes so that you can sum it up correctly, and you need to know when to start a new aggregation (I'll show a tiny sketch of such a window in a moment). There are different kinds of windows, hopping time windows, sliding time windows, and these possibilities put different requirements on the state. Previously the database did all of this for you: you could ask it for master data, you could ask it for the things from the last five minutes; now you have to handle that within your service.

That's fine, you might think, I just keep it in memory, I know what came in and everything is fine. But there are some challenges. A processor might fail; then it needs to restart, and all the state that was in its memory is lost. From where does it get the data for a restart? And something less dramatic, but just as relevant: scaling. At first we had one processor that took care of all locations, and now we want one processor per location; this also changes the state. Or the other way around: we had three processors, one per location, and we want to merge them into one; then at least the location master data now has to be merged into that one processor. How can we do that?

Well, as I said, we could just keep in memory all the state we might possibly need in the future; that's one option, but not the best one. We can reprocess the stream from the beginning to rebuild the state, but that might take a long time. Each processor could keep its own database instance and save its state there, so it can be reused on a restart; then you just need to know which processor connects to which database. You can save your condensed state to a stream, which is an interesting option because you already have the streaming platform. Or you can just ask a different service that hopefully knows all the master data: please give me everything I need to know.
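As promised above, here is what the five-minute aggregation could look like as a tumbling (non-overlapping) window if you keep the state yourself; a plain-Python sketch, not from the talk, and hopping or sliding windows need correspondingly more state:

```python
from collections import defaultdict

WINDOW = 5 * 60  # window length: five minutes, in seconds

# (window start, location) -> summed quantity. This dict is exactly the
# kind of state discussed above: if the process dies, it has to be rebuilt
# from the stream, a local store, or a compacted state topic.
sums = defaultdict(int)

def add_sale(timestamp, location, quantity):
    # Tumbling window: every event belongs to exactly one window, found by
    # rounding its timestamp down to the start of the window.
    window_start = timestamp - (timestamp % WINDOW)
    sums[(window_start, location)] += quantity

add_sale(1500000010, 'Rimini', 10)
add_sale(1500000200, 'Rimini', 3)
print(sums)  # both sales fall into the same five-minute window
```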
Several frameworks already exist to cope with this problem, and if you start your own approach it's very worthwhile to look at what these frameworks do and why they do it; you can really learn from their experience even if they are not in your language. For example, you can look at Kafka Streams or at Apache Samza. Up to yesterday my impression was that there was nothing we could use in Python, and that we would have to take what we learn from those frameworks and write it all ourselves, whether that makes sense or not. We were really searching for an answer, and then by chance I was in a talk from Winton yesterday, where they said that they had open-sourced, just yesterday, Winton Kafka Streams, a Kafka Streams implementation in Python. I'm really eager to have a closer look at it. They also said it's at an early stage and there are still some topics to solve, but it's great to have a starting point, to check whether we can contribute, and to get this functionality in Python; that would be really great for building more applications on. You see here the GitHub link: github.com/wintoncode/winton_kafka_streams.
Well, that's the end of my talk. Let's summarize what we have learnt: you have more options for your data processing applications than you might have thought, but you also know the trade-offs now. I want to encourage you to broaden your way of thinking about applications, to see whether there are more things you could use, and to check whether you can live with the downsides. You know the challenges and you know some possible solutions, so now go and build some pretty applications on that. That's it from my side.
Moderator: OK, thank you very much for the great talk. We have almost ten minutes for questions, great timing. Any questions? Any questions for Christian about streaming? There, we have a question.

Q: Very interesting technology; I've never used streams. How do you, or maybe the framework, deal with missing values? Say you have a network glitch somewhere, somebody trips over a network cable, and you miss two or three values in your stream. Do you care?

A: The framework guarantees that within a stream partition you get all the values; if it cannot guarantee that, you will get an error. So that is something you really can rely on. What you do have to deal with is that some streams might operate at different speeds than others. For us, one challenge (it was just too complex to bring into this presentation) is this: say the customer delivers location and product master data, and he also delivers sales data, and the location data is late because there is some issue on the node that processes it. You are then at a past offset for the locations, but at the current offset for the sales data, and now you would raise an error saying that a sales record is invalid because its location doesn't exist. That would be bad, because the customer would tell you: I already sent you that location, why do you give me an error? This is something your application logic has to cope with. We do it like this: we have one input service where all the data arrives, and it always assigns delivery IDs and processing timestamps recording when something was delivered. Our validation logic then has to check that the relevant master data records have also been processed. So that would be the solution for that issue; for the stream partitions themselves, you can always rely on getting all values, in the correct order.

Moderator: Great. We have about seven more minutes, so keep asking questions, folks. A question over there.
Q: Hi, thanks a lot for the great talk. It's really nice to see somebody else doing the same thing that we do in our company. I have a question about recovering the state in case your consumer crashes: did you try some sort of seeking back in the stream to find the latest state you could recover from?

A: It's not seeking back in the stream, but you can always take a snapshot of your data, so that you have, in some data file, say, the master data you have received; we use this also when we do some previews and data science on it. Together with that stored snapshot we also save the offset in the stream that corresponds to it. That way you always know from which point in time you have to reprocess the stream if you want to get the updates relative to such a snapshot. So that would be the way, rather than searching the stream for the updates that are missing.
Q: I like this approach. If I can just quickly show what we did: we implemented a time-based seek in the stream. Each of our messages, just like yours, contains a timestamp, so we implemented a sort of binary search over the stream, and with it we can return to a particular point in time. It works, but it's not very elegant, so I like your solution better.

A: Yeah, time-based seeking is hard, because we want to think about all of this in a distributed-systems manner, and then you have different processes that might have slightly different clocks. That's why we want to use the delivery IDs in the stream, which are guaranteed to always be in the same order; if you reference those, you can be sure that you are really at the correct point in the sequence.

Moderator: Excellent. Any other questions? OK, no other questions; then I have one myself.
Moderator: Christian, how do you integrate the data that is stored in the different databases?

A: Excuse me, can you repeat the question?

Moderator: The question is how you integrate the data that flows into several databases, on that write path where you are saving the data in several places.

A: For each stream processor you have different possibilities for saving its data; not every one uses a database, and not every one even needs to be queried. We want each service to not need to know anything about what technology the other services use: the communication really happens via the streams. How each service stores its data is internal to it, so that the service can be restarted; if another part of the application says "I need to know about the locations you have", then it queries that service for them. So it's not a question of database integration: the integration should work via the streaming platform, and the services should not need to know about each other's storage. The only place where we do need to know about it is, as I said, when we want to do some exploratory data science; that is not connected via the streams but goes directly to the output data, and there the data scientist knows where the data lies. But within the application, communication should be via the streams.

Moderator: We have time for one more question.
Q: Do you have any particular strategy for handling updates of data that has already been delivered to a topic? If you change your infrastructure, if you change the format or anything like that, do you do code changes, or how do you deal with it?

A: Let me see whether I understood the question correctly. You mean not just the schema, but, say, we need an additional validation or a new query for the machine learning, and how to handle that?

Q: The data you put into the topic is in some byte format, marshalled in some way. If you change that format, how do you tackle it?

A: For that you then really have to reprocess your data sets. If you have a different output format and it isn't compatible with what you have already written there, then you need to reprocess the data.

Moderator: We have no more time for questions, sorry. I'm looking for the next speaker; please identify yourself and come to the podium. And let's thank Christian again. Great questions, thank you.
thank you

Metadata

Formal Metadata

Title: Streaming: Why should I care?
Series Title: EuroPython 2017
Author: Trebing, Christian
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use, modify, and reproduce the work or content, and distribute and make it publicly available in unchanged or modified form, for any legal, non-commercial purpose, provided you credit the author/rights holder in the manner specified and pass on the work or content, including modified versions, only under the terms of this license.
DOI: 10.5446/33692
Publisher: EuroPython
Year of Publication: 2017
Language: English

Content Metadata

Subject Area: Computer Science
Abstract: Streaming: Why should I care? [EuroPython 2017 - Talk - 2017-07-13 - Anfiteatro 2] [Rimini, Italy] You think all that hype about streaming solutions does not affect you? I thought so too. But when playing around with the topic for some time, I realized that it sheds a different light on many topics I had struggled with for a while. In this talk I want to share what I discovered when switching from a database-centric view to stream-oriented processing: splitting your application into smaller services gets easier, as you have more natural boundaries; you have more options to run different data schema versions in different services (instead of one central db upgrade); more scaling possibilities; and operations improvements. To be sure, streaming does not solve every problem, but it solves much more than I thought before. And in Python you have good support with many streaming clients. I will give some examples and comparisons for working with Kafka and Avro schemas.
