Feeding a real-time user interface

Video thumbnail (Frame 0) Video thumbnail (Frame 6443) Video thumbnail (Frame 7974) Video thumbnail (Frame 19262) Video thumbnail (Frame 30550) Video thumbnail (Frame 32176) Video thumbnail (Frame 36024) Video thumbnail (Frame 36506)
Video in TIB AV-Portal: Feeding a real-time user interface

Formal Metadata

Feeding a real-time user interface
Title of Series
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date

Content Metadata

Subject Area
Feeding a real-time user interface [EuroPython 2017 - Talk - 2017-07-11 - Anfiteatro 2] [Rimini, Italy] Imagine you have some streaming computations running on a server. Client programs subscribe to real-time updates, so that they may visualise the computations for end users. How do you share this constantly changing server state with all connected clients? Sending an entire snapshot after each change is very inefficient, so you must implement some sort of incremental updates – diffs. But how do you generate these diffs on the server? And how do you represent them so the clients know how to apply them to update their own state? We have been working on these problems for a long time while building a stock trading platform in Python. I'd like to show you a couple of open source libraries that we developed for this purpose, and share our experience with tracking state and propagating it to user interfaces running in other processes
Intel Server (computing) Digital electronics User interface Code State of matter Decision theory Multiplication sign Real-time operating system Streaming media Coprocessor Number Web 2.0 Order (biology) Mathematics Estimator Strategy game Operator (mathematics) Software Computing platform User interface Predictability Software engineering Shared memory Code Planning Bit Database transaction Stack (abstract data type) Timestamp Open set Data stream Process (computing) Loop (music) Software Personal digital assistant Internet service provider Calculation Order (biology) Computing platform Table (information) Resultant Row (database)
State of matter Multiplication sign Real-time operating system Client (computing) Function (mathematics) Data dictionary Order (biology) Mathematics Different (Kate Ryan album) Single-precision floating-point format Cuboid Flag Exception handling Physical system Shared memory Electronic mailing list Bit Instance (computer science) Price index Opcode Sequence Open set Entire function Type theory Arithmetic mean Process (computing) Order (biology) Parametrische Erregung Resultant Row (database) Point (geometry) Slide rule Wage labour Connectivity (graph theory) Real number Attribute grammar Wave packet Sequence Revision control Latent heat Differenz <Mathematik> Canonical ensemble Operator (mathematics) String (computer science) Representation (politics) Energy level Liquid Data structure Boolean algebra User interface Standard deviation Client (computing) Cartesian coordinate system Personal digital assistant Digitale Videotechnik Computing platform Video game Object (grammar) Table (information) Library (computing)
Trail Slide rule Open source Code State of matter Multiplication sign Connectivity (graph theory) Set (mathematics) Insertion loss Frustration Black box Client (computing) Data dictionary Sequence Order (biology) Differenz <Mathematik> Different (Kate Ryan album) Operator (mathematics) Data structure Proxy server Mathematical optimization Physical system Task (computing) User interface Data dictionary Stapeldatei Trail Information Consistency Software developer Boilerplate (text) Electronic mailing list Sampling (statistics) Sound effect Client (computing) Bit Price index Incidence algebra 1 (number) Subject indexing Arithmetic mean Order (biology) Phase transition Right angle Object (grammar) Representation (politics) Library (computing) Computer worm
Slide rule Trail Group action Implementation Link (knot theory) Code Multiplication sign Price index Insertion loss Estimator Differenz <Mathematik> Term (mathematics) Operator (mathematics) Program slicing Cuboid Codierung <Programmierung> Communications protocol Mathematical optimization User interface Compact space Data dictionary Trail Projective plane Electronic mailing list Client (computing) Bit Price index Binary file Flow separation Subject indexing Type theory Message passing Personal digital assistant User interface Right angle Pattern language Communications protocol Spacetime Computer worm
Presentation of a group Multiplication sign Order (biology) Library (computing)
again of tell everyone on my name is the town I'm a software engineer and I really like difficult problems preferably with some mathematics in them and that's
exactly the kind of problems I work on that Kwan plane so it Kwan lane we devel op and also operator own stock trading platform and also trading strategies running on that platform to we build our own software here our only customers and we've got house traders using the software to trade so we are really small team and work a bit like a lean start-up which is not so common in the finance industry and of course all of our back and code is in very modern Python we're currently in the in the process of migrating to fight and 3 . 6 all we've been doing this for about 3 years now and it's working quite well at all but this talk is not really about trading were stocks so you don't need to know anything about that but it's more about well daytime and state on on a back server and how that gets transformed and displayed to users yeah so I'll briefly show you the the problem that we're solving at 1 plane so you allow trading platform is essentially a stream processor which receives of market data updates from from providers of those providers might be stock exchanges telling us what's happening in the market or a third-party businesses that sell that real-time data and this data streams to us constantly in real time and it's a very small updates and then we processes that we make some calculations on the data we we make predictions and estimates and all based on the results we may decide to trade in the market to buy or sell order do both at the same time and crucially for us not everything here is all made it so we don't make all of our our trading decisions automatically but there's human traders in the loop so we need to display all that's happening in the platform to humans who made observe what's going on and make decisions or or intervene in any automatic trading that you might be running but in our case the user interfaces web-based building react and it communicates with the main platform for web circuits but that's not very important for this so I'll give you some specific examples of what kind of data per stage we might be showing to uses so the 1st example is traits so that this just a record of transactions that happened on the stock exchange on aparticular stock so it may look like a simple table like this so we see for each transaction indeed a timestamp when it happened but what was the price and how many shares exchanged hands at that time the and usually this is just an happened only lock so the stock exchange keep sending us new crates they happen and obviously this is not very difficult to display in the user interface efficiently we just need to send our new traits as they arrived we need to display them in the user interface another example is not something we get from the stock exchange all but this is our internal state this tells us what so we currently have opened in the market that means FIL offering to buy or sell a certain number of shares at a certain price all but those orders might stand in the market for quite some time before they actually get executed or before we cancel
them and the type of training that we do is called market-making that means how we are at the same time offering to buy and sell very large quantities of shares in the market and the value of this is that we provide a lot of liquidity in the market meaning that every time you come out to the stock market anyone to buy or sell you will always find the contra party that is willing to to trade with you and 1 of those contra parties could be asked so we have dozens or hundreds of orders opening the market at any given time and we want to see what they are what their parametres are so that traders can check on them and maybe alter them or cancel them so again you can imagine this is a table of objects with some attributes and obviously when we create a new order or it's a simple case of adding a rotor that table but
of what we also all the time is altered a parametres of the orders that are already open so perhaps the status of an order changes at some point when we start canceling the order or we might even alter the price of an existing order so in this case you you need to be able to update pretty much anything in your table a and new rows so this virtue of specific examples from from the trading world but of course you can imagine any other real time application with the user interface will have data structures like these that will typically be a list of objects or or maybe just a dictionary so how can you do this how can you efficiently show your internal state that lives in your Python versus to 1 or more connected appliance of the very 1st approach you might think about is to simply do snapshots and so send that you find some every data change so you each component in your Python verses that generates some state so far will grow all gets they've method that returns whatever to state of that component is so for instance if you have a list of traits that the state of that component is used to the list of trade objects and in your process you just call these are all those methods periodically might have dozens of them were hundreds in your process and we encode their return values in say Jason and ship them to the client every time of something changes so this is obviously extremely inefficient on the other hand it's very very simple to implement all this actually work for us in the very beginning when we needed to get a quick prototype out and see if if our style of training will even work so we simply use the snapshots because it was super simple to implement and we didn't have so much data are changing and streaming in the beginning so it's a reasonable start but obviously this will not get you far so do you need to get a little bit smarter so then he might start thinking about different the snapshots it all right so you keep your gets state methods as they were before and you still call them periodically but he no longer take the entire output and ship it to all the clients all the time instead you start sending incremental updates so you if you at hand and you trade to your list of traits you only told your clients to just as and trade you don't send the entire list of traits as it were and you do this by remembering the last stage that you're clients fall and then comparing the result of get stage 2 to that previously remembered value and computing a list of differences or incremental updates the chickens and decline so in this case teach you treat does state of each component as a black box you don't really care what's inside you just compare the old version and a new version and use and the differences to the client in such a way that the client can reproduce new state so you 1 idea how you might do that this amazingly already contained in the Python standard library and there is a for dif labor that lets you find the differences between 2 sequences and those sequences might be of strings or or a list or a couple of things this is an example of how you can use that so we have 2 sequences here x and y you construct of the matcher object and you ask it for a so called opcodes opcodes our operations you need to perform on X so that you arrived at y and in this case so it gives us free opcodes the first one tells us to delete the 1st item in acts the 2nd 1 tells us how but there's our subsequent that is equal between the 2 and the 3rd 1 tells us to replace the end of actually the end of wine and the indices in those couples referred to indices in X and Y it's pretty intuitive 2 years so this is very useful if you can represent the state as a list of things because then de Dave generator are on the level of yeah of list of items so this appear like a really good idea to us so we implemented that but then we found out there's quite a few problems that you might not think about it 1st so the 1st 1 perhaps the most obvious 1 is that snapshots are not really going away you still have to support snapshots because if a client comes to you're already running system and wants to bootstrap its state it has to get of snapshots to start from somewhere so you still kind of have to support them somehow of what's more serious tho is that you still keep calling your gets the methods all the time and maybe those methods do some work that all will be thrown away when you find out that nothing has actually changed of a real life example of the work might be doing is converting some internal Python structures to something that can be encoded in Jason and sent to clients so do you need to work around that so you suddenly their produces also need to be able to tell you if they have actually changed they might have a boolean flag and if R has changed a flag is set to false and you will never call here get state methods and you won't waste time so this of course works this is doable but the downside is that it's really complex to implement because you can make a lot of mistakes and to create a lot of what's by forgetting to set for a clear is somewhere and this is bound to happen if if you have many producers of data and of your state might change in ways you don't you can't always predict accurately but there's even more problems although this slide is more about how interesting challenges than problems 1st diff live only works with sequences of hashable types so before I showed you that Topalov strings turns out we can be the Topalov dictionaries for instance because the football an exception so you have to work around that you suddenly must have a hashable representation of every single piece of state that you want to send out your clients so you that's a little bit of work or maybe a lot of work so you you might just say I give
up for small collections I'm not even going to try and produce the of so I'm just going to send the snapshots just as before and that's in fact what we ended up doing for for smaller a list of items because it's it's just posture and I'm talking about these problems in in such depth because I find them really interesting it it really makes you think about data and data structures in Python and how they're represented for instance that hashable at the requirement is actually really really important because if didn't require you to do that and and simply let you diff the following topples of dictionaries you would find out that it's very hard to compare year old remembered states you're new state and find the differences that calls you may have remembered a reference to a dictionary in your old state and then compare that to a reference to the same dictionary and the new state so you it's really not trivial to find what the differences are if you have 2 references pointing to the same dictionary because they will compare the same so you have you to really make this work with mutable data you have to do a deep copy of your old state and then compare that to your needs so so that's unfortunate and turns out that treating you're state as a black box that you can simply Dave at will might not be the best approach you there on so you we found a 3rd way and that is generating there's these as soon as your state changes so it's no longer a black box what you really need to remember everything you due to your state this is like a journal of of mutations so the way this works is you may have a list that's called orders you insert a new item into it every time you do that something somewhere in your Python code remembers that this mutation has happened so it at times this diff somewhere saying that there has been an insert operation and the index 1 to 3 and a payload of that incident is the reference to new order and of course you do that for the item assignments and for item deletions as well To this end we build a library that is now open source as of lunchtime today so it's called if track missiles on quite being so you can install it and and tried and so and this this code sample simply shows you some boilerplate to set this up so and if track you have dispatches and listeners dispatchers RTE data structures that you can manipulate like insert into them override items were delete items and listeners our receivers of these mutations so those are the components that actually get the the log of rights so here we just created this dispatcher and a listener and we connect them to each other and this is how you use it so you operates on the dispatcher called orders appear as if it was a normal list so here we in to order objects and then we delete 1 of them and then a notice that you're asking the listener Monday dispatcher but the listener for the dates that happened and we get this so we really get a lot of all the operations to inserts and a deletion and you get references to the insert pilots so if you have this you can and go this in Jason and send that your clients and you didn't have to of find out what did what the differences were exposed all this system obviously can support snapshots very easily because the listener ever since it has been connected to the dispatcher must have seen the old adits so all the operations that were performed on the dispatcher so you you can ask at any time for a current snapshot what delays currently looks like and notice we only have 1 order here because but here we insert it to you but then we deleted 1 of them so the snapshot shows you the latest state of the dispatcher and perhaps interesting the dispatcher couldn't tell you this information because the stateless it doesn't remember its own state it only proxies of those rights what you listeners the and the listener once once you call get diffs on it likely that from the previous slides if Bill forget them and if you call it again and no rights of happened you will get an empty list of course so you this automatically preserves consistency between snapshotting and it's it cannot happen to you that you ask for a snapshot and then you ask for dates and you suddenly get gives that actually were already applied and snapshot there's a couple more features that are worth mentioning so so this doesn't just support lists but also dictionaries so you you can set items or delete items and dictionaries and then you get a lot of what you changed and that dictionary 1 feature that we're really like is compact and it's so in the previous example I have to insert and 1 delete that actually reversed 1 of the concerns if this happens all in in a batch before you even send those bits to your user interface your client yeah you might not actually want to send the entire lot because you will be telling declined to insert an item and then immediately deleted afterwards which is kind of wasteful so compacting means of canceling out dates that operate on the same index in the list so if you insert something in deleted immediately afterward you can just throw away both because it is this has never happened all and you can do similar optimizations when you set up the same index and the list to new and new values it the only problem with this is so we haven't quite finished this turns out it's quite a difficult thing to implement because when you start canceling out dates you suddenly effect indices of other operations and a list of tickets so you we've been working on this for quite some time it breaks because it proves to be a very frustrating task out last week 1 of the developers and my team was telling
you triumphantly that he finally fixed all the remaining box and we could we could really use and if compaction and half an hour later he told me that he ran a fuzzing tool on it and found a couple of new box again after 3 months of trying to get this right but the latest message as of this morning is that it will work tonight so hopefully will be able to release this feature maybe tomorrow but according to our estimation this will save us maybe 40 per cent of traffic between the back and and the user interface so this really worth a another interesting thing you can do is allow squashing or aggregating dates so this means if you have a couple of inserts on that affect subsequent indices it actually group them together and instead of saying in certain item at index 5 and 6 and 7
you could say replace indices 5 through 8 without this list of patterns it so it's a small optimization it's not actually room doesn't reproduce attractive it produces group tips that we can then use in our own custom protocol that I'm going to talk about now so beyond just optimizing of your user interface updates through the diff track and dates now you might also want to consider to use a completely our DIY protocol if it makes sense in your case 2 we decided that Jason this nice because it's human readable but it's really wasteful of in terms of space and also so isn't it's quite fast encode and decode but you can do better so we decided binary protocol and of 2 and a code the actual payloads we use Apache Avro I really recommend you check it out if you don't know if it's a bit like Protocol Buffers on what we thing it's slightly more involved and has nicer Python implementations by the way all these things in the slides are linked so I'll put the slides of all this afternoon and you can just click through and this protocol among other things allows us to to squash there's this and but instead of sending a couple of separate inserts we can do like a slice operation which declined then applies as if it was just the slicing operation in Python on a Python list but of course it has to be that in JavaScript which is a whole another story and so again about every apart from the binary protocol saving you some space and time when encoding and decoding the nice thing about it it requires you to write a schema for every message type that you and which has the nice benefits of having an always up-to-date documentation because it is literally impossible to encode a message that doesn't come from pure schema yeah I'm running out of time so you check out these links in the slides there's some other projects that inspired us when building this and thank you for your attention and
be you have enough orders to its presentation of course fresh library you released during lunch has to it and I think you all for coming to you have to think so I'm guessing we have no time for questions
you yeah you can just catch me around helping tell be here all week if you have any
questions thank you FIL