Automatic Conference Scheduling with PuLP

Video in TIB AV-Portal: Automatic Conference Scheduling with PuLP

Formal Metadata

Automatic Conference Scheduling with PuLP
Title of Series
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date

Content Metadata

Subject Area
Automatic Conference Scheduling with PuLP [EuroPython 2017 - Talk - 2017-07-12 - Arengo] [Rimini, Italy] Linear programming is often regarded as very theoretical or even not known at all as a well-developed method of solving real world problems. The talk gives a short introduction to LP problems and presents an interesting use case for the Python linear programming problem solver PuLP: that of creating a conference schedule
Computer programming Operations research Potenz <Mathematik> Functional (mathematics) Scheduling (computing) Constraint (mathematics) Run time (program lifecycle phase) Optimization problem Moment (mathematics) Information technology consulting Word Data management Mathematics Term (mathematics) Linear programming Computer science Website Mathematical optimization Condition number
Scheduling (computing) Length Multiplication sign Range (statistics) Source code 1 (number) Set (mathematics) Mereology Data dictionary Variable (mathematics) Subset Mathematics Blog Computer configuration Videoconferencing Elasticity (physics) Damping Social class Physical system Area Source code Constraint (mathematics) Theory of relativity Software developer Nominal number Bit Variable (mathematics) Root Process (computing) Order (biology) Convex hull Right angle Summierbarkeit Pattern language Figurate number Spacetime Laptop Trail Functional (mathematics) Implementation Statistics Service (economics) Divisor Constraint (mathematics) Protein Rule of inference Field (computer science) Well-formed formula Linear programming Authorization Integer Mathematical optimization Key (cryptography) Online help Interface (computing) Debugger Length Planning Computer programming Uniform boundedness principle Voting Personal digital assistant Blog Universe (mathematics) Optics Light field Object (grammar) Scheduling (computing) Tuple Library (computing)
Standard deviation Empennage Scheduling (computing) Functional (mathematics) Block (periodic table) Multiplication sign Source code Division (mathematics) Variable (mathematics) Variable (mathematics) Wave packet Personal digital assistant Moment <Mathematik> Block (periodic table)
Root Link (knot theory) Block (periodic table) Multiplication sign Right angle Block (periodic table) Summierbarkeit
Conic section Scheduling (computing) Group action Randomization Mass flow rate Scripting language Run time (program lifecycle phase) Multiplication sign Source code 1 (number) Function (mathematics) Variable (mathematics) Data model Programmer (hardware) Mathematics Different (Kate Ryan album) Computer configuration Computer network Lipschitz-Stetigkeit James Waddell Alexander II Exception handling Proof theory Scalable Coherent Interface Convex function Constraint (mathematics) Broadcast programming Block (periodic table) Range (statistics) Modeling language Bit Variable (mathematics) Complete metric space Zielfunktion Arithmetic mean Convex optimization Process (computing) Interface (computing) Website Endliche Modelltheorie Right angle Block (periodic table) Pole (complex analysis) Resultant Point (geometry) Computer programming Slide rule Functional (mathematics) Line (geometry) Constraint (mathematics) Gene cluster Maxima and minima Online help Local Group Software Linear programming Software testing Mathematical optimization Rule of inference Continuous track Information Image resolution Run time (program lifecycle phase) Java applet Linear programming Binary file Similarity (geometry) Voting Personal digital assistant Blog Convex set Mathematical optimization Scheduling (computing)
Point (geometry) NP-hard Trail Scheduling (computing) Context awareness Multiplication sign List of unsolved problems in mathematics Survival analysis Online help Function (mathematics) Mereology Dimensional analysis Wave packet Mathematics Endliche Modelltheorie Area Potenz <Mathematik> Constraint (mathematics) Graph (mathematics) Physical law Feedback Bit Basis <Mathematik> Film editing Personal digital assistant Self-organization Right angle
and I'm going to talk about automatic conference scheduling using pulp and a few words about myself and Mark Lambert I've been doing present for many many years and so mathematics and live in Dusseldorf in Germany I miss you consultant and I did a lot of work for the presence of a foundation for the you're present society this a torque manager with it and and so yeah on itself your predecessor at the moment and so so very busy organizing this conference right so what I want to talk about here in the talk is the linear programming how many of you know linear programming so ifyou as good so a linear programming is a term that comes from operations research on which is if you've in mathematics so it doesn't have all that much to do with computer science and that they called linear programming is so actually something that has to do with optimization and so programming in the sense means that you want to find some optimal solution for a problem that you have a linear of course means that the probability looking at has to be representable using linear functions so both the the constraints that you have the light site conditions that you need to fulfill plus the the function actually want optimize the both the 2 billion EUR so this is something that's it's very easy to understand history is typically easy to to write down but finding optimal solutions can be extremely hot so if you know from computer science these problems are usually NP hard so they have large exponential runtime um now I want to share my
laptop was going crazy the right so let's look at the mathematics behind this so what you have when you define where these problems is you have a set of variables the ICSI variables you see here those can be usually floats are integers and then you have factors that you add to these variables and you add everything together and then you say OK uh this is my objective function so functions you want optimize so or you can also use this kind of method of writing down things for writing the new constraints and I I put up some some pictures here those also the constraints so essentially what you do is you you have a linear function and then you uh define multiple ones of those in the same one-sided this linear function is supposed to be in the set of allowable values all yeah we just going plan again aviation lying on 1 side you the OK and problems and my notebook won't crash because of this the and what it yeah can you see something that's it isn't working often you you this is not the use of a perfect we can see something as the main purpose of the talk right so I've drawn up some some constraints here you can see this yellow area this year area as the the set of allowed values and of course what you want to do is you want to in that area you want to optimize the function at your interests so this center pulp pulp is a Python implementation which provides an interface to these the solvers for linear programming problems and public services as part of a larger library called a coin or or so they they have various tools and their feature of universe system to use and uh it's all written in Python and uh if you're if you're doing if you solving problems in this space then this is the free something to look at pulp itself that is basically a front end to LP solvers but also comes with the with the it's a slow solver but it comes with a purpose over the simply written in Python and so this is great for doing debugging usually what you do is you divide cure your implementation 1st using the pope's which allows the more advanced solvers like she'll pk which is silicon 1 it's free 1 and then there's some commercial once obviously this this is an area where there's a lot of commercial development going on because writing the sources you Viva hard and is very tricky and and they use a very expensive you want take a if you want to buy 1 of those and the way that to define the problem and pulp this if you have a few objects that you have to set up so that these are the key the class names of 4 9 years LP problem is the base of the object so that you can use to basically move everything together so this defines your problem so it and it has references to the variables that using the objective function and how you want to optimize things then you have to define the XII variables in pulp as well and you do that using the LP variable class so you create 1 of these objects for each x i that you have a new problem by the LP variable of optics can be set to say OK this variable is a float of this variable is an integer that is also the option of doing having a binary variables so just takes the value 0 1 them and it's possible to define a permitted variable the value range which makes it a bit easier so you don't have to define as many common constraints as nominative and then you have the constraints there's a class LP constraints for this and with that parable you can defined using those LP variables um sir with that object you can define your your constraints using the LP variables and and you basically when defining these you basically defined the as short formula and then to solve all will take that formula and then use it in the In the process of finding a solution to a pub also has some more advanced LP constraints subclasses for example for elastic constraints so you can say instead of just saying OK it has to be the value has to be on that line or in the yellow field you can say OK the it's it's OK if you go let's say 10 % and the light field of which this is handy if you when you do debugging because sometimes you set up the constraints and solve says OK there's no no value that's uh is allowed so obvious have some some problem in their sum and then you need me to make it more flexible more elastic and you can use those to figure out what's going on OK so what about the documentation so when I when I started using poll of course the 1st thing I did was I
try to look at the documentation I found that the documentation is a bit lacking substance incomplete doesn't actually document everything it misses some details there was there some tricky this is pulp that you have to know which probably come from the authority space and if you if you not really into are done its doesn't feel as assigning as it may be well would help for for using something in Python as so the best thing you can do is to just look at the source code which is really easy to to read and there are a few blog posts helpful as well because of the people of course have had the same problem and the figure things out and proteins found in the corpus so what was the inspiration for doing this talk and for looking at the conference scheduling as example that was a talk from david eva applied on UK last year yeah this video off the top here and of course I was looking for this because of your Python because we usually before you pass 2070 will always be used manual scheduling which is a lot of work because we have around 200 sessions discussion and so I was thinking that maybe this to me make things a bit easier to simplify things to also take you a much this criteria into consideration which we usually this again after the by hand so like speaker preferences or related to the preferences so we look at the statistics that we get like from talk voting for example and we try to use that to assign the proper rooms some typical things you have to do encounter scheduling is you have to look at the room sizes and you have to guess basically how popular talks going to be then you have talks which have varying all durations so the top length of lengths are a different you that's the subset you define your schedule usually different so you have a longer 1 shot once and they have some other constraints like for example a speaker may only be available for a few days of the conference notable conference we have for that 1 the speaker cannot give 2 talks at the same time and this is something that's very important because so many mistake at this time the so let's see how how you set up the problem in Python so 1st of you 1st of all you define a few variables I use dictionaries for this define the rules the room sizes how many seats to have and that will allow you to find the talks and the top 2 relations it then you define the top slots that you want to use in new schedule so as this is always so order in this room a for example and the start time in minutes from the beginning of the day and the length of that particular slot and they can go ahead and start with defining a problem so in this case it's a it's a very simple problem and you want to maximize its so told hopes to maximize the objective function and then you have to create variables so the way that works is the basically you not actually optimizing an objective function this case because in the example and using that as so basically this the objective function is a constant function so maximizing based his me anything you're mostly interested in finding the solution space that somewhere a possible solutions possible schedule that will fit all the constraints and so what you do is you set of variables which say binary variables which say OK talk on the left and tuple is assigned to the slot on the right the knew that for all the tracks and all the slots and so this is basically the problem at you want optimize you want to say which Charcot's assigned to which slot then you of course have to used to constraints to make sure that you get a complete solution so you want all the taxa being a sketch on the society that but in this is very typical patterns so if you take the sum of all the assignments that you have for the talks and you say OK this needs to be smaller or equal than 1 because you of binary variables of this essentially says that you need to assign that's right this is different this is actually 1 which says that you can only have 1 talk per slot and the 2nd 1 is this 1 this is we need to assign all slots so you take the sum of everything and you say OK for for each shot the sum of the Siemens has to be 1 over all the slots and then he maybe to make sure that the top durations fit to the slot length that you have available so you do that using the 2nd 1 so again you sum up all the assignments and you take the slot duration and then you make sure that the sum of all the slot duration steps uh where where this talk was assigned is equal to the talk duration is this clear so far yeah it's good a then some speakers may not always be available like itself a temple the speaker is only available in late in the day so he usually uses and it takes longer for breakfast or something so what you say in this case is yet constraints which basically said OK this the the top may not be assigned slots which are too early on that day so that's that's a Friday after the so for that right you don't want to do you talk you do something like this so you make sure that this assignment is 0 and then we get to a more complex thing so you want to make sure that the the same speakers not giving the chart in 2 places at the same time this may sound easy at 1st so you just say OK you just need to make sure that in the same uh in detained same time slot you don't have the you have only assigned 1 talk of offset to speaker but the problem is that the slot duration usually different throughout the day so if
you look at the schedule for example that you can see half of the sources are sometimes a longer and shorter and you have like 30 minute talks and 45 minute talks and they have like trainings workshops and so on and what I marked in red here is so the the case where we have made a mistake in the printed a function of the the booklet well we assign admitted 2 slots happening at the same time so this is something we want to avoid of course this is fixed which in your printed brochure you can still see this so how can we handle this case that we have different slack
lengths and over so want to make sure that we don't have any overlaps because what can happen of course is that the sketches this OK I want the uh I want to have the speaker do a talk in 1 early on and then just as many have an hour later 1 another session to start with the same speaker again to start with in in room 3 so the way this is you find a you basically split up all the slots that you have into smaller blocks so you take like 15 minutes of blocks for example which is so that works for you about some of them and then use up all the the slots into these blocks and then you make sure that there are the the speaker never gives a the talk in in the slots which have the same time so you make sure that this case does not have so what you have to do here is you have to define new variables so you find that the torque block assignment and of course
you few helpless like but could could not going to go into details here this basically just says OK this block it is assigned to that slot and uh this block starts at that time and makes it easier to bend well defined the constrained which then
looks like this so the 1st thing that you do is you tie the blocks to the slots that you have so you say OK for example the the like the 1 on the right in 3 this is 3 blocks we make sure that those 3 blocks are assigned to the other side in the same way as the slot the and then you
go ahead and say OK for for each start time that I have I want to make sure that the that that a speaker can only be assigned wants to that particular time of the start time and block growth that you have in your schedule so this is
how you solve these things then of course you go ahead and you other constraints you might have we define objective function this gives a comment that out so this would be like the happiness function of that particular assignment that you created and you have to define the solver so you can use the pole 1 which is the 1 that is good is used if you set solvents and but you can also use for example the GOP and you can 1 or when the commercial ones uh and then you you run the solver which is so surprising that solve a medical and then is important and this is something I find it on price I give up I thought about about pop pulp it you have to actually check the status of the solver so in case it fails to find a solution doesn't raise an exception like what you normally expect a person to happen but it just sets the status to something that's not 1 and of it took me a bit to find out so because I thought I got a solution because it ended and to get an exception and so I looked at the solution to the right so and this is something to keep in mind that right and then your 1st want to show you results the yeah so you basically preprint the results in some way you have to then access these poll variables that we define and you do that using pulp value and then you just pass it TOP variable vary value we have to use this way of doing it so you cannot for example just right if and then assigned because for some reason those LP variables always true and so this doesn't work so you have to use this approach to to get the value right and then you get this solution to the problem which in this case looks like that it's not this is not very pretty of course you want it to be printed like this so we define some extra help us for that and I use this on your on your website you can know them in from the speakers that you find a solution for this and then maybe they come back to you say OK was this of a cop because of this and that and then you add extra constraints to Europe and to your solution industry run your your soul of course what happens when you re run the solver is it doesn't always give you the same solution so it's very possible that gives you complete different solutions 2nd time you run you more so if you add other constraints so what you have to make sure that you have to tell the the solvent to minimize the changes in a sketch that you for ready published and the way to do that for you you just take the existing schedule that use the existing assignments that you have made it and then you add a penalty function so every time something changes of the old but that you've created uh you you raise a penalty value and because it's optimizing and that is of course you have to use an objective function uh because it's not optimizer will try to reduce the penalty values so it will make as few changes as possible to get everything done that's 1 thing that you prefer probably do when the thing when when scheduling using pulp another was thing that you can add this like room assignments you can use to talk voting results for example for that to figure out which room sizes on that you can add that information to the problems well you can abstract so you can tell the the up to the sketch to optimize things in a way that all the the topics the talks of a certain topic like I did for example of groups in a way that they have on the same day and maybe even in the same room so people have to switch and of course this is always mean it's so it's rather easy to just write down everything in but basically translating the problem that you have into 1 of these LP problems is not always so it's not always easy so you have to think a little differently because it's it's not programming as we normally do in Python is physically declarative what you have to do so it's more lexical programming and so figuring out how to actually write things down it's it's not easy it you have to you often run into the situation that the source of pieces of OK there's no solution school so you basically stuck at that point and then you have to start debugging Pope does provide a few features for that so you can give names to all these variables for example in the debug output you can see where things did not go right and then you can figure out why there was no solution the there is something to consider the in all this the runtime and this is a major problem especially for conference Saskatchewan which is essentially an NPR problem and you you can easily run into this situation that as as yet more variables to the problem the runtime increases exponentially and so it takes ages for the salt actually come to a solution and then after 3 days might tell you OK there's no solution which is of course very helpful right so in this slide just summarize basically the few gotchas already mentioned so this is the conclusion here is that you should always test drive solvers so just use it on a very small problem make sure it all works right test cases make sure that gives the proper results uh do check you constraints afterwards because in some cases the constraints made you will solve only if have a something and you don't want to if you want to have things run faster there some other options that you can use so these I wrote down a few of options here the so solvers which are actually extend things beyond just linear so they can do convex optimization so as as long as you have a convex function then you can still run the optimizer and finds good solutions this 1 the 1st 1 is a very fast 1 and I haven't actually looked at this yet but I think this uh someone has you can see the blog post there and use this got results from 10 to 70 times as fast as which is promising and I'm was done that so I'm and then there's another 1 which is full for done here for conic optimization so you can use those as well if you want things run faster all right so what did we do in particular for your present were 1st I want to mention that now there is no pipe and after this talk a pipe UK some actually sit down and to this for Pike and k but there is a practical conference schedule and for your present actually did look into this and try to use this unfortunately we ran into exactly the problem that I just mentioned it has exponential runtime and so it took it took 3 days to to run a business which is 10 to the process and did not wait any longer so then Alexander you see the chair of the programme workgroup you basically then decided to his own and what's interesting is that he didn't use optimizer he just use some clustering and random shuffling and then just some basically well like the deep experience from the previous years to make things happen and this is a lot better and so it actually works out pretty well right that's on
wednesday thank you very few of us so we could not but the 5 good excuse the thing models will be the only the speakers don't have to do so so in an art and questions the what is use all you need is survival on France's as hard mathematical problem worse is the part where let's see as 1 over is minus I would all headliners in the same so all just so that other speakers will be rolled over the audience while another conference might look to spread awareness of that we're actually going to have well I mean you can you can put all these things into constraints when you do this like automatic solving of course when you do like what we do it at your present where we put in a lot of experience and we do have certain things certain considerations that we always take into account when doing this schedule uh you can you can basically tried to make everything work out so um what you just said you can for for example that just putting the headliners 1st and then having all the other talks after that you can define that as a constraint so you can define for certain talks that these need to be in certain slots or maybe in certain and and that happened at certain times yeah yeah the 2nd and you answer a question so perhaps maybe more generally about whether this knowledge the pool as in mathematics for us all well it is certainly the it's a it's a bit of an artist you have to but I mean art in the sense that you have to experiment allowed to find a good way of actually doing the scheduling we've been experimenting a lot with these things at your presence and your past this is the 16th your Python that we have uh so we've tried a lot of things and some failed some some of were good like for example the concept that we have now where we have 2 keynotes personally and use on each day of that seems to work pretty well so we tried now for a couple years to to have trainings for example integrated directly into the schedule rather than having them a separate days that does work well but it also results in the conference in the topic of days to be a lot more than but some people actually find good so that for benefit we're going to look into all these things the feedback that we get and then we probably make a few changes going forward other questions yeah I heard urge you to what when where it failed to find a solution in your experience using was it easy or hard to figure out wrong I tell you what you have changed the it's going to move on and it's the hardest part yes if you if you if the solver says OK can't find solution then you basically have overstepped because if you look at that you are that the solvers output for so they're their burden on our diver hard to read and essentially I mean if you usually cannot be you cannot just do it visually because you of so many dimensions just doesn't work out so you essentially have to basically then don't cut down your your problem to fewer sessions that you need to schedule until you find a working 1 and then you can you can take it from they can add more sessions and then at some point you find OK this not failed use this kind of backtracking approach to figure out which constrain the search because in the problem that you have yeah the questions is how many conference organizers that here OK which covers the doing the sorry not the i is is like a hackathon conference the area the law but then good only the other questions that would be possible to use live it up into smaller problems like for example the case we use the 1st step and we yeah for example the everyday individual there is assigned some yeah speakers to that case because you know enough was already you have to try some of the little bit all the other any other ideas is that it was smaller problems to overcome this exponential here what you can do is to us to to solve this is to for example look at the different tracks and then optimize the different rights individually this works and optimizing for the days would also
work but then of course you have to in some way make sure that all chance that assigned and if you split up too much and all you can monitor the case that you forgot to on the top of something that happens with your path a lot is that we get so we get talk submissions we accept them because they look nice and we if we find that people can come and we get cancelations and so you often have Fourier graph last minute changes in doing that I mean directly at the conference and figuring out a new kind of the schedule and replacing those this is not always easy and so it is with the help of some automation behind it I think it's this actually better solution what we did found is like a like work Alex found is that completely rely on the solutions so it's not really make sense so you you can use this as basis but you should then always take this basis and then work from there mostly manually we gave the thank you lock again fj