OSM Stats: Rewarding contributors and real-time tracking of OSM

Video in TIB AV-Portal: OSM Stats: Rewarding contributors and real-time tracking of OSM

Formal Metadata

OSM Stats: Rewarding contributors and real-time tracking of OSM
Title of Series
Part Number
Number of Parts
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
Mapathons are an increasingly effective way to get data into OpenStreetMap. The Missing Maps project hosts mapathons to increase the amount of data in areas that don't have large local OSM communities. The American Red Cross and Development Seed have built an analytics platform that tracks user trends in real time and rewards contributors for their efforts, as can be seen at missingmaps.org. OSM-stats tracks users' activity, consistency and relative reputation, reporting detailed metrics and awarding a variety of themed badges based on the type and magnitude of contributions. Badges range from simple tasks ("Add 4 roads") to challenging ones ("Map in 10 countries"). Leaderboard pages display up-to-date detail on the most active users for a current project, while hashtag groupings allow statistics to be separated out so that groups can be tracked. A map of each user's commits can be seen, as can a map view indicating the last 100 changes. Most of the contributions for the Missing Maps project occur during mapathons, where hundreds of volunteers submit edits and additions over a couple of hours. This means that the system needs to handle large spikes of activity when thousands of edits are added. We deployed the OSM-stats components using AWS Lambda functions and Kinesis streams. These scale very well to meet the needs of mapathons and incur minimal cost when not in use.
Keywords Development Seed

Related Material

Thanks for coming back. My name is [inaudible], and I'm with Development Seed in Washington, DC. I'm going to talk about a project that we did for the American Red Cross. You're probably all familiar with the Missing Maps project; is anyone not familiar with it? Missing Maps is a project sponsored by the Red Cross to encourage mapping of areas in need, especially after a disaster. So, on to the missingmaps.org website.

We worked on the project to redesign the website, so if you visit that site now, that's the new site from the last year; it's actually been up for maybe six months. Our original goal was not only to redesign the website: the Red Cross also wanted user pages showing people's statistics, what they've committed to OpenStreetMap and statistics on those commits, some sort of reward mechanism related to that, and leaderboards showing the ranks of users and groups. And they wanted this to happen in real time.

The Missing Maps project sponsors mapathons. A mapathon is where everybody gathers together for an hour, maybe two hours. Some people may never have used OpenStreetMap before, so they perhaps undergo a short training session, and then there is a targeted area. Everybody jumps on, maybe 70 users, maybe a lot fewer, maybe some more for really large ones, and they map that region. The real-time component is so that you can show, up on the projector, the real-time contributions over time during the mapathon.

This comes down to tracking commits. That's what we need to do: take the commits and track them. A commit in OpenStreetMap is called a changeset, and it is made up of metadata and the data. If you're familiar with OpenStreetMap, you might go here and look at some of the details on a particular changeset. It has the metadata and the data included: this is the geometry, and this is the metadata that's published every minute. I'm going to get into the details of the real-time system in a little bit, because in order to do this in real time the geometry isn't actually available with the metadata, so it's a little more complicated. But if you notice, in the changeset we have hashtags. Hashtags are how we form communities in Missing Maps, or in fact in other projects where these mapathons happen. Even outside of mapathons, people who make commits can add hashtags to their commits, and then we can track them.
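As an aside to the transcript, the hashtag tracking described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: it assumes hashtags appear in the free-text changeset comment, and the regular expression, the function names and the sample hashtag `#hotosm-project-1234` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by letters, digits, underscores or hyphens.
HASHTAG_RE = re.compile(r"#([\w-]+)")


def extract_hashtags(comment):
    """Return the distinct hashtags in a changeset comment, lowercased,
    in order of first appearance."""
    seen = []
    for tag in HASHTAG_RE.findall(comment or ""):
        tag = tag.lower()
        if tag not in seen:
            seen.append(tag)
    return seen


def tally_hashtags(comments):
    """Count how many changesets carry each hashtag (hypothetical helper)."""
    counts = Counter()
    for comment in comments:
        counts.update(extract_hashtags(comment))
    return counts
```

For example, `extract_hashtags("Added buildings #MissingMaps #hotosm-project-1234 #missingmaps")` returns `['missingmaps', 'hotosm-project-1234']`: duplicates that differ only in case are folded together, so a changeset counts once per hashtag.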
Hashtags are spatially unbounded; they track groups and events. The biggest one is the Missing Maps hashtag, which the Red Cross was particularly interested in, but you can put in as many hashtags as you want. For a particular mapathon or a particular project you might have a hashtag, and the editor you use can be set to automatically add those hashtags every time you make a commit. [inaudible]

This brings us to leaderboards. With these hashtags in place we can have leaderboards where we can look at the total commits for any specific hashtag, as well as the users. What you see here is interactive, so you can add any specific hashtag that you want. If you go to this page it defaults to the one on the left, the Missing Maps hashtag, but I've added a couple of other popular ones. So you can see that in a mapathon, let's say, you could have two groups, where one group uses one hashtag and the other uses another, and both maybe use some common one, and you could have some sort of competition within the mapathon and see who's doing more: left side versus right side, or whatever. This is the total number of edits that have been made in the last six or so months, since we started tracking. We would like in the future to go back and add historical data, so we can cover the whole Missing Maps project; right now this is just since we started.

The leaderboards also show the users. Right now this is sorted by the total number of edits, but you can sort by buildings, kilometers of roads, or any of the fields here. You can see which user, [inaudible], is on top, and if you check every once in a while, the top five people sort of bounce around; they're actively trying to get on top of each other. Right now [inaudible] is the top one.

If you have made edits in OpenStreetMap in the last several months and you have added the Missing Maps hashtag, then you already have a user page; you don't have to ask for it, it's created automatically. Each user can go to their own page. Here we have the page for [inaudible]: it shows a variety of metrics on his contributions, totals and that sort of thing, the hashtags he has used, as well as these badges. We also have a contribution timeline and a map showing the regions. You can see he has clearly focused right here, in [inaudible] and South Africa, while other people bounce all over the place. What we actually save is the convex hull of a commit, and then we combine that with the previous convex hulls, so we're not storing all of the geometry, just the approximate region. We also map those geometries to the countries they are in, so we can track the countries that a user has mapped in.

Now you can look at the badges [inaudible] has earned. There's quite a variety of them; he has of course been very active. This is all original artwork made for this project, with clever names. The illustrations were all done by [inaudible] at Development Seed. Down below on the same page are your upcoming badges, and you can see the progress there. We've got White Water Rafting, for mapping waterways, and there's really quite a number of badges that you can earn. Here are some examples; they're a little pixelated.

OK, so why rewards? Some people will say, "That's silly." Originally this project was called OSM gamification, but internally we didn't really like that term.
There is sometimes a negative connotation to gamification, or maybe it's a buzzword that's been used too frequently of late. But rewards provide a few different things. First off, there's an immersion in the mapping experience. You make commits, and it's not just about making the commits and having them go into a black hole where you can't tell them apart from all the other commits on OpenStreetMap. You can go to your stats page and see exactly what you've done. So it gives you the statistics for what you've done, which I think is very useful and which most people would be interested in. People are after different things: some people might not care about any of these things, and some might care about only a few. You have a sense of achievement when you earn badges, and you strive to get the next badge, so this increases retention. It can also encourage cooperation: as I mentioned before, you could have teams, and teams can cooperate in order to win over the other team during a mapathon. And of course there's the competition inherent in that.

So how did we do it? We implemented this real-time system largely using microservices. There's a diagram here, and I'll talk about each of the pieces. This is all implemented on AWS.

The first thing we need to do is stream the real-time data. OpenStreetMap makes the metadata available from planet.osm.org, and these diff files are published every minute. Usually every minute there's a new file added containing all the commits that happened in the last minute, but it doesn't include any geometry. The geometries are available; we could replicate the OpenStreetMap data ourselves, but it changes constantly, so we use the Overpass API. Overpass essentially replicates the OpenStreetMap database and makes the geometries available for the last minute for all commits.

Now these have to be matched up. We have a Node app called planet-stream. planet-stream takes in the changeset metadata from OSM and what are called augmented diffs from the Overpass API, and it has a Redis instance running and puts them in that Redis instance, because sometimes these don't match up. You can't just take the changesets for one particular minute and the geometry from the same minute, because there might be a delay for a variety of reasons. So we put these in a Redis instance with a timeout, I think of maybe an hour or more, maybe a couple of hours, and we match up the metadata IDs with each other. That way we create a final changeset with the geometry combined. At the same time, planet-stream makes the geometry available for the last 100 edits, so that on the Missing Maps website you can see a map showing the last 100 commits made and where they are. It also keeps track of the trending hashtags, so again, on Missing Maps, if you go in and want to add a hashtag, you can see a list of what's been popular.

Now that we have these combined changesets, we need to calculate the user metrics. The repo we use for that is called osm-stats-workers, and it uses AWS Lambda functions. Here's a diagram: the combined changeset goes into an Amazon Kinesis stream, which is just a queue that you add to, and as changesets are added to the queue, that fires off the Lambda function. Lambda, if you're not familiar with it, is serverless, so this is all a serverless setup. We have a Node app that is uploaded as a Lambda function, and we don't have to worry about running servers or anything like that. It's invoked every time a changeset is added to the stream, and it scales automatically: if there's a lot in the stream, it will fire off a lot of Lambda functions, and it works very well. We use an RDS database to store these metrics. The Lambda function calculates the metrics on the changeset and adds them to the database. These are the types of things that are calculated: we have the metrics here, but also some geometry calculations to figure out what country things are in, getting the convex hull of those geometries and adding it to the user's total contributions.
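As an aside to the transcript, the worker step described above can be sketched as follows. This is a hedged Python illustration rather than the project's Node implementation. The event shape (`Records[i]["kinesis"]["data"]`, base64-encoded) is what Lambda passes to a function subscribed to a Kinesis stream; everything else is an assumption made for the example: the payload fields `user`, `buildings` and `points`, the in-memory `USERS` dictionary standing in for the RDS database, and the planar treatment of longitude/latitude pairs in the hull calculation.

```python
import base64
import json
from collections import defaultdict


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain. Returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


# Hypothetical in-memory stand-in for the metrics database.
USERS = defaultdict(lambda: {"edits": 0, "buildings": 0, "hull": []})


def handler(event, context=None):
    """Process one batch of Kinesis records, one combined changeset per record."""
    for record in event["Records"]:
        changeset = json.loads(base64.b64decode(record["kinesis"]["data"]))
        user = USERS[changeset["user"]]
        user["edits"] += 1
        user["buildings"] += changeset.get("buildings", 0)
        points = [tuple(p) for p in changeset.get("points", [])]
        # The hull of (previous hull vertices + new points) equals the hull of
        # every point seen so far, so only the hull needs to be stored.
        user["hull"] = convex_hull(user["hull"] + points)
    return len(event["Records"])
```

Storing only the merged hull keeps the per-user footprint small, which matches the talk's point that the full geometry is not retained, only the approximate region.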
So why Lambda? Mapathons aren't happening all the time. A lot of the time there's no activity at all for a particular hashtag, and during a mapathon that can spike and get really high. We didn't want to run EC2 instances all the time, so Lambda functions are perfect, because you don't pay for them when they're not doing anything. You can upload a Lambda function and, if it's just sitting there, it doesn't cost you anything at all, which is very nice. It only costs for how long it runs and how many times. Here you see the invocations, the number of times the Lambda function is called over some time period, and it can vary from zero, or maybe a couple per minute, up to a hundred per minute. Lambda functions therefore provide a very cheap way to do this. I will add that our cost for the Lambda functions for this whole project is essentially zero; we're not running a million requests. If you have something that could be set up serverless, I would encourage you to really look at Lambda functions, because they're actually very cheap. They cost fractions of a cent per invocation, depending on the memory usage that you configure. It's a very cost-effective way to do things, if it makes sense.

So, contribute. If you go to the Missing Maps page there's a list of mapathons that are coming up; these are the current mapathons coming up. If you happen to be in [inaudible] or Belgium in the next few weeks, there are some there, and there might be others in your area. Here are the overall contributions for Missing Maps since we started tracking. And of course I should point out that this is all open source. You can go to the American Red Cross GitHub page, here, and the osm-stats repo. There are multiple services, so there are multiple repos, but if you go to osm-stats, the README file lists all the repositories that we used. And that's it, thank you.
Thank you very much, Matthew. We have time for a few questions.

I'm curious what the Red Cross's reaction has been to the project. Are they getting the kind of outcomes they were looking for?

Good question. Overall, yes. There's been a variety of technical difficulties. Early on in the process we didn't quite realize some of the technical issues we would have in trying to do this in real time, specifically issues with Overpass sometimes going down, and that causing delays and dropped minutes. Since we started we have had periods of time where we've lost minutes, and that's one of the reasons why we want to do historical processing: not only to go back to the beginning of Missing Maps, but also to fill in these gaps. Other than these technical glitches, I think it's going well. It was perhaps ambitious to think that we could reliably have a 100 percent up-all-the-time service. So periodic gap filling, going back to achieve 100 percent, is what we are aiming for.

Is it also working for things like retention and building [inaudible]?

Yeah, it seems to be. If you go to the leaderboard pages you see that some people are very active and they have a lot of badges, and it's really pretty cool. Especially during mapathons, it's neat to have it up on the screen and watch the real-time commits coming in. You make a commit and then you can go to the page and it shows up within a minute or two, usually, sometimes longer.

Any other questions?

Thank you for the talk. I really liked the architecture diagrams [inaudible]. I was just wondering how you generated those.

[inaudible] I can't remember which tool it was. I can't quite remember how that was done.
[inaudible] is the one who did that, and we've used a couple of things to make those. Amazon has its own architecture diagram tool, but I wouldn't use that. If you tweet at me, I will respond.

Anyone else? [inaudible]