
OSM Stats: Rewarding contributors and real-time tracking of OSM


Formal metadata

Title: OSM Stats: Rewarding contributors and real-time tracking of OSM
Part: 181
Number of parts: 193
License: CC Attribution 3.0 Germany. You may use, modify, reproduce, distribute, and make the work or content publicly available, in unaltered or altered form, for any legal purpose, provided you credit the author/rights holder in the manner they specify.

Content metadata

Abstract
Mapathons are an increasingly effective way to get data into OpenStreetMap. The Missing Maps project hosts mapathons to increase the amount of data in areas that don't have large local OSM communities. The American Red Cross and Development Seed have built an analytics platform that tracks user trends in real time and rewards contributors for their efforts, as can be seen at missingmaps.org. OSM-stats tracks users' activity, consistency, and relative reputation, reporting detailed metrics and awarding a variety of themed badges based on the type and magnitude of contributions. Badges range from simple tasks ("Add 4 roads") to challenging ones ("Map in 10 countries"). Leaderboard pages display up-to-date details on the most active users for a current project, while hashtag groupings allow statistics to be separated out so that groups can be tracked. A map of each user's commits can be seen, as can a map view showing the last 100 changes. Most of the contributions for the Missing Maps project occur during mapathons, where hundreds of volunteers submit edits and additions over a couple of hours. This means that the system needs to handle large spikes of activity when thousands of edits are added. We deployed the OSM-stats components using AWS Lambda functions and Kinesis streams, which scale very well to meet the needs of mapathons and incur minimal cost when not in use.
Transcript: English (automatically generated)
Hello, welcome. Thanks for coming. My name is Matthew Hanson. I'm with Development Seed in Washington, D.C. I'm going to talk about a project that we did for the American Red Cross.
You're probably all familiar with the Missing Maps project. Is anyone not familiar with it? Okay. Missing Maps is a project sponsored by the Red Cross to encourage mapping of areas in need, especially after disasters.
We worked on the project to redesign the missingmaps.org website, so if you go to that site now, that's the new site from last year; it's actually been up for maybe six months or so. Our original goal was not only to redesign the website: the Red Cross also wanted user pages showing people's statistics and what they've committed to OpenStreetMap, statistics on those commits, some sort of reward mechanism, and, related to that, leaderboards showing the ranks of users and groups. And they wanted all of this to happen in real time.
The Missing Maps project sponsors mapathons, where everybody gathers together for an hour, maybe two hours. Some people may not have used OpenStreetMap before, so they perhaps undergo a short training session, and then there's a targeted area where everybody jumps on. That's maybe 70 users, sometimes a lot fewer, sometimes more for really large ones, and they map that region.
The real-time component is there so that you can put the contributions up on a projector and watch them come in during these mapathons. So this comes down to tracking commits: we need to take the commits and track them. A commit in OpenStreetMap is called a changeset, and it is made up of metadata and the data. If you're familiar with OpenStreetMap, you might go here and look at the details of a particular changeset, which includes both the metadata and the data. This is the geometry, and this is the metadata, which is published every minute.
I'll get into the details of the real-time system in a bit, because doing this in real time is actually a little more complicated: the geometry isn't available together with the metadata. But notice that the changeset has hashtags. Hashtags are how we form communities in Missing Maps, and in fact in other projects too. During mapathons, or outside of them, people who make commits can add hashtags to those commits, and then we can track them. Hashtags are spatially unbounded; they track groups and events.
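To make the hashtag tracking concrete, here is a minimal sketch of pulling hashtags out of a changeset's tags. It assumes hashtags appear in the changeset's free-form `comment` tag and, optionally, in a semicolon-separated `hashtags` tag that some editors write; both the regular expression and the function name `extract_hashtags` are hypothetical and not taken from the OSM Stats code.

```python
import re

# Assumed convention: '#' followed by word characters or hyphens.
HASHTAG_RE = re.compile(r"#([\w-]+)")


def extract_hashtags(tags: dict[str, str]) -> set[str]:
    """Return the lowercase hashtags found in a changeset's tags.

    Assumes hashtags appear in the free-form 'comment' tag and, optionally,
    in a semicolon-separated 'hashtags' tag written by some editors.
    """
    found = {m.lower() for m in HASHTAG_RE.findall(tags.get("comment", ""))}
    for item in tags.get("hashtags", "").split(";"):
        item = item.strip().lstrip("#").lower()
        if item:
            found.add(item)
    return found


tags = {"comment": "Mapping buildings #missingmaps #MapTime",
        "hashtags": "#hotosm-project-1234"}
print(sorted(extract_hashtags(tags)))
# ['hotosm-project-1234', 'maptime', 'missingmaps']
```

Lowercasing is a design choice so that `#MapTime` and `#maptime` count toward the same group.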
The biggest one is the Missing Maps hashtag, which the Red Cross was particularly interested in, but you can add as many hashtags as you want. For a particular mapathon or project you might have a hashtag, and the editor you use can be configured to add those hashtags automatically every time you make a commit: maptime, myawesomehashtag, whatever you want to add.
This brings us to leaderboards. With these hashtags in place, we can have leaderboards showing the total commits for any specific hashtag as well as the users. What you see here is interactive, so you can add any hashtag you want. If you go to this page, the one on the left defaults to the Missing Maps hashtag, but I've added a HOT OSM one and a MapTime event one. So in a mapathon, say, you could have two groups, where each group uses its own hashtag and both use some common one, and you could have a competition within the mapathon to see who's making more commits: left side versus right side, bald people versus those with hair.
This is the total number of edits made in the last six or so months, since we started tracking. In the future we would like to go back and add historical data, all the way to the beginning of the Missing Maps project, but right now it only covers the period since we started. Then the leaderboard shows the users. Right now it's sorted by the total number of edits, but you can sort by buildings, kilometers of roads, or any of the fields here. You see we have RivW, who has been at the top.
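The leaderboard described above boils down to filtering changesets by hashtag, summing per-user metrics, and sorting by whichever column is selected. The sketch below assumes each changeset has already been reduced to a few per-changeset metrics; the `ChangesetStats` fields, the `leaderboard` function, and the sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ChangesetStats:
    # Hypothetical per-changeset metrics; field names are illustrative.
    user: str
    hashtags: set[str]
    edits: int
    buildings: int
    road_km: float


def leaderboard(changesets, hashtag, sort_by="edits", limit=10):
    """Sum per-user metrics for one hashtag and rank users by one field."""
    totals = defaultdict(lambda: {"edits": 0, "buildings": 0, "road_km": 0.0})
    for cs in changesets:
        if hashtag not in cs.hashtags:
            continue  # only changesets tagged with this hashtag count
        t = totals[cs.user]
        t["edits"] += cs.edits
        t["buildings"] += cs.buildings
        t["road_km"] += cs.road_km
    ranked = sorted(totals.items(), key=lambda kv: kv[1][sort_by], reverse=True)
    return ranked[:limit]


sample = [
    ChangesetStats("alice", {"missingmaps"}, 10, 5, 1.0),
    ChangesetStats("bob", {"missingmaps", "maptime"}, 4, 9, 0.5),
    ChangesetStats("alice", {"maptime"}, 7, 0, 2.0),
]
for user, stats in leaderboard(sample, "missingmaps", sort_by="buildings"):
    print(user, stats)
```

Because each hashtag is just a filter, two mapathon teams with different hashtags get independent leaderboards from the same data.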
If you check every once in a while, it's usually these same top five people, because they're sort of bouncing around. I don't know if they're actively trying to get on top, but right now RivW is the top one.
If you have made a commit to OpenStreetMap in the last several months and added the Missing Maps hashtag, then you already have a user page. You don't have to do anything; it's added automatically. Each user has a specific page, and here we have RivW's. It shows a variety of metrics on his contributions, the total edits and that sort of thing, the hashtags used, and, as you can see, these badges. There's also a contribution timeline and a map showing the regions mapped. He's clearly focused right here in South Africa, but other people bounce all over the place. For that map, we save the convex hull of each commit and combine it with the previous convex hulls.
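Combining hulls works because the convex hull of a union of point sets equals the hull of the previous hull's vertices plus the new points, so only the running hull ever needs to be stored. The talk doesn't say which hull algorithm is used; this sketch uses Andrew's monotone chain and treats (lon, lat) as planar coordinates, which is an assumption that holds reasonably well for small regions. The function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def merge_hull(previous_hull, changeset_points):
    """Fold a new changeset's points into a user's stored contribution hull."""
    return convex_hull(list(previous_hull) + list(changeset_points))


user_hull = convex_hull([(0, 0), (1, 1), (2, 0), (2, 2), (0, 2)])
print(user_hull)  # [(0, 0), (2, 0), (2, 2), (0, 2)]; the interior point is dropped
print(merge_hull(user_hull, [(3, 1), (1, 1)]))
```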
So we're not storing all of the geometry, just the approximate region. We also map those geometries to the countries they're in, so we can track the countries that are mapped as well.
Now you can look at the badges RivW has earned, and there's a variety of them, since they've of course been very active. This is all original artwork made for this project, with clever names; the illustrations were all done by our Dylan Moriarty at Development Seed. Further down the same page are your upcoming badges, and you can see your progress toward them. For example, there's whitewater rafting, which is for mapping waterways. There are quite a number of badges that you can earn.
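The "upcoming badges" progress bar can be computed from tiered thresholds per metric. The real badge definitions aren't given in the talk, so the thresholds below are hypothetical, loosely echoing the abstract's "Add 4 roads" and "Map in 10 countries" examples; `BADGE_TIERS` and `badge_progress` are illustrative names.

```python
from bisect import bisect_right

# Hypothetical tier thresholds per metric, in ascending order.
BADGE_TIERS = {
    "roads": [4, 50, 500],       # e.g. "Add 4 roads" as the first tier
    "countries": [2, 5, 10],     # e.g. "Map in 10 countries" as the top tier
    "waterways": [5, 50, 250],   # cf. the whitewater rafting badge
}


def badge_progress(metric, value):
    """Return (tiers_earned, next_threshold, fraction_toward_next_tier)."""
    tiers = BADGE_TIERS[metric]
    earned = bisect_right(tiers, value)  # number of thresholds already reached
    if earned == len(tiers):
        return earned, None, 1.0  # every tier of this badge is earned
    prev = tiers[earned - 1] if earned else 0
    nxt = tiers[earned]
    return earned, nxt, (value - prev) / (nxt - prev)


print(badge_progress("roads", 27))      # (1, 50, 0.5)
print(badge_progress("countries", 12))  # (3, None, 1.0)
```

Measuring progress from the previous threshold rather than from zero keeps the bar meaningful at higher tiers.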
Here are some examples, a little pixelated. So why rewards? Some people will say that's silly. This project was originally called OSM Gamification, but internally we didn't really like that term. Gamification sometimes has a negative connotation, or maybe it's a buzzword that's been used too frequently of late.
Rewards provide a few different things. First, there's immersion in the mapping experience. When you make commits, they don't just go into a black hole where you can only find them amid all the other commits on OpenStreetMap: you can go to your stats page and see exactly what you've done. I think those statistics are very useful, and most people would be interested in them. People are after different things; some might not care about any of this, and some might care about a few of these things. You get a sense of achievement when you earn badges, and you strive to get the next one, so this increases retention.
It can also encourage cooperation. As I mentioned, you could have teams that cooperate in order to beat the other team during a mapathon, and of course there's the competition inherent in that.
So how did we do this? We implemented this real-time system largely as microservices. There's a diagram here, and I'll talk about each of the pieces. This is all implemented on AWS.
The first thing we need to do is stream the real-time data. OpenStreetMap makes the changeset metadata available from planet.osm.org as diff files, which are usually published every minute: each new file contains all the commits from the last minute.
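As a sketch of how a consumer might locate each minutely file: OSM replication feeds are numbered by a sequence number that is zero-padded to nine digits and split into three directory levels (for example, 2,345,678 becomes `002/345/678`). The base URL `https://planet.osm.org/replication/changesets` and the `.osm.gz` suffix are assumptions to verify against the live server, and the function names are hypothetical.

```python
# Assumed location of the changeset replication feed.
BASE_URL = "https://planet.osm.org/replication/changesets"


def replication_path(sequence: int) -> str:
    """Map a replication sequence number to its AAA/BBB/CCC path."""
    if not 0 <= sequence < 10**9:
        raise ValueError("sequence must fit in nine digits")
    digits = f"{sequence:09d}"
    return f"{digits[0:3]}/{digits[3:6]}/{digits[6:9]}"


def changeset_diff_url(sequence: int) -> str:
    return f"{BASE_URL}/{replication_path(sequence)}.osm.gz"


print(changeset_diff_url(2345678))
# https://planet.osm.org/replication/changesets/002/345/678.osm.gz
```

A poller would read the feed's current sequence number, then fetch every file between the last one it processed and that number.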
But those files don't include any of the geometry. We could replicate the OpenStreetMap data ourselves, but it changes constantly, so we use the Overpass API instead. Overpass essentially replicates the OpenStreetMap database and makes the geometries for the last minute's commits available.
Now these two have to be matched up. We have a Node app called PlanetStream that takes in the changeset metadata from OSM and what are called augmented diffs from the Overpass API, and puts them into a Redis instance, because sometimes they don't line up. You can't just take the changesets for one particular minute and the geometry from the same minute, because there might be a delay for a variety of reasons. So we keep them in Redis with a timeout of, I think, an hour or more, maybe a couple of hours, and match the metadata and the geometry up by changeset ID to create a final, combined changeset with the geometry.
At the same time, PlanetStream makes the geometries for the last hundred edits available, so that if you go to the Missing Maps website, you can see a map showing the last hundred commits and where they are. It also keeps track of trending hashtags, so on Missing Maps, if you want to add a hashtag, you can see a list of what's been popular recently.
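The matching step can be sketched as two waiting rooms keyed by changeset ID: whichever half arrives first waits until its partner shows up or a timeout expires. This Python sketch swaps Redis for in-memory dicts with an explicit TTL, keeps the last 100 combined changesets in a bounded deque, and computes "trending" hashtags over those 100, which is an assumption about how trending is defined. `ChangesetMatcher` and its payload layout are hypothetical.

```python
import time
from collections import Counter, deque


class ChangesetMatcher:
    """In-memory stand-in for Redis-based matching of metadata and geometry."""

    def __init__(self, ttl=3600.0, clock=time.monotonic):
        self.ttl = ttl          # seconds a half may wait for its partner
        self.clock = clock
        self.pending = {"meta": {}, "geom": {}}  # kind -> {id: (arrival, payload)}
        self.recent = deque(maxlen=100)          # last 100 combined changesets

    def _expire(self):
        """Drop halves older than the TTL, like a Redis key expiry."""
        cutoff = self.clock() - self.ttl
        for waiting in self.pending.values():
            for cid in [c for c, (t, _) in waiting.items() if t < cutoff]:
                del waiting[cid]

    def add(self, kind, cid, payload):
        """Add one half ('meta' or 'geom'); return the combined changeset once both exist."""
        if kind not in self.pending:
            raise ValueError("kind must be 'meta' or 'geom'")
        self._expire()
        other_kind = "geom" if kind == "meta" else "meta"
        if cid not in self.pending[other_kind]:
            self.pending[kind][cid] = (self.clock(), payload)
            return None
        _, partner = self.pending[other_kind].pop(cid)
        meta, geom = (payload, partner) if kind == "meta" else (partner, payload)
        combined = {"id": cid, "metadata": meta, "geometry": geom}
        self.recent.append(combined)
        return combined

    def trending(self, n=5):
        """Most common hashtags among the last 100 combined changesets."""
        counts = Counter(tag for cs in self.recent
                         for tag in cs["metadata"].get("hashtags", []))
        return [tag for tag, _ in counts.most_common(n)]
```

Injecting `clock` makes the timeout testable without waiting; in production the default monotonic clock is used.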
Now that we have these combined changesets, we need to calculate the user metrics. The repo we use for this is called osm-stats-workers, and it uses AWS Lambda functions and Kinesis streams. Here's the diagram. The combined changeset goes into an Amazon Kinesis stream, which is essentially a queue: as changesets are added to the queue, that fires off a Lambda function.
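A Kinesis-triggered Lambda receives an event whose `Records[i]["kinesis"]["data"]` field holds the base64-encoded record, per the documented AWS Lambda Kinesis event format. The talk's workers are a Node app; this Python sketch only illustrates the shape of the step. The combined-changeset JSON layout, the rule that `building` and `highway` tags mark buildings and roads, and the function names are hypothetical, and the RDS write is omitted.

```python
import base64
import json
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def changeset_metrics(cs):
    """Illustrative per-changeset metrics from a hypothetical combined changeset."""
    metrics = {"user": cs["user"], "buildings": 0, "roads": 0, "road_km": 0.0}
    for el in cs.get("elements", []):
        tags = el.get("tags", {})
        if "building" in tags:
            metrics["buildings"] += 1
        if "highway" in tags:
            metrics["roads"] += 1
            coords = el.get("coords", [])
            # Sum the lengths of consecutive segments along the way.
            metrics["road_km"] += sum(haversine_km(p, q) for p, q in zip(coords, coords[1:]))
    return metrics


def handler(event, context):
    """Lambda entry point for a Kinesis trigger."""
    results = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        results.append(changeset_metrics(json.loads(payload)))
    # A real worker would insert these rows into the RDS database here.
    return results
```

Keeping the handler stateless is what lets Lambda run many copies in parallel during a mapathon spike.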
If you're not familiar with Lambda, it's serverless, so this whole setup is serverless. We have a Node app that is uploaded as a Lambda function, so we don't have to worry about running servers or anything like that, and it's invoked every time a changeset is added to the stream. It scales automatically: if there's a lot in the stream, it fires off a lot of Lambda functions, and that works very well. We use an RDS database to store the metrics; the Lambda function calculates metrics on each changeset and adds them to the database. These are the types of things that are calculated: the metrics here, but also some geometry calculations to figure out which country things are in, and getting the convex hull of those geometries and adding it to the user's total contribution geometry.
So why Lambda? Mapathons aren't happening all the time. A lot of the time there's no activity at all for a particular hashtag, and then during a mapathon it can spike and get really high. We didn't want to run an EC2 instance all the time, and Lambda functions are perfect because you don't pay for them when they're just sitting there doing nothing. You can upload a Lambda function and it doesn't cost you anything at all, which is very nice; you only pay for how long it runs and how many times it runs. Here you see the invocations, the number of times the Lambda function is called over some time period, and it can vary from zero or maybe a couple per minute up to 100 per minute.
So Lambda functions provide a very cheap way to do this. I'll add that our cost for the Lambda functions for this whole project is essentially zero, since we're not running a million requests. If you have something that could be a serverless setup, I'd really encourage you to look at Lambda functions, because they're very cheap: they cost fractions of a cent per invocation, depending on the memory you configure.
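The "essentially zero" cost follows from Lambda's billing model: cost = invocations × price per request + invocations × average duration (s) × memory (GB) × price per GB-second. The default prices below ($0.20 per million requests, about $0.0000166667 per GB-second) are AWS's published on-demand x86 prices for us-east-1 as an assumption; check current pricing, and note the free tier is ignored. The 0.5 s duration and 256 MB memory are hypothetical; 12,000 invocations corresponds to the talk's peak rate of 100 per minute sustained over a two-hour mapathon.

```python
def lambda_cost_usd(invocations, avg_duration_s, memory_mb,
                    price_per_million_requests=0.20,
                    price_per_gb_second=0.0000166667):
    """Estimate on-demand Lambda cost (free tier ignored; prices are assumptions)."""
    request_cost = invocations / 1e6 * price_per_million_requests
    gb_seconds = invocations * avg_duration_s * (memory_mb / 1024)
    return request_cost + gb_seconds * price_per_gb_second


# Peak rate from the talk (100/min) for a 2-hour mapathon, hypothetical 0.5 s at 256 MB.
cost = lambda_cost_usd(invocations=100 * 60 * 2, avg_duration_s=0.5, memory_mb=256)
print(f"${cost:.4f}")  # $0.0274
```

Even a busy mapathon lands at a few cents, and idle hours cost nothing, which is the contrast with an always-on EC2 instance.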
They're a very cost-effective way to do things when it makes sense.
So, contribute! If you go to the Missing Maps page, there's a list of upcoming mapathons; these are the current ones coming up. So if you happen to be in the Czech Republic or Belgium in the next few weeks, there are mapathons there, and there might be others in your area. Here are the overall contributions for Missing Maps since we started doing this.
And of course I should point out that this is all open source: you can go to the American Red Cross GitHub page and the OSM stats repo. These are multiple services, so there are multiple repos, but the readme file in the OSM stats repo lists all the repositories we use. And that's it. Thank you.
Okay, thank you very much, Matthew. We have time for a few questions.
I'm curious what the Red Cross's reaction to the project has been. Are they getting the kind of outcomes they were looking for?
Yeah, good question. Overall, yes, although there have been a variety of technical difficulties. Early on in the process we didn't quite realize some of the technical issues we would have in trying to do this in real time, specifically Overpass sometimes going down, the delays, and dropped commits. Since we started, there have been periods where we've lost commits. That's one of the reasons we want to do historical processing: not only to go back to the beginning of Missing Maps, but also to fill in these gaps. Other than these technical glitches, I think it was perhaps ambitious to think we could reliably run a service that's up 100% of the time. So periodic gap filling, going back to achieve 100% inclusion, is what we feel is important now.
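The talk doesn't say how gap filling is implemented. One simple approach, sketched here as an assumption, is to record which minutely replication sequence numbers have been processed and periodically list the missing ranges so they can be re-fetched and reprocessed. The function name `find_gaps` is hypothetical.

```python
def find_gaps(processed, start, end):
    """Return inclusive (first, last) ranges in [start, end] missing from `processed`.

    `processed` is a set of replication sequence numbers already handled.
    """
    gaps = []
    gap_start = None
    for seq in range(start, end + 1):
        if seq not in processed:
            if gap_start is None:
                gap_start = seq          # a new gap begins here
        elif gap_start is not None:
            gaps.append((gap_start, seq - 1))
            gap_start = None             # the gap closed just before seq
    if gap_start is not None:
        gaps.append((gap_start, end))    # gap runs to the end of the window
    return gaps


print(find_gaps({1, 2, 5, 6, 9}, 1, 10))  # [(3, 4), (7, 8), (10, 10)]
```

Returning ranges rather than individual numbers makes it easy to batch the backfill requests.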
Is it working for things like retention and building the user pool?
Yeah, it seems to. If you go to the leaderboard pages, you see that some people are very active and have a lot of badges, and it's really pretty cool. Especially during mapathons, it's neat to have it up on a screen and watch the real-time commits come in. You make a commit, go to your page, and you can see it come in, usually within a minute or two, sometimes longer.
Another question.
Yeah, thank you for your talk. A bit of a superficial one: the architecture diagrams in your slides were super pretty. I was just wondering how you generated those.
I'll have to get back to you on that; I can't quite remember how that was done. Mark Farah is the one who made them, and we've used a couple of things to make those. Amazon has its own architecture diagrams, but we didn't use those. So if you tweet at me and ask, I will respond and let you know.
Okay, will do.
Anyone else? No? Okay, then thanks again, Matt.
Thank you.