
Automatic Improvement of Point-of-Interest Tags For Openstreetmap Data

Video in TIB AV-Portal: Automatic Improvement of Point-of-Interest Tags For Openstreetmap Data

Formal Metadata

Automatic Improvement of Point-of-Interest Tags For Openstreetmap Data
Title of Series
CC Attribution - NonCommercial - ShareAlike 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date
Production Year
Production Place
Seoul, South Korea

Content Metadata

Subject Area
Geo-search engines and location-based services allow users to query for points of interest (POIs) in a certain region or near the user's current location. Such queries often ask for classes ('hotels New York', 'supermarket Berlin', 'Italian restaurant London') rather than single points ('Hotel Belvedere New York'). In OpenStreetMap (OSM), the basic class of a POI can be specified via the amenity tag (amenity=fast_food), via direct tags (shop=supermarket), or via several other specialized tags, such as the cuisine tag for restaurants. A POI must carry these tags to show up among the results of a class-based query. They are also useful for categorizing search results: a search for 'Venice beach', for example, should inform the user that there are beaches, hotels, fitness studios and clothing stores with that name. Unfortunately, OSM contains plenty of POIs for which no class is provided. Many of those POIs do, however, have a name tag ('Sunset Hotel', 'Wal Mart') that already contains some information about the respective class. In this paper, we investigate methods for automatically extrapolating class, amenity and specialized tags based solely on POI names. For example, 'Pizzaria Bella Italia' almost certainly indicates an Italian restaurant, while 'Tapas Bar' indicates Spanish food. We use machine learning tools to extract, for many amenities, the typical words and phrases that occur in the associated name tags, and we learn corresponding POI classifiers. For example, learning indicators for 'shop=hairdresser' on German OSM tags led to high scores for 'fris', 'cut', 'hair' and 'haar'. Although 'studio' and 'design' also appeared in many name tags, they are not suitable for distinguishing between 'shop=hairdresser' and 'shop=beauty', the latter of which includes nail spas. For other kinds of POIs, such as supermarkets or gas stations, the names of large chains ('ALDI', 'Aral') showed up as typical indicators.
We show empirically that, with the help of our learned classifiers, tags for POIs with unknown class can be extrapolated with high accuracy. For example, 8% of all hairdressers were untagged but could be identified by our approach.
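The idea in the abstract (learn indicator phrases from the names of already tagged POIs, then use them to suggest tags for untagged ones) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of character k-grams of length 3 to 5, the `min_share` cut-off, the `threshold` value and the toy training names are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def kgrams(name, k_min=3, k_max=5):
    """Return the set of lowercase character k-grams in a POI name."""
    text = name.lower()
    return {
        text[i:i + k]
        for k in range(k_min, k_max + 1)
        for i in range(len(text) - k + 1)
    }


def learn_indicators(tagged_names, min_share=0.02):
    """Map each class to {k-gram: share of that class's names containing it}.

    tagged_names: iterable of (name, class_label) pairs.
    min_share: hypothetical cut-off; rarer k-grams are dropped.
    """
    names_per_class = Counter()
    gram_counts = defaultdict(Counter)
    for name, label in tagged_names:
        names_per_class[label] += 1
        gram_counts[label].update(kgrams(name))
    indicators = {}
    for label, counts in gram_counts.items():
        total = names_per_class[label]
        indicators[label] = {
            gram: count / total
            for gram, count in counts.items()
            if count / total >= min_share
        }
    return indicators


def class_distribution(name, indicators):
    """Score each class by summed indicator shares, normalised to sum to 1."""
    grams = kgrams(name)
    scores = {
        label: sum(share for gram, share in table.items() if gram in grams)
        for label, table in indicators.items()
    }
    total = sum(scores.values())
    if total == 0:
        return {}  # no known indicator occurs in the name
    return {label: score / total for label, score in scores.items()}


def suggest_tag(name, indicators, threshold=0.8):
    """Return the most probable class, or None if it is below the threshold."""
    dist = class_distribution(name, indicators)
    if not dist:
        return None
    best = max(dist, key=dist.get)
    return best if dist[best] >= threshold else None


# Toy training data (invented for illustration, not taken from OSM).
training = [
    ("Friseur Schmidt", "shop=hairdresser"),
    ("Haarstudio Anna", "shop=hairdresser"),
    ("Hair Cut Express", "shop=hairdresser"),
    ("ALDI Nord", "shop=supermarket"),
    ("ALDI Süd", "shop=supermarket"),
    ("Supermarkt Meier", "shop=supermarket"),
]
indicators = learn_indicators(training)
print(suggest_tag("Friseur Müller", indicators))
print(suggest_tag("Café Central", indicators))
```

Returning `None` when no class is clearly ahead keeps the sketch conservative: a name with no known indicator, or with indicators split across classes, yields no suggestion instead of a guessed tag.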
that is not a bad Adam and improvement on the news text of student data and demand and then then also there was no so normally you information that that you don't use
it and some of will not be an issue that there are many is the character the novel in rest of the century in colleges and make a sentence and you
could be restaurants in Seoul it not present in all our and things that we see that this means to and that is the green restaurants in Seoul
and he had nothing to comparison it is not that all of those and ultimately money so you to and creators have access so you will go around and see OK this restaurant National Sea data so that right now but very often it is the life of some of the information I address and maybe opening hours in the scene of the mug everybody the basic idea so uh there's a question about that if you just have the notion that to have it as an information about is that the text maybe you can find them automatically going to look at the me and to illustrate the amount in their
thinking time and then ability to use the name of restaurants already kind of the restaurant and you just guess decrease seen for case there for example if you have unless you got a minimum value is not as well as the to be solution techniques that but it seems to be kind of 2 types of the scene we want to do that you and a lot of the of to life Japanese-German green and so on and so on OK so this is the typical PC make enough that they can have only 1 of so we have to talk about if and when will be talking us and this is the Spanish what's like a payment there around in Canada seems to be but sending tell you about what knowledge in the middle of the food was always contained in this now we have not had limitless 50 style when we think that will experiments and marrying the other just because the man benefits has some information in the most efficient is not really in there and announced that they embody most brands of the nodes and edges medical and in the name doesn't have anything about events in joules ROC curve and the knowledge that we all had thumbs meetings and and rocks and to do so and is that the 1st response numerous instances of and then it will come as strong wasn't going why because we know we can also be right and the same thing you have to touch my again it's really unlikely that some German so nameservice but it's there and the use the In these clues to get from and you can automatically that that information and approach and then you can not adjustment of inference that candidate for thousands of points of interest that sent OK so the question is what kind of text can help to extrapolate the discrete something as an example actually that it works enough about the kind of that's true but just illustrating sample OK that that was then it used to have a model using the text is in the parameters of course they might be new kinds of special food and there might be new tax and of course it's a dynamic thing but for this kind of course but if 
suggestions what you should and is and and then we look at the dataset Japanese along the tongue and and move the text that you 200 times because if you want to learn something that we have some information already there so if some tax appear in any of enterprise so that a few times inventing thinking that that and then into the and non thousand 500 PCs classes in the training dataset and this offender not only in is suggested that we only have 25 in the example home cake this as much a specific you should use K. good assert there because sometimes present text the page layout something like that of the but we know that all the people who want to be more willing to something that gender being which is not an ethnicity you not a time and they have a lot of people opening and Germans influence influencing the of separating that from names and knowing that the within the with the people you should use regions determined for that many of In some really strange things that we see music I don't know what's supposed to be the head of the wall and a lot of spelling errors in an atom and could take care of that we didn't get to them because normally that enough of the and then some they're really not clear about that and you the result loss not and some which is really rest of the Syrian food and not too many places in Germany so a testament that to OK so not secretion 111 so that what should be the features so that is used as the growth that they're in Texas so units runs in general rather than just the 1st round and then we're just looking at a system that finds an indicated that the sum indicator phrases in the sample that the then of you is the rise of course in a long and slow and then you involved in you know the the human in the on and this all starts is I think so many just looking at the center legal cannot the interfaces that this the at the end level of the to extract and by hand and we do not know if only get them in terms of growth in land that the fans of them 
so that the man was not a good time questions have can extracted automatically and 2 in the 1st and then variables names the constructed the cake crumbs I'm that's really up to 10 to the premise that that constituted that sequence of the matter the heart of the sun here states the world and the remaining while you're 1 of the sciences and account frequently in this last question completeness not owning 1 there so the people at the time of the month and a half years NYTimes and hence the curriculum significant variants firm contains 2 % of the names similar this city's 200 name stands for this is this will make much sense but if this is really have this works fine so that makes 1 modification because of it the thing on his the the times in all of these so I'm going to need this nothing to write and then this information processing OK so long as in the forest way because they are on the sum of the of the of the here more often than not these the people but in this case here we found concept OK so now we have this information so we have for each crops the increase in engines and so on and so on and even that's reasonable according to those indicating that have the name interfaces for the set of examples in the calculations in times of famine and if you follow and have to percentages in harmony of that the state here kind of 1 to use those to and then that will be construct the feature the meaning the directive instructional of the files with found in the face of the we use the phrase plants of the 6 year and multiply percentage and that is on the and it's not something like models not using it 0 of the really in the article some of the needs the it is really popular involved a sense and all the the universe of mango OK in the in the fact that you can see is that the defendant has entries then there is hope that the people at the real thing but also of way not and then when you rent was about to learn the system and human that use cited then packets and some planet on 
the used in the way that we don't get a classification right away when we get a probability distribution of the classes that are there were some going in and out of the maybe 90 % of the show that the new president and essentially contains but the with the real maybe from essential because the thing of then get that a distribution of all of them which sum up to 100 % OK and then the value in 2 steps 1st we have a the entity known results and nondecreasing and using some of
the descended from the 1 and the wall and you see the reckons numbers so this is this is this is the long and the arms it pretty well in the sense that make up the difference of in the theory of all of the of the United beings evil and then you have been here the all of these barbecue things which serve both now and this is the correct answer was if I'm not really perfect but this is because it has to decide for 1 of those things and the any place to put in the wall of pair their costumes which don't have anything so many for some of you we close at all many of those classes that that 1 and this makes it and that's a last minute to do
it so what you are going to needed not only to see that the assigned an attack only if the classifier of shown us that we use probability distributions to decide if you can assign it had on up and that 1 of this is that and the reason it is a as the the that essentially is because we think that it is used as kind of an exhaustive search for all of the places that have the thing I'm 1 of those the type of cases of faces jump along 1 of 4 types using the thing that probability of the woman is that assignment and this resulted in in the and then you can think of ethnicity could syntax interest by a lot of the compared to the number of restaurants that the syntax and said about 1500 type seem to have and then you will the restaurant names so maybe uh that that there's a link and and data as this attack you can just put in that interslope that can afford to miss this step in the of the loss of accuracy of 98 % is much higher than in our analysis of the data already known in the information because the only thing that if that about the and see some examples and feature automatically classify them so that the will then so the names and the kind of some 50 50 by admitting beginning it correctly classified because of pedophiles I guess the basis which had diamond then in the classified the burden the Romanians along the you have and I thought well I just wanted to mention you before that there is not that because they're on the thing that Germany like all passwords name work and and to then if you have family in a certain font contains that the universe methods that can that can this is the come and this is your problem because the of these places are classified by the research of food and there's nothing going to fight and come up just the place and you need to study of food and they contain i'd which is also pretty bad stuff and then those students use just have to do something out it and if you go to other countries and and I found that out that is the idea that the 
student in the class maybe 1 can to this things and extract rules of that so yes that is necessary for the this is the thing that we're not we find an of that was also yeah I think the some of the time and the map and what is the of the thing to be enough places which already have this information contained I can say this invention the beginning and end of the preceding text that all of that and the other thing that had that classifier at the manager restaurant about all of you know nothing nothing in there you assigned 161 new things that do some tricks you because they only was this lack of overlap here said it had to learn at 1st and classifier which is the those those events for them and then the son of a new American chart tactic supermarket bakery name and about 3 thousand 400 mutualism user text that is about 5 % which is nice but not like you should do it in an automated way to put them in all significant because this not be because you fragment have considered incorrect said we to that and then useful thing we to do with our approaches good and that system most of them and create a new type of you know you could say that this is going to make useful suggestions for the money with an environment the approach with the analysis supermarket in which is just place that this stack of faith in the son of people to just this is not a supermarket this is my restaurant for that in a bit of atoms of a permanent because the use because of disposable distribution firms would be that we that we could just suggest that user ship just a few select from but in the government that because of the the the group that would be necessary to have a complete information for this point if you have some suggestions to think that people will be at best a strong incentive to put information on OK so conclusion that we see in the last person of the person contains dimension about 60 to 80 % which is quite a lot and adjusted you and that's the countries is but of course you need to 
check and then decodes to improved quality could consider additional tax is that the technique that we would needed to distinguish for example restaurants and that's because thus tend to be the open data and so on and then there's the branch that that might be useful and possibilities for the text taxes ascription text that people put that kind of information and it's much harder to pass them to get some interesting information that might be the unit that between certain classes and then we know values of inventory mentioned and if you ever the places like this video it's not cool from the if the In increases the I got of of the year and be this year that With the did not add in an automated way meaning and use this these guidelines for you to do automated edits from seeing the data that united not recomended so if integral be better to have a some even although they are then get an additional thing because we don't want that wrong data I mean the percentage for the synthesis we more I think the nominal rate might be even larger and that's the you 1 the long of the tested it on the side but yet if the community will check and validate technologies things of course is not found that if the Community could just check at so we would just that model and for the suggestions and you could say yes this is correct and think this would have for you what it you the of the just where you have this 1 yeah so yeah somebody should look at them differently so all of this is what on and and of this I can't use that is the list of the incan all and have a look this and this this you all know the other is the sum of this new movement and could make some kind of game with something like that and includes the appointment of concluded that it would be nice and and this is where the yeah yeah didn't exercise you and you want to have if all the company is you need to what 1 of the things we don't know what they think about the kind of thing here what can the so that the reader with it 
and the world of the real world this is you know so that you can find of it so the law of and on the at the end of the but that you know if this a substring occurs exactly the same number of some not just think he just dismiss the smaller 1 but as we just about them so if the acid that appears more often than peter out there that the can just both so you can get them to the full information because it is a constant you just the same as previous but then the and is completely different and so it just about you no interest to the maximum that you have found that to say in frequency so that they could do was shown which contains information on synthetic continues you the I agree the no meaningful semantic interface with a fixed length because the number of letters of OPS at that it's forming it's nothing to choose but we just use it is not apparent so that know and if it isn't that because there are women is an doesn't to match so studious and this is what we might the the recognition of the so OK but when I think this was also the paper and
people that computes the again because of its 3 letters but notes for that is because the space itself and it can be so it for characters spectrum what she's 3 yeah if you use to and then along with all yeah the places where we have like this that we so this is the demands of the reason is that liberalization of things there are only a few short names so if you start with some some places didn't show up so it places adjustment you know something like that so you have to know that the instruments and the ones that we you have once been balanced number this moment of around that need the tested with different trade-offs and came up this and it might be better if maybe you should learn it also vectors that fast