
Image Geocoding as a Service

Video in TIB AV-Portal: Image Geocoding as a Service

Formal Metadata

Image Geocoding as a Service
Title of Series
CC Attribution - NonCommercial - ShareAlike 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date
Production Year
Production Place
Seoul, South Korea

Content Metadata

Subject Area
Driven by the ambition of a global geocoding solution, in this paper we present the architecture of an image geocoding service. It takes advantage of the ubiquity of cameras, which are present in almost all smartphones: an inexpensive yet powerful sensor that can be used to provide precise location and orientation. This geocoding service provides an API similar to existing ones for place names and addresses, such as the Google Geocoding API. Instead of a text-based query, images can be submitted to estimate the location and orientation of the user. Developers can use this new API while keeping almost all of the existing code already used for other geocoding APIs. Behind the scenes, image features are extracted from the submitted photograph and compared against a large database of georeferenced models. These models were constructed using structure-from-motion (SfM) techniques and heavily reduced to a representative subset of the information using synthetic views. Our preliminary results show that the pose of the majority of the images submitted to our geocoder (more than 60%) was successfully computed, with a mean positional error of around 2 meters. With this service, an inexpensive outdoor/indoor location service can be provided, for example for urban environments where GPS fails.
Thank you very much, and thanks for chairing this session. I am coming from Portugal. I am really enjoying the conference, and I have already seen very interesting presentations. I will talk about image geocoding, and I will try to explain the overall process behind this service as well as possible.

Why are we so interested in images? Because people use a lot of images. People are taking pictures here, which is good, but they are using the camera as a sensor, and I think it is the most used sensor that we carry with us today. If you look at the numbers, about 1.8 billion pictures are uploaded every day, so we have lots of pictures. What we would like to do is use images to make a precise estimate of the user's location. For example, this lady took a picture, and I would like to use it to tell her where she is in this room: precise position and orientation. Other methods like GPS are not able to provide the orientation. GPS could say "you are here", but you would have to walk around to find your orientation. We ourselves use vision to find where we are, so the idea is to use vision to estimate our position. In this project we tried to do this geocoding in less than a second, so that it is not a bad user experience to upload a photo and get the position estimate in under a second. We also tried to use the camera in continuous streaming: for each second we extract a frame and do the processing on that frame. We are able to do that, and it was a requirement.
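The one-frame-per-second budget mentioned above can be sketched like this. This is my own illustration, not the project's code; it only shows the frame-selection policy, assuming frames arrive with timestamps:

```python
def sample_one_per_second(frames):
    """Pick at most one frame per whole second from a stream.

    `frames` is an iterable of (timestamp_seconds, frame) pairs in
    chronological order; we keep the first frame seen in each second,
    matching the talk's budget of one geocoding request per second.
    """
    selected = []
    last_second = None
    for ts, frame in frames:
        second = int(ts)
        if second != last_second:
            selected.append((ts, frame))
            last_second = second
    return selected

stream = [(0.0, "f0"), (0.4, "f1"), (1.1, "f2"), (1.9, "f3"), (3.0, "f4")]
print(sample_one_per_second(stream))  # [(0.0, 'f0'), (1.1, 'f2'), (3.0, 'f4')]
```

Each selected frame would then go through the same feature extraction and pose estimation as a single uploaded photo.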
Before explaining how it works, I have a small video here, a really short one, which shows the usage of this geocoding. It is used like any other geocoding. In this example we changed OpenTripPlanner: instead of the possibility of introducing coordinates or a street name, we can use the camera. We take a picture with a mobile phone or tablet; in this case we are inside our university campus, where we had already prepared the models. As you can see, it immediately returns the precise position inside the building, and the precise orientation of the user. Just to be faster, the user takes the first option this time, and turns 180 degrees to move to the opposite side.
From the user's point of view, this is all that is seen: the application uses our geocoding service, sends a request and gets a response, so from the outside it is just a geocoding API. Behind the scenes, on the back end, we divide our software into two parts: the first is related to position and orientation estimation, and to be able to compute the position we use previously built georeferenced models, created from images we took beforehand. Let me talk about the different parts of the software.

The geocoding API is very similar to the existing geocoding APIs, from Google, Mapbox, Yahoo and so on; you name it, there are lots of geocoding APIs, and all of them use text strings to compute the position. Our API is almost the same. The idea is that, from a developer's point of view, you do not need to change your code; you just use this API like the other APIs. For the request we use an additional parameter, which is the image, a photograph, so it is easy for users to upload it. The request can also, optionally, carry a region parameter, similar to the one in the Google Maps API; for this region restriction we submit the IP address of the device uploading the image, because if you know the IP you can geocode it and limit the scope of the answer. It is an optional parameter. The answer is a structure whose geometry field provides the location as a 3D point, and we also provide the heading, the orientation with respect to the ground, and the pitch, to know whether the camera is pointing up, down or forward. There is an additional field for the confidence level of our geocoding. As with other geocoding services, the response may contain several candidate positions, ordered by score. That is the visible part of the API; everything else runs in the background. Before talking about the pose estimation, I will talk about the point cloud models that we generate, because I think that makes it easier to understand.
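To make the request/response shape concrete, here is a sketch of what a client might receive and do with it. The field names (`location3d`, `heading`, `pitch`, `confidence`) are my illustration of what the talk describes, not the service's documented schema:

```python
import json

# Hypothetical response, shaped after the talk's description: a 3D
# location, heading and pitch angles, and a confidence score per result.
raw = json.dumps({
    "results": [
        {
            "geometry": {"location3d": {"x": 38.6, "y": -9.2, "z": 12.5}},
            "heading": 135.0,   # orientation w.r.t. the ground, degrees
            "pitch": -5.0,      # camera pointing slightly downward
            "confidence": 0.87,
        },
        {
            "geometry": {"location3d": {"x": 38.6, "y": -9.2, "z": 15.0}},
            "heading": 140.0,
            "pitch": -4.0,
            "confidence": 0.31,
        },
    ]
})

def best_pose(response_text):
    """Return the highest-confidence candidate, as a client would."""
    results = json.loads(response_text)["results"]
    return max(results, key=lambda r: r["confidence"])

pose = best_pose(raw)
print(pose["heading"], pose["confidence"])  # 135.0 0.87
```

Because the response mirrors a text geocoder's (a ranked list of candidates), existing client code mostly only needs the request side changed, which is the compatibility point made in the talk.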
To be able to estimate positions for a site, for this conference venue for example, we need to take pictures, lots of pictures, and use a structure-from-motion algorithm to create a point cloud. In the figure you can see the point cloud of the outside, and also the estimated positions of the pictures that were taken; from those images a huge point cloud is created, using this class of structure-from-motion algorithms. The other part of our work is related to reducing this point cloud: as you can imagine, keeping full point clouds for the whole world would be too complicated, you would end up with huge point clouds. So we reduce the point cloud in several steps. The first step just filters out some redundancy. Afterwards we use a technique called synthetic views. For example, for this room I could take 16 pictures, and with those 16 pictures I would have all the details of the room; but I could also capture all the details in far fewer pictures. The synthetic views give us a minimal set of views that still covers the scene, and that I can then use to georeference the room: instead of doing the georeferencing job with 16 or 20 images, I do it with just four, so instead of a point cloud based on 16 or 20 images I have a point cloud based on just four. The same applies to an outside building: from the many pictures of the building we create synthetic views. It saves time, and it is a way to compute the minimum set of views that still allows us to estimate a position afterwards; this is the only information we want to keep. Then, to have a very fast search algorithm, we use a vocabulary tree: all the features of these synthetic views are indexed in the vocabulary tree, so the search mechanism is quite fast. This offline step, creating the synthetic views and the vocabulary tree, has to be done before we are able to estimate an image's position.
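As an illustration of why a vocabulary tree makes the search fast, here is a toy sketch, my own simplification with scalar "descriptors" (real systems cluster high-dimensional descriptors such as SIFT vectors): descending through the levels replaces a linear scan over all indexed features with a few comparisons per level.

```python
# Toy vocabulary tree: each node holds centroid values and child nodes.
# Scalar descriptors are used only to keep the sketch short.

def build_node(centroids, children=None):
    return {"centroids": centroids, "children": children or [None] * len(centroids)}

tree = build_node(
    [10.0, 50.0],
    [
        build_node([5.0, 15.0]),   # leaves under top-level centroid 10.0
        build_node([45.0, 60.0]),  # leaves under top-level centroid 50.0
    ],
)

def quantize(node, value, path=()):
    """Descend the tree, picking the nearest centroid at each level.

    The returned path of branch indices acts as the 'visual word' id;
    query features and database features landing on the same word are
    candidate matches.
    """
    idx = min(range(len(node["centroids"])),
              key=lambda i: abs(node["centroids"][i] - value))
    child = node["children"][idx]
    if child is None:
        return path + (idx,)
    return quantize(child, value, path + (idx,))

print(quantize(tree, 14.0))  # (0, 1): near 10.0, then near 15.0
print(quantize(tree, 58.0))  # (1, 1): near 50.0, then near 60.0
```

With branching factor k and depth d, a lookup costs roughly k*d comparisons instead of scanning all k**d leaf words, which is what makes sub-second retrieval over a large model database plausible.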
Regarding the pose estimation: as I said, in the database we have the synthetic views, and for a new image we need to find its location relative to the images we already have. If I take two images of the same scene from different positions, there are still features that match between those two images; so if I have the pose of one image, I can use a transformation matrix to compute, or estimate, the pose of the other image. That is what our system uses to estimate the position of a new image.

So, to estimate the position: someone uploads a new image; we extract the features from that image, using well-known feature-extraction techniques; with these features we go into the vocabulary tree to find similar features already present in our synthetic views. The output of this is a list of synthetic views that have some features matching the features extracted from the image. In the next step we match our image against those candidate synthetic views, to see which synthetic view best matches the input image. For the views that match, we compute the estimation matrix, and with this estimation matrix we are able to estimate the position of the image. I have no slides showing the whole pipeline in detail; it is quite complicated, with computer-vision concepts and so on, and it involves a lot of steps and a lot of infrastructure, but we were able to do it.
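A drastically simplified analogue of the "transformation matrix" step, my own 2D toy rather than the project's code: given matched 2D points from two views, recover the rotation and translation between them in closed form. Real pose estimation works in 3D with camera intrinsics (e.g. PnP solvers), but the principle of solving for a transform from matched features is the same.

```python
import math

def estimate_rigid_2d(src, dst):
    """Recover the angle (radians) and translation mapping src -> dst.

    Assumes the point lists correspond pairwise and are related by a
    pure rotation + translation (a 2D Kabsch-style closed form).
    """
    n = len(src)
    cx_s = sum(p[0] for p in src) / n; cy_s = sum(p[1] for p in src) / n
    cx_d = sum(p[0] for p in dst) / n; cy_d = sum(p[1] for p in dst) / n
    # Cross-covariance terms of the centered point sets.
    sxx = sxy = syx = syy = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        xs -= cx_s; ys -= cy_s; xd -= cx_d; yd -= cy_d
        sxx += xs * xd; sxy += xs * yd; syx += ys * xd; syy += ys * yd
    angle = math.atan2(sxy - syx, sxx + syy)
    c, s = math.cos(angle), math.sin(angle)
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return angle, (tx, ty)

# Synthetic check: points rotated 90 degrees and translated by (1, 2).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(1.0, 2.0), (1.0, 3.0), (0.0, 2.0)]
angle, (tx, ty) = estimate_rigid_2d(src, dst)
print(round(math.degrees(angle)), round(tx, 6), round(ty, 6))
```

In the real pipeline the matched pairs come from feature correspondences between the query image and the best synthetic view, and the recovered transform, composed with the synthetic view's known georeferenced pose, yields the query's position and heading.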
We were able to do this for several indoor spaces. We have a repository online; it is real, working software, although I would not say it is completely open, because to be completely open it would need proper development effort around it, but there is some documentation. My purpose in this talk is to try to find other people interested in this kind of algorithms and this kind of processing, and to encourage people to improve the documentation and the code. We are still working on this project, trying other techniques besides synthetic views, which is our technique to reduce the amount of point cloud data; the results so far are not as good, and we have some benchmarks comparing the two, but we are open to other ideas to make this geocoding work. We are also tagging the images with semantic descriptions: instead of just extracting features, we try to extract objects, so I can say this picture has some sky, some buildings, some persons, cars, signs and so on, to enrich the database. We are also working on improving the scalability of the overall process, because we would like to do this at world scale, and there are real problems in doing this at world scale. We are also trying to move everything to PostgreSQL for the database; we are already able to cluster the point features, and we also need to develop a point cloud algebra to manipulate these point clouds. I would like to know whether this kind of point cloud algebra already exists. I would also like to work with historical data: for example, a city centre changes, there is a new building, there is a new shop, even inside a mall the shops can change; so if I am trying to georeference or geocode an old image, I would like to use an older point cloud related to that space. We probably need to keep several versions, several temporal point clouds, in our system. As I said, the idea of this presentation is to see whether someone else is interested in these algorithms so that we can work together. I am running a lab at my university, and two years ago we created a group just to work on computer vision algorithms, because there are so many images, and so many projects using images, that we felt we needed to improve our skills in computer vision. That is why we are doing this. Thank you very much.

Question: What kind of programming language are you using? Answer: C++, mainly because of the performance requirement, to be able to do this in less than a second. We use the OpenCV library; OpenCV is written in C++, and you can use it from Java or other languages, even from Python, but it is fastest in C++, and we were looking for speed. If you could accept 5 or 10 seconds to get the position estimate, you could start in another language and, when it is working, rewrite it in a faster one. We tried to make the solution as modular as possible.
The pipeline is decomposed into steps, and the steps correspond to software modules, so we can replace the modules. For instance, the image feature extraction can be done in any language: afterwards you have a set of descriptors of the features, so from there any language works. OpenCV has many detectors, transforms and other algorithms that you can use for image feature extraction.

Question: Which feature extraction algorithms do you use? Answer: That is a good question. We use several algorithms; for example, the feature extraction is different in outdoor spaces and in indoor spaces. Indoor spaces are almost the same in universities everywhere, while outdoor spaces have a lot more variation. If I may point you to our full paper, you can see that different feature extraction algorithms give different results; none is best everywhere, but for a defined area you can tell which differences matter.

Question: How does it react to a dynamic environment, one that is changing? Can you detect that something has changed? Answer: There are several changes in the environment that do not affect the algorithm, but in the end it depends. One idea for dynamic environments is that the images users send in to be located can themselves be used to extract new models. Question: So users could contribute images to update the database? Answer: Yes: if you send us an image, and if we have enough images of an area, we can recreate the models.

Question: One last question, about user rights. Do you have any policy or license for users when they upload images: whether they give you the right to use the images, how you use them, and what the restrictions are? Answer: We did think about it, but the project is still at a testing and development stage. In more legal terms, I think the major decision is whether we use the images at all, because we can do this geocoding without using, and even without storing, the image from the user, so there is no problem regarding the user's rights: we just use the image to extract the features, without storing it. We also tried to extract the features on the mobile phone: instead of uploading the image, we do the feature extraction on the phone and only upload the feature descriptors to our service. In this way we never see the user's image; we only have the features, and our answer is based on the features. In this case there is no issue related to personal rights, since only features are exchanged; people can send features from everywhere and we do not see the images. Thank you very much.
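The "upload descriptors, not the image" idea from the last answer can be sketched as follows. This is my own illustration: real descriptors would be, for example, 128-float SIFT vectors, and the wire format here is an assumption, not the service's protocol.

```python
import struct

def pack_descriptors(descriptors):
    """Serialize equal-length float descriptor vectors to bytes.

    Only these vectors leave the phone; the photograph itself is never
    uploaded, which is the privacy property described in the talk.
    Assumed wire format: uint32 count, uint32 dim, then float32 values.
    """
    count = len(descriptors)
    dim = len(descriptors[0]) if count else 0
    payload = struct.pack("<II", count, dim)
    for vec in descriptors:
        payload += struct.pack(f"<{dim}f", *vec)
    return payload

def unpack_descriptors(payload):
    """Inverse of pack_descriptors, as the server side would run it."""
    count, dim = struct.unpack_from("<II", payload, 0)
    offset = 8
    out = []
    for _ in range(count):
        out.append(list(struct.unpack_from(f"<{dim}f", payload, offset)))
        offset += 4 * dim
    return out

# Values chosen to be exactly representable as float32, so the
# round trip compares equal.
descs = [[0.125, 0.5, 0.25, 1.0], [0.0, 2.0, 0.75, 0.5]]
wire = pack_descriptors(descs)
print(len(wire), unpack_descriptors(wire) == descs)  # prints: 40 True
```

A payload like this is also far smaller than a photograph, which helps the sub-second budget as well as privacy.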