
An On-board Visual-based Attitude Estimation System For Unmanned Aerial Vehicle Mapping


Formal Metadata

Title
An On-board Visual-based Attitude Estimation System For Unmanned Aerial Vehicle Mapping
Number of Parts
183
License
CC Attribution - NonCommercial - ShareAlike 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.
Production Year
2015
Production Place
Seoul, South Korea

Content Metadata

Abstract
A visual-based attitude estimation system aims to utilize an on-board camera to estimate the pose of the platform from salient image features rather than from additional hardware such as a gyroscope. One of the notable achievements of this approach is camera self-calibration [1-4], which is widely used in modern digital cameras. Attitude/pose information is a crucial requirement for the transformation of 2-dimensional (2D) image coordinates to 3-dimensional (3D) real-world coordinates [3]. In photogrammetry and machine vision, the camera's pose is essential for modeling tasks such as photo modeling [5-8] and 3D mapping [9]. Commercial software packages are now available for such tasks; however, they are only suitable for off-board image processing, which is free of computing or processing constraints. Unmanned Aerial Vehicles (UAVs) and other airborne platforms impose several constraints on attitude estimation. Currently, Inertial Measurement Units (IMUs) are widely used in unmanned aircraft. Although IMUs are very effective, this conventional attitude estimation approach adds significantly to the aircraft's payload [10]. Hence, a visual-based attitude estimation system is more appropriate for UAV mapping. Different approaches to visual-based attitude estimation have been proposed in [10-14]. This study aims to integrate optical flow with a keypoint detector applied to overlapped images for on-board attitude estimation and camera self-calibration. The goal is to minimize the computational burden that can be caused by the optical flow, so that the system fits on-board visual-based attitude estimation and camera calibration. A series of performance tests has been conducted on selected keypoint detectors, and the results are evaluated to identify the best detector for the proposed visual-based attitude estimation system. The proposed on-board visual-based attitude estimation system is designed to use visual information from overlapped images to measure the platform's egomotion, and to estimate the attitude from the visual motion. Optical flow computation can be expensive, depending on the approach [15]. Our goal is to reduce the computational burden at the start of processing by restricting the aerial images to the regions of utmost importance. This requires an integration of optical flow with salient feature detection and matching. Our proposed system strictly follows the UAV's on-board processing requirements [16]; thus, the suitability of salient feature detectors for the system needs to be investigated. The performance of various keypoint detectors has been evaluated in terms of detection, time to complete, and matching capabilities. A set of 249 aerial images acquired from a fixed-wing UAV has been tested. The test results show that the best keypoint detector to integrate into our proposed system is the Speeded Up Robust Features (SURF) detector, given that the Sum of Absolute Differences (SAD) matching metric is used to identify the matching points. It was found that the time taken for SURF to complete the detection and matching process is, although not the fastest, relatively small. SURF is also able to provide a sufficient number of salient feature points in each detection without sacrificing computation time.
Transcript: English (auto-generated)
Okay, by looking at the title of my presentation, you might wonder whether this is related to the previous speaker's talk. Actually it is; everything is connected to everything.
My name is Samsung Lim. I'm the supervisor of RID-1 Tamjis, the first author of this paper. RID-1 is developing an onboard visual-based attitude estimation system for drone mapping.
In UAV mapping, what we do is collect images; we know the position of the drone and the direction of the camera, that is, the camera pose, and therefore from the overlapped images we can actually do the math and get a 3D point cloud. That's what normal UAV mapping does, but what we want to do here is the opposite: from the overlapped images we want to estimate the attitude and the position of the UAV. So we are interested in UAVs, whereas the previous presenter was interested in people's attitude and orientation. Our objective is to develop a visual-based attitude estimation system.
We want to use the system as a backup to the IMU, a supplement that provides an accurate platform and camera pose during image acquisition.
We also want to remove ground control points (GCPs) when we collect UAV images. The reason why we developed this system is that commercial software packages already exist: if we use Pix4D, PhotoModeler, or other commercial packages, then we can do the computing and data processing to trace back the attitude and pose of the platform. But this is done by post-processing: once you have the images, you apply the software, compute the image plane, and use the backward transformation to obtain the attitude angles and positions.
But we want to develop on-board attitude estimation, and therefore time is our most important criterion. That is why we have to be careful about the transformation equations and the detectors.
We use optical flow to measure the egomotion of the on-board camera, and from that we estimate the platform's attitude. We want to integrate optical flow with a key point detector. The key points are points that can be found in two different, overlapping images, and therefore we have to cross-correlate the points from one image with those from the other. We use image feature detection algorithms to identify those key points.
There are many key points in an image: any point that is visible in both images can be a key point. But some key points are very distinct in the image, so we want to use those key points as virtual ground control points.
In order to do so, we have to find the best, or fastest, detector, because time is a critical issue here. Virtual ground control points are key points on the image. They can be extracted from salient features, and these virtual GCPs should be present in the overlapping images; therefore any common point from two images can be a virtual ground control point. But as I said, we need to identify very distinct points in the overlapping images, so we want to use only a limited number of these virtual GCPs. These GCPs are crucial for estimating the pose of the camera.
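As a concrete illustration, here is a minimal sketch, assuming Python with OpenCV, of detecting salient key points and tracking them into the next overlapping frame with sparse optical flow. The file names and the SURF threshold are illustrative assumptions; the authors' actual implementation is in-house MATLAB code, as mentioned in the Q&A at the end.

```python
import cv2
import numpy as np

# Two overlapping aerial frames (hypothetical file names).
img1 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_002.png", cv2.IMREAD_GRAYSCALE)

# SURF lives in the opencv-contrib xfeatures2d module and may need a
# build with the nonfree algorithms enabled.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
keypoints = surf.detect(img1, None)

# Seed sparse pyramidal Lucas-Kanade optical flow with the SURF key
# points, so flow is computed only at salient points rather than
# densely over the whole image.
pts1 = np.float32([kp.pt for kp in keypoints]).reshape(-1, 1, 2)
pts2, status, err = cv2.calcOpticalFlowPyrLK(img1, img2, pts1, None)

# Points tracked successfully into the second image act as the
# "virtual ground control points"; their displacement is the flow.
ok = status.ravel() == 1
flow = (pts2[ok] - pts1[ok]).reshape(-1, 2)  # pixels per frame
```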
Turning to the basic theory of the transformation between the image plane and the ground points: first of all, you have a camera, with its position and focal length. The image plane sits in between, and the image point's pixel coordinates, x and y, are determined by the relationship between the camera's focal point and the ground point, which has 3D coordinates X, Y, Z. From this relationship, what we can do is estimate the velocity on the image plane. The velocity on the image plane is a combination of translation and rotation, and we can represent this complex relationship in a compact vector-matrix form: v is the velocity on the image plane, and it is a combination of the translational and the rotational motion. And once you have overlapped images, the velocity can be represented by optical flow. This is just a sample optical flow field from two overlapping images.
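For reference, the vector-matrix relationship the speaker describes is, in its standard pinhole form, the classic image-motion equation; this is a sketch using common notation, not necessarily the exact symbols on the slide. A ground point $(X, Y, Z)$ in the camera frame with focal length $f$ projects to $x = fX/Z$, $y = fY/Z$, and differentiating gives the image velocity as a translational part plus a rotational part:

$$
\mathbf{v} =
\begin{pmatrix} \dot{x} \\ \dot{y} \end{pmatrix}
= \frac{1}{Z}
\begin{pmatrix} -f & 0 & x \\ 0 & -f & y \end{pmatrix}
\mathbf{T}
+
\begin{pmatrix}
\frac{xy}{f} & -\left(f + \frac{x^{2}}{f}\right) & y \\
f + \frac{y^{2}}{f} & -\frac{xy}{f} & -x
\end{pmatrix}
\boldsymbol{\omega}
$$

where $\mathbf{T} = (T_x, T_y, T_z)^{\top}$ is the camera's translational velocity and $\boldsymbol{\omega} = (\omega_x, \omega_y, \omega_z)^{\top}$ its angular velocity. Measuring $\mathbf{v}$ at the matched key points via optical flow then lets one solve for $\mathbf{T}$ and $\boldsymbol{\omega}$, i.e., the egomotion.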
As I said, time is a critical issue, and therefore we want to identify which key point detector is suitable for our project.
There are many detectors available, and we tested them against three criteria. The first one is the detection rate: how many key points can be obtained from the overlapping images. The second one is the most critical: we measure the time to complete, that is, how much time is needed to complete the detection of key points. The third one is the matching rate. Although the detectors give you a series of key points, some of them are not correctly matched, and therefore we had to inspect those key points visually and manually, and then calculate the statistics of the matching rate.
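The first two criteria can be measured with a loop like the following minimal sketch, assuming Python with OpenCV; the detector settings are library defaults, not the authors' configuration, and Harris/MinEigen are omitted because OpenCV exposes them through a different corner-tracking interface.

```python
import time
import cv2

# A few of the detectors evaluated in the talk. SIFT is in the main
# module of recent OpenCV releases; SURF needs opencv-contrib built
# with the nonfree algorithms enabled.
detectors = {
    "SIFT": cv2.SIFT_create(),
    "SURF": cv2.xfeatures2d.SURF_create(),
    "MSER": cv2.MSER_create(),
    "FAST": cv2.FastFeatureDetector_create(),
    "BRISK": cv2.BRISK_create(),
}

def benchmark(detector, images):
    """Return mean key points per image and mean seconds per image."""
    counts, seconds = [], []
    for img in images:
        t0 = time.perf_counter()
        kps = detector.detect(img, None)
        seconds.append(time.perf_counter() - t0)
        counts.append(len(kps))
    n = len(images)
    return sum(counts) / n, sum(seconds) / n
```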
We tested about 249 overlapping images of Loftus Oval in New South Wales, Australia. Basically we have two different kinds of UAVs, fixed-wing UAVs and copters, but here we used a fixed-wing UAV.
It comes with GPS and an IMU, so it is able to do RTK positioning for the UAV platform. The reason why we chose a fixed-wing UAV is that it is more suitable for high-altitude flight, and therefore we can minimize the distortion in the imagery. These are some sample images from the site.
From the top left to the bottom right, these are six images in sequence, and you can see some overlap between two consecutive images. For example, in the first image the roundabout is at the center, but in the next image it is at the bottom. In the third image it is totally gone, and then other features like trees and roads appear in the other images. So we tested seven key point detectors.
The first one is the Harris detector; then the minimum eigenvalue detector, which we call MinEigen; the scale-invariant feature transform, SIFT; maximally stable extremal regions, MSER; speeded-up robust features, SURF; and FAST and BRISK.
So we tested these seven key point detectors with respect to the three criteria: the detection rate, the time to complete, and the matching rate. Now I want to report those statistics.
We used two matching metrics. The first one is the sum of absolute differences (SAD) between two sets of key points: the key points obtained by the detector and the key points used as a reference. The difference is measured as the absolute value of the difference between the two sets. The other one is the sum of squared differences (SSD), where each difference is squared instead.
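For readers unfamiliar with the two metrics, here is a minimal sketch in Python with numpy; the arrays are toy values for illustration, not data from the paper.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two patch/descriptor arrays."""
    return np.abs(a - b).sum()

def ssd(a, b):
    """Sum of squared differences between the same two arrays."""
    return ((a - b) ** 2).sum()

# Score one candidate correspondence (illustrative values): the match
# with the lowest score under the chosen metric wins.
patch1 = np.array([10.0, 12.0, 9.0])
patch2 = np.array([11.0, 10.0, 9.0])
print(sad(patch1, patch2))  # 3.0
print(ssd(patch1, patch2))  # 5.0
```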
Out of the seven detectors, MinEigen, the minimum eigenvalue algorithm, provided the largest number of key points: MinEigen detected more than 5,000 key points from the 249 images. I mean, this is a mean value: over the 249 images, on average, MinEigen gives you about 5,000 key points per image, so that's a lot. The second one is SIFT, which gives you more than 2,000 key points per image on average. SURF is in the middle, and FAST and BRISK performed not so well: we got a very limited number of key points per image. So BRISK and FAST did not perform well in terms of the number of detected key points. We also looked at matching key points.
Out of the detected key points, how many are actually matched? Note that matching here does not mean correct matching; it includes false matches too. Anyway, for matching key points, the SIFT algorithm is excellent: SIFT provides more than 250 matching key points per image. The second one is MinEigen, and the third is SURF. So SIFT, MinEigen, and SURF are the top three, and the rest do not provide many matching key points. That is one of the criteria, the matching key points. And when we looked at the matching rate, SIFT is at about an 11% matching rate, and MSER and SURF are similar. So SIFT, SURF, and MSER are good detectors in terms of matching rate.
But again, matching here only means that we can see the key points in one image and the key points in another image. We can see those key points in the two sets, but it doesn't necessarily mean that a point in one set corresponds to the identical point in the other set, so it contains some false matching key points too. I think time to complete is the most important criterion in our system. Unfortunately, SIFT doesn't perform well there: it takes a lot of time, simply because SIFT provides too many key points, so it takes more than 1.4 seconds to complete the detection. FAST is the fastest one, as the name suggests, but FAST doesn't work well in terms of detection rate. Anyway, the top three performers are SURF, BRISK, and FAST.
Then, for the visual inspection, we inspected individual key points in the image and matched them from one image to another, and we compared the matching key points against our visual inspection. We couldn't do this for the SIFT results because SIFT has too many key points, so unfortunately we couldn't report the correct matching rate for the SIFT algorithm. But anyway, SIFT was already out of the running because it is too slow, so this didn't affect our project. Most of the detectors perform well; their correct matching rate is above 80%, so in that sense it doesn't matter which algorithm you use. But what I want to point out in this slide is that there is a discrepancy between SSD and SAD.
If we use SAD, the absolute difference, as the matching metric, then SURF is the top one; SURF performs number one among the seven detectors. But when we look at SSD, it does not. So whether we should use SAD or SSD is a question. Based on the average correct matching rate, considering both metrics at the same time, MinEigen is the top one, Harris is the second, and SURF is in the bottom two.
Based on these statistics, what we learned from this test is that SSD and SAD do not show consistent results, especially in terms of the correct matching rate. But SURF shows the highest correct matching rate with respect to SAD, and SURF was among the top three in terms of time to complete, detection rate, and matching rate.
So which one is the optimal key point detector for our project? SIFT provides the highest number of matching key points, but it provides too many key points; that's a disadvantage for our system, because it increases the processing time, and our interest is onboard camera pose estimation. We therefore excluded SIFT from our detection algorithm. SURF, however, shows an optimal processing time: it takes only about 0.2 seconds, and it yields a reasonable number of matching key points. And depending on which metric you use, SURF is the top one in terms of correct matching rate. So our choice is SURF.
What I also found in this experiment is that there is a clustering pattern among the matching key points; I will show you a cluster of matching points in the next slide. We can use this clustering as a tool to classify correct and false matching key points automatically. In order to classify correct matching points and false matching points, we applied cross-correlation, and we also used outlier information. So the algorithm for the automatic visual identification of correct matches uses the outlier information and the clustering, applies cross-correlation, and then eliminates the false matches.
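The talk does not spell out the exact algorithm, so the following is only a minimal sketch, in Python, of one common way to exploit that clustering: treat the displacement vectors of correct matches as one coherent cluster and reject outliers with a robust median/MAD test. The tolerance and the MAD-based test are my assumptions; the authors' method additionally uses cross-correlation.

```python
import numpy as np

def filter_matches(pts1, pts2, tol=3.0):
    """Keep matches whose displacement agrees with the dominant cluster.

    pts1, pts2 -- (N, 2) arrays of matched point coordinates in the
                  first and second overlapping image.
    tol        -- outlier threshold in robust standard deviations
                  (an illustrative default).
    Returns a boolean inlier mask.
    """
    disp = pts2 - pts1                           # per-match displacement
    med = np.median(disp, axis=0)                # cluster centre
    mad = np.median(np.abs(disp - med), axis=0) + 1e-9
    sigma = 1.4826 * mad                         # MAD -> std estimate
    return np.all(np.abs(disp - med) <= tol * sigma, axis=1)
```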
These are the statistics of the automatic visual identification of SURF key points. The horizontal axis represents the image index, from the first image to the 249th, and the numbers show how many correct key points were identified. Some images provide a very small number of correct matching key points, while other images provide up to 151 correct matching key points.
By the way, these are the statistics for the SURF algorithm; we applied the same procedure to the minimum eigenvalue key points. The statistics are similar, and some images don't have a sufficient number of common key points, hence those images cannot be used for our attitude estimation system. The statistical analysis shows that SURF has a mean error of 0.4, a maximum error of 4, and a standard deviation of about 0.88. So statistically SURF is superior to the minimum eigenvalue detector, and that's one of the reasons why we chose SURF as our algorithm.
This slide shows the clusters of key points. I put two different images into one, so you see two different colors here. The black circles are points classified as key points in one image, and they move to the black crosses in the other image; you can see they are clustered, and therefore we can use this clustering information for the automatic visual identification. And this is the final result, for a pair of two overlapping images. You remember this is the first image in the series, near the roundabout: this is a set of key points, and they are matched with the key points in the second image.
To conclude, the performance of various key point detectors was evaluated with respect to detection rate, time to complete, and matching rate, using a set of 249 aerial images, and our assessment results say that SURF is the optimal key point detector for our onboard attitude estimation system. That doesn't necessarily mean SURF is the best in general; I would say it is optimal for our system. I think that's it. Thank you very much.
It is time for one quick question. Thank you, Samsung, for your presentation.
Just regarding the SIFT detector: we use SIFT on a GPU, and on a GPU SIFT runs quite fast. I don't know if you have a graphics card with a GPU on the aerial vehicle, but if you have enough hardware to do it on board, it would be interesting.
Thank you very much; that's a very helpful suggestion. Actually, we thought about GPUs too, but as you can imagine, a UAV has many limitations. First of all, the payload is a significant limitation: a GPU or any extra microprocessor would increase the payload. The second limitation of a UAV is the battery: if you use a high-performance computer, it takes more battery power.
A very, very short question, maybe I missed it:
Which software do you use to apply these algorithms, open source or something else? We developed our own in-house software; we actually use MATLAB.
And how fast is the detection? Because if you are going up this way, can you find all the key points fast enough to detect the height? As I said in the slide, SURF processes the key points within 0.2 seconds. I'm not sure it is fast enough for actual onboard operation; our system is ongoing, it's not complete yet, so we haven't tested it in a real environment.
Thank you very much. Unfortunately, we don't have any more time; we can discuss this question later. Thank you very much. Thank you.