The STAGA-Dataset: Stop and Trip Annotated GPS and Accelerometer Data of Everyday Life

CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/68907 (DOI)

Publisher

FOSS4G

Open Source Geospatial Foundation (OSGeo)

Release Date

2024

Language

English

Production Year

2022

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

## Motivation & Contribution

Part of developing an analysis pipeline for mobility studies based on GPS data is benchmarking both the accuracy of the raw data and the performance of the pipeline itself. When we started to develop our algorithm for stop and trip classification, it became clear that we needed a precisely annotated dataset containing accurate stop and trip labels as ground truth. Apart from validating our own development, we wanted a reference point for comparing our analysis methods with existing libraries. For the study, we planned to equip participants with a smartphone to collect movement data in the form of GPS and acceleration data for several days in a row. To prolong battery life, we chose a low sampling frequency. Our particular focus was on creating ground truth for stop and trip detection algorithms, so the annotation concentrates on stops and trips.

Through this manuscript, we contribute a comprehensive dataset providing accurate start and end timestamps for stops over 126 days. The STAGA dataset is an unprocessed table of GPS coordinates, annotated with a timestamp, altitude, GPS accuracy, and class label ("stop" or "trip"). Each sample labeled as a "stop" further contains the GPS coordinates of the location it is attributed to. The acceleration data is provided as a separate file, but covers the same time frame and contains a triple (x, y, z) of acceleration sensor readings for each given timestamp, sampled at 1 Hz. The STAGA dataset is publicly available and free to use. We further provide the iOS app used to create the diary data for simple stop/trip annotation while on the go. All of this is made available under CC BY 4.0.

## Method

#### Diary

To create the dataset, we first tried a traditional diary approach: four researchers took notes, writing down addresses and times whenever they stopped. While this provided some first samples, it was a tedious and error-prone process, since taking notes is impractical in everyday life.
Furthermore, it required looking up the coordinates belonging to each noted address, which works for clearly defined urban spaces but can become problematic otherwise, e.g. in a park or a rural outdoor environment, where addresses are not precise enough. Because of that, we developed a simple iOS app that helped us annotate our movements. The app contains a map to validate the identified position, one button to start or end a stop, and a list overview of previously recorded stops. It captures the GPS position whenever a new stop is started and stores the current time as the start timestamp. When the button is pressed again, the stop is completed and the current time is stored as the end timestamp. Trips are derived from the intervals between two stops. In addition, the app can export the captured annotations as a CSV file that can be used directly for benchmarking purposes. This way, we were able to create a GPS dataset containing precise stop/trip annotations, together with a reference position of the actual stop location. The diary was recorded using an Apple iPhone XR.

#### Data Collection

The device we used for the recordings was a ZTE Blade A5 (2019). It was configured to record GPS samples with an accuracy of 25 m or better, so if the device was unable to obtain a position reading within this radius, the data point was omitted. We sampled data with a frequency of 0.1 Hz and used both network and GPS as sources for determining the position (the smartphone supports A-GPS and GLONASS). It runs Android 9 and is equipped with a 2,600 mAh battery; during the recording of the dataset, the battery was always recharged before the phone could shut down. While the dataset mostly covers everyday life, it also includes short periods of vacation, travel, and hiking. Most trips were carried out by bike. However, the dataset contains long periods of walking, car travel, and train rides as well.
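
To illustrate how a stop diary and the recorded GPS samples can be combined, the following is a minimal sketch and not the authors' implementation. It assumes that stops do not overlap and that timestamps are plain seconds; the `Stop` class, its field names, and both function names are hypothetical and do not reflect the actual CSV schema of the app or the dataset.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stop:
    """One diary entry (hypothetical field names, not the app's CSV schema)."""
    start: float  # start timestamp in seconds
    end: float    # end timestamp in seconds
    lat: float    # reference latitude captured when the stop was started
    lon: float    # reference longitude captured when the stop was started


def derive_trips(stops: List[Stop]) -> List[Tuple[float, float]]:
    """Return the (start, end) gaps between consecutive stops as trips."""
    ordered = sorted(stops, key=lambda s: s.start)
    return [(a.end, b.start) for a, b in zip(ordered, ordered[1:])]


def label_samples(timestamps: List[float], stops: List[Stop]) -> List[str]:
    """Label each GPS timestamp "stop" if it lies inside a diary stop, else "trip".

    Assumes non-overlapping stops. Samples before the first or after the
    last stop are also labeled "trip" in this simplified sketch.
    """
    ordered = sorted(stops, key=lambda s: s.start)
    starts = [s.start for s in ordered]
    labels = []
    for t in timestamps:
        i = bisect_right(starts, t) - 1  # last stop starting at or before t
        inside = i >= 0 and ordered[i].start <= t <= ordered[i].end
        labels.append("stop" if inside else "trip")
    return labels


stops = [Stop(0, 600, 0.0, 0.0), Stop(900, 1800, 0.01, 0.01)]
print(derive_trips(stops))                      # [(600, 900)]
print(label_samples([300, 750, 1000], stops))   # ['stop', 'trip', 'stop']
```

Deriving trips as the gaps between consecutive stops means that n stops yield n - 1 trips, which matches the 692 stops and 691 trips reported for the diary.
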

While the data was recorded in two different European countries (mostly urban environments), everything was rotated and projected into the North Atlantic for privacy protection. In the same vein, all timestamps have been shifted to start on January 1, 2000. However, none of these changes should affect the performance of stop and trip detection algorithms, as the relative temporal and spatial accumulation of GPS records is unchanged.

## Dataset Statistics

The dataset contains 122,808 GPS and 7,813,740 accelerometer records. The recording time spans 126.65 days. The diary contains 692 stops and 691 trips. The mean duration of a stop is 240.8 min; the mean trip duration is 22.7 min. On average (mean), a stop contains 114.0 GPS samples and a trip contains 63.5 GPS samples.
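
As a quick plausibility check, the reported means can be multiplied back out and compared with the reported totals. The sketch below uses only the figures stated above; the variable names are illustrative, and the small residuals are presumably due to rounding of the published means.

```python
MIN_PER_DAY = 24 * 60

# Figures as reported in the dataset statistics.
n_stops, n_trips = 692, 691
mean_stop_min, mean_trip_min = 240.8, 22.7
mean_stop_samples, mean_trip_samples = 114.0, 63.5
reported_days = 126.65
reported_gps_records = 122_808

# Total duration and record count reconstructed from the rounded means.
est_days = (n_stops * mean_stop_min + n_trips * mean_trip_min) / MIN_PER_DAY
est_records = n_stops * mean_stop_samples + n_trips * mean_trip_samples

print(f"estimated duration: {est_days:.2f} days (reported {reported_days})")
print(f"estimated GPS records: {est_records:.1f} (reported {reported_gps_records})")
```

Both reconstructions land close to the reported values: roughly 126.61 days against 126.65 days, and 122,766.5 records against 122,808.
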

Keywords

foss4g2022

academictrack