Preliminary analysis of crowdsourced sound data with FOSS
Formal Metadata

Title: Preliminary analysis of crowdsourced sound data with FOSS
Series: FOSDEM 2023 (part 425 of 542)
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifier: 10.5446/61909 (DOI)
Transcript: English (auto-generated)
00:06
Okay, thank you. Thank you for coming to this presentation. I'm Nicolas Roelandt from Gustave Eiffel University, and I will be presenting some research we did on crowdsourced sound data.
00:26
Some analysis we did with three open source software tools. I will be presenting work done by myself, Pierre Roman, and Ludovic Moison from Gustave Eiffel University. So, traffic noise is a major health concern.
00:44
In Western Europe, the World Health Organization estimates that we lose one million healthy life years each year. In France, we have estimated the social cost,
01:01
so the cost to the community, at 147.1 billion euros per year. So it has a monetary cost, but also a cost on people and their health.
01:23
So the big question is how we can find where noise is problematic. And so, of course, we can't have direct measure everywhere. We can't put a microphone everywhere.
01:41
It would be a cost nightmare, a logistics nightmare, and a privacy nightmare. Of course, it's not possible. So the traditional way is to simulate the noise from traffic counts. So we put counters on roads
02:02
and estimate the vehicle traffic. We do that on trains, on planes, on rail tracks, and we simulate from those traffic counts. And we produce this kind of map with, for example, NoiseModelling, which is an application we developed with the UMRAE laboratory
02:28
that can compute noise maps from these counts. And this is a legal requirement from the European Commission. Another way, which the UMRAE, working on environmental acoustics, explores,
02:45
is not to simulate, but to get actual data, real data, from contributors using a smartphone application you can install. It works on smartphones.
03:02
It's available on Android, and it's free software. It measures several things, like your position and the sound spectrum. Not the full spectrum, just the third-octave bands, so you can't understand what people are saying if someone is speaking,
03:26
but you can detect that someone is speaking. You also have the sound level and some other kinds of information. So it's part of a bigger project, the Noise-Planet project.
03:41
So we have this NoiseModelling application that generates noise maps from open source geodata, mostly French geodata and OpenStreetMap. So when you use StreetComplete to say, okay, this is grass and this is macadam, we use that data to generate more precise sound maps,
04:05
and NoiseCapture to measure and share sound environments. All this data is stored in a spatial data infrastructure called OnoMap. There are also some community maps made by the users. This is a map of all the recordings we have over nearly five years.
04:27
So you can see it's worldwide. It's just not only France or Europe, it's worldwide. So the question was, what can we do with all this data we collect?
04:42
So there was an extraction in 2021 of the first three years of data collection. The data is still being collected, but this extract contains 260,000 tracks.
05:02
These tracks are recordings from all over the world, with the sound spectrum, like I said, GPS localization, and also some tags the contributor can provide. It's under an open database license, so it's free to use.
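As a rough illustration, one track in this extract could be modeled as follows. This is a sketch only: the field names are hypothetical and do not reflect the actual schema of the NoiseCapture dump.

```python
from dataclasses import dataclass, field

# Hypothetical model of one NoiseCapture track as described in the talk:
# third-octave sound spectrum, GPS localization, duration, and optional
# user-provided tags. Field names are illustrative, not the real schema.

@dataclass
class Track:
    track_id: str
    spectrum_db: list[float]     # third-octave band levels, in dB
    latitude: float
    longitude: float
    duration_s: float
    tags: list[str] = field(default_factory=list)  # e.g. ["road", "chatting"]

t = Track("abc123", [42.0, 45.5, 40.1], 48.85, 2.35, 30.0, ["road"])
print("road" in t.tags)  # True
```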
05:23
So the question is how we can characterize the sound environment of the user at the moment of the recording, using the collected data. We thought of two possibilities. One is from the sound spectrum we record, and that is an ongoing analysis.
05:42
It's not the easiest way to do it, because we have to find patterns in the recordings, and we have to use machine learning to detect these patterns across all this data. So that is still ongoing, but there is an easier way,
06:04
and this is the way I used: the tags provided by the contributors. So in this subset, like I said, there are 260,000 tracks.
06:21
Half of them have tags, so we can use just that half. About 50,000 are outdoors and are not tests: we want to work on outdoor sound environments, so we discard indoor and test-tagged tracks.
06:44
We also removed the very short ones, less than five seconds, to discard tracks that might be accidental recordings.
07:00
And for this preliminary work we also restricted ourselves to France, because we are French and it's easier for us to understand what's happening. That leaves nearly 12,000 tracks. And like I said, road noise is a major concern,
07:20
and it appears directly in our data, because the most frequent tag is road. In maybe a third of our subset, there is road noise.
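The filtering steps described above can be sketched like this. The analysis in the talk is done with SQL and R; this Python sketch only illustrates the logic, and the column and tag names are assumptions, not the actual schema.

```python
# Sketch of the track-filtering steps described in the talk.
# Field and tag names ("tags", "duration_s", "country") are assumptions.

def keep_track(track: dict) -> bool:
    """Keep tracks that are tagged, outdoor, not tests, at least
    5 seconds long, and recorded in France."""
    tags = track.get("tags") or []
    if not tags:                        # only half the tracks carry tags
        return False
    if "indoor" in tags or "test" in tags:
        return False                    # outdoor, non-test tracks only
    if track.get("duration_s", 0) < 5:  # drop likely accidental recordings
        return False
    return track.get("country") == "FR"  # preliminary work: France only

tracks = [
    {"tags": ["road"], "duration_s": 30, "country": "FR"},
    {"tags": [], "duration_s": 30, "country": "FR"},        # no tags
    {"tags": ["test"], "duration_s": 30, "country": "FR"},  # test track
    {"tags": ["road"], "duration_s": 3, "country": "FR"},   # too short
    {"tags": ["road"], "duration_s": 30, "country": "DE"},  # not France
]
subset = [t for t in tracks if keep_track(t)]
print(len(subset))  # only the first track survives
```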
07:41
The second one is chatting, and we also have tags like wind, animals, sounds, works. There are 12 different tags the user can provide. So we used a quite simple toolkit to analyze the data.
08:02
First is the PostgreSQL database, because the data is provided as a PostgreSQL dump. So in order to access it, you have to rebuild the database. The other tool we use is R,
08:21
because in the team we are mostly R users; we also have Python, but we are more familiar with R. So, two tools, simple, yes? Actually not really, because in R we also use a lot of packages, like the tidyverse, the sf package for geospatial data,
08:44
and so on. And all those packages use dependencies like pandoc, markdown, and reveal.js. This presentation is actually made with R and reveal.js.
09:00
We also use geospatial libraries like PROJ, GEOS, and GDAL, and those are dependencies that are not handled by R directly; we just call them. So what did we find in this dataset? Let's talk about results.
09:21
We've got some interesting things. The first thing we looked at was the animal tag, because we know that bird song can be heard mostly in the first hour before dawn.
09:44
This is a well-known dynamic in ornithology, and we can hear it in the sound environment. And we actually found it too. So in this graph, the left part
10:00
is the time before sunrise on the day of recording, and the right part is the hours after. So we found this actual dynamic of birds singing one hour before dawn. That was a good sign.
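A minimal sketch of that sunrise-relative binning, assuming each recording already carries a precomputed sunrise time (the talk mentions the suncalc R package for computing sunrise from a date and position; the data here is made up):

```python
from datetime import datetime, timedelta
from collections import Counter

# Sketch: bin animal-tagged recordings by whole hours relative to sunrise.
# Negative offsets are hours before sunrise, so the pre-dawn bird chorus
# should show up as a spike in the -1 bin.

def hour_offset(recorded_at: datetime, sunrise: datetime) -> int:
    """Whole-hour offset of a recording relative to sunrise (floor division,
    so anything in the hour before sunrise lands in bin -1)."""
    delta = recorded_at - sunrise
    return int(delta.total_seconds() // 3600)

sunrise = datetime(2021, 5, 1, 6, 30)
recordings = [
    sunrise - timedelta(minutes=40),   # within the hour before dawn
    sunrise - timedelta(minutes=10),   # also pre-dawn
    sunrise + timedelta(hours=2),      # later in the morning
]
histogram = Counter(hour_offset(t, sunrise) for t in recordings)
print(histogram[-1])  # 2 recordings fall in the hour before sunrise
```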
10:21
We also found peaks of road noise between 8 and 10 a.m. and, I think, 6 and 8 p.m. And we can say it looks very much like commuters' behavior.
10:44
But we can't directly link it to that; we can only say it's very similar. So we looked at physical events in the environment of the contributor, and we found a very good correlation between the wind force
11:05
and the presence of the wind tag in the dataset. So that works very well. We also did that with rainfall, and there the correlation is not as strong.
11:23
It might be user bias: maybe if the rainfall is too light, the user doesn't hear the rain or doesn't think to add a tag about it. And it might also be a spatial issue,
11:40
because the mean distance to the nearest weather station is 16 kilometers. So local conditions might differ between the weather station and the user at the moment of the recording. Not as strong, then, but we did find the signal in the data.
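The wind comparison boils down to correlating a continuous measurement with a binary tag. The actual analysis is done in R; this stdlib-only Pearson correlation (equivalent to a point-biserial correlation when one variable is binary) just illustrates the idea, with made-up values:

```python
import math

# Sketch: correlate measured wind force with presence of the "wind" tag.
# All data values below are invented for illustration.

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

wind_force = [0.5, 1.0, 2.0, 4.0, 6.0, 8.0]   # e.g. m/s at nearest station
has_wind_tag = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # 1 if contributor tagged "wind"
r = pearson(wind_force, has_wind_tag)
print(round(r, 2))  # strong positive correlation
```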
12:02
I'm not the first one to speak about reproducible science here, and it really is an issue. For this study, we have some good points: the data is already available, and we made the source code available. All the SQL scripts to rebuild the database and the tables we use are available.
12:29
The R notebooks we made are also available, and the setup is broadly documented. But there are also bad points to assess.
12:41
Some notebooks grew very large, and we went very deep into the analysis in the exploratory files. In the end, it was very hard to reproduce, even within our team. We were actually able to do it, but for someone coming from outside, it might be difficult to understand.
13:06
So it needs some refactoring, a little more commenting, more explanation. And there is also a lack of information on the software environments.
13:21
That makes it very hard to reuse and reproduce. So what could we have used for better tooling? Since we use R, you can use renv, which is an R package for reproducible environments.
13:44
It's like a virtual environment. It works well, but it works well just for R, and we use other software: PostgreSQL, GEOS, PROJ, GDAL. So it's not perfect.
14:03
Docker might be something helpful, but like Simon said before, it's not perfect for reproducibility. And Guix has been on my mind for a year now,
14:24
telling myself, OK, I need to work on that. I think it would be a good solution. I won't talk too much about it, because there was a talk by Simon Tournier just two talks before; go watch it. I think it might be a very good solution.
14:41
In conclusion: we can use crowdsourced data for science. Even for something quirky like sound environments, we can use it for science. This particular data set is usable, so you can access it and find new things.
15:03
We don't have every answer; not every question can be answered with this data set. But it's quite fun to play with it and find things: oh, we can find the birds. I do believe that free software is key for reproducible science.
15:22
We can't do reproducible science with proprietary software; it's not possible. Reproducible science is hard to achieve. You have to think about it as early as possible, before starting your project, because once you are too far along, you have to refactor things, and that can be very tricky.
15:45
Maybe it's because I'm working on this more sound- and physics-related study, but sometimes I work with economists, I work with geographers.
16:02
They are not very keen on technology and computers in general. So sometimes you need someone, maybe an engineer, in the team who can handle this reproducibility part.
16:22
So you need to get the skills: either you acquire them yourself, or you bring someone into the team who can do it for you. And notebooks are not enough. Notebooks are great for communicating and exploring things,
16:40
but they are not good enough for reproducible science. There is a link to the data set. Please go check noiseplanet.org: you can navigate the map, actually see tracks, and click on things to see what was recorded.
17:01
Thank you for your attention. You can reach me by email or on Mastodon. This presentation is available here, and everything is accessible on GitHub. Thank you very much.
17:25
That leaves us a bit of time for questions, so please feel free to take them, repeat them, and then answer them. In the graph with the birds, you had sort of a dip at zero.
17:42
Is that a statistical artifact? Do you have an explanation for that? Being exactly at the top of the fitted line? So the question was about this particular graph: why there is a low point at zero while the peak is just above zero.
18:06
It's because it's smoothed a little: you can see there is a peak just before, the line is just smoothing it, and there is a little shifting.
18:22
And you ask why there is a low there? I don't know, I'm not sure. Yeah, please. Just jumping in on the same question: because this is crowdsourced data, it's obviously influenced by the users who collect the data for us.
18:42
How do you factor in, or eliminate, this source of variance, where underlying human behavior could affect the results? For example, sunrise time: people who get woken by birds before sunrise
19:01
would be very annoyed and would record more, while people who wake up at normal times are too busy to even make a recording, so you have a bias in the data. Okay. So the question was: this is crowdsourced data, provided by people willing to provide it,
19:24
and there is a bias, of course, because you may be angry at birds waking you up in the morning, or angry at traffic noise. And actually, we don't assess that; we take the data as it is.
19:41
Maybe there will be some work on it, though I'm not part of that part of the project, and we hope there is so much data that it will smooth out the bias. But of course it's biased, like OpenStreetMap data:
20:02
someone makes a decision to say, okay, I will record this, for a good or bad reason, or to prove a point: okay, where I live it's too noisy, I'll make a recording. But it's very hard to assess this kind of information.
20:25
We don't know why people record tracks: maybe it's a pleasant environment and they want to share it, or it's not so good and they want to document that. I hope that answers your question. Yeah, please.
20:40
Yeah, so I just wanted to ask: I think wind is pretty hard to incorporate, because when somebody records, they're probably recording without a pop filter, which makes the wind sound really loud even when it isn't. Somebody holds up a phone and records, the wind blows straight into it,
21:01
and it's really, really loud then in decibels, but it actually isn't. So do you calculate these sorts of things out, or subtract something from these wind recordings when you keep them in the dataset? Okay, the question was about the wind recordings and the fact that a smartphone doesn't have a pop shield
21:23
to protect the microphone from the wind. Actually, I'm not an acoustician, I'm more a GIS engineer, so I don't have the exact answer for that. But I do believe that when you are using your microphone,
21:43
when you are talking on a smartphone, nowadays it can protect you a little bit from the noise. But I'm not sure, to be honest. Yeah, please. For building the data capture, like...
22:05
The subset? Yeah, did you build the data capture tool where people are inputting data, right? Or how was that built, and did you make sure that people could use it in a way that... Like, how did you make sure that people were comfortable
22:20
using it in the situations that you needed recordings for? So the question was... Can you simplify it? So I'm interested in what choices you made in order to have the thing look and function
22:41
how it did to capture the data, and, again, the bias: if people are not able to use it or don't like using it, does that also bias the data? Ah, okay. So the question was about how we built the analysis
23:03
and how we built it. If you are not able to use R to build the... Sorry. Actually, we had to make choices, and we are more comfortable with R.
23:23
So there is a bias, of course. And we also have some libraries, like suncalc, for example, that make life much simpler for us. We give it a date and a position, and it gives you the sunrise and...
23:45
Sunrise and sunset. Thank you. Sunset time, for example. So it makes life easier for us. But of course there is a bias. Even when we built the application, there is, of course, a bias.
24:02
But I wasn't part of the team that built the application. It's focused on what we want to get, but it's available to everyone, so do whatever you want with it.
24:22
Thank you. We have more time, maybe? On your first slide, you had a really big number for the social cost, only in France. It seems quite egregiously big. Do you know what is included in the social cost?
24:43
What are the costs that are incorporated into this number? Hmm, it's a huge report. ADEME is a French agency, an environmental agency; it works on noise pollution, but also air pollution and things like that.
25:00
Sorry, I didn't repeat the question. The question was about the social cost, the amount, and how it is constructed. I only read the report quickly. The social cost is mostly about health issues,
25:21
lack of sleep, and stress related to noise, and how these affect people and their health, and how worse health
25:41
is a cost for society, because you have more anxiety. I think this is expressed in terms of GDP. Sorry, we should switch. Thank you very much.