## Introduction

NoiseCapture is an Android application developed by Gustave Eiffel University and the CNRS as part of a participatory approach to environmental noise mapping. The application is open source and all of its data are freely available. The study presented here is an initial analysis of the first three years of data collection, viewed through the prism of noise sources. The analysis focused only on the labels filled in by users, not on the sound spectrum of the measurements, which will be studied later. The aim was to determine whether known dynamics in environmental acoustics could be recovered from collaborative data.

Because this preparatory work will need to be consolidated and extended, and because we wanted the study to fit within the framework of Open Science, particular attention was paid to the reproducibility of the analysis, which was carried out entirely with free software and literate programming techniques. We present the context of the study, the tools and techniques used, and the first results obtained, as well as the benefits of literate programming for this type of preparatory work.

## Data

An article presenting this dataset was published in 2021 (Picaut et al. 2021). It details the structure of the database and the data, and the profile of the contributors and their contributions, but does not analyze the content of the data. The present article begins that analysis.

The data used in this study correspond to contributions made between August 29, 2017 and August 28, 2020. During this period, nearly 70,000 unique contributors collected more than 260,000 tracks, for a total of about 60 million seconds of measurement. A track is a collected recording: it contains the sound spectrum recorded by the phone (third-octave bands, 1-second time step) coupled with its GPS position (1-second time step). The contributor can enrich this information with labels.
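
As a rough illustration of the recording structure described above, the following Python sketch models one recording as a sequence of one-second samples plus a set of labels. The study itself relies on PostGIS and R; Python is used here only for illustration. The class and field names (`Sample`, `Track`, `levels_db`, and so on) are hypothetical and do not reflect the actual NoiseCapture database schema, and the unit of the band levels is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Sample:
    """One second of measurement (hypothetical field names)."""
    second: int                   # offset from the start of the recording
    latitude: float               # GPS position for this second
    longitude: float
    levels_db: Dict[float, float]  # third-octave band centre (Hz) -> level (dB, assumed unit)


@dataclass
class Track:
    """One contributed recording: per-second samples plus optional labels."""
    track_id: int
    samples: List[Sample] = field(default_factory=list)
    tags: Set[str] = field(default_factory=set)  # labels chosen by the contributor

    @property
    def duration_s(self) -> int:
        # One sample per second, so the duration equals the number of samples.
        return len(self.samples)
```

A recording with three samples and the labels `roads` and `wind` would then have `duration_s == 3` and two entries in `tags`.
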
There are 18 labels, and the user can select one or more of them for each track. They are detailed in Picaut et al. (2021). The preliminary work presented here analyzes the proportion of certain labels in the overall sample over particular time periods (time of day, season).

In addition to the collaboratively collected data, additional datasets were used to delimit the study area. We chose to limit the geographical scope of this preliminary study to metropolitan France because this area contains the largest number of recordings and because its climate and sound dynamics are known and documented. To facilitate the reproducibility of the spatial filtering, we used open datasets from recognized sources: the Natural Earth database (Patterson and Kelso 2021) and the Admin Express database from the National Institute of Geographic and Forest Information (Institut Géographique National 2021).

## The study

### Tools

#### PostGIS

The data are provided as a dump of a PostgreSQL/PostGIS database (Ramsey and Blasby 2001). Several scripts perform much of the attribute and spatial filtering. The filtered results are stored in a materialized view, whose data are then analyzed with the R language.

#### R

R (R Core Team 2021) is a programming language for data processing and statistics with many libraries dedicated to geospatial data. R Markdown makes it possible to mix code and Markdown text to dynamically produce graphs, tables and documents. It is one of the recommended tools for literate programming.

#### Git

Git is a distributed version control system (DVCS) (Chacon and Straub 2014). It enables collaborative and decentralized work. Git was a natural choice because the collaborators are spread across several sites (Nantes, Lyon, Paris) and Git is already used within the UMRAE laboratory.

### Implementation

The data are provided in the form of a PostgreSQL/PostGIS dump. A server was set up and the data loaded into it.
A materialized view was created to provide stable access to the data matching the defined criteria. These criteria are both attribute-based (filtering of certain tags, minimum and maximum durations, etc.) and spatial (located in France, reduced track area, etc.). An R Markdown document connects to the view and then performs the operations needed to analyze the data. This document, which mixes narrative, figures and code, made it possible to resume and continue the analyses shown here.

## Results

The study concerns tracks that carry at least one tag and were recorded in metropolitan France. It focuses on the proportion of a given tag relative to all tags for a given period (time of day, season, etc.).

In the sample studied, the tags _roads_, _chatting_, _animals_ and _wind_ are prevalent. The tags _air_traffic_ and _works_ are also well represented.

The first axis of analysis concerns the distribution of tags over time. Animal noises (tag _animals_) are more frequent in the morning, especially one hour before sunrise, which is a known dynamic of bird song. We also observed peaks in human activity, especially commuting.

The second temporal axis was seasonality, especially that of animal noises, which show more intense activity in the European spring and summer. This phenomenon could also be observed in the recordings. We also noticed that music was less present in autumn than in other seasons and that it occurs mostly at late hours.
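
The proportion discussed in this section (the share of one tag among all tags attached during a given period) can be sketched in a few lines of Python. This is only an illustration of one possible reading of that measure, not the study's R Markdown code: the function name `tag_share_by_period`, the choice of denominator (all tag occurrences in the period) and the sample records below are assumptions, and the records are invented, not taken from the dataset.

```python
from collections import Counter


def tag_share_by_period(records, tag):
    """Share of `tag` among all tags attached in each period.

    `records` is an iterable of (period, tags) pairs, one per tagged track,
    where `period` is e.g. an hour of day or a season and `tags` is a set.
    """
    totals = Counter()  # number of tag occurrences per period
    hits = Counter()    # number of tracks carrying `tag` per period
    for period, tags in records:
        totals[period] += len(tags)
        if tag in tags:
            hits[period] += 1
    # Periods without any tag are skipped to avoid dividing by zero.
    return {p: hits[p] / n for p, n in totals.items() if n > 0}


# Invented records, for illustration only: (hour of day, tags of one track).
records = [
    (6, {"animals"}),
    (6, {"animals", "wind"}),
    (6, {"roads"}),
    (18, {"roads", "chatting"}),
    (18, {"roads"}),
    (18, {"animals", "roads"}),
]
shares = tag_share_by_period(records, "animals")
```

With these invented records, _animals_ accounts for 2 of the 4 tags attached at hour 6 and 1 of the 5 tags attached at hour 18. The same function applies to seasons by passing a season name as the period key.
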