
Lightning Talk: The Pierre Auger Observatory Open Data release: not only data


Formal Metadata

Title
Lightning Talk: The Pierre Auger Observatory Open Data release: not only data
Title of Series
Number of Parts
13
Author
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
On February 15, 2021, the Pierre Auger Collaboration released a first dataset representing 10% of all its cosmic ray data acquired since 2004. During the 16 months preceding the release, most work went into creating a framework allowing for releasing more than just the high level reconstructed parameters of observed cosmic rays. The framework contains: pseudo-raw data at the detector level, a website providing a complete description of the data as well as their recording and analysis by the Collaboration, an event display for visualising the different detector signals, and a series of analysis notebooks that can be run online to replicate the main physics results published by the Collaboration and improve the understanding of the use of these data. This presentation will focus on the importance and the added value of this framework to help scientists or science enthusiasts delve into the data without being discouraged at first sight.
Transcript: English(auto-generated)
Hi, I'm going to talk about the Pierre Auger Observatory open data release and focus on the fact that it's not only data and that there is a framework
coming together with the data to help people get into the data and be able to do physics with it. So the main message of this presentation is that an open data release is not only releasing data, it's also releasing a framework to help people work with this data. So first, a few words about the Pierre Auger Observatory
just to understand what kind of data is released and why it was released this way. So it's located in Argentina. It's the largest cosmic ray observatory on Earth, also the largest observatory. Here you can see a map of it superimposed on Switzerland to see how big it is: it's huge, over 3,000 square kilometers. There is a ground array of detectors
of 1,600 particle detectors. Then there are four telescopes overlooking the atmosphere; they're not looking at the sky, they're looking at the atmosphere above the detectors. And they observe cosmic ray interactions high in the atmosphere and the cascades they produce in the atmosphere itself until they reach the ground. So the data at the end of the day
is, for each cosmic ray, something like the time and direction of arrival, the energy, and some estimate of the primary composition, that is, what kind of cosmic ray hit the top of the atmosphere. And the number of events is very low. At the highest energies there is maybe one event per square kilometer per millennium. So that's something like a few per year
for the full size of the observatory. It's not a large amount of data. At lower energies there are more of them, so we are talking about tens or hundreds of thousands of events, so it's not a large number. And for each event, the detectors observe it like this: this is a high-energy event at night, so it's seen by the four telescopes. And so you see the development
of the cascade in the atmosphere and then, at ground, a group of detectors over an area of more than 10 square kilometers detecting some signals, some particles going through them. So if you want to provide all the relevant data, in fact, a very simple text file, an ASCII file,
CSV or Excel or whatever, with one line per event, so 10,000 lines, and maybe 10 or 20 columns of information, would probably be enough. Now, as experimental physicists, we know that getting just the final output of the detector to do the physics is usually not a good idea, because you really need to understand properly
how the detector is working to be able to extract the best from the data. And furthermore, this is a data release where the Pierre Auger Collaboration released 10% of their data, which comes after 20 years of operation by a collaboration of 500 people. So you're probably not going to find something new in these data with a very basic, superficial analysis.
And so in order to really find something within these data which was not found before by the Pierre Auger Collaboration, which is the reason why data are made public, you really need to understand the data. And so it's important to provide this information to the external world.
So what the Pierre Auger Collaboration actually decided to do was to provide the data as I described before, just for 10% of the total data set, with the arrival directions and the other high-level reconstructed information for each event, but also to provide pseudo-raw data.
So data very close to the level of the detector, both for the surface detector and for the telescopes. These are close to raw data, just calibrated basic data, and they are a bit harder to analyze. So a website was designed to display the events and to explain all the fields of the data and also how the data was recorded,
how the data was selected, the whole calibration process, all the physics which went into preparing the data. And in addition to that, a set of Python notebooks was provided. Among these Python notebooks, there are a couple which are just basic tutorials on how to use the data.
And then five of them reproduce the main results of physics papers published by the Pierre Auger Collaboration. They are running on kaggle.com. So if you go to the website of the open data, opendata.auger.org, with one click of a button,
you can open one of these notebooks in your browser, test it, modify it. And so it makes it very easy to play with the data and to make this first step, which is usually difficult if you just download the data on your own and then don't know what to do with this amount of data. So I'm just going to quickly show you
the result of one of these notebooks. So this is the large-scale anisotropy notebook. As you can see, it's very simple, let's say four or five pages of Python, where you make an event map and an exposure map, you divide one by the other and you get a significance map of cosmic rays at EeV, exaelectronvolt, energies.
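As a rough illustration of the map-making step described here, the following Python sketch bins events on a sky grid, scales an exposure map to the same total, and derives a ratio map and a simple significance map. It is not the code of the released notebook: the function name `sky_maps`, the equal-area grid in right ascension and sin(declination), the flat toy exposure, and the Gaussian significance estimate are all assumptions made for this example, and the events are simulated.

```python
import numpy as np


def sky_maps(ra_deg, dec_deg, exposure, n_ra=12, n_sin_dec=6):
    """Compare binned event counts with an exposure map.

    `exposure` is a hypothetical (n_sin_dec, n_ra) array of relative
    exposure per bin; it is normalised to the total number of events.
    """
    ra_edges = np.linspace(0.0, 360.0, n_ra + 1)
    sd_edges = np.linspace(-1.0, 1.0, n_sin_dec + 1)  # equal-area bins
    sin_dec = np.sin(np.radians(dec_deg))

    # Event map: number of observed events per sky bin.
    events, _, _ = np.histogram2d(sin_dec, ra_deg, bins=[sd_edges, ra_edges])

    # Expected counts for an isotropic sky seen with this exposure.
    expected = events.sum() * exposure / exposure.sum()

    with np.errstate(divide="ignore", invalid="ignore"):
        # Ratio map: observed over expected, undefined where exposure is zero.
        ratio = np.where(expected > 0, events / expected, np.nan)
        # Simple Gaussian estimate of the excess in units of sigma
        # (an assumption; published analyses may use other estimators).
        signif = np.where(
            expected > 0, (events - expected) / np.sqrt(expected), np.nan
        )
    return events, expected, ratio, signif


# Toy isotropic sky with a flat exposure (both purely illustrative).
rng = np.random.default_rng(0)
ra_toy = rng.uniform(0.0, 360.0, 20_000)
dec_toy = np.degrees(np.arcsin(rng.uniform(-1.0, 1.0, 20_000)))
flat_exposure = np.ones((6, 12))

events, expected, ratio, signif = sky_maps(ra_toy, dec_toy, flat_exposure)
print("largest deviation in sigma:", np.nanmax(np.abs(signif)))
```

For an isotropic toy sky like this one, the ratio map scatters around 1 and no bin shows a large significance; a real exposure map would depend on declination and would leave part of the sky unobserved.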
And then in the end, you get a dipole that you try to fit, which demonstrates that cosmic rays are extragalactic at the highest energies. And so basically this whole notebook is reproducing this Science paper of 2017, the observation of a large-scale anisotropy, a dipole.
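A dipole of this kind is commonly quantified with a first-harmonic (Rayleigh) analysis in right ascension. The sketch below is a minimal version under stated assumptions and not the notebook's actual code: the function name `first_harmonic`, the 5% toy amplitude, and the 100 degree toy phase are illustrative choices, and the events are simulated rather than taken from the release.

```python
import numpy as np


def first_harmonic(ra_deg, weights=None):
    """First-harmonic (Rayleigh) analysis of right ascensions in degrees.

    Returns the amplitude, the phase in degrees, and the probability of
    getting an amplitude at least this large from an isotropic sky.
    """
    ra = np.radians(np.asarray(ra_deg, dtype=float))
    w = np.ones_like(ra) if weights is None else np.asarray(weights, dtype=float)
    norm = w.sum()

    # Fourier coefficients of the first harmonic.
    a = 2.0 / norm * np.sum(w * np.cos(ra))
    b = 2.0 / norm * np.sum(w * np.sin(ra))

    amplitude = np.hypot(a, b)
    phase_deg = np.degrees(np.arctan2(b, a)) % 360.0

    # Rayleigh chance probability under isotropy (large-sample formula).
    p_value = np.exp(-norm * amplitude**2 / 4.0)
    return amplitude, phase_deg, p_value


# Toy sample: inject a 5% modulation peaking at 100 degrees by accept-reject.
rng = np.random.default_rng(1)
candidates = rng.uniform(0.0, 360.0, 200_000)
accept_prob = (1.0 + 0.05 * np.cos(np.radians(candidates - 100.0))) / 1.05
ra_toy = candidates[rng.uniform(0.0, 1.0, candidates.size) < accept_prob]

amplitude, phase_deg, p_value = first_harmonic(ra_toy)
print(f"amplitude={amplitude:.3f}, phase={phase_deg:.0f} deg, p={p_value:.1e}")
```

The statistical uncertainty on the amplitude scales roughly as the square root of 2/N, so a sample ten times smaller has an uncertainty about three times larger.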
And this is the final result of the Pierre Auger Observatory, a five-sigma dipole. And with this data, you only get to 1.6 sigma, since it's 10% of the data set, a limited data set. But all the physics is there and it's quite simple. And so being able to draw a map and do this kind of analysis
at the click of a button is, we believe, something very useful to bridge this gap between people within the collaboration and people outside of the collaboration. So to conclude, I'm convinced that the process of releasing open data is not just making these data available, but also providing a framework
to help people get into these data, and in particular having a website and notebooks. So these easy-to-access Python notebooks, at the click of a button, are already, I believe, a great solution to help people get into the data, see how easy it is to do real physics with them,
and then go deeper and deeper and try to do their own physics. So if you're interested as a collaboration in making an open data release, your first one, or if you've already done some and want to add these notebooks, or need any help with this kind of process, feel free to contact the Pierre Auger data release task
or contact me directly. I would be very happy to provide any help with this way of doing a data release. Thank you very much.