The return of Crazy Data
Formal Metadata
Number of Parts | 295
License | CC Attribution 3.0 Germany: You may use, modify, and reproduce the work or its content, in unchanged or modified form, for any legal purpose, and distribute it and make it publicly available, provided that you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/43546 (DOI)
Transcript: English (automatically generated)
00:07
Thank you. Thanks everybody for being here. My name is Daniel. I've been working for almost 10 years with GIS in the Brazilian Federal Police, and I've been to several FOSS4G editions before. In my unit, I'm responsible for training, procurement,
00:25
software development, international cooperation, just management. And my area is the geomatics area of the forensics directorate of the Brazilian Federal Police. I'm a police officer. I'm also a forensics expert. I was recruited about 10 years ago to work in the GIS
00:41
infrastructure there. So that's what I do there. I maintain this. I mean, I manage the team that maintains this portal. And this is a knowledge base. Our knowledge base is wiki-based. And this is our main portal. This is how we provide services to our internal customers. And this is a big deal there
01:05
for us that we have a way to deliver. We procure the image contracts in the central unit, and people from all over Brazil will order the images using our web portal, and
01:20
we will deliver the image to them using the portal. This is our database, our imagery coverage in our internal database. This is all imagery with 30 and 50 centimeters spatial resolution. And the actual body of this presentation is on GitHub. So I'll close the presentation right now and I'll open the GitHub repository. Okay, here. Okay.
01:56
So at FOSS4G in Denver, sorry, Portland, I delivered a presentation called Crazy Data
02:02
in which I gave examples of how to deal with very difficult data, I mean, either because it's big or because it's dirty or because you have to do some special treatment on it. And when I arrived here at FOSS4G, many people started asking about this subject. So I decided to crunch some data, and this presentation is about how to do it.
02:24
So I'm going to be very specific that I'm not going to discuss politics, ecology, or relations of cause and effect. I will try to be very objective, very technical here. So please, when you're going to ask questions later, please have that in mind.
02:43
And the intention is for the whole process to be repeatable. So if you're patient enough and you have access to this repository, you'll be able to do the same things I have done and arrive at the same results. And you will also be able to modify the searches and the queries and you will be able to answer different questions than the ones I proposed here.
03:02
So what happened here? Why is this presentation being delivered here? We had a lot of recent news coverage about the fires in the Amazon forest. And we have a lot of data from various serious institutions that give us an insight on that.
03:25
So this one is very interesting. This is sponsored by NASA. So if you see here the fire counts, the radiative power, it's closely related. There are two types of sensors,
03:40
and there are maps here, and there's the history. For example, this is the state of Brazil. So you have the cumulative monthly fire counts. So it starts at zero on January 1, and then it will count all the MODIS alerts, MODIS and VIIRS alerts, that happened during the year in that specific region. So you have a history for several years here.
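A cumulative curve like the one described above can be rebuilt from a plain list of detection dates. The sketch below is only an illustration: the function name `cumulative_monthly_counts` and the sample dates are made up, and the counts are not real FIRMS data.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def cumulative_monthly_counts(detection_dates):
    """Return {year: [cumulative count at the end of each month, Jan..Dec]}."""
    per_month = Counter((d.year, d.month) for d in detection_dates)
    years = sorted({d.year for d in detection_dates})
    return {
        year: list(accumulate(per_month.get((year, m), 0) for m in range(1, 13)))
        for year in years
    }


# Made-up detection dates, for illustration only.
sample = [
    date(2018, 1, 5), date(2018, 8, 20), date(2018, 8, 21),
    date(2019, 2, 1), date(2019, 8, 3), date(2019, 8, 4), date(2019, 9, 9),
]
curves = cumulative_monthly_counts(sample)
print(curves[2019])
```

Each year restarts at zero, so the curves for different years can be compared at the same calendar date.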
04:05
So if you put the mouse over here, you get the count for 2019, which is 3,200-something fire spots. And until this same date, in 2016, there were 3,532 fire counts. So it has
04:21
all the data in there. If you hover the mouse over here, you get the specific year, so 2005 was way higher. And you have all this very nice data for several states in Brazil. So that's how I started looking into the subject. Another very interesting portal
04:42
is this one, which is the source for the fire information system from NASA. And it will give you literally a heat map of the fire spots that the satellites have detected.
05:03
And the first thing I realized when I opened this map, I was seeing the news and I started checking the data. The first thing I realized is that you don't have an unusually high concentration relative to the other parts. Just to be clear, this is a 24-hour alert. So we could
05:26
give it some more time, but it's difficult to see it. It gets too dense. So it's 24 hours. And when we look at this, we don't see a special concentration on the Amazon area.
05:43
You see large concentrations here, which is a dry area of Brazil, which is not forest. You see high concentrations over here. You do see fires in the forest, lots of fires. And you see this. And I was very surprised when I saw this because this wasn't in the news.
06:03
So let's go to the data then. I believe in asking questions that will help us make a decision. So you have a decision in your mind that you want to make. What questions do I need answered to make this decision one way or the other? So I'm not proposing a decision here,
06:23
but I'm proposing a question. So the question is, is the Amazon forest being burned at a high rate? And we're going to break this question down. First things first, where is the Amazon forest? There are at least three ways to define the area of the Amazon forest.
06:43
The first one is a legal document, laws from Brazil that say that the Amazon forest is the red line here. It's inside the red line. So that's the legal Amazon, as we call it. So there are special laws for that specific region of the country. Then there is the biome data. So some areas inside the legal Amazon are Pantanal or Caatinga
07:07
or Lençóis Maranhenses is a different area, no, sorry, not Lençóis, but Caatinga, Pantanal, different biomes. It's not the Amazon forest. It's in the legal Amazon. It's not the Amazon forest. Okay, what is the biome then? The biome is the green line.
07:21
But I only have this data for Brazil. If I want to compare Brazil with the neighboring countries, I shouldn't be using this data. At least I would have to have similar data for the other countries and I don't have it. It's probably available, but then again, I wouldn't be able to put it together in two days. So what I decided to use was
07:43
the Amazon river basin. So if you take the elevation model and you compute where the water would go, if you put the water in each place of that region of South America and you track the water all the way down to the sea, which places would you put the water in a way that the
08:05
water reaches the Amazon river? So that's the area. That's computable. That's very objective. That's an algorithm you can run on the raster data from SRTM, for example. So that's a good starting point. Also, water has a lot to do with the ecosystem, with the plants,
08:22
with the animals. And so we get a good enough representation of what I am considering the Amazon for these statistics. So it's very objective. We're trying to go for that. Okay, so in the GitHub repository, I tried to put all the sources there. So if you need the
08:42
same data I have, I will show you later the actual files where I put the links down. So I don't know how long NASA will keep those links for the requests I made, but you don't have to wait for NASA to process your new request. You can just download the ones already made. Or you can make new ones for yourself. So the sources for the main polygon,
09:01
it's this institution over there. And the source of the green line is here. So it's a WFS service, Brazilian open data. Okay, now that we have figured out where the Amazon forest is, how do we measure burn rates? Because that's our question. We're asking if the burn rates are
09:22
unusually high. So NASA has this Fire Information for Resource Management System (FIRMS). So they provide heat spots from the two sensors that are aboard several satellites. The MODIS sensors have been around since 2001. And there are two satellites with those sensors. And the VIIRS sensors are more modern. They give different products. And you can do
09:44
your statistics with either one. They have a high correlation between each other, but they're not the same. If you look at this portal here, and we really zoom into the data. I'm feeling very adventurous here. I hope it doesn't crash.
10:00
Okay, so the orange ones are MODIS data. The red ones are VIIRS. Since it's very recent data, the MODIS version is in near real-time mode. It's not science mode. So they still have to calibrate for location. But you see that they have high correlation, but they're not the same. For example, here you don't have MODIS alerts, but you do have VIIRS. Okay, but the number of
10:24
features is much higher. Okay, let's go back there. So that's what the data looks like. I'm going to show you step by step how I loaded it, if I have enough time. But I'll show you step by step how I loaded it. So that's how it looks in Brazil. That's just for the last month,
10:42
last 30 days, sorry, 30 days from September 27 to August 27. And that's what it looks like. And it's only MODIS, not VIIRS. And QGIS has a hard time with this. Lots and lots of points. The total data set has 110 million points.
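With that many points, it helps to stream the file and keep only the rows needed for a given map. The sketch below filters one instrument over a 30-day window without loading everything into memory. The column names `acq_date` and `instrument` are assumed to follow the FIRMS CSV layout, and the function name, the end date and the sample rows are hypothetical.

```python
import csv
import io
from datetime import date, timedelta


def detections_in_window(csv_file, end_day, days=30, instrument="MODIS"):
    """Yield rows for one instrument acquired in the `days` days ending at end_day."""
    start_day = end_day - timedelta(days=days)
    for row in csv.DictReader(csv_file):
        acquired = date.fromisoformat(row["acq_date"])
        if row["instrument"] == instrument and start_day < acquired <= end_day:
            yield row


# Made-up rows standing in for a downloaded archive file.
sample_csv = io.StringIO(
    "latitude,longitude,acq_date,instrument\n"
    "-9.10,-60.20,2019-08-27,MODIS\n"
    "-9.30,-61.00,2019-08-01,MODIS\n"
    "-9.50,-61.50,2019-07-20,MODIS\n"
    "-9.70,-62.00,2019-08-15,VIIRS\n"
)
recent = list(detections_in_window(sample_csv, end_day=date(2019, 8, 27)))
print(len(recent))
```

Because the function is a generator, the same code can be pointed at a real file opened with `open(path, newline="")` and will read it one row at a time.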
11:03
So this is Brazil. Remember when we spoke about Africa? I went over there, and that's the same scale, same configuration. And this is Brazil, this is Africa in the last 30 days. And this isn't in the news.
11:20
Okay, but this is just seeing. We don't have the numbers. We haven't crunched the numbers yet. So this is the source. This is where I got this from. Also, there are some caveats, because we have to be careful with what we are measuring. What does a fire detection mean on the ground? What does each feature from this data set mean? So if you open here,
11:42
you have access to detailed information from NASA, from the science team, about what this is. So if you have a fire in a location, the satellite will detect just one fire spot here. If you have two fires in the same location, it will detect just one fire spot,
12:01
albeit it will be brighter. I didn't take brightness into my calculations. I could have, but you will see later why I didn't, because it was very heavy for my notebook over there. And if you have a large fire that takes four pixels, those pixels are, I think, one kilometer by one kilometer. So if you have a large fire, you get four, in this case,
12:22
four detections. And VIIRS is pretty much the same, with the difference that the pixel is 375 meters. It's much smaller. It's like almost, I don't know, almost four times more pixels. Yeah, almost four times more pixels per unit of area. So let's get back there.
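As a rough check on the pixel arithmetic above, the sketch below assumes idealized square pixels with the nominal edge lengths mentioned in the talk (1 km for MODIS, 375 m for VIIRS). Under that assumption the ratio of pixels per unit of area comes out closer to seven than to four. Real footprints can grow away from nadir, so this is only an approximation.

```python
MODIS_PIXEL_M = 1000.0  # nominal pixel edge quoted in the talk
VIIRS_PIXEL_M = 375.0   # nominal pixel edge quoted in the talk


def pixels_per_km2(pixel_edge_m):
    """Number of idealized square pixels that fit into one square kilometer."""
    return (1000.0 / pixel_edge_m) ** 2


linear_ratio = MODIS_PIXEL_M / VIIRS_PIXEL_M
area_ratio = pixels_per_km2(VIIRS_PIXEL_M) / pixels_per_km2(MODIS_PIXEL_M)
print(round(linear_ratio, 2), round(area_ratio, 2))
```

This matters when comparing the two products: the same burning area can produce several times more VIIRS detections than MODIS detections, so raw counts from the two sensors are not directly comparable.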
12:47
Okay, now we're still studying the question: what does it mean for the rate to be a high or low rate? I can't answer that. I really can't. I didn't get to a conclusion. But we could start by comparing, right? We could compare one place to the other
13:03
and try to get similar data. And I got Natural Earth data for the countries, and I got data from our geographic institute for our municipalities, which is not here, sorry, I should have put it here, and for our states. So what time frame are we talking
13:23
about? I didn't put a lengthy explanation here, but all the data we see from the Earthdata FIRMS website talks about a yearly cycle. So everybody measures yearly, I'm just gonna go with it. Okay, I'm gonna try my best to adhere to the time here, which is almost finished.
13:45
So what do we do to process this information? There's a lot of notes here. What have I done? But I think it's best that I go to the results, then I jump to the repository where there is... Okay, so the first thing I notice is that if I take Amazon fires by country, so I picked up the
14:06
Amazon polygon, I cut down the polygons of the countries, and I considered only the area of the country inside the Amazon. If we see here, it looks like the heat spots are diminishing, right? From 2018. Especially in the density data here. So this is actual fires, the absolute
14:27
number, this is divided by the area, so it's MODIS fires per 1,000 square kilometers. But 2018 hasn't finished yet. If you do the same thing, but we take into account
14:42
only the data until August 27, you see this. So it's much different, right? The first thing we see here is that Brazil has the most fires by far in the Amazon. But the highest density of fires is not ours, it's Bolivia and Guyana. I was really impressed by this.
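The two steps described above, cutting every year off at the same calendar day and dividing by area, can be sketched as follows. The function name `year_to_date_density`, the region labels, the areas and the dates are all made up for illustration. The talk's actual processing is done with scripts in the repository and is not reproduced here.

```python
from collections import Counter
from datetime import date


def year_to_date_density(detections, area_km2, cutoff_month=8, cutoff_day=27):
    """detections: iterable of (region, acquisition date); area_km2: {region: km2}.

    Returns {(region, year): detections per 1,000 km2 up to the cutoff day}.
    """
    counts = Counter(
        (region, day.year)
        for region, day in detections
        if (day.month, day.day) <= (cutoff_month, cutoff_day)
    )
    return {key: 1000.0 * n / area_km2[key[0]] for key, n in counts.items()}


# Hypothetical regions and detections, not real figures.
areas = {"A": 4000.0, "B": 500.0}
detections = [
    ("A", date(2018, 3, 1)), ("A", date(2018, 11, 5)),
    ("A", date(2019, 8, 1)), ("A", date(2019, 8, 27)),
    ("B", date(2019, 7, 4)),
]
density = year_to_date_density(detections, areas)
print(density)
```

In this toy data, region A has more detections in 2019 than region B, but region B has the higher density, which is the same kind of reversal the charts show between absolute counts and counts per area.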
15:02
So it's also very different. This is zero, so it's almost 50% difference. So this is the VIIRS data, which is very similar. Pretty much the same, you could look at it carefully later. So if we need to make a decision, if you're a decision maker
15:21
in Brazil, and you need to know where the problem is worse, you will take a look state by state. So there are worse states, like here, Roraima has the highest density, and Mato Grosso has the highest count. But do you see the time series? This isn't unusually
15:47
large, it's much larger than 2018, but it isn't unusually large compared to the time series. Amazon fires have always been in the news, but not this much, so my impression is that, to answer the question of whether it's a high rate or not, I don't know, but it's similar to the
16:07
rate that we've been seeing so far. And it has been much worse in the past. That is very, very bad. 2004, 2005 was very, very bad. Then you have density here, that's what a decision maker could look at. Thank you. For municipalities here, oh sorry,
16:28
that's the same data, but it's VIIRS now. And since VIIRS is more recent, the sensor is more recent, you'll have data here since 2001, and you'll have data here since 2012, and so you either plot this data again for you to take a look at it carefully, or you
16:44
have to do a mental exercise to translate one thing to the other. And here in the density, we see clearly that the density has increased, and in some cases, more than 100% here, which is the case of the state of Roraima. Oh, then we get the same data by municipality,
17:02
so that's what decision makers should look at, into how to fight the problem. But then I try to compare with the other countries. So this is the same problem as before, the year hasn't finished yet, but the thing is, I don't know how the yearly cycle works in the other countries. So I just compared the height of the lines, I didn't compare
17:25
whether it's decreasing or increasing from 2018, but we see this, by country, that's the total amount of fires, absolute fires. The Democratic Republic of Congo is here, then Russia, then Angola, then Brazil down here, then Zambia, and all these other countries. If you go by
17:43
density, you won't see any of these countries in density, if you just put the most dense deforestation here, because of course, small countries, they burn everything really high, so I pick these countries, and I plot this graph, the density graph. So the highest density
18:00
of deforestation is in Angola, then Zambia, then, no, sorry, not deforestation, we are measuring heat spots. So it's Angola, Zambia, Congo, Central African Republic, Mexico, Russia, then Brazil. So you could reproduce this data if you wanted to, but this data is also, I like to say that it's also very dangerous. Why? What conclusions are drawn from this? You're not measuring
18:24
intentional fires, you're not measuring whether the fire is a wildfire or an intentional fire. You're not separating volcanic activity from burning forests, you're not taking into account several other pieces of information, but it's something to look at and
18:46
try to get more information to see what that data means. So why do we only have MODIS fires here, why don't we have any VIIRS here? Because my machine crashed. It took three hours to run this query, and then it crashed. This query took 59 minutes, so I'm just
19:05
gonna tell you where the rest of the data is, and I'll wrap up. So if you go here on the repository, you have the scripts in order. So first you download FIRMS data, these are the file links, so these are really lazy scripts to unzip, to load the files into the database,
19:25
then to process the data, and then export the materialized tables, and then generate the graphics. This is a very hacky script, but it produces very nice graphics. You have one function to plot the graphics, and then several configurations to plot each graphic here,
19:44
so you can change, for example, I use 10 countries here, you could change the number and get more countries into the graph. Okay, I'm done. I could explain it in detail, but it will take a long time, and if you need anything specific to the repository, you could
20:03
either open a pull request or open an issue on the repository. Thank you. We have five minutes for questions, starting in a second. Just take your breath. Okay, go ahead.
20:31
Thank you for the talk, very enlightening. But for me, I'm a biologist, so inevitably it will
20:43
have an ecological taste, my question. When you're comparing Africa to the Amazon, like the Amazon forest, you're comparing apples with oranges, okay. In Central Africa, there's a lot of grassland, savannas; fires there are an inevitable part, the biome is resistant, okay. Amazon forest is not, so just to address this thing, because you
21:05
asked why the other fires are not being used, and fires there existed for thousands of years. Amazon fires is part of the deforestation, okay. This is just a statement I want to make. The rates may not be higher, I agree. So another part of my statement, rather than a question,
21:21
is that the proper question to ask is how much this adds to the total loss of the forest, and not if the rates are higher or lower compared to other years, okay. That's two comments. Yeah, I don't know how to answer that. I'm a computer engineer. Sure, thanks for the comments. I just want to ask you basically,
21:50
are you using any kind of prediction systems for forest fires or thinking about that, you know, because then you can easily look at the locations where something can happen
22:04
in the future, like let's say tomorrow or in two days, based on the just weather data, let's say. Well, there are better people than me to answer for that, but the best I can
22:20
answer is that the fires always have a context. It's either environmental or people have been cutting down the trees and they want to use it for grassland or something like that. So if you get that context before, you could theoretically predict where the fires will be.
22:40
Yeah, but the fire starts when the matter on the forest ground is dry. If there is wet stuff like wet leaves and whatever, there is no fire because the stuff must be dry. So based on the weather condition, it's kind of easier to predict. Well, every year is the same.
23:02
Every year there's a dry season, that's when the fires happen and, you know, it's a large area, so you would probably be better predicting with the specific context of that specific area because everywhere it's drier, not that dry because it's the rainforest, it rains a lot there. Hey, I was just wondering, do you have any detail on
23:27
the breakdown of what happens naturally versus ones that are done by people when they are trying to, say, clear land? Is there information on that? Well, I didn't do that analysis, but you could theoretically do it because there are fire temperatures, there's the amount,
23:43
the size of the fire, there's the history of the fire in the area. So if you see a new fire where there's already been a fire, that's different than the fire where you have never seen anything before. So you could do this with this data. I didn't process it, but... Hi, thank you for the presentation. I was wondering, did you look at Sentinel-2 data from
24:08
the Copernicus mission from Europe because you should have higher resolution. I'm not sure if you could see better details. That would be great for validation so that we make a hypothesis, a more complex analysis here. So my hypothesis is this is, for example, deforestation.
24:26
So I'll check it with Sentinel data, but I didn't do that analysis here. But the thing is Sentinel doesn't have a product like FIRMS. If the Sentinel, at least that I know of, if they produced a fire alert based on optical Sentinel-2 data, that would be
24:45
that could have been used for this study. Yeah, I think should be possible because it's also a multi-spectral sensor, so you should have access to some information. And just curiosity, did you look also in Siberia? Because I know there were some fires there also in this time.
25:00
You know, something I was curious about in Siberia. I was wondering if somebody lights a fireplace in their home, if it will show up as a fire. So that's unfair, right? So you're heating your own home and it shows up as a fire spot, but it isn't. So just check this. It has to be a fire at least 50 meters square to show up in the satellite.
25:22
No, just comments. There's also a burnt area product from MODIS that you could use to validate somehow because this active fire product is based on a threshold on temperature and LST. And you can be detecting a lot of things as well. I mean, okay, Amazon is burning and so on.
25:44
Maybe I have linked this here. Let me check. And that MODIS product is, okay, 500 meters, so you could also use it. Okay, where can I get the MODIS burnt area product? On the MODIS NASA Earthdata site. Okay. So it's already. What I have seen so far is
26:02
that it's updated May 2013, so it doesn't work here. It's still going. Thank you, Daniel, for the presentation.