OpenDataScience Europe 2021: Interview with Patrick Schratz
Formal Metadata
Number of Parts | 57
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/55225 (DOI)
Transcript: English(auto-generated)
00:11
I'm Patrick Schratz. I'm an R consultant working in Zurich, Switzerland, at cynkra. And I'm also a PhD candidate in environmental modeling at the University of Jena. And
00:25
I contribute a lot of open source code to various projects, especially to the mlr projects, a project for machine learning in R. It's been six years now, I'd say. So I started in 2015 with coding in the first place. And
00:43
then, I think at the beginning of 2016, and also in my PhD, which started in late 2016, I got more into the development of machine learning and into modeling as well, because I was missing some things. And I just had the idea of contributing to an existing system rather
01:03
than creating my own island solution. And yeah, this is how it started: with some small contributions, and then, voilà, I realized it's really fun. I can help myself, and I can help other people as well. And yeah, that's how it got started.
01:22
I'd say, the more the result is of public interest, like, for example, spatial data or environment-related outcomes, like how the forest is doing, or what kind of weather we can expect in the next days, I think these sectors are the ones
01:43
where the most open source contributions are available. Also because, I think, you can hardly make much money out of it there. And I think the more money people see, the more potential they see in certain sectors, the more they want to keep it private
02:02
and think, oh, we can use that for ourselves and then maybe sell it. And for public interest data, like everything that's spatially related, you cannot make that much money in the first place. And so people think, well, okay, we make it open source, we just give it away. Maybe other people can do something with it. And on the other hand, I also see that
02:25
there has been a large commitment to science contributions in that field for many, many years. And the more science is involved in a certain field, especially with regard to code, the more they tend to make it
02:41
more public. And there is a huge science community in spatiotemporal data, not just in the last years, but it has been growing more and more. And I think these factors all come together. So I'd say, luckily, there's quite some spatiotemporal data publicly available.
03:03
And that's also the field I'm working in, also for my PhD. And the more it comes to confidential data, like private stuff, finances, and other things, the more people tend to really hold the information back. And they think, oh, even if you
03:21
don't give out the data per se, but if you give out how we do it, people might see potential for how they can attack us. I think that's a potential reason they want to hold things back. And when it comes to, oh, we do this and we give all the data to the public anyhow, then we can also give out the code, and then everything comes together.
03:41
So, I mean, mlr3 is an ecosystem for machine learning, or for modeling in general. So it really tries to make things easier when you work with data that has a spatial relation. And we try to make it easier so that you don't have to extract the values from the locations and then put them back together,
04:03
but just have it all in one place. And we just handle the things in the background. And whatever you can do with spatial data outside of the modeling world is not covered by mlr at all, because other packages for visualization and other spatial tasks you need to do with the data can really do that better. But for applying it
04:27
and for having it in a framework that knows how you can apply certain models to data structures that seem to be quite complex in the first place, this is really valuable to us, or in my opinion. And you can use it for a lot of scientific studies, of course.
04:44
And I've also applied it to certain data sets, like environmental data sets, so there's the link again to my PhD: environmental modeling of any kind of data, for pathogen infections, for example, in Northern Spain. That's one of the topics in my PhD,
05:02
where you can use hyperspectral data to monitor the problems. So these are example applications that are really nice to handle with the help of such an ecosystem. Yeah, the bottlenecks, there are always bottlenecks. Currently, I'd say that
05:21
the bottlenecks are actually on the human resource side, and this probably applies to any open source organization. We need people who actually do it and who also have knowledge of what they're doing. Because if you implement something where you only roughly know what's going on in the background, you're likely to make mistakes.
05:46
And of course, there are many ways of how you can treat and handle the data out there. Even though I've been working with spatiotemporal data for five, six, seven years, I have only
06:01
maybe touched a fraction of how you can do that. And I can try to implement this into the framework and make it easier for others. But there will always be people who miss something that they actually want to do, or how they want to do it. So what we really need would be people telling us what is missing, what they want to do, how it should be done. And then maybe
06:23
also validate what we eventually implemented. Because the human error there is maybe the most dangerous one: you implement something and it's not working in the way that we would like it to. And to get feedback in the first place, or contributions. So that's why
06:40
every time I talk about it, I say, well, yeah, there are actually some people behind this project who do this. And these people, they need input, they make mistakes, and you can support them. So that's also how I got into this project, just by starting to contribute. And then you try to minimize these, like you said, bottlenecks, but maybe even missing features, I'd say,
07:06
more or less. Because for me, a bottleneck is usually something that maybe runs slowly, but in fact runs; it's there. But what we are really missing, I think, is some ways of how to do things in the first place that the community would like to have,
07:25
and we really need the input from outside. So if you see this, feel free to approach us, feel free to request features, or just check out what's there and see if it works out the
07:41
way you would expect it to. And then the three halves. The first place would usually be GitHub: you can contribute to any of our repositories by just opening questions, issues, anything. We also have an open Mattermost channel in our organization, which you can just join, and join any channel of the package you're interested in. But anyway, there are many
08:06
different ways. Probably contacting us by email also works, but doing it in a public way, where other people can also benefit from what you ask or share, is a better way than doing it in a private, direct way. So yeah, come talk to us; we're open.