Paituli STAC - experiences of setting up and using own STAC catalog
Formal Metadata
Title: Paituli STAC - experiences of setting up and using own STAC catalog
Number of Parts: 156
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/68509 (DOI)
FOSS4G Europe 2024 Tartu (89 / 156)
Transcript: English (auto-generated)
00:00
I'm Giliak, and I'm from the Finnish company CSC. We have put up a STAC catalogue with Finnish data. The starting point was that the data was already available, but it was in many places and it was difficult to find. So basically in this project we were only concerned with putting up the STAC, and I have also put
00:23
here the time that things have taken, because it took a little longer than we thought in the beginning. In total we're talking about one and a half years. We currently have only raster data, but of course we also have lidar and vector data, so we are thinking about that. All data is open to
00:40
everybody, and we have used only FOSS4G software. If anybody is actually interested in the service, there is a link. The first question we had: since there are actually several options to choose from when you want to put up a STAC API, we spent quite some time selecting which
01:01
software to use, because it was very clear to us that we were not going to make another one. We ended up with GeoServer with the OpenSearch for EO community module, with PostGIS as the back end. For us the main reason to choose this combination was that we had both in use
01:21
already, so we didn't really get any new dependencies, which is nice, and we were also familiar with both GeoServer and PostGIS, so the learning curve was quite easy there. One thing that I especially liked was that both the database and the templates were super easy to modify, so if we wanted
01:40
to add some field or something, that was very easy. It also has an API for making data updates, because some of the datasets are updated daily, so keeping the catalogue up to date in the future was also important.
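As a rough sketch of how such a daily update could be scripted against the OpenSearch for EO REST API (the server URL, credentials, collection name and product values below are invented placeholders, not the actual Paituli setup):

    import requests

    # Invented GeoServer location and collection name, for illustration only.
    GEOSERVER = "https://example.org/geoserver"
    COLLECTION = "DAILY_MOSAIC"

    # A minimal product record; a real record carries the full footprint,
    # acquisition time and asset links of the newly produced raster.
    product = {
        "type": "Feature",
        "id": "DAILY_MOSAIC_2024_06_30",
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[19.0, 59.0], [32.0, 59.0], [32.0, 71.0],
                             [19.0, 71.0], [19.0, 59.0]]],
        },
        "properties": {"datetime": "2024-06-30T00:00:00Z"},
        "assets": {
            "data": {
                "href": "https://example.org/data/mosaic_2024_06_30.tif",
                "type": "image/tiff; application=geotiff",
            }
        },
    }

    # Post the product to the collection so the catalogue stays current.
    resp = requests.post(
        f"{GEOSERVER}/rest/oseo/collections/{COLLECTION}/products",
        json=product,
        auth=("admin", "geoserver"),
    )
    resp.raise_for_status()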
02:01
One other thing we have, which is internal to us, is scripts for making GeoServer statistics, so now I could get all the STAC statistics out as well. What we did not like that much was that there is no tool to ingest static STAC catalogs into GeoServer. We have kind of written one ourselves now, but it's not yet good; maybe someday we could have something for that to provide to others.
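An ingest script of that kind could, for instance, walk a static catalog with pystac and push every item to the products endpoint; the catalog path and API URL here are placeholders only:

    import pystac
    import requests

    # Placeholder locations, for illustration only.
    CATALOG_PATH = "/data/static-stac/catalog.json"
    PRODUCTS_URL = "https://example.org/geoserver/rest/oseo/collections/{coll}/products"

    # Read the static catalog and walk all items, recursing through
    # child catalogs and collections.
    catalog = pystac.Catalog.from_file(CATALOG_PATH)
    for item in catalog.get_items(recursive=True):
        # A STAC item serializes to a GeoJSON feature, which the products
        # endpoint accepts as the record to create or update.
        resp = requests.post(
            PRODUCTS_URL.format(coll=item.collection_id),
            json=item.to_dict(),
        )
        resp.raise_for_status()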
02:24
Then one problem, again for us, was that we don't have Java coders in house, so whenever there is a problem we now have a solution: we can talk to GeoSolutions. In some cases it would also be nicer for us if GeoJSON were the
02:44
default format; now it's HTML. I think I skipped on the previous slide that our main use case is data analysts from academia who want to actually use the data from Python and R, so for them HTML doesn't make much sense.
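For API clients this mostly means requesting the JSON representation explicitly, for example via the Accept header; the STAC API root below is a placeholder:

    import requests

    # Placeholder STAC API root, for illustration only.
    STAC_ROOT = "https://example.org/geoserver/ogc/stac/v1"

    # Ask explicitly for JSON so a browser-oriented HTML default is bypassed.
    resp = requests.get(
        f"{STAC_ROOT}/collections",
        headers={"Accept": "application/json"},
    )
    resp.raise_for_status()
    print([c["id"] for c in resp.json()["collections"]])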
03:03
At the moment we are talking about a hundred collections with 350 items, and this has been the main work: getting the metadata into the database has taken over 12 months, I would say. Of course there are nice Python libraries that we have been using, but just getting everything right takes surprisingly a lot of time.
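To give an idea of that metadata work, a minimal item built with pystac might look like the following; every identifier, coordinate and file path here is invented:

    from datetime import datetime, timezone

    import pystac

    # Invented footprint for a single dataset tile.
    bbox = [19.0, 59.0, 32.0, 71.0]
    geometry = {
        "type": "Polygon",
        "coordinates": [[[19.0, 59.0], [32.0, 59.0], [32.0, 71.0],
                         [19.0, 71.0], [19.0, 59.0]]],
    }

    item = pystac.Item(
        id="example_dem_2024",
        geometry=geometry,
        bbox=bbox,
        datetime=datetime(2024, 1, 1, tzinfo=timezone.utc),
        properties={"gsd": 10.0},
    )

    # Point the item at the actual raster file; in practice this is where most
    # of the data-specific work (paths, projections, band details) happens.
    item.add_asset(
        "data",
        pystac.Asset(
            href="https://example.org/data/example_dem_2024.tif",
            media_type="image/tiff; application=geotiff; profile=cloud-optimized",
            roles=["data"],
        ),
    )

    # Write the item out as a standalone JSON file.
    item.save_object(include_self_link=False, dest_href="example_dem_2024.json")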
03:25
We have put our scripts out, but they are very data-specific, so I don't know if they are of any help to anybody else. One very nice thing was STAC Browser: here we didn't do anything, we just linked our STAC and it was there, and
03:43
this is actually, when there are new users, the page I refer them to: go there and check what is available. Then we have statistics, which I thought might be interesting: it's mostly Python users, with a few FME users, but I guess this is quite expected, because I personally like the
04:03
Python tools a lot more. Then we have also written example scripts for Python and R. The Python one is quite advanced, with Dask and parallelization in the background, because as a company we have a lot of computing resources that we provide for Finnish researchers.
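Such an example script might, for instance, search the catalogue with pystac-client and stack the results lazily into a Dask-backed array with stackstac; the catalogue URL, collection name, area of interest and EPSG code below are placeholders:

    import pystac_client
    import stackstac

    # Placeholder catalogue URL, collection and area of interest.
    client = pystac_client.Client.open("https://example.org/geoserver/ogc/stac/v1")
    search = client.search(
        collections=["example_orthophotos"],
        bbox=[24.5, 60.1, 25.2, 60.4],
        datetime="2023-01-01/2023-12-31",
    )
    items = search.item_collection()

    # stackstac builds a lazy, Dask-backed xarray DataArray; nothing is read
    # until a computation is actually triggered.
    data = stackstac.stack(items, epsg=3067, resolution=10)

    # Example computation: a mean over time, executed in parallel by Dask.
    mean_image = data.mean(dim="time").compute()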
04:22
So this was basically what the end users are interested in. Then, as the last thing, we have tested running this Python script in parallel, and it kind of evened out after five parallel processes, so if anybody has any good hints on how to make this curve go further,
04:44
then I'm very interested. Yes.
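The scaling test mentioned above could be reproduced with a simple sketch along these lines, timing the same batch of downloads with an increasing number of worker processes; the URLs and the work function are placeholders:

    import time
    from concurrent.futures import ProcessPoolExecutor

    import requests

    # Placeholder asset URLs, for illustration only.
    URLS = [f"https://example.org/data/tile_{i}.tif" for i in range(40)]

    def fetch(url: str) -> int:
        """Download one asset and return its size in bytes."""
        return len(requests.get(url).content)

    if __name__ == "__main__":
        # Run the same workload with 1..8 worker processes to see where the
        # throughput curve flattens out.
        for workers in range(1, 9):
            start = time.perf_counter()
            with ProcessPoolExecutor(max_workers=workers) as pool:
                list(pool.map(fetch, URLS))
            print(f"{workers} processes: {time.perf_counter() - start:.1f} s")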