
Your Geoportal F***ing Sucks


Formal Metadata

Title
Your Geoportal F***ing Sucks
Title of Series
Number of Parts
156
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Many national and regional governments have in the past few decades created GeoPortals to meet their obligations to provide citizen access to their spatial data. This spatial data is collected, in many cases, at taxpayer expense. Indeed the EU (2024) says: "The publication of data is driven by the belief that it brings enormous benefits to citizens, businesses, and public administrations, while at the same time enabling stronger co-operation across Europe. Open data can bring benefits in various fields, such as health, food security, education, climate, intelligent transport systems, and smart cities - and is considered 'an essential resource for economic growth, job creation and societal progress'." But even now, nearly a quarter century after the introduction of the first Open Geospatial Consortium (OGC) standards for interoperability, there seems to be a widespread failure to make use of OGC standards to provide access to the underlying data that citizens need to create economic growth. This paper will detail the author's experiences with attempting to acquire spatial data and their observations of relatively inexperienced students trying to navigate some examples of geoportals. The paper will then make some suggestions to help data providers serve data with the modern methods and formats that users actually want, using open source tools such as GeoServer.
Keywords
Transcript: English(auto-generated)
OK, so first thing, this is all my own work, nothing to do with my employer. My boss does actually know I'm doing it, but she said she didn't want to be in any way related to it. So quick introduction. So who am I? I'm a well-known troublemaker. Many of you all have met me being troublesome at other conferences.
I'm an academic research software engineer. I'm a FOSS4G developer; I've worked on GeoTools and GeoServer over the years. And despite being a research software engineer, which I thought would be a great job that only involved software, I do actually teach some students, so I did last year.
So some of this talk is based on that experience. So here's my, once again, a declaration of conflict of interest and a disclaimer. So I am a GeoServer developer, which could be used to provide a GeoPortal should you want to. I once worked for a company that advised local government and possibly national governments on GeoPortals.
And anything I say should not be taken as anything against those companies. And as I say, my boss, technically she does know about this talk, but she declined to have it listed as one of our outputs for the year, because she didn't think it would be a good idea. I want to thank my students this year
who put up with me when I gave them what I thought were easy assignments to draw a map. Just go to the portal, get some data, draw a map. And I discovered that GeoPortals are much harder to use than you expect. If you've not been using GIS for 20 years, a lot of things that GeoPortals do are really annoying.
Much of this talk is based on things that have really annoyed me this year while I've been trying to get some data. But some of it is things that my students just completely failed to understand how or why they were being forced to jump through hoops. So, who in the room is responsible for running a GeoPortal?
Cool. So, thank you, Evan. So, what do I want when I visit your GeoPortal? What do I want to get? Okay, I want spatial data, because I'm a geographer.
I always want the map data as well as the statistical data. I don't want to have to go to a separate site to find it. I want all the data that I need. I don't want lots of little short bits of data. And I want it in an easy-to-use, open standard format. I'm really big on open standard formats and have been for sort of 20-odd years now. I love open standards.
Things that annoy me and make me likely to hunt you down: having to request each tile separately, download limits, and different results based on which file type I've just selected. I don't want to know that half the data you've got is only available if I run history.
It's not helpful. So, here are some examples I've collected over the past year. I'm gonna apologize if anybody from the Scottish government watches this talk. A lot of this talk appears to be really anti-Scottish government. Just because I live in Scotland and I happen to want to get Scottish government data this year, there are many worse or equally bad data,
geo portals out there that I could have picked on. But it happens that it looks like I really hate the Scottish and UK governments. They are quite bad examples, though. So, first question. Are you running a data portal or a geo portal? So, if I do a DuckDuckGo search for geodata for the UK,
I end up on this page, which does have the word mapping on it, but it's got loads of other data that is spatial data, but it's not behind the mapping section. So, to start with, please make a decision on whether you're a geo portal or a data portal. Ideally, I want you to be a geo portal because all data is spatial.
I don't really want you hiding stuff away under mapping. Most of these cases, if I want something about criminal justice, I have to go download the criminal justice data. Then I have to go back to the mapping section to find the boundary data that will allow me to draw a map of my criminal justice data.
That confuses students terribly, it turns out. They didn't understand why they needed to do that. Don't have a filter that lets me pick by format. I don't want to know that Sheffield City Council publishes its data as GeoJSON, whilst if I ask for WMS format,
I get Erewash's district council data, but not Sheffield's any longer. That's not the right thing to do. All of your data should be available in all of the formats that you support. So, don't offer me the chance to filter by format. Particularly don't have zip in the list.
Zip is not a format. What does that mean? So, I actually followed that up, and it turns out that means shapefile, but they're too embarrassed to call it shapefile, so they called it zip. And occasionally it means MapInfo TAB files as well, it turns out, but mostly it means they're shapefiles
that they've zipped up for me. Again, not terribly helpful. But you'll notice that this is the same search set, but all I've done is changed the format, and I've got three different sets of answers based on what format I've selected. And the next thing, I want all of your data.
I don't want to know about the fact that you flew it in phase one, or you flew it in phase two, or it was funded by somebody slightly different. Historic Scotland funded this flight. I want all your LIDAR data. I'm not interested in which phase you did it. And I don't really want to search through the 23 phases of LIDAR discovery that you've done
in different parts of the country. I want all the LIDAR data. Give me the LIDAR data. By all means, provide a link that takes you to the phase one data, if that's important. But by default, I want all of it. So again, this is the Scottish LIDAR data. I was particularly interested in getting LIDAR data for the whole of Scotland.
It's a nice map interface, it's great. As you can see, I can draw a little box to show the area I wanted, which gives me a list of results. I don't know if you can see, but there's a little shopping cart logo next to it.
I can click on the little shopping cart 40 times to add each of those tiles to my shopping cart to download. There is, in fact, to save it slightly, there's an Add All button. So to get the 400 tiles of LIDAR that I wanted, I had to click the Add All button 40 times.
So I click Add All, they go to the next page, click Add All, go to the next page. And I don't really want a shopping cart, to be honest. It's not really the answer. And then to make it even better, I got to the download, went to open my shopping cart up to pay for it, and it said, here are the 397 separate file buttons
that you can click to download the data. I thought, no, there has to be a better way. Unfortunately, it turns out the Scottish government foolishly left the S3 bucket address in their download links, so I could mount their S3 bucket onto my local machine
and just use GDAL on their S3 bucket, which probably cost them quite a lot of money, because I made a VRT of it and then did all sorts of stuff, basically downloaded that data a lot of times. But that was fortunate. Otherwise, I would have had to press the button 397 times to download them. And they're not called obvious things,
so you can tell when you've missed one. They've got UUIDs as their file names at this point. So when you get to the end and discover you've only got 396 files, it's really hard to work out which one you've missed. It's hard to see how anybody could make this worse for people to use.
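The S3 workaround described above can be sketched roughly as follows. This is an illustration only, not the commands actually used in the talk: the bucket name, key prefix, and tile names are hypothetical, and it assumes the bucket allows anonymous reads. GDAL's `/vsis3/` virtual file system and the `gdalbuildvrt` utility (which writes a virtual raster, or VRT, that mosaics many tiles into one logical dataset) are real; the helpers below only assemble the command line, so they run without GDAL installed.

```python
import shlex


def vsis3_path(bucket: str, key: str) -> str:
    """Return the GDAL virtual-filesystem path for an object in an S3 bucket."""
    return f"/vsis3/{bucket}/{key.lstrip('/')}"


def build_vrt_command(vrt_name: str, bucket: str, keys: list) -> list:
    """Assemble a gdalbuildvrt call that mosaics remote tiles into one VRT."""
    return ["gdalbuildvrt", vrt_name] + [vsis3_path(bucket, k) for k in keys]


# Hypothetical bucket and tile keys, for illustration only.
keys = [f"lidar/phase1/tile_{i:03d}.tif" for i in range(3)]
cmd = build_vrt_command("scotland_lidar.vrt", "example-lidar-bucket", keys)

# AWS_NO_SIGN_REQUEST=YES asks GDAL to read a public bucket without credentials.
print("AWS_NO_SIGN_REQUEST=YES " + shlex.join(cmd))
```

Once the VRT exists, QGIS or any GDAL tool can treat the 397 tiles as a single raster, which is why nobody has to click 397 download buttons.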
This is a slightly different site. This is when I wanted some English LiDAR data, because the Scottish boundary goes like that and I wanted a rectangular area, so I had to get some English data to match. They provide a nice, whizzy map interface and you could draw a bounding box on it and say, give me all the data. And it returns this helpful error message,
you've selected too much, in fact, this is Scotland 50 centimeter Phase 4 data, the limit is 20 tiles per request. Now, 20 tiles is not very much, because again, I was looking for maybe 150 tiles or 200 tiles.
And it gave me the top, I think it was the top left hand corner, so I couldn't even then say, all right, I've had those 20, give me the next 20. I had to carefully draw another box that took in the next 20 and didn't take in the first 20. Disk is cheap, really. If you can't find a couple of terabytes of temporary hard drive, let me know,
I'll send you some. I've got them lying around in my office. No, don't limit me by the number of things I can download. So, a little digression here about formats. So, this is once I've finally got to your portal and got some data. If it's a raster, I want it as a compressed, tiled GeoTIFF,
or COG if you prefer, that's got all my data in it. I don't want 496 separate TIFF files that I have to put together. I definitely don't want 496 ASCII grid files with nine digits of decimal precision on the heights
that you didn't bother to compress. So, if it's actual measurements, I want LZW compression or deflate compression. I want the actual numbers. For aerial imagery, JPEG is great. If you don't know how to compress your stuff like that, go read Paul Ramsey's Compression for Dummies tutorial.
He basically explains how he's taken the city's data set, and I think he went from about 500 gigabytes down to 57 or something. You can make them really small. It's great. If it's a vector file, ideally I want a single file. I don't want a zip file that I've got to take apart
and unpack and hope for the best, particularly as every so often if I'm on the lab computers, then the virus scanner check kicks in and says, oh, you can't download zip files because they might contain dangerous macros. And again, that's quite distressing for the students when they're trying to do their assignment. I want the full attribute names.
I don't want them cut off at 10 characters. That's really annoying. Don't use shapefiles, ever. I want it in a good projection. So, if I'm downloading UK data or British data, I want it in OSGB. I don't want to be forced to take it in lat/long and convert it back to OSGB. You collected it in OSGB. You've lost me a lot of accuracy
by doing that conversion and going to GeoJSON. I really want something that respects character sets. So, if I've got funny squiggles over my letters, I want them to still be there when I download it. And I particularly want it to be supported by my FOSS4G software. So, I want to be able to use it in QGIS. I want to be able to use it in GeoServer.
I want to be able to read it with GDAL or OGR. I don't want an Esri coverage something or other, which British Geological Survey tried to get me to download the other day. It's no help to me at all. And it comes in eight different files and you can lose them. So, those are all the things that really annoy me about it.
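The format wishes above can be met on the provider side with stock GDAL/OGR commands. A hedged sketch follows: the file names are hypothetical, and GeoPackage is an assumption standing in for "a single vector file" (the talk does not name a specific vector format). The helpers only build argument lists for the real `gdal_translate` and `ogr2ogr` utilities, so the snippet runs without GDAL installed.

```python
import shlex


def cog_command(src: str, dst: str, imagery: bool = False) -> list:
    """Build a gdal_translate call that writes a Cloud Optimized GeoTIFF.

    Measurements (e.g. LiDAR heights) get lossless DEFLATE compression;
    aerial imagery gets lossy JPEG, following the advice in the talk.
    """
    compress = "JPEG" if imagery else "DEFLATE"
    return ["gdal_translate", "-of", "COG", "-co", f"COMPRESS={compress}", src, dst]


def geopackage_command(src: str, dst: str) -> list:
    """Build an ogr2ogr call that converts a vector source to one GeoPackage.

    No -t_srs option is passed, so the data keeps its native projection.
    """
    return ["ogr2ogr", "-f", "GPKG", dst, src]


# Hypothetical file names, for illustration only.
print(shlex.join(cog_command("heights.asc", "heights_cog.tif")))
print(shlex.join(cog_command("aerial.tif", "aerial_cog.tif", imagery=True)))
print(shlex.join(geopackage_command("boundaries.shp", "boundaries.gpkg")))
```

A single-file format with full-length field names and UTF-8 text avoids the truncated 10-character attribute names and the multi-file bundles complained about above.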
Is there a better way? Of course there is. So, better living through open standards. Go read the OGC standards. They're brilliant. They're easy to use. They've been around for 15 years, 20 years now. They're well understood. Everybody pretty much understands them.
There are a few holdouts in the industry that don't read them properly, but to be honest, we're not talking about them here this week. And if you're sitting there thinking, oh, well, that's okay. We've got a WMS link for our data. We do standards. That's not actually what I meant. So, there's a picture of Scottish deprivation
from Scottish government web map server. It was good. Okay, it's in blue to green. I don't really like the color scheme, but you know, okay, I can live with that. So, but I only want the Glasgow data, so I'm gonna clip it. So, I open up my raster extraction, clip raster by extent,
tool in QGIS. I can select the SIMD 2020 data and I can clip it. It's great, isn't it? Whoops. It's gone horribly wrong. I've got red all over my screen now. And this really distresses the students
when they think they've done something that they learned the week before with rasters and it doesn't work any longer. QGIS doesn't accept WMS layers as rasters. And that's fair, because they're not. It's a picture of the data. It's not the actual data. But this is unexpected to new users. It's unexpected to experienced users as well, actually.
So, WMS is a picture of the data. There are other standards we should be using. If it's vector data you're serving, use WFS, a somewhat elderly, well-understood standard now. There's also the OGC API Features standard. Again, it's nicely standardized.
Most people understand it. If it's raster data, please use the Web Coverage Service. Please use version two, because version one was really hard to understand. I can just about make version one work, but version two is much easier to understand. You actually get the data. So, do try and make sure you get your axis order right when you do it.
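To make the difference from WMS concrete, here is a minimal sketch of the two requests that return actual data rather than a picture: a WFS 2.0.0 GetFeature and a WCS 2.0.1 GetCoverage, both in key-value-pair form. The endpoint URLs, layer names, and coverage id are hypothetical. The `application/json` output format is what GeoServer accepts and may differ on other servers, and the `E`/`N` axis labels are assumptions that depend on how the coverage is described.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def wfs_getfeature_url(base, type_name, bbox=None, srs="EPSG:27700",
                       output_format="application/json"):
    """Build a WFS 2.0.0 GetFeature request (key-value-pair encoding)."""
    params = [
        ("service", "WFS"),
        ("version", "2.0.0"),
        ("request", "GetFeature"),
        ("typeNames", type_name),
        ("srsName", srs),
        ("outputFormat", output_format),
    ]
    if bbox is not None:
        # The optional fifth element names the CRS the bbox coordinates use.
        params.append(("bbox", ",".join(str(v) for v in bbox) + "," + srs))
    return base + "?" + urlencode(params)


def wcs_getcoverage_url(base, coverage_id, subsets, fmt="image/tiff"):
    """Build a WCS 2.0.1 GetCoverage request; subsets maps axis -> (low, high)."""
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
        ("format", fmt),
    ]
    for axis, (low, high) in subsets.items():
        params.append(("subset", f"{axis}({low},{high})"))
    return base + "?" + urlencode(params)


# Hypothetical endpoints and layer names, for illustration only.
wfs = wfs_getfeature_url("https://example.org/geoserver/wfs", "stats:deprivation",
                         bbox=(240000, 650000, 280000, 680000))
wcs = wcs_getcoverage_url("https://example.org/geoserver/wcs", "lidar__dtm",
                          {"E": (240000, 280000), "N": (650000, 680000)})
print(wfs)
print(wcs)
print(parse_qs(urlsplit(wcs).query)["subset"])
```

Using a projected CRS such as EPSG:27700 (easting, northing) in the example sidesteps the latitude/longitude axis-order ambiguity, but a server still has to honour the order its declared CRS implies.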
So, that's what happens if I go to the Scottish spatial data, deprivation data, WFS server. So, the green one is the WMS map still. The purple one is the WFS server. Axis orders, they're important. So, I switched it around, told QGIS
to ignore the axis order the server was specifying. It does actually work. I can style it. It's great. I can do selections on it. It's quite tricky to do a selection on it. It would be really nice if, when I went to the site, it was a link that said download deprivation data for the East Kilbride parliamentary constituency
rather than me having to work out how to do that. So, I can work that out. First-time users, not so much. Web Coverage Service. This is German data. This is wheat growing in Germany, apparently, I believe, if Google Translate is correct. So, it's a nice raster layer.
I can overlay it. I can style it. It's great. Can I clip it? Turns out, no. But this is actually down to QGIS, not the WCS. I think QGIS doesn't understand WCS layers as rasters, which is a bit weird, and I will try and dig into why that is at some point. If I save that out as a GeoTIFF,
and then reimport it, the clipping works fine. So, there's nothing wrong with the data. It's just the way QGIS sees it. So, conclusions. I'm down to my last three minutes or so. Don't restrict your output format to whatever was uploaded to your portal. That's really annoying.
Use GDAL, use OGR, use GeoServer, use MapServer. Any of these programs will convert from any format to almost any format. You're not restricted by what they've given you. Try not to use non-standard formats. Try not to chop up your attribute names. Don't force me to choose what I want by hand.
Ideally, I want QGIS to be able to talk directly to your geoportal and download the data I want. I don't really want to ever have to see your geoportal. I just want to know what the URL for it is. Don't restrict me to a small subset of your data. Ideally, you're not short of space. You're not short of internet capacity these days.
To be honest, not that many people are downloading your data that you need to worry about your download limits. Don't require me to download more than I want. So, if I'm looking for a particular area, don't say, oh, that falls across four tiles. You'll need to download all of those and clip them yourself. You do the clipping.
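Server-side clipping of this kind can be sketched with two real GDAL utilities: `gdalwarp` with a cutline for an administrative area, and `gdal_translate -projwin` for a plain bounding box. Everything named here is hypothetical (the mosaic VRT, the boundary file, the `name` field, the area name); the helpers only assemble the command lines, so the snippet runs without GDAL installed.

```python
import shlex


def clip_to_area_command(src, dst, boundaries, area_field, area_name):
    """Build a gdalwarp call that clips a raster to one named boundary polygon."""
    # Double any single quotes so the attribute filter stays valid.
    safe_name = area_name.replace("'", "''")
    where = f"{area_field} = '{safe_name}'"
    return ["gdalwarp", "-cutline", boundaries, "-cwhere", where,
            "-crop_to_cutline", src, dst]


def clip_to_bbox_command(src, dst, ulx, uly, lrx, lry):
    """Build a gdal_translate call that extracts a bounding box as a COG.

    -projwin takes the upper-left corner first, then the lower-right corner.
    """
    window = [str(v) for v in (ulx, uly, lrx, lry)]
    return ["gdal_translate", "-of", "COG", "-projwin", *window, src, dst]


# Hypothetical inputs: a VRT mosaic of all tiles plus a boundary dataset.
print(shlex.join(clip_to_area_command(
    "all_tiles.vrt", "east_kilbride.tif", "constituencies.gpkg",
    "name", "East Kilbride")))
print(shlex.join(clip_to_bbox_command(
    "all_tiles.vrt", "clip.tif", 240000, 680000, 280000, 650000)))
```

Because the source is a mosaic of all tiles, a request that straddles four tiles still yields one output file covering only the requested area.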
Clipping is easy. You can do it. Don't make my students do it because they really struggle with it sometimes. Do provide OGC standard data endpoints, WFS, WCS, and WMS. Make sure you put all your related data sets together. Don't make me go to some completely different agency
because they're in charge of boundaries and you're only in charge of statistics. Just copy their boundary data over onto your statistics server. So much easier. Provide me a way to filter by appropriate areas. So, don't serve your data purely in tiles. I particularly want it in parliamentary areas
or council areas, that sort of thing. Make sure that machines can talk to your server, your service, your portal. Don't make me talk to it necessarily. And provide a static URL so people can cite their data sources. This didn't used to worry me so much when I worked in industry, but now I'm back in academia. I have to be able to say exactly which version of your data I used.
So, don't keep changing the data, the data behind the static URL. Okay. Anybody wants to download the slides, they're there. Or you can use the QR code. Take some questions.
That was perfectly in time, and I'm sure we recognized a lot of these annoyances. I did, certainly. So, the floor is open for questions. Yes?
Hi, thank you for the great presentation. We have a geoportal.
OGC standards, everything is downloadable, including shapefiles and zip, sorry, but it's there. It's based on GeoNetwork and GeoServer. But now I'm in the awkward position that we want to have a data portal together with business intelligence and documentary information in my organization. And I want to protect my geoportal,
but I want to bring it in a good place in the data portal.
Just provide links to the WFS endpoints. We all do that. That's all a data portal is. It's just a link to a WFS endpoint.
And while I'm still worried a bit about how I can, we still have a geoportal in the new data portal,
but I will need to combine it with something about, well, when we have an environmental report, I want to include the download links to everything, but I'm really worried about how much maintenance it will take to get everything in the right order. Do you have any tips or things I should think about?
I'm probably not the right person to ask about non-spatial data. But again, you can produce reports directly out of GeoServer if you put your mind to it. So my theory is that you shouldn't be doing anything by hand. It should all be machine-generated.
So convince your users that this is the way to go. They should write some small Python scripts or something rather than a document. They might not like that, though. I've got to keep worrying for now, but thank you. More questions.
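The "nothing by hand" advice in this answer might look like the following in practice: a small script that generates the download-links section of a report from a list of layers, so the links never need manual upkeep. This is a sketch under assumptions: the endpoint, the layer names, and the report layout are hypothetical, and the `outputFormat` values are the ones GeoServer accepts for GeoJSON and CSV.

```python
from urllib.parse import urlencode

# GeoServer-style outputFormat values (assumption; other servers may differ).
FORMATS = {"GeoJSON": "application/json", "CSV": "csv"}


def download_links(base, layers):
    """Return plain-text report lines with one WFS download link per format."""
    lines = []
    for title, type_name in layers.items():
        lines.append(title)
        for label, output_format in FORMATS.items():
            query = urlencode({
                "service": "WFS",
                "version": "2.0.0",
                "request": "GetFeature",
                "typeNames": type_name,
                "outputFormat": output_format,
            })
            lines.append(f"  {label}: {base}?{query}")
    return "\n".join(lines)


# Hypothetical endpoint and layers, for illustration only.
report = download_links(
    "https://example.org/geoserver/wfs",
    {"Air quality stations": "env:air_quality", "Noise zones": "env:noise"},
)
print(report)
```

Regenerating the section whenever the layer list changes keeps the links in the right order without anyone editing the document.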
I should also say, actually, I meant to say that the Estonian Land Board geoportal is absolutely brilliant. It's one of the best ones I've found. It actually fulfills all the things I wanted.
So Ian, what is your experience with data licensing? Because different geo-portals have different licenses for using the data. All data is free. I don't care. I'm an academic. I steal everything. If it's good enough for an AI bot, it's good enough for me.
So yes, licenses are just a nightmare. Yeah, the fact that we can't combine ODL data with CC0 data we were discussing this morning and things like that, it just makes me cry. So mostly, I just try and ignore data source, data licensing as much as I can.
If the worst comes to worst, I'll just claim that I've used less than 10% of your data. It's fair use. And that might be enough to keep your lawyers off me. Other questions?
So thank you very much. You wish that you get a single geo-tiff. I call it accessible. But that means if the data is stored in several geo-tiffs, because one huge
for a whole great meeting can be too big, that means that they have to make your geo-tiff for you. So that, okay, I want to download this and you have to wait for the process to pick all the pieces, join them, and then maybe the day after you get the result. Most of these geo-portals will make me wait an hour
to get the download of the tiles I want. So I'm not worried about waiting half an hour or 20 minutes and getting an email later to download. It already does that, because it takes that much time to zip up the tiles they've already got. But actually, if you've got your data stored as tiles, you know, as a COG mosaic,
clipping out the area I want won't take you very long. It's well within the capabilities of a quite reasonably sized server.
There's another question over there.
Hello, Ian, thank you very much for the presentation. I have a question. What is your take on the language of the geoportals?
What language should they provide? For example, I don't know, an Italian geoportal is in English, and some parts are in Italian and some parts in English.
I'm quite happy; as I say, in an ideal world, I'm only going to your portal to get the WFS endpoint or the WCS endpoint.
After that, QGIS can talk to you, and ideally, yes, you've got INSPIRE-based language metadata built into your data set, but I can use Google Translate to work out, you know, those German wheat fields that I had were all in German, but I could work out they were wheat fields quite easily.
It's not that hard to translate the information. So yeah, whatever language you want. So your local language is fine for me. Time's almost up. We can take maybe one more short question, if there's any.
No, then I would like to thank Ian for his brilliant presentation.