Processing large OpenStreetMap datasets for geocomputational research

Formal Metadata

Title
Processing large OpenStreetMap datasets for geocomputational research
Title of Series
Number of Parts
17
Author
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Producer
Production Place
Wageningen

Content Metadata

Subject Area
Genre
Abstract
OpenStreetMap (OSM) is a free and openly editable map of the world. Like Wikipedia and unlike government- or corporation-maintained datasets, OSM is created and maintained by a community of volunteers, making it the premier decentralized and fastest evolving source of geographic vector data focussed on features relevant to human activity (e.g. roads, buildings, cafes) on planet Earth. Unlike Wikipedia, every data point in OSM has a geographic location and attributes must be structured as key-value pairs. OSM is a rich source of data for geocomputational research, but the decentralized nature of the project and the sheer volume of data present challenges. ‘Planet.osm’ now has more nodes than there are people on Earth, with more than 8 billion nodes, and the rate of data creation is increasing as the community grows, reaching 10 million users in early 2023. The size and rapid evolution of OSM are great strengths, democratising geographic knowledge and ensuring resilience. However, these features can make it difficult to work with OSM data. This lecture will provide an introduction to working with OSM and will cover the following:
- How and where to download OSM data
- How to process small amounts of OSM data using the osmdata R package
- How to process large OSM ‘extracts’ data with the osmextract R package
- Other command line tools for working with OSM data, including the mature and widely used osmium tool, the pyrosm Python package and the osm2streets web application and Rust codebase
Finally, the lecture will outline ideas for using OSM data. It will conclude with a call to action, inspiring the use of this rich resource to support policy objectives such as the fast and fair decarbonisation of the global economy as societies transition away from inefficient, polluting and costly fossil fuels.
Transcript: English (auto-generated)
Okay, fantastic. Thanks Beatrice and everyone in the OpenGeoHub Foundation for making this event possible. Thanks everyone for joining. I feel a little bit like I've got the graveyard
shift. I thought it would be worth just going back briefly for a brief overview. There's been so much content covered in OpenGeoHub 2023. I had one of the first sessions and I'm doing
one of the last sessions. I don't know if that's good or bad, but I'm very pleased to see lots of people made it, especially because yesterday there were free beer tokens and very tasty Polish food in Poznan. Everyone's managed to get up, with hangovers or whatever. On Monday,
I covered tidy geographic data with sf and other tools. What was interesting about this is that I was doing that in parallel with Michael Dorman, who was doing a session on Python.
That's unusual for an event like this. Normally, you just specialize in one tool or one language and roll with that. But in this case, you could choose: do you want to do R or Python? There's even Julia content at this summer school, which is really great. We tried to integrate those as well.
Another question: has anyone taken a look at the translation between the Python code and the R code, or have you all just looked at one or the other? That's a question for people. If you're interested
in being more multilingual, if you speak R but you want to see what it looks like in the Python world, we've put together some materials that show how you can do the same thing in R and Python. The reason I'm showing that now is that we've had loads of great talks throughout,
with quite a focus on, I'd say, remote sensing data and raster data. Again, I think it's great to have this crossover because some of the concepts and the tools, and especially the ability to
process really big data sets, are applicable to both the vector data and the raster data. Another thing that's great about this summer school is that it is multidisciplinary. There is a general focus on earth observation and earth systems and maybe environmental science,
but there's also a fair amount of research and workshops on more social science-y stuff. We had the course yesterday from Anita on MovingPandas, and there's this one on
OpenStreetMap data sets. I'm going to go into the workbook via the abstract, just to provide a bit of a heads-up on what we're going to cover. It is largely about this OSM data source, and from
the previous hands-up poll, I know that about half the people in this room are using OSM for research already. If you're new to OSM, a good way to think about it is that it's the Wikipedia of maps,
so it's geographic data that's provided by the public, and that has advantages and disadvantages. It means that the coverage of certain tags is variable. It's not a consistent data source. For example, OpenStreetMap data in central Berlin, where you have a very strong
OSM community, is much more detailed than OSM data in North Korea. You'd still get data in both North Korea and Berlin, but you need to be aware that there are differences
because it is a community of people and that community varies across space. That's one thing to say. I guess that means that it can be difficult to do scientific research with OSM because the tagging is so variable and there are biases in that data set that depend on the
communities of people using it. But the emphasis of this talk is actually large OSM data sets. Probably the most common use of OpenStreetMap data is that it powers a lot of web
applications. Basically, you've got Google Maps on the Google side. I think Apple has their own geographic database, but then everyone else is using OSM. I believe Facebook has invested substantial amounts of money into OSM infrastructure, and a lot of other
companies use OpenStreetMap as the basis of their maps. Every single one of those maps that has OpenStreetMap in it usually has this little tag at the bottom saying, do you want to make this map better? Please contribute. Even if it's only 0.1%
of the people who use those maps who actually click on that button, that's still a lot of people contributing. One of the amazing features of OSM is that it's continuously updating, even more frequently than satellite imagery, where you might get an update every two weeks as
the satellite passes by. With OSM, it's literally every hour. You can download extracts of the entire planet every hour, and stuff's going to change day by day. There's just a few numbers in here.
10 million users in 2023. As far as I remember, it's 8 billion nodes. That's interesting from a computing perspective: what do you do with this kind of data? I think it's useful to have access to this kind of huge data set. I was fairly ambitious when I put together this abstract, but
I think we can cover all of this here. In this talk, I've said what OSM is, and I think that's enough of a definition for now, so you can go and look up more
information about what it actually is. More features of OSM will become apparent by using the data. We're going to look at how and where to download OSM data, and by that I mean large OpenStreetMap data sets, and how to process small amounts of OSM data using the osmdata R package.
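As a rough illustration of the small-data route: osmdata works by sending queries to the Overpass API, and the same kind of query can be sketched in plain Python. The sketch below only builds the query text and the request body; it does not contact any server. The function names, the bounding box and the choice of the highway=cycleway tag are illustrative assumptions, not taken from the lecture materials.

```python
from urllib.parse import urlencode

# Public Overpass API endpoint (the kind of service that osmdata queries).
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_overpass_query(key, value, bbox, timeout=25):
    """Return an Overpass QL query for ways tagged key=value inside bbox.

    bbox is (south, west, north, east) in decimal degrees,
    which is the order Overpass QL expects.
    """
    south, west, north, east = bbox
    return (
        f"[out:json][timeout:{timeout}];"
        f'way["{key}"="{value}"]({south},{west},{north},{east});'
        "out geom;"
    )


def build_request_body(query):
    """URL-encode the query as the form field 'data' used by the endpoint."""
    return urlencode({"data": query}).encode("utf-8")


# Hypothetical bounding box roughly around central Poznan; illustrative only.
poznan_bbox = (52.38, 16.88, 52.44, 16.98)
query = build_overpass_query("highway", "cycleway", poznan_bbox)
body = build_request_body(query)
print(query)
```

To actually run the query, the body could be sent as a POST request, for example with `urllib.request.urlopen(OVERPASS_URL, data=body)`. That approach suits small areas only; country-sized data sets are better downloaded as extract files.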
I've actually skipped over that bit because that's been covered in quite a lot of detail in other places. The main package we'll be looking at is osmextract, which is a package that makes
it super easy to get some of these large OSM data sets. If you want to download a data set representing all the roads in Poland, you can do that, and maybe you will do that during the session on your laptop. There are other tools, such as Osmium and Pyrosm,
which is a Python package. Osmium is a command-line tool called from the system command line. I've also put in an example with OSMnx, which is a very popular Python package for working with OpenStreetMap data. I've also mentioned some other things that are fairly new
and quite fun that you can do with OSM data. The lecture's also going to talk about, well, how can we use this data to do stuff? There's a link to the panel session on how the R, Python
and Julia developer communities can support decarbonization. I want to link this back
to that debate that we had about how we can actually make a difference to this really important problem of the climate crisis. We're going to think about that. In fact, I'm going to cover that bit first because that's the more philosophical, non-practical component.
I've just got a few examples of how OpenStreetMap can be used in this policy space to actually support policies that can lead to decarbonization. I've got a few tabs up to illustrate that. In fact, the first thing that I'm going to show is, again, something that I showed in the first
lecture, which is just an example of how you can use these processing techniques to generate tools to inform policy. This is the Network Planning Tool for Scotland. It's a prototype
tool funded by Transport for Scotland. It's being developed in collaboration with Sustrans, which is a not-for-profit organization in the UK supporting sustainable travel. At the University of Leeds, we are developing a web application to support investment in new
cycleways in Scotland. In Scotland, as in many other countries, there is an increase in funding available for active travel. That's a really good thing because people recognize that there is a problem with obesity, there's a problem with air pollution, there's a problem with
emissions, there's a problem with dependency on imported fossil fuels from countries such as Russia, so we need to try to tackle this. Investment in active travel is a way to get many of the benefits on an individual level and on a bigger level. This application
links to this lecture because it uses large OpenStreetMap datasets. Every single line in here comes from OpenStreetMap. You may think, oh, well, if it's just a Wikipedia, you can't use it in production, but actually, the results show that this stuff really can be used
on a big scale. I didn't really present any of the ways of interacting with this, but something that we've got as a bonus question is to try to take the information in OSM
to get a measure of walkability or cyclability. Instantly, just by clicking on that button, we've gone from an interesting visualization of where you've got cycling potential to something that's, I think, more policy relevant. What it shows for the city, which is Edinburgh,
is that a policymaker's attention should be drawn to the lines that are both thick and red. If it's thick and red, it means that there is high potential but low cyclability. You've
got potential gaps in the network. This is a good example where you've got a lot of cycling potential coming from these residential areas in northeast Edinburgh. They can go on reasonably quiet roads, and then they hit this, and that could be a bottleneck. Just by fixing these few
hundred meters of road network, you may enable many, many more people to get to the destinations that they're interested in. We've also put in a link to Google Street View, so you can actually assess
why this is marked as not particularly cycle-friendly. You can see there: would you feel comfortable cycling on that? Put your hands up if yes. I probably would, to be fair. It doesn't look terrible, but it's four lanes. You don't have
anything where it says you're protected here. If I had an eight-year-old, I wouldn't take my eight-year-old, and I wouldn't take my eighty-year-old grandparents on it. That's the level of cyclability that we need to make it available to everyone. There's another important point here,
which is that OpenStreetMap, and indeed any 2D dataset, cannot capture the entire complexity of the road network. You can't boil everything in the street down to a few numbers in
OpenStreetMap. They're all simplified representations of the full complexity of it. It helps to be able to look at an image like this from Google Street View to say, well, in this case, for the evaluation of this road, this is down as having a cyclability of 20%. Basically, it's saying it's
not super cyclable. We could compare that with something else, and hopefully, it's going to be a bit more cyclable, but none of these measures are perfect. It's got traffic calming.
It's probably a bit quieter, but there's a feature here where I think this is not great because you could get... I wouldn't say this is super cyclable. This just shows that the OSM data
on which these measures are based is not perfect, so you need to take it with a pinch of salt. To some extent, that answers the question of how you can use it, but this is just one example. There's a whole world of possible use cases for OpenStreetMap data, and I would definitely like people, when they're doing these practicals and listening to this session, to think
about how you could use OpenStreetMap data in your research. I haven't thought much about how you could use OpenStreetMap data in environmental sciences. It's something that I'm very interested in, and I did my master's in environmental science and management, but I wonder if you're looking at
some environmental measure like ecological processes. Roads are actually really important in causing severance, so if you've got two communities of terrestrial mammals,
roads can form a real barrier to getting around, and you see stuff like animal bridges or animal tunnels to avoid the roads, so that's one possible use case for the environmental sciences, but there's loads of other stuff. You get some pollution coming from roads, so yeah, that's one way, but broadly, OpenStreetMap data focuses on human-made infrastructure. It's
not exclusively human-made, but this is why OSM data tends to be used more in the quantitative social sciences than in the physical sciences.
If you like, I could actually provide a practical example. Yes, that would be great. I've been using OpenStreetMap data, including the land use polygons, and what we're doing is trying to find areas that are suitable as compensation areas for a specific kind of bird,
and that has some specific requirements. For example, it doesn't like forests, and it doesn't like anything that rises high in the landscape, so no forests, no buildings nearby. So using the OpenStreetMap data, which also has classes like forests or buildings or just
settlements in general, we can create a buffer around those areas, and the areas that we have left outside of those buffers, we can look at more closely. Cool, yes, that's a really good use case. Is that in one specific country, or is that in multiple countries? Well, it's a project in Germany, but I guess, as long as you have a reasonable amount of
data, you could use it anywhere. Yeah, so that is a great example, and OSM may have things that you might be surprised it has; it does have land use polygons. It's interesting that you're doing it in Germany, which is well known to have very good OSM data, but yeah, one of the reasons that I really like
OSM data is it means that methods that you develop, even if it's only for one city or one country, could potentially work in other places, so OSM tags are pretty much the same, so in this case, we're looking at a tool to support active travel planning, sustainable
transport, and specifically cycling, and the tag for that is highway=cycleway, or that's one of the main tags. In Germany, it's not highway equals Fahrradstrasse. In France, they don't use a French word, so cycleway is the word that's used in every country, so yeah,
your solutions can scale potentially globally into any country if you use OSM data. So yeah, that's a really good point. Anyone else using OSM for unexpected applications or
physical sciences? No? Okay, that's fine, so I'll continue because we do have a lot to get through, so that's one example. There's an amazing open source software community and an amazing contributor community in OSM, so it's probably worth, if I'm talking about OSM,
it's not good that I've shown Google Maps and I haven't actually shown OSM, so to see OSM, you just go to OSM.org, and prominently, you can see that you've got this kind of button that's saying support the map, so they're always looking for contributors, and like Wikipedia, they do need
to raise money to operate, and the OSM website is pretty good. It gives you different types of map. I think this is probably in Poznan, so you can see there. It gives you a map just like
Google Maps or Apple Maps, but every piece of data in here is open access, so I like using it because you're encouraging people to use it, and in some places, it's very, very good and up to date, so this is Citadel Park, for example, that's very well mapped out. This is probably better than Google in this case because you've got a community who go out with GPS devices and they want to map
every single part of the park, so the level of detail here is really good, and this is what it looks like. I think it's probably worth just seeing what you can get. This is about data after all, so you can pull down some data from the OSM website directly, so you can just click
on any one of these components. You've got service road. Let's just click on that one, and then you can download the XML, so that's the XML representation, and if you save that as a .osm
file, you can actually import that with GDAL and therefore with R or Python as needed, and you can also download small extracts from OpenStreetMap in this way, so export. I think that's probably
the button, so let's just try doing that, so this is a small area, and I can just click export, and so they're really a fan of open data to get stuff. Given that this is
supposed to be a practical session, I'm going to try and import that, so if I just go map or just OSM example, actually, I'm going to start off with a different command, which is
sf::st_layers(). It's in my tmp directory, and I think there's a file now called map.osm,
which I've just pulled down, and if I run that command, you can see that this contains some data. I've got 25 multi-polygons, 10 points, 10 line strings, and four other_relations geometry collections, so it's not as simple as you might think. OSM has a particular structure that doesn't
fit perfectly into the way that GDAL and the simple features specification work, so GDAL has solved this problem by making these five different layers available from .osm data.
Let's say we're interested in the lines, so osm_lines equals sf::read_sf(), and then point it to the same file, which is map.osm, and then there's this argument, layer
equals "lines", and then I can plot those OSM lines, so that's cool, right? You definitely wouldn't be able to do that with Google Maps, where you can't just go and download the data,
so that shows that the data's there, but this isn't about small data sets. This is about trying to get slightly larger data sets, and there's various considerations that you should take into account, so that's the OSM website. Any questions about OSM specifically?
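The structure of a small .osm export like the one just downloaded can be sketched without GDAL. The following is a minimal illustration in Python using only the standard library, not the sf/GDAL route used in the demo. The XML snippet is a hand-written stand-in for map.osm, and every id, coordinate and tag in it is made up. It counts the raw OSM element types and then rebuilds the ways that carry a highway tag as coordinate lists, which is roughly what ends up in the lines layer.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a small map.osm export.
# All ids, coordinates and tags below are made up for illustration.
OSM_XML = """<osm version="0.6">
  <node id="1" lat="52.4200" lon="16.9300"/>
  <node id="2" lat="52.4210" lon="16.9310"/>
  <node id="3" lat="52.4220" lon="16.9320">
    <tag k="amenity" v="cafe"/>
  </node>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <nd ref="3"/>
    <tag k="highway" v="service"/>
  </way>
  <relation id="100">
    <member type="way" ref="10" role=""/>
    <tag k="type" v="route"/>
  </relation>
</osm>"""


def count_elements(root):
    """Count the three OSM element types in a parsed .osm document."""
    return {kind: len(root.findall(kind)) for kind in ("node", "way", "relation")}


def highway_lines(root):
    """Return {way id: [(lon, lat), ...]} for ways that carry a highway tag."""
    # Ways only store node references, so look up coordinates by node id.
    coords = {
        node.get("id"): (float(node.get("lon")), float(node.get("lat")))
        for node in root.findall("node")
    }
    lines = {}
    for way in root.findall("way"):
        tags = {tag.get("k"): tag.get("v") for tag in way.findall("tag")}
        if "highway" in tags:
            lines[way.get("id")] = [coords[nd.get("ref")] for nd in way.findall("nd")]
    return lines


root = ET.fromstring(OSM_XML)
counts = count_elements(root)
lines = highway_lines(root)
print(counts)
print(lines)
```

A real export is much larger and has tagged nodes, closed ways and relations that need extra rules to become polygons, which is the work GDAL does when it exposes the five layers.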
So I'll move on, and going back to this idea that we're going to do the how you can use OSM, and then we're going to go into the practical, so do it in slightly reverse order. There's some really cool tools that people are building, like there's a whole open source community who are very talented building new things on OSM, and one of them
that I find pretty mind blowing is this A/B Street game, so there's an entire open source city simulation game. Put your hands up if you played Sim City when you were younger, or maybe currently when you're a fully grown adult. I used to play it, and you can now do a kind of Sim City type
thing with open source software in the browser. This is almost more of an experiment of how far can you push open source software, but it's quite impressive what you can do with it.
This is written in Rust, and it compiles to WebAssembly, so you can run it in your browser, which is quite cool. You've got all these different things, so they made a game called 15-Minute Santa, so if you get to Christmas time and you want a game to entertain yourself or
people, you can pretend to be Santa and deliver presents within a 15-minute city. There's various tools that you can play with. This is a slight distraction, but it is a lot of fun, just to show you what OSM data can do,
and we're going to cover this to some extent. This just shows that you can add in scenarios. Interestingly, this is a scenario, and you can run
this model, and you should start seeing some agents. This is the time, so it's got to 5 p.m., and suddenly everyone's getting out of bed and starting to do their stuff around the city, and it's like an agent-based model where you can see people moving around and doing things.
So, we just slow it down. Hopefully, I can pull out one of these people. So, there you go. You've got a little bike going along there. You can click on it. Your camera follows the bike, and all of this is running on open data, and it's just amazing what's possible now.
Yeah, this is running in WebAssembly on the client side, so yeah, very impressive. Yeah, there's some pretty impressive technology going into it. Just to continue this, so the guy who developed this, Dustin Carlino, actually, I've got the pleasure of working with him in some of our work in Active Travel England.
So, we're working on a project called the Active Travel Infrastructure Platform, and it uses similar WebAssembly client-side stuff. The aim of this is to make it as
easy as possible to design new transport infrastructure. So, if I want to build a cycleway to south of Leeds, where you do actually have severance, like these roads are huge roads, this M1, it's really hard to get from Rothwell or some of these places out here
to City Centre. You just would not do that by bike because there's a high level of traffic stress, to put it mildly. So, let's imagine that we want to build a new cycleway. You can do this in
seconds, one, two, and it snaps to the road network, and then you can say, well, actually, here, I want to go through this park. So, the aim is to make it as easy and quick as possible to design new infrastructure. So, let's just pull this down here.
This has been used by transport planners to support their transport planning. So, we can finish that, and then you can add in your attributes, and then we've used this tool to collect data on planned infrastructure in England. So, that's something that's work in
progress, but is another potential use case of OpenStreetMap, and it's certainly helping with this tackling the climate crisis problem. And you can imagine if you take this data on planned infrastructure, and then you take data on the cycling potential that I presented here,
you can actually come up with quite a good picture on what percent complete is your network now, what percent complete will it be in five years time, where do we need to prioritize future plans. So, that's covered this part of the presentation on ideas of using OSM data. So, there's ideas that
I've got and things that we're actually working on that certainly are policy relevant, and it's good to see what's out there and what's possible. So, any other, any quick questions on any of the stuff that I've just presented, or
should we move on to the practical side? By the way, all of this stuff is open. So, I wouldn't say during this session, but at some point, feel free, please do make a note of A/B Street. One of the problems that this software solved is how to take the line string geometry
of OSM and then extrude it in two dimensions. So, there's a whole series of rules and a code base that does that, and we're going to actually use that in the final, one of the final sessions. So,
we won't play A/B Street, but all of this stuff is open. Same with ATIP. If you know any people who are interested in designing new infrastructure, please get them to use ATIP. It's open source. It works in any country. We've just built the binaries for England, but if you know techie people, it should be fairly straightforward to deploy it in other countries, and it's a work
in progress. Okay. So, that was the intro. Hopefully, everyone's ready with their packages, their computers, because this is a practical session, and I expect people to be running some of the code at the same time that I'm doing stuff. So, yeah, these are the,
in terms of the structure, we've got this section on processing large OSM extracts with the osmextract
R package, and then we've got this bit on other command line tools for working with OSM data, and any of those could take up the session. I think it's probably worth just focusing in
on the osmextract R package, especially given that I know the majority of people in this room attended the R training on the Monday morning, so I assume that you are comfortable using R. There's some stuff on Python as well, so that's good, but let's just dive into it. Just for
reference, this is something that you may want to have up on your browsers, so if you haven't already, please navigate to this website, which is ogh23.robinlovelace.net, and this is where you'll find the materials for this, and another thing to say is that the source code is, of
course, on GitHub, and you can see my GitHub actions are failing there at the moment, but all
of the code runs, or it should run, and we can fix any issues with that. In fact, given that the, yeah, so the stuff that I'll present will be slightly out of date, so I'm running this on, and this is the locally hosted version, which is slightly more up to date,
but that doesn't matter. So, the first thing to say is there is actually a lot of tooling around the OpenStreetMap community, so yeah, there's actually a whole language dedicated to
querying OpenStreetMap, so before going into the R and Python wrappers, it's worth being aware of what's the bare bones way of accessing this data, so if you want slightly large datasets,
you want a bit more control than just the OSM website, where you've got this download button, there's actually a project called Overpass, and that's an EU funded project for making it easier to get hold of OpenStreetMap data, and I've just got this example where you can zoom in
to a place, and then you can click on this wizard, so it's kind of handy for exploring. Let's try build query and run, and you can interactively explore that, so that's a web tool that you can use. Let's try the example
that I mentioned, which is highway equals cycleway. There we go, so that's quite good
for just exploring what's in there. OSM tags are big and complicated, and that's one of the challenges with working with OSM data, it does take a bit of time, and they're all documented on the OSM Wiki, so yeah, that's the first thing to say, and I'm not going to run it now,
but if you want to get this with cURL, this is the Overpass QL language. I would not claim to be proficient in this language, I find it quite confusing to be honest, for example,
I don't know what this inequality followed by a semicolon means, but that's how you write it, and then you can save that as a text file, and then you can download it, so that's probably the raw way of doing it, and this is what the OSM community themselves have developed. Two questions?
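For readers who want to see what such a query looks like, here is a minimal sketch in Python that only builds the query text and the form-encoded request body, without sending anything. It is not the query shown on the slide. The bounding box is a rough, illustrative box around Poznan, the function name is made up, and the URL is one public Overpass instance. The greater-than sign followed by a semicolon is the Overpass QL "recurse down" statement: it adds the nodes that the matched ways refer to, so the result includes coordinates.

```python
from urllib.parse import urlencode

# One public Overpass instance; other instances exist and usage limits apply.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_overpass_query(key, value, bbox, timeout=25):
    """Build an Overpass QL query for ways with key=value inside a bounding box.

    bbox is (south, west, north, east), which is the order Overpass expects.
    """
    south, west, north, east = bbox
    return "\n".join([
        f"[out:xml][timeout:{timeout}];",                         # output format and time limit
        f'way["{key}"="{value}"]({south},{west},{north},{east});',  # select matching ways
        "(._;>;);",  # union of those ways (._) and everything they reference (>)
        "out body;",                                              # print the result set
    ])


# Rough, illustrative bounding box around Poznan (not an official boundary).
bbox = (52.38, 16.88, 52.44, 16.98)
query = build_overpass_query("highway", "cycleway", bbox)

# Form-encoded body that could be POSTed to OVERPASS_URL.
post_body = urlencode({"data": query})
print(query)
```

Saving the printed query as a text file gives the "raw" input described above. For anything larger than a city, the size limits on these servers are the reason to switch to pre-packaged extracts.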
Yes? Yeah, so I'm also using Overpass for documenting how roads grew in the last 10 years. Yes. One thing that bothered me is, I'm not sure if the lack of data in, let's say 2012,
was like, I'm doing it in Tanzania, so I'm not sure whether there was no road, or there was no mapper and no documentation, so how can we verify that part? So that is a good question, there is actually a web tool developed to try and answer this question,
and the name escapes me at this moment, it was developed by researchers in, what's the name of that German city where you have the OSM research? Heidelberg.
Heidelberg, yeah, so there's a tool developed by researchers at Heidelberg University that tries to answer that question. Yeah, it's the same people who developed openrouteservice, but they've got this historic OSM data query builder, so you can say how many kilometers
of roads were there in 2010, 2011, 2012, and it won't tell you for sure, but it will give you evidence of when it's just that the data wasn't there, so it's a difficult, I think it's an
important question, I have the same question about cycle infrastructure, so you might suddenly get loads of cycle infrastructure appearing in 2015, that doesn't mean that suddenly they did loads of building, it just means that the OSM community started mapping, so it's a good question and I don't have any direct answers, but I think there are ways to help answer that,
but yeah it's a good question, I'll try and find that tool and send it your way maybe during one of the exercises, so yeah good question Tom. What's the size of the OpenStreetMap at the moment, the total size? Yeah, it's in the order of 100 gigabytes compressed I think,
so when you uncompress it, you need a terabyte drive to process it, so there's a file called planet.osm and there are people who specialize in working with this stuff and they just say you need a two terabyte SSD and there are tools that are designed for it,
but it might take a few hours to do a query on that data set. Yeah, that's a common use, so it takes a few hours to put into PostGIS. But is there a publicly available PostGIS instance where you can just do the queries
and pull down anything you want? In a way OSM itself is a giant database, those queries that I've done are calling those servers and they use their own stack, but there is definitely a tool chain
that's developed for working with these large data sets and in a way that's the topic of this, so specifically the compression that they use is based on protocol buffers and that's
the PBF files, which have quite good compression ratios, and that's how you download these big data sets. Did somebody try to put it in GeoParquet now? It wouldn't surprise me if someone's tried, but I haven't seen that and obviously GeoParquet is very new and they do have their own toolchains
and there are definitely some very good developers working in this space, so one other example is have you heard of Protomaps? PMTiles, so PMTiles is like a new way or a way of serving vector data
and the guys developing that are into processing large OSM data sets, so there are tool chains for like planet-sized data sets, that's not the focus of this and so we're going to talk about country sized and regional... Yeah, I mean I think there's huge potential to add value to these data sets
and make it available for research, so that's actually a perfect segue to the next section of the lecture and there's some code in there as well, so people who are running the code,
which is hopefully everyone, if you've got RStudio or you've got your editor open, the way we're going to use this is actually through this osmextract package, so you can load it with library(osmextract) and try running this code, so you don't really need to
run this first line, that's just setting up the plotting and this shows that when you load osmextract, you get these zones and I think going back to your question Tom,
I don't understand why they've come up with this zoning system, if you had to break the world into a number of regions, if I had to do it, it definitely wouldn't look like this, it doesn't seem like particularly evidence-based, it just seems like someone got quite excited with a pen
and paper and just like drawing lines, but maybe there is some reason, like I guess one of them's like Africa, this is the Asian continent, so yeah probably there is some reasoning, but I suspect it's possible to do better than this and it's just weird that you have these big holes in the
ocean, so this is, if you want to work with large data sets, I think this is probably North America, this shows that Geofabrik packages up the world's OSM data into different pieces, so it's slightly more manageable, but each of these regions is huge and you may struggle to
run it on a consumer laptop. So, just to explain what this is, there's an industry, there are many companies using OSM data and adding value to it, and how do they fund themselves? There's a lot of companies that sell services that build
on OSM data and as long as you're compliant with the license which says that any changes you make to the database you must republish, you can use OSM for commercial purposes, so OSM is definitely
being used for commercial purposes and there's a company called Geofabrik which provides geographic data services and one of the things they do, which is a free product, is they just package up all of these, they package up the world data set and provide these extracts and that's why the package has the name osmextract, so it's an easy interface to get hold of these. So this is
level two where you can get individual countries, Poland will be in there, Great Britain, then you have level three which is very patchy and again this shows that Geofabrik is probably focused on
its clients, so they package up the extracts where people want them, so for UK and France you've got quite detailed regions but then for India if you want to download OSM for a city in India, which I needed to do a while back, you have to download all of India if you want to use Geofabrik. So what
did we do? And this package was developed by myself and Andrea Gilardi, who is the lead on this package at the moment. We added another provider, so there's OpenStreetMap.fr which is like another
source of these packaged data sets and they have slightly better coverage, so you would expect France as it's developed by the French OSM community has detailed zones but also India,
so that's really useful and probably for China as well. So that's if you want regional extracts, and then there's a third provider that we use called BBBike. So if you want to get big OSM data sets but not planet sized, you could download planet.osm and have a two terabyte SSD and then
write this code and wait for several minutes, maybe a few hours until you get the right result or possibly what most sane people would do is search for it online and download it and the reason this is necessary is because you cannot download this volume of data from the API queries.
You have a maximum amount of data which once you get to a large city, you'll start hitting these limits. So if you want to download the country's worth of data, that is necessary. So
we hope to add more providers. We hope that more providers will come along and give better coverage but across these three providers, and BBBike focuses on cities, we can cover a decent amount of use cases and so what osmextract does is try to make life easy.
So a typical thing will be, okay, I need to do some research for this city and I want to get the data from that city. So just to show you this, what we're doing here is we're
pulling out, we're using this other package to geocode the text string Poznan into a latitude longitude pair. So this is a point that's somewhere in Poznan and then this function oe_match will match Poznan to the smallest available extract from each provider. So
it's actually quite a huge file. So the biggest one that we can get, it's actually a region in Poland, I think. Does anyone know what this is?
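The matching idea described here, taking a location and returning the smallest available extract that covers it for each provider, can be sketched as follows. This is a simplified illustration in Python and not how osmextract is implemented: real providers publish proper boundary polygons, while this sketch uses rough bounding boxes in degrees. The provider labels, extract names and box coordinates are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extract:
    name: str
    provider: str
    bbox: tuple  # (min_lon, min_lat, max_lon, max_lat), rough illustrative values

    def area(self):
        """Crude size measure in square degrees, good enough for ranking."""
        min_lon, min_lat, max_lon, max_lat = self.bbox
        return (max_lon - min_lon) * (max_lat - min_lat)

    def contains(self, query):
        """True if the query box lies entirely inside this extract's box."""
        min_lon, min_lat, max_lon, max_lat = self.bbox
        q_min_lon, q_min_lat, q_max_lon, q_max_lat = query
        return (min_lon <= q_min_lon and min_lat <= q_min_lat
                and q_max_lon <= max_lon and q_max_lat <= max_lat)


def smallest_match_per_provider(extracts, query_bbox):
    """For each provider, pick the smallest extract that contains the query."""
    best = {}
    for extract in extracts:
        if not extract.contains(query_bbox):
            continue
        current = best.get(extract.provider)
        if current is None or extract.area() < current.area():
            best[extract.provider] = extract
    return best


# Hypothetical catalogue: names, providers and boxes are illustrative only.
EXTRACTS = [
    Extract("Europe", "provider_a", (-25.0, 34.0, 45.0, 72.0)),
    Extract("Poland", "provider_a", (14.0, 49.0, 24.2, 55.0)),
    Extract("Province around Poznan", "provider_a", (15.7, 51.1, 19.2, 53.7)),
    Extract("Poznan (city)", "provider_b", (16.7, 52.3, 17.1, 52.5)),
]

# A geocoded point is treated as a zero-size box.
point = (16.93, 52.41, 16.93, 52.41)
# A study area stretching east of the province needs a bigger extract.
wide_area = (16.5, 52.0, 21.5, 52.6)

point_match = smallest_match_per_provider(EXTRACTS, point)
area_match = smallest_match_per_provider(EXTRACTS, wide_area)
print({p: e.name for p, e in point_match.items()})
print({p: e.name for p, e in area_match.items()})
```

The same containment rule explains why a study area that crosses a boundary gets matched to a larger extract than its centre point alone would suggest.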
Okay, so this is a province. Okay, so I thought we would have to get all of Poland. So you've got the province there. From BBBike, you've got Poznan.osm. So that's like the city and then for the French one, you have to download the region and you can see the sizes of these
datasets. So that gives you an indication of like, how long is it going to take me to download them? And for all of Europe, those file sizes would start to get quite big, but you can run it. It just takes a bit longer. So, and again, we want to make it as easy as possible for researchers to
use OpenStreetMap data. So we provide, there's all these functions that feed in, but basically if you just want to get hold of the data, you only need one function with osmextract, which is oe_get. So that takes a city name and then you've got different things and it
will just download the data for you. So that's probably something, if people here on their laptops want to try running that function, it may take like up to a minute or so, let me know if it works. And in the meantime, yes. Yeah, question, because I started with
finding the correct area, especially in Geofabrik. In this case of Poznan, yes, you have to find some shortcuts to find Poznan because it found this whole region, yes? And do you have any suggestions how to find the proper
region, cities, et cetera, in the resources, yes, for providers? So I do, I mean, my suggestion is before you run oe_get, try running oe_match. So oe_match is the
thing that we use to identify the place names. So, I mean, what we can do is try it, maybe with a bit of a live demo of another place that someone's interested in, which would be helpful for
people. Yeah. Okay. So we can try that. In the meantime, has anyone managed to download data and import data for Poznan? Cool. Okay. So I'll come back to that, but someone give me a place name. So we're going to try this. Okay. So I'm just going to say the first thing I would do,
I could just go straight in and say, oe_get. Can you see the screen? Okay. Yes. That's not too bad. I can make it bigger if people want. So how do you spell that?
Like that? And then I can literally just run that. I'm not going to do that because I want to know what it's going to do first. So it's oe_match. So there you go. What's interesting
is it gives you quite a lot of information. So it's like, it tries Geofabrik first, because that was the first one we added and it's probably the most well known. So it has a look and it says, Geofabrik, what can you give me? And it uses string distance. So the
best thing that it can find... So this can be dangerous because if you get it wrong, you might end up downloading Japan and then you've maxed out your internet usage for the day or whatever. So it's good to run oematch first, see what it's going to download. So then it said,
yeah, Japan, well, let's move on and see what we can find. And then no exact match was found in any OSM provider searching for the location online. So I think what it's done is it's actually done a geocode and then it's like, oh, we've matched it with, and I'm not going to try and
say that. Someone else, one of our Polish colleagues can read that. Lesser Poland. And in fact, it's found it from Geofabrik. So that's not bad, right? I have another question. So for example, if you'd like to find a name, so it's Warsaw, yeah? So is it better to use the English word Warsaw or the Polish, Warszawa?
Yes. Based on that, I suspect Polish, but let's try it again. So is that the... I'm not even sure if that's English. So, okay. So English seems better. Yeah, it's found it. How would you find it? So it's found it at BBBike. So probably it may depend on your provider. So
in this case, it didn't find it in any of them, but because BBBike only contains cities, it's got a dedicated one called Warsaw. So it kind of goes through and it's like best match I've got is this one, but you do need to, so kind of, you may need to
interact with it a few times. One of the really cool things about oe_match is if you give it a polygon, what it does is it looks for the smallest area that contains all of the polygon. So in the UK we have cross boundary issues. So often it will be England, it's the smallest one, but if you've got to do some research that covers England and
Wales, if you give it a polygon that covers England and Wales, it will say, well, the centre is in England, but actually I know that you need this for Wales. So it will give the smallest one that contains all of it. So that's quite a nice feature that I don't think is
available in any of the other OSM extracting packages that I'm aware of. How do you spell Warsaw in Polish? So can it do it? And yeah, so it can do it. So it's not bad at getting it. And
when I say big, I don't mean planet size. Like if we're talking about using OpenGeoHub servers,
then yeah, that's appropriate. But for most purposes, we're talking about country sized, regional size. But one of the cool things, like, you rarely need all of OSM data because it's so much, you're usually interested in one specific thing. So the example from previously was talking about land uses. There is a SQL query that could pull down that, or if you're only interested in
cycle ways. So the key thing, you can download these big PBF files. The key thing is what do you read into memory? Because that's usually the limiting factor. So you could easily have all of European PBF data, but you only want to pull in a particular thing that you're interested in. So
that's one of the key things that osmextract allows you to do as well. Okay. So that was a good little demo. People already have data from Poznan, which is cool. So that goes nicely onto the next thing, which just shows how this works. So what actually happens behind the scenes when
you run oe_get, people have run it and it's worked, it's downloaded this file. One thing to say is you can always go back and find that file. So there's oe_find, which finds the file name
and that's kind of handy. And the way that osmextract works is it downloads the PBF file, then it converts it into a GeoPackage, which is quicker to read. And then it reads in the
GeoPackage file for reasons that I won't go into. But when you do oe_find, it will give you both the location of the GeoPackage file and the location of the raw PBF file that was downloaded from one of these providers. And you can see there, when you run oe_find Poznan,
it will give you that. And I'm just going to prove that it works, hopefully. And you can run this as well, oe_find. And the first thing it wants is a place name, Poznan. And so it's given me two. So there's two places and it's given me a file path that does actually
exist. And I can check that it exists: if I just save those as poznan_files, you can use those later. So that's kind of handy. And then I can say file.exists() in R and then poznan_files.
And they do exist on my hard disk. And I can actually use that to read that in. So if you ever want to read this in, then that's fine. This next thing isn't really relevant, but it's quite
cool if anyone likes kind of automating workflows. So I've actually released this file in the repo and using this GitHub command line tool. So this just says, create a text string that says gh
release create v2 Poznan file. And it runs that. If you run that code, it will upload that PBF file. So if you want to find it in a place that's slightly more easy to find than by searching
through OSM, you can see the result of that is this PBF file. And that's actually a really interesting file. There's so much rich data in that PBF file. You can import that with GDAL. But if you're using R, I recommend using osmextract because it gives you some stuff that
will make it easier to work with. Or if you're using Python, I recommend using Pyrosm or one of the other packages. One of the important things to say, if you want a custom region, is that there are providers that allow you to draw a custom geometry and then click
download. And it will give you a custom extract for any shaped region that you're interested in. But there's still not, to my knowledge, any open API that allows you to send them the boundary and then you get it back. So you have to do that as a manual process. And there's HOT OSM, the Humanitarian OSM community, which allows you to do this.
And there's a couple of other websites that allow you to do that if you want to get custom PBF. So it's useful to know that you can actually start with the PBF file if you want.
Okay. So that's a little bit on reading in the data. I've switched here to Monaco because to understand things, it's good to run it on a small example. And this just shows all of those
five layers from GDAL. So we can read in the Monaco points. And this would work for Poznan as well. So feel free to run it for Poznan. Monaco lines, multi-line strings, multi-polygons, and other relations. And this is actually the first time for this course, I just put together this little table to
show, I mean, it will vary from place to place, but you can see that points are pretty prominent, but you've got almost equal size for points, lines, and
multi-polygons. And as you would expect, you've got probably more points than other features, quite a low amount of data per feature. So this is like shops, this is points of interest. One of the confusing things about OpenStreetMap data is you can represent the same entity in two
different ways. So let's say you've got a supermarket, it's not uncommon to have the supermarket represented as a point and a polygon, which is kind of confusing because then the same entity will show up in two different layers. So that's just one thing to be aware of.
And then you've got the lines where you have loads and loads of lines. So this is why OpenStreetMap is great for transport research because people love mapping roads and footpaths and stuff. So you've got, this is actually the largest one for Monaco and the size per feature
is slightly larger. In terms of other types, you've got multi-line strings. So that's really interesting that I don't actually know what's in these multi-line strings. So that's probably something worth exploring. And then multi-polygons are multiple sets of polygons that probably
include land use cover. If you've got buildings that are the same building but they're in two different locations, you can't represent them as a single polygon. They are represented as a relation and the way that that's imported into a geographic structure is as a multi-polygon.
And then you've got other relations as well. So again, I'm not actually sure what's in that. Basically, for my transport research, I can get 90% of what I need through this lines thing. So in a way, OSM data is very, very simple, but when you start importing it, it gets quite
complex. So OSM data only has three types of entity. So the first thing is a point, which is obviously just a latitude and longitude. You don't have elevation in OSM.
And then as many tags as you want. So that's the first one. The second type is a way, which is just a series of vertices that form a line string. And then the third type is a relation. So a relation is a grouping of ways and points or ways, points and relations. So it's almost
like the list structure in R. You can have as many different things in a relation as you want. So a great example and another question just to make it a bit more interactive. Has anyone been
on any of the EuroVelo cycle routes across Europe? So at least one person. So EuroVelo routes on OSM are represented as relations because you can have a crossing point and say, this crossing point is part of EuroVelo. You can have a line string that is part of EuroVelo. So it's a way of bringing
together things that are not usually together. And GDAL, and I don't know how it does it, but it converts those three things into these five fundamental types. And for each of those, I guess, the relation to geometry types is: points are obviously points, lines are obviously line
strings. Multi-line strings are multi-line strings and same with multi-polygons. I guess other relations is this type geometry where the geometry can vary from type to type, we can have a look. So that brings us to the first practical and I know an hour's gone already,
but I think it's probably worth just having a bit of time to try and make some maps and going back to the first lecture, just try working with some of these data sets to see what you can pull out, either for Monaco or for Poznan, that's option A and just have a play with those.
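As a reference for the practical, the three OSM element types described earlier (nodes, ways and relations, each carrying free-form tags) can be sketched as plain data structures. This is a minimal illustration, not how GDAL or any OSM library stores the data; the class names, IDs, coordinates and tag values are all assumptions made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A point: latitude, longitude and any number of tags (no elevation)."""
    id: int
    lat: float
    lon: float
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """An ordered list of node references that forms a line (or closed ring)."""
    id: int
    node_ids: list[int]
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class Member:
    """One entry of a relation: a reference to a node, way or other relation."""
    type: str  # "node", "way" or "relation"
    ref: int
    role: str = ""


@dataclass
class Relation:
    """A grouping of nodes, ways and relations, e.g. a long-distance cycle route."""
    id: int
    members: list[Member]
    tags: dict[str, str] = field(default_factory=dict)


def way_coordinates(way: Way, nodes: dict[int, Node]) -> list[tuple[float, float]]:
    """Resolve a way's node references into (lon, lat) pairs, i.e. a linestring."""
    return [(nodes[i].lon, nodes[i].lat) for i in way.node_ids]


# Hypothetical IDs, coordinates and tags, for illustration only.
nodes = {
    1: Node(1, 52.40, 16.90),
    2: Node(2, 52.41, 16.92, {"highway": "crossing"}),
}
way = Way(10, [1, 2], {"highway": "cycleway"})
route = Relation(
    100,
    [Member("way", 10), Member("node", 2)],
    {"type": "route", "route": "bicycle"},
)
line = way_coordinates(way, nodes)
```

The relation mixes a way and a node in one member list, which is the "list-like" flexibility mentioned above and the reason relations do not map onto a single geometry type.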
If you're interested in graph networks, there was actually a really good tutorial on the sfnetworks package that was actually in 2021, not 2023. And you could have a look at that.
So this is the result of this where they use this case study and you can do many, many things like
this is showing centrality on the network. Sorry, please. Yeah, sorry. Sure thing. So yeah, this was put together by Andrea and there's a team of people, including Lorena, who's doing a parallel lecture and you can get in the centrality of the network. So if you want to start doing graph operations, you can use this OSM data type as an input. So
that's another exercise. You've got the links in there. So I think the best thing to do is just have a play and I'll give people 10 minutes or five, 10 minutes to have a play with that.
And in the meantime, yeah, I'll stop talking for a bit and allow people to have a play with these awesome data sets. And also if anyone's got any questions about any of this, now is a great time to ask. I have a question. Do you know any project
working on three dimension that fits? I know of a project that's looking at visualizing buildings in 3D and that is one that's linked to this product called... Yeah,
like there's loads of really interesting tools that people have built. So there's one called Streetmix. So again, this is actually a really nice tool if you want to
try redesigning streets, but one thing I like about it is it's very, very simple and you can see it's maintained. The latest release was in June, 2023. It simplifies all of transport planning down into one dimension. So it's just like that cross section of a street saying, what do you want on that street? Do you want to make your pavements wider?
Do you want to remove, do a road diet, stuff like that. So obviously that's one dimensional. So that's going in the opposite direction that you're asking about, but there's actually a 3D implementation of this, I believe, which is Streetmix 3D. And I don't know how
actively maintained it is, but let's have a look. So this guy's trying to make a 3D demo, although this doesn't look very 3D to me.
Okay. So that's the 2D representation, but then there we go. So that's the same street view in three dimensions. So from a research perspective, I don't find this particularly
interesting. It's interesting technically, but from a software development perspective, but it's not real data. It's just like a visualisation add-on. Does that answer your question? Were you thinking more about buildings?
There is this project in the Netherlands, it's from DUL of 2021. And then you can have all of the buildings of the Netherlands that they have. One thousand buildings and then at the end you can have some data, but I was wondering if there was some other project. It's a really cool data
set. There is a format where you can share 3D animation and content. I mean, I know Mapbox has got pretty amazing 3D representations of stuff. It depends what you
mean by 3D. So if you're interested in the buildings, yeah, you can do that. Even in this NPT thing with MapLibre, we enable 3D view. So if you click on this, you can actually go, it's not really 3D, it's kind of 2.5D. And it's cool if you're interested in looking at
where the hills are, and this is for Scotland where there's quite a few hills. But again, this is more for visualization than actual data. OSM doesn't contain elevations by default.
Of course, there's a tag that you can add, which says elevation equals this, but that is very sparse. So adding elevation and gradient data into OSM is a process that you need to do yourself. And myself and someone from Lisbon, Rosa Félix, we created an R package called slopes, which adds
gradient on. Probably not so relevant in the Netherlands where it's mostly like zero gradient, but for hilly cities, that could be useful. Overture Maps, what is the relation between
that and OpenStreetMap, and is it going to be similar? It's kind of, it's not real open data. Right. So. You know, much bigger budget than OpenStreetMap, of course. Well, is that budget in terms of money or budget in terms of people hours? Because the OSM community, if you were to quantify all of those hours.
Yeah, yeah. So the relationship between them is that I think they're two parallel datasets. I've heard a lot about Overture Maps. I haven't actually done my research and dug into it. So can you or someone else like give a quick overview of what it is? Is it building outlines
or what kind of data do they provide to anyone? Yeah. They tend to promise about what is available. There was also a reflection from the OpenStreetMap people and I'm not sure if it was worth it. Let's say it wasn't dramatic.
Yeah. So, I mean, they have released their data into the open. This guy, Brandon, he's the developer of PMTiles and I know that he has done some stuff. He's created a visualization of the Overture Maps data. So it might be worth it.
You can download it. It's available. Yeah. And they use it here. And it's in Asia and it's the ODbL license, same license. Cool. Yeah, yeah. I mean, to me, that sounds amazing.
But yeah, I don't know much about this stuff, but I'm going to have a quick look to try and find his Overture repo. There you go. So it's got six stars and it's quite recent. So basically,
he's the developer of this PMTiles thing and decided, ah, let's see if I can turn this Overture data set into a web app. And it's just a test of this thing that he's
developed called PMTiles. So let's see. I think what it does is, as far as I remember, this is just place names. So it just gives incredibly detailed place names of everything. So maybe that's what it is. Maybe they provide other things in addition to place names.
How they've developed this, I don't know. So yeah. They do have a code before assembly. Let's see. Yeah. So this is just the places. Yeah. So they obviously have multiple layers that
they provide. So yeah, I'm not sure what the relationship between them is, but if they're releasing it under the same license, I would say that's definitely promising. Okay. Anyone, any other questions? So I guess we've got two options. We could continue to
progress with the OSM extract data and just have a little look at some of the data that's in there to try and explore that as a kind of interactive demo, or it could be more of the kind of lecture
type where I talk through other tools that are out there. It's probably worth seeing as I've had a little play with these other tools. It's probably worth me saying something about them.
Yeah. Getting some nods. So all of this stuff, it's just flagging it. And given that people have got oe_get working, that's the most important function. You know where to look for the
documentation, or maybe you don't actually, because I haven't talked about the documentation. So I think that the docs for osmextract are actually pretty good. So we demonstrate how to do various different things. So the website is docs.ropensci.org. And if you search for
osmextract, you'll see that. But if you want to learn more, there's a lot of information about how to do specific things that you can look at in your own time. One of the most important things about osmextract, which is why I think it's a really
great package for working with large OpenStreetMap datasets, is you can do the filtering before you read it into memory. So that's important. That will really speed up your processing. So yeah, you can customize how it's importing it. So in this case,
this is an advanced use case, but we've got this custom vector translate thing, which is actually passed to GDAL. And you can really have a lot of control over
how it's converting that PBF file into the object that you're reading in. So the point there is, there's no single objective way of converting the OSM structure into what is essentially simple features where you have a set of features and columns, because it doesn't fit into the straitjacket
of the data frame. So you need to have different ways of doing it. So this promotes to multi, which GDAL allows. And if you want to dig deep into it, you can start playing around with things like that. A really cool thing as well is, let's say that your area of interest is in a huge
extract and you don't want to read in the whole thing. You can have a separate boundary object and then oe_get allows you to pass a boundary, which can be a geographic object. And I've got
an example in the thing, and then you can just say boundary_type equals clipsrc and it will clip the output to that area. So in this case, there's a boundary around, I think this is a city in Malta and it only returns the stuff in that area. And that's really good. And you can also filter on
the attributes. So that's kind of cool as well. So you can use SQL queries and you can use all of the things that are possible within GDAL. So GDAL actually supports a SQL dialect to do
queries on the import stage. And yeah, there's a lot of options in that. So do check those out in the osmextract documentation. And there's also this big contributing to OSM section. So
this is the kind of get started vignette that gives the most important stuff. So we're only covering a small part of that in this demo. So very briefly in terms of other tools that
I've tried. So pyrosm is, I would say, the Python equivalent of osmextract. Who's primarily a Python user in here? So yeah, a few people. So this has similar functionality and it's written by
Henrikki Tenkanen. And I gave it a go. I followed the instructions, and for some reason this wasn't working for me. So I actually got an error message and I've opened an issue on the
issue tracker. And it may be because I'm not experienced with Python, but I just couldn't get it working. So very interested to hear from people who have more experience with Python than me if they can get that working. So there's not a lot to say on that. One thing is that pyrosm
supports BBBike and Geofabrik. It doesn't support OpenStreetMap.fr, but Geofabrik and the French one are kind of similar. So it works in a very similar way. But as far as I can tell, and again,
I may be biased because I'm a coauthor of osmextract. As far as I can tell, pyrosm doesn't have quite as many features as osmextract, but very happy to learn more about it if anyone gets it working. Then you've got OSMnx, which is a super popular package for working with street
network data. This has, I'd say, more functionality than osmextract, although osmextract is just focusing on extracting the data and doing some pre-processing steps.
OSMnx is a very big package that has a huge amount of functionality focused on street networks, but it has batteries included. So it allows you to do the full pipeline, including geocoding, and downloading OSM data. It doesn't use the extracts. So this is actually calling directly to
the Overpass API, I believe. So I suspect it won't be possible to download national data sets with OSMnx. But if you're doing a city, it works fine. And I've just got this example where I am reading a data set, and this code runs, no problem. So OSMnx graph_from_polygon,
that downloads the network, and you can specify your network type as bike. And then it plots it in this nice map. And then once it's in that format, you can do loads of cool stuff in OSMnx.
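The OSMnx code shown on the slide is not captured in this transcript, so the following is only a sketch of the kind of call described above, assuming OSMnx and Shapely are installed and the Overpass API is reachable. The helper names `bbox_ring` and `bike_network` and the bounding box coordinates are made up for illustration; `ox.graph_from_polygon` with `network_type="bike"` is the OSMnx function referred to.

```python
def bbox_ring(west, south, east, north):
    """Return a closed ring of (lon, lat) corners for a bounding box."""
    return [(west, south), (east, south), (east, north), (west, north), (west, south)]


def bike_network(ring):
    """Download the cyclable street network inside the ring via the Overpass API.

    osmnx and shapely are imported inside the function so that the helper
    above can be used without them; calling this function needs both
    packages and an internet connection.
    """
    import osmnx as ox
    from shapely.geometry import Polygon

    polygon = Polygon(ring)
    # network_type="bike" keeps only ways that OSMnx considers cyclable.
    return ox.graph_from_polygon(polygon, network_type="bike")


# Approximate, illustrative bounding box around central Poznan (an assumption,
# not the area used in the lecture).
poznan_ring = bbox_ring(16.88, 52.39, 16.96, 52.43)
```

Calling `bike_network(poznan_ring)` and passing the result to `ox.plot_graph()` should give a plot along the lines of the one described. Because the request goes to Overpass rather than to a downloaded extract, this approach suits city-sized areas better than national ones.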
And that's like a geographic plot. Probably like for the bigger users, and Tom, this is the kind of tool that you'd want to use if you're using planet.osm. Osmium is a high-performance, low-level library for working with these PBF files. So this is Bash code that gets the data
if it's not already downloaded. And then you can just filter it and have this kind of standard command line interface. And what that does is it takes your OSM data, and it filters it,
and it outputs another PBF file called Poznan cycleways. So that's a computationally efficient way of dealing with large data sets. And because that's not reading it into memory, you can probably run that on the huge planet datasets, no problem. So if anyone wants to work with,
anyone want to work with planet.osm, put your hands up. Tom's probably the only person in the room. I'm interested in it as well, but I haven't, like I've been using mostly national and regional sized datasets, but this just shows that it does work. If you want to use the command line interface,
you can use Osmium to pre-process it and extract just the cycleways. And then you can go back into your standard stack and read it in with your GDAL thing. And that's a nice way of working. So the second set of exercises is a bit more advanced. So try using the query
argument of oe_get to download only cycleways for BBBike. And that will probably take a bit of trial and error because of the SQL query language; if you're new to SQL that might take a bit of trial and error. You can cheat by looking at the source code. So the source code
you can find in the repo, I've got way too many tabs open. So just to be clear, the source code is just in this file called osm.qmd. And again, you can try reproducing all the stuff here. So try without reference to that. But if you
get to the point where you can't figure it out, you can always look at the source code and it's got the answer in there. And then if anyone's feeling really ambitious in the next
seven minutes, you've got some bonuses. So think of a more sophisticated query to get an active travel network for Poznan. So you can have highway equals footway. Where it gets interesting is including relation tags in there. So you might have something that's just like highway equals
residential. So it doesn't seem like it's part of the cycle network, but if it's part of a relation that says, this is the cycle way to North Poznan, that may actually be quite cyclable. So there's many different tags to do that. And that's the hard thing with OSM is the tags are very organic
and they kind of grow in complexity. And then there's a bonus and a bonus bonus after that. So generate a simple measure of walkability. I think this is probably one as an exercise to take home and think about, but generate a simple measure of walkability and cyclability
based on OSM data. And then this is something that I want to work on. So if you solve it, let me know and you've saved me many weeks of work, but where would you prioritize new active travel infrastructure in Poznan or any other city? And that's the kind of research question that I'm interested in. So I think that's almost it for my side. So I think it would be good to
at least allow a little bit of time, but there's one final thing that I alluded to earlier, which is this osm2streets thing. So I showed in the beginning, this gamified way of viewing
OSM data in A/B Street. And because many people were asking, how did you do that? How do you solve the problem of converting the centerlines from OSM into a 2D network, into polygons? That would be really useful. So other people were asking and a great thing to do in open
source software is to take a bit of functionality that other people can use and split it out into a standalone package that works in its own right. So that's what this osm2streets is. So as a final demo, we're going to zoom in to Poznan, hopefully. And somewhere around here, hopefully
it will show up. There's Szczecin, there's Poznan. So we can create a 2D representation of this area. So obviously this doesn't scale for particularly large datasets.
That didn't really work. So that's just import current view.
Or is it processing now? It's kind of locked up. So okay. So I think it might be, I think it may have activated. There we go. So now it's done that and it's done the kind of auto extrusion and it uses simple
rules based on like the average width of the road. So you've got the boundary, the intersection polygons, lane polygons. That's probably the one that's of most interest. And then you can download that as a GeoJSON file. So yeah,
that was the final thing. And again, this is really work in progress and it's just almost an experiment in how you can represent OSM data in different ways, but I find it really interesting and potentially useful if you want to figure out not just the centerline of the roads, but also
how wide they are. Because the key thing with this, I don't know if there's any examples in the extracts that I just processed, but where you've got multi-lane roads, it takes the lanes tag and it outputs, there you go. It will make the road wider depending on it. So I'm not
saying that this is perfect, but it's just an experiment to show how you can add value to OSM if you understand those tags. So again, potentially useful. It's just a web app at the moment, but at some point it would be possible to create a command line interface and it's written
in Rust, so it should be high-performance. So that's probably the final thing that I wanted to say. So you've got, yeah, a few minutes just to have a play. That's it from me in terms of the lecture. So yeah, thanks a lot for listening. And I'm just going to end on this little animation
that I made that is a GPS trace that's then overlaying on the OSM network to show that yeah, that there is a link up between OSM and where people actually traveled.
Do you think the VIMG has or just automation? This is just a static animation. So it's just a GIF. This is actually a trip that I did on Sunday when we were traveling to the hotel. So this endpoint is actually the Fair Place Hotel. Yeah, so thanks a lot for listening. Final question, yeah.
Okay, can you open MacMost and send just one example? Open MacMost, yeah. So we made this through Europe. We made the 10 meters, one below. We made the 10 meters rasterized. So that is on the one below.
Okay. This is just the... That's, yeah, that's a... Okay. So we rasterized the buildings. Cool. It's 10 meters. Yeah. So we did it by the environment. It was, I mean, it's like, I don't know, 20 minutes.
And we find it would be more useful for people because you just, you have it like either so you can integrate with the Earth observation data, or you can subset, you know, you do some land cover mapping and you say, well, if I know it's a building, I mean, I don't know if you can take
some photos or something. So this is, this is OSM. So this is Overture has taken OSM and provided it. No, that's us. That's all. Okay. Yeah, yeah. That's all. But the other ones, they're going to Overture. Okay. If you, if you, some of you want to use for Europe, we didn't do the world. So we have all this rasterized. Yeah. Alias. Yeah. If you update it, we'll, we'll make sure we update it once a year.
Yeah. And, and we did notice, for example, Bulgaria, Romania, there was a lot of buildings missing. So there is a disclaimer. Yeah. You know, if you use this type of data for actual projects, you know, you have to really consider that, you know, the, the opposite as it says in the instructions, it says that it's not,
you know, it's not complete. I mean, there's no guarantee anything. Yeah. Yeah. Okay. You don't know what it is. Okay. You can, you can, it's a code, right? So you can just open it. Cool. QGIS. Yeah. Zoom in and you will see the rasterized.
Once you go, you can integrate with the VTS or satellite image. Cool. Yeah, that sounds really good. Any, any other questions? Yeah. So, with Tom was saying that you can use Open3 client, but at your own risk, so
is there any approach to validate? Yes, yes. And that goes back to the previous question about roads in Tanzania. If suddenly loads of roads appeared in OSM Tanzania on a certain date, does that mean
the roads were built or does it just mean that they weren't tagged earlier? So there is, I'd say the place to go for that is, I'm going to try and find it, it might be called ohsome actually, which is the ohsome API, like ohsome quality.
I think that's the name of it. So yes, this comes from this GIScience group at Heidelberg, and this is actively developed. Only yesterday there was a release 1.1 and these guys do have simple metrics for
estimating completeness. It's hard to do because, if you don't have any ground truth, there's no way, but they've come up with some simple metrics to try to estimate completeness,
I believe. So yeah, that I would say, and that answers your question at the back about the time series, like you can do really cool queries that say, for example, tell me the length of footways
in Tanzania every month, month by month. So you have complete control and it's like a language for that, which saves a lot of code. So yeah, I'm a big fan of the ohsome API, developed by the GIScience group at Heidelberg. But, you know, I haven't dug into it a lot, but I have actually generated
some really interesting outputs from it. Um, let me see completeness. So yeah, I would say they do have some stuff on their quality. No, not really one, one thing.
Okay. Well, I think that's it. I can see people downing tools. There's coffee outside. So I don't want to delay people from getting the final coffee, but thanks all for attending. Thanks, Tom, and all of the OpenGeoHub team for making this possible.
And any questions people have, over a cup of coffee, come and find me.