Introduction To Location And Linked Data
Formal Metadata
Number of Parts | 95
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/15547 (DOI)
Production Place | Nottingham
FOSS4G Nottingham 2013, 47 / 95
Transcript: English (auto-generated)
00:00
So just out of curiosity, how many people know about Linked Data or consider themselves, I'm not going to say experts because that's quite demanding? And how many people don't know anything at all? So this talk was planned to be a complete introduction, so I'm going to just sort of introduce Linked Data, what it is,
00:23
and then talk a bit about some of the stuff that we're doing at Ordnance Survey with Linked Data, and then talk about some of the stuff that's going on around the rest of the UK government with Linked Data. So this is a slide where I potentially get a bit heretical for this kind of conference, but one of the things is, I mean, so we saw about three years ago the
00:42
UK government started opening up a lot of data. So there's lots of data being put out there by government organisations, and the age-old joke is that one of the nice things about standards is that there are just so many to choose from. And so even in the geospatial domain, you know, we've got GML, which is an OGC standard; KML is an OGC standard. If you're into 3D stuff,
01:02
CityGML is an OGC standard. We've also got things like shapefiles, which are not really standards, but they're very popular amongst the geo community. There are other formats, you know, things like CSV, there's XML; some government departments think PDF counts as a data format, but I'll come on to that a bit later.
01:23
But the thing is, and this is where I get a bit heretical, the rest of the world doesn't actually care about GML, KML, shapefiles. You know, you've got the statisticians, they've got SDMX-ML; I think people that do financial data, they've got XBRL; and pretty much every little domain out there has got their own ML of some variety.
01:44
So I think one of the things is that data gets interesting when you bring it all together: it's more valuable when you start bringing lots of different data sources together to do something. And there are a few programs of work that you might be aware of that are about trying to bring data together.
02:00
So we've got things like INSPIRE, which is about creating a spatial data infrastructure, about using unique identifiers to reference things in your data, and bringing it all together. And again, I'm going to argue about spatial data infrastructure as well: who cares, it's data infrastructures, you know, we want to bring everything together, not just stuff in our little geo and environmental bubble. And this is where we start going towards
02:25
the web of linked data. So this is the famous or infamous, depending on your point of view, scheme that I think was devised by Nigel Shadbolt and Tim Berners-Lee. It's the five-star rating scheme for publishing open data on the web. So to get one star, just publish
02:43
some data under an open government license. Now what some departments did was they wanted to publish a few spreadsheets. So they took a screenshot of a spreadsheet, saved it as a JPEG, embedded it in a PDF document, and then hosted it up on their website. That probably doesn't really count as releasing data. So to get two stars,
03:02
actually release data in a machine-readable format, not as a screenshot embedded in a PDF document. To get three stars, don't assume everyone uses Microsoft. So don't release a .xls file, release a .csv file if you can. So use non-proprietary standards. To get four stars, this is where you start getting into the linked data world. So start using
03:24
open standards from the W3C. So start to use things like RDF to release your data, and use HTTP URIs, especially as the primary keys in your data to identify the stuff that's in your data. And to get five stars, actually start linking your data out to other people's
03:42
data so they can follow some links, get some more context, get some more knowledge, which might be useful for their applications. So I did actually introduce a few TLAs without explaining what they were. So for those of you who don't know, RDF is a W3C standard. That's World Wide Web Consortium. It's been around for over 10 years,
04:02
and I tend to think of it as being to the web of data as HTML is to the web of documents. That might be slightly contentious with some people, but that's sort of the way I think about it. And it's based on a very, very simple data model. So it's based on this idea of statements which are composed of triples. So a triple consists of a subject, a predicate, and an object. So
04:26
that's basically the subject and the object are two things. So it could be a person, place, an organization, and the predicate is just a relationship between them. Likewise, you can relate things to values, so for example, dates, numbers, strings, that sort of thing.
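As a rough illustration of the data model just described, a triple can be sketched in plain Python as a three-part record whose object is either another thing or a typed value. Everything here is an assumption for illustration: the `example.org` identifiers, the predicate names and the `render` helper are hypothetical, and real RDF tooling has its own classes and serialisations.

```python
from typing import NamedTuple, Union


class Literal(NamedTuple):
    """A plain value (string, number, date) tagged with a datatype name."""
    value: str
    datatype: str  # e.g. "xsd:string" or "xsd:date"


class Triple(NamedTuple):
    subject: str                   # identifier of the thing being described
    predicate: str                 # identifier naming the relationship
    obj: Union[str, Literal]       # another thing, or a literal value


# Hypothetical identifiers, invented for this sketch.
EX = "http://example.org/"

statements = [
    # thing -> thing
    Triple(EX + "alice", EX + "worksFor", EX + "acme"),
    # thing -> value
    Triple(EX + "acme", EX + "name", Literal("Acme Ltd", "xsd:string")),
    Triple(EX + "acme", EX + "founded", Literal("1999-04-01", "xsd:date")),
]


def render(triple: Triple) -> str:
    """Render one statement as a single readable line (not a real RDF syntax)."""
    if isinstance(triple.obj, Literal):
        obj = f'"{triple.obj.value}" ({triple.obj.datatype})'
    else:
        obj = triple.obj
    return f"{triple.subject} {triple.predicate} {obj}"


for statement in statements:
    print(render(statement))
```

The point of the sketch is only that every statement has the same three-part shape, whether it links two things or attaches a value to one.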
04:45
The subject, the predicate and the object are all identified by HTTP URIs, and the values can be any XML Schema datatype. So just to give you an example,
05:00
so data.gov.uk published, there's some data published, so the person in the bottom right hand corner is my boss's boss's boss's boss. And data.gov.uk published some data about senior civil servants, that's somewhere. They also published data about government departments,
05:21
and the Ordnance Survey were publishing data about place. So in sort of the old world, these were all silos of data that were stuck on different websites. There were no connections between them, and if there were, you didn't actually know what those connections were. So in the linked data world, what you do is you identify each of those things. So Vanessa Lawrence has got her own URI that identifies her. This is a URI that identifies Ordnance Survey up here,
05:46
and this is one that identifies Southampton. So you identify people, places, organizations with these HTTP URIs, and then you link them on the web using HTTP. But what you actually do on the linked data web is you qualify what that link means. So on the document web,
06:03
there's lots of links between HTML documents, but you don't know what that link means, whereas in RDF, you actually say what that is. So here we're saying that Vanessa Lawrence has a post at Ordnance Survey, and Ordnance Survey is based near Southampton. And you can imagine, as you start following more and more of these links, you get a big graph of data.
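The example on the slide can be sketched as a tiny graph that is walked by following links. This is only a sketch under stated assumptions: the URIs and the two predicate names below are made up on `example.org` and are not the identifiers the real datasets use.

```python
from collections import defaultdict

# Hypothetical URIs and predicates, invented for this sketch.
VANESSA = "http://example.org/id/person/vanessa-lawrence"
OS = "http://example.org/id/organisation/ordnance-survey"
SOUTHAMPTON = "http://example.org/id/place/southampton"
HAS_POST_AT = "http://example.org/def/hasPostAt"
BASED_NEAR = "http://example.org/def/basedNear"

# Each link says what it means: (subject, predicate, object).
triples = [
    (VANESSA, HAS_POST_AT, OS),
    (OS, BASED_NEAR, SOUTHAMPTON),
]

# Index the statements by subject so links can be followed outwards.
outgoing = defaultdict(list)
for s, p, o in triples:
    outgoing[s].append((p, o))


def follow(start, max_hops=3):
    """Collect the statements reachable from `start` within `max_hops` links."""
    seen = {start}
    frontier = [start]
    found = []
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for p, o in outgoing.get(node, []):
                found.append((node, p, o))
                if o not in seen:
                    seen.add(o)
                    next_frontier.append(o)
        frontier = next_frontier
    return found


for s, p, o in follow(VANESSA):
    print(s, p, o)
```

Starting from the person, two hops reach the place, which is the sense in which separate datasets become one graph once they share identifiers.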
06:23
And like I say, I think it's a very, very simple data model. So I'm just going to go on now to talk about some of the linked data work we've been doing at Ordnance Survey. So three years ago, we were asked to open up a number of our products, and we've created linked data for three of those open data products at the moment.
06:43
So the first one of those is a product called Code-Point Open. So Code-Point Open is a product which tells you all about postcodes, so where they are, which administrative areas they're in. The other one was the 50k gazetteer, which I'm not going to talk about too much in this talk, but it's basically an index onto our 50k map series. And lastly, there's a product called
07:04
Boundary-Line, which has information about all of the civil, voting and administrative areas in the country, so all of the constituencies, wards, counties, etc. So what we've got in the OS linked data is, I mean, hopefully my ambition is to head towards having a URI for
07:21
every place in Great Britain. But at the moment, we have to make do with postcodes and administrative areas. So this is the URI for the City of Southampton, this is the URI for the Ordnance Survey headquarters. Now when you look up those URIs, so this is a screenshot of if you look up the postcode for Ordnance Survey HQ, so you get back some nice HTML,
07:46
which has a few facts telling you stuff about that postcode. Not surprisingly, we've got a map showing you where it is. It's got some information, for example, telling you that this postcode is in the district of Test Valley, it's in the county of Hampshire, it's also in England.
08:03
This is what you get if you go from a browser. So if you go from a browser, it assumes you're a human or sort of a close approximation, you want some HTML. But if you went from a Linked Data client, you would actually get back RDF for this information. And likewise, this is some of the information for the administrative geography. One of the things we put into our
08:24
Linked Data, because in the early days Linked Data wasn't very suitable for geospatial data, since not many of the triple stores, which are the databases that hold RDF, had the capability to do spatial indexing. So what we did was actually pre-compute a lot of the
08:44
implicit topological relationships and put those in the data. So for example, if you look up an administrative area, it'll tell you everything it contains, everything it's within, and everything it touches, as long as that makes sense within a particular geography. So that's some of the stuff we've got in the OS Linked Data. We've got a number of different
09:03
APIs that you can use to interact with that data. And actually, as we're an open source conference, I mean, this is entirely built on open source software. So the first one we've got is a search API, which was built on top of Apache Solr. It just lets you type in a place name and it finds you the URI for that place. And you can do some very simple spatial queries
09:24
in that. Another more interesting one is we've got a query API. So we've got a SPARQL query endpoint. So SPARQL is to linked data as SQL is to relational databases. So that's the query language of choice. And that, if you're interested, is built on open source software
09:41
called Apache Jena, on a database called TDB. And we've also got a couple of other APIs, which I'll touch on a bit later. So this is a screenshot of our SPARQL endpoint. So basically what you can do is, in the top window there, you can type in a SPARQL query. I won't go into SPARQL too much now. That's a whole day's worth of tutorial.
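For a flavour of what talking to a SPARQL endpoint involves, here is a minimal sketch that only builds the HTTP GET URL and makes no network call. The endpoint address is an assumption (an `example.org` placeholder, not the real OS endpoint) and the query is a generic one; `query` is the parameter name defined by the SPARQL Protocol, and the `Accept` value is the registered media type for JSON results.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint address, for illustration only.
ENDPOINT = "http://example.org/sparql"

# A generic query: return any ten statements held in the store.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
""".strip()

# The response format is normally chosen with an Accept header.
HEADERS = {"Accept": "application/sparql-results+json"}


def build_get_url(endpoint: str, query: str) -> str:
    """Return the GET URL that carries `query` in the standard `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_get_url(ENDPOINT, QUERY)

# Round trip: decoding the parameter gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]

print(url)
```

A URL of this shape is the kind of thing that could be pasted into JavaScript or PHP, or fetched with any HTTP client together with the header above.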
10:02
You can actually choose your response format, whether you'd like it back as JSON, XML, CSV. And you can either look at the responses here, or this thing, which I think is quite useful, actually tells you the GET request it's performing on the API. So should you want to copy that GET request and embed it in your JavaScript or PHP, you can actually then use
10:21
that to build your applications. One of the other interesting APIs we've got is something called a reconciliation API. So this is a random spreadsheet that I grabbed off data.gov.uk. And I think it's the location of all the libraries in the country. And one of the columns, which one is it? So this column here has got the administrative area that the library's in. But
10:44
as you can see on that, it's just a string. It's just a bit of text. The thing is, it doesn't then give you a hook into anything else. You know, if you want to find out what the, I don't know, the European region that that particular unitary authority or county is in, it's not very easy to do that. Or if you want to find, you know, compare them.
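The idea behind reconciliation, turning a bare string into a URI that does give you a hook, can be sketched with a simple fuzzy lookup. This is not how OpenRefine or the OS reconciliation API works internally; the gazetteer, its URIs and the `reconcile` helper are all hypothetical, and the matching uses Python's standard `difflib` module.

```python
import difflib

# Hypothetical gazetteer: place names mapped to made-up URIs.
GAZETTEER = {
    "Southampton": "http://example.org/id/area/southampton",
    "Hampshire": "http://example.org/id/area/hampshire",
    "Test Valley": "http://example.org/id/area/test-valley",
}

# Case-insensitive lookup table built once from the gazetteer.
_LOOKUP = {name.lower(): uri for name, uri in GAZETTEER.items()}


def reconcile(name, cutoff=0.8):
    """Return the URI of the closest gazetteer name, or None if nothing is close enough."""
    cleaned = name.strip().lower()
    matches = difflib.get_close_matches(cleaned, list(_LOOKUP), n=1, cutoff=cutoff)
    return _LOOKUP[matches[0]] if matches else None


# A spreadsheet column of free-text area names, including a typo and a non-match.
column = ["Southampton", "hampshire ", "Test Vally", "Atlantis"]
reconciled = [reconcile(cell) for cell in column]

for cell, uri in zip(column, reconciled):
    print(repr(cell), "->", uri)
```

Once a cell holds a URI rather than text, it can be used to look up further facts about that area, which plain strings cannot do reliably.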
11:02
These are all very niche queries, by the way. If you want to compare the number of libraries in bordering regions, you can't do that sort of thing. But with our reconciliation API, you can load a spreadsheet into a tool called OpenRefine, which used to be called Google Refine. And you can turn that. So as you've seen, all that column of place names
11:22
has now turned blue. So basically what it's done is it's tried to match the string in that column to a URI in the OS linked data. So you've now got the URI in the spreadsheet, which means you can then go off and get information from our linked data and use that to enhance the data that you've already got. I mean, one sort of perhaps more pragmatic,
11:41
simple thing is you could go off and grab the lat/long coordinate for that, if that was a postcode, say, and stick it on a map. So it's basically just a way of hooking into the OS linked data. So that's just a kind of very brief summary of the OS linked data. There's lots of other people doing linked data around the government. The ONS have just
12:04
published, oh no, I've just stolen my own thunder. Pretend I didn't say that. They will soon be publishing a new linked data site. The Land Registry have published linked data, the Environment Agency, the Met Office soon, hopefully. Who else? Oh, legislation. So there's actually linked data
12:21
for every piece of legislation in the country. And what this is actually starting to do is really join up government. So there are actually links. Actually, yeah, and the Department for Communities and Local Government have been publishing linked data. So they've published something called the Indices of Multiple Deprivation. They've linked that to Ordnance Survey. Companies House has published data,
12:43
which can be linked to Ordnance Survey. The Environment Agency bathing water data again has been linked to the OS. They've also linked it to the ONS. ONS have linked this to the legislation. So you can see you're starting to form this big web of data across government. And this means if you want to ingest all of this data into an application and use
13:01
it, because it's all in linked data, because it's all in RDF, it makes it, I won't say easy, but it makes it easier because you don't have to worry about translating between the different XML, JSON, random stuff that most APIs give you back. So just as an example, this is a screenshot of the Environment Agency bathing water data. And as you can see, there's a
13:24
reference there to the ONS and the OS. And this is just a little, this is an API that builds on top of it. So you can actually just put an OS or ONS URI or the free text into one of their APIs and get back a list of all the Bathing Water quality observations
13:40
in that particular area. And then you can actually do a slightly more complex one, which compares it to the bathing water in neighbouring areas. Big data. So I'm not a big fan, as we were discussing before the talk, of the big data terminology these days. But one of the things I think linked data helps with is
14:02
probably the variety aspect of big data. It's basically designed for when you want to integrate data from lots of disparate sources. So you start having this big graph of data. We've been using it internally, so we've actually been trying to see whether we can consume some of the data that the government is releasing, ingest it and use
14:24
it to enhance our data and provide value added services on top of that. So we, luckily for us, there was data about transport, the environment, local authorities, crime, weather, business, education and health had all been released as linked data and it had all been linked to the
14:43
postcode. So if you wanted to do some sort of analysis, at least at the postcode level of different areas around the country, this made it very useful. So we actually built an application which, I know it's going to sound very GIS-y in some ways, it is very GIS-y, but it lets you
15:01
more easily do queries across a number of data sets but more complex queries. So this simple application was just designed for, say you move to a new area, you want to find a house where you want to live, a very traditional GIS kind of thing, but you want to filter it out. So you want to find a house in an area that's got low crime where the education, where all
15:25
the schools have got high Ofsted ratings, maybe it's near a pub, and hopefully there are low levels of pollution. You get the idea where you can combine all these data sets to sort of really narrow down the areas you want to find. So again, while you can do that
15:41
in a GIS, or you can do it in a relational database, I would argue why would you want to, because this just makes it so much easier. And just another point of interest, so when I started off I mentioned that linked data didn't work very well with spatial data.
16:00
I think, I'm trying to think, probably a couple of years ago now there was something called GeoSPARQL, which was standardized by the OGC. So GeoSPARQL is a way of embedding geometries, so qualitative and quantitative spatial data, into RDF. For those of you that are into your GML or your WKT, basically the way you put a geometry
16:20
in RDF is to store it as a big blob of GML or WKT. And the idea is that now we've got a standard people who create the databases that load RDF can start to build a spatial index and I think a certain popular big, well maybe not popular, but big database vendor has actually implemented it in their latest 12c release and hopefully some of the open source ones will start to follow suit.
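To make the "blob of WKT" idea concrete, here is a small sketch of a geometry stored as a typed literal, plus the kind of parsing a store would need before it could index it. The namespace, the `asWKT` property and the `wktLiteral` datatype are the ones GeoSPARQL defines; the geometry URI, the coordinates and both helper functions are invented for illustration, and only simple `POINT` text is handled.

```python
import re

# GeoSPARQL namespace, with its asWKT property and wktLiteral datatype.
GEO = "http://www.opengis.net/ont/geosparql#"

# Hypothetical geometry identifier and rough, made-up coordinates.
geometry_uri = "http://example.org/id/geometry/1"
wkt = "POINT(-1.47 50.93)"

# The geometry is just another triple whose object is a typed text value.
triple = (geometry_uri, GEO + "asWKT", (wkt, GEO + "wktLiteral"))

_POINT = re.compile(
    r"\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*"
)


def parse_point(wkt_text):
    """Extract (x, y) from a WKT POINT string; other geometry types are not handled."""
    match = _POINT.fullmatch(wkt_text)
    if match is None:
        raise ValueError(f"not a simple WKT point: {wkt_text!r}")
    return float(match.group(1)), float(match.group(2))


def within_bbox(point, min_x, min_y, max_x, max_y):
    """Return True if the point lies inside the bounding box (edges included)."""
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y


point = parse_point(triple[2][0])
print(point, within_bbox(point, -2.0, 50.0, -1.0, 51.0))
```

A database with a real spatial index does far more than this bounding-box test, but the sketch shows why a standard text encoding is what makes such indexing possible at all.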
16:47
So just to conclude, I think the nice thing about linked data is basically if people start, if we start to have URIs on the web which identify the resources that we're interested in and people start reusing some of these things, so I'm going to be a bit biased here,
17:02
but I can say if you want to talk about postcodes use the OS URIs and other URIs around the world. And this means that we don't have the problem of key clashing, so you know if people, two sets of people are using 10 digit numbers for keys you don't get that problem because you've you've kind of effectively created a global key and we've got a common data format now
17:24
across the web called RDF. It just makes it easier to integrate data, and I think actually at OS it's starting, so I'm not really a geographer, I'm not really a GIS person, I'm just trying to push us to think just beyond points, lines and polygons, beyond cartography but actually
17:45
about the data that we've got and what we can do with it beyond printing it out, you know, sticking it to the wall and putting some pins in it, so actually trying to think more about data rather than visualizations of that data. And again I'd argue, at risk of getting into trouble, that basically spatial isn't special
18:03
anymore really, you know, it's just data, but it's really, really important. I mean, it's not surprisingly been recognized that location is one of the key hubs that everything connects to, so in my next talk I'll show you an example where,
18:21
you know, location has been shown to be a key integration hub that everything joins into that we can then query across lots of data via that route and that is kind of a whistle stop tour of all things linked data.
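The integration-hub idea, and the house-hunting query described earlier in the talk, can be sketched as a join across datasets that all use the same postcode URI as their key. Every value below is invented: the postcode URIs, the crime figures, the school ratings and the pub list are placeholders, and a real application would run this as a query over linked data rather than over Python dictionaries.

```python
# All identifiers and figures here are invented for illustration.
PC = "http://example.org/id/postcode/"

# Three independently published datasets that share one global key.
crime_rate = {PC + "AB11AA": 12, PC + "AB12BB": 45, PC + "AB13CC": 8}
school_rating = {
    PC + "AB11AA": "Outstanding",
    PC + "AB12BB": "Good",
    PC + "AB13CC": "Requires improvement",
}
near_pub = {PC + "AB11AA", PC + "AB12BB"}


def candidate_postcodes(max_crime=20, ok_ratings=("Outstanding", "Good")):
    """Return postcode URIs with low crime, well-rated schools and a pub nearby."""
    # Only postcodes present in both keyed datasets can be compared.
    shared = crime_rate.keys() & school_rating.keys()
    return sorted(
        uri
        for uri in shared
        if crime_rate[uri] <= max_crime
        and school_rating[uri] in ok_ratings
        and uri in near_pub
    )


print(candidate_postcodes())
```

Because each publisher used the same URI for the same postcode, no key translation step is needed before the datasets can be combined.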