The RAGLD (Rapid Assembly Of Geo-centred Linked Data) Framework
Formal Metadata
Number of Parts | 95
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/15584 (DOI)
Production Place | Nottingham
Transcript: English(auto-generated)
00:00
So I'm just going to talk about a project called RAGLD. I feel like all of these projects have to have a silly acronym. RAGLD stands for the Rapid Assembly of Geo-centred Linked Data applications. So I'm just going to give you a bit of background on some of the motivation behind the project and then describe what it was we actually
00:21
built using the RAGLD tools. So, just to get some of the boring stuff out of the way first: RAGLD was a collaborative project between Ordnance Survey's research department, the University of Southampton, and a small company called Seme4. It was part funded by the Technology Strategy Board, and it was part of their harnessing large data sets
00:44
funding call. So I just want to talk a bit about some of the motivations. Every linked data presentation is supposed to have this diagram in it; this is the obligatory one. It's a couple of years out of date now, I think. But basically, this sort of tried
01:01
to capture all of the linked data that was published out there on the web. Ordnance Survey is just at the top there, circled in red. So if you're asking how many things link to OS, hopefully it will give you some indication. For those of you who are not linked data aficionados, the big one that sits in the middle is something called DBpedia, which is a linked data
01:23
version of Wikipedia. And there's also, where are we, somewhere on there, something called GeoNames. It'll be one of the yellow ones. GeoNames is a crowd-sourced gazetteer for the whole globe. And again, DBpedia and GeoNames
01:41
are quite a big hub for connecting things on the linked data web. So I think GeoNames being there shows that location is very, very important here. Now, one of the unfortunate things about the linked data web is that people actually aren't that good at doing the links.
02:01
So unfortunately, with some of the government data, you see lots of people publishing silos of RDF, which seems a little bit pointless given that the clue is in the name, really: linking. I don't know how many of you know someone called Hugh Glaser, but Hugh Glaser
02:21
decided he wanted to create a simple web service that got around that problem. Now, in linked data, there's a keyword called sameAs. For example, Ordnance Survey published linked data about administrative geography, and the Office for National Statistics also published data about administrative geographies. Now, we have the same data, but from a slightly different
02:43
context. We both publish slightly different information. But what you can do is publish a link between our two identifiers and say, this is the same as this. So the sameAs keyword says that two identifiers in two different data sets identify the same thing. Now, what Hugh did was he wrote a tool, and all it
03:01
simply did was go to all the linked data that was published, hoover up all of these sameAs connections, put them into a MySQL database, and, if you like, fill in the transitive closure. So if A was the same as B, and B was the same as C, but there was no explicit connection between A and C, he would add that in.
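The closure step described here can be sketched in a few lines. This is a minimal illustration, not the actual tool: it assumes the sameAs links are available as plain pairs, treats sameAs as both symmetric and transitive, and uses an in-memory union-find structure in place of the MySQL table mentioned in the talk. The function name and the identifiers are hypothetical.

```python
from collections import defaultdict


def same_as_bundles(pairs):
    """Map every identifier to the set of identifiers declared equivalent to it."""
    parent = {}

    def find(x):
        # Follow parent pointers to the representative, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two groups named by each explicit sameAs pair.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect the members of each group under its representative.
    bundles = defaultdict(set)
    for uri in list(parent):
        bundles[find(uri)].add(uri)
    return {uri: bundles[find(uri)] for uri in parent}


# Hypothetical identifiers standing in for real URIs.
pairs = [("os:A", "ons:B"), ("ons:B", "dbpedia:C"), ("os:X", "geonames:Y")]
lookup = same_as_bundles(pairs)
print(sorted(lookup["os:A"]))
```

Looking up `os:A` also returns `dbpedia:C`, even though no pair links them directly, which is the "add that in" step. As the talk goes on to note, the result is only as trustworthy as the individual links fed into it.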
03:21
So he formed the transitive closure of all the sameAs relationships on the linked data web and created a really, really simple web service whereby you put in a URI, and it gives you back a list of all the other URIs on the linked data web that identify the same thing. Some of those sameAs links are open to interpretation,
03:42
open to question. So it is a tool where you do have to look at what comes back and make a bit of a decision. As you can see, though I haven't shown that one, I put in Southampton, and someone somewhere has said that the geographic area of Southampton is the same as the council that runs Southampton, which
04:01
is not strictly true, which is why you have to look at what you get back. But the idea is to create a really simple service that does one thing very well. And Hugh and I were discussing: could you make that a bit more generic? Don't just have sameAs. Could you maybe create a really simple containment service?
04:20
And this time, you put in Southampton, and you get back a list of everything in Southampton. Or could you maybe create a different-from service? So a bit like the disambiguation pages in Wikipedia: you put in a URI for Southampton, and it gives you back a list of all the other things which are called Southampton, but aren't actually
04:40
the same as the one you put in. I don't know. That could be useful. So the idea is, could we generalize this idea to other simple services? Another motivation behind it was this application that Seme4 and Southampton University built called CUK.
05:01
It looks like a fairly sort of standard mash-up that we're all familiar with. So the idea here, I don't know if you can see it very well, Elizabeth, is you click on one of these polygons. These all represent different administrative areas. What it does is it gives you back some information about crime, deprivation, education in that area.
05:23
The information is shown on this diagram on my left. And basically, the center circle shows you the information about the area that you've clicked on. And the concentric circles show you summaries of the information on the areas that are adjacent as you go out.
05:42
So the first concentric circle is on all the things that are adjacent to the thing you clicked on. The next one out is two steps adjacent to that. But they found they had to do a lot of work. So even with this wonderful connected up data, they found they had to do a lot of work to create the visualization, to create tools,
06:02
to do the queries. They've done some statistical aggregation. And some of the links they had to form were not actually there, so they had to create explicit links from implicit connections. And what they said was, wouldn't it be neat if we had a framework, a tool set that let not only people
06:22
publish data, but also consume and use all this open data and linked data that the government's published, a framework that just makes it easier to build these kinds of applications. So that was basically the motivation behind RAGLD. It was basically to come up with a framework which enables you to build a series of tools and services
06:44
to enable you to do everything from visualization and statistical aggregation to querying, and make it easy. So this is what we tried to build with RAGLD. These are just some examples of the services we've done.
07:00
So we've got something called a relationship management service. This is the more generic version of the sameAs service. What we do here is we say, just use a MySQL database. You have two columns: an identifier in one column, an identifier in the other. And you have just a little parameter file.
07:22
And what you can do is you can say, my relationship service is either transitive, it's symmetric, or it's transitive and symmetric, or it's reflexive. And it just makes it quite easy, then, to build a whole suite of different services based on that. So you can build an adjacency service, containment service.
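As a rough sketch of how one two-column table plus a few flags could yield different services, the function below answers "what is related to this identifier?" under the chosen properties. The function name `related`, the flag names and the sample identifiers are assumptions made for illustration; the talk does not show the real parameter file format. One further assumption: the queried identifier itself is returned only when `reflexive` is set.

```python
def related(pairs, start, *, symmetric=False, transitive=False, reflexive=False):
    """Return the identifiers related to `start` under the configured properties."""
    # Build a lookup from the two-column table; add reverse edges if symmetric.
    edges = {}
    for a, b in pairs:
        edges.setdefault(a, set()).add(b)
        if symmetric:
            edges.setdefault(b, set()).add(a)

    # Direct relations first.
    result = set(edges.get(start, ()))

    # For a transitive service, keep following edges until nothing new turns up.
    if transitive:
        frontier = list(result)
        while frontier:
            node = frontier.pop()
            for nxt in edges.get(node, ()):
                if nxt not in result:
                    result.add(nxt)
                    frontier.append(nxt)

    # Include the queried identifier only for a reflexive service.
    if reflexive:
        result.add(start)
    else:
        result.discard(start)
    return result


# Hypothetical "is contained in" and "is adjacent to" tables.
contained_in = [("ward:1", "district:S"), ("district:S", "county:H")]
adjacent = [("district:S", "district:E")]

print(related(contained_in, "ward:1"))                   # direct container only
print(related(contained_in, "ward:1", transitive=True))  # all enclosing areas
print(related(adjacent, "district:E", symmetric=True))   # adjacency works both ways
```

The same table answers a different question each time only because a flag changed, which is the point being made about the parameter file.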
07:41
I'm going to show you a slide a bit later on, which has got genealogy data in it. So you can build a parent service, or you can build an ancestor service, or a sibling service. So it's basically one database and a simple tweak in a parameter file, and that should build a whole bunch of different services. We've got data transformation services. So that was looking at converting maybe shapefiles
08:00
into RDF, though we didn't, unfortunately, quite get onto that one. But that was the idea, to go from one data format to another data format. Simple spatial query services, which I'll talk a bit about later. The reconciliation service, kind of covered. Various visualization components, so you can take the spatial data stuff and put it on a map, visualize it.
08:21
We did also create a very simple linked data publishing framework. So you can start with a triple store, wrap it in our code, and you can actually publish linked data out there on the web. There's a workflow management tool that lets you basically chain all of these things together, and also federation services. So you can federate the services out
08:42
and aggregate the results up. So what I hope to do now is to give you a couple of examples to show how all of these things fit together behind a very simple example. So given that this is an open source conference: all of the stuff we built in RAGLD,
09:03
apart from the code that we wrote, is all based on open source technology. So the idea is we wanted people to be able to publish data easily just on your standard LAMP or WAMP type stack. So we used LAMP for this project. We've used YUM. I can't remember what YUM is, I'm afraid. I didn't do anything with YUM, but it might be familiar to some of you.
09:23
We used PostGIS. We also used MySQL. We also used WordPress. So it's all easily available open source software. Now, a RAGLD environment: this is what you get when you've successfully installed your RAGLD environment. You just get a nice welcome page.
09:41
And one of the things we do then is we provide another page which gives you a list of the services that you've set up in your environment. So this was one that we actually built using government open data. So we've got various different services, some based on airports and bus stops that we got from transport.data.gov.
10:00
And the idea is that you basically take the RAGLD platform, and you have this configuration file. Now, this is perhaps the hardest part of all of it: you have to edit the configuration files in something called Turtle, which is a slightly more human-friendly serialization of RDF. So it was either that or edit RDF/XML, I'm afraid.
10:21
So we went for Turtle. So basically, you just specify the different services you want to construct, how each service should behave, and some simple parameters: done that, done that, done that. You're then presented with something like this. So this is the airport service. It automatically generates these pages, and it gives you back all the different services
10:42
and how you interact with them. So these are all very simple RESTful services. Where applicable, it will give you back instructions on the GET requests, the POST requests, et cetera. So what we did for the airport service was we ingested the airports. They've got some lats and longs. We then linked that up to the OS data.
11:02
So one of the very simple things, for example: you've got a containment service. So you can put in an administrative area, and it'll just give you back a list of all the airports in that area. You can specify a lat and long, and it'll give you back a list of all the airports near to that. Then we've got an intersects service:
11:21
it'll give you a list of airports that are either in or slightly overlap the boundary of an administrative area. So these are just very, very simple services that you can build fairly easily on standard software from open data. And we have another thing which gives you a list of all the features in that index.
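The "airports near a lat and long" lookup can be approximated with a great-circle distance filter. This is only a sketch under stated assumptions: the talk does not say how nearness is computed, the haversine formula on a spherical Earth is swapped in here, and the function names, the feature identifiers and their rough coordinates are all made up for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def near(features, lat, lon, radius_km):
    """Names of features within radius_km of the point, nearest first."""
    hits = [
        (haversine_km(lat, lon, f_lat, f_lon), name)
        for name, (f_lat, f_lon) in features.items()
    ]
    return [name for dist, name in sorted(hits) if dist <= radius_km]


# Hypothetical features with illustrative (lat, long) coordinates.
airports = {
    "airport:A": (50.95, -1.36),
    "airport:B": (51.47, -0.45),
    "airport:C": (53.35, -2.27),
}

print(near(airports, 50.90, -1.40, 20))
print(near(airports, 50.90, -1.40, 120))
```

A containment or intersects query needs real polygon geometry, which is where a spatial database such as the PostGIS store mentioned in the talk would come in; a point-and-radius lookup is simple enough to show without one.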
11:42
So as I said, this was all based on some of the government linked data. So you click on one of those things, and this is all the data you get back about that particular airport. So that's just a very simple example of what a RAGLD service looks like. I just want to now go through an example of chaining all of these services together.
12:01
So one of our developers at Southampton made the ultimate sacrifice and did the world's most boring drive, which is from Totton to a place called Basingstoke, or Boringstoke as we call it in the UK. And he collected the GPS points as he went along that drive. And, sorry, I've jumped a slide ahead there.
12:22
Ignore what I've just said for the moment. The first thing we did for this was we created just a made-up sample database of B&Bs. So it's a B&B, it's got a name, it's got a lat and long coordinate. We made that, and we loaded it into a RAGLD service. You can do a query that says, give me back all the B&Bs, and this is what you get.
12:42
And they're all identified by this URI. Right, so now we're on to Basingstoke. So what he then did was he did the journey to Basingstoke. He saved it in a typical GPX file format, I think. He saved that and ingested it into a RAGLD service quite easily.
13:00
It stored it as a WKT file. So now basically you've got that, that geometry is now easily ingested into our service. And that's identified by this URI here. What you can then do, once you've ingested it, is you can visualize it on a map. So that's what we've got there. And the way you then do that is via this ingest service.
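Going from a GPX track to WKT is a small transformation, sketched below with the Python standard library. The assumptions are that the file is GPX 1.1 with track points in `trkpt` elements and that the whole drive is wanted as a single `LINESTRING`; the function name and the sample coordinates are invented and are not the recorded route. Note that WKT puts the x value (longitude) before the y value (latitude).

```python
import xml.etree.ElementTree as ET

# Namespace used by GPX 1.1 documents.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def gpx_to_wkt(gpx_text):
    """Convert the track points of a GPX 1.1 document into a WKT LINESTRING."""
    root = ET.fromstring(gpx_text)
    coords = [
        f"{float(pt.get('lon'))} {float(pt.get('lat'))}"  # WKT order: lon lat
        for pt in root.iterfind(".//gpx:trkpt", GPX_NS)
    ]
    if len(coords) < 2:
        raise ValueError("a LINESTRING needs at least two track points")
    return "LINESTRING (" + ", ".join(coords) + ")"


# Made-up three-point track for illustration.
sample = """<gpx version="1.1" xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="50.92" lon="-1.49"/>
    <trkpt lat="51.10" lon="-1.30"/>
    <trkpt lat="51.27" lon="-1.09"/>
  </trkseg></trk>
</gpx>"""

print(gpx_to_wkt(sample))
```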
13:23
So what you've got here is a URI that identifies the geometry for that road, and this is a call to an ingest service. So you put the road geometry into the ingest service. It loads it into a RAGLD tool. You can see it on a map. Another thing we did, and these are fairly simple GIS-type things,
13:43
which are perhaps harder to do this way and would in a way be easier on familiar tools, was we then built something like a buffer service. So what you can do is you can take the route. You can ingest it into the RAGLD tool. You can then buffer it. And this is what we've done here. And then what you can do is you can actually,
14:03
sorry, jumping ahead here again, you can actually create a new geometry from the buffered road and store that in a RAGLD service. So you've now got the buffered road stored in there. Now what you can do is you can find all of the things which are within that buffered geometry.
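A point lies inside a buffer around a route exactly when its distance to the route is no more than the buffer width, so the "find everything within the buffered geometry" step can be sketched without building the buffer polygon at all. This is an illustration only: it assumes planar coordinates in metres (for example a projected grid), and the function names, route and B&B positions are made up. The talk indicates the real services sit on a spatial database rather than code like this.

```python
from math import hypot


def distance_to_segment(p, a, b):
    """Shortest planar distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)  # degenerate segment: a single point
    # Project p onto the line through a and b, clamped to the segment ends.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_buffer(points, route, radius):
    """Names of points lying within `radius` of any segment of the route."""
    hits = []
    for name, p in points.items():
        nearest = min(distance_to_segment(p, a, b) for a, b in zip(route, route[1:]))
        if nearest <= radius:
            hits.append(name)
    return sorted(hits)


# Hypothetical route and B&B positions, in metres.
route = [(0, 0), (1000, 0), (1000, 1000)]
bnbs = {"bnb:1": (500, 200), "bnb:2": (500, 900), "bnb:3": (1250, 500)}

print(within_buffer(bnbs, route, 300))
```

Here `bnb:2` is dropped because it is 500 metres from the nearest stretch of the route, outside the 300 metre buffer.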
14:24
And again, you can see down here, this is all just chaining a whole bunch of URIs together. So basically, we've loaded in a geometry, we've buffered a geometry, we've loaded in some points, we've then done a spatial query to say, find me all the things within that buff,
14:41
all the B&Bs within that buffered geometry. It all sounds messy, and this looks messy, but when you do it in your actual code, it's not that bad. But the idea is each one of these is a very, very simple component that does one thing very well. So you've got a buffer service, a geometry ingestion service, a spatial query service, and we can just chain these all together.
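Chaining by URI amounts to passing one service call, URL-encoded, as a parameter of the next. The sketch below shows only that nesting pattern. The host, the service paths (`buffer`, `ingest`, `within`) and the parameter names are hypothetical, since the talk does not spell out the real RAGLD URL scheme.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://example.org/ragld"  # hypothetical host and path


def service_url(service, **params):
    """Build a GET URL for a service, URL-encoding every parameter value."""
    return f"{BASE}/{service}?{urlencode(params)}"


def unwrap(url, param):
    """Recover one (decoded) parameter value from a service URL."""
    return parse_qs(urlsplit(url).query)[param][0]


# Hypothetical URI identifying the ingested route geometry.
route = "http://example.org/id/geometry/route-1"

# Each step takes the previous step's URL as an input parameter.
buffered = service_url("buffer", geometry=route, distance=500)
stored = service_url("ingest", source=buffered)
query = service_url("within", container=stored, index="bnb")

print(query)
```

The final URL is long and hard to read because every nested call is percent-encoded again, which matches the "big, long, horrible URI" on the slide; in code it is three short function calls.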
15:03
And again, once you've done the buffering query, you can create a new geometry, a multi-point geometry, from the B&Bs that are in that buffered geometry, re-ingest that into a RAGLD tool and display that on a map, which is this big, long, horrible URI here.
15:22
So again, all of this is done just by passing URLs or URIs into each other. So that's one example of something we did. Perhaps another interesting example is this: I do linked data as a hobby, sad as it sounds, as well as doing it at work,
15:41
and I actually created my entire family tree as linked data and linked that to DBpedia. So we thought that would make quite a nice example of a RAGLD service. So when I talked earlier about some of these relationship services, what we did was we created a simple MySQL database which had a list of relatives.
16:01
So you've got people and their parents. So one column is a person, the next column is their parent, simple as that. What you can do is you can then change the various settings in the parameter files. So to generate from that simple database, you can then generate a grandfather service or a grandparent service. So you can say, return me everything,
16:21
which is two hops away from the original person, or an ancestor service, which will do the full transitive closure, or a descendant service if you want to go back the other way. Or you can put siblings in and say it's symmetric, so you've formed a sibling service. So each of these widgets on here was basically formed from a simple RAGLD service.
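The "two hops away" idea can be sketched directly on a person and parent table. This is a minimal illustration with invented names and function names, not the RAGLD implementation: one helper walks a fixed number of steps up the table, and flipping the columns gives the descendant direction.

```python
def hops(pairs, start, n):
    """Identifiers exactly n steps from `start` along (person, parent) pairs."""
    parents = {}
    for person, parent in pairs:
        parents.setdefault(person, set()).add(parent)
    level = {start}
    for _ in range(n):
        # Replace the current generation with everyone one step further on.
        level = {p for person in level for p in parents.get(person, ())}
    return level


def inverse(pairs):
    """Swap the two columns, turning a parent table into a child table."""
    return [(b, a) for a, b in pairs]


# Made-up family: one column is a person, the next is their parent.
family = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
    ("bob", "erin"),
]

print(hops(family, "alice", 1))           # parents
print(hops(family, "alice", 2))           # grandparents
print(hops(inverse(family), "dave", 2))   # grandchildren
```

An ancestor service is the same walk continued until no new people appear, which is the full transitive closure mentioned in the talk.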
16:41
So you select a person, tells you the siblings, parents, the grandparents. On the bottom of the screen, it's got a list of descendants as well, disappeared off. Just to show you, we weren't just gonna use our data. Some of my relatives were born in the States, so you can actually also use OpenStreetMap
17:01
or any other map-based API of your choice in the RAGLD toolkit. So this is showing you the place of birth of, who is this, Clarence Burdett. No idea who that is, but obviously someone was related to him at some point. And what this window is actually showing you
17:21
is this is actually going to the linked data for my family hosted on my website at home and actually retrieving all of that information, formatting it nicely, and populating it in a window there. And you can do that to any of these other relatives. So again, what we did was we took one of our developers
17:40
at OS, who hasn't got any knowledge of linked data or web services or developing these kinds of applications, and sat him down with the RAGLD toolkit. And I think it took him probably just over a week to put all of this together, so he had a bit of a steep learning curve. But that was basically the idea: for people who want to use linked data or open data,
18:02
they don't want to get too bogged down with SPARQL endpoints, or maybe they haven't got as far as fancy gestures yet. You know, it's to let them just build applications like this; we just make it easy, or easier. So just to summarize, what we did end up building was a set of tools and technologies
18:22
to make it easy to select data, filter data, manipulate data, visualize data, transform data, and hopefully communicate it and bring all of these things together in a number of simple components to create interesting applications. And at the moment, we're kind of in the position
18:41
where we're trying to figure out what we're gonna do with it. So it seems very useful. We're trying to figure out, are we gonna open source it? Are we gonna, so that's currently, I say watch this space. We're not quite sure where, how we're gonna, sorry, forgotten the word, but basically what we're gonna do with it. So that's currently where we're at now.
19:01
But that was the simple idea behind the RAGLD framework, a bit of a whistle-stop tour, I know: lots of components, chain them all together, build applications in a relatively straightforward manner. And that's about it.
19:39
It's research level code, yeah.
19:42
I don't know if that says enough. Yeah, yeah, I mean, we basically did it. We did enough to do the proof of concept and to prove we've done it. We haven't quite productionized it and released it as a commercial or open source thing that people could then, we'd be happy with people to take on, but that's where we're at now.
20:02
Exploit, that was the word I was looking for, yeah. Mm, yeah, so it's the,
20:33
it's basically, it's the same kind of thing.
20:51
So the idea is we wanted to make it just a lot easier. Not everyone wants to get, you know, down with SPARQL.
21:01
A lot of people find that very difficult. And the idea is we thought it'd be far easier if we just said the only database you've got to worry about is two columns. You then just almost tick a little box in a parameter file that says that this is transitive. And then you've basically just built a transitive closure on your data
21:20
and it's just easy, perception of ease of use, yeah.
21:44
For the buffered geometry, yeah. Yeah, so, ah, so you can create a URI, but yeah, it's up to you then
22:00
if you want to then sort of ingest that into the database and to persist it. So it can be either, but yeah. You don't get an infinite number of them, don't worry. Unless you want to.
22:44
So in RAGLD, the idea was to create, I don't know if this is going to answer your question, but the idea was to create just very simple RESTful type services. So you just have, for example, a containment service, and you have one parameter that you put in, which will do a simple spatial containment type query for you. So it's all fairly simple RESTful type queries.
23:16
Yeah, so this also has a SPARQL query endpoint
23:20
should you want to do that. So as part of this work, we did also create a tool to let you build a SPARQL query service on top of a triple store and to actually publish linked data as well. So behind the scenes, you've got a choice: you can either use Postgres, PostGIS, MySQL, and just have simple pair relationships in a store,
23:42
or you can actually power it via a triple store and do a SPARQL query. So we didn't really try that. I mean, one of the reasons
24:00
we used the simple pair store was because the sameAs data that originally motivated us did get pretty big. I mean, not big data big, but it got big. And the idea was that with just two columns, it was quite easy to build something that was very fast, but we haven't really done any benchmarking or anything.
24:40
Beeple, yeah, yeah. We did look at Beeple when we were doing this. I mean, just decided it was too, a bit too overblown for our uses, a bit too complicated. Yeah, a bit heavy, heavy. It's probably a better way of putting it, yeah.
25:01
So it's a similar idea and we did investigate using it.
25:24
Yep. I've seen some data, some functions also identified by URI. Yep, mm-hmm.
25:40
Yeah, that's exactly it, yeah. And this doesn't, it just also takes shape files as well. Oh, sorry, yeah, sorry, it's not, yeah. Instead of saving that. Yeah, yeah. Oh, right, sorry, yeah, yeah.
26:04
Sure, and I think this was designed to be a bit more lightweight, a bit easier, based on open-source tools that people that are maybe not familiar with ArcGIS, so the guys from Southampton who worked with it, they've never done any GIS at all. They've never even heard of Arc, you know?
26:21
But they, and then when we showed them it, they were just, oh, I'm not interested. I want to do something simpler and free. Sorry if there's anyone from this room here.