Disparate data, technology fiefdoms and 65 pictures of your cat
Formal Metadata
Title | Disparate data, technology fiefdoms and 65 pictures of your cat
Number of Parts | 188
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/31722 (DOI)
Production Year | 2014
Production Place | Portland, Oregon, United States of America
FOSS4G 2014 Portland, 47 / 188
Transcript: English (auto-generated)
00:00
How are we all doing? A little bit tired, a little bit warm, feeling like a bit of a nap. It's a good time to do that. Let's have a bit of a nap. We're going to talk about open data. Let's get going. So data comes in many shapes and forms. As geographers, we use data every single day.
00:21
But we should note that data is on an infinite spectrum of possibility only confined by our common understanding of the universe. So in this example, we understand temperature. We probably get the idea of check-ins. I know that 2% is 2 in every 100. I've got a strong familiarity with ice cream.
00:43
And this factoid is somewhat geographically contextual. It's coming from Foursquare. So that's cool. The point here is that I can understand this data point without much further explanation. It kind of makes sense to me. It's a kind of solid, general statistic. One which has been derived from a vast array
01:02
of crowdsourced Foursquare data. So it's coming from a whole bunch of other little data points to make one big data point. It's also relevant in our greater understanding of things like ice cream and marketing, perhaps check-in behavior, and even summertime habits of humans. So it sits on what we'll call a wide and open spectrum.
01:25
So my name is Will, and I have a problem. I like data. I do lots of stuff with data. I like trying to understand it. I like the complexity of data. It's kind of like detective work. It's deductive, yeah? I started SparkGeo, which is our little company,
01:41
four years ago. But even before that, I was deeply embedded in data, doing geostatistical analysis, doing NDVIs, doing a whole bunch of stuff with remote sensing, doing spatial distribution of chemotypes of Scots pine saplings in the soon-to-be independent
02:01
country of Scotland, I'll point out. I helped clean up corporate data sets. I helped do a whole bunch of stuff since coming across the pond. I've been analyzing forestry data and resources data. But since we started SparkGeo, we've been helping with social networking kind of data. So the magnitude has increased enormously, and the type of data has changed a lot,
02:22
but it still comes down to data, data, data. I would imagine that for a lot of you, the story is somewhat similar, that every day you're messing around with pretty weird data. And that's one of your central value propositions is that as a person, you know how to deal with that stuff. It's a GIS thing. So SparkGeo is a technology company.
02:41
There was a relatively recent time when clock speed and pixel density and stuff like that, and specification would drive technology. Well, that's kind of changed where we care a lot more about features because specifications have got to a point where we're all kind of happy. Computers go pretty fast, internet goes pretty fast. We care a lot more about experience,
03:01
and experience more often than not is driven by data these days. So that means that SparkGeo is actually a data company. Data drives your experience of the internet. And in fact, data drives many other parts of the world. It's a measure by which SparkGeo is graded.
03:22
It's a measure by which I would argue we are probably all graded in one form or another. In the end, it doesn't really matter how good your map technology is because if the data's wrong, the technology is a bit of a failure. And it's galling when you've gone to so much effort to build a wonderful map. All the buttons work, all the icons look lovely,
03:43
but they're in the wrong place. That's difficult, people get upset with that. I think many other companies and organizations probably see themselves the same way. I would argue that many companies which used to be something else are probably now data organizations or data companies in some way or form.
04:01
Certainly municipalities have become such. So as a technology company, we're also a data company. We live at the intersection of technology and data within the context particularly of geography. Open data is awesome. So here's my one kind of yay to everybody.
04:24
It's worth noting that in BC, we're super lucky. I'm gonna say I live in BC, so, BC. We're really lucky that the provincial government's done a fantastic job. We've got lots of resources and over the years, the access to data has got better and better and better and better.
04:41
So first up, I wanna congratulate all those people who made that happen and some of them might be here. I'm not sure. There are certain people in the conference who certainly are involved in that process. But I think the story of kind of progressive openness around data is one that is witnessed kind of across the board.
05:01
I think we see a lot more openness, a lot more data publication, states, provinces, cities, regions, lots more data out there. So that's a great thing. Yay, great job, guys. But you might have noticed, I mentioned it earlier, I'm a Scotsman, which means I'm never actually happy or terribly satisfied with the situation.
05:24
So, as a dour Scotsman, I'm gonna tell you a story. So I come from a little town in the north of British Columbia. It's a map of my little town. So some of you might get it, some of you might not.
05:41
My story starts with a hackathon we held in Prince George. We were looking specifically at open data for the city and regional district to find out, you know, we just wanted to get a bunch of technologists together. We're a small resource town, so technologists are few and far between. So the opportunity to network's really good. The opportunity to talk about open data is really good.
06:01
Our municipalities and cities have just been releasing data, so it's really cool to sort of hack away on that. So we had various teams doing various different things. There were different ideas that they followed up on. One team in particular had this problem they wanted to solve. They had a simple idea. They thought, hey, we wanna compare the budgetary financials of different municipalities
06:22
with each other and find out where you get the best bang for your tax money. We wanna understand where I should live to get the best services for the lowest costs, you know, where's the best place? So this is, like from a business perspective, that makes perfect sense. The idea would be to give the consumers, the citizenry, an idea of the best value municipality
06:41
to move to. Seems reasonable, seems interesting. Turned out to be quite a tough, tall order. And that's mainly because no one is really talking the same language. No one is talking the same language. And by language, I don't mean
07:01
spoken, written, programming languages, or even data transfer formats. I'm talking about the raw, absolute data points, the numbers. The numbers published by different municipalities mean different things, which means there was no opportunity for any level of comparative analysis.
07:22
The hackathon team were left comparing apples with oranges. Because of the vast spectrum of data we talked about, the municipalities of BC had found themselves seeing and measuring the financial world in slightly different ways. And that slightly different perspective led to slightly different financial data products, which meant completely different data products,
07:43
which meant no dice for the hackathon team. The point here is not to beat on those municipalities. It's not really, you know, they've come through very troubled waters to get to the point where they are releasing data. But the point is highlighting that perhaps there's an opportunity cost in general
08:00
around this kind of stuff. In review of the appropriate data of the comparative analysis of budgetary data, we found that a whole bunch of different technologies were at play. Different technologies, different platforms, different, a whole bunch of different stuff.
08:21
Each technology was providing data in a slightly different way. In the geospace, we also see a whole bunch of tools and technologies. There's a gazillion different tools for different jobs. And maybe that's a good thing, maybe it's a bad thing. We'll see. The expectation of the hackathon team was not that they would find exactly the same thing. I think that would be unrealistic. But that they would find maybe different dialects
08:43
of the same language, you know? The things that are common enough that you can mash them together in a meaningful way. I'm a geo guy. I knew that was gonna be the outcome. I looked at that, I thought that's a great idea, guys. You should do that.
09:00
Secretly wondering if they would have some special sauce that I hadn't seen before that I could steal from them and use in my work. And I thought, this could be a really cool thing. Maybe they've solved the problem. But being personally quite validated by the fact that it didn't work, you know? And that this is the problem I face every day and thank God I haven't missed a trick.
09:24
The real thing is that the barriers to this problem are many and complex. There are human barriers, there's technology barriers, there's technology environment, security, FOI, licensing, vendors to consider.
09:40
There's a bunch of consideration. But that got me thinking. It got me thinking hard about, do we actually care about any of those considerations? How long will it be before you move on to the next piece of software serving or disseminating your data? When will the next high speed internet format come out?
10:04
It's worth considering the process of just publishing an open data website just because you can. Maybe that's not such a good thing. With this in mind, the real value of data is of course the data,
10:21
not necessarily the technology housing it. This is an important thing. Or indeed the software supporting its distribution, it's the actual ones and zeros, it's the data. The values in those tables, more so, and the value of each data point increases every day as well. As the temporal depth increases,
10:42
the amount of actual value increases too because you have more information. I mean, Landsat for instance is a hugely valuable dataset because of its longevity. And that happens entirely independently of the software or the technology.
11:00
That happens because of the data and its age and the consistency of its capture. So we should make sure that we are capturing and publishing the right data because if we're not, then again we face this idea of the opportunity cost to our investment in that data.
11:20
So back to the hackathon. Context is really interesting. Context is a really important thing. Without context you get a skewed impression of what our world actually looks like. You might be confident in knowing that your little piece of the world is just right. But unless you have a good idea
11:41
of what's happening around you, you kind of end up with the map chicken. You end up with this idea that, you know, you've got your piece right, and I don't really care what everyone else is doing. So you don't have this idea of context. And this is an, you know, an extension of this is the idea that we should generate an enormous value
12:04
to our data by publishing it in commonly understood manners. So let's take cats, for instance. The University of Abster did a wonderful study. They found that there's 14 billion images
12:21
of domestic cats on the internet. Of which 2.7% have bread around their heads. Indeed, there's only 220 million domestic cats in the world. Which leaves us with the problematic situation that there's 65 pictures of every single cat on the internet.
12:42
What's the point here? The cats, what's the point? The massively popular phenomenon of cats on the internet is the combination of cuteness, convenience, and compatibility. Think about it this way, each cat data point is commonly understood by both the computer and the person. There's only a few popular image formats.
13:03
And for the most part, they're well documented, well understood. The ability to take a picture of a cat is somewhat ubiquitous, it's easy to do. And these data points are perhaps just slightly different dialects of the same language. So they're easy to share, they're easy to manipulate,
13:20
and they're easy to reuse. Oh wait, isn't that what we want from open data? Consider the multiplication factor that we had with the temporal nature of data, and then consider what the network effect is if we commonly publish the comparable data sets.
13:41
If we understand each other with different dialects of the same language. This is the data utopia I think we need to strive towards. Is this easy? No. No, this is really hard. This is actually really, really hard. But what's the first easy thing to do?
14:00
What's the first easy geo thing you can do to make your data readily available to everybody else on Earth? The easy thing is to do that. Publish your data in two well understood projection formats. I'm sure that your local conic conformal
14:23
measures the area way better, and it's got better distance, but the rest of the world, the rest of the web mapping world who want to join things together, they care about Web Mercator. We can beat up on Web Mercator all we want, that's fine. Problem is it's there, it's a reality.
14:41
So it's typically either a button push or a single line of code to also publish your data in a commonly understood projection system, to get it in a commonly understood manner that anyone, if they want to, can just say, hey, whoa, yeah, I can consume that into my web map. It's the same kind of thing as this thing.
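As a rough illustration of how small that "single line of code" step can be, here is a minimal sketch in plain Python. It assumes the data is already available as WGS84 longitude/latitude pairs (EPSG:4326) and derives Web Mercator coordinates (EPSG:3857) with the standard spherical Mercator formula. The function names and the sample point are hypothetical and are not from the talk. Going from a local conic projection back to longitude/latitude needs the inverse of that projection, which in practice is a job for a projection library or a GIS tool rather than hand-written code.

```python
import math

# Radius of the sphere used by Web Mercator (EPSG:3857), in metres.
EARTH_RADIUS_M = 6378137.0
# Web Mercator is conventionally cut off at roughly this latitude.
MAX_LATITUDE_DEG = 85.051129


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert WGS84 lon/lat in degrees (EPSG:4326) to Web Mercator metres (EPSG:3857)."""
    # Clamp the latitude so the projection stays finite near the poles.
    lat_deg = max(-MAX_LATITUDE_DEG, min(MAX_LATITUDE_DEG, lat_deg))
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def publish_both(points_lonlat):
    """Return the same points keyed by the two commonly understood systems."""
    return {
        "EPSG:4326": list(points_lonlat),
        "EPSG:3857": [lonlat_to_web_mercator(lon, lat) for lon, lat in points_lonlat],
    }


# Hypothetical sample point, roughly in the area of Prince George, BC.
sample = [(-122.75, 53.92)]
published = publish_both(sample)
```

Publishing both outputs side by side means a web map can consume the Web Mercator copy directly, while anyone who needs plain latitude and longitude still has it.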
15:02
I can get roads from Alberta, or I can get roads from BC, and I can have Western Canadian roads. This is awesome, you know? The hard bit here is not the technology. The hard bit here is actually the advocacy and the willingness to commit to what I'd like to call a commonwealth of data,
15:23
a commonwealth of data formats, a commonwealth of sort of data lumps that we can all access. I think that is the key takeaway here. My point? The key thing is that, for instance,
15:42
every individual municipality's data becomes more valuable the more it can be commonly understood in the context of other municipalities. I keep on beating up on municipalities. That's not really fair. I just mean entity that publishes data. Let's say that, for instance. Companies could also be doing this.
16:02
Every province, territory, state, entity, company, data becomes more useful when it can be placed within a much bigger context. In short, I propose that we congratulate ourselves on making a huge leap forward in publishing data, but we start thinking a little bit more about what to publish.
16:22
We start talking to each other, and ideally, we kinda try and publish the same thing. And there's a picture of a cat I found on the internet. I thought you might like it. That's me. Thank you very much.
16:47
So you said publishing in 4326. Isn't that just the technology of the day? Where do we draw the line? Where do we draw the line? When the world's a different shape. No, I mean that. Yeah, I agree.
17:01
I agree that 4326, well, and 3857 to some extent, they're sort of indicative of the technologies that we're using right now. But I think also lat/long in general terms, and WGS84 in general terms, is probably not gonna disappear until we have a different shaped Earth,
17:21
because it's the most convenient way. And frankly, we're measuring latitude and longitude as our kind of de facto global measurement system for the globe, I guess. So if there's a better one, awesome, let's present it. Let's get out there. But I'm not sure there is right now. And we could probably blame,
17:44
let's blame OpenLayers for the sake of it, but we could also blame Google and Bing and all these other guys for joining together and doing the same thing. Or we could say that's an awesome approach, and now we can all publish our data in the same thing, and no matter what manner we want to display that data,
18:01
it's readily available. But how about compared to, for example, the OGC standards, are they ubiquitous enough to be considered the language we should choose to support until the Earth changes? Sure. But I'd also argue that the OGC can provide their standard
18:23
but we could spend an awful lot of time jumping onto that standard and doing that, or we could do this thing that's gonna work on the platforms we have right now. So I mean, there's a pragmatic piece here,
18:40
which is an easy thing to do to get your data to everybody who's using a web mapping application which only understands one of a few projection systems is to press that button, is to write that line of code that says transform as and cache me. It just seems like a very straightforward approach
19:01
to getting over the hump, which isn't necessarily an OGC hump, it's a global kind of use of data hump in that we want more people to share more data. I think that's something that we'd all like to see. And a quick way of doing that is publishing it in a commonly understood projection system.
19:25
So you mentioned at the beginning that the municipalities didn't just have different data formats, but that the numbers meant different things. And so I see publishing standards as a good way of normalizing the publishing of that kind of data,
19:40
but how do you get municipalities to start tracking the same numbers and talking to each other in the same language? Well, I think talking to each other is the magic. And also the, yeah, there we are. And also the hardest piece of the puzzle. I mean, when it comes down to it, it's a human decision, what data you track.
20:00
And I think the open data trap is, hey, it's easy, we got this thing, turn it on. And the harder bit is where you think, okay, we should actually have some kind of understanding of what is commonly useful to the community. And maybe that involves a manipulation on the municipality's end.
20:21
My experience is that typically if you make it harder to release open data, it typically doesn't happen at quite the same velocity. So that's a risk for sure. But I think the network effect of people talking the same language and being able to be, at some level, comparable with each other is enormous.
20:41
I think there's huge value there. And I think each individual municipality or state, province, country can actually leverage that themselves. I think there's a value to them as well.
21:03
So in terms of the sort of interoperability you're talking about with data from multiple municipalities, multiple sources, what besides the SRS, what parameters are you running into because I probably don't have the greatest grasp
21:22
of using different, of this problem, but what are the parameters? The key thing is that people publish different stuff about the same thing. So they'll call it, they might call it the same thing, but it's an entirely different entity. So the columns are different.
21:41
They hold different information. So in essence, it's comparing apples and oranges. So it sounds like what you're really talking about is developing a standard set of ontologies. Yeah, oh, I don't use those words because they're really long. Well, other people do who are trying to develop
22:01
those standards. It's a standard set of ontologies. It's a semantic understanding of what data would be useful in the community and blobbing ourselves into that. Thank you very much, guys.
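The "columns are different" problem from this last exchange can be sketched as a simple mapping onto a shared vocabulary. The following is only an illustration: the publisher names, column names, shared field names and values are all hypothetical, and a real agreement on what each field means would have to come from the publishers talking to each other, as the talk argues.

```python
# Hypothetical shared vocabulary that a group of publishers might agree on.
SHARED_FIELDS = {"municipality", "year", "road_maintenance_cad"}

# Hypothetical mappings from each publisher's local column names to shared names.
# None marks a local column with no agreed equivalent (an "orange" among "apples").
FIELD_MAPPINGS = {
    "city_a": {
        "name": "municipality",
        "fiscal_year": "year",
        "roads_budget": "road_maintenance_cad",
    },
    "city_b": {
        "muni": "municipality",
        "yr": "year",
        "transport_total": None,
    },
}


def harmonize(publisher, record):
    """Rename a record's columns to the shared vocabulary.

    Returns the renamed record plus a sorted list of shared fields that the
    publisher does not provide, so gaps are visible instead of silently compared.
    """
    mapping = FIELD_MAPPINGS[publisher]
    harmonized = {}
    for local_name, value in record.items():
        shared_name = mapping.get(local_name)
        if shared_name is not None:
            harmonized[shared_name] = value
    missing = sorted(SHARED_FIELDS - set(harmonized))
    return harmonized, missing


# Made-up records purely for illustration.
record_a, missing_a = harmonize(
    "city_a", {"name": "City A", "fiscal_year": 2014, "roads_budget": 100}
)
record_b, missing_b = harmonize(
    "city_b", {"muni": "City B", "yr": 2014, "transport_total": 250}
)
```

In this sketch the second publisher's `transport_total` is deliberately left unmapped: it carries a number, but not one that means the same thing as the shared road maintenance figure, so it is reported as missing rather than compared.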