Open Source Discoverability: the journey of unearthing an abundance of hidden data
Formal metadata
Title |
Series title |
Number of parts | 52
Author |
License | CC Attribution 3.0 Unported: You may use and modify the work or content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/44681 (DOI)
Publisher |
Year of publication |
Language |
Content metadata
Subject area |
Genre |
Abstract |
FOSS4G SotM Oceania 2019, 49 / 52
Transcript: English (automatically generated)
00:02
My name is Wing. I'm relatively new to geospatial, and I'm not a geospatial expert by any means. But I am a front-end developer, and I've chosen to apply what I've learned as a front-end developer to geospatial.
00:22
I work at CSIRO's Data61 back in Australia. And if you have any questions that we can't get to during question time, you can find me on Twitter on that handle at soy sauce. So when I read my own talk title, I kind of realized it was a bit confusing. So I'll kind of preface this with a bit of what
00:43
this talk isn't going to be about. It's not about how to encode or describe your data. It's purely about exposing the data that you've already got handy. The next one is, we're not surfacing any data that doesn't exist. We can't really pull data out of thin air, although that's almost what we're doing here.
01:01
But it'll make sense once we get into it. We're not exposing any data that's not accessible by a web browser. Terria takes data and displays it in a web browser, so that's the sort of data that we'll be surfacing. And we're not making your open source code base discoverable. I'm not going to tell you any growth hack tips on how
01:22
to get more stars on your GitHub repo. But if you know how to do that, you can tell me. So what it will be about is how we began our journey of making Terria more discoverable. Terria is a JavaScript web app.
01:40
And how we kind of did this while maintaining our open source ethos. So we wanted to apply an approach that worked with the way that we work in the open. And the technique will be as broad as possible, so this will be broadly applicable to every single-page app out there, while staying free.
02:03
So we didn't want it to cost anything additional to run a Terria map. We just wanted to add this new cool thing without having to increase the cost. And I'll give a crash course in search engines, or web crawlers. So I need to firstly tell you a little bit about Terria,
02:21
which is a web geospatial visualization tool that runs in your browser. It's a single-page application. And when I say single page, I mean it. So literally, everything that you see here is everything in Terria. You've got a map on the right-hand side, and you've got UI toggles on the left-hand side that kind of change
02:41
what you see on the right hand side. There's no notion of a separate page. However, we do have this thing called the data catalog. It's essentially a tree structure of a bunch of data sets that you might want to curate and show to your users. And you can only access this catalog
03:00
by clicking this Add Data button that's just in the top left. So that right there is an HTML button. Now, search engines: they're pretty big. One of the first ports of call that we have when we have a query or a question
03:20
is that we immediately think, oh, I'll just go and Google it, or Bing it, or DuckDuckGo it, or whatever it is that you like to do. So given that, I think it's pretty important to understand how they work, or how they view things on the web. This is a very contrived example. The HTML that's there doesn't really matter. It's just there for example's sake.
03:40
So search engines do a thing called web crawling, where they crawl your website. Let's just say we have this three-page website. It's got some content on it. And more importantly, there's a couple of a tags, or links, to more pages on your website. So the web crawler will go and see your home page.
04:02
And it'll be like, oh, look, there's more links to more pages. I'm going to go to those pages. And so it will go and visit reallyamazingthing.com slash great dataset and then do the exact same thing. It'll loop through and crawl across your website to get a picture of what it is that you've
04:20
got on your website. So this is what a search engine result page would look like for this three-page website. And again, this is a really basic example. It's a bunch of HTML pages that we link together. Search engines really only want to serve relevant, useful, and meaningful information
04:41
to their users. Going back to Terria: I look at this, and I'm thinking, where's the meaningful content? There's nothing for me to serve up to the search engines, right? How do we even make a map like this indexable, or have data for search engines to crawl? Well, there isn't any.
05:01
And that's why I drew so much attention to this Add Data button, because that's the part of our application that's like a gateway to a whole bunch of information that is basically invisible to search engines. We could surface this whole bunch of data and more for each of the data sets that we have in Terria.
05:21
And some of the Terria maps that we have have hundreds and even thousands of data sets that are completely invisible. This boggled my mind. So of course, we set out to fix it. And so going back to static HTML sites, it's 2019.
05:41
Terria isn't a static HTML website. It's all about the single-page apps now. So let's see what the HTML looks like for a single-page app. It's an HTML document with one lonely div in the middle, right there. So it's like, oh, OK. Hello, I'm a div with an ID of app. So that gets served up.
06:00
And its sole purpose in life is to wait there for your JavaScript to be parsed and downloaded and ready so that your JavaScript will go in and take over and bring that div to life. Notice that we don't have any anchor tags or any links to other pages on this. So the search engine is like, oh, this is just one page. I'm done crawling this website. So it literally can't see all the other pages.
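To make the crawler's view concrete, here is a minimal sketch in Python of a crawler that follows only anchor tags and never runs JavaScript. The in-memory sites, the /about page, and the bundle.js file name are illustrative assumptions; real crawlers are far more elaborate, so treat this as a toy model of the idea rather than how any particular search engine works.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, the only thing this toy crawler follows."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(site, start):
    """Breadth-first crawl over `site`, a dict of URL -> HTML (a stand-in for HTTP)."""
    seen, queue = [], [start]
    while queue:
        url = queue.pop(0)
        if url in seen or url not in site:
            continue
        seen.append(url)
        parser = LinkCollector()
        parser.feed(site[url])
        # Resolve relative links against the page they were found on.
        queue.extend(urljoin(url, href) for href in parser.links)
    return seen


BASE = "https://reallyamazingthing.com/"

# Hypothetical three-page static site linked together with anchor tags.
static_site = {
    BASE: '<h1>Home</h1><a href="/great-dataset">Great dataset</a><a href="/about">About</a>',
    BASE + "great-dataset": '<h1>Great dataset</h1><a href="/">Home</a>',
    BASE + "about": "<h1>About</h1>",
}

# Hypothetical single-page app shell: one empty div and a script, no links.
spa_site = {
    BASE: '<div id="app"></div><script src="/bundle.js"></script>',
}

static_pages = crawl(static_site, BASE)
spa_pages = crawl(spa_site, BASE)
print(len(static_pages), "pages found on the static site")
print(len(spa_pages), "page found on the single-page app")
```

The static site yields all three pages, while the single-page app shell yields only the home page, because there is nothing in its HTML for the crawler to follow.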
06:22
So if a search engine sees a link to your website from someone else's website, it'll go and visit it. But it's the same thing. Whichever page a search engine hits, it just sees more empty divs, or blank pages, essentially. And so again, in 2019, we were like, oh,
06:40
we've done away with HTML. We want a div with an ID of app. We're all building applications like the one at the bottom right there. Then there are apps that rely on HTML, like Slack, which was mentioned earlier. If you don't know what it is, it's an instant messaging application. Their website says most links will automatically expand to show a preview of their web page.
07:02
Slack doesn't know what to do with a div ID of app. It needs to see actual content on there for it to preview to you, because it's not going to run JavaScript to parse everything. Similarly, it's the same for search engines. They will not know anything about your thing, because they just see empty content, same with Slack.
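As a rough illustration of why a link preview comes up empty, the following Python sketch pulls a title and description out of raw HTML without executing any scripts. It is not how Slack actually builds previews; the two HTML snippets and the choice of the title tag and the description meta tag are assumptions made for the example.

```python
from html.parser import HTMLParser


class PreviewParser(HTMLParser):
    """Reads <title> and <meta name="description"> from static HTML only."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attributes.get("name") == "description":
            self.description = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def preview(html):
    """Return what a script-free client could show for this page."""
    parser = PreviewParser()
    parser.feed(html)
    return {"title": parser.title.strip(), "description": parser.description}


# Hypothetical single-page app shell: the head is filled in later by JavaScript.
spa_shell = (
    "<html><head></head><body>"
    '<div id="app"></div><script src="/bundle.js"></script>'
    "</body></html>"
)

# Hypothetical page whose HTML already carries the content.
rendered_page = (
    "<html><head><title>Great dataset</title>"
    '<meta name="description" content="Monthly rainfall totals.">'
    "</head><body><h1>Great dataset</h1></body></html>"
)

shell_preview = preview(spa_shell)
rendered_preview = preview(rendered_page)
print(shell_preview)
print(rendered_preview)
```

The shell produces an empty title and description, which is all a client that does not run JavaScript has to work with.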
07:21
You can have the richest and most extensive data on your website, but it will just be invisible. No one is going to know about it. People can't Google it, and that's how they get data or get information. And I think that's really unfortunate. So you might be sitting there and thinking, oh, wait, really? That can't be true.
07:42
That's very nice. What's the point of doing all this work if nobody can find it? Surely the machines are smarter. Surely, surely, right? Unfortunately, it is true. Google will say that, oh, yes, we can index JavaScript. But that's a fallacy. It's true to some extent, but it will take weeks and weeks and weeks on end.
08:02
And it won't do it perfectly, because it's JavaScript. Other big search engines out there, Bing, they completely fall flat when they're trying to crawl single page apps. It's the same thing. They see nothing. Again, other applications like Slack or social media
08:22
even, when you're sharing links on it, that's the same thing as Slack. It can't do anything when there's nothing for it to preview. So thankfully, we have a solution for this. So it's called server-side rendering. You might be eerily familiar with this. Throughout the history of the web,
08:41
we've gone from static HTML pages linked together to HTML pages that get rendered out by PHP, which is a server-side language for dynamic HTML templating, and which powered a big part of the web. And we've gone all the way through to doing everything we did in PHP but in JavaScript. So we've gone full circle all the way back to JavaScript.
09:02
And this is the state of things if you want to fix this. Server-side rendering, what is it? It is ridiculously hard. It's bloody hard. It fundamentally changes the way that you architect your code. You kind of have to think about a whole lot of things,
09:20
including whether you're running in the browser or on the server, whether you have web APIs readily available to use, or if you're just running in the Node environment. And just remember, it's really bloody hard. And so yes, it's hard. But it's hard for a reason, because JavaScript. But mainly, the benefits you get out of doing it
09:41
is you get great UX benefits. You get fast first paint. So what I mean by that is once you serve up HTML to the browser, the browser can immediately parse that. And it can show something meaningful to the users instead of having to wait and stare at a blank page while the JavaScript gets downloaded, parsed, et cetera.
10:03
Search engines can index your site. So once you get this done, search engines can index. There are some downsides. Slow time to first byte, meaning now your web server has to go and render and parse your JavaScript before your user even sees anything because your user's requesting the page, and the server's like, oh, OK, hang on, wait a minute.
10:21
I need to render this first. Slow time to interactive. What this means is you get a bunch of HTML, but you can't really do anything with it until the JavaScript gets parsed because that's what the role of JavaScript is, to bring interactivity to your page. I still think that's better than staring at a blank white page while, if you're
10:42
on a really slow connection, that's just a really poor experience. And just remember, it's really hard. So how do we get all of these benefits without having to do the hard yards? It's not feasible for any of us at this conference to walk away and go back to work and be like, oh, OK, we're going to server-side render everything. There's an easier way.
11:01
Let me tell you about it. It's called prerendering. This is our golden savior. Overview, less stupidly hard. We get all the benefits of server-side rendering and more, and more meaning we don't have that slow time to first byte. The server doesn't have to do any thinking when
11:21
your users are requesting pages, because it already has the HTML ready to deliver. It just has to serve that down the pipe. A lot less architectural changes. And it's basically really bloody easy. So easy that I can explain it to you in three steps. All right?
11:41
Prerendering 101. So use a headless browser to render out all your pages, your routes. Store your HTML, and then you serve that back up. It's that simple. I'll break that down a little bit more. So headless browser and store HTML, that's kind of two steps together, happens at the same time. This is the prerendering part, where
12:01
you take a headless browser and you hit all your routes, as if you're a search engine, because you want to create an image on your own server of what your website looks like. And you kind of save the results of that into HTML. Webpack allows you to do all of this, and it's all bundled up nicely into a little plugin.
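The render-and-store part can be sketched in Python as below. The headless browser itself is deliberately left out: fake_render is a stand-in for "open this route in a headless browser and read back the final HTML", and the route names and the one-index.html-per-route directory layout are assumptions for illustration, not Terria's or any Webpack plugin's actual layout.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def output_path(out_dir: Path, route: str) -> Path:
    """Map a route such as '/catalog/rainfall' to '<out_dir>/catalog/rainfall/index.html'."""
    parts = [part for part in route.split("/") if part]
    return out_dir.joinpath(*parts, "index.html")


def prerender(routes: Iterable[str], render: Callable[[str], str], out_dir: Path) -> List[Path]:
    """Render every route once at build time and store the resulting HTML on disk."""
    written = []
    for route in routes:
        target = output_path(out_dir, route)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(render(route), encoding="utf-8")
        written.append(target)
    return written


def fake_render(route: str) -> str:
    # Stand-in (assumption): a real build would drive a headless browser here
    # and return the page's HTML after the JavaScript has finished running.
    return f'<div id="app"><h1>Page for {route}</h1></div>'


out_dir = Path(tempfile.mkdtemp())
files = prerender(["/", "/catalog/rainfall"], fake_render, out_dir)
for path in files:
    print(path.relative_to(out_dir))
```

Keeping the renderer as a plain function argument means the storage logic can be checked without a browser, and any headless browser driver can be dropped in later.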
12:22
But basically, that's what it does under the hood, right? Number three is configure your web server to serve up that HTML. So you make your home page serve up your original index.html that has some prerendered stuff in it, and then, say you've got some routes, you would serve the prerendered index.html for each of those,
12:43
and so on and so forth for the rest of the routes on your page. So you might be thinking, oh, well, that sounds really good, Wing, but what's the support for this strategy? React, it works, yes. Vue also works. Angular, luckily, yes.
13:01
And insert any JavaScript framework. It will work, right? At the end of the day, it's just HTML. All we're doing is we're utilizing web standards. We're using the power of the browser to get a snapshot of your end state HTML.
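The third step, pointing the web server at the stored HTML, comes down to a lookup from request path to file. Here is a minimal Python sketch of that lookup, independent of any particular web server. The file layout and the fallback to the root index.html for routes that were not prerendered are assumptions; the fallback is one way to keep the app working through JavaScript alone.

```python
from pathlib import PurePosixPath
from typing import Set


def resolve(request_path: str, prerendered: Set[str]) -> str:
    """Pick the stored HTML file for a request, or fall back to the app shell."""
    # Drop empty segments and dot segments so the lookup stays inside the site root.
    parts = [p for p in request_path.split("/") if p and p not in (".", "..")]
    candidate = str(PurePosixPath(*parts, "index.html"))
    return candidate if candidate in prerendered else "index.html"


# Hypothetical set of files produced by a prerendering build.
prerendered_files = {"index.html", "catalog/rainfall/index.html"}

home = resolve("/", prerendered_files)
dataset = resolve("/catalog/rainfall/", prerendered_files)
unknown = resolve("/catalog/not-prerendered", prerendered_files)
print(home, dataset, unknown)
```

Because the result is just a static file path, the same mapping can be expressed as rewrite rules in whichever web server is already in use.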
13:20
So if anybody ever tries to sell you something without at least mentioning some of the downsides, you should be extra suspicious, right? So here are some of the downsides. You would have increased build times, right? Because every time you build a new version of your application, you're going to have to go in and pre-render everything again. Increased time to interactive, I've gone through that.
13:40
If you've got lots of routes, this might be a bit tricky. You might have to think of some creative ways to get around this, because it's just going to get more and more unwieldy. If you've got any dynamic kind of user-specific content, you're going to want to make sure that you put some placeholder content there so that it's ready to get swapped in once the JavaScript is ready.
14:01
And so in Terria, all we had to do was just change one button to an anchor tag, and everything was fixed, along with a couple of other big things that we kind of did in the background. And so, other possible solutions for this problem. Of course, you can go in and actually do server-side rendering. You can use kind of at-edge services.
14:21
There's some things out there, like Cloudflare Workers and Lambda@Edge, that will do things. You can run extra code right at the edge, where your users are getting their pages served up. There's other service options that will let you do similar things a bit earlier. And there's other actual cloud-based solutions
14:41
that will pre-render and do all of this for you. So I'm not here to preach the one true way to do things. But I look at all of these options, and all I see is vendor lock-in. Sure, you can use any of these individually or in a combination or whatever. But if we included this by default with Terria, it would just drive up the cost of running Terria.
15:01
So we wanted to make sure we shipped something that was open-sourced by default. We wanted to avoid vendor lock-in. We didn't want to make any additional external dependencies. We didn't want to force any one platform onto our users. And we wanted to make sure that the app still kind of works even if they couldn't get whatever it is that we're running.
15:20
Like, this is all full-spec nicely. And so yesterday, I did a demo. Today, I'm going to do another live demo. No tech talk is nearly as exciting without something that can crash and burn. So we're going to try and do that now. And so this is a Terria map that I
15:42
need to put on the other screen. Display settings. I'm going to need to see it. That's all right. I think I've got it.
16:01
You can switch back, I think. Yes, thank you. Right, so this is a Terria map. It runs things. So we can put things on the map. And given good internet connection, we can. OK, so it's a map. It visualizes things.
16:21
But that's not what we're interested in here today. I'm going to go wild. And what we're going to do is we're going to go to our tools here and actually just turn off JavaScript completely. We don't need it anymore. It's 2019. Who needs JavaScript? So you can see that our single-page app that runs in JavaScript, it still kind of works.
16:42
I can still click into all of these pages, and we can see what the content is. This is essentially what you see as a web crawler or as a search engine. And this is one really good way to verify that it works, because you've turned off JavaScript, but you can still see everything on the page.
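The same check can be automated. The Python sketch below reads the text of a page while ignoring anything inside script and style tags, then reports which expected data set names are missing. The two HTML snippets and the data set names are made up for the example, and this is only a rough approximation of what a crawler extracts.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text outside <script> and <style>, roughly what a script-free client reads."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def missing_without_js(html, expected):
    """Return the expected names that cannot be found in the page's static text."""
    parser = VisibleText()
    parser.feed(html)
    text = " ".join(parser.chunks)
    return [name for name in expected if name not in text]


datasets = ["Rainfall", "Population"]  # hypothetical catalog entries

# App shell: the names exist only inside JavaScript, so they are not visible.
shell = '<div id="app"></div><script>render(["Rainfall", "Population"])</script>'

# Prerendered page: the names are present as ordinary links in the HTML.
prerendered = (
    '<div id="app"><ul>'
    '<li><a href="/catalog/rainfall">Rainfall</a></li>'
    '<li><a href="/catalog/population">Population</a></li>'
    '</ul></div><script src="/bundle.js"></script>'
)

missing_in_shell = missing_without_js(shell, datasets)
missing_in_prerendered = missing_without_js(prerendered, datasets)
print(missing_in_shell)
print(missing_in_prerendered)
```

An empty list for the prerendered page is the scripted equivalent of turning JavaScript off in the browser and still seeing every data set.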
17:01
All the content here can be indexed, and it's just not tucked away, it's not hidden. So that's all I have for you today. I have a booth. Let's have less of the left side, more of the right side. And if you have any questions, I'm very happy to take them.
17:20
Thank you. Hey, great talk. I've been working a bit recently with using structured
17:41
data to represent the metadata of data sets and web pages. Have you thought a lot about how you might use something like JSON-LD to support further SEO of your data sets? That is a tricky question. When you say structured data, you mean like structured metadata?
18:02
Yeah, OK. So yes, you can absolutely do that. That will help you give structure to the data that you're serving up. But you're still going to have the same problem of not having that structured data available for all of your pages unless you configure your web
18:23
server to set up that given structured data for every page on your single page app. So yes, that would still work. You'd just have to do it a different way, because you're not making it available on every page. One other quick question.
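For the structured data raised in the first question, one possible shape is a JSON-LD script tag per data set page, emitted into the prerendered HTML. The Python sketch below builds such a tag using the schema.org Dataset type. The data set name, description, and URL are placeholders, and which properties a given search engine actually uses is not covered here.

```python
import json


def dataset_jsonld(name: str, description: str, url: str) -> str:
    """Build a JSON-LD <script> tag describing one data set with schema.org terms."""
    data = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    # Escape "</" so the JSON can never close the surrounding script tag early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Placeholder values for illustration only.
snippet = dataset_jsonld(
    "Rainfall",
    "Monthly rainfall totals.",
    "https://example.com/catalog/rainfall",
)
print(snippet)
```

Generating the tag at prerender time puts it in the static HTML of each page, where a crawler can read it without running the app.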
18:50
Hello. So does it work? Does Google now index all of your data? Sorry, does it work? Does Google now index all of your data sets?
19:01
Yes, it does. I will show you that now all of our data sets are indexed by Google. And I'm not going to shame myself by showing a Terria map that isn't running this, because it would just be like the slide that I had where there's just a page in there. But yes, all hundreds of data sets
19:21
that we've added for this particular map, it works. It's all there. Yes, so it all still works once the JavaScript is loaded in. It's still an interactive app, as if nothing changed. It's just a little bit. The search engines would just see it a little bit differently in that they can actually see it.
19:40
It's now accessible to them.