Aggregating risk with H3 and PostGIS
Formal Metadata
Number of Parts | 351
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/69093 (DOI)
Production Year | 2022
FOSS4G Firenze 2022, 151 / 351
Transcript: English (auto-generated)
00:00
Okay, I think it's nine o'clock, we're going to start now. There's no moderator, so I'm going to moderate, I think. So I'm Mark Varley, I'm the CEO of AddressCloud. Speaking after me is going to be Sarah Hoffmann, who's talking about Nominatim and geocoding. And then after that, I think we've got some really interesting lightning talks as well. So yeah, I'm going to be speaking for the first
00:21
kind of 20 minutes, and we're going to have some questions, a bit of time for people to move rooms, and then Sarah is going to be up at 9:30. So I'm here today to talk about aggregating risk with H3 and PostGIS. So I'm going to give a little bit of background around the company, what we do. I'm going to dive in a little bit into the detail,
00:41
I've got a pre-recorded demo I'm going to show, and then kind of conclusion and questions at the end. So in terms of what we do, AddressCloud, we're a UK-based company. I like to say we were born at FOSS4G, so in 2013, I went to my first FOSS4G in Nottingham, was really inspired by all the talks that I saw,
01:03
and that gave me the idea to start a business. 2015, I went to Korea, and I met Dennis, but I also met my co-founder, Thomas, who's our CTO, and on from there, we've basically built the business up. So what we do is we provide a geocoding
01:20
and a location intelligence service. The geocoding service is UK and Ireland-specific, so we use government data from the UK and Ireland, and we ingest that, and we provide that back as a service to, predominantly to insurance companies, and then we have a location intelligence platform, which is all around for a specific address,
01:40
describing it, and providing a very, very fast risk analysis of that property, specific property level. So, as I say, our infrastructure is completely serverless, so I've talked at FOSS4G a few times and evangelized about serverless, and we live and breathe that, so in 2019, we moved our entire stack
02:02
to be serverless running on AWS, predominantly on Lambda. We now scale, we do 50 million transactions a month, so we work with big insurance companies, smaller insurance companies, and we also work with, have some government contracts as well. So, in terms of what I'm gonna talk about here,
02:21
I've actually stolen this, somebody, I saw a presentation yesterday where they put this quote up from Paul, I don't think Paul's here this year, but we're big fans of Paul Ramsey, we're like, all of his ideas are great, and this is really, this is something that we do. Really, to build something to scale, you need to kinda make it boring, so what I'm gonna show you today, the actual back end, the actual final result
02:42
is pretty interesting, but the architecture behind it is quite boring, and boring scales, boring scales really, really well. So, as I say, really, today, so for the first kind of, you know, we've been running since 2015, so the first seven years of our existence have all been around a specific niche, which is insurance,
03:02
and specifically around optimizing for one use case, which is find me an address really quickly, and describe it, tell me stuff about it really quickly. So, to do that, we precache everything, we preload everything, all of our geo-processing is all done ahead of time, rather than being done on the fly,
03:21
and that allows us to be able to scale really well. So, as I say, for that 50 million transactions a month, we have to be available 24/7, and we have to respond within half a second, that's the SLA that we have with all of our customers. If someone wants to get their insurance at 3 a.m., they have to be able to get that, so that means finding an address and working out all of the things
03:41
that would worry insurers. So, a few examples there, and there's a lot of geography in insurance, it's, if you're looking for a really interesting place to apply your geo skills, insurance is a fantastic, really fantastic domain. So, yeah, flooding, fire risk, subsidence risk, and crime being some of the key ones,
04:00
these are all things that you can use geo for to understand it and provide insight to customers. And we do this down at specific property level, so if you've got 20 properties on a street, maybe it's on a hill, for example, we'll differentiate, we'll say the property at the bottom near the river is a bad risk, the one at the top is fine. So we're doing this down to a very low level
04:21
of resolution, normally down to five meters. So, in terms of what we do, so today everything's been based on single risk, but what we're looking at now is another problem that all insurers face, which is making sure that they have a balanced portfolio. So what this means is bringing in a huge, a very large number of locations
04:41
and being able to provide some insight back to that company very, very quickly. So really it's all around kind of hotspots, do they have too much risk in one place, and also how they might be able to manage and mitigate that risk. And again, geography's a huge factor, so we talked here about floods and quakes,
05:01
but this would also be things like terror attacks and other things that could potentially mean huge losses for customers, but also for the insurance company, it could be an event that the company could actually fold, the company might not be able to bear an event like that.
05:20
The biggest example of this was, and really a lot of the work that we're doing today kind of got triggered really by the terror attacks, the 9/11 terror attacks. The company I was working for at the time was insuring through, like as a global insurance company, through a network of subsidiaries and localized companies,
05:41
they insured both towers and all of the retail units around it, and when the building came down, it was, this is a 300-year-old company, for the first kind of two or three weeks they didn't know whether or not the company was gonna survive that event. Luckily they did, through kind of fortuitous circumstances and reinsurance and things like that, they survived it,
06:01
but they wanna avoid doing this in the future if possible. And that means being able to perform some of these calculations in advance before you go and insure all of these properties, basically doing this in advance. So yeah, this is something we're building at the moment, this is kind of an early insight, but it's like can we basically,
06:21
can we take our serverless ethos and all of the things that we do and extend that out to, instead of just looking at one property, looking at a huge number of properties, but with a similar architecture and being able to provide very quick insight and very quick responses. And yeah, can we do it at scale? So we've got kind of five requirements
06:42
that we're gonna go through here, and then I'm gonna show a quick demo around how we propose to solve these. So yeah, we need to be able to visualize millions of locations, so the companies that we work with can have very large portfolios, so this thing needs to be able to support a very large number of locations.
07:01
We wanna be able to filter, so we don't really want, we don't know what the, necessarily the questions are that our customers, or our customers are normally the insurance underwriters, or the portfolio managers, we don't know what they're gonna ask, so they need to be able to ask any question and be able to do that really quickly. We also need to be able to aggregate,
07:22
so we're quite lucky in that most of what we're aggregating by are predefined shapes, but these could be anything from administrative boundaries, CRESTA zones, which is something specific for insurance around earthquake risk, earthquake risk zones, fire blocks, floods, and buildings, so from right up here,
07:41
right down to a low level of detail. We need to be able to do ad hoc as well, so in the event like, let's say we're in a country that doesn't have any good administrative boundaries, we still want a means to be able to aggregate that data up as well, and finally, as everything, it needs to be really quick, and it needs to be serverless,
08:01
we have an SLA with our customers, so we need to be able to respond really quickly. So this is a really brief demo that I've put together, just showing what it is that, how we're gonna solve this. It's pretty rough and ready, but I'll play it and I'll explain.
08:21
So basically what we're doing here is, so everything we see here is the entire portfolio, so we start with a million locations, I wanna look at Marks & Spencers, or I may wanna look at Argos, these are two retailers in the UK, and you can see this is all happening, there's no smoke and mirrors, I'm able to basically go in there and be able to perform queries really, really quickly,
08:43
and get that insight that's coming up straight away, and this is all using, basically using some really, really simple tooling. So I'll just play that again, because it's a little bit quick. So we start with the full portfolio, so this is the whole country, a million locations and a lot of money,
09:00
and very quickly, almost instantaneously, we can get down to 600 locations, 477. This here is the distribution of flood scores, so we can see most places have no flooding. What would be more interesting is to kind of slice over here and look where some of the higher flood risk is. And again, once we go in, so it's responding to both the interaction with the map
09:22
and also the interaction through the search bar here, so it's very, very quickly aggregated. I think here I'm probably using a hex zone that's maybe a little bit too small, ideally I'd use that a little bit bigger, but you can see that this and this are completely in sync with one another. And yeah, this loads really, really quickly, so all of this stuff is happening on the server,
09:42
that's why we can do it so quickly. So in terms of what we're using for that demo, so this was just something that was running locally on my machine that I coded on Sunday, but basically we have React and MapLibre on the front end, we've got Express.js which is serving up our API,
10:00
we're using TileServer that's serving up our vector tiles and then this is all Postgres on the back end. But what we're doing really is that the TileServer, so everything is being pre-cached, so we're creating all of our admin boundaries in Postgres and then cutting, using Tippecanoe we're gonna be cutting tiles to serve up here, so that's all pre-calculated.
10:21
And then in terms of the actual data itself, we've got that million insurance policies, we're actually pre-calculating and pre-storing all of the zoom levels, so we're using the h3-pg plugin to show those hexes, but rather than doing it on the fly, we do it in advance. And I ran, basically pre-storing everything
10:46
with all the tags for every zoom level. So to calculate every zoom level on a million locations in advance took 12 seconds with h3-pg, it's really, really fast, and then after that all you're having to do is aggregate up tags.
11:02
So there's no geo, all the geo's been done in advance and then when you're running on the fly, all you're doing is aggregating regular text. So it keeps things really, really simple, there's no geometries being passed to the client and back, all we're doing is passing regular JSON objects and then the client is doing the styling.
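As a rough illustration of that idea (aggregating pre-stored tags rather than geometries), here is a minimal TypeScript sketch. The row shape, field names and cell ID strings are assumptions made for the example and are not taken from the talk; in the setup described, the equivalent grouping happens in Postgres rather than in application code.

```typescript
// Hypothetical shape of one pre-tagged policy row: the cell ID for each
// resolution was computed once during ETL and stored as a plain string.
interface TaggedLocation {
  insuredName: string;
  floodScore: number;
  sumInsured: number;
  cells: Record<number, string>; // resolution -> pre-computed cell ID
}

interface CellSummary {
  id: string;
  count: number;
  totalSumInsured: number;
}

// Group rows by their stored tag at one resolution. No geometry is touched:
// the work is a plain group-by on strings.
function aggregateByCell(
  rows: TaggedLocation[],
  resolution: number
): CellSummary[] {
  const byCell = new Map<string, CellSummary>();
  for (const row of rows) {
    const id = row.cells[resolution];
    if (id === undefined) continue; // row was not tagged at this resolution
    const summary = byCell.get(id) ?? { id, count: 0, totalSumInsured: 0 };
    summary.count += 1;
    summary.totalSumInsured += row.sumInsured;
    byCell.set(id, summary);
  }
  return Array.from(byCell.values());
}
```

The result is an ordinary list of JSON-friendly objects keyed by cell ID, which is the kind of payload the client can style without receiving any geometry.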
11:20
So one thing that we would probably do actually looking at yesterday, one thing we're not using at the moment is deck.gl, so for the H3 tiles we've actually pre-calculated those ourselves. I went to the deck.gl talk yesterday, so I think we'd probably look at deck.gl here as well as a way to make this even better. So yeah, basically what we're doing is we're pushing
11:42
all of the complicated geo logic to the server, we're doing it as part of our ETL, so as we bring on all of these policies, we are going out and we are tagging, we're doing all of that geo work in advance, so everything's pre-cached and pre-calculated and therefore very, very quick.
12:01
And as I mentioned, so Tippecanoe we're using, if you haven't used Tippecanoe it's great, I guess the only kind of drawback is that you have to go to GeoJSON first before you create your tiles, that's slightly annoying but it's a really, really fantastic piece of software that will run on even quite small servers. The other thing as well is the H,
12:21
so the limitation of what we've done here which I think deck.gl will solve is that when you start generating H3 tiles at lower levels of resolution, the vector tile format becomes, the vector tiles that you create become huge, whereas I think deck.gl would solve that because deck.gl knows how to render H3 in advance but we can keep our back end the same.
12:41
We're just passing through a bunch of different H3 attributes with some information against it. So, and again, we're quite lucky in that our geometries are static, so our buildings layer gets updated every six months, our administrative boundaries maybe once a year
13:00
and obviously H3 hexagons don't move. So all of this stuff we can do in advance and we don't make the client pay the penalty in terms of performance, everything's pre-cached, pre-cached and pre-aggregated. And then what we're doing, so really it's the front end that's doing the work, the front end needs to be fairly intelligent, it needs to understand what zoom level am I at,
13:23
therefore what is the most applicable resolution of information I can show. And also when it gets, it's orchestrating the queries of the client, it's passing through the extents and when it gets back, really I call this painting by numbers. So you've got a really, imagine you've got a really simple vector tiles canvas
13:43
and the client is basically looking at all the information it's got back and it's painting up all of the different shapes on the screen and this is very, very quick. We're using expressions but apparently if you use deck.gl you can actually do this with JavaScript functions so it's a little bit easier, a little bit more elegant, a little bit simpler.
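The front-end decision described here (work out the zoom level, pick the most applicable resolution, pass the extent to the back end) could look something like the sketch below. The zoom thresholds, resolution values and request shape are illustrative assumptions, not the values used in the demo.

```typescript
// Hypothetical zoom thresholds: from each minZoom upward, use that H3
// resolution. Entries must be listed in ascending minZoom order.
const RESOLUTION_STEPS = [
  { minZoom: 6, resolution: 5 },
  { minZoom: 9, resolution: 7 },
  { minZoom: 12, resolution: 9 },
];
const COARSEST_RESOLUTION = 3; // used below the first threshold

function pickResolution(zoom: number): number {
  let chosen = COARSEST_RESOLUTION;
  for (const step of RESOLUTION_STEPS) {
    if (zoom >= step.minZoom) chosen = step.resolution;
  }
  return chosen;
}

interface ViewBounds {
  west: number;
  south: number;
  east: number;
  north: number;
}

interface AggregationRequest {
  resolution: number;
  bbox: [number, number, number, number]; // west, south, east, north
}

// Build the payload the client would send to the aggregation API for the
// current view. The payload shape is an assumption for this example.
function buildAggregationRequest(
  zoom: number,
  bounds: ViewBounds
): AggregationRequest {
  return {
    resolution: pickResolution(zoom),
    bbox: [bounds.west, bounds.south, bounds.east, bounds.north],
  };
}
```

Keeping the thresholds in one small table makes it easy to tune how many cells a typical view returns, which is what keeps the painting step on the client cheap.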
14:02
But really the idea is kind of trying to constrain, so when you're aggregating up millions of locations, try and make it as simple as possible. So construct a query where you might get, I don't know, 200 or 300 tiles and then painting those up on the client, it makes this very, very quick. So yeah, as we're in Florence,
14:22
I found this up so I'll leave this, I thought it was quite interesting but basically this is the concept, we're painting by numbers, so we're bringing complicated data, simplifying it down to a really simple SQL query and then painting things up on the client to make it go really, really, really quick.
14:42
For those who like the code, this is the paint by numbers function that I'm doing here. It's pretty simple so I'm basically saying, where, what zoom level am I at? Here I'm just using regular MapLibre, we're not using like React MapLibre GL which I think we would do in future. So some of this stuff is a little bit,
15:01
you know, we're basically instructing the map what to do. But yeah, we're getting the zoom, we're working out the bounds. This is my backend call, this is calling to my API to be able to do the aggregation, so we're basically saying, within this bounding box, it's just a simple group by, so group by all of the data, filtered by the insured name and the flood score
15:21
and then bring that all back up as an aggregation. And then what we're doing here is we're setting up our match expression, so if you use match expressions, match expressions are basically arrays of arrays, it's like a Mapbox notation, but it allows you to, so we're getting our ID, so that's saying, look on the map for things with this ID
15:42
and that might be a hex, that could be a hex zone, that could be an admin boundary, it could be a building, could be anything, and then we're gonna iterate through all of our results to paint, paint each thing up and then we've got something here with the colors, so it knows what colors to paint everything up. And then there's a fallback here saying, well if there's a particular zone like a hex zone
16:01
or a building that we don't know anything about, then give it like a regular invisible paint, so don't paint anything. And yeah, this works really well, it works really quickly. So in terms of taking this into production, so what we said at the beginning is it needs to be fast and it needs to be serverless, now we know we can do it fast, can we do it serverless?
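The slide code for the paint by numbers function is not in the transcript, so the following is only a sketch of how such a match expression might be assembled from the aggregated rows. The row shape, the colour ramp and the `id` property name are assumptions; the `match` and `get` operators are the standard style expression operators.

```typescript
// Hypothetical shape of one aggregated row coming back from the API.
interface AggregateRow {
  id: string; // hex, admin boundary or building identifier
  floodScore: number;
}

// Fallback for features the results say nothing about: paint nothing.
const TRANSPARENT = "rgba(0, 0, 0, 0)";

// Illustrative colour ramp; thresholds and colours are assumptions.
function colourForScore(score: number): string {
  if (score >= 3) return "#d73027";
  if (score >= 1) return "#fee08b";
  return "#1a9850";
}

// Build a fill-color value of the form
// ["match", ["get", "id"], id1, colour1, id2, colour2, ..., fallback].
function buildFillColour(
  rows: AggregateRow[],
  idProperty: string = "id"
): string | unknown[] {
  const seen = new Set<string>();
  const pairs: string[] = [];
  for (const row of rows) {
    if (seen.has(row.id)) continue; // keep match labels unique
    seen.add(row.id);
    pairs.push(row.id, colourForScore(row.floodScore));
  }
  // A match expression needs at least one label/output pair, so with no
  // results return the plain fallback colour instead.
  if (pairs.length === 0) return TRANSPARENT;
  return ["match", ["get", idProperty], ...pairs, TRANSPARENT];
}
```

With MapLibre GL JS the value would then be applied with something like `map.setPaintProperty("hexes", "fill-color", buildFillColour(rows))`, where `"hexes"` is a hypothetical layer ID.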
16:22
And yeah, we can, so basically, by doing everything in advance in Postgres, our Lambda function can be quite dumb, all it's really having to do is, I mean the API that we've got is probably about 20 lines of code, it's just saying, okay, take information here and write and send some simple queries to Postgres to do group by.
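To give a feel for how small that API layer can be, here is a sketch of a function that turns a request into a parameterized GROUP BY query. The table name `policies` and the columns `h3_r<N>`, `lon`, `lat`, `insured_name`, `flood_score` and `sum_insured` are all hypothetical; the `$1`-style placeholders and the `{ text, values }` shape follow the convention used by the node-postgres client, which is an assumption about the stack rather than something stated in the talk.

```typescript
interface Bounds {
  west: number;
  south: number;
  east: number;
  north: number;
}

interface Filters {
  insuredName?: string;
  minFloodScore?: number;
}

interface SqlQuery {
  text: string;
  values: Array<string | number>;
}

// Build a plain GROUP BY over a pre-computed cell column. The bounding box
// is applied to stored lon/lat numbers, so no spatial functions run here.
// Views that cross the antimeridian are not handled in this sketch.
function buildAggregateQuery(
  resolution: number,
  bounds: Bounds,
  filters: Filters = {}
): SqlQuery {
  if (Math.floor(resolution) !== resolution || resolution < 0 || resolution > 15) {
    throw new RangeError("resolution must be an integer from 0 to 15");
  }
  // Safe to interpolate: the column name is built from a validated integer.
  const cellColumn = `h3_r${resolution}`;
  const values: Array<string | number> = [
    bounds.west,
    bounds.south,
    bounds.east,
    bounds.north,
  ];
  const where = ["lon BETWEEN $1 AND $3", "lat BETWEEN $2 AND $4"];
  if (filters.insuredName !== undefined) {
    values.push(filters.insuredName);
    where.push(`insured_name = $${values.length}`);
  }
  if (filters.minFloodScore !== undefined) {
    values.push(filters.minFloodScore);
    where.push(`flood_score >= $${values.length}`);
  }
  const text =
    `SELECT ${cellColumn} AS id, count(*) AS locations, ` +
    `sum(sum_insured) AS total FROM policies ` +
    `WHERE ${where.join(" AND ")} GROUP BY ${cellColumn}`;
  return { text, values };
}
```

User-supplied filter values only ever travel as query parameters, and the one interpolated identifier comes from a range-checked integer, so the handler can stay short without building SQL from raw input.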
16:43
I'm just gonna do a quick plug for Thomas, our CTO, the other thing that we need to do is, you remember earlier in my previous example, we had Express and we had a tile server, so what we're doing here, we're not using any tile server because a tile server will need a server to run it on, so even our vector tile generation
17:00
is being done completely serverlessly. So Tom's doing a talk later, he's doing a lightning talk on how to render vector tiles directly using API Gateway and Lambda with no server, no backend, no nothing. It works really, really well, it scales up really well. And then as I say, React and MapLibre on the client, but we're probably gonna go to deck.gl.
17:23
So to conclude really, as I say, I said to you at the beginning, to make this scale, you need to make it boring, that means doing everything in advance. So do all of your, everything as part of your ETL, if you know in advance the questions that your customers and your users are gonna ask,
17:41
then pre-render and pre-optimize for that, and that allows you to cache and do other cool things as well. Pre-render your vector tiles, so if you are able, if your geometries don't change, if they're not dynamic, they're static, pre-render everything and the vector tiles is fantastic. And then using this painting by numbers approach, so instead of having to pass geometries up to the client
18:03
that have been pre-formatted, basically just have your geometries, have your query results, and let the client put those together and paint them up, and then that way you'll get really, really quick, really fast performance. And yeah, that's everything, so I hope you enjoyed the talk,
18:21
I'm happy to answer any questions, I've got, if anyone wants an AddressCloud sticker, I've got some AddressCloud stickers, but yeah, I'll open up the audience to any questions. Thank you.