The serverless map stack lives
Formal Metadata
Title:
Title of Series: State of the Map US 2019
Number of Parts: 70
Author:
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/58478 (DOI)
Publisher:
Release Date:
Language: English

Content Metadata
Subject Area:
Genre:
Abstract:
State of the Map US 2019, Part 60 of 70
Transcript: English (auto-generated)
00:01
Well everyone, thanks for coming to hear my presentation. I'm going to be talking about the serverless map stack. A quick bit of background on myself: my name is Alex, I'm a geospatial software and application developer, and I'm also one of the founders of the go-spatial organization. We largely focus on developing open source geospatial software using the Go programming language.
00:26
I'm also one of the core developers on Tegola, which is a Mapbox vector tile server written in Go. And you know, I kind of dig serverless, this new technology that has been coming out. Well, I guess serverless as a concept isn't really new at this point, but there are still a lot of applications to be researched and explored. So,
00:45
how many people in here have used serverless technology like Lambda functions? Okay, it's about half the room. So here's a quick background on what serverless is. For serverless compute, I found a good definition on Serverless Stack: it says serverless computing is an execution model where the cloud provider is responsible for
01:06
executing a piece of code by dynamically allocating the resources. So really, when you're looking at serverless, why do you care, what's the big deal here? The core concept is: how do we scale easier and cheaper? How do we scale "servers"?
01:21
I put quotes around that because it's all serverless, yet there are still servers behind the scenes. Databases too: now we're talking about scaling serverless databases, scaling the storage. And the whole intent is to shift your focus away from DevOps and infrastructure management and really get back to focusing on your application and adding value from that perspective.
01:41
And then one of the big core themes is to only pay for the resources that you're using. So where is serverless available? Pretty much all the big providers have it at this point: Amazon has Lambda, Google has Cloud Functions, Azure has Functions, and I think there's a whole bunch of other ones popping up. So the concept and the technology are becoming fairly prolific at this point.
02:04
As far as language support, all the providers are a little different in which languages they support. In the case of Lambda, they have a whole bunch of different languages, as you can see here. And I think Google Cloud Functions actually has a way to execute Docker containers, though I don't think they call it Cloud Functions.
02:21
But anyway, they're trying to do kind of serverless Docker execution as well. All right, so let's get into serverless vector maps. When you look at building a vector tile server, you have these three core components that you need: a tile server, a tile cache, and a data provider.
02:42
You don't necessarily need a tile cache, but it's usually a pretty good idea to put one in place. And like I said, you're going to need a data provider, which could be a whole bunch of different things: it could be PostGIS, another database, shapefiles, different data sources.
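As an aside, here is a minimal, illustrative Go sketch of how those three roles might be separated. The interface names and signatures are invented for illustration; they are not Tegola's actual API.

```go
package tiles

import "context"

// Provider fetches and encodes the data for a single tile from some
// backing data source (PostGIS, a GeoPackage, shapefiles, ...).
type Provider interface {
	// Tile returns an encoded Mapbox Vector Tile for the given z/x/y.
	Tile(ctx context.Context, z, x, y uint) ([]byte, error)
}

// Cache stores previously generated tiles so they don't have to be
// re-queried, re-processed, and re-encoded on every request.
type Cache interface {
	Get(ctx context.Context, key string) ([]byte, bool, error)
	Set(ctx context.Context, key string, tile []byte) error
}

// Server ties the two together: check the cache first, otherwise ask the
// provider and populate the cache on the way out.
type Server struct {
	Provider Provider
	Cache    Cache
}
```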
03:03
As far as serverless vector tile implementations go, there's one that I've been working on, which is the Lambda shim for Tegola. It's written in Go, and all the geoprocessing and encoding happens inside of Tegola; it currently has support for PostGIS and GeoPackage. I was also looking around trying to find other implementations that are out there, and I actually came across this one
03:25
that's fairly new, by Henry Thasler, called cloud-tileserver. This one's been written for Node.js using TypeScript, and his strategy has been to compile SQL queries from a configuration file
03:42
and execute them against PostGIS. You're going to need PostGIS version 2.4, and you push all the geoprocessing and encoding down to PostGIS using the new ST_AsMVT functions, the simplification and so on. So he's got it where he's pushing all the geoprocessing and encoding down to the database server.
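To give a feel for what pushing the work down to the database looks like, here is a rough sketch of that style of query issued from Go against PostGIS 2.4+. The table name (roads), column names, and the tile-bounds helper are assumptions for illustration; cloud-tileserver's actual generated SQL will differ.

```go
package main

import (
	"database/sql"
	"fmt"
	"math"

	_ "github.com/lib/pq" // Postgres driver
)

// tileBounds returns the Web Mercator (EPSG:3857) bounding box of a z/x/y tile.
func tileBounds(z, x, y uint32) (minX, minY, maxX, maxY float64) {
	const half = 20037508.342789244
	tile := (half * 2) / math.Exp2(float64(z))
	minX = -half + float64(x)*tile
	maxX = minX + tile
	maxY = half - float64(y)*tile
	minY = maxY - tile
	return
}

// mvtTile asks PostGIS (>= 2.4) to clip, simplify, and encode the tile for us.
// It assumes the roads table is stored in EPSG:3857.
func mvtTile(db *sql.DB, z, x, y uint32) ([]byte, error) {
	minX, minY, maxX, maxY := tileBounds(z, x, y)
	const q = `
		WITH bounds AS (
			SELECT ST_MakeEnvelope($1, $2, $3, $4, 3857) AS geom
		)
		SELECT ST_AsMVT(mvtgeom, 'roads', 4096, 'geom')
		FROM (
			SELECT ST_AsMVTGeom(t.geom, bounds.geom, 4096, 256, true) AS geom, t.name
			FROM roads t, bounds
			WHERE t.geom && bounds.geom
		) AS mvtgeom;`
	var tile []byte
	err := db.QueryRow(q, minX, minY, maxX, maxY).Scan(&tile)
	return tile, err
}

func main() {
	db, err := sql.Open("postgres", "postgres://user:pass@host/db?sslmode=disable")
	if err != nil {
		panic(err)
	}
	tile, err := mvtTile(db, 14, 2620, 6332)
	fmt.Println(len(tile), err)
}
```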
04:05
I'll also say that other tile servers could implement a shim and become serverless, whatever language they're written in. You can usually implement some sort of shim to whatever the web router is and start making those various servers serverless.
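As a rough illustration of what such a shim can look like in Go (this is not Tegola's actual shim code; the routing and URL layout here are simplified assumptions), an API Gateway proxy event can be adapted into a plain tile request:

```go
package main

import (
	"context"
	"encoding/base64"
	"fmt"
	"strconv"
	"strings"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

// renderTile stands in for whatever the tile server normally does for a
// request; in Tegola's case this would call into its existing handlers.
func renderTile(ctx context.Context, mapName string, z, x, y uint64) ([]byte, error) {
	return nil, fmt.Errorf("not implemented in this sketch")
}

// handler adapts an API Gateway proxy request (e.g. /maps/osm/14/2620/6332.pbf)
// into a tile lookup and returns the protobuf as a base64-encoded body.
func handler(ctx context.Context, req events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
	parts := strings.Split(strings.Trim(req.Path, "/"), "/")
	if len(parts) != 5 || parts[0] != "maps" {
		return events.APIGatewayProxyResponse{StatusCode: 404, Body: "not found"}, nil
	}
	z, _ := strconv.ParseUint(parts[2], 10, 32)
	x, _ := strconv.ParseUint(parts[3], 10, 32)
	y, _ := strconv.ParseUint(strings.TrimSuffix(parts[4], ".pbf"), 10, 32)

	tile, err := renderTile(ctx, parts[1], z, x, y)
	if err != nil {
		return events.APIGatewayProxyResponse{StatusCode: 500, Body: err.Error()}, nil
	}
	return events.APIGatewayProxyResponse{
		StatusCode:      200,
		Headers:         map[string]string{"Content-Type": "application/vnd.mapbox-vector-tile"},
		Body:            base64.StdEncoding.EncodeToString(tile),
		IsBase64Encoded: true,
	}, nil
}

func main() {
	lambda.Start(handler)
}
```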
04:26
Okay, I'm going to talk about two primary architectures today. The first one is about using Lambda with Tegola, S3, and GeoPackage, and the next one is very similar, but we're going to explore Amazon's Aurora Serverless Postgres. Before I get into these, I'm going to talk about the cold start. The cold start is one of the big things you need to understand when you're looking at serverless, and it's really the time cost of instantiating a serverless resource.
04:45
Everyone focuses on the cold start because if your functions or your resources aren't being used, this is the initial time penalty that your user is going to encounter. It's the cloud provider going, "Oh, we haven't used this for a while, let's go ahead and thaw this function out and make it available," and it starts warming it up so it's ready for you.
05:06
These things happen when a function hasn't been invoked for some time. They don't always tell you what this time period is, but say it hasn't been used for 10 minutes; it'll have spun down and be a cold function at that point. Cold starts also happen when you're scaling concurrency.
05:21
So if you have a function that's being invoked lots of times, but then there's a traffic spike, the concurrency needs to go up, and as they start spinning up additional resources to handle the increased concurrency, you're going to be dealing with the cold start at that point as well. It won't be on the currently warm functions, just on the new functions that need to be thawed out and start executing.
05:47
All right, so: Lambda, Tegola, S3, and GeoPackage. Here's a quick architectural diagram explaining how this would work. I want to start by looking at the
06:00
deployment.zip, that dashed line through there. What this is, is what you would zip up as your Lambda function package. In this instance it would be the Tegola Lambda shim and the Tegola configuration file, and you'd actually have the data provider, the GeoPackage, packaged up into this function payload as well.
06:22
Now if you look at the request context and how this works: a request comes in to API Gateway, and you'd use proxy routing inside API Gateway to just route all requests to the Lambda function. Then the Tegola Lambda shim takes over, and it starts saying, okay, what map are we looking at here? What z/x/y tile are we trying to look up?
06:46
It's going to parse the configuration file to grab the various SQL statements that we want to execute, then it's going to form up a query, go and query the GeoPackage, pull the geometries out, and start doing the geoprocessing,
07:00
apply the various encoding for MVT, and then it's going to stick a copy of it in the S3 tile cache and send the response. Now, if the tile was already in the cache, it would just fetch the tile from the S3 cache, so it doesn't have to go through the GeoPackage and do all the geoprocessing and encoding. If you look at the cold start time, it runs about zero to two seconds on Lambda.
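A minimal sketch of that cache-or-generate flow, assuming the aws-sdk-go S3 client, with a generateTile function standing in for the GeoPackage query, geoprocessing, and MVT encoding (this is illustrative, not Tegola's actual cache code):

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io/ioutil"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

const bucket = "my-tile-cache" // hypothetical cache bucket

// generateTile stands in for the GeoPackage query, geoprocessing, and MVT encoding.
func generateTile(ctx context.Context, z, x, y uint64) ([]byte, error) {
	return nil, fmt.Errorf("not implemented in this sketch")
}

// cachedTile returns the tile from the S3 cache when present; otherwise it
// generates the tile, writes it back to the cache, and returns it.
func cachedTile(ctx context.Context, svc *s3.S3, z, x, y uint64) ([]byte, error) {
	key := fmt.Sprintf("tiles/%d/%d/%d.pbf", z, x, y)

	// Cache hit: serve straight from S3.
	if out, err := svc.GetObjectWithContext(ctx, &s3.GetObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	}); err == nil {
		defer out.Body.Close()
		return ioutil.ReadAll(out.Body)
	}

	// Cache miss: do the expensive work once, then store a copy.
	tile, err := generateTile(ctx, z, x, y)
	if err != nil {
		return nil, err
	}
	_, err = svc.PutObjectWithContext(ctx, &s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   bytes.NewReader(tile),
	})
	return tile, err
}

func main() {
	svc := s3.New(session.Must(session.NewSession()))
	_, _ = cachedTile(context.Background(), svc, 14, 2620, 6332)
}
```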
07:23
So if your function is cold, you're going to pay about a zero-to-two-second penalty, which is relatively negligible overall; you're really going to spend more time on the geoprocessing and encoding if it's a fresh tile that isn't in the cache yet. All right, so the pros of this: I actually find this a pretty clever architecture, I really like it.
07:43
It's very self-contained; there's really not much to it. You build your package, you put it in Lambda, and you really just forget about it after that. It just starts running, and Amazon's Lambda is going to start scaling horizontally; you don't really have to think about it a whole bunch. There's also no virtual private cloud,
08:01
so you don't need to worry about that side of things, and the only cold start time you deal with is from Lambda. Cons: there's a size restriction, probably the biggest one here, in that the zipped-up payload you can put into Lambda is 50 megabytes, and Tegola is about 7.8 megabytes zipped. That leaves about 42 megabytes worth of data you can put in there zipped up, and unzipped about
08:26
250. And currently Tegola only supports GeoPackage with this approach, but additional data providers could be supported; it's positioned so that if someone wanted to write one for GeoJSON or shapefiles or whatever, you could actually extend Tegola to support additional formats.
08:42
I think it's a really simple, elegant approach, and I think it would actually work in a lot of situations. Also, depending on what your map's needs are, you could launch multiple of these instances. You could have maybe one for Natural Earth and one that's more focused down on a city, so you can actually break up your map, have multiple data providers, and conflate on
09:02
the client side, too. All right, the next one we're going to look at ties into Aurora Serverless. So, Serverless Aurora Postgres: this was officially released on July 9th, 2019, so it's really new, a couple of months old at this point.
09:23
It ships with Postgres 10.7 installed, and it's also got PostGIS 2.4. The really interesting thing about Aurora Serverless Postgres is that you can scale it down to zero; the whole pitch is that you can scale this thing down to zero and shut the database entirely off,
09:40
and then have it come back on from a cold start. It handles all the scaling and tries to do seamless scaling up: the CPU, data storage, memory allotment, everything. It just tries to handle all of the scaling for you. When you scale down to zero, you're not paying for any compute time;
10:00
you're only going to be paying for storage at that point. So it's a pretty interesting product from that perspective: you're like, okay, my database can now literally shut off when it's not being used, and I still have the availability zones that are there. So let's take a look at the architecture when using
10:20
Aurora Serverless. It's very similar, almost the exact same architecture as the GeoPackage one. If you come over to the deployment.zip and look at it, the main difference is that we're not packaging Aurora Serverless, which makes sense; so you've just got the configuration file and the Tegola Lambda shim. I put this hard line in there for the VPC, the virtual private cloud, and then we point to Aurora Serverless.
10:45
So, the big glaring things to look at here are your cold start times. They get pretty big through some of these resources, and when you start looking at a 40-second cold start time, you start wondering: is this really feasible? I want to say this is an extreme scenario, too,
11:03
right? This is: you've scaled everything down to zero, the database has scaled to zero, and your Lambda function is cold. When you look at the VPC side of things, that eight to ten seconds right there is about instantiating an ENI, an Elastic Network Interface. If you go from a Lambda function into a VPC,
11:22
it has to set up these things called ENIs, which are the bridge over into the VPC that allows you to access the database. So that looks pretty rough, and you start going, well, is this something I'd really want to use? You can actually make some changes to the architecture to get rid of those, and the way you would pull this off is:
11:43
if you got rid of the VPC and went with using IAM roles, you can restrict access to the Aurora Serverless database that way. Now, personally, I haven't really tried that yet, but that's what Amazon recommends.
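For reference, one way "IAM roles instead of a VPC" could look from Go is IAM database authentication tokens via the aws-sdk-go rdsutils helper. This is a hedged sketch, not something shown in the talk; it assumes IAM database authentication is enabled for the cluster, and the endpoint, user, and database names are placeholders.

```go
package main

import (
	"database/sql"
	"fmt"

	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/rds/rdsutils"
	_ "github.com/lib/pq"
)

// connectWithIAM builds a short-lived IAM auth token instead of using a
// static password. Note: the talk only mentions "IAM roles"; IAM database
// authentication is one possible interpretation of that.
func connectWithIAM() (*sql.DB, error) {
	sess := session.Must(session.NewSession())
	endpoint := "my-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com:5432" // hypothetical
	token, err := rdsutils.BuildAuthToken(endpoint, "us-east-1", "tegola", sess.Config.Credentials)
	if err != nil {
		return nil, err
	}
	dsn := fmt.Sprintf(
		"host=my-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com port=5432 user=tegola password='%s' dbname=gis sslmode=require",
		token,
	)
	return sql.Open("postgres", dsn)
}

func main() {
	db, err := connectWithIAM()
	fmt.Println(db, err)
}
```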
12:02
And I'm not too comfortable bringing my database outside of a VPC, but depending on your data set and your application, it may make sense. The way you get rid of the 30-to-40-second cold start time on Aurora Serverless would be to just not scale it to zero: keep it at the minimum setting. They call them Aurora Capacity Units, ACUs, I think.
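As an aside, a rough sketch of what disabling auto-pause and pinning a minimum capacity could look like with the aws-sdk-go RDS API; the cluster identifier and capacity values are assumptions for illustration.

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/rds"
)

func main() {
	svc := rds.New(session.Must(session.NewSession()))

	// Keep the cluster warm: disable auto-pause and hold a minimum capacity,
	// trading a baseline cost (roughly $100/month at the time of the talk)
	// for removing the 30-40 second database cold start.
	_, err := svc.ModifyDBCluster(&rds.ModifyDBClusterInput{
		DBClusterIdentifier: aws.String("tile-db"), // hypothetical cluster name
		ScalingConfiguration: &rds.ScalingConfiguration{
			AutoPause:   aws.Bool(false), // never scale all the way to zero
			MinCapacity: aws.Int64(2),    // minimum ACUs
			MaxCapacity: aws.Int64(16),
		},
	})
	fmt.Println(err)
}
```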
12:23
If you want to keep it on, it's going to be around $100 a month just to have it at that minimum setting, but you'll get rid of that 30-to-40-second latency from the cold start. All right, so a couple of pros here: it handles scaling of CPU, data storage, and connection count, and it can scale down to zero.
12:41
I hammered it just to see what would happen from the connection-count perspective, just threw everything at it, and I got it to scale up to about 4,000 connections; it was growing really well. Then you see the graph scale back down as it wasn't being used, all the way down to zero. It's pretty interesting to watch; it handled scaling better than I was expecting.
13:03
But then again, I didn't really know what to expect when using this. Cold startup times, like I said, can be 45 seconds plus, and most of that, as I described, can be mitigated if you launch it outside of a VPC and make sure that Aurora Serverless is always on with at least the minimum settings.
13:22
All right, so another way to improve our tile servers and the performance we have here is to start bringing in a content delivery network. This is really important, I think, especially if you want your maps to feel snappy. Here you start looking at the infrastructure
13:40
we're talking about here; we've been talking a lot about cold start times, and you really should start moving the tiles and persisting them at a content delivery network so you have edge delivery for them. I've got two different models for this. The first one, the one that I played with the most, is where you put a content delivery network in front of your API Gateway: essentially a request comes in, it goes to CloudFront, and goes through the whole
14:05
invocation chain that we've been talking about today. But the downside of this is that when you have a miss on CloudFront, you're going to have to invoke API Gateway, it's going to hit Lambda, and then you're going to have that startup time. So let's say Aurora Serverless was cold
14:21
and had scaled down to zero: just to come through this path, even if you've got the tile sitting in Amazon S3 and it's already been generated, you've got this startup time, which could be really expensive. But if you set up a long TTL on the content delivery network, you'll be able to persist more tiles there
14:42
and avoid having to go through the processing flow as your tile cache fills up more and more. But I think there's a better way to do this. I was actually talking with Henry, the guy who's been writing the other Lambda implementation, just last week, and he had a different architecture which I thought was a really good strategy.
15:04
Essentially what he does is say, hey, let's take CloudFront and go straight to our tile cache. And I was like, well, how is that going to work if you have a miss? What he'd implemented was: if you have a miss or a 404 on
15:20
S3, he does a 307 redirect to your API Gateway, so you only invoke it and go through the whole cold-start boot-up time we've been talking about today if you've got a miss at your tile cache. I've actually been playing with this over the last several days,
15:40
and I think it's wonderful; I think it works really well. This basically has the content delivery network reading right from S3, only falling back to the invocation on a miss, and then as the tile cache fills up, you're actually utilizing the serverless technology less and less.
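One way to express that redirect-on-miss idea is with S3 static-website routing rules, sketched here with aws-sdk-go. The bucket and API Gateway host names are placeholders, and details like the exact error code (404 vs. 403, depending on bucket permissions) and using the bucket's website endpoint as the CloudFront origin would need to be verified for a real deployment; this is not necessarily how cloud-tileserver implements it.

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	svc := s3.New(session.Must(session.NewSession()))

	// On a missing tile, the S3 website endpoint answers with a 307 redirect
	// to API Gateway, which generates the tile and backfills the cache.
	_, err := svc.PutBucketWebsite(&s3.PutBucketWebsiteInput{
		Bucket: aws.String("my-tile-cache"), // hypothetical cache bucket
		WebsiteConfiguration: &s3.WebsiteConfiguration{
			IndexDocument: &s3.IndexDocument{Suffix: aws.String("index.html")},
			RoutingRules: []*s3.RoutingRule{{
				Condition: &s3.Condition{
					// May need to be "403" if anonymous reads of missing keys are denied.
					HttpErrorCodeReturnedEquals: aws.String("404"),
				},
				Redirect: &s3.Redirect{
					Protocol:         aws.String("https"),
					HostName:         aws.String("abc123.execute-api.us-east-1.amazonaws.com"), // hypothetical API Gateway host
					HttpRedirectCode: aws.String("307"),
				},
			}},
		},
	})
	fmt.Println(err)
}
```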
16:01
And if you want to, say, update tiles, you can just purge areas from your tile cache, and it'll go ahead and start invoking things again. All right, so when should we use these serverless map stacks? I think they're really good for development and testing environments, and if you're going to scale to zero, I think it's perfectly applicable to do that there. Oftentimes the database is a very expensive part of the operation,
16:23
and you're not developing 24 hours a day, seven days a week, so shut it down when you're not using it. Also, maps with intermittent traffic spikes: maybe you're expecting huge spikes and you want to warm things up, but you don't know how to scale them or how many resources you'll need, so you can use all this infrastructure to scale for you. Tile seeding is another interesting application of, say, Aurora Serverless, if you wanted to hit it really hard.
16:46
You could have it scale up for you, and then it would scale down once you're done with the tile seeding. You could also maybe make a script which would do massively parallelized tile seeding by hitting API Gateway, just to fill up your tile cache.
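A rough sketch of what such a seeding script could look like in Go; the endpoint, zoom range, and worker count are placeholders.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

const endpoint = "https://abc123.execute-api.us-east-1.amazonaws.com/maps/osm" // hypothetical

// seed requests every tile for zooms 0..maxZoom with a small worker pool,
// which both warms the S3 tile cache and exercises the serverless scaling.
func seed(maxZoom uint, workers int) {
	jobs := make(chan string)
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for url := range jobs {
				resp, err := http.Get(url)
				if err != nil {
					fmt.Println("error:", url, err)
					continue
				}
				resp.Body.Close()
			}
		}()
	}

	for z := uint(0); z <= maxZoom; z++ {
		n := 1 << z // 2^z tiles per axis
		for x := 0; x < n; x++ {
			for y := 0; y < n; y++ {
				jobs <- fmt.Sprintf("%s/%d/%d/%d.pbf", endpoint, z, x, y)
			}
		}
	}
	close(jobs)
	wg.Wait()
}

func main() {
	seed(6, 32)
}
```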
17:01
And also think about maps with high zoom requirements: you may pre-fill your cache with zooms 0 to 12, and after that you want live tile generation directly from your data store. Then there's some additional research I haven't been able to finish up yet:
17:23
Application Load Balancers can now invoke Lambda functions too, and there are some pros and cons to dig into there that I haven't had time to go through. API Gateway has a max timeout of 29 seconds, which, based on the cold start times here, means you'll be getting timeouts in certain situations.
17:41
As far as I know, API Gateway doesn't support HTTP/2 when it hits the Lambda function; Application Load Balancer might, but again, I don't really know. You can configure the load balancer timeout on Application Load Balancers, and people often complain about the price of API Gateway, so an Application Load Balancer
18:00
may be a cheaper alternative. All right, so in the end, would I recommend this for production? I'm going to say I'm not not recommending it. I think you should tread lightly with it. Everything my research shows here is pretty promising overall; I think you just need to understand the context of your situation.
18:21
I wouldn't say go out and just slam it into production; start in development and testing environments. But overall I've been pretty happy with the way this architecture is coming together, and if you can get things working to your satisfaction, the ops on this stuff is really low: once it's configured, you really don't have a whole bunch of subsequent work to do. So here are some suggestions: use the content delivery network, I really think that architecture looks great,
18:45
and I think it works really well, especially when you use it with the tile cache. Pre-seed some of the lower zooms, zero to ten. I don't think scaling to zero makes a lot of sense if you're going to put this into production, but in development and testing environments, please do. And if you're brave, you can put the database
19:03
outside of the VPC and use IAM roles. Again, I haven't tried this yet; it is the suggested strategy, but oftentimes when people write IAM roles they seem to use wildcards all over the place, so you need a pretty diligent IAM approach. All right, well, that's it. Questions? Again, my name is Alex. I'll just open it up for discussion.
19:33
Sorry, you first. Yeah, oh really? Oh really, when did they announce that?
19:53
Yeah, that'd be amazing. So wait, when did they announce that, was that at their re:Invent? Oh, really?
20:01
Okay, and they're just going to handle it for you, it's going to be a no-configuration thing? Wow, okay. Well, that's great news, so we can remove that line from the diagram and still maintain our security, right? So, yeah.
20:22
Oh really, they're going to move them into the Lambda? So you'll still need a gateway, like the load balancer or API Gateway, that's going to actually reach into the VPC, but you don't have to do the VPC execution role, is what you're saying?
20:43
Yeah, yeah, there's definitely been that VPC bridge issue; people ask about security, where we're handling it, performance,
21:01
and so it's been back and forth. But it sounds like, between what you're talking about with executing inside the VPC and also the ENI change, that whole problem should just be going away. And that's the other thing, too: you don't have to use Aurora Serverless; obviously you can use just a normal RDS instance if you're more comfortable with that, but then you still have the cold starts we've been talking about. It sounds like they're addressing them, though, so that would be great.
21:27
Yeah, exactly, and that's a good point. There are lots of different ways to architect it; this isn't the only architecture out there. Even from CloudFront you can put
21:43
additional Lambda@Edge triggers in too, so you can do some header redirects and rewriting of the various content and cache-control headers there as well. Yeah, there are plenty of different architectures out there. It's always just a balancing act: where are you at, what do you have, what are the
22:01
performance goals, all of those things. [Audience question about using ST_AsMVT, particularly with its increased performance in PostGIS 3.0, to do the geometry
22:23
calculations: do you see that becoming the method?] I mean, I think it'll depend on what your data set is. If you can get your data into Postgres and that's what you're working with, then yeah. But if you're using GeoPackages,
22:41
no, that won't work, right? So I think it just depends on what you're working with for your data set. But yeah, I think a lot of stuff is actually moving toward using ST_AsMVT, and you'll start putting more pressure on your database that way, too. That's where I think Aurora Serverless would be great for that: you can scale your database if you put a lot of pressure on it from that perspective, as opposed to
23:05
horizontally scaling, say, web servers or the Lambda function for the geoprocessing and encoding, because you're basically putting all your pressure for encoding and processing on the database, right?
23:24
Nope, not yet. Yeah, and actually we're getting pretty close on the big update to the core geoprocessing, so we'll be producing a much lower point count in the resulting geometries, and it actually handles some of the outstanding rendering errors that are there, too.
23:42
So I think at that point it would probably be a better time to review it. Okay, nice. And then, I know that
24:02
t-rex just recently released some of the benchmarks that they were doing too; I think we were talking about that, yeah. And there are a lot of different options out there. That's where I think Henry's option
24:23
comes in, too; I wanted to put a couple of them out there as serverless vector tile implementations. Tegola Lambda is obviously doing native geoprocessing and encoding, and cloud-tileserver is going down and doing it at the database, so there are a couple of different approaches you can go with here.
24:47
Yep. Yes, so I think this exact same architecture would work all the way through; it would just be a different tile server. I haven't looked into the raster tile servers and whether anyone has implemented the Lambda shims, but
25:04
from an architectural and implementation perspective, you're going to be able to do the exact same thing technically here, so I don't see any reason you couldn't do it with raster tiles.
25:22
Yeah, so which one did you use, closer to this one? Or, yeah, this one right here? Yeah, for raster tiles. And which tile server are you using?
25:40
So you'd use MapTiler; does MapTiler have a Lambda implementation, or did you put the shim in yourself? Okay, yeah. Exactly, yeah. And I think Google's got a whole Docker
26:01
side of things as well, and then obviously ECS; there's a whole bunch of these different architectures. Actually, when I was writing this presentation I had so many slides at first, and I just had to start reducing the scope so it could actually be one thought all the way through, because there are so many things happening in this world. So, I know.
26:29
I think you should look at Lambda as pretty ephemeral. I mean, Lambda can have a level of state: you can basically initialize things that are outside of the function call, and you can keep a
26:43
context inside of that. Where that would really be helpful, I think, is with a database connection pool: on subsequent times the function is invoked, you could have that same database connection that you've already built up, and you don't have to reconnect.
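A minimal sketch of that pattern in Go, assuming a Postgres data source: the pool is created outside the handler so warm invocations reuse it.

```go
package main

import (
	"context"
	"database/sql"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	_ "github.com/lib/pq"
)

// db is initialized once per Lambda execution environment, outside the
// handler, so warm invocations reuse the same connection pool instead of
// reconnecting on every request.
var db *sql.DB

func init() {
	var err error
	db, err = sql.Open("postgres", "postgres://user:pass@host/db?sslmode=require") // placeholder DSN
	if err != nil {
		panic(err)
	}
}

func handler(ctx context.Context, req events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
	// Use db here; the pool survives across warm invocations but should not be
	// treated as durable state, since the environment can be recycled at any time.
	if err := db.PingContext(ctx); err != nil {
		return events.APIGatewayProxyResponse{StatusCode: 500, Body: err.Error()}, nil
	}
	return events.APIGatewayProxyResponse{StatusCode: 200, Body: "ok"}, nil
}

func main() {
	lambda.Start(handler)
}
```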
27:01
But I don't think you should look at it as really having much state inside of these things; it should be pretty much ephemeral when you're designing them. All right, any other questions? Is anyone going to go try this? I think it's pretty fun technology, and I think there are a lot of interesting opportunities here.
27:21
For me the biggest one is trying to spend less time on DevOps and focus a lot more on the application development. But with the cold start times, you just have to know what you're going to get hit with and what you're juggling. But yeah, thank you everyone for hearing me out and for the discussion.