Serverless Geospatial
Formal Metadata
Number of Parts | 351
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/69239 (DOI)
Production Year | 2022
FOSS4G Firenze 2022 | 7 / 351
Transcript: English (auto-generated)
00:01
Hi everyone. This will be a shorter session, which should help you get into the next room if there's a line. So, buongiorno. Hi, my name is Dave Bianco. I'm a senior geospatial architect at AWS, and I'd like to introduce to you some serverless technologies that exist today
00:21
and how they correlate, how they relate to geospatial capabilities. So what exactly is serverless? There's got to be a server somewhere, right? And yes, there is. There's always a server. But with serverless technology, the traditional maintenance and configuration are hidden from the system administrator.
00:44
In other words, serverless brings the power of on-demand compute and storage to a variety of applications and services. By removing hardware provisioning and patches, we open the door for automatic scaling, high availability, and pay-for-use billing.
01:04
So let's walk through those benefits. The first primary gain of going serverless is lower cost. You're only paying for the compute time in which your application is running. This could be a few seconds for a microservice function, or it could be several hours,
01:22
perhaps to cover your business hours, with nothing online overnight, or for a long-running model. The second is scalability. With lightweight serverless components, it is simple to grow with increasing demand and reduce resources when demand decreases.
01:43
There's no provisioning guessing game to try and figure out how big or how much server is needed to handle expected capacity. It just adapts to the need at hand. And lastly, improved deployments, because the operational overhead is removed. This allows for quicker release cycles and faster iterations.
02:04
And because serverless technology is in the cloud, you can often tap into hundreds of built-in service integrations, which allows you more time to build your app instead of configuring your app. So I was thinking about the serverless journey, and I thought it was interesting
02:21
how I mapped it to sort of how we receive music today. So in the olden days, we had mainframes and vinyl LP records, and the quality was great, but it was cumbersome to manage. And then we moved to personal computers and added mobility, as in the case of 8-track tapes.
02:41
And for those who don't know 8-track tapes, I'd encourage you to research that. And then compact discs arrived, and CPUs got smaller and more performant, as did hard drives. The iPod allowed us to put our entire music library in our pocket, which is effectively what we're doing with containers, where the entire operating system is available.
03:08
And now we're in the modern world of streaming services. Your music is available on demand, and you don't even need to set up or configure your library. I mean, we often do anyway. It's a tangent, but an interesting parallel.
03:29
So what compute and storage capabilities are serverless? At AWS, our serverless capabilities can be organized as compute, messaging, data storage, and virtual desktop infrastructure.
03:46
With these components in place, we can use the usual variety of geospatial capabilities, such as machine learning models, analytics and visualization, ETL, et cetera.
04:05
And to go through these, within the realm of serverless compute, we have AWS Fargate and AWS Lambda. Fargate is a compute engine for running Docker containers and Kubernetes containers. Lambda is function as a service, and it's often paired with API Gateway for data processing, whether that's visual or ETL-based.
04:28
On the storage side, I should note that there are more serverless services than are mentioned here, but these are the common ones that are used within the geospatial domain.
04:41
Amazon S3 is our object store, which is just ever present. You can't even turn that off ever. And Aurora and Redshift are basically data storage options. You can install Aurora with the Postgres flavor and the PostGIS extension,
05:01
and have a serverless option for that data storage. And then the same thing with Redshift. Redshift is mostly for data warehouse analytics, but you can have a serverless option for that. So we have serverless storage alongside our compute.
05:22
So in the virtual desktop space, there's AppStream and WorkSpaces, and really the only difference here is that AppStream is accessed through a web page, whereas WorkSpaces is accessed through a client app that you download. But what this provides is a capability to access a desktop,
05:42
a remote desktop, whether that's Linux, Windows, or Mac, that has an image that is curated by you. So the common themes here are either QGIS or Esri desktop, where you get to choose the size of the image,
06:01
so how much GPU it has, how much CPU it has, how much RAM it has. And the benefits here are sort of twofold. In one case, it's a way, for example, to run a workshop, where you don't want to spend half an hour setting up everybody's computer to work with what you have presented.
06:20
You can have a desktop in the cloud ready to go, or in other cases, you might have a licensed software desktop product, and you have 100 employees, but you don't want to buy 100 licenses, so you can buy five licenses. And this is a common use case for many of our customers, and you share those five licenses by putting their desktop effectively in the cloud,
06:42
and then you can overpower that desktop and not have every employee have an overpowered desktop or laptop, as an example. So I want to walk through a few simple architectures.
07:02
So in this case, we're looking at a dynamic tiler. TiTiler and Tegola are common examples of this. TiTiler has many deployment options. TiTiler is a dynamic tiler specific to raster data, and one of its deployment options employs API Gateway with AWS Lambda.
07:22
So in this use case, often you have your raster data as a COG, a Cloud Optimized GeoTIFF, on S3, and you have your users, who have some sort of web app with MapLibre or a similar framework that's hitting an endpoint at the API Gateway. The API Gateway is just an API endpoint, an HTTP endpoint,
07:42
and that calls AWS Lambda, and the Lambda goes and reads the GeoTIFF and converts it to a web-friendly format, whether that be PNG or JPEG, in a web-tile-friendly way. And you launch as many Lambdas as you need, so it scales by default,
08:00
and that's part of the power of serverless, and if no one's making any API calls at all, nothing's running. It's just waiting for the next API call. We also see this architecture a lot for data processing. Lambda is function as a service, so if you have a data processing step
08:22
that is less than 15 minutes, that's when you want to use Lambda. And I'm more of a Python guy, so I'll talk in Python, but when you want to do data processing with the Python geospatial libraries, in this same architecture, you can have Rasterio, Fiona, Shapely, GeoPandas, and friends,
08:43
all available as a Lambda function, and so your users are hitting the API endpoint, the processing happens through Lambda, it's all serverless, it's all geospatial capabilities. And as a side note, if you aren't familiar with GeoLambda,
09:01
it's a Lambda layer that allows you to bring GDAL as a prerequisite to Lambda, so it's basically when the Lambda launches, GDAL is baked in and ready, so you can import Rasterio and have all of your Rasterio and GDAL capabilities ready to go.
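The API Gateway plus Lambda tiling pattern described above could be sketched roughly as the handler below. This is a minimal illustration, not TiTiler's implementation: it assumes API Gateway passes `z`, `x`, and `y` as path parameters, and the names `tile_bounds` and `handler` are hypothetical. It only computes the Web Mercator bounds of the requested tile; a real tiler would then read that window from the COG on S3 (for example with Rasterio, made available through a layer such as GeoLambda) and return PNG or JPEG bytes.

```python
import json
import math

# Half the width of the Web Mercator (EPSG:3857) world extent, in metres.
ORIGIN = math.pi * 6378137.0


def tile_bounds(z, x, y):
    """Return (left, bottom, right, top) in EPSG:3857 for an XYZ tile."""
    size = 2 * ORIGIN / (2 ** z)  # width of one tile at zoom z
    left = -ORIGIN + x * size
    top = ORIGIN - y * size  # XYZ tiles count rows from the top
    return (left, top - size, left + size, top)


def handler(event, context):
    """Hypothetical Lambda entry point for a route like /tiles/{z}/{x}/{y}."""
    params = event.get("pathParameters") or {}
    try:
        z, x, y = (int(params[k]) for k in ("z", "x", "y"))
    except (KeyError, TypeError, ValueError):
        body = {"error": "expected integer z, x, y"}
        return {"statusCode": 400, "body": json.dumps(body)}

    if z < 0 or not (0 <= x < 2 ** z and 0 <= y < 2 ** z):
        body = {"error": "tile outside the grid"}
        return {"statusCode": 404, "body": json.dumps(body)}

    # Assumption: here a real tiler would read this window from the COG
    # and encode it as an image. This sketch returns the bounds instead.
    body = {"z": z, "x": x, "y": y, "bounds": tile_bounds(z, x, y)}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```

Because the handler keeps no state between calls, each request can be served by a separate Lambda invocation, which is what lets the pattern scale out with the number of API calls.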
09:22
Another architecture is, if we look at some of our OGC products, OGC servers like MapServer, GeoServer, Mapnik. So in this case, MapServer can be deployed as a Docker container, and you can store your Docker container in Amazon's Elastic Container Registry,
09:42
and as I mentioned, Fargate is the serverless way to manage those containers, it'll launch additional containers as needed, so if MapServer's getting hit hard, Fargate can launch 10 Docker containers, 100 Docker containers, whatever the service need might be. And then MapServer just,
10:01
it's a Docker container, it's contained, so it just works as it does with its mapfile, talking to the storage backend, whether that's S3 or a PostGIS instance, whether that's serverless or 24-7 storage. So in this case, we could have a load balancer where the users are talking to a pool of Docker containers,
10:20
and my example here is MapServer, but GeoServer and Mapnik are also good examples, talking to storage, and then feeding the geospatial data in an OGC pattern, whether that's WMS, WCS, et cetera.
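To make the OGC pattern concrete, here is a small sketch of how a client might build a WMS 1.3.0 GetMap request against such a load-balanced pool of map servers. The base URL and layer name are placeholders, and `wms_getmap_url` is a hypothetical helper; the query parameters themselves are the standard GetMap ones.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width=256, height=256,
                   crs="EPSG:3857", fmt="image/png"):
    """Build a WMS 1.3.0 GetMap URL for a single layer.

    bbox is (minx, miny, maxx, maxy) in the given CRS. Note that for some
    CRSs, such as EPSG:4326, WMS 1.3.0 expects latitude before longitude.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty string requests the default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and layer name, for illustration only.
url = wms_getmap_url(
    "https://maps.example.com/wms",
    "roads",
    (0, 0, 20037508.34, 20037508.34),
)
```

The client only ever sees the load balancer's address, so containers can be added or removed behind it without any change to requests like this one.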
10:41
So this is the example I talked about earlier, where you want to standardize the desktop environment. The most common one we see is the workshop scenario, where you want to present learning, education, and the easiest way to do that
11:01
is to have everybody using the same environment, and the easiest way to do that is to give everybody the same environment and avoid the whole, well, it works on my machine kind of scenario. So as I mentioned, AppStream is the way, it's even easier because the user is accessing a web page
11:21
in order to access the virtual desktop, so there's nothing that they have to install for that. There are some differences around persistence that affect how you would choose one versus the other, so AppStream is a good choice for the workshop, and then as I also mentioned, we have a lot of customers that may have 100 users who need to access, for example, Esri desktop,
11:43
but they don't want to install Esri desktop on 100 employee laptops, and so putting it in the cloud as a virtual desktop is a solution for them, and it's also a serverless solution, so if no one's accessing that desktop, you're not paying for it. And then likewise, because the desktop is in the cloud,
12:07
you have available to you all the cloud storage and services that would normally be available to you, except you're even on a faster network because you're in the matrix of sorts.
12:24
So this is an interesting example here. This is serverless analytics. So AWS Batch is the centerpiece here. AWS Batch is sort of a heavy-hitting Lambda. You can run long-running data models, so when you have functions that are more than 15 minutes,
12:44
AWS Batch aligns with either EC2, which is not necessarily serverless, but it's not always running, only when it's accessed, or Docker containers, and I've been on teams in my prior life that use this model.
13:00
For example, Step Functions is sort of an Airflow-type service. It's an event-based system, and using that, events can happen. So a data scientist would have a model that they want to run, and they set it up with the right-sized EC2 or Docker and the right processing,
13:20
so an event gets triggered that might trigger a Lambda that then goes and says, okay, I'm going to grab this Docker, put it on a large or medium or extra large EC2 instance, and have Batch contain that, and then it's going to run for three days, and then when it's done running for three days, it's done running.
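The rule of thumb in this part of the talk (Lambda for steps under 15 minutes, Batch for anything longer) could be sketched as below. The helper names, the queue and job definition names, and the command are all hypothetical. The dictionary follows the keyword arguments of the AWS Batch `submit_job` call, but this sketch only builds the request and does not call AWS.

```python
LAMBDA_MAX_SECONDS = 15 * 60  # Lambda's maximum timeout is 15 minutes


def choose_compute(expected_seconds):
    """Pick a compute service from the expected runtime of one step."""
    if expected_seconds <= 0:
        raise ValueError("expected_seconds must be positive")
    return "lambda" if expected_seconds <= LAMBDA_MAX_SECONDS else "batch"


def batch_job_request(name, queue, definition, command):
    """Build keyword arguments for an AWS Batch submit_job call.

    The queue and job definition are assumed to exist already; the job
    definition is where the container image and instance sizing live.
    """
    return {
        "jobName": name,
        "jobQueue": queue,
        "jobDefinition": definition,
        "containerOverrides": {"command": list(command)},
    }


# A three-day model run is far beyond the Lambda limit, so it goes to Batch.
three_days = 3 * 24 * 60 * 60
target = choose_compute(three_days)
request = batch_job_request(
    "model-run",                 # hypothetical job name
    "analytics-queue",           # hypothetical job queue
    "model-job-definition",      # hypothetical job definition
    ["python", "run_model.py"],  # hypothetical container command
)
```

In the workflow described above, a triggering Lambda would make this choice and hand `request` to Batch, which then starts the container and shuts the compute down when the job finishes.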
13:41
So this is common in the sort of data science world of doing serverless processing, and then again, S3 here is just an example storage option. So with all these geospatial servers and services now available
14:03
in a serverless capacity, we are moving ever closer to a cloud-native future. When I wrote this, I kind of thought that not everybody was familiar with what cloud-native is, but then I saw the talks here at the conference, and we're embracing cloud-native very heavily, and I love it.
14:22
Some of you may recall that there was a cloud-native geospatial community outreach event that took place earlier in the year. If you're not familiar with that, there is a YouTube video available. It's a 10-hour video. You'll love it. But it's broken up into 20- to 30-minute talks, right?
14:41
So it's a lot of good information. And basically, what I'm talking about in the serverless realm ties into that. There's a lot of momentum towards removing the heavy lift of desktop analytics and visualization and towards a cloud-first solution, specifically incorporating these new data formats,
15:01
such as Cloud Optimized GeoTIFF, Cloud Optimized Point Cloud, and GeoParquet, which was announced earlier this year. Databases aren't going away. Even when NoSQL showed up, databases didn't go away. They're still there, but with these cloud-native data formats, we're seeing a lot more native integrations with cloud geospatial.
15:26
And as a parallel to that, I think an interesting community to keep your eye on is Pangeo. Pangeo does a lot of atmospheric and climate data processing, and they work in the same realm.
15:42
They come from... Oh, shoot. What's it called? They come from the labs, the research labs, with high CPU and high compute. I'm blanking on the name.
16:02
So they employ many of the same infrastructure requirements, except they work with their own unique data sets. You might have heard of Zarr or TileDB and what's called an N-D array, because the climate data is typically a cube structure, where raster is more often two-dimensional. This cube structure is better suited to atmospheric data,
16:23
and the older data formats, HDF and NetCDF, are common in this space as well. So Pangeo is sort of a parallel to what we're seeing with the cloud-native geospatial framework. So this is just a quick blog post that I wanted to do a shout-out for.
16:43
This was written a couple of months ago, not by me, but it sort of showcases QGIS in the cloud, using the AppStream capabilities that I've mentioned before. It's a good read. Apologies, I don't have a good way to share the URL,
17:03
but if you come say hi, then I can make sure to give it to you. But it's also searchable by title. Thanks for your time. Any questions?