
Implementing a STAC Service with gRPC and Protocol Buffers

Formal Metadata

Title
Implementing a STAC Service with gRPC and Protocol Buffers
Alternative Title
How to host and access STAC Imagery using Google's gRPC Remote Procedure Call framework and Protobuf messages
Title of Series
Number of Parts
295
Author
Contributors
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
At Swiftera we've built a Spatio Temporal Asset Catalog (STAC) service using the gRPC framework and protobuf messages (instead of the OpenAPI framework and JSON messages). The Remote Procedure Call framework, gRPC, and the protobuf message format are what Google uses internally for its micro-services (10s of billions of messages a second). Since Google open sourced gRPC 4 years ago, it has been widely adopted by companies moving massive amounts of data (Netflix, Salesforce, Spotify and others). But it isn't only about performance; it's also a surprisingly easy framework to get up and running. At FOSS4GNA we open sourced our NAIP metadata service and the IDL defining the services and messages. By Bucharest we plan to have added Landsat and Sentinel to our public gRPC service. We want to share more about what it's like to work with gRPC and the ease of development for hosting your own gRPC services.
Keywords
Transcript: English (auto-generated)
Thank you. So I have way too many slides, but we're just gonna do it. It's gonna be great.

Just quickly about me: these are kind of my credentials for why I should even be talking about this. I think the most important credential is the photo of me with a really full neckbeard, because if you're gonna be doing cloud infrastructure, neckbeards count for a lot. Fog modeler is my Twitter handle, and I'm gonna be releasing more information on Twitter in the coming months. We're gonna have some releases and whatnot.

I work for a company called Near Space Labs. We're a high-altitude, high-frequency imaging company, and we've just completed our first really large survey in Austin, Texas. We're hiring, if anyone is interested in software engineering or data analysis.

So now on to the talk. But first I'd like to talk about my beef with GeoJSON. Actually, my beef with GeoJSON doesn't matter, because it's part of the GIS world.
I'd also like to point out, it's really amazing that on the internet you can get a stock photo of a meat tenderizer holding an Erlenmeyer flask while wearing glasses and smiling. I didn't know that I needed that.

So what is STAC? I don't know if a lot of you have been in some of the other STAC talks, but this is a joint effort between a lot of different groups to create a standard for sharing spatio-temporal asset catalogs. Basically, how I think of it is: if you work for Planet, or for Maxar, or for Near Space Labs, each one of us making our own API for accessing data and our own client libraries just doesn't make any sense. It's a reproduction of work, and it's also a pain in the ass for all the customers, because then they have to go through and use these different client libraries.
So if there's one client library for everyone to search different imagery datasets, it's just gonna make everyone a lot happier. STAC is implemented with JSON and OpenAPI and some linting on the JSON, so that the definition for your service is ensured to be up to the specification that the group has put forward. A different version would be using protobuf and gRPC. So first, what is protobuf?
So this is the lingua franca of Google since 2003. It's a binary message format. You can kind of think of it as a more compact XML or JSON, and it's strongly typed. I don't know if you've ever heard this story: about ten years ago Steve Yegge (I don't know how to say his name) wrote this internal blog post at Google, criticizing Google and saying, we're not doing a good job of consuming our own APIs. He was saying AWS is doing this terrific job of forcing everyone that works at AWS to be consuming the APIs. I think part of the reason that Google couldn't do that was because all of their services use protobufs. They don't use JSON, they don't use XML, and they figured it was a difficult ramp-up for people to learn how to use this.

Another thing about protobufs is that it's a nice way to actually evolve your message format. Instead of using JSON or XML you've got this protobuf, and as you extend it, old APIs that were using a previous version of this message, say this Person message that's defined here, would still continue to function even though you're adding new key-value pairs onto the format definition. You can even do things like change the names. You see here there's "string email"; you could change "email" to "correo electronico" and it would still work.
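The renaming point can be checked with a small hand-written encoder. This is only a sketch of the protobuf wire format for two wire types, not the code the proto compiler generates, and the Person field numbers (name = 1, id = 2, email = 3) and sample values are assumptions, since the slide itself is not in the transcript.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_message(schema: dict, values: dict) -> bytes:
    """Encode `values` using `schema`, a mapping of field name -> field number.

    Only ints (wire type 0, varint) and strings (wire type 2,
    length-delimited) are handled in this sketch.
    """
    out = bytearray()
    for name, number in sorted(schema.items(), key=lambda item: item[1]):
        value = values[name]
        if isinstance(value, int):
            out += encode_varint((number << 3) | 0)  # key = field number + wire type
            out += encode_varint(value)
        else:
            data = value.encode("utf-8")
            out += encode_varint((number << 3) | 2)
            out += encode_varint(len(data))
            out += data
    return bytes(out)


# Hypothetical schemas: same field numbers, one field renamed.
SCHEMA_V1 = {"name": 1, "id": 2, "email": 3}
SCHEMA_RENAMED = {"name": 1, "id": 2, "correo_electronico": 3}

original = encode_message(
    SCHEMA_V1, {"name": "Ann", "id": 42, "email": "ann@example.com"}
)
renamed = encode_message(
    SCHEMA_RENAMED,
    {"name": "Ann", "id": 42, "correo_electronico": "ann@example.com"},
)

# The field names never reach the wire, so both encodings are identical.
print(original == renamed)
```

Because only the field number and wire type are written, a reader built against either schema decodes the same bytes.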
It's just this number that's to the right that actually defines the position of that data and where it's stored in the protobuf message.

So what is gRPC? You can think of gRPC as a framework similar to a combination of OpenAPI and REST. It was open sourced by Google in 2016. It's now part of the Cloud Native Computing Foundation, so it is well supported, you can rely on it, and it is still in active development.

So why even look into using gRPC and protobufs? The serialization of protobufs makes the messages themselves smaller than JSON.
So when you care about network-bound processes, you want to put less information out onto the network, and smaller messages mean that you're going to be taking up less bandwidth. Part of the other performance credentials is that protobuf and gRPC are what Google uses internally for all of its services. When they can say that they're handling ten billion of these RPC requests per second, you can think, all right, this is pretty well tested.
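To make the size argument concrete, here is a rough comparison for one small record. The record and its field numbers are hypothetical, and the protobuf bytes are written out by hand following the wire format rather than produced by generated code.

```python
import json

# Hypothetical record, used only for illustration.
person = {"name": "Ann", "id": 42, "email": "ann@example.com"}

# Compact JSON: every key name and all punctuation travel with the data.
json_bytes = json.dumps(person, separators=(",", ":")).encode("utf-8")

# The same record in protobuf wire format, assuming name=1, id=2, email=3.
proto_bytes = (
    b"\x0a\x03" + b"Ann"                  # field 1, length-delimited, 3 bytes
    + b"\x10\x2a"                         # field 2, varint, value 42
    + b"\x1a\x0f" + b"ann@example.com"    # field 3, length-delimited, 15 bytes
)

print(len(json_bytes), len(proto_bytes))
```

For this record the protobuf encoding is 24 bytes against 48 bytes of compact JSON. The ratio depends heavily on the data, which is the same caveat the talk makes about Uber's benchmark.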
So why did we choose it internally? The streaming is really easy: you can have bi-directional streaming, you can have server-side streaming. It also allows us to quickly generate client libraries in multiple languages. So if somebody wants to access our data using C++ or Go or Java, we already have a client library that we can compile for them just using the proto definition file, which is this message on the right-hand side. I'm able to use something called a proto compiler to create the serialization and deserialization code for these messages, and using those proto files I can also define my services.

So these are a few graphs that Uber put together when they were looking for a new message format, because JSON messages were getting to be a little bit too large.
You can't really read this, but you can see the bars: the green ones are JSON messages and the red are the protobuf messages. What this shows is the total encoding and decoding time, and you can see that protobuf does pretty well. Something called Thrift is actually what they found to be the fastest. Thrift was a few engineers from Google leaving, going to Facebook, and saying, hey, we should have our own RPC framework and message format.

You can also see that the message size of protobuf is significantly smaller than the JSON messages. This is particular to Uber's data; it can be different for every type of message. But it's interesting, if you look all the way down at the bottom here, pickle is actually larger than JSON.
I don't know what the format of their data was such that pickling it with Python would make it worse than it would be in JSON, but it's just something to note. This pink line in the lower left-hand corner is sort of where Uber said, all right, these are the best performing message formats for both encoding and size. But Uber didn't end up using protobuf, because it's really strict. You have this proto file definition and it has all of these fields in it, and it's not something as dynamic as JSON is. With JSON we can just kind of create this map and stuff whatever we want in it. Protobuf does sort of require that the person sending the message and the person receiving the message both know what they're looking for. Not always, though: you can just kind of search through a message and look for things.
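Searching through a message without its proto file is possible because every field is prefixed with its number and wire type. The following is a minimal sketch of such a scanner; the sample bytes are a hypothetical three-field message, and the deprecated group wire types are not handled.

```python
def read_varint(buf: bytes, pos: int):
    """Read a base-128 varint starting at `pos`; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def scan_fields(buf: bytes):
    """Walk a serialized message and list (field_number, wire_type, raw_value)."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:    # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:    # length-delimited (strings, bytes, sub-messages)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:    # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((number, wire_type, value))
    return fields


# Hypothetical message: field 1 = "Ann", field 2 = 42, field 3 = an email.
SAMPLE = b"\x0a\x03Ann\x10\x2a\x1a\x0fann@example.com"
FIELDS = scan_fields(SAMPLE)
print(FIELDS)
```

The scanner recovers field numbers and raw values but not names or precise types; those still come from the proto file, which is why sender and receiver normally share it.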
So actually, this strictness of protobuf is to our advantage when we're defining a standard for how we want to transmit data. What they've done with the STAC project is really impressive: they've also created a linter to make sure that whenever people define their STAC service, it is in accordance with what the definition of STAC means. With these proto files, we can just have a proto file, and you don't have to have a linter. It's strict.
I told you how there's this proto compiler, which actually creates both the client libraries for serialization and deserialization and also the client stub for communicating with the gRPC service. These are all the languages that are supported by the Cloud Native Computing Foundation, but there are other languages that are supported by the community. I know it isn't really a language, but it's on there.

So who's using gRPC? A lot of the people that are using gRPC are pretty big companies, and so I think that some people are afraid of adopting a new framework. What I found personally was that it was easier for me to get up and running with a gRPC client and server than using OpenAPI.
The proto definition was clear for me to understand, and the documentation on grpc.io was really good. But I think the reason that you're seeing a lot of these bigger companies move internally towards using these RPC frameworks is that they're getting to a point where, in their budget, they care about the amount of network bandwidth that they're consuming. Someone can actually say, we can save a considerable amount of money by making our messages smaller.

One of the other things that I didn't mention about gRPC, but I should, is that it uses HTTP/2 natively. I think more of us are getting familiar with the idea that HTTP/2 is the way forward. Say you're compiling GDAL yourself now: you actually want to compile curl with the nghttp2 library, because that will improve all of your cloud optimized GeoTIFF pulling and all of that stuff. gRPC is built on that natively; you actually can't use HTTP 1.1. There are some load balancing issues with that, but we can talk about that in questions.

So, our current STAC service. Recently we wanted to make some STAC datasets available. Personally, as a company, we're using NAIP, because we're actually using NAIP for part of our pipeline.
That's hosted on AWS, and then there's also Landsat on AWS that a lot of people are using. We haven't seen so many people do a STAC service with Landsat on Google Cloud, and we thought, oh, that'd be cool. So on our STAC service we're also serving the Google Cloud data, which in a way is kind of cool, because they've got Landsat data from 1972 to the present.

In all of these cases you're able to search by spatial extent, temporal extent, the ground sampling distance, or cloud cover. And, as other people have talked about, as Matt talked about earlier, it's really cool that I can say I want something like cloud cover, make a request, and get back results from multiple datasets according to that. I'm not just restricted to searching for cloud cover on Landsat; that same variable works on the data that we provide or the data that NAIP provides. Of course, NAIP is actually cloud free.

So, Postgres. We're using Postgres. I'm sharing this more because I'm not terribly experienced with Postgres, and I'm trying to figure out what the right thing to do is.
Right now what we have is a STAC table where we have the geometry and the time component, and then all other tables refer to this STAC ID as a foreign key. You can see down below there's this electro-optical table. That's the one that has all of these details, where if it's electro-optical data and it has something like cloud cover or azimuth, then you can search that data according to those parameters.

To get the Landsat data in: Google Cloud has a BigQuery table, and then there's a CSV, so we just took their CSV and used that to put it into our Postgres database. We've also created the NAIP assets. The whole idea, again, since maybe some of you haven't heard about STAC, is that there are these qualities of STAC data that you want to search by, and this metadata is interesting to you, say the cloud cover or the azimuth. But you also want to know what the associated assets are: what are the lists of GeoTIFFs or other metadata stored in cloud storage that you want to access?
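The table layout described here can be sketched with Python's built-in sqlite3 module. This is not the speaker's Postgres schema: the table names, column names, sample rows and asset URLs are all hypothetical, and a plain bounding box stands in for a real geometry column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE stac (                 -- core table: geometry and time
    id       TEXT PRIMARY KEY,
    dataset  TEXT NOT NULL,
    xmin REAL, ymin REAL, xmax REAL, ymax REAL,
    observed TEXT NOT NULL
);
CREATE TABLE electro_optical (      -- sensor details keyed by the STAC id
    stac_id     TEXT PRIMARY KEY REFERENCES stac(id),
    cloud_cover REAL,
    azimuth     REAL
);
CREATE TABLE asset (                -- files in cloud storage for each item
    stac_id TEXT NOT NULL REFERENCES stac(id),
    href    TEXT NOT NULL
);
""")

conn.executemany("INSERT INTO stac VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("naip-1", "NAIP", -97.8, 30.2, -97.7, 30.3, "2018-07-01T00:00:00Z"),
    ("landsat-1", "LANDSAT", -98.5, 29.5, -96.5, 31.5, "2019-03-10T00:00:00Z"),
    ("landsat-2", "LANDSAT", -98.5, 29.5, -96.5, 31.5, "2019-03-26T00:00:00Z"),
])
conn.executemany("INSERT INTO electro_optical VALUES (?, ?, ?)", [
    ("naip-1", 0.0, 120.0),
    ("landsat-1", 65.0, 140.0),
    ("landsat-2", 5.0, 138.0),
])
conn.executemany("INSERT INTO asset VALUES (?, ?)", [
    ("naip-1", "s3://example-bucket/naip-1.tif"),
    ("landsat-1", "gs://example-bucket/landsat-1.tif"),
    ("landsat-2", "gs://example-bucket/landsat-2.tif"),
])

# One cloud-cover filter applies across datasets and returns the assets too.
rows = conn.execute("""
    SELECT s.id, s.dataset, eo.cloud_cover, a.href
    FROM stac AS s
    JOIN electro_optical AS eo ON eo.stac_id = s.id
    JOIN asset AS a ON a.stac_id = s.id
    WHERE eo.cloud_cover <= ?
    ORDER BY s.id
""", (10.0,)).fetchall()
print(rows)
```

Keeping sensor-specific columns in their own table means the core table stays the same when a non-optical dataset is added; whether that is the best layout in Postgres is the open question the speaker raises.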
And these assets, for us, were built using the CSV and from these shapefiles. Esri, actually, I think organized those shapefiles.

So, queries. These are the different levels of what you can query by. At the STAC level there's geometry, processed date, updated date, and observed date. In typical STAC right now I think it's mostly just a datetime that you can query by. We felt like we wanted an observed time, a processed time, and then also an updated time, as in when was the last time that the metadata was updated.

We've also open-sourced a Python client, which is basically a generic Python client for gRPC STAC. This Python client will not work with your JSON STAC, but it will work with the service that we have up and running. So if you are interested in querying and playing around with gRPC, there's a lot of good documentation there, and you should be able to check it out.

These are some sample queries of what it looks like to use the generated protobuf code. This is Python. Everything in here that has underscore pb2, at the top here in the from statements, is generated code that was created by the proto compiler.
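The slide code is not in the transcript, so the following does not use the generated `_pb2` classes or the real client. It is a plain-Python sketch, with hypothetical names throughout, of what filtering on the query fields described above (geometry plus an observed time) amounts to.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # xmin, ymin, xmax, ymax


@dataclass
class Item:
    id: str
    bbox: BBox
    observed: datetime   # when the image was taken
    processed: datetime  # when it was processed
    updated: datetime    # when the metadata last changed


@dataclass
class Query:
    bbox: Optional[BBox] = None
    observed_after: Optional[datetime] = None
    observed_before: Optional[datetime] = None


def bboxes_intersect(a: BBox, b: BBox) -> bool:
    """True when two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def matches(item: Item, query: Query) -> bool:
    """Apply every filter that is set on the query; unset filters match all."""
    if query.bbox is not None and not bboxes_intersect(item.bbox, query.bbox):
        return False
    if query.observed_after is not None and item.observed < query.observed_after:
        return False
    if query.observed_before is not None and item.observed > query.observed_before:
        return False
    return True


def search(items: List[Item], query: Query) -> List[Item]:
    return [item for item in items if matches(item, query)]


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical catalog entries.
ITEMS = [
    Item("austin-1", (-97.8, 30.2, -97.7, 30.3), utc(2019, 5, 1), utc(2019, 5, 2), utc(2019, 5, 3)),
    Item("austin-2", (-97.75, 30.25, -97.65, 30.35), utc(2019, 8, 1), utc(2019, 8, 2), utc(2019, 8, 2)),
    Item("nyc-1", (-74.1, 40.6, -73.9, 40.8), utc(2019, 5, 1), utc(2019, 5, 2), utc(2019, 5, 2)),
]
AUSTIN = (-97.9, 30.1, -97.6, 30.4)

in_austin = search(ITEMS, Query(bbox=AUSTIN))
early_austin = search(ITEMS, Query(bbox=AUSTIN, observed_before=utc(2019, 6, 1)))
print([item.id for item in in_austin], [item.id for item in early_austin])
```

In the actual service this filtering would happen on the server side; the sketch only shows how optional filters combine.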
So we've got a bounding box query, we've got a query that is more like a time range query, and then some more complicated queries. Again, all of this is documented in the repo that I shared earlier in the slides, and I guess I'll tweet out these slides. I don't know how this is getting shared at this conference.

So, upcoming work. Like I said before, we've just recently collected a lot of data, and our data resolution is probably like 30 centimeter data. We collected multiple surveys of Austin, Texas as sort of our first run, and we'd like to make a lot of that publicly available so that people can start using it. I've been working full-time on STAC stuff and then also data processing, so I'm gonna step back from STAC a little bit, although I'll continue to help with any kind of issues that come up with the repos. There are a few other things in here. I'm gonna write some Medium articles, maybe, on gRPC and protobuf. If people in the community are interested in learning about these tools and using them more, I'd love to share my experience, so if there's interest, please let me know.
And we're gonna continue the process of getting our gRPC protobuf version of STAC actually accepted as a part of the STAC specification. I'd also like to get Sentinel-2 in there, and I'd like to get NEXRAD data in. I've also written a STAC service in Go. Again, if there's interest in this, I'd love for people to let me know, because I've already got the okay to open source it. But it takes effort to open source it, and if no one's gonna look at it, then I would rather not.
So I wonder if this will actually work. How does this work? It won't work if it goes to Chrome, I bet. Oh no, it will. So this was from a recent flight in Texas, and... oh, you can't see it. What a bummer. I can see it. It's beautiful. I'm not really sure how that works. Exit the presentation. All right, so I guess I just close that. Well, at any rate, maybe you can help me show it, and then I can take questions.

So if there are any questions, now's the time. There's a lot of content there: protobufs, gRPC, STAC. It's a lot. I feel like I'm sort of working for Google, like I want people to use protobuf and gRPC. But by the way, protobuf and gRPC are part of the Cloud Native Computing Foundation, so you don't have to worry about it being a closed source solution anymore.

Thanks for that, that was a really good talk. I'm interested to know how you are consuming these gRPC services that you're publishing.
What applications are they used in, and what do they do?

Internally, we're using gRPC and STAC right now basically for keeping track of all of our imagery metadata. We also have a gRPC geometry service that we're using, because the STAC service is written in Go and there's not really the one Go geometry library. We've got a geometry library written in Java that's a gRPC service. And then our data processor is connecting to a lot of these different services, so we're using them a lot in-house. And I guess if you're using Google Cloud, you're using gRPC in a way, because a lot of those client libraries are using gRPC under the hood. Oh
Cool. Yeah, I mean, I really like being able to see something like this. But are there any other questions? I think I can just... He has one, okay.

Thanks. So you brought up Thrift, and it made me curious if you could compare and contrast strengths and weaknesses of Thrift for similar tasks.

So I actually can't. I think the documentation for gRPC and protobuf was good enough that I just went, all right, I'm doing this; I don't have enough time to actually do the kind of performance evaluation. But from what I've heard it's really great. And as far as message encoding formats go, there's something called Cap'n Proto, which is a play on, I think, Cap'n Crunch, the cereal. The gentleman who wrote protobufs back at Google left and has written the binary format to beat all binary formats, but it's just not very well supported. You can actually use Thrift messages in gRPC if you would prefer to, which is also a cool thing. You could probably even use JSON, if you wanted to subject yourself to that.

Anything else?

Thanks for the presentation, and
Well, you mentioned JSON and that they have a linter. I'm just thinking that they could have used JSON Schema, which can basically be used for JSON the way an XML schema is used for XML. Other than that, it's also worth mentioning that gRPC is good because basically you can serialize and deserialize entire, let's say, objects in it (of course [Go] doesn't have objects per se anyway). In JSON that's kind of difficult, and you might have lots of problems, because serializing and deserializing in JSON can give you faulty objects, so to say, because they are not checked. And yeah, thanks.

Thank you for a great presentation, David, and