Implementing a STAC Service with gRPC and Protocol Buffers
Formal Metadata
Number of Parts | 295 | |
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/43568 (DOI) | |
FOSS4G Bucharest 2019, part 16 / 295
Transcript: English (auto-generated)
00:07
Thank you. So I have way too many slides, but we're just gonna do it. It's gonna be great. So, just quickly about me.
00:20
These are kind of my credentials for why I should even be talking about that. I think the most important credential is the photo of me with a really full neckbeard, because if you're gonna be doing cloud infrastructure, neckbeards count for a lot. Fog modeler is my Twitter handle, and I'm gonna be releasing more information on Twitter in the coming months.
00:44
We're gonna have some releases and whatnot. I work for a company called Near Space Labs. We're a high-altitude, high-frequency imaging company, and we've just completed our first really large survey in Austin, Texas. We're hiring, if anyone is interested in software engineering or data analysis.
01:06
So now onto the talk, but first I'd like to talk about my beef with GeoJSON. But actually my beef with GeoJSON doesn't matter, because it's part of the GIS world.
01:23
I'd also like to point out it's really amazing that on the internet you can get a stock photo of a meat tenderizer holding an Erlenmeyer flask while wearing glasses and smiling. I didn't know that I needed that. So what is STAC? I don't know if a lot of you have been in some of the other STAC
01:44
talks, but this is a joint effort between a lot of different groups to create a standard for sharing spatiotemporal asset catalogs. So basically, how I think of it is like this:
02:00
If you work for Planet, and you work for Maxar, and for Near Space Labs, each one of us is making our own API for accessing data and our own client libraries. It just doesn't make any sense, this kind of duplication of work. And it's also a pain in the ass for all the customers, because then they have to go through and use these different client libraries.
02:22
So if there's one client library for everyone to search different imagery datasets, it's just gonna make everyone a lot happier. But so STAC is implemented with JSON and
02:41
OpenAPI and some linting on the JSON, so that the definition for your service is ensured to be up to the specification that the group has put forward. A different version would be using protobuf and gRPC. So first, what is protobuf?
03:03
So this is the lingua franca of Google since 2003. It's a binary message format. You can kind of think of it as a more compact XML or JSON. It's strongly typed. I don't know if you've ever heard this story from like ten years ago: Steve
03:22
Yegge, I don't know what his name is, but he wrote this internal blog post at Google criticizing Google, saying we're not doing a good job of consuming our own APIs. He was saying AWS is doing this terrific job of forcing everyone that works at AWS
03:41
to be consuming the APIs. And I think part of the reason that Google couldn't do that was because all of their services use protobufs. They don't use JSON, they don't use XML, and they figured it was a difficult ramp-up for people to learn how to use this. Another thing about protobufs is
04:02
that it's a nice way to actually evolve your message format. So instead of using JSON or XML, you've got this protobuf, and as you extend it, old APIs that were using a previous version of this message, say this Person message
04:22
that's defined here, would still continue to function even though you're adding new key-value pairs onto the format definition. And you can even do things like change the names. So you see here, there's string email; you could change the name email to correo electrónico, and then it would still work.
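The renaming point can be made concrete with a small sketch, which is not taken from the talk's slides. It hand-encodes one string field the way the protobuf wire format does (a tag built from the field number and wire type, then a length, then the bytes). The field number 3 for the email field and the name `correo_electronico` are assumptions for illustration, since the slide itself is not reproduced in this transcript.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field: tag, byte length, then the UTF-8 payload."""
    length_delimited = 2  # wire type used for strings, bytes and nested messages
    payload = text.encode("utf-8")
    tag = (field_number << 3) | length_delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# Hypothetical schema, version 1:  string email = 3;
# Hypothetical schema, version 2:  string correo_electronico = 3;
SCHEMA_V1 = {"email": 3}
SCHEMA_V2 = {"correo_electronico": 3}

wire_v1 = encode_string_field(SCHEMA_V1["email"], "a@b.co")
wire_v2 = encode_string_field(SCHEMA_V2["correo_electronico"], "a@b.co")

# The field name never reaches the wire, only the number does,
# so both schema versions produce identical bytes.
print(wire_v1 == wire_v2)
```

In real use the proto compiler generates this encoding code; the hand-rolled encoder here only exists to show why a rename is harmless while a renumbering is not.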
04:46
It's just this number that's to the right that actually defines the position of that data and where it's stored in the protobuf message. So what is gRPC? gRPC is something you can think of as a framework similar to a
05:04
combination of OpenAPI and REST. It was open-sourced by Google in 2016. It's now part of the Cloud Native Computing Foundation, so it is supported. It is also something that you can actually
05:22
Yeah, it's well supported. You can rely on this, and it is still in active development. So why even look into using gRPC and protobufs? So, the serialization of protobufs makes the messages themselves smaller than JSON.
05:45
So when you care about network-bound processes, you want to put less information out onto the network, and smaller messages mean that you're going to be taking up less bandwidth. Part of the other performance credentials is that protobuf and gRPC is what Google uses
06:06
internally for all of its services. And so when you can say that they're using ten billion of these RPC requests per second, you can think, all right, this is pretty well tested.
06:22
So why we chose it internally: the streaming is really easy. You can have bidirectional streaming, you can have server-side streaming, and it also allows us to quickly generate client libraries in multiple languages. So if somebody wants to access our data using C++ or Go or Java,
06:43
we already have a client library that we can compile for them, just using the proto definition file, which is this message on the right-hand side. I'm able to use something called a proto compiler in order to
07:01
create the serialization and deserialization code for these messages, and using those proto files I can also define my services. So these are a few graphs that Uber put together when they were looking for a new message format, because JSON messages were getting to be a little bit too large.
07:23
You can't really read this, but you can see the bars: the green ones are JSON messages and the red are the protobuf messages. And what this shows is the total encoding and decoding time, and you can see that
07:40
protobuf does pretty well. Something called Thrift is actually what they found to be the fastest. Thrift was a few engineers from Google leaving, going to Facebook, and saying, hey, we should have our own RPC framework and message format. You can also see that the message size of
08:03
protobuf is significantly smaller than the JSON messages. It's interesting if you look (and this is particular to Uber's data; it can be different for every type of message) all the way down at the bottom here: pickle is actually larger than JSON.
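As a rough, hedged illustration of where the size difference comes from (this is not Uber's benchmark, and the record below is made up): JSON repeats every key as text in every message, while the protobuf wire format sends only a one-byte tag per field here and packs integers as varints. The field numbers 1, 2 and 3 are assumptions for the example.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode an integer field: tag with wire type 0, then the varint."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode a string field: tag with wire type 2, length, UTF-8 bytes."""
    payload = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


# Made-up record; assumed field numbers: name = 1, id = 2, email = 3.
record = {"id": 1234, "name": "Ada", "email": "ada@example.com"}

as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")
as_proto = (
    encode_string_field(1, record["name"])
    + encode_varint_field(2, record["id"])
    + encode_string_field(3, record["email"])
)

print(len(as_json), len(as_proto))
```

How large the gap is depends heavily on the data: records dominated by long strings shrink much less than records with many short keys and numeric values.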
08:21
So I don't know what the format of their data was, that pickling it with Python would make it worse than it actually would be in JSON, but just something to note. So this pink line in the lower left-hand corner is sort of where Uber said, all right,
08:41
these are the best-performing message formats for both encoding and size. But Uber didn't end up using protobuf, because it's really strict. So you have this proto file definition and it has all of these fields in it, and it's not something as dynamic as JSON is. Like with JSON,
09:03
we can just kind of create this map and stuff whatever we want in it. It does sort of require, though, that the person sending the message and the person receiving the message both know what they're looking for. Not always, though; you can just kind of search through a message and look for things.
09:21
So actually this strictness of protobuf is to our advantage when we're defining a standard for how we want to transmit data. Because right now, what they've done with the STAC project is really impressive. They've also created a linter in order to make sure that whenever people
09:45
define their STAC service, it is in accordance with what the definition of STAC means. With these proto files, we can just have a proto file, and you don't have to have a linter. It's strict.
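The contrast between a free-form map and a fixed schema can be sketched by analogy in plain Python. This is not protobuf itself, and `StrictItem` with its two fields is a hypothetical stand-in for a generated message class: the point is only that a declared schema rejects a misspelled or mistyped field where it is built, while a free-form map needs a separate checking step.

```python
from dataclasses import dataclass

# Free-form, JSON-style map: any key is accepted, so this typo goes unnoticed
# until something such as a linter checks the document against the spec.
loose_item = {"id": "scene-1", "cloud_covr": 12.5}


@dataclass(frozen=True)
class StrictItem:
    """Hypothetical fixed-schema item; the field names are illustrative."""

    id: str
    cloud_cover: float

    def __post_init__(self):
        # Dataclasses do not enforce annotations, so check the types by hand.
        if not isinstance(self.id, str):
            raise TypeError("id must be a string")
        if not isinstance(self.cloud_cover, (int, float)):
            raise TypeError("cloud_cover must be a number")


def try_build(**fields):
    """Return a StrictItem, or the error message if the fields do not fit."""
    try:
        return StrictItem(**fields)
    except TypeError as err:
        return str(err)


print(try_build(id="scene-1", cloud_cover=12.5))
print(try_build(id="scene-1", cloud_covr=12.5))  # unknown field name
print(try_build(id=7, cloud_cover=12.5))  # wrong type for id
```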
10:02
So these are some of... I told you how there's this proto compiler, which actually creates both the client libraries for serialization and deserialization and also the client stub for communicating with the gRPC service. So these are all the languages that are supported by the Cloud Native Computing Foundation.
10:24
But there are other languages that are supported by the community. I know it isn't really a language, but it's on there. So who's using gRPC? A lot of the people that are using gRPC are pretty big companies, and so
10:42
I think that some people are afraid of adopting a new framework. What I found personally was that it's easier for me to get up and running with a gRPC client and server; it was easier for me than using OpenAPI.
11:01
The proto definition was clear for me to understand, and the documentation on grpc.io was really good. But I think the reason that you're seeing a lot of these bigger companies move internally towards using these RPC frameworks is that they're getting to a point where, for their budget, they care about
11:23
the amount of network bandwidth that they're consuming, where someone can say, we can save a considerable amount of money by making our messages smaller. So one of the other things that I didn't mention about gRPC,
11:41
but I should, is that it uses HTTP/2 natively. I think more of us are getting familiar with the idea that HTTP/2 is the way forward. So say you're compiling GDAL yourself now: you actually want to compile curl with this
12:00
nghttp2 library, because that will improve all of your cloud-optimized GeoTIFF pulling and all of that stuff. So gRPC is built on that natively; you actually can't use HTTP/1.1, which,
12:21
there are some load-balancing issues with that, but we can talk about that in questions. So, our current STAC service. Recently we wanted to make some STAC datasets available. Personally, as a company, we're using NAIP, because we're actually using NAIP for part of our pipeline.
12:41
That's hosted on AWS, and then there's also Landsat on AWS that a lot of people are using. And we haven't seen so many people do a STAC service with Landsat on Google Cloud, and we thought, oh, that'd be cool. So on our STAC service we're also serving the Google Cloud data,
13:01
which in a way is kind of cool, in that they've got Landsat data from 1972 to the present. And in all of these cases you're able to search by spatial extent, temporal extent, by the ground sampling distance or cloud cover. And what other people have talked about, what Matt talked about earlier:
13:22
it's really cool that I can say I want to search on something like cloud cover, make a request, and get back results from multiple datasets according to that. I'm not just restricted to searching for cloud cover on Landsat; that same
13:40
variable works on the data that we provide or the data that NAIP provides. Of course NAIP is actually cloud-free. So, Postgres. We're using Postgres. I'm more sharing this because I'm not terribly experienced with Postgres and I'm trying to figure out what is the right thing to do.
14:03
So right now what we have is a STAC table where we say, all right, we have geometry in there, we have the time component, and then all other tables refer to this STAC ID as a foreign key. So you can see down below
14:21
there's this electro-optical table, and that's the one that has all of these details, where if it's electro-optical data and it has something like cloud cover or azimuth, then you can search that data according to those parameters. In order to get the Landsat data in, Google Cloud makes a, I think it's a BigQuery table, so we just...
14:44
Actually, there's a BigQuery table, and then there's a CSV, so we just took their CSV and used that to put it into our Postgres database. We've also created the NAIP assets. So the whole idea, again, since maybe some of you haven't heard about STAC: there are these
15:04
qualities of STAC data that you want to search by, and this metadata is interesting to you, say the cloud cover or this azimuth. But you also want to know what the associated assets are, what the lists of GeoTIFFs or other metadata are that are stored in cloud storage and that you want to access.
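The table layout described here (a core STAC table, an electro-optical table keyed on the STAC ID, and assets that point back to the same ID) can be sketched as follows. This is an assumption-laden stand-in, not the speaker's schema: it uses Python's built-in SQLite instead of Postgres so that it runs anywhere, stores geometry as WKT text where a real Postgres setup would more likely use a spatial column type, and every table name, column name and URL is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE stac (
    id        INTEGER PRIMARY KEY,
    geometry  TEXT NOT NULL,   -- WKT text as a stand-in for a spatial column
    observed  TEXT NOT NULL    -- ISO 8601 timestamp
);
CREATE TABLE electro_optical (
    stac_id      INTEGER PRIMARY KEY REFERENCES stac(id),
    cloud_cover  REAL,
    azimuth      REAL
);
CREATE TABLE asset (
    id       INTEGER PRIMARY KEY,
    stac_id  INTEGER NOT NULL REFERENCES stac(id),
    href     TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one made-up scene with two assets."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO stac VALUES (1, 'POINT(-97.74 30.27)', '2019-08-01T12:00:00Z')")
    conn.execute("INSERT INTO electro_optical VALUES (1, 5.0, 135.0)")
    conn.execute("INSERT INTO asset VALUES (1, 1, 'gs://example-bucket/scene-1.tif')")
    conn.execute("INSERT INTO asset VALUES (2, 1, 'gs://example-bucket/scene-1.json')")
    return conn


def assets_with_max_cloud_cover(conn: sqlite3.Connection, max_cloud_cover: float) -> list:
    """Return asset URLs for scenes whose cloud cover is at or below the limit."""
    rows = conn.execute(
        """
        SELECT asset.href
        FROM stac
        JOIN electro_optical ON electro_optical.stac_id = stac.id
        JOIN asset ON asset.stac_id = stac.id
        WHERE electro_optical.cloud_cover <= ?
        ORDER BY asset.href
        """,
        (max_cloud_cover,),
    ).fetchall()
    return [href for (href,) in rows]


print(assets_with_max_cloud_cover(build_demo_db(), 10.0))
```

Keeping sensor-specific columns in their own table means a dataset without cloud cover or azimuth simply has no row there, while the core table stays the same for every dataset.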
15:23
And these assets for us were built using the CSV and from these shapefiles; Esri, actually, I think organized those shapefiles. So, let's see, queries. These are kind of the different levels of what you can query by. So at the STAC level:
15:46
geometry, processed date, updated date, observed date. In typical STAC right now I think it's mostly just that there's a datetime that you can query by. We felt like we wanted an observed time, a
16:01
processed time, and then also an updated time, like when was the last time that the metadata was updated. We've also open-sourced a Python client, which is basically a generic Python client for gRPC STAC. So this Python client will not work with your JSON STAC,
16:23
but it will work with the service that we have up and running. So if you actually are interested in querying and playing around with gRPC, there's a lot of good documentation there, and you should be able to check it out. So these are some sample queries of what it looks like to use
16:45
the generated protobuf code. Everything in here, this is Python, everything that has _pb2 at the top here in the from statements, that's generated code that was created by the proto compiler.
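The sample queries on the slides are not reproduced in this transcript, and the sketch below does not use the speaker's client library or its generated `_pb2` classes. It is a plain-Python stand-in, with made-up items and hypothetical names throughout, for what the search filters described in the talk do on the server side: a bounding-box intersection test, an observed-time range, and a cloud-cover limit applied across more than one dataset.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

UTC = timezone.utc


@dataclass(frozen=True)
class Item:
    """Hypothetical catalog item; bbox is (min_lon, min_lat, max_lon, max_lat)."""

    id: str
    dataset: str
    bbox: tuple
    observed: datetime
    cloud_cover: float


def bbox_intersects(a: tuple, b: tuple) -> bool:
    """True when two lon/lat boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(items, bbox=None, observed_after=None, observed_before=None, max_cloud_cover=None):
    """Return the ids of items that pass every filter that was supplied."""
    results = []
    for item in items:
        if bbox is not None and not bbox_intersects(item.bbox, bbox):
            continue
        if observed_after is not None and item.observed < observed_after:
            continue
        if observed_before is not None and item.observed > observed_before:
            continue
        if max_cloud_cover is not None and item.cloud_cover > max_cloud_cover:
            continue
        results.append(item.id)
    return results


# Made-up items from two datasets.
ITEMS = [
    Item("landsat-1", "landsat", (-98.0, 30.0, -97.0, 31.0), datetime(2019, 6, 1, tzinfo=UTC), 40.0),
    Item("naip-1", "naip", (-97.8, 30.2, -97.7, 30.3), datetime(2018, 9, 15, tzinfo=UTC), 0.0),
    Item("landsat-2", "landsat", (10.0, 44.0, 11.0, 45.0), datetime(2019, 7, 1, tzinfo=UTC), 5.0),
]

AUSTIN = (-97.9, 30.1, -97.6, 30.4)

print(search(ITEMS, bbox=AUSTIN))
print(search(ITEMS, bbox=AUSTIN, max_cloud_cover=10.0))
print(search(ITEMS, observed_after=datetime(2019, 1, 1, tzinfo=UTC)))
```

The same `max_cloud_cover` filter applies to the Landsat and NAIP items alike, which is the cross-dataset behavior the talk highlights.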
17:05
So we've got a bounding-box query, we've got a query that is more like a time-range query, and then some more complicated queries. And again, all of this is documented in
17:21
the repo that I shared earlier in the slides, and I guess I'll tweet out these slides. I don't know how this is getting shared at this conference. So, upcoming work. Like I said before, we've just recently collected a lot of data, and our data resolution, I mean, it's probably like 30-centimeter data, and
17:44
we collected multiple surveys of Austin, Texas as sort of our first run. So we'd like to make a lot of that publicly available so that people can start using it. I've been working full-time on STAC stuff and then also data processing.
18:00
So I'm gonna step back from STAC a little bit, although I'll continue to help with any kind of issues that come up with the repos. There are a few other things in here: I'm gonna write some Medium articles, maybe on gRPC and protobuf. If people in the community are interested in learning about these tools and using them more, I'd love to share my experience, so if there's interest please let me know.
18:26
And we're gonna continue the process of getting our gRPC protobuf version of STAC actually accepted as a part of the STAC specification. And I'd also like to get Sentinel-2 in there. I'd like to get NEXRAD data in.
18:42
And I've written a STAC service in Go, and again, if there's interest in this I'd love for people to let me know, because I've already got the okay to open-source it. But it takes effort to open-source it, and if no one's gonna look at it, then I would rather not.
19:05
So I wonder if this will actually work. How does this work? It won't work if it goes to Chrome, I bet, which is sort of... oh no, it will. So this was from a
19:21
recent flight in Texas, and... oh, you can't see it. What a bummer. I can see it. It's beautiful. Yeah, I'm not really sure how that works. Exit the presentation. All right.
19:42
So I guess I just close that, and then, well, at any rate, maybe you can help me show it and then I can take questions.
20:02
So if there are any questions, now's the time. I feel like there's a lot of content there: protobuf, gRPC, STAC. It's a lot. But I feel like I'm sort of working for Google, like I want people to use protobuf and gRPC.
20:22
But by the way, protobuf and gRPC, they're part of the Cloud Native Computing Foundation, so you don't have to worry about it being a closed-source solution anymore.
Thanks, that was a really good talk. I'm interested to know how you are consuming these gRPC services that you're publishing.
20:43
So what applications are they used in, and what do they do?
So internally we're using gRPC and STAC right now for basically keeping track of all of our imagery metadata. We also have a gRPC geometry service
21:04
that we're using, because the STAC service is written in Go and there's not really the one Go geometry library. We've got a geometry library written in Java that's a gRPC service, and then we also have a...
21:21
Our data processor is connecting to a lot of these different services, so we're using them a lot in-house. And then I guess all of us, if you're using Google Cloud, you're using gRPC in a way, because a lot of those client libraries are using gRPC under the hood. Oh,
21:44
cool. Yeah, so it's just pretty. I mean, I really like being able to see something like this. But are there any other questions? I think I can just...
22:03
He has one. Okay.
Thanks. So you brought up the issue of Thrift, and I guess it made me curious if you could compare and contrast strengths and weaknesses of Thrift for similar tasks.
So I actually can't. I think the
22:21
documentation for gRPC and protobuf was good enough that I just went, all right, I'm doing this. I don't have enough time to actually do the kind of performance evaluation. But from what I've heard it's really great. And actually, as far as a message encoding format, there's something called Cap'n Proto, which is a play on, I think,
22:44
Cap'n Crunch, the cereal. So this gentleman who wrote this proto spec at Google left and has written the binary format to beat all binary formats, but it's just not very well supported. But you can actually use Thrift messages in gRPC if you would prefer to, which is also a cool thing. You could probably even use JSON if
23:08
you wanted to subject yourself to that. Anything else?
Thanks for the presentation, and
23:20
well, you mentioned JSON and that they have a linter. I'm just thinking that they could have used JSON Schema, which basically can be used like an XML Schema; what XML Schema is for XML, JSON Schema is for JSON. Other than that, it's also worth mentioning that gRPC is good
23:43
because basically you can serialize and deserialize entire, let's say, objects in it. Of course Go doesn't have objects per se anyway. So the thing is that in JSON it's kind of difficult, because you might have lots of problems: serializing and deserializing in JSON gives you faulty objects, let's say, because
24:06
they are not checked. And yeah, thanks.
Thank you for a great presentation, David.