
Maps in Motion: Introduction to the STAC Video Extension


Formal Metadata

Title: Maps in Motion: Introduction to the STAC Video Extension
Number of Parts: 351
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Language: English
Production Year: 2022

Content Metadata

Abstract
This talk will introduce a new Spatiotemporal Asset Catalog (STAC) extension for geospatial video assets. The extension is designed to standardize the metadata for all types of overhead geospatial video assets, including those collected by satellite, UAV, or airborne sensors, while accommodating situations in which the sensor moves throughout the video. The talk will include a brief overview of the STAC ecosystem (elements and extensions), and explain the Video extension’s schema. In addition, there will be a complete end-to-end demonstration including data preprocessing and STAC item creation (entirely using FOSS tools), and a FOSS method for displaying STAC Video extension-enabled items on an interactive map. The audience need not prepare in any way for this introductory presentation, although some background in STAC and geospatial video might be beneficial. Otherwise, this talk has a broad appeal for data professionals through to frontend developers who are keen to add some motion to their maps!
Transcript: English (auto-generated)
Okay, thank you. So this talk will be about the STAC Video extension, or possibly I should have retitled it as how I tried to write a blog post about video and ended up writing a metadata specification extension. I'm a geospatial developer at a company
called SparkGeo in Canada. In this talk, we'll cover what spatial video is, how to process it, a short introduction to STAC, what the STAC
Video extension is and how it relates to STAC, a couple of options for deploying an app that uses the Video extension, and some future steps for the
extension. So what is spatial video? What do I mean by spatial video? Spatial video is often marketed as full motion video, or FMV, and that can mean a few different things.
My video is supposed to loop here, but I hope you saw it once: there was a satellite, a UAV, and an airplane all taking video. There are various definitions of FMV, but what I mean is any video in which both the sensor and the collected video frame might be
moving through time. The actual video asset can take many forms, basically any video format, but the most common ones are MPEG-2 transport stream, which you'll notice
has a .ts file extension, or MOV/MPEG-4. These videos have associated metadata, either embedded in-band, usually as byte packets of metadata encoded
directly inside the video and adhering to the MISB standard (the Motion Imagery Standards Board, the main body out there making video
metadata standards), or delivered out of band, in a completely separate file like a CSV. The metadata exists in the form of packets that contain
a bunch of metadata attributes corresponding to a timestamp within the video. So how do we actually do anything with this spatial video, and how do we deal with the metadata? For most of us, when we open up a video, we just double-click it and
QuickTime comes up and we watch the video, but there's a lot of embedded metadata in there. So here's an example of a full motion video. This comes from ArcGIS.com, and you can download the video yourself through the link. You'll notice that the video here
is taken from a helicopter with a sensor on the bottom filming this truck, and as the video progresses, the sensor moves and the collected frame area moves as well.
The way we can actually get at this metadata is with open source tools like FFmpeg or ImageMagick. When we look at the information about the video, in this case
my video file is called truck.ts, you'll see at the bottom that we've got a video stream, an audio stream, and a data stream. Often the data stream is where video subtitles go, but in this case it can carry an embedded byte stream of metadata.
That third stream is where all of the metadata exists, and you'll notice it's denoted KLV, which stands for key-length-value. That's a well-known data encoding
standard, so we can actually decode those bytes into something more manageable.
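As a rough sketch of that inspection step (the file name truck.ts comes from the talk; everything else is just a minimal way to run FFprobe from Python):

```python
# Sketch: list the streams inside the transport stream with ffprobe (ships with FFmpeg).
# The KLV metadata shows up as a stream with codec_type "data".
import subprocess

subprocess.run(
    ["ffprobe", "-hide_banner", "-show_streams", "truck.ts"],
    check=True,
)
```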
As an intermediate step, we can copy the data stream out into a standalone binary file using the command on the screen here: we select the data stream, say that we want to copy it, and give a name for the output file. This output binary file contains all of the binary byte stream data. We still wouldn't be able to read it as a human
being, but using the Python module klvdata, which you can download off GitHub, we can use its StreamParser to read that binary file. And finally, when we look at the metadata list within that stream parser, we can see the metadata on the right-hand side.
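Putting those two steps together, here is a minimal sketch, assuming the data stream is the third stream (index 0:2) as in the talk and using the klvdata package's StreamParser and MetadataList as described above:

```python
# Sketch: copy the KLV data stream out of truck.ts into a standalone binary file with
# ffmpeg, then decode it with the klvdata module (https://github.com/paretech/klvdata).
import subprocess
import klvdata

# Select the data stream (0:2 here; adjust to whatever ffprobe reported), copy it
# without re-encoding, and write the raw bytes to out.bin.
subprocess.run(
    ["ffmpeg", "-i", "truck.ts", "-map", "0:2", "-c", "copy", "-f", "data", "out.bin"],
    check=True,
)

# Decode the byte stream into MISB metadata packets, one per timestamp.
with open("out.bin", "rb") as stream:
    for packet in klvdata.StreamParser(stream):
        print(packet.MetadataList())
```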
So this is an example of a single metadata packet, and it's not even the entire thing. I think this video has 50 or so different attributes for almost every video frame in
the video, so you can imagine there's a fair amount of metadata encoded within that data stream. Some interesting attributes here for us are number 10 and onward: the sensor latitude, sensor longitude, and sensor true altitude. We're going to pull out
some of that metadata and store the sensor points, frame center points, and frame corner points in external GeoJSON files. We can do that using attributes
like the sensor latitude, longitude, and altitude, plus the frame center and offset corner latitudes and longitudes. So that's all of the metadata necessary to make the video that played on the right-hand side, where the sensor was sort of orbiting around the rectangular video frame geometry.
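A rough sketch of that export step, assuming the per-packet MISB values have already been pulled into plain dicts (the `packets` list and its key names are illustrative placeholders, not the klvdata API):

```python
# Sketch: write the sensor positions and frame corner footprints out as GeoJSON sidecar
# files. `packets` is a placeholder for per-packet MISB values extracted from the parse
# above; the key names loosely follow MISB ST 0601 naming and are illustrative only.
import json

packets = []  # e.g. [{"timestamp": ..., "sensor_lon": ..., "sensor_lat": ..., "sensor_alt": ..., "corners": [[lon, lat], ...]}]

sensor_features, frame_features = [], []
for p in packets:
    sensor_features.append({
        "type": "Feature",
        "properties": {"timestamp": p["timestamp"]},
        "geometry": {"type": "Point",
                     "coordinates": [p["sensor_lon"], p["sensor_lat"], p["sensor_alt"]]},
    })
    ring = p["corners"] + [p["corners"][0]]  # close the polygon ring
    frame_features.append({
        "type": "Feature",
        "properties": {"timestamp": p["timestamp"]},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    })

for name, feats in [("sensor_points", sensor_features), ("frame_polygons", frame_features)]:
    with open(f"{name}.geojson", "w") as f:
        json.dump({"type": "FeatureCollection", "features": feats}, f)
```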
One thing to notice is that when the sensor is writing those metadata packets, it's working as fast as it can,
but it probably won't write as many metadata packets as it collects video frames. In this case, the video only had 700 metadata packets but 4,400 video frames.
So if you play that back without doing any sort of interpolation, the video will appear to play for seven or so frames and then jump to the next location, then jump and jump again; it's really herky-jerky. But if you interpolate the frame corners and the other geometries
between the known metadata locations, you'll get a much smoother video.
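A minimal sketch of that interpolation, using NumPy to fill in one corner coordinate between the known packet timestamps (the numbers are placeholders; the same approach applies to every coordinate of every geometry):

```python
# Sketch: linearly interpolate a frame corner coordinate between known metadata packets
# so the geometry moves smoothly across every video frame (e.g. 700 packets vs 4,400
# frames in the talk's example). All values below are placeholders.
import numpy as np

packet_times = np.array([0.0, 0.5, 1.0])         # seconds at which metadata packets exist
corner_lat = np.array([49.100, 49.105, 49.110])  # one corner's latitude per packet

frame_times = np.arange(0.0, 1.0, 1 / 30)        # one entry per video frame (30 fps here)
corner_lat_per_frame = np.interp(frame_times, packet_times, corner_lat)
```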
So turning to the Spatiotemporal Asset Catalog: if you've been in this room at all today, you've heard the introduction to STAC probably 10 times already. STAC is a metadata specification made up of a couple of different connected specs. There's the main STAC spec, which describes catalogs, collections, and items; in our case,
one item would describe a video, or a collection of really tightly related videos, as assets. And then there's also the STAC API spec.
The STAC spec comes with tons of item properties already defined, but through custom extensions we can add additional properties. You can actually go to GitHub and, if you're given permission by the STAC powers that be,
write your own extension and contribute it to the repo. And so I've done that with the Video extension. You can go to the repo here and see examples of items, and you can read the JSON Schema spec if you really want to. But there's a bunch of new
properties that are available through the extension. Some interesting ones are the video shape, the frame rate or the frame count, and then a bunch of video-specific pieces of metadata that you can optionally include or not. Then for our
video asset, inside each item you would want one or more groups of this type of asset, and all of this is controlled by asset roles:
one of the assets would have the data role and the rest would have the metadata role. You can also group them using asset roles, so you can connect the actual video asset
with the sidecar geometry files using a group role.
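As a rough illustration, here is what such an item could look like, written as a Python dict. The `video:*` field names, the schema URL, and the `group:` role are illustrative guesses at the shape of the extension, not copied from the repo; check the extension's JSON Schema and examples for the real names.

```python
# Sketch of a STAC item using the Video extension, as a Python dict. Field names marked
# "illustrative" are not taken from the extension's schema; consult the repo for the
# authoritative property and role names.
import json

item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "stac_extensions": ["<video-extension-schema-url>"],  # placeholder
    "id": "truck-fmv",
    "geometry": {"type": "Polygon", "coordinates": [[  # footprint of the whole collect (made up)
        [-111.0, 53.0], [-110.9, 53.0], [-110.9, 53.1], [-111.0, 53.1], [-111.0, 53.0]]]},
    "bbox": [-111.0, 53.0, -110.9, 53.1],
    "properties": {
        "datetime": "2022-08-24T00:00:00Z",  # placeholder collect time
        "video:frame_count": 4400,           # illustrative field name
        "video:frame_rate": 30,              # illustrative field name
    },
    "assets": {
        # One asset carries the data role, the sidecar geometries carry the metadata role,
        # and a shared group role ties them together (role naming is illustrative).
        "video": {"href": "truck.ts", "roles": ["data", "group:truck"]},
        "sensor-points": {"href": "sensor_points.geojson", "roles": ["metadata", "group:truck"]},
        "frame-polygons": {"href": "frame_polygons.geojson", "roles": ["metadata", "group:truck"]},
    },
}
print(json.dumps(item, indent=2))
```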
So how do we actually deploy this and show it on the web? The simplest case is that you only have one STAC item, one video, and the associated geometry files; you could just put those in cloud storage and access them directly from your map. A little more sophisticated: you could have a STAC API that talks to a STAC catalog in Amazon RDS or whatever,
and a separate API that generates a pre-signed or signed URL so you can control how access to your assets works, and then use that pre-signed URL on your map.
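A minimal sketch of the signed-URL piece, assuming the assets live in S3 and using boto3 (the bucket and key names are made up):

```python
# Sketch: generate a time-limited pre-signed URL for a video asset in S3 so the map
# client can fetch it without the bucket being public. Bucket and key are placeholders.
import boto3

s3 = boto3.client("s3")
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-video-assets", "Key": "truck.ts"},
    ExpiresIn=3600,  # link stays valid for one hour
)
print(url)
```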
So finally, we can create a map that looks like this. Just by switching out the STAC item ID, you can load the STAC item, use the appropriate frame geometries to show on your map, and play the video through time, like so.
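On the client side, that lookup could look roughly like this sketch, which uses pystac to load an item by URL (a placeholder) and collect the video and sidecar geometry hrefs by role:

```python
# Sketch: load a STAC item and pick out the video asset and its sidecar geometry assets
# by role, which the map then plays through time. The item URL is a placeholder.
import pystac

item = pystac.Item.from_file("https://example.com/stac/collections/fmv/items/truck-fmv")

video_hrefs = [a.href for a in item.assets.values() if "data" in (a.roles or [])]
geometry_hrefs = [a.href for a in item.assets.values() if "metadata" in (a.roles or [])]

print(video_hrefs, geometry_hrefs)
```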
And so what's the future of the extension? I'm not completely happy with this. Oh, this emoji didn't really turn out very well; it looked different at home, but it's sort of a frowny-face emoji. Ideally, we could handle searching per video frame and find the single video frame that
overlaid a point, but with this extension you wouldn't really be able to do that; you'd have to settle for making sure the matching videos are returned. The problem with making one
STAC item per video frame is that you can easily have thousands of video frames within a video. So if you're happy having a STAC catalog that's 4,000 or 5,000 times bigger than your original STAC catalog, then go for it; maybe that's possible for your use case.
Another option might be, and I don't even know how this would work, a STAC API extension that searched for the matching videos and then was smart enough to go into the in-band metadata to find the matching video frames. That could be an option, but
I probably won't be the one to write it. Thanks, Brad Hards, for all your help with the STAC extension. There's some formatting going askew there, but you can reach me on
Twitter at dkweens, and you can also find the blog post I mentioned, which describes almost this entire workflow, at the link here.