Multimedia processing with FFMpeg and Python
Formal Metadata
Number of Parts | 131
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/69479 (DOI)
EuroPython 2024 (44 / 131)
Transcript: English (auto-generated)
00:04
It's actually my first time at EuroPython, so nice to see you. And first, very, very briefly about me, I'm a Python developer. I'm also one of the PyCon Poland organizers, and I have been a PyCon Poland organizer
00:23
since 2016. I've been helping Filip, who is present here, too. And also, I'm working part-time for the Warsaw University of Technology, which is displayed here. I am actually a teaching assistant.
00:43
I am teaching advanced Python programming. And I am also an amateur beach volleyball player, and yes, you can play beach volleyball indoors, especially in Polish winter. It's awesome, really. And I also play guitar. This is me on the way to last year's PyCon CZ on a train.
01:04
And yeah, hopefully, we'll play together this evening. And now the agenda of today's talk. First, I'm going to talk about what FFmpeg actually is. Then we are going to go through some very simple tasks using the FFmpeg CLI.
01:25
Then we are going to move to complex video stream processing with the FFmpeg CLI. And then we are going to move this, like, defining the whole processing to Python,
01:45
to make it not so complex with ffmpeg-python. And then we will go through a more advanced use case of frame-by-frame object detection with ffmpeg-python and OpenCV. And then we are going to briefly talk about testing.
02:03
So what is FFmpeg exactly? Does anybody here know what FFmpeg stands for, the abbreviation? Anyone? Probably, yeah? Fast and Furious, no, I don't think so, but it would be funny.
02:25
It's just Fast Forward Moving Picture Experts Group, because MPEG stands for Moving Picture Experts Group, and FF stands for Fast Forward. So yeah, Fast and Furious wouldn't be so bad either, I guess. But it is like a Swiss Army knife for multimedia processing.
02:45
It supports most audio, video, and even subtitle codecs and formats. And it is used internally by a lot of tools you probably use without being aware that they run FFmpeg underneath for multimedia processing, like Google Chrome, Audacity, Blender, OBS Studio, Emby, Jellyfin,
03:04
and many more, including youtube-dl. Actually, the core function of managing the M3U playlists in youtube-dl is based on FFmpeg. And this tool was created in 2000 by Fabrice Bellard.
03:20
And it has been developed rapidly in the years since. So to get started with FFmpeg, you can just trim a video file using this very, very simple command. Well, first, you specify the binary, which is ffmpeg.
03:40
Then you have the -hide_banner argument, which is not really necessary, but it's nice because it hides all the information FFmpeg prints by default, including the flags it's been compiled with, like the features it's been compiled with. And you just get the output you probably expect. Then you pass the input flag, the -i flag here,
04:01
with the video file as an argument. Then you tell it to seek to the 30th second of that video file. And then the -t flag means to take the 10 seconds after that. And then this -y flag means that if the output file already exists,
04:21
it should overwrite it. And the output file is videoTrim.mp4 this time. So it's very simple. Let's do something a bit more complex. And let's say we have this original video. Actually, in Poland, it's quite nice because it's legal to record and publish dash cam videos.
04:41
So I can use this as an example and avoid copyright infringement on YouTube. And we have this original video. And let's detect some edges using FFmpeg. So again, we open the terminal. We use the FFmpeg command. We hide the banner. Then we pass the input.
05:02
And here's the input file. And then we tell it to use a video filter called edgedetect, which uses the Canny edge detection algorithm. We pass some arguments like the low and high threshold. Again, we tell it to overwrite the output. And we specify the output file. And then it runs. It tells us the progress of our processing.
05:22
We wait for some time. And then we can open our file, which looks like this. We have detected all the edges. It was very, very simple. And yeah, it's a simple use case, but it has a lot of filters. You can just look at the documentation to see how many of them are available.
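The two commands described so far (trimming and edge detection) can be written down as Python argument lists. This is only a sketch: the file names and the threshold values are placeholders that do not come from the talk, and the seek flag is assumed to be `-ss`, which is FFmpeg's usual seek option.

```python
from typing import List


def trim_command(src: str, dst: str, start_s: int = 30, length_s: int = 10) -> List[str]:
    """Arguments for cutting `length_s` seconds starting at `start_s`."""
    return [
        "ffmpeg", "-hide_banner",  # suppress the build/configuration banner
        "-i", src,                 # input file
        "-ss", str(start_s),       # seek to this second (assumed flag)
        "-t", str(length_s),       # keep this many seconds
        "-y",                      # overwrite the output file if it exists
        dst,
    ]


def edge_detect_command(src: str, dst: str, low: float = 0.1, high: float = 0.4) -> List[str]:
    """Arguments for the single-filter edge detection example.

    The low/high thresholds are illustrative values, not the ones on the slide.
    """
    return [
        "ffmpeg", "-hide_banner",
        "-i", src,
        "-vf", f"edgedetect=low={low}:high={high}",  # one video filter
        "-y",
        dst,
    ]
```

With FFmpeg installed, either list could be passed to `subprocess.run(..., check=True)` to execute the command.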
05:42
So another tool which is provided by FFmpeg is ffprobe, which is a very nice small tool which allows you to see some information about the multimedia files you are inspecting. So when you run it on some MP4 file, for example, you can see the codecs it's using.
06:03
You can see a lot of metadata like the encoder, the duration, the bit rate. The streams that are present in this file, including the codecs these streams use, like H.264 here. And also, as you can see, you have an audio stream,
06:21
which is also marked as Japanese. So you can see a lot about a video file if you run ffprobe on it. And it's a very useful tool for testing too, which we are going to talk about later. So how to do a bit more complex filtering?
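As a rough illustration of inspecting a file programmatically, ffprobe can be asked for JSON output and the result summarized in Python. The helper names below are made up for this sketch; the ffprobe options used (`-print_format json`, `-show_format`, `-show_streams`) are standard ones.

```python
import json
import subprocess
from typing import Any, Dict, List


def ffprobe_command(path: str) -> List[str]:
    """Arguments asking ffprobe for container and stream info as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-print_format", "json",
        "-show_format", "-show_streams",
        path,
    ]


def probe(path: str) -> Dict[str, Any]:
    """Run ffprobe (must be installed) and parse its JSON output."""
    completed = subprocess.run(
        ffprobe_command(path), capture_output=True, text=True, check=True
    )
    return json.loads(completed.stdout)


def summarize_streams(info: Dict[str, Any]) -> List[str]:
    """One line per stream: index, type, codec and language tag (if any)."""
    lines = []
    for stream in info.get("streams", []):
        language = stream.get("tags", {}).get("language", "und")
        lines.append(
            f"#{stream['index']} {stream['codec_type']} {stream['codec_name']} ({language})"
        )
    return lines
```

For a file like the one on the slide, `summarize_streams(probe("some_file.mp4"))` would list the H.264 video stream and the audio stream with its language tag.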
06:40
Because what I have shown you are just very simple tasks of applying just one filter and having one input and one output. But FFmpeg can do a lot more than that. You can use it to design pretty complex processing pipelines using just the CLI.
07:05
So now we are going to go for an example of generating an RGB color histogram from the source video. Then we are going to run edge detection on the source video, like in parallel, and then place the results of the above on top of the source video as an overlay.
07:25
So we can see three of these streams just in one video. So the FFmpeg command for this looks like that. And this might be not very pleasant to look at at first sight,
07:41
but over time, like if you use FFmpeg long enough, it starts to be understandable, but still not very easy to maintain, especially if you get something a bit more complex than what I am showing here. But the basic concept here is that in this -filter_complex flag, which is used for complex filters,
08:01
you specify your signal graph that you use for processing. So here you have actually a small chain of filters like this first line. The first filter is the format filter, which changes the format of the stream to gbrp, which is just planar RGB we are using to generate the RGB histogram later here.
08:26
This is the filter which we use to generate the histogram. And on the left, in these square brackets, you have the filter input arguments, and here you have the output argument,
08:40
more like an output node. And the zero here always refers to the first input file (index zero) passed to the FFmpeg call. But yeah, I mean, it's pretty complex, like later on you have some other stuff, but an easier way to represent this is just to draw the graph.
09:00
So here you can see that first we have our input file, which is marked as zero. Then we apply here this gbrp format, and then the histogram filter with display mode stack, and we assign it to the hist node. Then we scale this node to make it just, you know,
09:29
small enough to display on the original video and to have the right proportions. Then we mark it as histscaled,
09:42
and then we pass it as the second argument to the overlay filter, which accepts two arguments. So here, the first overlay filter usage gets the original, the zero stream as the first argument,
10:01
and the second argument is our histogram, which we overlay on top. So the second path is the edge detection and scaling. We just scale it to 25% of the original video size, so we can fit it on top of the original video.
10:22
And then we apply edge detection, sorry, and then we pass it, we mark it as edges, and then we pass it again to the overlay filter with the arguments x of 800 and y of zero, which is just the placement of the overlay. And then again, we merge these together
10:43
as output/histogram.mp4, which is the output file. So this is the result. And this is pretty cool, right? Compared to just plain edge detection, this is already much more useful.
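The graph walked through above can be assembled as a `-filter_complex` string. The sketch below follows the described structure (format, histogram, scale, overlay; then scale, edgedetect, overlay at x=800, y=0), but the link labels, the histogram size and the final `[out]` label are assumptions, since the exact command on the slide is not part of the transcript.

```python
from typing import List


def build_filter_graph(hist_w: int = 320, hist_h: int = 240,
                       edges_x: int = 800, edges_y: int = 0) -> str:
    """Join the four filter chains of the histogram + edges overlay graph."""
    chains = [
        # input 0 -> planar RGB -> stacked histogram -> scaled down
        f"[0]format=gbrp,histogram=display_mode=stack,scale={hist_w}:{hist_h}[hist_scaled]",
        # original video with the histogram overlaid on top
        "[0][hist_scaled]overlay[with_hist]",
        # input 0 -> 25% of the original size -> edge detection
        "[0]scale=iw/4:ih/4,edgedetect[edges]",
        # previous result with the edges overlaid at the given position
        f"[with_hist][edges]overlay=x={edges_x}:y={edges_y}[out]",
    ]
    return ";".join(chains)


def overlay_command(src: str, dst: str) -> List[str]:
    """Full argument list using the graph above (file names are placeholders)."""
    return [
        "ffmpeg", "-hide_banner",
        "-i", src,
        "-filter_complex", build_filter_graph(),
        "-map", "[out]",  # send the labelled graph output to the file
        "-y",
        dst,
    ]
```

Keeping each chain as its own list entry makes the graph easier to read than a single long shell string, which is the maintainability problem the talk goes on to address.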
11:04
So let's move to ffmpeg-python, because, well, writing this filter complex argument, well, it wasn't too hard, but it wasn't easy and straightforward if you never used FFmpeg before. And well, if you use something like this in a project
11:22
and you added some conditional filtering, it would get very unpleasant to maintain, I would say. So ffmpeg-python is a nice and pretty small Python project, which is available on GitHub, and here you have the GitHub link. And it is a convenient wrapper for the FFmpeg CLI,
11:43
which focuses on the filter graphs that FFmpeg supports. And it can generate FFmpeg arguments from your Python code, which sounds very nice compared to using the CLI, right? It also provides helper functions for running FFmpeg in a subprocess,
12:01
and it also includes an ffmpeg.probe function, which returns ffprobe results as a Python dict, so it's much easier to parse than the outputs I've shown you before. And how do we do the very same thing we've done a few slides before using the CLI, but in Python? So first we import ffmpeg.
12:21
Then we define our input file and assign it to the input video variable. Then we first define our histogram stream, like the filter chain, right? We run a few filters on the input video.
12:41
We don't really run the filters. We will run the filters when we run the FFmpeg command, but we design our filter graph here, right, using these filter calls. So first we apply this format filter with the argument GBRP. Then we filter it with the histogram filter with display mode stack. And then we apply the scale filter to adjust the size of our histogram appropriately.
13:05
Then we define our edges processing chain. And here we have, again, our input video and the edge detect filter, and then the scale filter, which, of course, makes the video fit on top of the original video.
13:24
Then we define that we want to use the overlay filter to overlay the hist stream we defined before on top of the input video. And then we use the overlay again to just overlay this histogram,
13:42
to overlay this edges stream on top of the hist overlay stream. And then we just define our output, and we say that it should pass the -y flag to FFmpeg to overwrite the output if it exists, and just run it.
14:02
So it's much simpler, and it still generates pretty much the same command. When you compile it to print the string that ffmpeg-python generates, like the FFmpeg call it generates, it basically returns the same with very small differences.
14:21
Like here, for the overlay by default, it uses the argument eof_action=repeat (end-of-file action repeat). But in this case, it's not necessary, because the streams are of the same length. So if one stream ends, we don't really have to repeat anything in the overlay, right?
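A minimal sketch of the graph just described, written with ffmpeg-python. The variable names, file names and the histogram size are assumptions rather than the speaker's code; the calls used (`ffmpeg.input`, `.filter`, `ffmpeg.overlay`, `.output`, `.overwrite_output`) are part of the ffmpeg-python API.

```python
def build_overlay_pipeline(src: str, dst: str):
    """Describe the histogram + edges overlay graph; nothing runs yet."""
    # Imported here so this module still loads where ffmpeg-python
    # (pip install ffmpeg-python) is not installed.
    import ffmpeg

    input_video = ffmpeg.input(src)

    # Histogram chain: planar RGB -> stacked histogram -> scaled down.
    hist = (
        input_video
        .filter("format", "gbrp")
        .filter("histogram", display_mode="stack")
        .filter("scale", 320, 240)  # assumed size
    )

    # Edges chain: edge detection, then scaled to a quarter of the size.
    edges = (
        input_video
        .filter("edgedetect")
        .filter("scale", "iw/4", "ih/4")
    )

    with_hist = ffmpeg.overlay(input_video, hist)
    with_edges = ffmpeg.overlay(with_hist, edges, x=800, y=0)

    # overwrite_output() adds the -y flag to the generated command.
    return with_edges.output(dst).overwrite_output()
```

Calling `.run()` on the returned object would start FFmpeg, while `.get_args()` or `.compile()` only return the generated arguments, which is handy for inspecting what would be executed.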
14:42
So it's not necessary. And another not necessary thing it does is mapping explicitly this node number 6 to the output file, because, well, there is just one output file. It's not necessary, but FFmpeg does support multiple output files in one processing.
15:06
So again, the output is pretty much the same. It's exactly the same, right? It's the same call, so we get the same results. So, well, we've done some processing with ffmpeg-python.
15:23
We used it to generate an FFmpeg call for complex processing. So how can we actually run some Python code on the frames of our source video and then write it to our output video?
15:41
So now we are going to do something a bit more interesting. We have this original video here. And again, it's some dash cam footage. And we are going to try to detect registration plates of the cars passing by. So let's go briefly through the logical concept of a solution I'm going to propose.
16:04
Like this is probably one of the possible solutions. There are plenty of them, but this is fairly simple and intuitive. So first we are going to create and spawn our first FFmpeg process, which is just going to decode our stream.
16:21
Because, well, MP4 files usually contain compressed streams. And we can't just load them as simple frames to Python. So we have to decompress them. And FFmpeg is an excellent tool for decompressing video streams, because it accepts pretty much any existing codec.
16:41
So then we read the standard output, which is just piped to our Python process. We are going to load these frames as NumPy arrays, and then use OpenCV and OpenCV's cascade classifier to find the registration plates,
17:02
and then mark them using OpenCV again. And then after we found and detected and marked these registration plates, we are going to re-encode these frames to another MP4 file by using a second FFmpeg process.
17:22
So what would the source code look like? Well, first we have some imports, and we need to import NumPy and OpenCV too. And then we define our input and output file. And then we have this mark_plates function.
17:41
I am not going to go through this function like step by step, because I don't think it's necessary for this talk. And actually, all the examples are available on the GitHub pages page for this talk. So you'll be able to access it and run it afterwards with all the source videos and everything. So you can just test it afterwards.
18:03
But the important part is that this function accepts an OpenCV frame, which is basically a NumPy array, and it accepts a list of the plates we found as just OpenCV rectangles. So we just pass the coordinates of the plates we found on the frame.
18:24
And then this function just, you know, draws a nice square around the registration plate. And then to, you know, get some information about our input video, you have to use ffprobe.
18:41
And to do this, we just call ffmpeg.probe here on the input file, and then we take the streams from this file and extract information like the width of the stream, the height of the stream, and the frame rate of the stream, because we are going to want to slow it down so we see step by step what our tool detects.
19:05
And we assume that there is just one video stream in this file, so we use this if condition in our generator, and we just use Python's next to get one video stream, because there should be just one.
19:21
If there is none, we would get a StopIteration error, which is not too bad in this case. And then we define our first FFmpeg process, which is also not too complex. We just define the input. Then we define our output, which is just pipe.
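The stream selection described above can be sketched with plain Python on top of a probe result (a dict shaped like ffprobe's JSON output). The function name is hypothetical, and reading the frame rate from the `r_frame_rate` field is an assumption about which field the speaker used.

```python
from fractions import Fraction
from typing import Any, Dict, Tuple


def video_properties(probe_result: Dict[str, Any]) -> Tuple[int, int, Fraction]:
    """Return (width, height, frame rate) of the first video stream.

    Raises StopIteration if the file has no video stream at all.
    """
    video_stream = next(
        stream
        for stream in probe_result["streams"]
        if stream["codec_type"] == "video"
    )
    width = int(video_stream["width"])
    height = int(video_stream["height"])
    # ffprobe reports rates as strings such as "30/1" or "30000/1001".
    frame_rate = Fraction(video_stream["r_frame_rate"])
    return width, height, frame_rate
```

Using `Fraction` keeps rates such as 30000/1001 exact, so dividing by four later for the slowed-down output does not introduce rounding.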
19:42
And we want the output format to be rawvideo, because we want to pass raw frames to Python. And our pixel format would be bgr24, where BGR stands for, well, blue, green, red, and 24 for 24 bits. So it's pretty much understandable.
20:03
And then we just run the stream asynchronously, and we tell the subprocess underneath, like ffmpeg-python tells the subprocess which it creates, to pipe the standard output. And then we define our second process, which is a bit more complex,
20:20
because we need to specify some information about the input, because it can't really guess what the input is when it's just piped, especially that these are just raw frames. So we define our input, which is just pipe, and our format is raw video. Again, we pass the pixel format we use.
20:41
Then we tell it that the size of the video is specified here. This is the information we extracted from the ffprobe call. And then we specify the output frame rate, which is the original frame rate divided by four. So it will be four times slower than the original video.
21:02
And then we define our output file. Here's the MP4 file, the MP4 file name string, and the pixel format; the standard, most common pixel format for streams in MP4 is yuv420p.
21:21
And then we tell it again to overwrite the output if it exists, and again, run_async. And the most important part of detection is the OpenCV classifier. This XML file is available in the OpenCV repository, so you can just download it.
21:40
And it's also available on the GitHub repo for this talk, too. So you can just run it and test it yourself. And this cascade classifier, well, this XML file contains some features that the classifier is going to apply to detect the registration plates. And here we iterate over the frames just by reading
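The two FFmpeg processes described above (a decoder writing raw frames to a pipe, and an encoder reading raw frames from a pipe) might look roughly like this with ffmpeg-python. The function names are invented for the sketch, and passing the slowed-down rate through the `framerate` input option is an assumption; the talk only says the rate is the original divided by four.

```python
from fractions import Fraction
from typing import Dict, Union

# Raw, uncompressed frames with 3 bytes per pixel in blue-green-red order.
RAW_FRAME_FORMAT = {"format": "rawvideo", "pix_fmt": "bgr24"}


def raw_input_options(width: int, height: int) -> Dict[str, str]:
    """Options the encoder needs because a raw pipe carries no metadata."""
    return {**RAW_FRAME_FORMAT, "s": f"{width}x{height}"}


def start_decoder(src: str):
    """Spawn FFmpeg decoding `src` into raw frames on its standard output."""
    import ffmpeg  # third-party: ffmpeg-python

    return (
        ffmpeg.input(src)
        .output("pipe:", **RAW_FRAME_FORMAT)
        .run_async(pipe_stdout=True)
    )


def start_encoder(dst: str, width: int, height: int,
                  frame_rate: Union[Fraction, float]):
    """Spawn FFmpeg encoding raw frames from its standard input into `dst`."""
    import ffmpeg  # third-party: ffmpeg-python

    return (
        ffmpeg.input(
            "pipe:",
            framerate=str(frame_rate / 4),  # four times slower than the source
            **raw_input_options(width, height),
        )
        .output(dst, pix_fmt="yuv420p")
        .overwrite_output()
        .run_async(pipe_stdin=True)
    )
```

Both functions return ordinary `subprocess.Popen` objects, so the frames are read from `decoder.stdout` and written to `encoder.stdin`.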
22:04
from the standard output of the first process. And we know that the size of a frame would be video width multiplied by video height and multiplied by three channels, because, well, it's RGB, right?
22:21
So it's just three channels, three bytes per pixel. So here we define our input frame because, well, we read some bytes from the standard output of the first process, and we need to load it as an OpenCV frame.
22:40
So we run NumPy's frombuffer, and we know that these are unsigned ints. And then we reshape our array for it to be just video height by video width by three channels. And then, again, we convert it to type uint8.
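To make the frame arithmetic concrete, here is a small sketch of reading fixed-size raw frames from a pipe and turning each chunk into an array. The helper names are hypothetical; `np.frombuffer` and `reshape` are the NumPy calls the talk refers to, and the final `.copy()` is this sketch's way of getting a writable array, since `frombuffer` on a bytes object returns a read-only view.

```python
from typing import BinaryIO, Iterator


def frame_size_bytes(width: int, height: int, channels: int = 3) -> int:
    """bgr24 uses one byte per channel, so a frame is width * height * 3 bytes."""
    return width * height * channels


def iter_raw_frames(stream: BinaryIO, width: int, height: int) -> Iterator[bytes]:
    """Yield one complete frame at a time until the stream is exhausted."""
    size = frame_size_bytes(width, height)
    while True:
        chunk = stream.read(size)
        if len(chunk) < size:  # end of stream (or a truncated last frame)
            break
        yield chunk


def to_frame(raw: bytes, width: int, height: int):
    """Convert raw bytes into a height x width x 3 uint8 array for OpenCV."""
    import numpy as np  # third-party, imported lazily for this sketch

    return np.frombuffer(raw, np.uint8).reshape(height, width, 3).copy()
```

In the pipeline from the talk, `stream` would be the standard output of the decoding process.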
23:04
And then here, this is not really necessary for this case, but you need to remember that OpenCV would do most of the processing on the objects you pass. So we can expect that this mark plates call,
23:20
in which we apply some OpenCV functions, would mutate the frame. So if you apply something afterwards or try to detect something else, you might be surprised that you are trying to detect something on a frame you already painted on. And then here's our function which finds the plates, function call which finds the plates.
23:40
We use OpenCV's detectMultiScale function, and we pass the frame and tell it some parameters which we can fine-tune according to our results. And, well, this is the most obvious part, that we define that our registration plate size should be greater
24:02
than 50 by 20 pixels and smaller than 100 by 40 pixels. And then we run our mark plates function, and then we write to the standard input of the second process. Again, we have to convert our OpenCV frame just to plain bytes
24:21
so we can write to the other process. And after everything, we close the second process and then we wait for both processes to exit. So this is the result of this code. And, well, it's not 100% accurate,
24:40
but as you can see, it does work. And if you applied some more complex processing, and maybe if the video had the plates in a bit better resolution because for me it's pretty hard to read it, and I'm a human, not a computer program,
25:01
but yes, it works. Well, so if you are using ffmpeg-python, how are you supposed to test your software? Because it's a very important thing to do, right? So for simple processing, you can just use ffmpeg.probe
25:20
to check if the output file is healthy and contains streams of the duration you are expecting. You can also use ffmpeg.compile to retrieve the generated commands. So you are sure that your complex conditions which use ffmpeg-python for defining these processing pipelines
25:43
generate the correct output. And a very useful thing I found a pretty long time ago which is not that easy to find is that the FFmpeg maintainers provide a very diverse library of multimedia samples at fate-suite.ffmpeg.org.
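The probe-based check mentioned for simple processing could look like the sketch below, which works on a probe result dict shaped like ffprobe's JSON output. The function name, the tolerance and the exact conditions are assumptions about what counts as a healthy output.

```python
from typing import Any, Dict


def output_looks_healthy(probe_result: Dict[str, Any],
                         expected_duration_s: float,
                         tolerance_s: float = 0.1,
                         expected_video_streams: int = 1) -> bool:
    """Check the stream count and the container duration of an output file."""
    video_streams = [
        stream
        for stream in probe_result["streams"]
        if stream["codec_type"] == "video"
    ]
    # ffprobe reports the duration as a string of seconds, e.g. "10.010000".
    duration = float(probe_result["format"]["duration"])
    return (
        len(video_streams) == expected_video_streams
        and abs(duration - expected_duration_s) <= tolerance_s
    )
```

In a test, the dict would come from probing the file the code under test has just produced, for example a trimmed clip expected to last 10 seconds.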
26:01
It's a very, very, like you can get codecs you don't really see like most of the time. So it's very useful if you want to develop a solution which accepts everything as input. So it's very nice. And some conclusions. Well, FFmpeg is very powerful and complex. You can just run the full help to see how long it is.
26:23
It's impossible to remember every filter of FFmpeg. And when you are using it for complex multimedia processing you should probably use some wrapper to make it more human readable. The CLI is awesome for basic operations. I use it all the time when I need to compress some video file
26:43
I want to send over some chat or something. It's excellent. And ffmpeg-python has one issue: it seems to be abandoned. The last commit was two years ago and there is no response from the maintainer on the issues asking if the project is still alive.
27:02
And this is an issue but still it's an open source project. I think it's under the MIT license so you are free to clone it. And a very, very nice thing which happened recently is that my students at Warsaw University of Technology used the project idea I provided
27:22
of rewriting this library because it lacks some features like proper type hints and IDE support. They took an attempt to rewrite this library with a compatible API and it actually covers, I would say, half of the features the original library provides
27:41
but it's a nice start and it actually generates filter functions from the FFmpeg source code. Like the C source code with all the type hints and everything and it's a very nice idea if you want to check it out it will be also linked to this talk. I'm not really affiliated with the project because this is a project of my students.
28:02
It's also under the MIT license and it's published on GitHub and it has some very nice ideas. It's not perfect but it's a nice way to start writing something that would be a worthy successor to the original library. And another thing I would like to tell you
28:21
is that PyCon Poland, which I help to organize, happens this year at the end of August, actually from the 29th of August till the 1st of September, and it happens in Gliwice in southern Poland, and 90% of the content is in English.
28:40
We have talks, workshops, a social event, everything. It's very nice. You are very welcome there. And thank you very much. This is the QR code for all the links related to this talk and now some time for questions, I guess.
29:04
So if you have any questions please stand by the mic. Let's see if this is on.
29:20
Thanks for the talk. That was very inspiring. The Python library, is this focused on creating output files or can you also have a live output like from the command line ffmpeg that you could reuse? Yes, you can do live output using this library. I mean this library is just an interface
29:42
to calling ffmpeg processes and this is just a plain ffmpeg process and ffmpeg does support doing stuff in real time even like you know live RTMP streaming or something like this. Like ffmpeg is really a Swiss Army knife of multimedia processing and streaming and yeah like it's also you can like
30:04
I have some extra slides not for this talk really but they show you how to connect to the ffmpeg process using a Unix socket so you can track the progress of your processing too in real time. So it's a very nice thing. So yes you can stream real-time stuff
30:21
using ffmpeg and ffmpeg Python. Thank you so much for your talk. It was really inspiring. Throughout it my mind was racing about side project ideas. Have you used ffmpeg in any side projects or friends have used it? You mean like for my side projects? Yeah yeah.
30:41
Actually I well I actually use it for a project at the company I work for and I'm not sure how much I can tell about the project so maybe we can talk about it afterwards but well it's a very nice tool like I use it for you know for the very simple tasks
31:03
if, you know, there is some video you want to download from some page, if for example youtube-dl doesn't support it, you can use ffmpeg for that too. Yeah, like it's a very, very fun thing. I even used it to downscale the image of the Warsaw University of Technology in this talk
31:22
because it was like 10 megabytes and I was like this is too much and yeah like there's a lot of tools that can do this but ffmpeg can do it too like ffmpeg can even process GIFs and generate GIFs so yeah I think I generated a few GIFs in ffmpeg just just for fun so like yeah like it's a very very versatile tool
31:40
and and it at first it looks scary to to use it but but it's it's not really like after you get used to it it's it's much much more pleasant than like running some graphical user interface for you know like editing videos so yeah.
32:02
Thank you for the talk. Is there a higher-level library also available? Because all of this piping seems quite low level and it seems like... Oh, you mean this process communication? Yeah, I don't think so. At least I haven't found a more high-level library
32:24
you should use ffmpeg and there are some high level libraries that utilize ffmpeg for like playing audio from python as far as I as I know but I mean like somebody can implement something that's even even easier I mean like piping yes it does require some knowledge
32:42
about, you know, multi-processing in Python, which can be pretty hard if you have not used it before, but I think this is one of the easiest libraries to get started with ffmpeg. For example, there is PyAV, which is a binding
33:02
to the ffmpeg c libraries and this is much much harder to use because you know like it's you have great control over everything like all the processing videos frame by frame but it's you have to think about a lot of things but actually this code yes it's it's a bit complex
33:23
what I've shown what I have shown you with this open cv processing but I think after after you you know you can run the examples afterwards and I think like you can you can text me on linkedin just like anywhere on the discord I I will be glad to to help to to somebody to understand how how this multi-processing part works
33:45
Thank you, thank you everyone for participating and asking questions. You can ask more questions on the Discord. Thanks again, and let's give a round of applause for Michael.