Sharp photos and short movies on a mobile phone
Formal Metadata
Title: Sharp photos and short movies on a mobile phone
Title of Series: FOSDEM 2023 (133 / 542)
Number of Parts: 542
Author: Pavel Machek
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/61972 (DOI)
Language: English
Transcript: English (auto-generated)
00:09
Okay, so, hello, I'm Pavel Machek, and I'm here to talk about cameras, but you can also talk to me about trains, horses, mobile phones, the kernel, a smartwatch based on
00:24
the ESP32, Mobian, and so on. So, first things first: Video4Linux is not for cameras, it's for frame grabbers, and they are really very different, which is basically what this talk will be about.
00:44
They give you manual controls, but they cannot do autofocus for you, and so on. But the interface is fairly simple: you just open /dev/video0, select a format and capture. Unfortunately, what you get is a blurry photo, which will be either
01:01
all white or all black. That is what you get without autofocus and the other automatics. Anyway, there are phones with smart sensors; one such example is the PinePhone, and those are pretty close to frame grabbers. They do basically everything in hardware.
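The basic V4L2 flow just mentioned (open /dev/video0, pick a format, grab a frame) looks roughly like the sketch below. It is a minimal illustration, not code from any of the projects in this talk; the device path and the YUYV 640x480 format are assumptions, and error handling is reduced to the bare minimum.

```c
/* Minimal V4L2 capture sketch: open /dev/video0, set a format, grab one frame. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/videodev2.h>

int main(void)
{
    int fd = open("/dev/video0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct v4l2_format fmt = { .type = V4L2_BUF_TYPE_VIDEO_CAPTURE };
    fmt.fmt.pix.width = 640;
    fmt.fmt.pix.height = 480;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
    ioctl(fd, VIDIOC_S_FMT, &fmt);

    struct v4l2_requestbuffers req = {
        .count = 1, .type = V4L2_BUF_TYPE_VIDEO_CAPTURE, .memory = V4L2_MEMORY_MMAP
    };
    ioctl(fd, VIDIOC_REQBUFS, &req);

    struct v4l2_buffer buf = {
        .index = 0, .type = V4L2_BUF_TYPE_VIDEO_CAPTURE, .memory = V4L2_MEMORY_MMAP
    };
    ioctl(fd, VIDIOC_QUERYBUF, &buf);
    void *mem = mmap(NULL, buf.length, PROT_READ, MAP_SHARED, fd, buf.m.offset);

    ioctl(fd, VIDIOC_QBUF, &buf);
    enum v4l2_buf_type type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    ioctl(fd, VIDIOC_STREAMON, &type);
    ioctl(fd, VIDIOC_DQBUF, &buf);          /* blocks until a frame arrives */
    printf("got %u bytes of image data\n", buf.bytesused);

    ioctl(fd, VIDIOC_STREAMOFF, &type);
    munmap(mem, buf.length);
    close(fd);
    return 0;
}
```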
01:24
That smart-sensor approach used to be a pretty common design in the past, which made good sense at the time, because USB had limited bandwidth and you could not push uncompressed data through it. It's easy to standardize, but it doesn't make much sense today.
01:42
If you have, like, five lenses on your phone, you don't want five JPEG encoders in there. So we are moving to dumb sensors, which do the bare minimum. There you set parameters like exposure and gain, select an area and so on.
02:03
The sensor just passes the raw data over a fast bus and it usually ends up in your memory. And then you have a component called an ISP, an image signal processor, which will do the JPEG conversion and such stuff.
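Setting those sensor parameters usually goes through V4L2 controls. A minimal sketch, assuming the sensor exposes standard exposure and analogue-gain controls on a subdevice node; the device path and the numeric values are illustrative, not taken from any of the phones discussed here:

```c
/* Sketch: setting exposure and gain on a "dumb" sensor via V4L2 controls. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/videodev2.h>

static int set_ctrl(int fd, unsigned id, int value)
{
    struct v4l2_control ctrl = { .id = id, .value = value };
    return ioctl(fd, VIDIOC_S_CTRL, &ctrl);
}

int main(void)
{
    int fd = open("/dev/v4l-subdev0", O_RDWR);    /* sensor subdevice, assumed path */
    if (fd < 0) { perror("open"); return 1; }

    if (set_ctrl(fd, V4L2_CID_EXPOSURE, 500))      /* sensor-specific units */
        perror("exposure");
    if (set_ctrl(fd, V4L2_CID_ANALOGUE_GAIN, 128)) /* sensor-specific units */
        perror("gain");

    close(fd);
    return 0;
}
```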
02:23
Unfortunately, in the case of the interesting phones, which are the Librem 5, the PinePhone and the PinePhone Pro, we either don't have that processor or we don't have drivers for it, so we can't use it. So this is a
02:41
photo if you try to take it without the automatics. Can you recognize what's there? It's a USB connector. It's recognizable, I'd say. So what do we need to do?
03:00
The Nokia N900 is another example of a complex design, which used to be very important historically. And actually the photos in this presentation are from an N900 with an open source stack. In real time, you need to do auto exposure, because otherwise you will get a black or white frame. And you need auto exposure for
03:21
autofocus. On most cameras, you really want autofocus too, because you can't just focus to infinity and expect a good image. And that's pretty much everything you need to do for video recording in real time. Then you have the preview. The preview is a bit less important than the video recording, but it's also important.
03:41
You need to convert from Bayer to RGB, and you need to do gamma correction, because the sensors are linear on one side and the output is exponential on the other side. The GPU can help here. And then there are extensive post-processing steps, like auto white
04:00
balance, lens shading compensation, getting rid of bad pixels, and probably many others I forgot about. The advantage is that this can be done after taking the photo or after recording the video. And there are quite good tools for that, including RawTherapee and so on.
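A toy sketch of the preview conversion just described: a nearest-neighbour Bayer-to-RGB pass plus gamma correction through a lookup table. It assumes an RGGB pattern and 8-bit data, and real code would at least interpolate neighbours; this only shows the shape of the work.

```c
/* Toy preview conversion: nearest-neighbour debayer (RGGB assumed) + gamma LUT. */
#include <math.h>
#include <stdint.h>

static uint8_t gamma_lut[256];

void build_gamma_lut(double g)                 /* e.g. g = 1.0 / 2.2 */
{
    for (int i = 0; i < 256; i++)
        gamma_lut[i] = (uint8_t)(255.0 * pow(i / 255.0, g) + 0.5);
}

/* raw: w*h 8-bit Bayer samples, RGGB; rgb: w*h*3 output, gamma corrected */
void debayer_rggb(const uint8_t *raw, uint8_t *rgb, int w, int h)
{
    for (int y = 0; y < h; y += 2) {
        for (int x = 0; x < w; x += 2) {
            uint8_t r  = raw[y * w + x];
            uint8_t g1 = raw[y * w + x + 1];
            uint8_t g2 = raw[(y + 1) * w + x];
            uint8_t b  = raw[(y + 1) * w + x + 1];
            uint8_t g  = (g1 + g2) / 2;
            /* write the same colour to the whole 2x2 block */
            for (int dy = 0; dy < 2; dy++)
                for (int dx = 0; dx < 2; dx++) {
                    uint8_t *p = &rgb[((y + dy) * w + (x + dx)) * 3];
                    p[0] = gamma_lut[r];
                    p[1] = gamma_lut[g];
                    p[2] = gamma_lut[b];
                }
        }
    }
}
```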
04:25
So, unlike the other parts, the post-processing got some work done before; people were working on it. So what are we talking about? For example, on the N900 you have an LED flash, which is a completely independent device. You have a voice coil for autofocus, which is again
04:46
a separate device somewhere on I2C. Then you have two sensors, the front and back cameras. You have a GPIO switch to select which camera you want. And then you have the ISP, which is quite a complex piece of hardware, and which will not be
05:04
important for this presentation, because we will do without it. As for tools to use, there is a great set of them, but each has some limitations. One
05:22
which looks very nice is GStreamer. And GStreamer is really great if you have unlimited CPU. Unfortunately, you don't have unlimited CPU. If I was willing to hack its C code, it would be very powerful, but there is some learning curve involved in that too. And in the end, GStreamer might be
05:46
the right tool to use, but I found other tools easier. There's FFmpeg, which has a quite nice and very simple command line interface. So I used it, and in the end I didn't really need much. Just: please take these images and compress a
06:04
video from them. There's Megapixels. Megapixels is a very nice application focused on mobile phones, very well optimized, but its origin is the PinePhone and they don't use libcamera there.
06:22
So, not quite suitable. Then there's libcamera. Everybody says libcamera is the future of video on Linux. It probably is, but there are still many steps to get there. And there's Millipixels. Millipixels is a fork of Megapixels, which
06:44
is ported to the Librem 5 and, more importantly, to libcamera. Megapixels actually currently looks nicer, because it is based on a newer GTK. On the other hand, Millipixels uses libcamera, and that's the
07:00
important part. Okay, so this will be a bit of history and reasons and so on. I started to play with the camera on the PinePhone, and the first idea was: hey, GStreamer is there to capture video, let's use GStreamer, right? I
07:25
thought this should be the most portable way. I did some shell scripting with media-ctl to set up the pipelines, which is not fun, and then just used GStreamer to save the raw images to disk. And I could do 200 kilopixels, which is
07:45
not great, but better than no video at all, maybe. And I realized that the CPU can compress 70-kilopixel images in real time, which is, well, something people were doing,
08:03
but that was some time ago. So I tried to improve it. There is a YUV format the camera can do, which is already debayered and converted, which is better for processing. And I could capture up to 0.9 megapixel video with that.
08:24
And if you want, you can take a look at that. Maybe it's useful for someone, but, well, there was a reason to stop. The reason was called colorimetry: someone in GStreamer basically decided to introduce a regression.
08:44
And all the GStreamer stuff stopped working. And I realized that, well, perhaps it wasn't too great to start with anyway. So I started looking around. Quickly, I found libcamera, which is the future, right? And, well, it's C++.
09:05
It didn't work at all on the PinePhone, so I had to do some quite heavy patching. I got some help on the mailing list, and I realized it has JPEG support, which means you avoid a lot of stuff, because JPEGs are already color-space converted
09:25
and compressed and so on. And I realized that maybe JPEG is worth a second look. So I took one. You can't save raw data at two-megapixel resolution to flash because the flash is not fast enough, but it was, like, almost possible.
09:46
So, hey, JPEGs are four times smaller; perhaps this could be made to work. And saving sound is easy. So maybe, well, maybe we already have everything we need.
10:00
And this is how Unix camera was born. Then I ran into a problem in the kernel: someone decided that passing uncached data to user space is a fun thing to do. And libcamera decided that passing uncached memory up to the application is great.
10:22
I thought someone had stolen my CPU, because the performance penalty is about 10 times. But no, it's just the way it is. I believe this needs to be fixed. If you fight with GStreamer and the performance seems too bad, this is probably why.
10:42
And, I don't know, talk to your kernel person who can change it. By the way, in the old days we used to have a read() interface to get data from the camera. This is now deprecated. Of course it is faster to read() the data
11:00
than to get uncached memory, right? That's how badly uncached memory sucks. Anyway, so Unix camera started. Audio is really simple: you just create a small C application to record sound, split it into chunks so you can have easy processing later, and timestamp them, which is important for synchronization.
11:25
libcamera with some small hacks can write 35 frames per second of megapixel data to the file system. All you need to do is add timestamps and symlinks so your preview can tell which is the latest image. Very easy.
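A sketch of that frame-dumping idea: write each frame under a timestamped name and repoint a "latest" symlink so the preview can find it. The directory layout and file names here are made-up examples, not the actual ones used.

```c
/* Sketch: save a frame under a timestamped name and repoint a "latest" symlink. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int save_frame(const void *data, size_t len)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);        /* same clock as the audio chunks */

    char name[128];
    snprintf(name, sizeof(name), "frames/%lld.%09ld.jpg",
             (long long)ts.tv_sec, ts.tv_nsec);

    FILE *f = fopen(name, "wb");
    if (!f)
        return -1;
    fwrite(data, 1, len, f);
    fclose(f);

    /* repoint frames/latest.jpg at the newest frame, atomically via rename() */
    unlink("frames/latest.jpg.tmp");
    symlink(name + 7, "frames/latest.jpg.tmp"); /* target relative to frames/ */
    rename("frames/latest.jpg.tmp", "frames/latest.jpg");
    return 0;
}
```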
11:44
Then the control application: you probably don't want to start your video recording from the command line. But that's also very easy, you just take some GTK and Python. It creates timestamps saying, hey, start recording now.
12:02
And it displays the preview, which is the most intensive thing there. And this is basically what runs during the recording, so this has to be written and a bit optimized. Post-processing is not that time-critical, right? So you just use Python and FFmpeg to compress the resulting video stream easily.
12:24
This is something I was pretty happy about. If you want to duplicate it, you will need some setup, like patching libcamera and so on. But the code is out there, and there will be an easier method in the future. So I liked this solution because I could use multiple languages to do my camera recording,
12:45
the right language for the job. In the end, this was a few hundred lines of code in total. And it could do some quite interesting stuff. Like, you could take still pictures during recording: you simply copy the JPEG one more time. Easy.
13:01
It's in video resolution, but if you are recording at two megapixels from a phone camera, I'd say this is going to be a pretty decent picture anyway. You could take photos with arbitrary delay. You could even take photos before the user asked for them, because you are taking all of them anyway, so you just don't delete them.
13:23
This was fun. Then I got access to a Librem 5, which is different in important ways. It has a dumb sensor, so it won't give you JPEG.
13:40
But it had better support: libcamera worked there out of the box. There was the Millipixels application which, as I explained before, is patched Megapixels. But it had no auto exposure, auto white balance or autofocus support. It couldn't record video. And there are more issues on the Librem 5.
14:04
The kernel driver could use some work. It only gives you 8-bit data, which is not really good enough for good photos. You can select one of three resolutions: one megapixel, three megapixels or 13 megapixels. And for some reason only 23.5 frames per second works.
14:27
I don't know why. The hardware has phase detection autofocus, which is a very cool-sounding toy. And I have to thank Purism for their hardware and for the great work they did on the software stack.
14:41
They are heroes. This is the best photo I got with the Nokia N900. So many pixels. Millipixels is a very simple application. There is a small development team, so it's easy to work with. It's plain C. It's easy to merge patches.
15:02
It does all the processing on the CPU, which is great if you want to change the processing. So I started with auto exposure, because that's the most important part. And I did a very simple one, which I prototyped on the N900 years ago.
15:21
So basically, if you have too many white pixels, like overexposed, you need to turn down the exposure, right? And if you don't have enough white pixels, you need to turn the exposure up. And that's it. And this works well enough.
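A sketch of that rule, written out: count near-white pixels in the last frame, nudge the exposure down if there are too many and up if there are too few. The thresholds and step size here are arbitrary assumptions, not the values used in Millipixels.

```c
/* Sketch of the thresholding auto-exposure rule described above. */
#include <stddef.h>
#include <stdint.h>

/* Returns a new exposure value, given 8-bit luma samples of the last frame. */
int auto_exposure_step(const uint8_t *luma, size_t n, int exposure)
{
    size_t bright = 0;
    for (size_t i = 0; i < n; i++)
        if (luma[i] >= 250)                /* "white enough" */
            bright++;

    double frac = (double)bright / n;
    if (frac > 0.02)                       /* too many overexposed pixels */
        exposure -= exposure / 8;
    else if (frac < 0.001)                 /* not enough bright pixels */
        exposure += exposure / 8 + 1;

    return exposure;                       /* clamp to sensor limits elsewhere */
}
```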
15:42
It takes a few seconds to converge, which can be improved. I don't know how to do that, but it is good enough to take photos. The other thing is auto white balance. This is not that important, because you can do it in post-processing. Anyway, Millipixels did have manual white balance, so I felt this was easy enough to do.
16:06
It will need some more work. Again, if it's too blue, you make it more red; if it's too red, you make it more blue. That's it. It works well enough. And in a few hundred lines of code, I had simple, software-only
16:25
auto exposure, and I got that merged. The next step is autofocus. Autofocus is something which deserves more respect, because you really want it tuned. But, well, if you want to do it simply, you just start from infinity.
16:43
You compute the blurriness of each frame, and you only need to look at part of the image if you want to save CPU. And you start your sweep: you start to bring the focus closer. And when the image gets more blurry, well, you stop.
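A sketch of that sweep under stated assumptions: sharpness is estimated as a sum of horizontal gradients over the central part of the image, and the lens is stepped in from infinity until sharpness drops. The set_focus() and grab_luma() functions are hypothetical stand-ins for whatever the real driver and capture path provide.

```c
/* Sketch of a contrast-detection focus sweep. */
#include <stdint.h>
#include <stdlib.h>

extern void set_focus(int position);               /* hypothetical lens control */
extern void grab_luma(uint8_t *luma, int w, int h); /* hypothetical frame grab */

static long sharpness(const uint8_t *luma, int w, int h)
{
    long s = 0;
    /* only look at the central part of the image to save CPU */
    for (int y = h / 4; y < 3 * h / 4; y++)
        for (int x = w / 4; x < 3 * w / 4 - 1; x++)
            s += abs(luma[y * w + x + 1] - luma[y * w + x]);
    return s;
}

void focus_sweep(uint8_t *luma, int w, int h, int max_pos, int step)
{
    long best = -1;
    int best_pos = 0;

    for (int pos = 0; pos <= max_pos; pos += step) {   /* 0 = infinity */
        set_focus(pos);
        grab_luma(luma, w, h);
        long s = sharpness(luma, w, h);
        if (s < best)                /* got blurrier: we passed the peak */
            break;
        best = s;
        best_pos = pos;
    }
    set_focus(best_pos);             /* step back to the sharpest position */
}
```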
17:01
You might want to go back a little bit because of the physical properties of the lens. But this works well, better than manual focus, and I got it merged rather quickly. The next step was video. So I decided that I liked the ideas from Unix camera.
17:23
And simply did 0.8-megapixel recording directly to the disk. I hacked Millipixels to save timestamped frames and left the post-processing until after the user presses the stop button. Easy to do.
17:41
Obviously, there are disadvantages, right? You are now limited by the disk space. And maybe you could say it's not quite nice to the flash to just stream raw data to it. But hey, flash is cheap and the phone will die anyway. Post-processing is quite long: it takes about five times as long as the recording.
18:03
I guess this could be optimized. This is, again, my old code: Python with FFmpeg. Ideally, there is hardware to do the encoding and we should use it, but I feel that doing that is an awful lot of work.
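The actual post-processing is a Python script driving FFmpeg; as a sketch of the same idea in C, the step after "stop" is essentially spawning an ffmpeg command that turns the saved JPEG frames and the recorded audio into a movie. The file names, frame rate and codec options below are illustrative assumptions.

```c
/* Sketch: after recording stops, spawn ffmpeg to encode the saved frames. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int encode_recording(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execlp("ffmpeg", "ffmpeg",
               "-framerate", "23.5",              /* what the sensor delivered */
               "-pattern_type", "glob", "-i", "frames/*.jpg",
               "-i", "audio.wav",                 /* merged, timestamped sound chunks */
               "-c:v", "libx264", "-c:a", "aac",
               "movie.mkv", (char *)NULL);
        _exit(127);                               /* exec failed */
    }
    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0 ? 0 : -1;
}
```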
18:22
Anyway, this is now upstream. So if you update your Librem 5, you should be able to take videos. And I believe it's important to have at least some video recording. The next thing I want to talk about, which is very exciting, is phase detection autofocus.
18:46
You may want to Google it for a nice explanation. But basically, they have selected some blue pixels, and those are special. They are special in that they only take light from certain directions.
19:01
So you have a lens, and if it's in focus, it's okay: the light comes in and meets at the sensor. But if you are out of focus, a funny thing happens, and the light from the left part of the lens
19:22
ends up at a different place on the sensor than the light from the right part of the lens. But if you block the light based on direction on the chip, which is easy to do, you can use this for focusing. So if you take a line from the sensor, on the top you will have the "left" special pixels
19:48
and on the bottom you have the "right" special pixels, for example. Then the tree you see on that line will be at different positions on the different special pixels.
20:01
Well, then you can use this to focus, right? You just compute the correlation between the two lines, and it directly tells you how much out of focus you are and in which direction you should move the focus. This was great to play with; it was like hacking.
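A toy version of that correlation step: take one line of "left" special pixels and one line of "right" special pixels, slide one against the other, and keep the shift with the best match. The sign of the shift says which way to move the lens, its size roughly how far. This is an illustrative sketch, not the code from the experiment.

```c
/* Toy phase-detection step: find the displacement that best matches two lines. */
#include <stdint.h>
#include <stdlib.h>

/* Returns the best-matching displacement in the range [-max_shift, max_shift]. */
int pdaf_disparity(const uint8_t *left, const uint8_t *right, int n, int max_shift)
{
    long best_err = -1;
    int best_shift = 0;

    for (int shift = -max_shift; shift <= max_shift; shift++) {
        long err = 0;
        int count = 0;
        for (int i = 0; i < n; i++) {
            int j = i + shift;
            if (j < 0 || j >= n)
                continue;
            err += labs((long)left[i] - right[j]);
            count++;
        }
        if (count == 0)
            continue;
        long avg = err / count;          /* normalize by overlap length */
        if (best_err < 0 || avg < best_err) {
            best_err = avg;
            best_shift = shift;
        }
    }
    return best_shift;
}
```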
20:21
Unfortunately, it is not too usable on the Librem 5. There are two issues. The special pixels are quite far apart, which they basically have to be, because if you made all the pixels special, you would lose resolution. And it only works in the high-resolution mode.
20:43
And you don't want to run your preview in high-resolution mode. So if someone is interested in phase detection autofocus, I have the code on GitLab somewhere. It was a fun experiment, and it worked. But I decided that for real focusing you would probably have to do a hybrid:
21:03
do coarse focusing using the phase detection and then do contrast detection at the end. That seems like a lot of work, and with a driver which can only give you 23 frames per second and so on, well, I decided not to take this much further.
21:27
So I have some wish lists, and I think I have like five minutes left. So, five minutes talking or five minutes of questions? Including everything. Including everything, okay. So, I have a long wish list for the whole world.
21:42
I would like to have better media controller support in the tools, because it just doesn't work: the API changed and the tools didn't catch up. I would like a library for conversions between formats and so on. I would like better-than-8-bit support. I would like multiple applications accessing the camera at the same time.
22:05
Better raw support would be nice, and someone should really solve the caching problem, because that's bad. As for libcamera: I shouldn't be hacking Millipixels, I should be hacking libcamera. But libcamera doesn't really support a software ISP,
22:22
and I am not a great C++ hacker, so if I did it, they would reject the patches. So I would much prefer them to do the preparation, and then I would fill in the code. And that's pretty much it. So, time for questions.
22:56
Okay. Okay, so the comment is that they want my work on the software ISP,
23:05
and I guess I will want to cooperate. But libcamera is not easy for me to hack because of the C++ stuff. So be patient, and maybe it would be better if someone else did it.
23:24
Yes, so there will not be much to see. So, you know, Millipixels could use some work too, but I can take pictures, trust me. I didn't use autofocus for this because...
23:44
Yes, I can do it live. Is it the version from December or something like that? Ah, okay. So it's now upstream, so you can just update the operating system and you will get it. And it should be possible to do just a short video recording too.
24:02
So now you have all been recorded, and now the CPU is busy converting that. And the battery is draining now. Yes, so what? Okay, more questions? Okay, so I guess...
24:21
People are so impressed that there are no questions. So, thank you very much. Thank you.