
Status of camera support on mobile FOSS devices


Formal Metadata

Title
Status of camera support on mobile FOSS devices
Subtitle
An open discussion about the state of cameras on Linux-powered mobile devices
Number of Parts
287
Contributors
Et al.
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Camera support has traditionally been a pain point when it comes to Linux phones. Years and years of competition on the photography capabilities of consumer devices have pushed vendors to adopt more and more custom, closed-source implementations, leaving users of free-software-powered mobile devices with nothing but poor solutions limited to the single devices they were developed for. With the increasing maturation of libcamera, a complete user space camera stack for Linux devices is now finally available, and Linux phone developers can cooperate on more mature camera solutions for their devices. The BoF will serve as a cooperation space for phone developers and camera developers, to better understand their mutual needs and move camera support forward for the whole Linux ecosystem.

Cameras have traditionally been a distinguishing factor in the mobile/smartphone market. Year after year, the number of pixels available in a phone's camera has been a staple of phone producers' marketing material, and as a direct consequence of such fierce competition everything around cameras has usually been quite secretive, with vendors implementing rather cumbersome software architectures to work around the software licensing requirements which would otherwise have required them to open at least part of what they consider their secret sauce. Vendors' reluctance to discuss and innovate in a common shared space, and the undeniable deficiencies (or better, the complete lack of existence) of anything resembling a camera stack in the Linux ecosystem, have pushed camera support into a quite uncomfortable position when it comes to FOSS-powered mobile devices. Until very recently, nothing comparable to what could be achieved by a rather cheap Android phone, running binary blobs both in user and kernel space, could be easily realized using a fully open infrastructure. Three years after the start of the libcamera project, its adoption as the default camera stack for the Raspberry Pi ecosystem, and its increasing permeation of the x86 device space thanks to the support of platforms like ChromeOS and a more robust integration with PipeWire, it is now time to finally address the camera issue in the FOSS phone space. With the recent interest of the Librem 5 and PinePhone communities in the project, this BoF intends to provide a space where phone developers, libcamera developers and hopefully vendors can discuss their mutual headaches and try to sketch a way forward, in order to provide free-software-equipped mobile devices with camera support that can do more than what an Android phone was capable of 10+ years ago.

Transcript: English (auto-generated)
Hello everyone, and welcome to this presentation, to this meeting about cameras and mobile devices powered by free and open source software. This session will be held as a discussion with a selected panel of guests, which we will introduce later, and which will follow this presentation, which aims to set a bit of background and provide an introduction to the topics that we're going to discuss later. A bit of introduction first: my name is Jacopo, I'm a Linux camera engineer, and I spend most of my time working on libcamera and V4L2 (Video4Linux2) driver development. I would like to use this presentation to provide a bit of background on the topics that we're going to discuss later, starting from a brief presentation of the image capture pipeline and what the 3A loop looks like. I would then like to provide a simplified classification of camera systems, introduce the complex camera interfaces, and finally make the case for a camera stack and why it is potentially useful for the whole ecosystem.
So let's start from the very beginning. I'm going rather quickly on this part; there are a lot of resources around about what an image capture pipeline looks like, so I'm going very fast. Let's start at the very beginning, with a light beam hitting an image sensor and the image information getting converted into electrical signals. From there we get the most primitive representation of an image as provided by the sensor, which is what is usually called the raw Bayer pattern. A raw Bayer pattern, which carries a single colour value for each pixel, is not usable as it is; it cannot be presented to applications and users in that form. So it goes through a series of conversion steps, starting from debayering up to colour space and colour format conversion, which result in the final image presentable to applications and users. Between the raw Bayer pattern and the final image there are a lot of transformation, image manipulation and enhancement techniques, which get applied both in the digital and the analogue domain of the image. Just to name a few, with names you might have heard of, there are defective pixel correction, lens shading correction, black level calibration, and other advanced techniques which might be applied at each step of the capture pipeline. This is still not enough to provide an image usable or presentable to users, because another set of transformation steps is required, for example to obtain the final desired size of the image by cropping or subsampling the image sensor, by performing the usual rotation, flipping and mirroring, converting between colour spaces or formats, and encoding to JPEG or other formats, with a lot of other more advanced techniques that could be built on top to obtain advanced features like zero shutter lag or HDR. As I've said, I don't want to go into detail, but just to present the fact that all we have seen so far is just enough to capture a single image.
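As a rough illustration of the debayering step mentioned above, here is a minimal nearest-neighbour debayer sketch for an RGGB Bayer layout. It is illustrative only: the layout, the 8-bit samples and the interpolation strategy are assumptions made for the example, and real ISPs use far more sophisticated, edge-aware algorithms together with the other corrections listed above.

```cpp
#include <cstdint>
#include <vector>

// Minimal nearest-neighbour debayer for an RGGB Bayer pattern.
// Illustrative only: real pipelines add black level subtraction,
// lens shading correction, edge-aware interpolation, etc.
struct RGB { uint8_t r, g, b; };

std::vector<RGB> debayer_rggb(const std::vector<uint8_t> &raw,
                              unsigned width, unsigned height)
{
    std::vector<RGB> out(width * height);

    for (unsigned y = 0; y + 1 < height; y += 2) {
        for (unsigned x = 0; x + 1 < width; x += 2) {
            // One 2x2 Bayer cell: R G
            //                     G B
            uint8_t r  = raw[y * width + x];
            uint8_t g0 = raw[y * width + x + 1];
            uint8_t g1 = raw[(y + 1) * width + x];
            uint8_t b  = raw[(y + 1) * width + x + 1];
            uint8_t g  = static_cast<uint8_t>((g0 + g1) / 2);

            // Assign the same reconstructed colour to all four pixels
            // of the cell (nearest-neighbour quality only).
            RGB px{ r, g, b };
            out[y * width + x] = px;
            out[y * width + x + 1] = px;
            out[(y + 1) * width + x] = px;
            out[(y + 1) * width + x + 1] = px;
        }
    }
    return out;
}
```

Even this naive version touches every pixel of every frame, which hints at why the operation is normally offloaded to an ISP or a GPU rather than run on the CPU.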
It turns out, though, that image capture is really a more complex process, which closely resembles a closed-loop system, where statistics collected by processing the raw frame are transformed, by a dedicated chip or by a dedicated computationally capable device, into parameters that are used to apply transformations to obtain the final image, but which also influence the way the next frames are captured. The component which is usually responsible for performing this kind of operation, both applying the parameters and computing them by parsing statistics, is usually called an image signal processor, or ISP. Its functionality can be exploited to implement what is usually called the 3A loop, which is actually more complex than that, because it's not only about auto-exposure, auto-white balancing and auto-focus: there are other tuning parameters which are dynamically calculated based on the characteristics of the captured image. With that in mind, I would like to try to classify camera systems in a very simplified taxonomy, starting with what I would call a legacy design, even if that's not totally fair, because it is a perfectly legitimate design today as well, but it comes from a time when SoCs were less capable than today. The computational power was not comparable to modern devices, and most of the computationally intensive image manipulation algorithms used to run on the sensor itself, which usually embedded a small, tiny ISP, controlled by a set of I2C commands.
In this configuration, all the heavy lifting is performed by the sensor itself, which presents to the SoC an image in a format which is already consumable, already presentable to the final applications. This means that we rely heavily on the characteristics of the image sensor and its feature set. It also means that if in one design you move to a different image sensor, the user-visible characteristics might change as well: your new sensor might not be capable of providing images in the desired format or resolution, so the application feature set changes depending on the hardware design. There is another point, which we are also going to explore later in the panel, which is the fact that still today, both in mainline Linux and in most of the BSPs, sensor drivers are programmed using binary blobs of register/value pairs as provided by the sensor vendors. The way it usually works is that those binary sequences are generated by the vendors and provided to ODMs or OEMs to integrate into the BSP drivers, and they then later surface into mainline drivers as well. So we still either rely on the vendors providing the correct programming sequences or reverse engineer them, which is not an easy or fun exercise. This of course creates a sort of vendor lock-in, which does not only mean that you are confined to a single producer if you have made a design with them, but also means that you depend on a supply and support chain which is not easily accessible if you are not a producer which makes millions of devices a year or something. And if you are a device producer, it's also worth noting that smart sensors are usually more expensive than sensors which only provide raw images.
For all these reasons, and with the incredibly increasing compute capabilities of embedded devices, in the last years it has become more and more common to find designs which embed an ISP inside the SoC itself. In this way, the sensor is only required to provide images in raw format, and all the heavy lifting is performed on the SoC side. This of course has several perks. The first one, if we want to consider the software side first, is that the SoC usually runs Linux, which means that the code that controls the image processing is closer, or potentially accessible, and that means that there is more room, compared to using vendor firmware binaries, to really get to know what happens there. Of course, there is also a matter of features and performance, because if the ISP is integrated in the SoC itself, it can exploit the interconnects to the other parts of the system, can run at higher clock rates, and in general can provide a more extensive feature set compared to a tiny ISP embedded on an image sensor. Also, since we are in control of the code which runs on the ISP and performs the image manipulation, there are higher chances of code reuse and standardization: from reusable 3A algorithms, which are no longer delegated to the sensor firmware, to the fact that we live in a time where open hardware designs could be exploited to produce dedicated chips or dedicated components which might compete with proprietary ones.
Of course, with the evolution of technology and the evolution of the hardware features that an SoC can provide, the software side, the kernel side in this specific case, had to evolve as well. Video4Linux was designed at a time when most designs revolved around having a smart sensor, and V4L2 was built around the single dev node abstraction, where there is a single point of control towards user space, which is the usual /dev/videoN we are all used to working with. With more and more features appearing on the SoC side, of course, the interface as it was did not scale. So I'm here presenting a totally fictional design where, on the SoC side, you have the usual CSI-2 receiver, as well as other components which might perform rescaling, conversion and formatting of pixels. Each one of them has a representation in user space, through the V4L2 sub-device extension to the V4L2 API and through the topology API, which together are known as the media controller API. This is a design which comes from 10 years ago or so, and there is a question which is totally legitimate today, which is: why all of this complexity? Why do we now have an explosion of V4L2 sub-devices in user space? Why can't we control everything through a single point of control and make life easier for applications? Let's make a counterexample for that, assuming we want to capture an image of a known size and format from the capture pipeline.
With a single dev node abstraction, we have a single point of control where to apply this request. Since the hardware was simple on the SoC side, it was generally enough at the time to apply the same format to both the receiver side and the sensor driver side so that they are compatible, and you were mostly done with that and could expect images to arrive in the format you had requested. Let's assume the same request is performed on a much more complicated system like this one. You're asking for an image in a known resolution and a known format at the end of the capture pipeline, but there are several different combinations that could allow you to obtain this very result. Let's assume in the first case you are planning for an application that does still capture, so you want the highest possible image quality. One possible configuration would be to require your sensor to operate at maximum resolution and provide images in unmodified Bayer format, and to perform resizing and conversion along your ISP pipeline in order to obtain the resolution that you have requested. In this way you will maximize image quality, but since the sensor operates at the highest possible resolution, its frame rate might be limited by the characteristics of the sensor itself. Let's assume instead you want to plan for a video recording use case and you want a sustained frame rate for the whole duration of the video. One possible configuration in that case would be to request your sensor to provide images in an already downscaled format which matches your final image request, and only perform conversion along the pipeline, in order to guarantee that the sensor can operate at a higher frame rate. And this is another perfectly legitimate configuration to obtain the final image as you have requested it.
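To make this per-element negotiation concrete, here is a rough sketch of how user space sets a format on a single sensor sub-device through the V4L2 sub-device API. The device node path, the resolution and the media bus code are assumptions chosen purely for illustration, not values from any particular platform.

```cpp
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <cstdio>
#include <linux/media-bus-format.h>
#include <linux/v4l2-subdev.h>

// Sketch: ask a (hypothetical) sensor sub-device for an already-downscaled
// raw Bayer format on its source pad, as in the video-recording example above.
int main()
{
    int fd = open("/dev/v4l-subdev0", O_RDWR);    /* illustrative path */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct v4l2_subdev_format fmt = {};
    fmt.which = V4L2_SUBDEV_FORMAT_ACTIVE;
    fmt.pad = 0;                                  /* sensor source pad */
    fmt.format.width = 1920;                      /* illustrative values */
    fmt.format.height = 1080;
    fmt.format.code = MEDIA_BUS_FMT_SGRBG10_1X10; /* raw Bayer on the bus */

    if (ioctl(fd, VIDIOC_SUBDEV_S_FMT, &fmt) < 0)
        perror("VIDIOC_SUBDEV_S_FMT");

    /* The same negotiation then has to be repeated on the CSI-2 receiver
     * and every other sub-device along the pipeline, plus the final format
     * on the video capture node: exactly the burden described above. */

    close(fd);
    return 0;
}
```

Tools like media-ctl from v4l-utils wrap these ioctls into a command line, but the decision of which element should scale, convert or pass data through still has to be made somewhere in user space.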
Let's also assume you are in a more complicated system with two cameras, and you only have a single scaler or pixel converter. In that case, you might want to decide to dedicate the converter, or the full ISP pipeline, to one camera and, if your sensor is capable of doing so, require it to provide images in a format which is already presentable to applications. This is another perfectly legitimate use case, and you should be made capable of deciding how to configure your system depending on what the final requests from your applications are. So the price of all this complexity in user space is that user space has to configure each single point of the capture pipeline precisely, which is a burden and makes systems more complicated to deal with. The counterargument is: why can't we encode everything in the kernel infrastructure itself? Hard-coding is generally a bad choice, even more so in an operating system. The duty of an operating system is to enable applications to implement use cases, not to encode them, and the more we hard-code in kernel space, the more we limit how useful the contribution and the driver are for a general audience. This means that to have usable systems and to realize a known use case, there is a device- or platform-specific part of the configuration that has to be applied to the whole system before it can produce images in the format you desire. As for how applications have dealt with this on Linux in the last 10 years: on regular desktop systems, most applications, like Cheese, xawtv and several others, assume they work with UVC cameras.
through the single dev node infrastructure and do not need any device on any platform specific configuration. It's also usually Gstreamer or other frameworks like FFmpeg are usually the preferred way
to access cameras, not only because they provide an easier API compared to the to the Rho VDF Linux 2 API, but also because they are usually easy scriptable and the system configuration on embedded devices is usually performed by a set of scripts, which is, of course, a solution that cannot scale behind, beyond the single vendor use case
or the single implementation. And so the case for a camera stack, it's all in this part. And three years ago, the planets align and there was the possibility to start working on a camera framework
for Linux with the specific target of abstracting away all the platform specific configuration from application, which should be made capable of relying on a single and unified API, which is totally platform independent.
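To give an idea of what that looks like from the application side, here is a rough sketch of the shape of a libcamera client using its C++ API, reduced to camera enumeration and configuration. It is a simplified illustration: the exact calls and types should be checked against the current libcamera documentation, and the buffer allocation, request queueing and completion handling that a real application needs are omitted.

```cpp
#include <iostream>
#include <memory>
#include <libcamera/libcamera.h>

using namespace libcamera;

int main()
{
    // Start the camera manager and pick the first camera it enumerates.
    auto cm = std::make_unique<CameraManager>();
    cm->start();
    if (cm->cameras().empty()) {
        std::cerr << "No camera found" << std::endl;
        return 1;
    }
    std::shared_ptr<Camera> camera = cm->cameras()[0];
    camera->acquire();

    // Ask the library for a configuration suited to a viewfinder stream.
    // All the platform-specific plumbing stays inside the pipeline handler.
    std::unique_ptr<CameraConfiguration> config =
        camera->generateConfiguration({ StreamRole::Viewfinder });
    config->validate();
    camera->configure(config.get());

    // Buffer allocation, request creation and the requestCompleted signal
    // would follow here; omitted to keep the sketch short.

    camera->release();
    cm->stop();
    return 0;
}
```

The point of the sketch is that nothing in it names a sensor, an ISP or a /dev node: that knowledge lives in the platform-specific part of the library discussed next.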
The drawing I'm showing here comes from the libcamera materials. There are a lot of presentations about the libcamera architecture online, so I'm not going into detail on that. I just want to stress that one of the targets of any camera stack, and libcamera doesn't diverge from that, is to isolate the platform-specific code in a part of the library which should not influence the API towards applications. We have decided to call that part of libcamera a pipeline handler, but it's really the hardware configuration needed to accommodate a specific use case. So what are the perks of, and the rationale for, a camera stack for Linux? The first one is that all the device-specific components should be isolated in a known place, so that applications do not care about that and can be developed generically enough to run on different systems without modifications. There is a pain point at the moment, which is that in order to enable that to happen, hardware support has to be implemented in the camera framework itself. The vision is that a pipeline handler, the platform-specific part of the system, really is hardware support: it's something that should be part of the vendor BSP if they want to provide a usable camera. Of course, it's a long way to get there, because vendors, camera vendors or any kind of vendors, the less they have to provide in terms of software support and software development, the happier they are. It's not easy, and it's a long process to bring them in and convince them to add support for their platform in a way that is abstracted away from the single use cases they have implemented. And of course, to get them on board, there is a requirement to have some critical mass
around a single point of contact, or a single code repository, where they can contribute and provide usable cameras for their customers. The other argument for a camera stack is to provide a generic and unified API towards applications, which includes a set of known controls. Right now, V4L2 controls are standardized in their definition, but the way they are applied to the device is very much device specific: a value that gives you a certain exposure on one camera will give you a totally different exposure on another one, and that's not acceptable for applications, which would have to deal with that by themselves. So standardizing on a set of controls and providing a generic API does not only allow applications to be written in a generic way, but also allows implementing and providing adaptation layers towards other known frameworks like GStreamer or PipeWire, or, in the case of Android, the camera HAL version 3, or an emulation layer to retain compatibility with legacy V4L2 applications. There is also the fact that centralizing the implementation of 3A algorithms in a single place might give potential for a lot of code reuse, or even lead to the definition of a framework that allows implementing platform-independent algorithms that perform the most usual 3A functionalities.
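As a loose illustration of the kind of platform-independent algorithm meant here, the following is a minimal sketch of one auto-exposure iteration. The statistics and exposure structures are hypothetical types invented for the example, and the control law is deliberately naive; real implementations, such as Raspberry Pi's mentioned below, are considerably more elaborate.

```cpp
#include <algorithm>

// Hypothetical types, for illustration only: what a platform-independent
// AE algorithm would consume (frame statistics) and produce (new exposure).
struct FrameStats {
    double meanLuma;        // average brightness of the last frame, 0.0 - 1.0
};

struct Exposure {
    double exposureTimeUs;  // sensor integration time in microseconds
    double analogueGain;    // sensor analogue gain, >= 1.0
};

// One naive auto-exposure step: scale the total exposure towards a target
// brightness, then split it between integration time and gain, preferring
// longer exposure time up to a frame-rate-imposed limit.
Exposure aeStep(const FrameStats &stats, const Exposure &current,
                double targetLuma = 0.18, double maxExposureUs = 33000.0)
{
    double error = targetLuma / std::max(stats.meanLuma, 0.001);
    double total = current.exposureTimeUs * current.analogueGain * error;

    Exposure next;
    next.exposureTimeUs = std::min(total, maxExposureUs);
    next.analogueGain = std::max(total / next.exposureTimeUs, 1.0);
    return next;
}
```

Nothing in such a loop needs to know which ISP computed the statistics or which sensor will receive the new exposure, which is exactly what makes it a candidate for sharing across platforms.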
There is also the fact that we have what I would call a state-of-the-art implementation of an open library of 3A algorithms, which is provided by Raspberry Pi, and that's an inspiration and a point of reference for everyone who would like to look into that and provide a competing implementation or improve the existing one. So all of that was to make the case for a camera stack for Linux systems. The question I would like to explore in the continuation of this meeting is how much of this applies to mobile devices, especially free and open source mobile devices, and how much of it can be reused in their development. I hope you have enjoyed this small presentation. I would like to thank you for your attention, and I hope as well that you will enjoy the following discussion, which will now move to the Jitsi room, where you are very warmly invited to contribute by asking questions in the public Matrix channel. Thank you all.
Thank you, Jacopo, for this interesting talk and introduction into the world of cameras. As we have quite a few speakers today on the panel, I will keep it short and hand it over to you, Jacopo. I'm very much looking forward to the discussion.
Seems there is, we've lost, ah, he's back, okay. Sorry, I lost my connection just at the moment when you said, I will hand it over to you. The right timing indeed. I hope to be back. Yes. Okay. So thank you for the introduction, and thank you everybody for being here. Thank you to the guests on the panel, who are here and whom I will move on to introduce immediately. There are quite a few people here, and I hope that they can bring a different perspective about cameras and about Linux mobile devices with their expertise. So I'll start the introductions. I will share my desktop if I can, where I have some slides which I would like to share during the discussion. I just need to find the right slide deck, and that's the one. So I'm not sure if you can see that or if it's of any use to you, but I would like to start introducing people in alphabetical order. We've got David and Naush from Raspberry Pi. They are camera engineers working on camera enablement on the Raspberry Pi. So David, hi there, I would like you to briefly introduce yourself. I know Naush will join us a bit later, so I'll let you introduce yourself first. Yeah, hello everyone. Just a quick introduction then. So yeah, I'm working on cameras at Raspberry Pi.
I've been doing cameras off and on for a while now. And indeed Naush and I, we were two of the engineers who did a lot of the work on the, if you can cast your minds back to the last round of the Symbian phones, phones like the N8 and the 808 PureView,
which had pretty good camera systems actually for the time anyway. So Naush and I did the, sort of led the work on that really back then. And now we find ourselves at Raspberry Pi doing similar things, I suppose. I mean, life at Raspberry Pi is rather different. We're not so much interested in creating mobile devices. It's more about creating APIs and toolkits.
So it's a bit different, but there's still a lot in common. Okay, thank you. Next we have Dorota and Martin. They are software engineers at Purism working on the Librem 5 phone. So if you would like to briefly introduce yourselves as well, I'll let you choose an order; the first one that takes the camera. Hello. Hi, Dorota. It's Martin here. My name is Dorota. Together with Martin, we work on the Librem 5 project to eventually introduce support for the two cameras that are present on the phone into the mainline kernel. We are not there yet, because there's a lot of work. And apart from the kernel, we also work on basically the entire stack, from libcamera, which was the topic of the previous talk, to applications, but I will let Benjamin talk about that. Yeah, we work on the Librem 5, which is a phone made by Purism. And if Martin wants to introduce himself specifically, here is your turn. Hello, everybody. Yes, I'm working on the kernel as well for the phone. And the Librem 5, I think, includes some pretty nice cameras actually, that we just need to make proper use of. And during the last year, thanks to Laurent, we made sure that we can now stream frames on one of the cameras using a mainline kernel and using the standard tools, with the V4L2 controls and whatnot. And I just hope to get some feeling about the camera stack; it's basically a device where we have an OpenGL ES2 implementation, but nothing else apart from that with respect to image processing. So that will be interesting. Thank you for that. Thank you. Since you introduced him, I would like to introduce Benjamin, who is working on camera support for the PinePhone. Benjamin, if you would like to quickly introduce yourself as well. Sorry, I cut out there for a sec. I'm Ben. I work for Sublime HQ on Sublime Text, which is completely unrelated to the camera stuff I do. I took over Megapixels fairly recently, which is the most common camera application for the PinePhone and hopefully soon the PinePhone Pro. I also dabble in driver development,
working on the same camera stuff. Great. Thank you. Then we have invited to join us Hans and Kate from Red Hat. Hans is a long-term contributor and is now looking into enabling camera use cases for Intel tablet platforms, that is my understanding, and Kate is helping with implementing an autofocus algorithm for the same platforms. If you would like to introduce yourselves quickly as well. Hi everyone. I'm Hans de Goede. Everything has been said already. I'm a kernel developer, and I'm working on the IPU3 hardware enablement together with Dan, who we will get to, I think. Oh yeah, he's next. So yeah, nothing else. Hi everyone. I'm Kate and I'm a camera developer. I implement the IPU3 algorithms based on Hans's kernel work; he quickly developed some drivers, such as the VCM and sensor drivers, so I based my work on his and developed the IPU3 algorithms to make the whole device up and running. Thank you. Thank you. And, Hans, you introduced him as well: we have Daniel on the call from the Linux Surface community. Daniel has been doing an amazing job making devices with no Linux-compatible ACPI tables work with Linux. That was an amazing job. So Dan, if you would like to introduce yourself as well. Thank you. Yeah. Hi everybody. I'm Dan. My work background is really data analysis rather than programming. But in March of 2020, I found that I needed my laptop cameras to work, so I thought, how hard can that be, and kind of fell down a bit of a rabbit hole from there. I've mainly been working in the kernel, adding support for the cameras on the Microsoft Surface line and, sorry, quirks for their weird firmware at the same time. Great. Thank you. And finally, we have Laurent and Kieran from Ideas on Board.
Laurent is the lead architect and main developer of libcamera, and Kieran is a core developer of libcamera as well. If you could quickly introduce yourselves. Sure. Thank you, Jacopo. So as you mentioned, yes, I'm here today in my capacity as lead architect of libcamera. I'm also the founder of Ideas on Board. My background has been extensive development, mostly on the Linux kernel side, for the past more than 15 years, more than I can count actually. And we started the libcamera project a bit more than three years ago, moving from the kernel side to the user space and framework side, which was a very interesting journey where we managed to learn quite a lot, including the fact that if you design kernel APIs, and we're thinking about Video4Linux in this case, without having actual use cases in user space and real applications to use them, you end up making lots of mistakes. So we've been suffering from work I have done in Video4Linux myself, so I constantly have to apologize to myself when looking at libcamera. But I'm very happy to be here today and hope that we'll be able to cover quite a few interesting discussion points, especially given that we have different communities that haven't necessarily interacted that much in the past. So let's see if we can collaborate nicely together. Thank you. Hello, I'm Kieran. I've also been working with Laurent for six or seven years now. We were working on cameras and graphics before that, and that's kind of what brought me in. I'm one of the main developers of libcamera as well, and I'm really happy to see that it hopefully will provide more user application support for more platforms.
So I'm really happy to see so many different people coming in to talk today. OK, so thank you, everyone. The format that we are going to try to respect today: I prepared five discussion points, which I would like to introduce just to spark the discussion. We have 50 minutes, which is pretty strict, and we will do our best, also because the bot which closes the video conferences is merciless, as someone has said in the chat. And I would like to keep a Q&A session of around 10 minutes for the end, but if there are interesting questions which come up during the discussion, we will be happy to take them. The only thing is that I might not be able to see them because I'm in the call, so if you notice something which is worth discussing, please notify me. I'm not sure if my slides are actually useful this way, so I'll try to describe them, which is a bit more complicated. So I would like to start from the question which was left unanswered in my previous presentation: how much of the context which I presented, assuming my understanding is correct, applies to Linux mobile devices, and how and why might they be considered different compared to other platforms? I don't know if you can see this for real. I found this picture on an Internet forum a few months ago, and it was said to be the most commonly found picture on the smartphone of a Linux power user: it's basically the back of a router or something, with a Wi-Fi password. What is the hint here? The hint is that a Linux power user might not be that interested in cameras, or at least in advanced camera functionality, and certainly cameras are not a selling point for Linux devices like they are for commercial ones. One other point which I would like to bring to your attention is the fact that if you're a producer which makes a new phone every one or two years, moving your work to the next generation of products is a considerable effort if you have to do it every one or two years. But if you have two phones, or one phone, or you are in a community which will never see new phones, like the Nokia N900 that will never see a new version, unfortunately, maybe you are not that concerned about the effort of porting from one version to another. So the first question which I would like to ask, and that's specifically for Purism and Ben from the PinePhone side,
is: what is the expected set of features that cameras should provide on a Linux-powered mobile device? I would like to ask Dorota to start first. Do you have any hints, anything to share about that regarding the Librem phone? Oh, yes, of course. So I'm taking offense at the statement that camera support is not a selling point for Linux devices. It has been one of the most requested features, and it still remains one of the most requested features for our phone, to actually have the cameras working well. And I think in general, saying something is a selling point of Linux devices, Linux in particular, is kind of a red herring. People want a phone; they don't really care whether it runs Linux or not. And therefore, they will apply the same considerations regardless of what the
phone is running underneath, as long as it meets their functional expectations. And one of those expectations is having a working camera. So I think that makes it just a regular thing, not really different from any other phones.
When we try to contrast phones with stationary computers, I think there are some differences. The focus is perhaps a little bit different in terms of you cannot go outside and take pretty pictures with your laptop.
I mean, you can try that, but you're not going to have a very good experience. So just taking pictures and just taking snapshots is going to be a case that is not present on the desktops. That being said, I expect that video calling is going to take the cake when it comes to actual usage.
So I would say that from the perspective of the users, while those are different cases, the underlying stack doesn't really need to differ that much between a phone and a traditional desktop use case.
I see. So thank you for your answer. Benjamin, can you relate to that regarding the desired feature set for a phone like the PinePhone? And there's an additional question, if you want to reply to that as well: having considered, and I know you did, using a camera stack, have you found the entry barrier to doing so not justified for the use cases that you are trying to implement? So, to answer the first thing, I'm in complete agreement.
I think that the camera is a very important part of the phone, which is why I got into doing this in the first place. And I think the end goal should be to get to the same stage that Android is at this point, where you can hop in a video call with the camera and it just works, and you can take photos and video and it just works.
And these are things that are part of at least some people's everyday lives. I know they're part of mine. So I think those are all important. In terms of camera stack, that's a bit of a complicated question.
After long deliberation, for the time being, libcamera doesn't seem to be in a state where I can use it for Megapixels. Part of that is probably simply the speed at which things can get developed, because I can work on the camera driver and Megapixels at the same time, pushing features in both. If libcamera was in between there, I'd also have to do patches to libcamera, which then go through a review cycle and everything. At least for the PinePhone, and especially the PinePhone Pro, these are still really early days, where any kind of barrier to further development can be a huge hindrance.
Thank you. That was very interesting to know. Dorota or Martin, do you have anything to add to that, to the last two questions? Do you find the entry barrier justified for the use cases that you're trying to achieve?
Yeah, I think I agree with what was just said. libcamera as a layer in between is, I would not say it's not necessary, but it makes dealing with the entire stack slower, when you could hard-code some internals; basically, we could hard-code the hardware configuration and get on with it if we didn't want to use libcamera. On the other hand, I understand why libcamera exists and why it is needed in the long term. So it's kind of a difficult situation: whether I should invest in the long term of libcamera, or whether the short term of just having our device supported is the right thing to do. I also think that, at least at the current stage, libcamera is very focused on applications that aren't camera applications, because the Megapixels app needs to have a very intricate set of features, or at least that is going to be the eventual goal, maybe even hardware-specific features that just don't make sense to put into a more generalized library. And on the other hand, libcamera makes all sorts of sense
if you're, say, a web browser or any other application that needs to take a picture, but it doesn't have to look amazing, it's not supposed to be a work of art, it's just a picture, or like video conferencing, you're just making a video. For those sorts of things, it's currently, for the PinePhone at least, it's getting into a better state.
Okay, I hope we can come back to this point later in the discussion. If there are no other comments on this part, I would like to move on to... ah, Laurent, yes? Can I make a quick comment, Jacopo, on that?
I think it's a fairly interesting remark, especially when it comes down to the fact that we have an ecosystem of very different applications when it comes to camera usage and indeed on the video conferencing side we may not need to get the most quality out of the camera, even though, as time goes by, I think that people expect better and better quality there as well.
But when it comes to still image capture, for instance, you may want to apply different types of processing to a picture that don't necessarily make sense in other use cases. But we really, and at least that's a design goal of libcamera, we really want to enable all those use cases, regardless of whether you just want a high-level API, something that works for video conferencing and gets usable images out of the camera, or you want to go for something much lower level. So libcamera does provide processed images, but also has the ability, if the platform allows that, of course, if the hardware allows that, to produce raw images where applications can have full control over the processing and the low-level parameters. I'm entirely sure that there are some use cases that we could discuss where having an extra component in the middle, between the application and the drivers, may get a bit in the way and may make things more difficult.
But even for, I would say, power users or power user applications, applications that really want to have detailed control of the camera, that's something that we want to enable. I'm not saying that we're necessarily there yet, but it's certainly a design goal. Thank you. I hope we can resume the discussion in the next points as well
with maybe opinions from Raspberry Pi and other vendors. The next point I would like to discuss, before diving deeper into that part, is looking at the current designs, at least for the devices which exist on the market today, and I'm thinking about the Librem 5 and the PinePhone, because those are the ones that I know the best. But I'm sure there are other communities which I'm not aware of, so feel free to speak up; I would like to get in touch as well. So I'm using the PinePhone and the Librem 5 as example designs, because they fall into the category which I, a bit unfairly, labelled as legacy designs, because they both feature an SoC which does not have an ISP on board. So I tried to list a few challenges which I imagine you had to face, like: how do you perform the debayering? Is there a 3A loop? What generates statistics, and how are the algorithms implemented? How do you decide which formats to support? And I would like to ask Ben and Dorota if they could explain how things work today on the PinePhone and the Librem 5. Ben, if you want to go first this time. Cool. So on the PinePhone, both cameras are smart sensors, so they have an integrated ISP.
They don't offer a lot of data beyond just here's the image we recorded, but they do allow for a fair amount of configuration of the ISP itself.
Megapixels generally just uses it in raw format and then handles the debayering and other post-processing on the CPU or the GPU for preview. The formats are all, well, both of them support a variety of formats.
I think it might also be good to cover the PinePhone Pro here, because the PinePhone Pro is... I have a point for that. Okay, never mind, I've got a separate one for that. It's kind of a different design. Dorota, do you have any other insight to share about what the camera stack looks like for the Librem 5? Oh, yeah. The Librem 5 camera stack has kind of a tiny little ISP on the camera module itself.
It can do some basic corrections like colour correction and vignetting correction. I think that's the same on both sensors. As far as I can tell, on the i.MX 8 SoC there is a little ISP as well, but it's limited to gathering statistics from the captured frames. The sensors pass down raw data, both of them do, and then the SoC has the task of processing it, and the challenge is how to process it fast. So far, that has been a big limitation. We are processing the frames on the CPU currently. We do have access to the GPU, but the GPU is also not very well developed: we only have access to OpenGL ES2, which brings quite large limitations to what we can actually do. There is no compute mode on that GPU, so we are going to have to get creative to get the data out of it fast. I meant to make a point of discussing explicitly how the GPU debayering is performed, and I was interested to know whether different implementations of debayering on the GPU are easily portable between different devices. I haven't prepared any slides for that. I don't know if there are other comments on how such a design could be made more portable
or could be replicated in other infrastructures, maybe. So, I've got a good answer to that. Because the, or at least Megapixels right now uses OpenGL ES2 for debayering, that code is, because it's OpenGL and OpenGL runs on everything, it is extremely portable.
There's already been work done on a separate Qualcomm chip. Someone's worked on, I forget who it was, but someone's worked on supporting a Qualcomm chip using Megapixels, and they had to do very limited work with the debayer shader.
This is just for the preview, not for actual image capture, but they had to do very limited work to get that working. And the main part of that was actually just working with a different image format
rather than doing different logic for the debayering. So, in terms of portability, the GPU debayering is about as portable as you could possibly get. Okay. I have a question on that, because we also have OpenGL-based debayering, actually also using GL ES2, in the qcam test application that we have in libcamera. So it's really a similar implementation, but it has been developed completely separately. I fully agree about the portability when it comes to the source code and to the fact that you get an output image that's pixel perfect. But what we have found is that, depending on the GPU that we're running on, and possibly also depending on the OpenGL implementation, performance can vary quite a lot. So even though the source code itself is portable and will produce the correct output, you can have really bad performance on maybe lower-end GPUs
if you don't pay attention to the number of texture units that you have or to really low-level details. So my experience there is that, yes, there's portability at the source code level, but if you want it to run smoothly, it can become more complicated. So for a simple algorithm like debayering, that may not be too much of an issue,
but if you add more processing steps and you need to make sure that everything runs with correct performances on lots of different platforms, I think that can be quite a bit of a challenge. Okay. Thank you. I'm sorry if I have to cut the discussion, but I'm a bit concerned about time, so I will try to move a bit faster than that.
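To make the debayering discussion concrete: below is a minimal sketch in Python with numpy, assuming an RGGB mosaic and producing a half-resolution preview. It is not the Megapixels or qcam shader, just the same operation done naively on the CPU.

    import numpy as np

    def debayer_half_res(raw):
        """Naive half-resolution debayer for an RGGB mosaic.

        raw: 2D numpy array (H, W) of raw sensor values, H and W even.
        Returns an (H/2, W/2, 3) float32 RGB image, averaging the two greens.
        """
        r  = raw[0::2, 0::2].astype(np.float32)
        g1 = raw[0::2, 1::2].astype(np.float32)
        g2 = raw[1::2, 0::2].astype(np.float32)
        b  = raw[1::2, 1::2].astype(np.float32)
        return np.dstack([r, (g1 + g2) / 2.0, b])

    # Example with random 10-bit data standing in for a sensor frame:
    raw = np.random.randint(0, 1024, size=(480, 640), dtype=np.uint16)
    print(debayer_half_res(raw).shape)  # (240, 320, 3)

A GPU implementation does the same per-pixel arithmetic in a fragment shader, which is why the logic ports easily while the performance still depends heavily on the GPU and driver, as noted above.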
So, describing the architecture for the two existing phones: neither of them has a full-featured ISP, and both rely on some functionality provided by the ISP on the camera sensor. This is a question on the slide, mostly for the libcamera team: we have something that is called the simple pipeline handler,
which was actually started by the original Megapixels developer in order to support the PinePhone. It has since deviated from that in order to accommodate more platforms, and recently I have seen Ben and Dorota trying to interface with it.
And I'm under the impression there is a bit of a misconception about what the simple pipeline handler does and what it is supposed to do. So I would like to ask Laurent if he could clarify that a bit. And also, sorry, after that, I would like to introduce the big discussion. I would really like Naush to be here to give his opinion, being the pipeline handler developer for the Raspberry Pi,
about what it takes and when it is necessary to implement a pipeline handler for your own device specifically. So Laurent, if you could clarify those two points. Okay, I'll try to be quick. So, as was explained very clearly, the pipeline handler is really the backend implementation in libcamera
that has the device-specific code, and it's supposed to configure the camera pipeline, to pass buffers around, to interact with the kernel drivers. So it does not do image processing. That's the job of a separate component that we call the IPA,
for Image Processing Algorithm. But the pipeline handler is really the plumbing. And the simple pipeline handler started because we had two competing implementations, two separate implementations, to add support in libcamera, to create a pipeline handler for the PinePhone indeed, and also for an ST SoC. They came pretty much at the same time on the mailing list,
and the hardware is fairly simple there. You have a smart sensor that includes an ISP, and on the SoC side, at least for what needed to be supported, it's just a CSI-2 receiver. I think on the ST platform there may have been a scaler as well in the pipeline. But it's very simple.
There's no ISP being used on the SoC side. And those two implementations, two pipeline handlers for those two different platforms, were developed separately and were really mostly the same. So that's when we thought that we could try to share the code there and have one implementation that would cover pretty much any platform that doesn't have an ISP.
It doesn't mean that if your platform doesn't have an ISP, it necessarily has to be supported by the simple pipeline handler. You could have a specific pipeline handler for the platform if it requires so. But we thought that it would be possible to create code that relies only on generic kernel APIs and on the device topology that is exposed
by the kernel drivers to be able to do automatic configuration of the camera pipeline. So that's really how it started. The goal was to get something that could work out of the box. And to some extent, adding support for a new platform,
if the hardware is simple and doesn't have an ISP, can be as simple as adding a single line in the source file, where you just add a string that identifies the platform so that the simple pipeline handler can handle the kernel drivers automatically. It's not always the case.
We had some discussion on the libcamera mailing list a while ago, and that's something I need to get back to. But the current implementation doesn't support scaling on the sensor side, which means that it always configures the sensor for the largest, the highest resolution, and then scales, if possible, on the SoC side,
which means that we're certainly missing features there. A couple of patches were submitted on the list to fix that, but they also broke support for other platforms that are currently supported. And I actually didn't have the time to look at it in detail at that point, but that's something I would like to resume working on, to make sure that we also support the features
that the sensor itself provides, and so have a wider range of platforms that could be supported there. But even though, for instance, the simple pipeline handler today supports or could support the Librem 5, one of the features that Dorota mentioned
in the SoC there is that, while it doesn't have an ISP as such and cannot do image processing or debayering, it has a statistics engine. And that's something that can be useful even if you have to implement ISP support using a GPU, for instance. The hardware statistics engine can be useful to compute information that can be consumed by the GPU after that, and by the algorithms that are implemented there,
without having to spend CPU or GPU time to compute those. And this is something for which I expect that we could possibly have a dedicated pipeline handler, because there are specific features of that platform that need to be taken into account. As I mentioned, we don't support today
any ISP in the simple pipeline handler, so to make it really usable you need to have a sensor that integrates an ISP. But I think that, and we'll get back to discussions about the GPU implementation of an ISP, this is something that I foresee as being a generic component that could be used by the simple pipeline handler as is,
or even by pipeline handlers for platforms that have some hardware processing capabilities. It's not uncommon, especially for lower-end platforms, to have some sort of hardware ISP that can perform image processing, but not everything, and to have some of the algorithms, some of the processing, running on the GPU,
or possibly even on a separate DSP. So having a hybrid hardware pipeline with different hardware components that need to work together is something that we'll need to support eventually, and that's also something that I would like to work on in libcamera: having GPU support as a generic component that could be used by different backends
and different platforms. Okay, thank you. I would be interested in knowing from Dorota, Martin, and Ben whether you have tried to interact with the simple pipeline handler. Looking at the development on the mailing list, I was under the impression that there was an expectation for it to work mostly out of the box, and that the expectation that instead
you should write your own pipeline handler was kind of a showstopper, basically. So I'd like your impression on this one, but also on the fact that support for a pipeline handler maybe shouldn't be something left in charge of whoever makes the device;
it should be something that comes from the vendors. And on this, I would also like to ask the libcamera people's opinion on how this is going to work, or where this is going. So Benjamin, Dorota, I know you tried to look into the simple pipeline handler.
Have you considered writing your own one, or going directly through the other interface and using that in the application? Is that still more appealing for you? So, in terms of having it working with the simple pipeline handler, there are some kernel patches, which have probably landed in Megi's PinePhone kernel by now, that do make the PinePhone work with the simple pipeline handler.
So any application that uses libcamera, once those patches land, should just work. For Megapixels, the simple pipeline handler has too many limitations,
and I haven't looked at doing a dedicated pipeline handler, because Megapixels already does all that stuff, and that would just be a lot of extra work for very little current benefit, work that could be spent on the PinePhone Pro or other projects instead.
Sure. You mentioned limitations of the simple pipeline handler there. We don't have to discuss it now, maybe on the mailing list or on IRC later, but I would be interested if you had a list of a few bullet points of the limitations that you ran into. It's not necessarily something I expect you to fix, but so that we are aware
that that's something that we can possibly address in the future. Sure. I would like to ask you to note it down, and we'll move this point to later discussion, probably in the Q&A. Moving forward, I briefly mentioned the PinePhone Pro, which in my view is kind of a game changer, because it's the first design that we have in this kind of device
which fully qualifies as a modern design. It has an ISP which is capable of many different functions. So Ben, I would like to ask, what are the camera plans for the PinePhone Pro? And it happens that the Rockchip RK3399, which is the SoC that this phone uses,
is one of the platforms that libcamera supports natively. And after Ben's opinion on the current development plans, I would like to ask Kieran: first, what does it mean to have the processor supported natively, and what is the feature set which can currently be expected from libcamera running on this kind of SoC?
So, in terms of the status of the PinePhone Pro, the drivers don't work. For the rear-facing camera, we do have a driver in mainline. It doesn't work and we don't know why.
I've spent multiple weeks trying to get this thing to turn on and work, and it powers up, but we don't get any data out of it. That's, sorry Ben, that's the sensor driver, right? Yes, that is the sensor driver. The ISP does have drivers, and presumably they work.
Yeah, it's supported in mainline, so that should be working. The front-facing camera doesn't have a driver at all. There is a driver that looks like it might work, for the OV8856, but we don't know, because that one also just doesn't want to power on.
Okay. Kieran, instead, as I mentioned, the Rockchip RK3399 is natively supported by libcamera. What does that mean in terms of the development effort that can be saved for application developers?
And what is the feature set currently supported on those kinds of platforms? So for the RK3399, I'd say it hasn't been one of our primary targets, but it was one of the early targets that we started with when we were bringing the project up. So it did get a dedicated pipeline handler from the beginning.
We haven't invested as much time in the algorithms to handle the auto white balance and the auto exposure, but there are patches on the list that make that much more usable now, which are somewhat directly derived from the Raspberry Pi's implementation. They're awaiting a review cycle before they get integrated, I think.
So I would say that for a Rockchip ISP platform, by using libcamera you would get all of the image quality in a much more usable state directly out of the box. But of course, there is still lots more that we can do. I also wanted to touch on the fact that I heard
there's the OV8865 sensor on the PinePhone Pro. I think that's one of the ones that I know Dan has been playing with. So, on the PinePhone Pro, it has an OV8858, but the driver in mainline that might work is for the OV8856.
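As an illustration of the kind of auto white balance algorithm Kieran mentioned a moment ago, here is a minimal gray-world sketch in Python with numpy. It is a generic illustration, not the libcamera or Raspberry Pi implementation, and in a real IPA the channel averages would come from the ISP's statistics rather than from a full RGB frame.

    import numpy as np

    def gray_world_gains(rgb):
        """Gray-world AWB: choose gains so the R and B means match the G mean."""
        means = rgb.reshape(-1, 3).astype(np.float32).mean(axis=0)  # [R, G, B]
        return means[1] / means[0], means[1] / means[2]

    def apply_gains(rgb, r_gain, b_gain):
        out = rgb.astype(np.float32)
        out[..., 0] *= r_gain
        out[..., 2] *= b_gain
        return np.clip(out, 0, 255).astype(np.uint8)

    frame = np.random.randint(0, 256, size=(240, 320, 3), dtype=np.uint8)
    r_gain, b_gain = gray_world_gains(frame)
    balanced = apply_gains(frame, r_gain, b_gain)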
Okay, too many different serial numbers. So Ben, as for the current plan for camera support, apart from the kernel driver implementation, what about the application side? Have you considered porting Megapixels to the new platforms?
Have you considered using a different design, which maybe relies on libcamera support for those kinds of platforms? Do you have any plans there that you would like to share? Sure, so the idea is to bring the same support to the PinePhone Pro as already exists for the PinePhone. That should be fairly simple.
A lot of the work that's gone into that Qualcomm device I mentioned earlier is the same work that would be needed for the PinePhone Pro. So most of the stuff is already working, and we just really need, well, I just really need to get the drivers working, and then I can start focusing on Megapixels. On the Qualcomm device,
is there a system in place that already handles auto exposure and auto white balance, to get the sensor to run with... I don't know that, sorry. Well, I would propose to move that to a different point as well, because I would have the same question.
I know that for the current PinePhone, the autofocus, for example, is implemented by the sensor. If you're moving to a different sensor, or even to a raw sensor, how does that work in your application? Because you lose that functionality when you move to a different sensor. I would be more than happy to keep discussing that, but I'm afraid about time again.
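Since contrast-based autofocus with a raw sensor came up here, below is a minimal sketch in Python of the usual strategy: sweeping a focus measure over lens positions and keeping the sharpest one. The set_lens_position and capture_frame callables are hypothetical placeholders for whatever lens control (for example a V4L2 focus control on a VCM) and capture path a given device provides.

    import numpy as np

    def focus_measure(frame):
        """Simple sharpness metric: mean squared gradient of a grayscale frame."""
        gy, gx = np.gradient(frame.astype(np.float32))
        return float(np.mean(gx * gx + gy * gy))

    def sweep_autofocus(set_lens_position, capture_frame, positions):
        """Try each lens position and return the one that gives the sharpest frame."""
        best_pos, best_score = None, -1.0
        for pos in positions:
            set_lens_position(pos)              # hypothetical lens/VCM control
            score = focus_measure(capture_frame())
            if score > best_score:
                best_pos, best_score = pos, score
        return best_pos

Real implementations refine this with coarse-to-fine searches and usually take the focus metric from the ISP's AF statistics instead of computing it on the CPU, which is touched on later in the session.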
So, if nobody has any other comments on that part, I would like to introduce a different point in the discussion, which is about the applications themselves. I briefly mentioned in the presentation that in the last years the desktop camera applications
and the mobile or embedded camera applications have kind of diverged, because the first ones seem to work with simpler, UVC-style cameras, while the second ones have been left a little bit behind. I invited David, Dan and Hans to give a presentation about camera applications
for different use cases, starting from a set of vendor-specific applications like the Raspberry Pi applications. So David, I know you have also prepared some slides for that. I don't know if you are able to show those, but if you want to spare a few words on the Raspberry Pi camera stack
and what the libcamera apps and Picamera2 are for Raspberry Pi. Yeah, hi, can you hear me all right, everyone? Sure. Yeah, great. I should just say that my colleague Naush is trying to join here, but he's not having much luck at the moment. He's clicking on the join button or whatever, but it doesn't seem to be doing anything for him.
So I don't quite know if he'll manage to join us, but we'll see what happens there. Yeah, so I've got one or two slides just to share so that I can talk about things. I guess I click on this toggle screen share, right? Yeah. And that might do something or it might do something bad. I don't know.
Let's give it a try. Entire screen. Okay, let's go with entire screen. So yeah, just a few little slides, and I'll talk through them quickly. Talking about applications. Well, I guess just a few words on why Raspberry Pi is dealing with libcamera,
given that you've gone all these years without it. I just listed a few of the reasons there, and kind of the main reason for using libcamera is, well, that it just exists, I guess. Sorry, David, I have to interrupt you,
but I think your slides are not visible. The screen is blank for me and for Peter as well. I don't know if that's the same for the rest. You got anything come up there, no? Nope. Okay, well, this is great, isn't it?
Let me, it gives me, okay, it says an element is sharing my screen. I can stop sharing. But it doesn't seem to be sharing at all, does it? And let's just try that again. Maybe if I share a window.
Yes, I can see your text up. Okay, so you can see my matrix window, let me. So can you, has that changed? Not for me.
Okay, maybe, let's try, try one more time. Otherwise I'll just have to talk, I guess. Maybe it's because I'm.
So I'm sorry, if you feel like talking about that, I'm sure that slides could be posted later for download. If you feel like just talking about that, that will be better. Yeah, I don't seem to be able to get. Yeah, it just, it only offers me the Chrome window. It doesn't offer me the other one.
Anyway, hold on, I just need one more try here.
Yeah, while you were trying that... Is that better? Better, yes. Okay, well, I'll just show it in the browser from Google Drive and that'll do.
Okay, so why is Raspberry Pi bothering with libcamera? It's kind of a question, you know, because for years we've had this, I guess the architecture has been that we have this kind of GPU thing alongside the actual ARM cores. We have this GPU thing which handles, well, not only the graphics and the video encoders and decoders and all that stuff, but the ISP is there as well, and the camera system.
And so we've just used the Broadcom camera stack with Broadcom as the company that makes the SOC. And we just kind of talked to it as a bit of a black box and we sort of have had little to do with it. But sort of going forward, that's not really where we want to be for all kinds of reasons.
And I've listed some of them there. Basically, we kind of have a need to take control of the camera stack for our own purposes, I guess is what it boils down to. Because that means that we can add new features, we can fix bugs, we can actually create an open ecosystem for people to use. And, you know, at the end of it all, I guess,
is the thing that we just don't really understand what the hell the Broadcom camera system is doing anymore, really. So we're kind of stuck if we carry on with that. So that's why we moved. So we've cut down the functionality in that GPU to an absolute bare minimum.
So we end up, we've transferred everything over to the ARM cores and libcamera just kind of happened to be there at the right time, really, to give us a framework to plug into. As regards what we give to our users now, of course, you know, users generally don't want to, you know,
they certainly don't want to program v4l2 controls, but even libcamera APIs are beyond what most of our users are capable of. Well, some of them anyway. And in fact, even, you know, C++ APIs or whatever is not what most people want.
So again, historically, you know, we've had certain applications that run under the Broadcom legacy stack. And so we've spent some time just reproducing those. In some respects, they're not terribly interesting. You know, they do a random collection of things, of video captures and image captures and streaming applications and stuff like that.
It's a bit of a random set, but so we have the need to reproduce that so that people can carry on pretty much as they were. I mean, a few lessons that we had from that was that, you know, we found the libcamera API is still quite low level for most people, I think. You know, it is quite hard to use. You spend a lot of time worrying about formats and devices
and buffer counts and memory mapping things and all that sort of stuff. So again, you know, people have to be quite brave, I think, to tackle that for themselves. So, you know, we found the need to put a bit of framework on top of that just to make that a little bit easier to use.
Just one example of that. When we write camera applications now, they kind of look like this, basically. There's just a little, you just, you know, there's a bit of stuff on top that hides all the boilerplate. So you just run a little loop and you get camera frames and that's kind of how it works.
As I say, even that isn't really what our users want, because a lot of our users have been using basically a Python library to talk to the cameras on the Raspberry Pi. The Python library that we've had for many years now is called picamera. And it's actually a little bit unlucky for us
because we didn't write it. It got written by a third party. Only people kind of assume that Raspberry Pi is somehow responsible for it, but we never were. But so there's this thing out there that people like to use. And of course, now that we're sweeping away the Broadcom camera stack, their beloved Python library has bitten the dust.
So they're quite unhappy. So what we need to do is write something again on top of the new libcamera-based stack that we have, that gives them the same kind of functionality. And so we've imaginatively given it the name Picamera2.
So just a little example of what it's all about. This is actually what most of our users want. They don't want applications that do particular captures or applications that, or they don't want ready-made camera applications. So they're quite different from like the mobile phone users. They don't want ready-made camera applications where they push a button and it takes a picture.
They actually want to program stuff, but they want to program it quite straightforwardly. And so these are some of those examples that we need to enable for them. So look at- One question for you, David. Do the Python bindings sit on top of your own application, or are you planning to exploit the forthcoming Python bindings for libcamera and build on top of those?
Yeah, so what we've done is we've taken the kind of prototype Python bindings that you have in libcamera, and again we've put another layer on top of that, which is what we call Picamera2. And these are examples of what Picamera2 looks like. And this stuff all actually runs right now.
So, just to look at the example on the right there, that's how you do face tracking. Essentially, you just start a camera stream: you configure the camera and you start it. And then you just import OpenCV and use that, because the images transfer directly.
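A hedged sketch of what such a face-tracking snippet might look like, modeled on the published Picamera2 examples rather than on the exact slide shown here; the resolution, pixel format and Haar cascade are illustrative choices, not David's code.

    import cv2
    from picamera2 import Picamera2

    picam2 = Picamera2()
    picam2.configure(picam2.create_preview_configuration(
        main={"size": (640, 480), "format": "RGB888"}))
    picam2.start()

    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    while True:
        frame = picam2.capture_array()        # numpy array straight from the camera
        gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
        for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 4):
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("preview", frame)
        if cv2.waitKey(1) == ord("q"):
            break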
So it gives a really easy way for people to actually do with the camera whatever it is that they want. So very often people want to do some kind of image analysis. Often they want to feed them to, you know, their favorite neural network framework. Thank you, David, but I'm afraid that if that's your last point, I will have to interrupt you because we have 10 minutes left and I would like to give Arthur the possibility
to introduce this. If you have a last point to make. Yeah, so I guess the one last point to make is that certainly our users, they're perhaps a bit different, but we need to give them an API where they can basically do whatever they want. I mean, if they want to write an application that's a bit like a mobile capture application, we enable that for them. But at the same time, we need to enable for them
the ability to do all those other things. And really, it's this kind of quite high level camera programming that I think our users are after anyway. Okay, thanks a lot. I'll leave it there, yeah. Thank you. I hope that could be expanded later as well. If you could stop sharing, I'm sorry.
And thank you. So we have here in the room Dan, who has been working on Linux Surface, which is basically regular Linux running on portable devices with complex cameras. And at the same time, we have Hans,
who is presenting the plan for a full-featured camera application on desktop systems. So Dan, if you could start by introducing how the camera stack looks for your current devices. Sure. So for the Microsoft Surface line, apart from one anomaly which uses a USB camera,
they all share the same front and rear sensors, and those output raw Bayer data to an Intel IPU3. And that converts it into a format that user space can handle, and also gives us 3A statistics. And then libcamera's IPU3 pipeline
can capture the images, and the auto white balance, autofocus and auto exposure algorithms there give us a useful image. But where it's kind of falling down at the moment is user experience. Apart from Cheese, and I think Chromium has a build that can do it, most browsers certainly don't,
and applications basically don't see the cameras as existing. So Skype, Zoom, Teams, they will tell you that there isn't a camera. So at the moment we're kind of hacking it by using v4l2loopback to create a dummy node and then streaming to that with GStreamer, which does work, but needs to be run every time
and isn't very friendly. Yeah, in the introduction we called this the short-term solution, right? It just makes a few applications work. If you could share the plans: I prepared, well, you suggested to me a drawing from a blog post on the GNOME website
where an architecture for a full-featured camera application is presented. I'd like you to introduce a few of the main concepts of what we're seeing here. Yes, so there's also a link on the slide, which I also put in the generic chat, which is about a blog post from Christian Schaller,
which has a lot more info. But inside Red Hat, a couple of people have been thinking about this: we are seeing more and more Intel devices, like the Surface line, using basically complex camera setups with an image signal processor somewhere in the mix. So we've been thinking about what we want
a final picture to look like, or at least the picture in a couple of years. And we've come to the conclusion that things like browsers, like Firefox, need to be adjusted, because they are currently sort of assuming the old UVC-like stack, where they can just directly open a single video4linux2 device.
And we definitely believe that libcamera has a place in here, but we also believe that Pipewire has a place in here, and libcamera already has a Pipewire plugin, or I don't know what the exact technical term is, I think Pipewire doesn't call them plugins, but that doesn't matter. So the idea is that we want the browsers
to use the XDG portal for video capture in Pipewire to request, for example, a video stream for video conferencing, because we need to move to using portals, and portals also mean using Pipewire. Pipewire is, for example, also used for screen sharing,
which is probably why it only worked for the Chrome window, but that's a side story. So the idea is that browsers need to start supporting this anyway, because we want to have the browser in a Flatpak and run it sandboxed eventually. For security reasons, we want especially a big, complex application like a browser to run in a sandbox,
which means that we will need to go through Pipewire anyway, because that is how it's currently planned that sandboxed applications will access video. And as such, if we're going to do the work to make a browser support libcamera, then the best thing is to do that through Pipewire, so we do the work once and we get the sandboxing sorted for free.
Or you could see it the other way around: we get the camera support for free when we add sandboxing support. Indeed. Thank you, you have been very clear, and I'm sorry if I'm cutting it a bit short. No, it's fine. I hope that the reference to the blog post might provide more useful references.
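To give an idea of the direction Hans describes, here is a minimal sketch, using the GStreamer Python bindings, of an application consuming a camera that is exposed through Pipewire. It assumes the libcamera/Pipewire integration is already in place and skips the XDG portal handshake a sandboxed application would have to perform first.

    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst, GLib

    Gst.init(None)
    # pipewiresrc picks up camera nodes published into Pipewire (e.g. by libcamera).
    pipeline = Gst.parse_launch("pipewiresrc ! videoconvert ! autovideosink")
    pipeline.set_state(Gst.State.PLAYING)

    loop = GLib.MainLoop()
    try:
        loop.run()
    finally:
        pipeline.set_state(Gst.State.NULL)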
And I think that looking at how camera applications will look for other domains, for other systems, is indeed a useful inspiration and might lead to a more fruitful discussion. But I'm afraid that I will need to move on to the next two points, which we can only touch superficially. The first one is about implementing reusable 3A algorithms.
And I would like to ask Kate, who has been working on implementing an autofocus algorithm recently, how much potential she thinks there might be to reuse 3A algorithms in a generic way, without platform-specific dependencies.
Hi, I'm Kate. I'm working on the IPU3 autofocus implementation. I faced several challenges to get AF, autofocus, up and running on the Surface Go 2, which is an IPU3-based platform. First, we have to have some knowledge of the IPU3 image processing units, such as the AF filter and its configuration, and also their behavior. The processor gives us the AF statistics, as well as the result of the image edge detector, and we have to find out the correct data format to calculate the contrast from. Once we confirmed the buffer format and actually saw the image contrast developing in the right direction, we started to search for the image with the maximum variance, which is expected to be a focused image. Then, based on existing contributions, such as the lens controller driver and the image sensor driver, the IPU3 AF algorithm could be proposed.
So, based on the work mentioned above, in terms of code reuse: these AF algorithms depend highly on the hardware implementation I write the code for. When I write the code, sometimes I have to trace the kernel implementation to understand the behavior of the IPU3. But looking at other implementations, such as the autofocus for Raspberry Pi, both of them, IPU3 and Raspberry Pi, are based on a similar strategy to find a focused image. So I think customized code for each imaging processor is required, since they have different configuration methods and interfaces to set and fetch the statistics from the processor. However, we have a similar strategy to implement the algorithm. The vendors provide different data formats for the image statistics, so some customized code is required, and the code reuse may focus on the image processing algorithm and strategy. Or maybe we can build on a common strategy implementation and add some vendor-specific interface to set and get information; that may be a good way to achieve algorithm code reuse. Based on such a basic algorithm and strategy implementation, contributors could then focus on the program logic without taking care of any complicated algorithm implementation.
Thank you. Thank you so much. I would like to explore this point very much, because I think the potential of having a collection of generalized 3A algorithms, which of course need a platform-specific backend or platform-specific implementation, has a lot of value. And I would like you and David to have this discussion, but I'm afraid this has to be moved to later as well, because we have another point
which I would like to spend more time on. This has actually been proposed by Martin, and I can very much relate to it. It's about the pain, just the pain, of writing sensor drivers. Martin, I know you had a question specifically on that. Would you like to share it with the group?
Yes, thank you. You outlined that already a bit in your talk. I think the problem is really almost structural: there are many sensor drivers being written, and even put into the mainline kernel,
that we really don't fully control in the end. And I see that now: I want to improve a sensor driver and I'm not really able to, just because I'm relying on these enormous tables of binary register and value pairs.
At least we should be aware of that and come up with some, I don't know, creative ideas how to... Yeah, I'm sorry, I would like to keep discussing that later. I prepared a few slides of possible solutions, or strategies, or at least suggestions, that might help
get us out of this messy situation. But I'm afraid the live broadcast is now ending in 30 seconds. So I would like to use this time to thank you for joining this discussion and providing your contribution, and to thank especially the FOSDEM organization committee.
Let me stop sharing my screen. Here you go. So, thanks to the FOSDEM organization committee, because they made all of this possible, which has been a little bit more complicated than a regular presentation. So thank you everyone.
I hope that there will be space for the Q&A in the private room. Bye-bye. Okay. My timer has expired. I don't know if it's gonna be in the recording or not, or we are still live or not. So give me some indication of that.
We still have a minute. Oh, I just received that, but my timer here expired. So we have one more minute. Can we take one of the questions in the Q&A? The first one, go ahead. The top one. Hans, how does libcamera interact with Pipewire? Well, as I said, there is a Pipewire plugin
which talks to libcamera. So it's possible to use them together, or it should be possible. But I'm not sure how it works from the application side, because I think the application side then also needs to support asking Pipewire for the video device.