LWN: Comments on "Coping with complex cameras" https://lwn.net/Articles/992411/ This is a special feed containing comments posted to the individual LWN article titled "Coping with complex cameras". en-us Wed, 01 Oct 2025 14:59:03 +0000 Wed, 01 Oct 2025 14:59:03 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net What is a camera? https://lwn.net/Articles/993348/ https://lwn.net/Articles/993348/ laurent.pinchart <div class="FormattedComment"> If you're dealing with systems that require controlling an ISP, libcamera would then be a good way forward. If you're dealing with UVC webcams, libcamera could still help but is probably overkill as it won't give you the features you mentioned were missing (such as transcoding between formats or decoding JPEG). If you're dealing with IP cameras, libcamera will definitely not help as we don't support those.<br> </div> Tue, 08 Oct 2024 21:16:16 +0000 What is a camera? https://lwn.net/Articles/993332/ https://lwn.net/Articles/993332/ Sesse <div class="FormattedComment"> FWIW, my application is low-level enough to already talk to ALSA directly (both for PCM and for MIDI) and has its own userspace USB drivers for certain capture cards, so it sounds like libcamera directly would be the most likely avenue to try first. :-)<br> </div> Tue, 08 Oct 2024 18:45:54 +0000 Tuning for image quality? https://lwn.net/Articles/993331/ https://lwn.net/Articles/993331/ laurent.pinchart <div class="FormattedComment"> libcamera has a tuning tool. It hasn't reached feature and quality parity with closed-source implementations yet, but we're actively working on improving it. The image quality doesn't depend only on tuning, but also on the implementation of the ISP control algorithms. This is also an area that we are actively working on.<br> </div> Tue, 08 Oct 2024 18:35:50 +0000 What is a camera? https://lwn.net/Articles/993330/ https://lwn.net/Articles/993330/ laurent.pinchart <div class="FormattedComment"> <span class="QuotedText">&gt; Do you have any good recommendations—is it libcamera that one would want to use? (I'd rather not go to the level of FFmpeg or Gstreamer if I can avoid it; FFmpeg is a great codec library but pretty weak on anything involving live, and Gstreamer is just a world of pain)</span><br> <p> It depends on your use cases. For desktop applications, the future is Pipewire, which itself will interface with libcamera (or, for the time being, directly with V4L2 for USB webcams). For more specific applications, especially in embedded and IoT use cases, I recommend GStreamer with the libcamerasrc element. On Android one would of course use the libcamera Android adaptation layer that implements the camera HAL3 API. Direct usage of the libcamera API isn't something I expect to see very commonly for general-purpose applications.<br> </div> Tue, 08 Oct 2024 18:34:09 +0000
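[Editorial illustration, not part of any comment: a minimal sketch of the GStreamer-with-libcamerasrc approach recommended in the comment above, using the GStreamer C API. The requested caps, the autovideosink and the absence of error recovery are assumptions made for brevity; a real application would negotiate formats to match its needs.]
<pre>
/* Capture from a libcamera-managed camera via the libcamerasrc element. */
#include <gst/gst.h>

int main(int argc, char **argv)
{
	gst_init(&argc, &argv);

	GError *err = NULL;
	GstElement *pipeline = gst_parse_launch(
		"libcamerasrc ! video/x-raw,width=1280,height=720 ! "
		"videoconvert ! autovideosink", &err);
	if (!pipeline) {
		g_printerr("Failed to build pipeline: %s\n", err->message);
		return 1;
	}

	gst_element_set_state(pipeline, GST_STATE_PLAYING);

	/* Run until an error or end-of-stream message is posted on the bus. */
	GstBus *bus = gst_element_get_bus(pipeline);
	GstMessage *msg = gst_bus_timed_pop_filtered(bus, GST_CLOCK_TIME_NONE,
			GST_MESSAGE_ERROR | GST_MESSAGE_EOS);
	if (msg)
		gst_message_unref(msg);
	gst_object_unref(bus);

	gst_element_set_state(pipeline, GST_STATE_NULL);
	gst_object_unref(pipeline);
	return 0;
}
</pre>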
The complexity has merely moved https://lwn.net/Articles/993311/ https://lwn.net/Articles/993311/ laurent.pinchart <div class="FormattedComment"> <span class="QuotedText">&gt; Not entirely true, as ISP-based processing allows more complex processing pipelines with things like face recognition, more complex algorithms and extra steps to enhance the image quality.</span><br> <p> atnot's point is that ISPs were there already, just hidden by the webcam firmware, and that's largely true. We have USB webcams today that handle face detection internally and enhance the image quality in lots of ways (we also have lots of cheap webcams with horrible image quality of course).<br> <p> <span class="QuotedText">&gt; For (3), we may need to add something to pass calibration data.</span><br> <p> It's unfortunately more complicated than that (I stopped counting the number of times I've said this). "Calibration" or "tuning" data is generally not something that is passed directly to drivers, but needs to be processed by userspace based on dynamically changing conditions. For instance, the lens shading compensation data, when expressed as a table, needs to be resampled based on the camera sensor field of view. Lots of tuning data never makes it to the device but only influences userspace algorithms.<br> <p> <span class="QuotedText">&gt; Yet, the final node of the pipeline (the capture device) is the same, and can be completely mapped using the current V4L2 API: a video stream, usually compressed with a codec (mpeg, h-264, ...) with a known video resolution, frame rate and a fourcc identifying the output video format (bayer, yuv, mpeg, h-264, ...).</span><br> <p> There is usually no video encoder in the camera pipeline when using an ISP in the main SoC. Encoding is performed by a separate codec.<br> <p> <span class="QuotedText">&gt; By the way, even cameras that have their own CPUs and use just V4L2 API without the media controller have calibration data. Those are typically used during device initialization as a series of register values inside driver tables. While we want to know what each register contains (so we strongly prefer to have those registers mapped with #define macros), it is not mandatory to have all of them documented.</span><br> <p> Those are usually not calibration or tuning data, as they are not specific to particular camera instances.<br> </div> Tue, 08 Oct 2024 18:31:47 +0000 Tuning for image quality? https://lwn.net/Articles/993320/ https://lwn.net/Articles/993320/ DemiMarie <div class="FormattedComment"> Are there any plans to perform the per-device tuning needed for optimal image quality, or will the image quality always be worse than with the OEM OS?<br> </div> Tue, 08 Oct 2024 16:22:03 +0000 What is a camera? https://lwn.net/Articles/993317/ https://lwn.net/Articles/993317/ Sesse <div class="FormattedComment"> Do you have any good recommendations—is it libcamera that one would want to use? (I'd rather not go to the level of FFmpeg or Gstreamer if I can avoid it; FFmpeg is a great codec library but pretty weak on anything involving live, and Gstreamer is just a world of pain)<br> </div> Tue, 08 Oct 2024 15:49:48 +0000 What is a camera? https://lwn.net/Articles/993309/ https://lwn.net/Articles/993309/ laurent.pinchart <div class="FormattedComment"> <span class="QuotedText">&gt; &gt; All that said. v4l is a beautiful piece of engineering for the "capture part".</span><br> <span class="QuotedText">&gt;</span><br> <span class="QuotedText">&gt; Not sure if I agree;</span><br> <p> I concur. We've made lots of mistakes in V4L2 over the years, and I'm responsible for some of them. That's not specific to V4L2 though; the whole kernel is developed by making mistakes and then trying to fix them. The important part is to constantly improve the APIs and implementation. And make new mistakes along the way to replace the old ones :)<br> <p> This being said, I think that V4L2 is in a much better place than it was 10 years ago for cameras and ISPs.
Don't use the API directly in your applications though. While V4L2 was designed as an application API, the world has moved on and we now need, and increasingly have, userspace frameworks to handle the hardware complexity.<br> </div> Tue, 08 Oct 2024 14:44:06 +0000 IPU4 and IPU6 support? https://lwn.net/Articles/993308/ https://lwn.net/Articles/993308/ laurent.pinchart <div class="FormattedComment"> Those are questions for Intel. My understanding is they have no plan to provide IPU4 support. For the IPU6 PSYS, there's a global consensus that we all want it to be supported upstream, but no agreement yet on how to get there.<br> </div> Tue, 08 Oct 2024 14:38:47 +0000 IPU4 and IPU6 support? https://lwn.net/Articles/993215/ https://lwn.net/Articles/993215/ DemiMarie <div class="FormattedComment"> Will IPU4 and the PSYS part of IPU6 be supported? That's what I'm most interested in today, as it would allow shipping high-quality cameras on Linux laptops.<br> </div> Tue, 08 Oct 2024 03:07:45 +0000 Where to move secret sauce https://lwn.net/Articles/992889/ https://lwn.net/Articles/992889/ laurent.pinchart <div class="FormattedComment"> <span class="QuotedText">&gt;This looks like comparing Apples with Oranges. Conceptually, cameras are more similar to monitors and GPUs to ISPs</span><br> <p> In this context, a "camera" is usually a device made of at least an imaging sensor (the chip with the glass with shiny colours sitting under the lens) and an ISP (a hardware component that performs image processing tasks that would be too expensive for the CPU). When the imaging sensor and the ISP are in different chips, as is the case for the "complex cameras" we're dealing with, there are also different types of physical transmitters and receivers between the two components (MIPI CSI-2 is often involved). Pre- or post-processing steps can also be implemented in DSPs, NPUs, GPUs and/or CPUs as part of the camera pipeline. There needs to be a software component in the system with a global view of the whole pipeline and all the elements it contains, in order to configure them and run real-time control loops.<br> <p> The imaging sensor can contain a firmware, but that's largely out of scope here. It only deals with the internal operation of the sensor (sequencing exposure or read-out of lines for instance), and not with image processing by the ISP. GPUs, DSPs and NPUs, if used in the camera pipeline, can also include firmwares, but that's not relevant either from the points of view of the ISP or the top-level control loops.<br> <p> <span class="QuotedText">&gt; (which do of course run firmware)</span><br> <p> Many ISPs don't. They are often fixed-function pipelines with a large number of parameters, but without any part able to execute an instruction set. Some ISPs are made of lower-level hardware blocks that need to be scheduled at high frequency and with very low latency, or contain a vector processor that executes an ISA. In those cases, the ISP usually contains a small MCU that runs a low-level firmware. When the ISP is designed to be integrated in a large SoC, those firmwares often have a very limited amount of memory and no access to the imaging sensor. For these reasons they are mostly designed to expose the ISP as a fixed-function pipeline to the OS.<br> <p> When the ISP is a standalone chip, sitting between the imaging sensor and the main SoC, the MCU integrated with the ISP is usually a bit more powerful and will run the camera control algorithms, taking full control over the imaging sensor.
The ISP chip then exposes a higher-level interface to the main SoC, similar to what an imaging sensor with an integrated ISP would expose.<br> <p> Other firmwares can also be involved. Large SoCs often include cores meant to run firmwares, and those can usually interact with the entire camera (imaging sensor and ISP). Some vendors implement full camera control in such firmwares, exposing a higher-level interface similar to a webcam. There's a big downside in doing so, as adding support for a different imaging sensor, or even tuning the camera for a different lens, requires modifying that firmware. I believe this is done for instance by the Apple M1, as they have full control of the platform. More directly relevant for Linux, this kind of architecture is also seen in automotive environments where the camera is controlled by a real-time OS, and Linux then accesses some of the camera streams with a higher level of abstraction.<br> </div> Fri, 04 Oct 2024 19:38:59 +0000 The complexity has merely moved https://lwn.net/Articles/992995/ https://lwn.net/Articles/992995/ Wol <div class="FormattedComment"> <span class="QuotedText">&gt; I think it goes both ways. I can't tell exactly when this transition started and what was the trigger, but once processing moved to the main SoC, it opened the door to more possibilities that people may not have thought of otherwise. In turn, that probably justified the transition, accelerating the move.</span><br> <p> This seems to be pretty common across a lot of technology. It starts with general-purpose hardware, moves to special-purpose hardware because it's faster/cheaper, the special-purpose hardware becomes obsolete / ossified, it moves back to general-purpose hardware, rinse and repeat. Just look at routers or modems ...<br> <p> Cheers,<br> Wol<br> </div> Fri, 04 Oct 2024 19:00:13 +0000 What is a camera? https://lwn.net/Articles/992981/ https://lwn.net/Articles/992981/ Sesse <div class="FormattedComment"> <span class="QuotedText">&gt; All that said. v4l is a beautiful piece of engineering for the "capture part".</span><br> <p> Not sure if I agree; as a userspace programmer, I just gave up supporting V4L2 input at some point because the API was so painful and bare-bones. Every single camera under the sun seems to support a slightly different set of formats and settings, and the onus is on you as an application to figure out which ones to support (e.g. you'll frequently need to embed a JPEG decoder!). At some point, I wanted to support _output_ via v4l2loopback, but that means you'll need to go through the format dance again; browsers and other clients will accept only a set of formats and nothing will try to convert for you. Eventually I went to the point of looking at the Chromium source and picking the format it accepted that was the least pain for me to create. :-) Thankfully, I only needed 720p60, so I didn't have to worry about fast DMA between my GPU and V4L2 buffers.<br> </div> Fri, 04 Oct 2024 15:49:30 +0000 Where to move secret sauce https://lwn.net/Articles/992969/ https://lwn.net/Articles/992969/ neggles <div class="FormattedComment"> The command processors running those firmware blobs are actually their own dedicated processors in modern desktop GPUs - NVIDIA's GSP, for example, is a fairly high-clock RISC-V core (as of approximately the Ampere generation).<br> </div> Fri, 04 Oct 2024 15:21:22 +0000 What is a camera?
https://lwn.net/Articles/992888/ https://lwn.net/Articles/992888/ mchehab <div class="FormattedComment"> <span class="QuotedText">&gt; But I am more aligned with the opinion of Dave Airlie that v4l2 is not suitable for ISPs (today). We need a change in the abstraction level and we need more vendors in our community.</span><br> <p> Internally, V4L2 core code is generic and good enough to support ISPs. IMO, what is needed is new IOCTL(s) - maybe at sub-device level - which would avoid the need to send multiple ioctls per frame, with fences and dmabuf support. From the internal code's perspective, just like we currently have videobuf2-v4l2.c and videobuf2-dvb.c as the top layer for per-API buffer handling, we may need a videobuf2-codec.c layer on top of VB2 to handle the needs of ISPs using such new IOCTL(s).<br> </div> Fri, 04 Oct 2024 10:43:38 +0000 The complexity has merely moved https://lwn.net/Articles/992883/ https://lwn.net/Articles/992883/ mchehab <div class="FormattedComment"> <span class="QuotedText">&gt; So it's not that cameras have gotten more complex. They're still doing exactly the same thing. It's just that where that code runs has changed and existing interfaces are not equipped to deal with that.</span><br> <p> Not entirely true, as ISP-based processing allows more complex processing pipelines with things like face recognition, more complex algorithms and extra steps to enhance the image quality.<br> <p> During the libcamera discussions, we referred to the entire set as simply the "V4L2 API", but there are actually three different APIs used to control complex camera hardware: the V4L2 "standard" API, the media controller API and the sub-device API. Currently, on complex cameras:<br> <p> 1. input and capture (output) devices are controlled via the V4L2 API (enabled via config VIDEO_DEV);<br> 2. the pipeline is controlled via the media controller API (enabled via config MEDIA_CONTROLLER);<br> 3. each element of the pipeline is individually controlled via the V4L2 sub-device API (enabled via config VIDEO_V4L2_SUBDEV_API).<br>
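[Editorial illustration, not part of the comment: a minimal sketch of API (2) from the list above, walking the media graph of an assumed /dev/media0 device through the media controller API. Error handling is omitted.]
<pre>
/* Enumerate the entities of a media device graph. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/media.h>

int main(void)
{
	int fd = open("/dev/media0", O_RDONLY);
	if (fd < 0)
		return 1;

	struct media_device_info info;
	memset(&info, 0, sizeof(info));
	ioctl(fd, MEDIA_IOC_DEVICE_INFO, &info);
	printf("media device: %s (driver %s)\n", info.model, info.driver);

	/* Entities are walked by passing the previous id with the
	 * MEDIA_ENT_ID_FLAG_NEXT flag set. */
	struct media_entity_desc entity;
	memset(&entity, 0, sizeof(entity));
	entity.id = MEDIA_ENT_ID_FLAG_NEXT;
	while (ioctl(fd, MEDIA_IOC_ENUM_ENTITIES, &entity) == 0) {
		printf("entity %u: %s (%u pads, %u links)\n",
		       entity.id, entity.name, entity.pads, entity.links);
		entity.id |= MEDIA_ENT_ID_FLAG_NEXT;
	}

	return 0;
}
</pre>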
<p> Following the discussions, it seems that we could benefit from having new ioctl(s) for (1) to reduce the number of ioctl calls for memory-to-memory sub-devices and to simplify ISP processing, perhaps as a part of the sub-device API.<br> <p> For (3), we may need to add something to pass calibration data.<br> <p> Yet, the final node of the pipeline (the capture device) is the same, and can be completely mapped using the current V4L2 API: a video stream, usually compressed with a codec (mpeg, h-264, ...) with a known video resolution, frame rate and a fourcc identifying the output video format (bayer, yuv, mpeg, h-264, ...).<br> <p> Most of the vendor-specific "magic" happens at the intermediate nodes inside the pipeline. Typically, modern cameras produce two different outputs: a video stream and a metadata stream. The metadata is used by vendor-specific 3A algorithms (auto focus, auto exposure and auto white balance), among others. The userspace component (libcamera) needs to use such metadata to produce a set of changes to be applied to the next frames by the ISP. They also use a set of vendor-specific settings that are related to the hardware attached to the ISP, including the camera sensor and lens. Those are calibrated by the hardware vendor.<br> <p> The main focus of the complex camera discussions is around those intermediate nodes.<br> <p> As I said during the libcamera discussions, from my perspective as the Media subsystem maintainer, I don't care how the calibration data was generated. This is something to which IMO we can't contribute much, as it would require a specialized lab to test the ISP+sensor+lens with different light conditions and different environments (indoor, outdoor, different focus settings, etc.). I do care, however, about not allowing the execution of binary blobs sent from userspace in the Kernel.<br> <p> By the way, even cameras that have their own CPUs and use just V4L2 API without the media controller have calibration data. Those are typically used during device initialization as a series of register values inside driver tables. While we want to know what each register contains (so we strongly prefer to have those registers mapped with #define macros), it is not mandatory to have all of them documented.<br> <p> In the past, our efforts were to ensure that the Kernel drivers are fully open sourced. Now that we have libcamera, the requirement is that the driver (userspace+kernel) be open sourced. The Kernel doesn't need to know how configuration data passed from userspace was calculated, provided that such calculation is part of libcamera.<br> </div> Fri, 04 Oct 2024 10:27:54 +0000 Where to move secret sauce https://lwn.net/Articles/992874/ https://lwn.net/Articles/992874/ marcH <div class="FormattedComment"> <span class="QuotedText">&gt; GPUs already have a lot of computational power, so they can spare a bit of it to run the firmware. Cameras do not...</span><br> <p> This looks like comparing Apples with Oranges. Conceptually, cameras are more similar to monitors and GPUs to ISPs (which do of course run firmware)<br> <p> </div> Fri, 04 Oct 2024 06:56:39 +0000 What is a camera? https://lwn.net/Articles/992852/ https://lwn.net/Articles/992852/ ribalda <div class="FormattedComment"> <span class="QuotedText">&gt; As someone not familiar with V4L: What does the system assume a camera is? Is it just a device that periodically sends an opaque buffer, and that can be configured?</span><br> <p> Laurent has made a very good description. A simpler description is that today cameras usually contain two parts:<br> - The online capture part: produces frames periodically and can be configured.<br> - The offline capture part, a.k.a. the ISP: takes N frames and M configuration buffers from memory and produces X frames and Y statistics buffers. The number of buffers and their sizes are hardware specific.<br> <p> <span class="QuotedText">&gt; Are there any weird devices reasonably called cameras that V4L currently can't and foreseeably won't work with?</span><br> <p> In my opinion, for ISPs, the issue with v4l2 is that we do not have a good API to "send N buffers and receive N buffers" efficiently. Instead we have to deal with controls, entities, pads, formats...<br> <p> Some of those abstractions were implemented to standardize the capture process. Other abstractions were implemented to avoid sending random data to the ISP from userspace.<br> <p> We can probably "fix" v4l2, and Laurent has given a good list of things that we are missing. But I am more aligned with the opinion of Dave Airlie that v4l2 is not suitable for ISPs (today). We need a change in the abstraction level and we need more vendors in our community.<br> <p> All that said.
v4l is a beautiful piece of engineering for the "capture part".<br> <p> </div> Thu, 03 Oct 2024 21:35:33 +0000 The complexity has merely moved https://lwn.net/Articles/992848/ https://lwn.net/Articles/992848/ laurent.pinchart <div class="FormattedComment"> <span class="QuotedText">&gt; And (from my perspective) that was all driven by Android, and now Linux is having to catch up so it can run on the same hardware.</span><br> <p> It started before Android as far as I know. Back in the Nokia days, the Frankencamera project (<a href="https://graphics.stanford.edu/papers/fcam/">https://graphics.stanford.edu/papers/fcam/</a>) developed an implementation of computational photography on a Nokia N900 phone (running Linux). A Finnish student at Stanford University participated in the project and later joined Google, where he participated in the design of the HAL3 API based on the FCam design.<br> </div> Thu, 03 Oct 2024 20:11:56 +0000 The big picture isn't that gloomy https://lwn.net/Articles/992841/ https://lwn.net/Articles/992841/ ribalda <div class="FormattedComment"> I also want to thank Laurent for all his work co-organizing this microconference.<br> <p> In recent years, Laurent has done an amazing job with vendors such as Raspberry Pi or the Arm Mali. They are the gold standard for what an open camera stack should look like.<br> <p> But that work is difficult to map onto the hardware that runs *most* of the consumer electronics today. We do not support the cameras in most (maybe all) of the phones, and the only way to use the current Intel hardware is to emulate the ISP in software, with limited capabilities and very poor performance.<br> <p> The vendors that want to collaborate with us say that *for ISPs* they do not need any of the abstractions provided by V4L2 (formats, controls, media controller) and that the current V4L2 openness model is not compatible with their business model. The very same vendors are delivering open graphics stacks... so it is not fair to say that the lack of support is all their fault.<br> <p> There are some positive points from the MC:<br> - It is the first time that 4 vendors attended an open conference.<br> - We are talking about relaxing the openness requirements of v4l2 in favor of our users.<br> - We have started to look into what other subsystems are doing in terms of building an ecosystem.<br> <p> If we manage to include the vendors in our community, support for new ISPs will keep flowing (instead of being heroic achievements), and users will soon enjoy their cameras in their open OSs.<br> </div> Thu, 03 Oct 2024 20:07:48 +0000 The complexity has merely moved https://lwn.net/Articles/992846/ https://lwn.net/Articles/992846/ excors <div class="FormattedComment"> From what I vaguely remember, back when Android only supported Camera HAL1 (which used the old-fashioned webcam model: the application calls setParameters(), startPreview(), takePicture(), and eventually gets the JPEG in a callback and can start the process again), there was already a trend to put the ISP on the main SoC.
It's almost always cheaper to have fewer chips on your PCB, plus you can use the same ISP hardware for both front and rear cameras (just reconfigure the firmware when switching between them), and you can use the same RAM for the ISP and for games (since you're not going to run both at the same time), etc, so there are significant cost and power-efficiency benefits.<br> <p> The early SoC ISPs were quite slow and/or bad so they'd more likely be used for the front camera, while the higher-quality rear camera might have a discrete ISP, but eventually the SoCs got good enough to replace discrete ISPs on at least the lower-end phones. (And when they did have a discrete ISP it would probably be independent of the camera sensor, because the phone vendor wants the flexibility to pick the best sensor and the best ISP for their requirements, not have them tied together into a single module by the camera vendor.)<br> <p> Meanwhile phone vendors added proprietary extensions to HAL1 to support more sophisticated camera features (burst shot, zero shutter lag, HDR, etc), because cameras were a big selling point and a great way to differentiate from competitors. They could implement that in their app/HAL/drivers/firmware however they wanted (often quite hackily), whether it was an integrated or discrete ISP.<br> <p> Then Google developed Camera HAL2/HAL3 (based around a pipeline of synchronised frame requests/results) to support those features properly, putting much more control on the application side and standardising a lower-level interface to the hardware. I'm guessing that was harder to implement on some discrete ISPs that weren't flexible enough, whereas integrated ones typically depended more heavily on the CPU so they were already more flexible. It also reduced the demand for fancy new features on ISP hardware since the application could now take responsibility for them (using CPU/GPU/NPU to manipulate the images efficiently enough, and using new algorithms without the many-year lag it takes to implement a new feature in dedicated hardware).<br> <p> It was a slow transition though - Android still supported HAL1 in new devices for many years after that, because phone vendors kept using cameras that wouldn't or couldn't support HAL3.<br> <p> So, I don't think there's a straightforward causality or a linear progression; it's a few different things happening in parallel, spread out inconsistently across more than a decade, applying pressure towards the current architecture. And (from my perspective) that was all driven by Android, and now Linux is having to catch up so it can run on the same hardware.<br> </div> Thu, 03 Oct 2024 19:55:27 +0000 What is a camera? https://lwn.net/Articles/992837/ https://lwn.net/Articles/992837/ laurent.pinchart <div class="FormattedComment"> <span class="QuotedText">&gt; As someone not familiar with V4L: What does the system assume a camera is? Is it just a device that periodically sends an opaque buffer, and that can be configured?</span><br> <p> V4L2 has two levels of APIs for cameras. The "traditional" (I sometimes call it "legacy", due to my bias towards embedded systems and direct control of ISPs) API is high-level, and maps to what you can expect today from a USB webcam. The camera is a device that periodically produces images stored in buffers, in a variety of formats. It also exposes controls to configure contrast, brightness, focus, zoom, or more device-specific parameters. V4L2 defines many pixel formats, some of them used for depth maps or temperature maps.<br>
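[Editorial illustration, not part of the comment: roughly what that traditional, single-video-node flow looks like from an application. The device path, resolution and YUYV format are assumptions, and error handling is omitted.]
<pre>
/* Minimal V4L2 capture loop using the "traditional" high-level API. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/videodev2.h>

int main(void)
{
	int fd = open("/dev/video0", O_RDWR);

	/* Negotiate a capture format. */
	struct v4l2_format fmt = { .type = V4L2_BUF_TYPE_VIDEO_CAPTURE };
	fmt.fmt.pix.width = 1280;
	fmt.fmt.pix.height = 720;
	fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
	ioctl(fd, VIDIOC_S_FMT, &fmt);

	/* A high-level control, as exposed by this API. */
	struct v4l2_control ctrl = { .id = V4L2_CID_BRIGHTNESS, .value = 128 };
	ioctl(fd, VIDIOC_S_CTRL, &ctrl);

	/* Request buffers, map and queue them, then start streaming. */
	struct v4l2_requestbuffers req = {
		.count = 4,
		.type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
		.memory = V4L2_MEMORY_MMAP,
	};
	ioctl(fd, VIDIOC_REQBUFS, &req);

	for (unsigned int i = 0; i < req.count; i++) {
		struct v4l2_buffer buf = {
			.type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
			.memory = V4L2_MEMORY_MMAP,
			.index = i,
		};
		ioctl(fd, VIDIOC_QUERYBUF, &buf);
		mmap(NULL, buf.length, PROT_READ | PROT_WRITE, MAP_SHARED,
		     fd, buf.m.offset);
		ioctl(fd, VIDIOC_QBUF, &buf);
	}

	enum v4l2_buf_type type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	ioctl(fd, VIDIOC_STREAMON, &type);

	/* The device periodically fills buffers; dequeue, use, requeue. */
	for (;;) {
		struct v4l2_buffer buf = {
			.type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
			.memory = V4L2_MEMORY_MMAP,
		};
		ioctl(fd, VIDIOC_DQBUF, &buf); /* blocks until a frame is ready */
		/* ... process the mapped buffer for buf.index here ... */
		ioctl(fd, VIDIOC_QBUF, &buf);
	}
}
</pre>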
<p> The lower-level API exposes all building blocks of the camera processing pipeline (camera sensor, MIPI CSI-2 receiver, ISP, ...) and lets userspace control all of them individually. The kernel drivers are mostly responsible for ensuring nothing blows up, by verifying for instance that buffers are big enough to store images in the configured capture format. The controls are also lower-level, and more numerous. ISPs typically have from tens of kBs to MBs of parameters (to be precise, a large part of that parameter space is made of 1D and 2D tables, so we're not dealing with millions of individually computed independent values). They also produce, in addition to images, statistics (histograms for instance).<br> <p> In this lower-level model, userspace is responsible for computing, in real time, sensor and ISP parameters for the next frame using statistics from the previous frame. This control loop, usually referred to as the ISP algorithms control loop, is totally out of scope of the kernel drivers. It is the component that I consider legitimate for vendors not to disclose. In libcamera we implement an open-source ISP algorithm control loop module (named Image Processing Algorithms module, or IPA) for each supported platform, and also allow vendors to ship their closed-source IPA modules. This is similar to Mesa shipping an open-source 3D stack, without preventing vendors from shipping their competing closed-source stack.<br> <p> <span class="QuotedText">&gt; Are there any weird devices reasonably called cameras that V4L currently can't and foreseeably won't work with?</span><br> <p> Yes and no. In my opinion, there are no devices that can reasonably be called a camera that could easily be supported in DRM and couldn't be supported by V4L2 with a reasonable development effort. There are however cameras that V4L2 can't support and would have a hard time managing. I'm thinking about very high frame rate cameras for instance (10kfps and higher), or cameras that require CPU action for every line of the image. This isn't so much a V4L2 limitation as a Linux limitation; handling an interrupt every 10µs with real-time performance requires a real-time operating system.<br> <p> V4L2 has known limitations today. It lacks multi-context support (the ability for multiple independent clients to time-multiplex a memory-to-memory ISP), an atomic API (the ability to queue processing for a frame with a single ioctl call), fences, and sub-frame latency. Some of these issues are being addressed (there's a patch series for multi-context support), will be addressed (the building blocks for the atomic API have slowly been upstreamed over the past few years) or could be addressed (patches have been posted for fences, but didn't get merged due to a lack of real-life performance gain; sub-frame latency currently needs a champion to build a use case and make a proposal).<br> </div> Thu, 03 Oct 2024 19:28:55 +0000 Where to move secret sauce https://lwn.net/Articles/992838/ https://lwn.net/Articles/992838/ Cyberax <div class="FormattedComment"> GPUs already have a lot of computational power, so they can spare a bit of it to run the firmware.
Cameras do not, so they want to use the main CPU (with maybe some hardware accelerators) to run the image processing algorithms.<br> </div> Thu, 03 Oct 2024 18:53:48 +0000 The complexity has merely moved https://lwn.net/Articles/992835/ https://lwn.net/Articles/992835/ laurent.pinchart <div class="FormattedComment"> I think it goes both ways. I can't tell exactly when this transition started and what was the trigger, but once processing moved to the main SoC, it opened the door to more possibilities that people may not have thought of otherwise. In turn, that probably justified the transition, accelerating the move.<br> <p> There are financial arguments too: in theory, at least if development costs and the economic impact on users are ignored, this architecture is supposed to be cheaper. A more detailed historical study would be interesting. Of course it won't change the situation we're facing today.<br> </div> Thu, 03 Oct 2024 18:49:29 +0000 Where to move secret sauce https://lwn.net/Articles/992814/ https://lwn.net/Articles/992814/ jengelh <div class="FormattedComment"> <span class="QuotedText">&gt;vendors claim that they cannot be documented. This area, it seems, is a minefield of patents and "special sauce";</span><br> <p> Just move it into firmware. If GPUs can do it, so can those camera vendors.<br> </div> Thu, 03 Oct 2024 17:41:09 +0000 The complexity has merely moved https://lwn.net/Articles/992827/ https://lwn.net/Articles/992827/ intelfx <div class="FormattedComment"> <span class="QuotedText">&gt; due to a combination of processing moving from the camera module to the main SoC, and processing itself getting more complex (regardless of whether or not the latter is a partial consequence of the former).</span><br> <p> Forgive me if I'm wrong, but I always thought it was mostly the other way around, no? I.e., due to R&amp;D advances and market pressure the image processing started to get increasingly more complex (enter computational photography), and at some point during that process it became apparent that it's just all-around better to get rid of the separate processing elements in the camera unit and push things into the SoC instead.<br> <p> Thus, a few years later, this trend is finally trickling down to general-purpose computers and thus Linux proper.<br> <p> Or am I misunderstanding how things happened?<br> </div> Thu, 03 Oct 2024 17:37:30 +0000 What is a camera? https://lwn.net/Articles/992828/ https://lwn.net/Articles/992828/ SLi <div class="FormattedComment"> As someone not familiar with V4L: What does the system assume a camera is? Is it just a device that periodically sends an opaque buffer, and that can be configured?<br> <p> I'm trying to think of weirder cameras that might require different approaches, but fundamentally I guess everything even remotely camera-like (depth cameras, IR temperature cameras, ...)
can be represented as a buffer of WxHxC pixels where W and H tend to be large and C much smaller, as long as you don't assume too much about C.<br> <p> Some of this maybe gets blurred a bit by things like light field cameras, but in the end you still have a fairly normal camera sensor, only what it gets to measure is more of a Fourier-transformed scene.<br> <p> Are there any weird devices reasonably called cameras that V4L currently can't and foreseeably won't work with?<br> </div> Thu, 03 Oct 2024 17:32:10 +0000 Khronos Kamaros camera API https://lwn.net/Articles/992824/ https://lwn.net/Articles/992824/ laurent.pinchart <div class="FormattedComment"> Kamaros is an ongoing effort of the Khronos group to standardize a cross-platform API for cameras. It is currently foreseen to expose a level of abstraction similar to the libcamera API, or to the Android Camera HAL3 API. If and when a first version of Kamaros is published, I believe we will implement it as a layer on top of libcamera, the same way that libcamera has an Android Camera HAL3 adaptation layer. Drawing a parallel to the graphics world, libcamera with a Kamaros adaptation layer would be equivalent to Mesa implementing the Vulkan API. Kamaros will be implemented once in the adaptation layer, and work with all platforms supported by libcamera.<br> <p> All of this is of course more in the realm of speculation than commitment, as there is no public Kamaros API yet.<br> </div> Thu, 03 Oct 2024 17:01:33 +0000 Khronos Kamaros camera API https://lwn.net/Articles/992822/ https://lwn.net/Articles/992822/ joib <div class="FormattedComment"> What about the Kamaros work by Khronos? How does that fit into the FOSS camera stack picture? (I know very little about camera programming, so feel free to ELI5) <br> </div> Thu, 03 Oct 2024 16:54:30 +0000 The complexity has merely moved https://lwn.net/Articles/992819/ https://lwn.net/Articles/992819/ laurent.pinchart <div class="FormattedComment"> <span class="QuotedText">&gt; So it's not that cameras have gotten more complex. They're still doing exactly the same thing. It's just that where that code runs has changed and existing interfaces are not equipped to deal with that.</span><br> <p> I think it's a bit more complicated than this. With the processing moving to the main SoC, it became possible for the main OS to have more control over that processing. I think this has led to more complex processing being implemented that would have been more difficult (or more costly) to do if everything had remained on the camera module side. Advanced HDR processing is one such feature for instance, and NPU-assisted algorithms such as automatic white balance are another example. Some of this would probably still have happened without the processing shifting to the main SoC, but probably at a different pace and possibly in a different direction.<br> <p> In any case, the situation we're facing is that cameras now look much more complex from a Linux point of view, due to a combination of processing moving from the camera module to the main SoC, and processing itself getting more complex (regardless of whether or not the latter is a partial consequence of the former).<br> </div> Thu, 03 Oct 2024 16:34:20 +0000 The complexity has merely moved https://lwn.net/Articles/992816/ https://lwn.net/Articles/992816/ laurent.pinchart <div class="FormattedComment"> That's a good summary, thank you.
It became more complex only from a Linux kernel and userspace point of view.<br> <p> V4L2 was originally modelled as a high-level API meant to be used directly by applications, with an abstraction level designed for TV capture cards and webcams, where all the processing was handled internally by hardware and firmware. It has evolved with the introduction of the Media Controller and V4L2 subdev APIs to support a much lower level of abstraction. These evolutions were upstreamed nearly 15 years ago (time flies) by Nokia. Unfortunately, due to Nokia's demise, the userspace part of the framework never saw the light of day. Fast forward to 2018, when the libcamera project was announced to fill that wide gap and be the "Mesa of cameras". We now have places for all of this complex code to live, but the amount of work to support a new platform is significantly larger than it used to be.<br> </div> Thu, 03 Oct 2024 16:18:24 +0000 The complexity has merely moved https://lwn.net/Articles/992813/ https://lwn.net/Articles/992813/ atnot <div class="FormattedComment"> A bit of context, as the framing of cameras becoming more "complex" is kind of misleading I think.<br> <p> What has happened instead is that while in the past, the camera module contained a CPU and ISP with firmware that did all the processing, with smartphones this has increasingly been integrated into the main SoC instead to save cost, save power, and increase flexibility. So instead of the camera module sending a finished image over a (relatively) low-speed bus, image sensors are now directly attached to the CPU and deliver raw sensor data to an on-board ISP. This means that what used to be the camera firmware needs to be moved to the main CPU too. This is not a problem under Android since they can just ship that "firmware" as a blob on the vendor partition and use standard system APIs to get a completed image. But for v4l2, which is fundamentally built around directly passing images through from the device to applications, that's a problem.<br> <p> So it's not that cameras have gotten more complex. They're still doing exactly the same thing. It's just that where that code runs has changed and existing interfaces are not equipped to deal with that.<br> </div> Thu, 03 Oct 2024 16:04:16 +0000 The big picture isn't that gloomy https://lwn.net/Articles/992809/ https://lwn.net/Articles/992809/ laurent.pinchart <div class="FormattedComment"> <span class="QuotedText">&gt; Support for complex camera devices in Linux seems likely to be messy and proprietary for some time but, with luck, it will slowly improve.</span><br> <p> I'd like to bring a bit of a more positive spin to this conclusion. The Linux kernel and libcamera already have fully open-source support for multiple ISPs, most notably the Raspberry Pi 4 ISP, the VSI ISP8000 (a.k.a. rkisp1 for historical reasons) found in SoCs from Rockchip, NXP and other vendors, and the Intel IPU3 ISP (found in Skylake and Kaby Lake SoCs). Support for the Raspberry Pi 5 ISP and the Arm Mali C55 ISP is developed in the open and close to getting merged upstream in both the kernel and libcamera. The list is constantly growing.<br> <p> This being said, it's not all rainbows and unicorns either; support for some important platforms is missing today.<br> </div> Thu, 03 Oct 2024 13:56:40 +0000 Clarification on the usage of DRM vs.
V4L2 https://lwn.net/Articles/992805/ https://lwn.net/Articles/992805/ laurent.pinchart <div class="FormattedComment"> Thank you Jon for summarizing this long and passionate discussion, it wasn't an easy job. I would like to also thank Ricardo for co-organizing this micro-conference and handling the larger part of the logistics.<br> <p> <span class="QuotedText">&gt; Libcamera developer Laurent Pinchart said that the current model for ISPs does not involve a ring buffer; instead, user space submits a lot of configuration data for each frame to be processed. Both seemed to think that the DRM approach might not work for this kind of device.</span><br> <p> Small clarification here: my opinion isn't that the DRM approach couldn't work for these devices (memory-to-memory ISPs), but that it doesn't bring much technical advantage compared to what can already be done with V4L2, or to what will be possible with V4L2 once current work in progress gets merged upstream.<br> <p> This comment is only intended as a clarification of what I expressed (or tried to express) during the micro-conference, not as an attempt to debate this opinion on LWN.net. There are some technical limitations of V4L2 in its current form that we all agreed exist, but nobody has provided quantitative data to prove that (and how much) they affect the use cases we discussed.<br> </div> Thu, 03 Oct 2024 13:43:13 +0000
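[Editorial footnote to the clarification above: a rough sketch of the per-frame submission pattern being discussed for memory-to-memory ISPs, assuming a hypothetical driver that exposes separate parameter (META_OUTPUT), statistics (META_CAPTURE) and processed-image (VIDEO_CAPTURE) video nodes. The raw-input side, buffer setup and error handling are elided, and compute_params()/process_image() are placeholders for the userspace algorithm loop and the consumer; the point is only to show why several ioctl calls per processed frame are needed today, which is what an atomic submission API would collapse into a single call.]
<pre>
/* One iteration of the per-frame loop for a memory-to-memory ISP
 * exposing three video nodes (parameters, statistics, image). */
#include <sys/ioctl.h>
#include <linux/videodev2.h>

void compute_params(const void *stats, void *params);   /* hypothetical */
void process_image(const struct v4l2_buffer *buf);      /* hypothetical */

void run_frame(int params_fd, int stats_fd, int image_fd,
	       void *params_mem, const void *stats_mem)
{
	struct v4l2_buffer params = { .type = V4L2_BUF_TYPE_META_OUTPUT,
				      .memory = V4L2_MEMORY_MMAP, .index = 0 };
	struct v4l2_buffer stats = { .type = V4L2_BUF_TYPE_META_CAPTURE,
				     .memory = V4L2_MEMORY_MMAP, .index = 0 };
	struct v4l2_buffer image = { .type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
				     .memory = V4L2_MEMORY_MMAP, .index = 0 };

	/* Queue this frame's ISP parameters along with the capture buffers:
	 * one ioctl per node, for every frame. */
	ioctl(params_fd, VIDIOC_QBUF, &params);
	ioctl(stats_fd, VIDIOC_QBUF, &stats);
	ioctl(image_fd, VIDIOC_QBUF, &image);

	/* Wait for the processed image and the statistics it produced. */
	ioctl(image_fd, VIDIOC_DQBUF, &image);
	ioctl(stats_fd, VIDIOC_DQBUF, &stats);
	ioctl(params_fd, VIDIOC_DQBUF, &params);

	process_image(&image);

	/* The algorithm loop turns the statistics into parameters for an
	 * upcoming frame, closing the control loop entirely in userspace. */
	compute_params(stats_mem, params_mem);
}
</pre>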