
The complexity has merely moved

Posted Oct 4, 2024 10:27 UTC (Fri) by mchehab (subscriber, #41156)
In reply to: The complexity has merely moved by atnot
Parent article: Coping with complex cameras

> So it's not that cameras have gotten more complex. They're still doing exactly the same thing. It's just that where that code runs has changed and existing interfaces are not equipped to deal with that.

Not entirely true: ISP-based processing allows more complex processing pipelines, with things like face recognition, more complex algorithms and extra steps to enhance image quality.

During the libcamera discussions, we referred to the entire set simply as the "V4L2 API", but there are actually three different APIs used to control complex camera hardware: the V4L2 "standard" API, the media controller API and the V4L2 sub-device API. Currently, on complex cameras (a minimal sketch of all three follows the list below):

1. input and capture (output) devices are controlled via the V4L2 API (enabled via config VIDEO_DEV);
2. the pipeline is controlled via the media controller API (enabled via config MEDIA_CONTROLLER);
3. each element of the pipeline is individually controlled via the V4L2 sub-device API (enabled via config VIDEO_V4L2_SUBDEV_API).
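
To make the split concrete, here is a minimal sketch (not taken from any real driver or program) of how the three APIs are reached from userspace. The device node paths are placeholders that vary per platform, and error handling is omitted:

    /* Minimal sketch: one ioctl per API. Device node paths are
     * placeholders; error handling omitted for brevity. */
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>
    #include <linux/media.h>
    #include <linux/v4l2-subdev.h>

    int main(void)
    {
            /* (1) V4L2 "standard" API: query the capture video node. */
            int video_fd = open("/dev/video0", O_RDWR);
            struct v4l2_capability cap = {0};
            ioctl(video_fd, VIDIOC_QUERYCAP, &cap);

            /* (2) Media controller API: inspect the media device that
             * describes the pipeline topology. */
            int media_fd = open("/dev/media0", O_RDWR);
            struct media_device_info info = {0};
            ioctl(media_fd, MEDIA_IOC_DEVICE_INFO, &info);

            /* (3) V4L2 sub-device API: read the active format on one pad
             * of an individual pipeline element (sensor, ISP, ...). */
            int subdev_fd = open("/dev/v4l-subdev0", O_RDWR);
            struct v4l2_subdev_format fmt = {
                    .which = V4L2_SUBDEV_FORMAT_ACTIVE,
                    .pad = 0,
            };
            ioctl(subdev_fd, VIDIOC_SUBDEV_G_FMT, &fmt);

            return 0;
    }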

Following the discussions, it seems that we could benefit from having new ioctl(s) for (1) to reduce the number of ioctl calls for memory-to-memory sub-devices and to simplify ISP processing, perhaps as part of the sub-device API.

For (3), we may need to add something to pass calibration data.

Yet, the final node of the pipeline (the capture device) is the same, and can be completely mapped using the current V4L2 API: a video stream, usually compressed with a codec (mpeg, h-264, ...) with a known video resolution, frame rate and a fourcc identifying the output video format (bayer, yuv, mpeg, h-264, ...).
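
As an illustration of that, the capture node can be configured with the existing V4L2 ioctls alone; the sketch below picks 1920x1080 NV12 at 30 frames per second purely as example values:

    /* Sketch: configure resolution, fourcc and frame rate on the final
     * capture node using only the current V4L2 API. Values are examples. */
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    int configure_capture(int fd)
    {
            struct v4l2_format fmt;
            memset(&fmt, 0, sizeof(fmt));
            fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
            fmt.fmt.pix.width = 1920;
            fmt.fmt.pix.height = 1080;
            fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_NV12;      /* fourcc */
            if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0)
                    return -1;

            struct v4l2_streamparm parm;
            memset(&parm, 0, sizeof(parm));
            parm.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
            parm.parm.capture.timeperframe.numerator = 1;
            parm.parm.capture.timeperframe.denominator = 30;  /* 30 fps */
            return ioctl(fd, VIDIOC_S_PARM, &parm);
    }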

Most of the vendor-specific "magic" happens at the intermediate nodes inside the pipeline. Typically, modern cameras produce two different outputs: a video stream and a metadata stream. The metadata is used by vendor-specific 3A algorithms (auto focus, auto exposure and auto white balance), among others. The userspace component (libcamera) needs to use such metadata to produce a set of changes to be applied to the next frames by the ISP. It also uses a set of vendor-specific settings related to the hardware attached to the ISP, including the camera sensor and lens. Those are calibrated by the hardware vendor.
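
Conceptually, the per-frame loop on the libcamera side looks something like the sketch below. All type and function names in it are invented for illustration; the real interfaces are the ISP-specific statistics and parameters buffers exposed by each driver:

    /* Hypothetical per-frame 3A feedback loop run by userspace.
     * All type and function names are invented for illustration. */

    struct isp_stats {      /* statistics/metadata produced by the ISP */
            int placeholder;
    };

    struct isp_params {     /* parameters applied to an upcoming frame */
            int placeholder;
    };

    /* Hypothetical helpers wrapping the driver's metadata capture queue,
     * the 3A algorithms and the parameters output queue. */
    extern struct isp_stats *dequeue_stats(void);
    extern void run_3a(const struct isp_stats *stats, struct isp_params *params);
    extern void queue_params(const struct isp_params *params);

    void per_frame_loop(void)
    {
            struct isp_params params = { 0 };  /* vendor tuning defaults */

            for (;;) {
                    /* 1. Get the metadata the ISP produced for the last frame. */
                    struct isp_stats *stats = dequeue_stats();

                    /* 2. Run AE/AWB/AF on those statistics, using the
                     *    vendor-calibrated tuning data. */
                    run_3a(stats, &params);

                    /* 3. Queue the resulting parameters so the ISP applies
                     *    them to one of the next frames. */
                    queue_params(&params);
            }
    }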

The main focus of the complex camera discussions is around those intermediate nodes.

As I said during the libcamera discussions, from my perspective as the Media subsystem maintainer, I don't care how the calibration data was generated. This is something that, IMO, we can't contribute much to, as it would require a specialized lab to test the ISP+sensor+lens with different light conditions and different environments (indoor, outdoor, different focus settings, etc.). I do care, however, about not allowing the execution of binary blobs sent from userspace in the Kernel.

By the way, even cameras that have their own CPUs and use just V4L2 API without the media controller have calibration data. Those are typically used during device initialization as a series of register values inside driver tables. While we want to know what each register contains (so we strongly prefer to have those registers mapped with #define macros), it is not mandatory to have all of them documented.
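
A hypothetical example of such a driver table, with the register addresses mapped to #define macros as described above (all register names and values here are invented for illustration):

    /* Hypothetical sensor initialization table: register addresses get
     * #define'd names so it is clear what each write does, even when the
     * value itself comes from vendor calibration. */

    #define SENSOR_REG_MODE_SELECT  0x0100
    #define SENSOR_REG_EXPOSURE_HI  0x0202
    #define SENSOR_REG_EXPOSURE_LO  0x0203
    #define SENSOR_REG_ANALOG_GAIN  0x0204

    struct reg_value {
            unsigned short addr;
            unsigned char val;
    };

    static const struct reg_value sensor_init_table[] = {
            { SENSOR_REG_MODE_SELECT, 0x00 },  /* standby during setup   */
            { SENSOR_REG_ANALOG_GAIN, 0x10 },  /* vendor-calibrated gain */
            { SENSOR_REG_EXPOSURE_HI, 0x03 },
            { SENSOR_REG_EXPOSURE_LO, 0xe8 },
            { SENSOR_REG_MODE_SELECT, 0x01 },  /* start streaming        */
    };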

In the past, our efforts were to ensure that the Kernel drivers are fully open source. Now that we have libcamera, the requirement is that the driver (userspace+kernel) be open source. The Kernel doesn't need to know how the configuration data passed from userspace was calculated, provided that such calculation is part of libcamera.



The complexity has merely moved

Posted Oct 8, 2024 18:31 UTC (Tue) by laurent.pinchart (subscriber, #71290)

> Not entirely true: ISP-based processing allows more complex processing pipelines, with things like face recognition, more complex algorithms and extra steps to enhance image quality.

atnot's point is that ISPs were there already, just hidden by the webcam firmware, and that's largely true. We have USB webcams today that handle face detection internally and enhance the image quality in lots of ways (we also have lots of cheap webcams with horrible image quality, of course).

> For (3), we may need to add something to pass calibration data.

It's unfortunately more complicated than that (I stopped counting the number of times I've said this). "Calibration" or "tuning" data is generally not something that is passed directly to drivers, but needs to be processed by userspace based on dynamically changing conditions. For instance, the lens shading compensation data, when expressed as a table, needs to be resampled based on the camera sensor field of view. Lots of tuning data never makes it to the device but only influences userspace algorithms.
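
Purely as an illustration of that kind of userspace processing, resampling a lens shading gain table (calibrated over the full sensor area) onto the currently selected crop could look roughly like the sketch below; all names and the crop representation are made up, and grids are assumed to be at least 2x2:

    /* Hypothetical sketch: resample a calibrated lens shading gain table
     * onto the active field of view (crop) using bilinear sampling. */

    struct shading_table {
            int cols, rows;      /* grid size                */
            const float *gains;  /* rows * cols gain values  */
    };

    /* Sample the calibrated table at normalized (u, v) in [0, 1],
     * relative to the full sensor area. */
    static float sample(const struct shading_table *t, float u, float v)
    {
            float x = u * (t->cols - 1), y = v * (t->rows - 1);
            int x0 = (int)x, y0 = (int)y;
            int x1 = x0 + 1 < t->cols ? x0 + 1 : x0;
            int y1 = y0 + 1 < t->rows ? y0 + 1 : y0;
            float fx = x - x0, fy = y - y0;

            float top = t->gains[y0 * t->cols + x0] * (1 - fx) +
                        t->gains[y0 * t->cols + x1] * fx;
            float bot = t->gains[y1 * t->cols + x0] * (1 - fx) +
                        t->gains[y1 * t->cols + x1] * fx;
            return top * (1 - fy) + bot * fy;
    }

    /* Fill 'out' (out_cols x out_rows) for a crop given as a normalized
     * [crop_u0, crop_u1] x [crop_v0, crop_v1] region of the full sensor. */
    void resample_shading(const struct shading_table *cal,
                          float crop_u0, float crop_v0,
                          float crop_u1, float crop_v1,
                          float *out, int out_cols, int out_rows)
    {
            for (int r = 0; r < out_rows; r++) {
                    for (int c = 0; c < out_cols; c++) {
                            float u = crop_u0 + (crop_u1 - crop_u0) *
                                      c / (float)(out_cols - 1);
                            float v = crop_v0 + (crop_v1 - crop_v0) *
                                      r / (float)(out_rows - 1);
                            out[r * out_cols + c] = sample(cal, u, v);
                    }
            }
    }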

> Yet, the final node of the pipeline (the capture device) is the same, and can be completely mapped using the current V4L2 API: a video stream, usually compressed with a codec (mpeg, h-264, ...) with a known video resolution, frame rate and a fourcc identifying the output video format (bayer, yuv, mpeg, h-264, ...).

There is usually no video encoder in the camera pipeline when using an ISP in the main SoC. Encoding is performed by a separate codec.

> By the way, even cameras that have their own CPUs and use just V4L2 API without the media controller have calibration data. Those are typically used during device initialization as a series of register values inside driver tables. While we want to know what each register contains (so we strongly prefer to have those registers mapped with #define macros), it is not mandatory to have all of them documented.

Those are usually not calibration or tuning data, as they are not specific to particular camera instances.

