
Access to complex video devices with libcamera

By Jonathan Corbet
July 25, 2019

OSS Japan
Laurent Pinchart began his Open Source Summit Japan 2019 talk with a statement that, once upon a time, camera devices were simple pipelines that produced a sequence of video frames. Applications could control cameras using the Video4Linux (V4L) API by way of a single device node; there were "lots of knobs", but the overall task was straightforward. That situation has changed over the years, and application developers need more help; that is where the libcamera project comes in.

In truth, if your editor may interject a brief comment, even the basic V4L API is not entirely straightforward for the uninitiated. There is a negotiation process that must happen so that the application can determine whether a given camera can deliver the sort of data stream that is needed. The number of parameters to tweak is large. It was not uncommon to find applications that worked with some camera devices, but not others. V4L makes many things possible, but even the simplest tasks may not be easy.

libcamera

Now (returning to the talk), consider the situation with contemporary hardware, which is much more complex. A typical camera device has a collection of processing blocks (handling tasks like image scaling, color correction, color-space conversion, autofocus, etc.) that can be interconnected in a variety of ways. The media controller subsystem was introduced to expose this complexity to user space, but it only helps so much. A special-purpose application that, for example, understands a single video device found on a specific handset can be managed, but writing an application that can handle a wide variety of video hardware is challenging at best.

Developers at Nokia had, some years ago, envisioned a plugin-based mechanism for camera configuration but, before it could be implemented, Nokia canceled its smartphone project and development lapsed. Ten years later, libcamera was started with the intent of being "the Mesa of the camera stack"; its purpose is to make it easy for applications to interface with camera devices.

What is envisioned is a four-layer stack:

  • libcamera is the lowest-level layer, interfacing directly with the kernel. It is implemented entirely in user space, with no changes to kernel APIs planned.
  • A set of bindings will make libcamera available in a range of different programming languages.
  • The "adaptation layer" provides a set of interfaces to libcamera for existing applications; they will include a V4L compatibility layer, an Android HAL interface, and a GStreamer interface. The intent is to make libcamera suitable for all Linux-based devices.
  • The application layer exists already, in the form of GStreamer, native V4L applications, Android applications, etc. There will also be native libcamera applications in the future.

Application interface

The first thing a libcamera application has to do is to enumerate the available cameras. A "camera" in this context is what users might see as a camera device; much of the underlying complexity (sensor, DMA bridge, processing units, etc.) is hidden within each camera device. These cameras expose a set of capabilities, such as how many concurrent video streams they can support, what types of controls they have, and the resolutions they are capable of. "Profiles" exist as a way of pulling together the capabilities needed for given tasks; there can be profiles for applications like "point-and-shoot camera" or "video conferencing".

The handling of concurrent streams is a key feature of libcamera. For example, a point-and-shoot device might have one mid-resolution stream used to preview a scene on a handset's screen and a full-resolution stream for image capture.

Controls in libcamera can be set on a per-frame basis, hardware permitting. These controls can include exposure time, focus settings, white balance, etc. This is not a useful feature for applications like video conferencing, Pinchart said, but it's important for tasks like face recognition or machine vision, where the application needs to know the parameters associated with each frame.

A native libcamera application will, after enumeration of the available devices, reserve access to the device(s) needed. Access is exclusive in libcamera; if multiplexing is needed, a framework like GStreamer can provide it. There is a configuration stage where the camera creates a template configuration from a set of available roles; the application will then tweak the parameters as needed and validate the result to ensure that the camera can support it. There is, thus, still a negotiation process required, but the creation of an initial configuration should ease that process considerably. This configuration is done for every stream that the application needs.

Once that is done, the application will allocate a set of buffers for incoming video data. A "create request" operation will create a request to capture a single video frame with a given set of parameters and queue it to the camera. Most applications will queue multiple requests to keep the video pipeline flowing; buffers can be turned around and queued with new requests after their data is consumed.

Advanced algorithms

Naturally, there is full support for image-processing algorithms, ranging from automatic exposure, white balance, and focus setting through to advanced noise reduction and more. There is a balancing act required here: these algorithms, it seems, are often provided by the manufacturer of the camera, and many of them are proprietary software. Libcamera will support them as separate, loadable modules; Pinchart said that the project wants all of this code to be open source, but that is not the case now, and the first priority is to make it all work in a safe and reliable manner.

One important design decision here is that image-processing modules do not talk directly to the hardware; they go through the standard interfaces like everything else. A typical module will get statistics (or image data) from the hardware, compute the optimal image parameters, then use libcamera interfaces to configure the device accordingly. "There will be no secret ioctl() calls", he said. Modules will also be sandboxed to limit the damage they can do to the rest of the system.

The low-level camera device abstraction is designed with as much device-independent code as possible. There are a lot of independent low-level camera implementations now; for example, the Android and ChromeOS teams do not talk to each other and have each created their own implementations, he said. The intent is to make it easy for vendors to add support for their devices directly to libcamera so that everybody can work from the same implementation.

The libcamera project is in a relatively early stage of development; no actual releases have been made yet. It worked well enough for Pinchart to do a quick video-conferencing demonstration with an obliging developer in Europe who stayed awake until the time came; the image quality showed that work remains to be done, but the basic pipeline works. If libcamera continues to progress and meets its goals, it seems likely to show up on systems in the not-too-distant future.

[Your editor thanks the Linux Foundation for supporting his travel to the event.]

Index entries for this article
Conference: Open Source Summit Japan/2019



Access to complex video devices with libcamera

Posted Jul 25, 2019 22:35 UTC (Thu) by andy_shev (subscriber, #75870) [Link] (5 responses)

While the idea is good, I can't get rid of picturing https://xkcd.com/927/ in my mind.

Access to complex video devices with libcamera

Posted Jul 26, 2019 4:12 UTC (Fri) by creemj (subscriber, #56061) [Link] (4 responses)

While xkcd 927 is often trotted out when someone proposes new interfaces or standards, in this case I am not so sure it is apt. There is a real problem with camera interfaces and the multiple standards: v4l2, libdc1394, gig-e (not natively supported on Linux), camera-link (though this is pretty much obsolete now), USB-3 protocols (often requiring closed proprietary software), and I am sure there are more. If you want to write software that can support all (or even most) cameras, particularly machine vision cameras, then you have to support multiple protocols. So I see this as one space where a library that manages all these protocols and provides one unified interface to access cameras would be a very significant advance.

Access to complex video devices with libcamera

Posted Jul 26, 2019 19:01 UTC (Fri) by guus (subscriber, #41608) [Link] (1 responses)

GigE is definitely natively supported on Linux; there are multiple free (at least as in beer) SDKs that support it. USB-3 Vision is just dc1394 over USB, and is supported by libdc1394. The two most annoying experiences I've had while working with numerous scientific/machine vision cameras are:

1. Cameras and/or framegrabbers requiring firmware to be loaded before they work.
2. Cameras using a proprietary protocol for configuring parameters like exposure time, frame rate and so on.

UVC and DC1394 are very nice because they at least have a standard way of setting parameters that almost all cameras use. Gig-E has the GenICam standard, but it's not supported by all cameras.

For USB webcams, setting parameters is trivial nowadays, but actually getting the stream of images is getting more and more complex. The reason is that high end cameras are getting better resolution and framerates, yet are still bound by USB bandwidth limitations. So while early cameras sent simple YUV or JPEG-compressed images, now we have cameras that can actually send a H.264 stream. Videochat applications might want to get the H.264 stream because they can then send it out over the Internet without having to recompress anything.

Access to complex video devices with libcamera

Posted Jul 31, 2019 14:55 UTC (Wed) by loose11 (guest, #114677) [Link]

>> Gig-E has the GenICam standard, but it's not supported by all cameras.

Some camera manufacturers claim that they implemented the standard. I had an issue with GenICam, where the standard was "implemented" but they made some hidden calls for specific parameters in their library. In the end, they were overriding my parameters.

Access to complex video devices with libcamera

Posted Jul 27, 2019 5:49 UTC (Sat) by flussence (guest, #85566) [Link] (1 responses)

If this could do for video inputs what libinput did for HID devices, that'd be wonderful. I know quite a few people who have no choice but to keep win32 around because it's the only way to get their off-brand (or sometimes brand) HDMI-to-USB3 or whatever device to function.

Access to complex video devices with libcamera

Posted Aug 1, 2019 10:39 UTC (Thu) by swilmet (subscriber, #98424) [Link]

I also think that libinput is quite a good comparison as an analogous project. A good description of why libinput exists:
https://who-t.blogspot.com/2018/07/why-its-not-good-idea-...

In the talk/article:

> libcamera was started with the intent of being "the Mesa of the camera stack"; its purpose is to make it easy for applications to interface with camera devices.

I would be interested to know more about this comparison to Mesa, the libcamera website doesn't mention Mesa. What do libcamera and Mesa have in common?

Access to complex video devices with libcamera

Posted Jul 29, 2019 12:55 UTC (Mon) by Kamiccolo (subscriber, #95159) [Link] (1 responses)

After following libcamera for a while with quite an interest, just saw the following line on the documentation (which was quite a downer):
> Other types of camera, including analog cameras, depth cameras, thermal cameras, external digital picture or movie cameras, are out of scope for this project.

Access to complex video devices with libcamera

Posted Aug 1, 2019 10:41 UTC (Thu) by swilmet (subscriber, #98424) [Link]

Are those cameras used through the same Linux kernel APIs/subsystems?

Access to complex video devices with libcamera

Posted Aug 1, 2019 15:48 UTC (Thu) by lamby (subscriber, #42621) [Link]

As someone who regularly passes "--device=$(find /dev/video* -print -quit)" to fswebcam… thanks. :)


Copyright © 2019, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds