# Notes on DMABUF and video

While I was working with the Linux video subsystem, especially on the Librem 5 even before libobscura, the term "DMABUF" came up a lot. Yes, it's a buffer. No, you can't always use it.

To me, it has always been a powerful but mysteriously limited thing: you must master it if you want fast video, but you're likely to hit a corner case if you're not careful.

And the knowledge is spread all over.

So this is my latest understanding of what it is and how to use it. I intend to update this in follow-up posts, so please correct me if you see something wrong.

## Background

Video devices operate on pictures. Those pictures are large amounts of data, and they need to travel somewhere to be useful: your screen and your eyes, or your hard drive, or the internet. Typically, your screen, though.

Moving around big pictures at 30 frames per second means moving around a lot of data across different components: all the way from the camera device to the GPU and the display. Moving data from one place to another is copying, and every copy along the way takes up some computing resources. So in order to have a smooth experience, unneeded copies should be eliminated.
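To put a rough number on "a lot of data", here is a back-of-the-envelope sketch. The resolution (1920x1080) and the 4 bytes per pixel are illustrative assumptions, not figures from any particular camera, and `copy_bandwidth` is a made-up helper name.

```rust
/// Bytes moved per second by ONE copy of an uncompressed video stream.
/// All parameters are illustrative; real cameras often use more compact
/// pixel formats than 4 bytes per pixel.
fn copy_bandwidth(width: u64, height: u64, bytes_per_pixel: u64, fps: u64) -> u64 {
    let bytes_per_frame = width * height * bytes_per_pixel;
    bytes_per_frame * fps
}

/// Same figure, multiplied by the number of copies along the pipeline.
fn pipeline_bandwidth(per_copy: u64, copies: u64) -> u64 {
    per_copy * copies
}
```

Under those assumptions, `copy_bandwidth(1920, 1080, 4, 30)` comes out at 248,832,000 bytes per second for a single copy, and every extra copy between the camera and the display adds the same amount again.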

## DMABUF

One mechanism to avoid copies is called DMABUF: DMA (direct memory access) buffers. Those buffers can be used directly by devices like the camera controller, or the GPU. In the case of video coming from the camera, the idea is that once the camera data arrives from the hardware processing units into the main memory, the GPU can access it directly (with its DMA engine), without creating another "GPU area" and copying image data there from a "camera area".

Unless it can't. DMABUF buffers are always associated with a device. This allows for the buffer to live in a special-purpose area of memory, like the GPU VRAM, where other devices might not be able to access it.

Location is one reason why DMABUF buffers are not interchangeable. Another is that different devices have different capabilities. If your camera's DMA engine can access only the first 4 GiB of memory, the buffer you allocated for your GPU might be out of reach. Or take the shape of the data: if your camera produces buffers whose rows are a multiple of 4 bytes long, but your GPU can only read rows that are a multiple of 32 bytes long, then the GPU may misinterpret the images.
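Both mismatches can be shown with a few lines of arithmetic. This is only a sketch: the helper names are hypothetical, and the 642-byte row is a number I picked so that the two alignments disagree.

```rust
/// Round `n` up to the next multiple of `align` (`align` must be non-zero).
fn align_up(n: usize, align: usize) -> usize {
    (n + align - 1) / align * align
}

/// Byte offset at which row `row` starts, given the row stride in bytes.
fn row_offset(row: usize, stride: usize) -> usize {
    row * stride
}

/// True if the buffer [addr, addr + len) lies entirely below `limit`,
/// i.e. a DMA engine that can only address `limit` bytes can reach it.
fn reachable(addr: u64, len: u64, limit: u64) -> bool {
    match addr.checked_add(len) {
        Some(end) => end <= limit,
        None => false,
    }
}
```

With 642 bytes of pixel data per row, a device aligning to 4 bytes uses a stride of `align_up(642, 4)`, which is 644, while a device aligning to 32 bytes expects `align_up(642, 32)`, which is 672. The second row then starts at offset 644 for one device and 672 for the other, so the image comes out skewed. Likewise, `reachable(0xFFFF_F000, 0x2000, 1 << 32)` is false: the buffer straddles the 4 GiB boundary.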

## DMABUF life cycle

I'm working on a camera application, so this is the case I focus on: buffers get filled, I read them out later.

At a high level, every buffer needs to be allocated, then fed to the camera capture device, then received once it's filled and read out - by the GPU or otherwise. Then, finally, the buffer is de-allocated.
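The life cycle can be modelled without any kernel involvement. The following is a toy model only: `ToyQueue` and its methods are invented for illustration, the "device" just fills buffers with a frame counter, and none of it is V4L2 API.

```rust
use std::collections::VecDeque;

/// Toy stand-in for a capture device's buffer queue. Not V4L2.
struct ToyQueue {
    buffers: Vec<Vec<u8>>,    // the allocated buffers
    pending: VecDeque<usize>, // indices handed to the "device", oldest first
    frame: u8,                // counter used as fake image data
}

impl ToyQueue {
    /// Step 1: allocate `count` buffers of `size` bytes each.
    fn allocate(count: usize, size: usize) -> Self {
        ToyQueue {
            buffers: vec![vec![0; size]; count],
            pending: VecDeque::new(),
            frame: 0,
        }
    }

    /// Step 2: hand buffer `index` to the device so it can be filled.
    fn queue(&mut self, index: usize) {
        self.pending.push_back(index);
    }

    /// Step 3: take back the oldest queued buffer, now "filled".
    /// Returns None when nothing was queued.
    fn dequeue(&mut self) -> Option<usize> {
        let index = self.pending.pop_front()?;
        self.frame = self.frame.wrapping_add(1);
        let value = self.frame;
        self.buffers[index].fill(value);
        Some(index)
    }

    /// Step 4: read the contents out.
    fn read(&self, index: usize) -> &[u8] {
        &self.buffers[index]
    }
}
// Step 5: dropping the ToyQueue de-allocates the buffers.
```

A capture loop in this model queues every buffer once, then repeatedly dequeues one, reads it, and queues it again so the device never runs out of buffers to fill.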

In V4L2, this is achieved by the following sequence of IOCTLs:

1. `VIDIOC_REQBUFS(N, MMAP)` allocates multiple (N) DMABUF buffers, without giving any access to the user yet
2. `fd = VIDIOC_EXPBUF` returns a handle in the form of a file descriptor (fd)
3. `VIDIOC_REQBUFS(N, DMABUF)`: not sure why this is needed
4. `VIDIOC_QBUF(fd, DMABUF)` passes the buffer to the device for writing
5. `VIDIOC_DQBUF` waits until a previous buffer is ready and returns it
6. `VIDIOC_REQBUFS(0, DMABUF)`: not sure what it does

TODO: which call assigns a size to the buffers? They are magically the right size... and what if the format changes? TODO: Which call releases the memory?

In kernel documentation, there are some mentions of "importing" a buffer into the device. I find this unhelpful for understanding how this works. The "import" operation is not the opposite of the "export" operation, which was very confusing when I was just trying to figure it all out. It just seems to be the act of enqueueing. It would mesh with my mind better if all mentions of "import" were unwrapped and instead the focus was on what the operation actually does.

## Reading the data

The handles to the DMA buffers are file descriptors. To read the data out, mmap them. In Rust, it's as simple as:

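```rust
use std::fs::File;
use std::os::fd::{FromRawFd, IntoRawFd};

// Wrap the raw fd in a File so that memmap2 can map it.
// `fd` is the DMABUF handle obtained earlier.
let outf = unsafe { File::from_raw_fd(fd) };
let outfmap = unsafe { memmap2::Mmap::map(&outf) }?;

println!("  length    : {}", outfmap.len());

// Prevent File from dropping and closing the fd implicitly.
let _ = outf.into_raw_fd();
```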

Or is it?

The buffers may reside in some other part of memory, like maybe on the GPU.

And can you expose a GPU buffer directly to the CPU? Maaaybe – there are unified memory architectures out there, like some systems with integrated graphics. But unless you know exactly what hardware you're running on, you may find out that the buffer cannot be mapped.

Even if it can be mapped, I'm guessing that access might be slow, for example if using a bounce buffer (can anyone fact check me on this?). That kind of defeats the speed point.

But memory placement causes more problems than that. You have to pay attention to caching. See, the buffer receives data by DMA, meaning that the memory is updated behind the back of the CPU. Which is exactly the point: the CPU should not be busy watching the transfer. BUT! If the CPU is caching some data from the buffer, any update will make the cache out of date and the CPU will never know.

So you have to make sure the CPU throws away or updates any old cached memories of the updated area.

There's an IOCTL for that: `DMA_BUF_IOCTL_SYNC`. I won't get into details, unless someone asks me to explain the docs.
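For the curious, here is a minimal sketch of what calling it could look like using only the standard library. The struct layout and flag values mirror `<linux/dma-buf.h>`. The request number is computed with the generic Linux `_IOW` encoding, which is an assumption that holds on x86-64 and ARM but not on every architecture, and the `ioctl` declaration assumes glibc's signature. `dma_buf_sync` and `read_synced` are names I made up.

```rust
use std::io;
use std::os::raw::{c_int, c_ulong};

/// Mirrors `struct dma_buf_sync` from <linux/dma-buf.h>.
#[repr(C)]
struct DmaBufSync {
    flags: u64,
}

// Flag values from <linux/dma-buf.h>.
const DMA_BUF_SYNC_READ: u64 = 1 << 0;
const DMA_BUF_SYNC_START: u64 = 0 << 2;
const DMA_BUF_SYNC_END: u64 = 1 << 2;

// _IOW('b', 0, struct dma_buf_sync): direction "write" in bits 30..31,
// argument size in bits 16..29, type 'b' in bits 8..15, number 0 in bits 0..7.
// Assumption: generic ioctl encoding (x86-64, ARM); some architectures differ.
const DMA_BUF_IOCTL_SYNC: c_ulong = (1 << 30)
    | ((std::mem::size_of::<DmaBufSync>() as c_ulong) << 16)
    | ((b'b' as c_ulong) << 8);

extern "C" {
    // Assumption: glibc's `int ioctl(int fd, unsigned long request, ...)`.
    fn ioctl(fd: c_int, request: c_ulong, ...) -> c_int;
}

/// Issue one DMA_BUF_IOCTL_SYNC call with the given flags.
fn dma_buf_sync(fd: c_int, flags: u64) -> io::Result<()> {
    let arg = DmaBufSync { flags };
    // Safety: `arg` outlives the call and matches the layout the kernel expects.
    let ret = unsafe { ioctl(fd, DMA_BUF_IOCTL_SYNC, &arg as *const DmaBufSync) };
    if ret == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}

/// Bracket a CPU read of the mapped buffer with START and END sync calls.
fn read_synced<T>(fd: c_int, read: impl FnOnce() -> T) -> io::Result<T> {
    dma_buf_sync(fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ)?;
    let out = read();
    dma_buf_sync(fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ)?;
    Ok(out)
}
```

The closure passed to `read_synced` is where you would touch the mapped memory; the START call comes before the CPU reads, and the END call tells the kernel the CPU is done.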

## Complete reading example

For your convenience and my own, I took an existing dma-buf crate and adjusted it to be a little nicer to use in libobscura. Here's how you map a buffer without fretting about safety in dma-boom:

```rust
use dma_boom::DmaBuf;
use dma_boom::test;

// It's up to you to find a working buffer.
let buf: &DmaBuf = test::get_dma_buf();

{
    // Request sync and create an access guard.
    // Multiple read-only accesses can co-exist
    let mmap = buf.memory_map_ro().unwrap();
    // The actual slice
    let data = mmap.as_slice();
    if data.len() >= 4 {
        println!("Data buffer: {:?}...", &data[..4]);
    }
} // `mmap` goes out of scope and unmaps the buffer
```
