File Sealing & memfd_create()

From:  David Herrmann
Subject:  [PATCH v3 0/7] File Sealing & memfd_create()
Date:  Fri, 13 Jun 2014 12:36:52 +0200
Cc:  Michael Kerrisk, Ryan Lortie, Linus Torvalds, Andrew Morton, Greg Kroah-Hartman, Lennart Poettering, Daniel Mack, Kay Sievers, Hugh Dickins, Tony Battersby, Andy Lutomirski, David Herrmann


This is v3 of the File-Sealing and memfd_create() patches. You can find v1 with
a longer introduction at gmane:
An LWN article about memfd+sealing is available, too:
v2 with some more discussions can be found here:

This series introduces two new APIs:
  memfd_create(): Think of this syscall as malloc(), except that it returns a
                  file descriptor instead of a pointer. That file descriptor is
                  backed by anonymous memory and can be memory-mapped for access.
  sealing: The sealing API can be used to prevent a specific set of operations
           on a file descriptor. You 'seal' the file and thereby guarantee
           that it cannot be modified in those specific ways.
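A minimal sketch of how the two APIs fit together from user space. It assumes the interface as it was eventually merged (Linux 3.17) and a C library that declares it; with glibc that means 2.27 or later and _GNU_SOURCE. The helper name make_sealed_buffer() is hypothetical. Because memfd_create() has no size argument in v3, the buffer is sized with ftruncate().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes into a new memfd and seal it, so
 * neither this process nor anyone it passes the fd to can change the data.
 * Returns the sealed fd, or -1 with errno set. */
static int make_sealed_buffer(const void *data, size_t len)
{
    assert(data != NULL || len == 0);

    /* Like malloc(), but the result is an fd backed by anonymous memory.
     * Without MFD_ALLOW_SEALING the fd would start with F_SEAL_SEAL set. */
    int fd = memfd_create("sealed-buffer", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;

    /* memfd_create() takes no size argument: size the file explicitly. */
    if (ftruncate(fd, (off_t)len) < 0)
        goto fail;

    if (len > 0) {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            goto fail;
        memcpy(p, data, len);
        /* Drop the writable shared mapping first; while it exists,
         * adding F_SEAL_WRITE fails with EBUSY. */
        munmap(p, len);
    }

    /* Forbid shrinking, growing and writing, then lock the seal set.
     * Seals can only be added, never removed. */
    if (fcntl(fd, F_ADD_SEALS,
              F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL) < 0)
        goto fail;
    return fd;

fail: {
        int saved = errno;   /* close() must not clobber the real error */
        close(fd);
        errno = saved;
    }
    return -1;
}
```

A receiver of such an fd can check with fcntl(fd, F_GET_SEALS) that the seals it relies on are set before trusting the contents, instead of copying them first.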

A short high-level introduction is also available here:

Changed in v3:
 - fcntl() now returns EINVAL if the FD does not support sealing. We used to
   return EBADF like pipe_fcntl() does, but that is really weird and I'd rather
   not repeat it.
 - seals are now saved as "unsigned int" instead of "u32".
 - i_mmap_writable is now an atomic_t, so we can deny new writable mappings
   just like i_writecount denies write access.
 - SHMEM_ALLOW_SEALING is dropped. We initialize all objects with F_SEAL_SEAL
   and only unset it for memfds that should support sealing.
 - memfd_create() no longer has a size argument. It was redundant; use
   ftruncate() or fallocate() instead.
 - memfd_create() flags are "unsigned int" now, instead of "u64".
 - NAME_MAX off-by-one fix
 - several cosmetic changes
 - Added AIO/Direct-IO page-pinning protection
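Two of the changelog items, EINVAL for FDs without sealing support and F_SEAL_SEAL as the default for memfds that did not ask for sealing, can be observed with a small probe. This is a sketch assuming a kernel with the merged API and glibc 2.27 or later; fd_supports_sealing() and initial_seals() are hypothetical names, and a pipe serves as an example of an FD that does not implement seals.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: 1 if `fd` implements sealing, 0 if it does not,
 * -1 on any other error. Since v3 the "not supported" case is reported
 * as EINVAL rather than EBADF. */
static int fd_supports_sealing(int fd)
{
    if (fcntl(fd, F_GET_SEALS) >= 0)
        return 1;
    return errno == EINVAL ? 0 : -1;
}

/* Hypothetical helper: create a memfd with `flags` and return the seals
 * it starts out with, or -1 on error. Without MFD_ALLOW_SEALING the
 * result is expected to include F_SEAL_SEAL. */
static int initial_seals(unsigned int flags)
{
    int fd = memfd_create("seal-probe", flags);
    if (fd < 0)
        return -1;
    int seals = fcntl(fd, F_GET_SEALS);
    close(fd);
    return seals;
}
```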

The last point is the most important change in this version: we now bail out if
any page refcount is elevated while setting F_SEAL_WRITE. This prevents
concurrent GUP (get_user_pages()) users from writing to sealed files _after_
they were sealed. There is also a new FUSE-based test case to trigger such
situations.
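Pinned pages are hard to provoke from a plain program (the series adds a FUSE-based test for that), but the related rule enabled by the atomic i_mmap_writable counter, that F_SEAL_WRITE is refused while a writable shared mapping exists, is easy to observe. A sketch, assuming a kernel with the merged API and glibc 2.27 or later; seal_write_with_live_mapping() is a hypothetical name, and EBUSY is the error the merged kernel reports in this case.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: try F_SEAL_WRITE while a writable MAP_SHARED mapping
 * of the memfd exists, then again after unmapping it. Stores the errno of
 * the first attempt (0 if it succeeded) and the return value of the second.
 * Returns 0 if the setup worked, -1 otherwise. */
static int seal_write_with_live_mapping(int *err_while_mapped,
                                        int *ret_after_unmap)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    int fd = memfd_create("mapped", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)page) < 0) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    p[0] = 'x';

    /* The live writable mapping could bypass the seal, so it is refused. */
    *err_while_mapped = fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) < 0 ? errno : 0;

    munmap(p, (size_t)page);
    /* With the mapping gone, the seal can be applied. */
    *ret_after_unmap = fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE);

    close(fd);
    return 0;
}
```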

The last two patches try to improve the handling of pinned pages. I included
both in this series, but only one of them is needed (or we could stack them):
 - 6/7: waits up to 150ms for pages to be unpinned
 - 7/7: isolates pinned pages and replaces them with fresh copies
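The difference between the two strategies can be illustrated on a toy model. This is not the kernel code from 6/7 or 7/7: pages are plain structs with a reference count, the file holds one reference, and anything above that counts as a pin. The wait loop uses a fixed 30 ms step for five scans to stay within the 150ms budget, which is a simplification, and all toy_* names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Toy page: a small buffer plus a reference count. The file (page cache)
 * owns one reference; any extra one stands for a GUP pin, e.g. from AIO. */
struct toy_page {
    char data[8];
    int refcount;
};

static int toy_pinned(const struct toy_page *pg)
{
    return pg->refcount > 1;
}

static int toy_any_pinned(struct toy_page *const *file, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (toy_pinned(file[i]))
            return 1;
    return 0;
}

/* Modelled after 6/7: rescan, sleeping between scans for 150 ms in total,
 * and give up with -EBUSY if pins remain. `tick` (may be NULL) stands in
 * for concurrent I/O making progress between scans. */
static int toy_seal_wait(struct toy_page *const *file, size_t n,
                         void (*tick)(void))
{
    const struct timespec step = { 0, 30L * 1000 * 1000 };  /* 30 ms */
    for (int scan = 0; scan < 5; scan++) {                 /* 5 x 30 ms */
        if (!toy_any_pinned(file, n))
            return 0;
        nanosleep(&step, NULL);
        if (tick)
            tick();
    }
    return toy_any_pinned(file, n) ? -EBUSY : 0;
}

/* Modelled after 7/7: give each pinned slot a fresh private copy. The
 * pinner keeps the old page, so its later writes no longer reach the
 * file. Returns 0, or -ENOMEM. */
static int toy_seal_isolate(struct toy_page **file, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!toy_pinned(file[i]))
            continue;
        struct toy_page *copy = malloc(sizeof *copy);
        if (!copy)
            return -ENOMEM;
        memcpy(copy->data, file[i]->data, sizeof copy->data);
        copy->refcount = 1;     /* only the file references the copy */
        file[i]->refcount--;    /* the file drops its ref on the old page */
        file[i] = copy;
    }
    return 0;
}
```

The model makes the trade-off visible: waiting can still fail with -EBUSY if a pin outlives the budget, whereas isolating always succeeds (memory permitting) but leaves the pinner writing into a page the file no longer uses.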

Hugh, patch 6 is basically your code. If it gets merged, may I add your
Signed-off-by to it?

I hope I didn't miss anything. Further comments welcome!


David Herrmann (7):
  mm: allow drivers to prevent new writable mappings
  shm: add sealing API
  shm: add memfd_create() syscall
  selftests: add memfd_create() + sealing tests
  selftests: add memfd/sealing page-pinning tests
  shm: wait for pins to be released when sealing
  shm: isolate pinned pages when sealing files

 arch/x86/syscalls/syscall_32.tbl               |   1 +
 arch/x86/syscalls/syscall_64.tbl               |   1 +
 fs/fcntl.c                                     |   5 +
 fs/inode.c                                     |   1 +
 include/linux/fs.h                             |  29 +-
 include/linux/shmem_fs.h                       |  17 +
 include/linux/syscalls.h                       |   1 +
 include/uapi/linux/fcntl.h                     |  15 +
 include/uapi/linux/memfd.h                     |   8 +
 kernel/fork.c                                  |   2 +-
 kernel/sys_ni.c                                |   1 +
 mm/mmap.c                                      |  24 +-
 mm/shmem.c                                     | 320 ++++++++-
 mm/swap_state.c                                |   1 +
 tools/testing/selftests/Makefile               |   1 +
 tools/testing/selftests/memfd/.gitignore       |   4 +
 tools/testing/selftests/memfd/Makefile         |  40 ++
 tools/testing/selftests/memfd/fuse_mnt.c       | 110 +++
 tools/testing/selftests/memfd/fuse_test.c      | 311 +++++++++
 tools/testing/selftests/memfd/memfd_test.c     | 913 +++++++++++++++++++++++++
 tools/testing/selftests/memfd/ |  14 +
 21 files changed, 1807 insertions(+), 12 deletions(-)
 create mode 100644 include/uapi/linux/memfd.h
 create mode 100644 tools/testing/selftests/memfd/.gitignore
 create mode 100644 tools/testing/selftests/memfd/Makefile
 create mode 100755 tools/testing/selftests/memfd/fuse_mnt.c
 create mode 100644 tools/testing/selftests/memfd/fuse_test.c
 create mode 100644 tools/testing/selftests/memfd/memfd_test.c
 create mode 100755 tools/testing/selftests/memfd/


