| From: | Bernd Schubert <bschubert-AT-ddn.com> |
| To: | Miklos Szeredi <miklos-AT-szeredi.hu> |
| Subject: | [PATCH RFC v6 00/16] fuse: fuse-over-io-uring |
| Date: | Fri, 22 Nov 2024 00:43:16 +0100 |
| Message-ID: | <20241122-fuse-uring-for-6-10-rfc4-v6-0-28e6cdd0e914@ddn.com> |
| Cc: | Jens Axboe <axboe-AT-kernel.dk>, Pavel Begunkov <asml.silence-AT-gmail.com>, linux-fsdevel-AT-vger.kernel.org, io-uring-AT-vger.kernel.org, Joanne Koong <joannelkoong-AT-gmail.com>, Josef Bacik <josef-AT-toxicpanda.com>, Amir Goldstein <amir73il-AT-gmail.com>, Ming Lei <tom.leiming-AT-gmail.com>, David Wei <dw-AT-davidwei.uk>, bernd-AT-bsbernd.com, Bernd Schubert <bschubert-AT-ddn.com> |
| Archive-link: | Article |
[Still marked as RFC as I need to run tests on this version
and need to review it myself. The design should be complete now.]
This adds support for uring communication between the kernel and
the userspace daemon using the opcode IORING_OP_URING_CMD. The basic
approach was taken from ublk.
The motivation for these patches is to increase fuse performance
by:
- Reducing kernel/userspace context switches
  - Part of that is given by the ring itself: multiple requests
    can be handled on either side of kernel/userspace without
    the need to switch per request
  - Part of that is FUSE_URING_REQ_COMMIT_AND_FETCH, i.e. submitting
    the result of a request and fetching the next fuse request
    in one step, in contrast to legacy read/write to /dev/fuse
- Core and NUMA affinity: one ring per core, which avoids
  cpu core context switches
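
As a rough illustration of the commit-and-fetch point above, the
following toy model counts how many times a daemon has to enter the
kernel to serve a number of requests. It is only a sketch under
simplifying assumptions: the legacy path is modeled as one read() plus
one write() per request, the ring path as one initial fetch followed
by one combined commit-and-fetch submission per request, and batching
of submissions is ignored. The function names are hypothetical and do
not come from the patches.

```c
#include <stddef.h>

/*
 * Toy model, not kernel code: number of kernel entries needed by a
 * daemon to serve nr_requests fuse requests.
 */

/* Legacy /dev/fuse (assumed): one read() to fetch a request and one
 * write() to send the reply, for every request. */
static size_t toy_legacy_kernel_entries(size_t nr_requests)
{
	return 2 * nr_requests;
}

/* Commit-and-fetch (assumed): one initial fetch, then every
 * submission commits the previous result and fetches the next
 * request in the same step. */
static size_t toy_commit_and_fetch_kernel_entries(size_t nr_requests)
{
	if (nr_requests == 0)
		return 0;
	return 1 + nr_requests;
}
```

In this model, 1000 requests cost 2000 kernel entries on the legacy
path and 1001 with commit-and-fetch; real numbers depend on batching
and on how completions are reaped.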
A more detailed motivation description can be found in the
introduction of the previous patch series
https://lore.kernel.org/r/20241016-fuse-uring-for-6-10-rf...
That description also includes benchmark results with RFCv1.
Performance with the current series needs to be tested, but will
be lower, as several optimization patches are missing, like
wake-up on the same core. These optimizations will be submitted
after merging the main changes.
The corresponding libfuse patches are on my uring branch, but need
cleanup for submission; that will be done once the kernel design
no longer changes
https://github.com/bsbernd/libfuse/tree/uring
Testing with that libfuse branch is possible by running something
like:
example/passthrough_hp -o allow_other --debug-fuse --nopassthrough \
--uring --uring-q-depth=128 /scratch/source /scratch/dest
With the --debug-fuse option one should see CQE in the request type,
if requests are received via io-uring:
cqe unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 16, pid: 7060
unique: 4, result=104
Without the --uring option "cqe" is replaced by the default "dev":
dev unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 7117
unique: 4, success, outsize: 120
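
When checking many such debug lines, it may help to test
programmatically whether a request arrived via io-uring. The helper
below is a minimal sketch that parses only the request-line format
shown above with sscanf(); the struct and function names are
hypothetical, and the format is assumed to be exactly as in the two
examples (it may differ in other libfuse versions).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Fields of one request debug line (hypothetical helper type). */
struct toy_dbg_line {
	char source[8];			/* "cqe" or "dev" */
	unsigned long long unique;
	char opcode_name[32];
	unsigned int opcode;
	unsigned long long nodeid;
	unsigned int insize;
	int pid;
};

/* Returns true if the line matches the request format shown above.
 * Reply lines such as "unique: 4, result=104" do not match. */
static bool toy_parse_dbg_line(const char *line, struct toy_dbg_line *out)
{
	int n = sscanf(line,
		       " %7s unique: %llu, opcode: %31s (%u), nodeid: %llu,"
		       " insize: %u, pid: %d",
		       out->source, &out->unique, out->opcode_name,
		       &out->opcode, &out->nodeid, &out->insize, &out->pid);

	return n == 7;
}

/* True if the request was received via io-uring ("cqe" prefix). */
static bool toy_via_io_uring(const struct toy_dbg_line *l)
{
	return strcmp(l->source, "cqe") == 0;
}
```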
Future work
- different payload sizes per ring
- zero copy
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
Changes in v6:
- Update to linux-6.12
- Use 'struct fuse_iqueue_ops' and redirect fiq->ops once
the ring is ready.
- Fix return code from fuse_uring_copy_from_ring on
copy_from_user failure (Dan Carpenter / kernel test robot)
- Avoid list iteration in fuse_uring_cancel (Joanne)
- Simplified struct fuse_ring_req_header
- Adds a new 'struct fuse_ring_ent_in_out'
- Fix assigning ring->queues[qid] in fuse_uring_create_queue,
it was too early, resulting in races
- Add back 'FRRS_INVALID = 0' to ensure ring-ent states always
have a value > 0
- Avoid assigning struct io_uring_cmd *cmd->pdu multiple times,
once on setting up IO_URING_F_CANCEL is sufficient for sending
the request as well.
- Link to v5: https://lore.kernel.org/r/20241107-fuse-uring-for-6-10-rf...
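
To illustrate the 'FRRS_INVALID = 0' item from the v6 list above: when
zero is reserved as the invalid state, a zero-initialized ring entry
can never be mistaken for one in a real state. The sketch below only
shows that pattern. Apart from FRRS_INVALID, all names are
hypothetical and are not taken from the patches.

```c
#include <stdbool.h>
#include <string.h>

/* FRRS_INVALID = 0 is from the changelog; the other states are
 * hypothetical placeholders for illustration. */
enum toy_ring_ent_state {
	FRRS_INVALID = 0,
	TOY_FRRS_AVAILABLE,
	TOY_FRRS_USERSPACE,
};

/* Hypothetical, reduced ring entry. */
struct toy_ring_ent {
	enum toy_ring_ent_state state;
	unsigned long long unique;
};

/* A freshly zeroed entry ends up in FRRS_INVALID. */
static void toy_ring_ent_init(struct toy_ring_ent *ent)
{
	memset(ent, 0, sizeof(*ent));
}

/* Every real state has a value > 0. */
static bool toy_ring_ent_state_valid(const struct toy_ring_ent *ent)
{
	return ent->state > FRRS_INVALID;
}
```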
Changes in v5:
- Main focus in v5 is the separation of headers from payload,
which required introducing 'struct fuse_zero_in'.
- Addressed several teardown issues, which were a regression in v4.
- Fixed "BUG: sleeping function called" due to allocation while
holding a lock reported by David Wei
- Fix function comment reported by kernel test robot
- Fix set but unused variable reported by test robot
- Link to v4: https://lore.kernel.org/r/20241016-fuse-uring-for-6-10-rf...
Changes in v4:
- Removal of ioctls, all configuration is done dynamically
on the arrival of FUSE_URING_REQ_FETCH
- ring entries are not (and cannot be without config ioctls)
allocated as array of the ring/queue - removal of the tag
variable. Finding ring entries on FUSE_URING_REQ_COMMIT_AND_FETCH
is more cumbersome now and needs an almost unused
struct fuse_pqueue per fuse_ring_queue and uses the unique
id of fuse requests.
- No device clones needed to work around hanging mounts
on fuse-server/daemon termination, handled by IO_URING_F_CANCEL
- Removal of sync/async ring entry types
- Addressed some of Joanne's comments, but probably not all
- Only very basic tests run for v3, as more updates should follow quickly.
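
The v4 item above about finding ring entries by the unique id of the
fuse request can be pictured as a small hash-table lookup. The
following is a userspace sketch of that idea only: the bucket count,
the hash function, and all names are assumptions for illustration and
do not reflect the actual struct fuse_pqueue layout.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PQ_HASH_SIZE 8	/* assumed bucket count */

/* Hypothetical entry, keyed by the unique id of its fuse request. */
struct toy_ent {
	uint64_t unique;
	struct toy_ent *next;
};

/* Hypothetical processing queue: one singly linked list per bucket. */
struct toy_pqueue {
	struct toy_ent *buckets[TOY_PQ_HASH_SIZE];
};

static size_t toy_hash(uint64_t unique)
{
	return (size_t)(unique % TOY_PQ_HASH_SIZE);
}

/* Called when a request is handed to userspace. */
static void toy_pq_add(struct toy_pqueue *pq, struct toy_ent *ent)
{
	size_t b = toy_hash(ent->unique);

	ent->next = pq->buckets[b];
	pq->buckets[b] = ent;
}

/* Called on commit: find the entry by unique id and unlink it.
 * Returns NULL if no entry with that id is queued. */
static struct toy_ent *toy_pq_take(struct toy_pqueue *pq, uint64_t unique)
{
	struct toy_ent **pp = &pq->buckets[toy_hash(unique)];

	for (; *pp; pp = &(*pp)->next) {
		if ((*pp)->unique == unique) {
			struct toy_ent *found = *pp;

			*pp = found->next;
			found->next = NULL;
			return found;
		}
	}
	return NULL;
}
```

Compared with indexing an array by a tag, each commit here costs a
hash plus a short list walk, which matches the "more cumbersome"
remark in the changelog.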
Changes in v3:
- Removed the __wake_on_current_cpu optimization (for now,
as that needs to go through another subsystem/tree;
removing it means a significant performance drop)
- Removed MMAP (Miklos)
- Switched to two IOCTLs, instead of one ioctl that had a field
for subcommands (ring and queue config) (Miklos)
- The ring entry state is a single state and not a bitmask anymore
(Josef)
- Addressed several other comments from Josef (I need to go over
the RFCv2 review again, I'm not sure if everything is addressed
already)
- Link to v3: https://lore.kernel.org/r/20240901-b4-fuse-uring-rfcv3-wi...
- Link to v2: https://lore.kernel.org/all/20240529-fuse-uring-for-6-9-r...
- Link to v1: https://lore.kernel.org/r/20240529-fuse-uring-for-6-9-rfc...
---
Bernd Schubert (15):
fuse: rename to fuse_dev_end_requests and make non-static
fuse: Move fuse_get_dev to header file
fuse: Move request bits
fuse: Add fuse-io-uring design documentation
fuse: make args->in_args[0] to be always the header
fuse: {uring} Handle SQEs - register commands
fuse: Make fuse_copy non static
fuse: Add fuse-io-uring handling into fuse_copy
fuse: {uring} Add uring sqe commit and fetch support
fuse: {uring} Handle teardown of ring entries
fuse: {uring} Allow to queue fg requests through io-uring
fuse: {uring} Allow to queue to the ring
fuse: {uring} Handle IO_URING_F_TASK_DEAD
fuse: {io-uring} Prevent mount point hang on fuse-server termination
fuse: enable fuse-over-io-uring
Pavel Begunkov (1):
io_uring/cmd: let cmds to know about dying task
Documentation/filesystems/fuse-io-uring.rst | 101 ++
fs/fuse/Kconfig | 12 +
fs/fuse/Makefile | 1 +
fs/fuse/dax.c | 13 +-
fs/fuse/dev.c | 139 +--
fs/fuse/dev_uring.c | 1339 +++++++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 201 ++++
fs/fuse/dir.c | 41 +-
fs/fuse/fuse_dev_i.h | 69 ++
fs/fuse/fuse_i.h | 21 +
fs/fuse/inode.c | 5 +-
fs/fuse/xattr.c | 9 +-
include/linux/io_uring_types.h | 1 +
include/uapi/linux/fuse.h | 57 ++
io_uring/uring_cmd.c | 6 +-
15 files changed, 1939 insertions(+), 76 deletions(-)
---
base-commit: 3022e9d00ebec31ed435ae0844e3f235dba998a9
change-id: 20241015-fuse-uring-for-6-10-rfc4-61d0fc6851f8
Best regards,
--
Bernd Schubert <bschubert@ddn.com>