| From: |
| Breno Leitao <leitao-AT-debian.org> |
| To: |
| "David S. Miller" <davem-AT-davemloft.net>, Eric Dumazet <edumazet-AT-google.com>, Jakub Kicinski <kuba-AT-kernel.org>, Paolo Abeni <pabeni-AT-redhat.com>, Simon Horman <horms-AT-kernel.org>, Kuniyuki Iwashima <kuniyu-AT-google.com>, Willem de Bruijn <willemb-AT-google.com>, metze-AT-samba.org, axboe-AT-kernel.dk, Stanislav Fomichev <sdf-AT-fomichev.me> |
| Subject: |
| [PATCH net-next v2 0/4] net: move .getsockopt away from __user buffers |
| Date: |
| Wed, 01 Apr 2026 08:44:25 -0700 |
| Message-ID: |
| <20260401-getsockopt-v2-0-611df6771aff@debian.org> |
| Cc: |
| io-uring-AT-vger.kernel.org, bpf-AT-vger.kernel.org, netdev-AT-vger.kernel.org, Linus Torvalds <torvalds-AT-linux-foundation.org>, linux-kernel-AT-vger.kernel.org, kernel-team-AT-meta.com, Breno Leitao <leitao-AT-debian.org> |
Currently, the .getsockopt callback requires __user pointers:
    int (*getsockopt)(struct socket *sock, int level,
                      int optname, char __user *optval, int __user *optlen);
This prevents kernel callers (io_uring, BPF) from using getsockopt on
levels other than SOL_SOCKET, since they pass kernel pointers.
Following Linus' suggestion [0], this series introduces sockopt_t, a
type-safe wrapper around iov_iter, and a getsockopt_iter callback that
works with both user and kernel buffers. AF_PACKET and CAN raw are
converted as initial users, with selftests covering the trickiest
conversion patterns.
[0] https://lore.kernel.org/all/CAHk-=whmzrO-BMU=uSVXbuoLi-3t...
Below are some questions raised during the RFC discussion:
1) Should optlen be an iov_iter as well?
No. optlen can remain a plain kernel int since do_sock_getsockopt_iter() syncs
it back to userspace on both success and failure. The existing callback
patterns all work with this approach:
a) Most callbacks (roughly 2/3) always write back optlen.
b) Some callbacks read optlen but never update it. The original
value is written back unchanged.
c) CAN raw updates optlen even on error (-ERANGE) to report the
required buffer size:
       err = -ERANGE;
       if (put_user(fsize, optlen))
               err = -EFAULT;
No regression, since opt.optlen is always written back to
userspace by the wrapper.
d) Bluetooth uses put_user() with mixed sizes (u32, u16, u8) but
never updates optlen. Same as case (b).
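The write-back rule above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct sockopt_model, model_getsockopt_iter() and model_do_getsockopt() are hypothetical names, not the types or functions added by the series, and plain pointers stand in for userspace memory.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for sockopt_t; only what the example needs. */
struct sockopt_model {
	char *buf;	/* stand-in for the iov_iter-backed optval */
	int optlen;	/* plain kernel int, as described above */
};

/* Models case (c): report the required size even when failing. */
static int model_getsockopt_iter(struct sockopt_model *opt)
{
	static const char payload[] = "12345678";
	int fsize = (int)sizeof(payload) - 1;	/* 8 bytes, no NUL */

	if (opt->optlen < fsize) {
		opt->optlen = fsize;	/* takes the place of put_user() */
		return -ERANGE;
	}
	memcpy(opt->buf, payload, (size_t)fsize);
	opt->optlen = fsize;
	return 0;
}

/* Models the wrapper: optlen is synced back on success and on failure. */
static int model_do_getsockopt(char *user_buf, int *user_optlen)
{
	struct sockopt_model opt;
	int err;

	assert(user_buf && user_optlen);
	opt.buf = user_buf;
	opt.optlen = *user_optlen;

	err = model_getsockopt_iter(&opt);
	*user_optlen = opt.optlen;	/* unconditional write-back */
	return err;
}
```

In this model a caller that passes a 4-byte buffer gets -ERANGE and finds 8 in its length variable. A callback that never touches opt.optlen, as in cases (b) and (d), leaves the caller's value unchanged.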
2) Can callbacks change iov_iter direction mid-flight?
Yes. Some protocols read from and then write back to optval in the same
getsockopt call. For example, PACKET_HDRLEN reads a tpacket version from optval
and writes back the corresponding header size.
The converted callback handles this by temporarily flipping the iter direction,
reverting the position, and writing back:
    case PACKET_HDRLEN:
            // opt->iter.data_source is ITER_SOURCE;
            if (copy_from_iter(&val, len, &opt->iter) != len)
                    return -EFAULT;
            // unroll the bytes
            iov_iter_revert(&opt->iter, len);
            opt->iter.data_source = ITER_DEST;
            // ... update val ...
            if (copy_to_iter(&val, len, &opt->iter) != len)
                    return -EFAULT;
The callback needs to handle two things after reading from the iter:
reset the position with iov_iter_revert(), and flip data_source back
to ITER_DEST before writing.
- ITER_DEST — the iter is a destination (kernel writes to it).
copy_to_iter() works, copy_from_iter() refuses.
- ITER_SOURCE — the iter is a source (kernel reads from it).
copy_from_iter() works, copy_to_iter() refuses.
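The direction rule can be modelled in userspace as follows. This is a rough sketch, not kernel code: struct model_iter and the model_* functions are hypothetical names. A refused copy is modelled as a zero-byte copy, which is how the kernel helpers are understood to behave here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; the kernel's real type is struct iov_iter. */
enum model_dir { MODEL_ITER_DEST = 0, MODEL_ITER_SOURCE = 1 };

struct model_iter {
	unsigned char *base;	/* backing buffer */
	size_t pos;		/* current position */
	size_t len;		/* total buffer length */
	enum model_dir data_source;
};

/* Bytes left between the current position and the end of the buffer. */
static size_t model_room(const struct model_iter *it)
{
	assert(it->pos <= it->len);
	return it->len - it->pos;
}

/* Reads out of the iter; refused (0 bytes) unless it is a source. */
static size_t model_copy_from_iter(void *dst, size_t n, struct model_iter *it)
{
	if (it->data_source != MODEL_ITER_SOURCE)
		return 0;
	if (n > model_room(it))
		n = model_room(it);
	memcpy(dst, it->base + it->pos, n);
	it->pos += n;
	return n;
}

/* Writes into the iter; refused (0 bytes) unless it is a destination. */
static size_t model_copy_to_iter(const void *src, size_t n, struct model_iter *it)
{
	if (it->data_source != MODEL_ITER_DEST)
		return 0;
	if (n > model_room(it))
		n = model_room(it);
	memcpy(it->base + it->pos, src, n);
	it->pos += n;
	return n;
}
```

Because a refused copy returns fewer bytes than requested, a callback that compares the return value against len, as the PACKET_HDRLEN snippet does, turns a forgotten flip into -EFAULT rather than a silent no-op.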
3) When does iov_iter_revert() need to be called?
When a callback needs to read from and then write back to the same
buffer in a single getsockopt call. The iter advances its position on
copy_from_iter(), so you need iov_iter_revert() to reset the position
back to the start before you can copy_to_iter() into the same location.
Without the revert, copy_to_iter() would start at the advanced position,
after the bytes just read: the write would land in the wrong place or
come up short if no room is left.
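The effect of the position can be seen in a small userspace model. This is a sketch only: struct model_iter and the model_* helpers are hypothetical, and direction handling is left out so that only the position is in play.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical position-only model of an iter over one flat buffer. */
struct model_iter {
	unsigned char *base;
	size_t pos;
	size_t len;
};

/* Reads up to n bytes and advances the position. */
static size_t model_copy_from_iter(void *dst, size_t n, struct model_iter *it)
{
	if (n > it->len - it->pos)
		n = it->len - it->pos;
	memcpy(dst, it->base + it->pos, n);
	it->pos += n;
	return n;
}

/* Writes up to n bytes and advances the position. */
static size_t model_copy_to_iter(const void *src, size_t n, struct model_iter *it)
{
	if (n > it->len - it->pos)
		n = it->len - it->pos;
	memcpy(it->base + it->pos, src, n);
	it->pos += n;
	return n;
}

/* Rewinds the position by n bytes. */
static void model_iter_revert(struct model_iter *it, size_t n)
{
	assert(n <= it->pos);
	it->pos -= n;
}

/* Reads an int, doubles it, writes it back; returns bytes written. */
static size_t model_read_modify_write(unsigned char *buf, size_t len,
				      int do_revert)
{
	struct model_iter it = { buf, 0, len };
	int val;

	if (model_copy_from_iter(&val, sizeof(val), &it) != sizeof(val))
		return 0;
	if (do_revert)
		model_iter_revert(&it, sizeof(val));
	val *= 2;
	return model_copy_to_iter(&val, sizeof(val), &it);
}
```

With a buffer of exactly sizeof(int) bytes, the reverted call overwrites the value in place. The call without the revert writes nothing because the position is already at the end, which a callback checking the return value would report as -EFAULT.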
4) Do we have any selftest for this change?
Yes, I've created a commit that I am using to test it, but I am not
sure how useful it is right now, so I am not appending it here.
You can find it at
https://github.com/leitao/linux/commit/2d9311947061f1baa4...
Note: The dance around opt->iter.data_source (2) and iov_iter_revert()
(3) is a bit fragile. A helper (e.g., sockopt_read_val()) would be
safer and would keep others from getting it wrong.
I am not adding it now so that the bare bones of the change are easier
to read; helpers can come later.
Link: https://lore.kernel.org/all/CAHk-=whmzrO-BMU=uSVXbuoLi-3t... [0]
---
Changes in v2:
- Restore optlen even on error path (getsockopt_iter fails)
- Convert af_packet.c and CAN raw instead of netlink (these are the most
complicated ones).
- Link to v1: https://patch.msgid.link/20260130-getsockopt-v1-0-9154fcf...
---
Breno Leitao (4):
net: add getsockopt_iter callback to proto_ops
net: call getsockopt_iter if available
af_packet: convert to getsockopt_iter
can: raw: convert to getsockopt_iter
include/linux/net.h | 19 +++++++++++++++++++
net/can/raw.c | 28 +++++++++++++---------------
net/packet/af_packet.c | 18 ++++++++++--------
net/socket.c | 48 +++++++++++++++++++++++++++++++++++++++++++++---
4 files changed, 87 insertions(+), 26 deletions(-)
---
base-commit: 2d9311947061f1baa43858f597dd6c54d7ccc5d2
change-id: 20260130-getsockopt-9f36625eedcb
Best regards,
--
Breno Leitao <leitao@debian.org>