diff options
author | Arjun Roy <arjunroy@google.com> | 2020-02-14 15:30:50 -0800 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2020-02-16 19:25:02 -0800 |
commit | 33946518d493cdf10aedb4a483f1aa41948a3dab (patch) | |
tree | ac31781461616de8689ed71d153a29ae8b8523ee | |
parent | c8856c051454909e5059df4e81c77b9c366c5515 (diff) |
tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.
This patchset is intended to reduce the number of extra system calls
imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
this patchset has demonstrated a system call reduction of about 30%
when coupled with userspace changes.
For applications using epoll, returning sk_err along with the result
of tcp receive zerocopy could remove the need to call
recvmsg()=-EAGAIN after a spurious wakeup.
Consider a multi-threaded application using epoll. A thread may awaken
with EPOLLIN but another thread may already be reading. The
spuriously-awoken thread does not necessarily know that another thread
'won'; rather, it may be possible that it was woken up due to the
presence of an error if there is no data. A zerocopy read receiving 0
bytes thus would need to be followed up by recvmsg to be sure.
Instead, we return sk_err directly with zerocopy, so the application
can avoid this extra system call.
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-rw-r--r-- | include/uapi/linux/tcp.h | 1 | ||||
-rw-r--r-- | net/ipv4/tcp.c | 8 |
2 files changed, 8 insertions, 1 deletions
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index 548f480b9c66..1a7fc856e237 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -346,5 +346,6 @@ struct tcp_zerocopy_receive { __u32 length; /* in/out: number of bytes to map/mapped */ __u32 recv_skip_hint; /* out: amount of bytes to skip */ __u32 inq; /* out: amount of bytes in read queue */ + __s32 err; /* out: socket error */ }; #endif /* _UAPI_LINUX_TCP_H */ diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index a697f1455f8d..1b685485a5b5 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3676,14 +3676,20 @@ static int do_tcp_getsockopt(struct sock *sk, int level, lock_sock(sk); err = tcp_zerocopy_receive(sk, &zc); release_sock(sk); + if (len == sizeof(zc)) + goto zerocopy_rcv_sk_err; switch (len) { - case sizeof(zc): + case offsetofend(struct tcp_zerocopy_receive, err): + goto zerocopy_rcv_sk_err; case offsetofend(struct tcp_zerocopy_receive, inq): goto zerocopy_rcv_inq; case offsetofend(struct tcp_zerocopy_receive, length): default: goto zerocopy_rcv_out; } +zerocopy_rcv_sk_err: + if (!err) + zc.err = sock_error(sk); zerocopy_rcv_inq: zc.inq = tcp_inq_hint(sk); zerocopy_rcv_out: |