Age | Commit message (Collapse) | Author |
|
Introduce a helper to initialize the BBR pacing rate unconditionally,
based on the current cwnd and RTT estimate. This is a pure refactor,
but is needed for two following fixes.
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce a helper to convert a BBR bandwidth and gain factor to a
pacing rate in bytes per second. This is a pure refactor, but is
needed for two following fixes.
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In bbr_set_pacing_rate(), which decides whether to cut the pacing
rate, there was some code that considered exiting STARTUP to be
equivalent to the notion of filling the pipe (i.e.,
bbr_full_bw_reached()). Specifically, as the code was structured,
exiting STARTUP and going into PROBE_RTT could cause us to cut the
pacing rate down to something silly and low, based on whatever
bandwidth samples we've had so far, when it's possible that all of
them have been small app-limited bandwidth samples that are not
representative of the bandwidth available in the path. (The code was
correct at the time it was written, but the state machine changed
without this spot being adjusted correspondingly.)
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When there is an established connection in direction A->B, it is
possible to receive a packet on port B which then executes
ct(commit,force) without first performing ct() - ie, a lookup.
In this case, we would expect that this packet can delete the existing
entry so that we can commit a connection with direction B->A. However,
currently we only perform a check in skb_nfct_cached() for whether
OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a
lookup previously occurred. In the above scenario, a lookup has not
occurred but we should still be able to statelessly look up the
existing entry and potentially delete the entry if it is in the
opposite direction.
This patch extends the check to also hint that if the action has the
force flag set, then we will lookup the existing entry so that the
force check at the end of skb_nfct_cached has the ability to delete
the connection.
Fixes: dd41d330b03 ("openvswitch: Add force commit.")
CC: Pravin Shelar <pshelar@nicira.com>
CC: dev@openvswitch.org
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some time ago David Woodhouse reported skb_under_panic
when we try to push ethernet header to fragmented ipv6 skbs.
It was fixed for ipv6 by Florian Westphal in
commit 1d325d217c7f ("ipv6: ip6_fragment: fix headroom tests and skb leak")
However similar problem still exist in ipv4.
It does not trigger skb_under_panic due paranoid check
in ip_finish_output2, however according to Alexey Kuznetsov
current state is abnormal and ip_fragment should be fixed too.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull random updates from Ted Ts'o:
"Add wait_for_random_bytes() and get_random_*_wait() functions so that
callers can more safely get random bytes if they can block until the
CRNG is initialized.
Also print a warning if get_random_*() is called before the CRNG is
initialized. By default, only one single-line warning will be printed
per boot. If CONFIG_WARN_ALL_UNSEEDED_RANDOM is defined, then a
warning will be printed for each function which tries to get random
bytes before the CRNG is initialized. This can get spammy for certain
architecture types, so it is not enabled by default"
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: reorder READ_ONCE() in get_random_uXX
random: suppress spammy warnings about unseeded randomness
random: warn when kernel uses unseeded randomness
net/route: use get_random_int for random counter
net/neighbor: use get_random_u32 for 32-bit hash random
rhashtable: use get_random_u32 for hash_rnd
ceph: ensure RNG is seeded before using
iscsi: ensure RNG is seeded before use
cifs: use get_random_u32 for 32-bit lock random
random: add get_random_{bytes,u32,u64,int,long,once}_wait family
random: add wait_for_random_bytes() API
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ->s_options removal from Al Viro:
"Preparations for fsmount/fsopen stuff (coming next cycle). Everything
gets moved to explicit ->show_options(), killing ->s_options off +
some cosmetic bits around fs/namespace.c and friends. Basically, the
stuff needed to work with fsmount series with minimum of conflicts
with other work.
It's not strictly required for this merge window, but it would reduce
the PITA during the coming cycle, so it would be nice to have those
bits and pieces out of the way"
* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
isofs: Fix isofs_show_options()
VFS: Kill off s_options and helpers
orangefs: Implement show_options
9p: Implement show_options
isofs: Implement show_options
afs: Implement show_options
affs: Implement show_options
befs: Implement show_options
spufs: Implement show_options
bpf: Implement show_options
ramfs: Implement show_options
pstore: Implement show_options
omfs: Implement show_options
hugetlbfs: Implement show_options
VFS: Don't use save/replace_mount_options if not using generic_show_options
VFS: Provide empty name qstr
VFS: Make get_filesystem() return the affected filesystem
VFS: Clean up whitespace in fs/namespace.c and fs/super.c
Provide a function to create a NUL-terminated string from unterminated data
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull network field-by-field copy-in updates from Al Viro:
"This part of the misc compat queue was held back for review from
networking folks and since davem has jus ACKed those..."
* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
get_compat_bpf_fprog(): don't copyin field-by-field
get_compat_msghdr(): get rid of field-by-field copyin
copy_msghdr_from_user(): get rid of field-by-field copyin
|
|
Marcelo noticed an array overflow caused by commit c28445c3cb07
("sctp: add reconf_enable in asoc ep and netns"), in which sctp
would add SCTP_CID_RECONF into extensions when reconf_enable is
set in sctp_make_init and sctp_make_init_ack.
Then now when all ext chunks are set, 4 ext chunk ids can be put
into extensions array while extensions array size is 3. It would
cause a kernel panic because of this overflow.
This patch is to fix it by defining extensions array size is 4 in
both sctp_make_init and sctp_make_init_ack.
Fixes: c28445c3cb07 ("sctp: add reconf_enable in asoc ep and netns")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make name consistent with other TC event notification routines, such as
tcf_add_notify() and tcf_del_notify()
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When PACKET_QDISC_BYPASS is not used, Tx queue selection will be done
before the packet is enqueued, taking into account any mappings set by
a queuing discipline such as mqprio without hardware offloading. This
selection may be affected by a previously saved queue_mapping, either on
the Rx path, or done before the packet reaches the device, as it's
currently the case for AF_PACKET.
In order for queue selection to work as expected when using traffic
control, there can't be another selection done before that point is
reached, so move the call to packet_pick_tx_queue to
packet_direct_xmit, leaving the default xmit path as it was before
PACKET_QDISC_BYPASS was introduced.
A forward declaration of packet_pick_tx_queue() is introduced to avoid
the need to reorder the functions within the file.
Fixes: d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option")
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With 802.1ad support the vlan_ingress code started checking for vlan
protocol mismatch which causes the current tag to be inserted and the
bridge vlan protocol & pvid to be set. The vlan tag insertion changes
the skb mac_header and thus the lookup mac dest pointer which was loaded
prior to calling br_allowed_ingress in br_handle_frame_finish is VLAN_HLEN
bytes off now, pointing to the last two bytes of the destination mac and
the first four of the source mac causing lookups to always fail and
broadcasting all such packets to all ports. Same thing happens for locally
originated packets when passing via br_dev_xmit. So load the dest pointer
after the vlan checks and possible skb change.
Fixes: 8580e2117c06 ("bridge: Prepare for 802.1ad vlan filtering support")
Reported-by: Anitha Narasimha Murthy <anitha@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When we convert atomic_t to refcount_t, a new kernel warning
on "increment on 0" is introduced in the netpoll code,
zap_completion_queue(). In fact for this special case, we know
the refcount is 0 and we just have to set it to 1 to satisfy
the following dev_kfree_skb_any(), so we can just use
refcount_set(..., 1) instead.
Fixes: 633547973ffc ("net: convert sk_buff.users from atomic_t to refcount_t")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Cc: Reshetova, Elena <elena.reshetova@intel.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull NFS client updates from Anna Schumaker:
"Stable bugfixes:
- Fix -EACCESS on commit to DS handling
- Fix initialization of nfs_page_array->npages
- Only invalidate dentries that are actually invalid
Features:
- Enable NFSoRDMA transparent state migration
- Add support for lookup-by-filehandle
- Add support for nfs re-exporting
Other bugfixes and cleanups:
- Christoph cleaned up the way we declare NFS operations
- Clean up various internal structures
- Various cleanups to commits
- Various improvements to error handling
- Set the dt_type of . and .. entries in NFS v4
- Make slot allocation more reliable
- Fix fscache stat printing
- Fix uninitialized variable warnings
- Fix potential list overrun in nfs_atomic_open()
- Fix a race in NFSoRDMA RPC reply handler
- Fix return size for nfs42_proc_copy()
- Fix against MAC forgery timing attacks"
* tag 'nfs-for-4.13-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (68 commits)
NFS: Don't run wake_up_bit() when nobody is waiting...
nfs: add export operations
nfs4: add NFSv4 LOOKUPP handlers
nfs: add a nfs_ilookup helper
nfs: replace d_add with d_splice_alias in atomic_open
sunrpc: use constant time memory comparison for mac
NFSv4.2 fix size storage for nfs42_proc_copy
xprtrdma: Fix documenting comments in frwr_ops.c
xprtrdma: Replace PAGE_MASK with offset_in_page()
xprtrdma: FMR does not need list_del_init()
xprtrdma: Demote "connect" log messages
NFSv4.1: Use seqid returned by EXCHANGE_ID after state migration
NFSv4.1: Handle EXCHGID4_FLAG_CONFIRMED_R during NFSv4.1 migration
xprtrdma: Don't defer MR recovery if ro_map fails
xprtrdma: Fix FRWR invalidation error recovery
xprtrdma: Fix client lock-up after application signal fires
xprtrdma: Rename rpcrdma_req::rl_free
xprtrdma: Pass only the list of registered MRs to ro_unmap_sync
xprtrdma: Pre-mark remotely invalidated MRs
xprtrdma: On invalidation failure, remove MWs from rl_registered
...
|
|
Pull nfsd updates from Bruce Fields:
"Chuck's RDMA update overhauls the "call receive" side of the
RPC-over-RDMA transport to use the new rdma_rw API.
Christoph cleaned the way nfs operations are declared, removing a
bunch of function-pointer casts and declaring the operation vectors as
const.
Christoph's changes touch both client and server, and both client and
server pulls this time around should be based on the same commits from
Christoph"
* tag 'nfsd-4.13' of git://linux-nfs.org/~bfields/linux: (53 commits)
svcrdma: fix an incorrect check on -E2BIG and -EINVAL
nfsd4: factor ctime into change attribute
svcrdma: Remove svc_rdma_chunk_ctxt::cc_dir field
svcrdma: use offset_in_page() macro
svcrdma: Clean up after converting svc_rdma_recvfrom to rdma_rw API
svcrdma: Clean-up svc_rdma_unmap_dma
svcrdma: Remove frmr cache
svcrdma: Remove unused Read completion handlers
svcrdma: Properly compute .len and .buflen for received RPC Calls
svcrdma: Use generic RDMA R/W API in RPC Call path
svcrdma: Add recvfrom helpers to svc_rdma_rw.c
sunrpc: Allocate up to RPCSVC_MAXPAGES per svc_rqst
svcrdma: Don't account for Receive queue "starvation"
svcrdma: Improve Reply chunk sanity checking
svcrdma: Improve Write chunk sanity checking
svcrdma: Improve Read chunk sanity checking
svcrdma: Remove svc_rdma_marshal.c
svcrdma: Avoid Send Queue overflow
svcrdma: Squelch disconnection messages
sunrpc: Disable splice for krb5i
...
|
|
The configure callback of fib_rules_ops can change the refcnt of a
fib rule. For instance, mlxsw takes a refcnt when adding the processing
of the rule to a work queue. Thus the rule refcnt can not be reset to
to 1 afterwards. Move the refcnt setting to after the allocation.
Fixes: 5361e209dd30 ("net: avoid one splat in fib_nl_delrule()")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Otherwise, we enable a MAC forgery via timing attack.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Clean up.
FASTREG and LOCAL_INV WRs are typically not signaled. localinv_wake
is used for the last LOCAL_INV WR in a chain, which is always
signaled. The documenting comments should reflect that.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Clean up.
Reported by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Clean up.
Commit 38f1932e60ba ("xprtrdma: Remove FMRs from the unmap list
after unmapping") utilized list_del_init() to try to prevent some
list corruption. The corruption was actually caused by the reply
handler racing with a signal. Now that MR invalidation is properly
serialized, list_del_init() can safely be replaced.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Some have complained about the log messages generated when xprtrdma
opens or closes a connection to a server. When an NFS mount is
mostly idle these can appear every few minutes as the client idles
out the connection and reconnects.
Connection and disconnection is a normal part of operation, and not
exceptional, so change these to dprintk's for now. At some point
all of these will be converted to tracepoints, but that's for
another day.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Deferred MR recovery does a DMA-unmapping of the MW. However, ro_map
invokes rpcrdma_defer_mr_recovery in some error cases where the MW
has not even been DMA-mapped yet.
Avoid a DMA-unmapping error replacing rpcrdma_defer_mr_recovery.
Also note that if ib_dma_map_sg is asked to map 0 nents, it will
return 0. So the extra "if (i == 0)" check is no longer needed.
Fixes: 42fe28f60763 ("xprtrdma: Do not leak an MW during a DMA ...")
Fixes: 505bbe64dd04 ("xprtrdma: Refactor MR recovery work queues")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
When ib_post_send() fails, all LOCAL_INV WRs past @bad_wr have to be
examined, and the MRs reset by hand.
I'm not sure how the existing code can work by comparing R_keys.
Restructure the logic so that instead it walks the chain of WRs,
starting from the first bad one.
Make sure to wait for completion if at least one WR was actually
posted. Otherwise, if the ib_post_send fails, we can end up
DMA-unmapping the MR while LOCAL_INV operations are in flight.
Commit 7a89f9c626e3 ("xprtrdma: Honor ->send_request API contract")
added the rdma_disconnect() call site. The disconnect actually
causes more problems than it solves, and SQ overruns happen only as
a result of software bugs. So remove it.
Fixes: d7a21c1bed54 ("xprtrdma: Reset MRs in frwr_op_unmap_sync()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
After a signal, the RPC client aborts synchronous RPCs running on
behalf of the signaled application.
The server is still executing those RPCs, and will write the results
back into the client's memory when it's done. By the time the server
writes the results, that memory is likely being used for other
purposes. Therefore xprtrdma has to immediately invalidate all
memory regions used by those aborted RPCs to prevent the server's
writes from clobbering that re-used memory.
With FMR memory registration, invalidation takes a relatively long
time. In fact, the invalidation is often still running when the
server tries to write the results into the memory regions that are
being invalidated.
This sets up a race between two processes:
1. After the signal, xprt_rdma_free calls ro_unmap_safe.
2. While ro_unmap_safe is still running, the server replies and
rpcrdma_reply_handler runs, calling ro_unmap_sync.
Both processes invoke ib_unmap_fmr on the same FMR.
The mlx4 driver allows two ib_unmap_fmr calls on the same FMR at
the same time, but HCAs generally don't tolerate this. Sometimes
this can result in a system crash.
If the HCA happens to survive, rpcrdma_reply_handler continues. It
removes the rpc_rqst from rq_list and releases the transport_lock.
This enables xprt_rdma_free to run in another process, and the
rpc_rqst is released while rpcrdma_reply_handler is still waiting
for the ib_unmap_fmr call to finish.
But further down in rpcrdma_reply_handler, the transport_lock is
taken again, and "rqst" is dereferenced. If "rqst" has already been
released, this triggers a general protection fault. Since bottom-
halves are disabled, the system locks up.
Address both issues by reversing the order of the xprt_lookup_rqst
call and the ro_unmap_sync call. Introduce a separate lookup
mechanism for rpcrdma_req's to enable calling ro_unmap_sync before
xprt_lookup_rqst. Now the handler takes the transport_lock once
and holds it for the XID lookup and RPC completion.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Clean up: I'm about to use the rl_free field for purposes other than
a free list. So use a more generic name.
This is a refactoring change only.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
There are rare cases where an rpcrdma_req can be re-used (via
rpcrdma_buffer_put) while the RPC reply handler is still running.
This is due to a signal firing at just the wrong instant.
Since commit 9d6b04097882 ("xprtrdma: Place registered MWs on a
per-req list"), rpcrdma_mws are self-contained; ie., they fully
describe an MR and scatterlist, and no part of that information is
stored in struct rpcrdma_req.
As part of closing the above race window, pass only the req's list
of registered MRs to ro_unmap_sync, rather than the rpcrdma_req
itself.
Some extra transport header sanity checking is removed. Since the
client depends on its own recollection of what memory had been
registered, there doesn't seem to be a way to abuse this change.
And, the check was not terribly effective. If the client had sent
Read chunks, the "list_empty" test is negative in both of the
removed cases, which are actually looking for Write or Reply
chunks.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
There are rare cases where an rpcrdma_req and its matched
rpcrdma_rep can be re-used, via rpcrdma_buffer_put, while the RPC
reply handler is still using that req. This is typically due to a
signal firing at just the wrong instant.
As part of closing this race window, avoid using the wrong
rpcrdma_rep to detect remotely invalidated MRs. Mark MRs as
invalidated while we are sure the rep is still OK to use.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Callers assume the ro_unmap_sync and ro_unmap_safe methods empty
the list of registered MRs. Ensure that all paths through
fmr_op_unmap_sync() remove MWs from that list.
Fixes: 9d6b04097882 ("xprtrdma: Place registered MWs on a ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
In xprt_alloc_slot(), the spin lock is only needed to provide atomicity
between the atomic_add_unless() failure and the call to xprt_add_backlog().
We do not actually need to hold it across the memory allocation itself.
By dropping the lock, we can use a more resilient GFP_NOFS allocation,
just as we now do in the rest of the RPC client code.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
struct svc_procinfo contains function pointers, and marking it as
constant avoids it being able to be used as an attach vector for
code injections.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
pc_count is the only writeable memeber of struct svc_procinfo, which is
a good candidate to be const-ified as it contains function pointers.
This patch moves it into out out struct svc_procinfo, and into a
separate writable array that is pointed to by struct svc_version.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Drop the resp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Drop the argp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Drop the p and resp arguments as they are always NULL or can trivially
be derived from the rqstp argument. With that all functions now have the
same prototype, and we can remove the unsafe casting to kxdrproc_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Drop the argp and resp arguments as they can trivially be derived from
the rqstp argument. With that all functions now have the same prototype,
and we can remove the unsafe casting to svc_procfunc as well as the
svc_procfunc typedef itself.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
struct rpc_procinfo contains function pointers, and marking it as
constant avoids it being able to be used as an attach vector for
code injections.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
p_count is the only writeable memeber of struct rpc_procinfo, which is
a good candidate to be const-ified as it contains function pointers.
This patch moves it into out out struct rpc_procinfo, and into a
separate writable array that is pointed to by struct rpc_version and
indexed by p_statidx.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Declare the p_decode callbacks with the proper prototype instead of
casting to kxdrdproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Declare the p_decode callbacks with the proper prototype instead of
casting to kxdrdproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
|
|
Pass struct rpc_request as the first argument instead of an untyped blob.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Declare the p_encode callbacks with the proper prototype instead of
casting to kxdreproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Declare the p_encode callbacks with the proper prototype instead of
casting to kxdreproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Pass struct rpc_request as the first argument instead of an untyped blob,
and mark the data object as const.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
|
|
Merge yet more updates from Andrew Morton:
- various misc things
- kexec updates
- sysctl core updates
- scripts/gdb udpates
- checkpoint-restart updates
- ipc updates
- kernel/watchdog updates
- Kees's "rough equivalent to the glibc _FORTIFY_SOURCE=1 feature"
- "stackprotector: ascii armor the stack canary"
- more MM bits
- checkpatch updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (96 commits)
writeback: rework wb_[dec|inc]_stat family of functions
ARM: samsung: usb-ohci: move inline before return type
video: fbdev: omap: move inline before return type
video: fbdev: intelfb: move inline before return type
USB: serial: safe_serial: move __inline__ before return type
drivers: tty: serial: move inline before return type
drivers: s390: move static and inline before return type
x86/efi: move asmlinkage before return type
sh: move inline before return type
MIPS: SMP: move asmlinkage before return type
m68k: coldfire: move inline before return type
ia64: sn: pci: move inline before type
ia64: move inline before return type
FRV: tlbflush: move asmlinkage before return type
CRIS: gpio: move inline before return type
ARM: HP Jornada 7XX: move inline before return type
ARM: KVM: move asmlinkage before type
checkpatch: improve the STORAGE_CLASS test
mm, migration: do not trigger OOM killer when migrating memory
drm/i915: use __GFP_RETRY_MAYFAIL
...
|
|
The current check will always be true and will always jump to
err1, this looks dubious to me. I believe && should be used
instead of ||.
Detected by CoverityScan, CID#1450120 ("Logically Dead Code")
Fixes: 107c1d0a991a ("svcrdma: Avoid Send Queue overflow")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Don't populate array error_code on the stack but make it static. Makes
the object code smaller by almost 250 bytes:
Before:
text data bss dec hex filename
10366 983 0 11349 2c55 net/dccp/input.o
After:
text data bss dec hex filename
10161 1039 0 11200 2bc0 net/dccp/input.o
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The bpf_skb_adjust_net() ignores the return value of bpf_skb_net_shrink/grow,
and always return 0, fix it by return 'ret'.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
1) Fix 64-bit division in mlx5 IPSEC offload support, from Ilan Tayari
and Arnd Bergmann.
2) Fix race in statistics gathering in bnxt_en driver, from Michael
Chan.
3) Can't use a mutex in RCU reader protected section on tap driver, from
Cong WANG.
4) Fix mdb leak in bridging code, from Eduardo Valentin.
5) Fix free of wrong pointer variable in nfp driver, from Dan Carpenter.
6) Buffer overflow in brcmfmac driver, from Arend van SPriel.
7) ioremap_nocache() return value needs to be checked in smsc911x
driver, from Alexey Khoroshilov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits)
net: stmmac: revert "support future possible different internal phy mode"
sfc: don't read beyond unicast address list
datagram: fix kernel-doc comments
socket: add documentation for missing elements
smsc911x: Add check for ioremap_nocache() return code
brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()
net: hns: Bugfix for Tx timeout handling in hns driver
net: ipmr: ipmr_get_table() returns NULL
nfp: freeing the wrong variable
mlxsw: spectrum_switchdev: Check status of memory allocation
mlxsw: spectrum_switchdev: Remove unused variable
mlxsw: spectrum_router: Fix use-after-free in route replace
mlxsw: spectrum_router: Add missing rollback
samples/bpf: fix a build issue
bridge: mdb: fix leak on complete_info ptr on fail path
tap: convert a mutex to a spinlock
cxgb4: fix BUG() on interrupt deallocating path of ULD
qed: Fix printk option passed when printing ipv6 addresses
net: Fix minor code bug in timestamping.txt
net: stmmac: Make 'alloc_dma_[rt]x_desc_resources()' look even closer
...
|
|
semantic
__GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to
the page allocator. This has been true but only for allocations
requests larger than PAGE_ALLOC_COSTLY_ORDER. It has been always
ignored for smaller sizes. This is a bit unfortunate because there is
no way to express the same semantic for those requests and they are
considered too important to fail so they might end up looping in the
page allocator for ever, similarly to GFP_NOFAIL requests.
Now that the whole tree has been cleaned up and accidental or misled
usage of __GFP_REPEAT flag has been removed for !costly requests we can
give the original flag a better name and more importantly a more useful
semantic. Let's rename it to __GFP_RETRY_MAYFAIL which tells the user
that the allocator would try really hard but there is no promise of a
success. This will work independent of the order and overrides the
default allocator behavior. Page allocator users have several levels of
guarantee vs. cost options (take GFP_KERNEL as an example)
- GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
attempt to free memory at all. The most light weight mode which even
doesn't kick the background reclaim. Should be used carefully because
it might deplete the memory and the next user might hit the more
aggressive reclaim
- GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic
allocation without any attempt to free memory from the current
context but can wake kswapd to reclaim memory if the zone is below
the low watermark. Can be used from either atomic contexts or when
the request is a performance optimization and there is another
fallback for a slow path.
- (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
non sleeping allocation with an expensive fallback so it can access
some portion of memory reserves. Usually used from interrupt/bh
context with an expensive slow path fallback.
- GFP_KERNEL - both background and direct reclaim are allowed and the
_default_ page allocator behavior is used. That means that !costly
allocation requests are basically nofail but there is no guarantee of
that behavior so failures have to be checked properly by callers
(e.g. OOM killer victim is allowed to fail currently).
- GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
and all allocation requests fail early rather than cause disruptive
reclaim (one round of reclaim in this implementation). The OOM killer
is not invoked.
- GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
behavior and all allocation requests try really hard. The request
will fail if the reclaim cannot make any progress. The OOM killer
won't be triggered.
- GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
and all allocation requests will loop endlessly until they succeed.
This might be really dangerous especially for larger orders.
Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
because they already had their semantic. No new users are added.
__alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
there is no progress and we have already passed the OOM point.
This means that all the reclaim opportunities have been exhausted except
the most disruptive one (the OOM killer) and a user defined fallback
behavior is more sensible than keep retrying in the page allocator.
[akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
[mhocko@suse.com: semantic fix]
Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
[mhocko@kernel.org: address other thing spotted by Vlastimil]
Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alex Belits <alex.belits@cavium.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: NeilBrown <neilb@suse.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|