Age | Commit message (Collapse) | Author |
|
Commit e8d975e73e5f ("fixing infinite OPEN loop in 4.0 stateid recovery")
introduced access to state after it was just potentially freed by
nfs4_put_open_state leading to a random data corruption somewhere.
BUG: unable to handle kernel paging request at ffff88004941ee40
IP: [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740
PGD 3501067 PUD 3504067 PMD 6ff37067 PTE 800000004941e060
Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: loop rpcsec_gss_krb5 acpi_cpufreq tpm_tis joydev i2c_piix4 pcspkr tpm virtio_console nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops floppy serio_raw virtio_blk drm
CPU: 6 PID: 2161 Comm: 192.168.10.253- Not tainted 4.7.0-rc1-vm-nfs+ #112
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff8800463dcd00 ti: ffff88003ff48000 task.ti: ffff88003ff48000
RIP: 0010:[<ffffffff813baf01>] [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740
RSP: 0018:ffff88003ff4bd68 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffffff81a49900 RCX: 00000000000000e8
RDX: 00000000000000e8 RSI: ffff8800418b9930 RDI: ffff880040c96c88
RBP: ffff88003ff4bdf8 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880040c96c98
R13: ffff88004941ee20 R14: ffff88004941ee40 R15: ffff88004941ee00
FS: 0000000000000000(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88004941ee40 CR3: 0000000060b0b000 CR4: 00000000000006e0
Stack:
ffffffff813baad5 ffff8800463dcd00 ffff880000000001 ffffffff810e6b68
ffff880043ddbc88 ffff8800418b9800 ffff8800418b98c8 ffff88004941ee48
ffff880040c96c90 ffff880040c96c00 ffff880040c96c20 ffff880040c96c40
Call Trace:
[<ffffffff813baad5>] ? nfs4_do_reclaim+0x35/0x740
[<ffffffff810e6b68>] ? trace_hardirqs_on_caller+0x128/0x1b0
[<ffffffff813bb7cd>] nfs4_run_state_manager+0x5ed/0xa40
[<ffffffff813bb1e0>] ? nfs4_do_reclaim+0x740/0x740
[<ffffffff813bb1e0>] ? nfs4_do_reclaim+0x740/0x740
[<ffffffff810af0d1>] kthread+0x101/0x120
[<ffffffff810e6b68>] ? trace_hardirqs_on_caller+0x128/0x1b0
[<ffffffff818843af>] ret_from_fork+0x1f/0x40
[<ffffffff810aefd0>] ? kthread_create_on_node+0x250/0x250
Code: 65 80 4c 8b b5 78 ff ff ff e8 fc 88 4c 00 48 8b 7d 88 e8 13 67 d2 ff 49 8b 47 40 a8 02 0f 84 d3 01 00 00 4c 89 ff e8 7f f9 ff ff <f0> 41 80 26 7f 48 8b 7d c8 e8 b1 84 4c 00 e9 39 fd ff ff 3d e6
RIP [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740
RSP <ffff88003ff4bd68>
CR2: ffff88004941ee40
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Older versions of gcc don't understand named initializers inside a
anonymous structure or union member. It can be worked around by adding
the bracin gin the initializer for the anonymous member.
Without this, gcc 4.4.4 will fail the build with
CC fs/nfs/nfs4state.o
fs/nfs/nfs4state.c:69: error: unknown field ‘data’ specified in initializer
fs/nfs/nfs4state.c:69: warning: missing braces around initializer
fs/nfs/nfs4state.c:69: warning: (near initialization for ‘zero_stateid.<anonymous>.data’)
make[2]: *** [fs/nfs/nfs4state.o] Error 1
introduced in commit 93b717fd81bf ("NFSv4: Label stateids with the type")
Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Anna Schumaker <Anna.Schumaker@netapp.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When we're using a delegation to represent our open state, we should
ensure that we use the stateid that was used to create that delegation.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
In order to more easily distinguish what kind of stateid we are dealing
with, introduce a type that can be used to label the stateid structure.
The label will be useful both for debugging, but also when dealing with
operations like SETATTR, READ and WRITE that can take several different
types of stateid as arguments.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Currently, we don't test if the state owner is in use before we try to
recover it. The problem is that if the refcount is zero, then the
state owner will be waiting on the lru list for garbage collection.
The expectation in that case is that if you bump the refcount, then
you must also remove the state owner from the lru list. Otherwise
the call to nfs4_put_state_owner will corrupt that list by trying
to add our state owner a second time.
Avoid the whole problem by just skipping state owners that hold no
state.
Reported-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
mount
A test case is as the description says:
open(foobar, O_WRONLY);
sleep() --> reboot the server
close(foobar)
The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
line before going to restart, there is
clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
out state and when we go to close it, “call_close” doesn’t get set as
state flag is not set and CLOSE doesn’t go on the wire.
Signed-off-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
All these functions do is call nfs41_ping_server() without adding
anything. Let's remove them and give nfs41_ping_server() a better name
instead.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
RFC5661 states:
The server has encountered an unrecoverable fault with the
backchannel (e.g., it has lost track of the sequence ID for a slot
in the backchannel). The client MUST stop sending more requests
on the session's fore channel, wait for all outstanding requests
to complete on the fore and back channel, and then destroy the
session.
Ensure we do so...
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Try to handle this for now by invalidating all outstanding layouts for this
server and then testing all the open+lock+delegation stateids.
At some later stage, we may want to optimise by separating out the testing of
delegation stateids only, and adding testing of layout stateids.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If the server tells us that only some state has been revoked, then we
need to run the full TEST_STATEID dog and pony show in order to discover
which locks and delegations are still OK. Currently we blow away all
state, which means that we lose all locks!
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
While the NFSv4.1 code has always drained the slot tables in order to stop
non-recovery related RPC calls when doing lease recovery, the NFSv4 code
did not.
The reason for the difference in behaviour is that NFSv4 does not have
session state, and so RPC calls can in theory proceed while recovery is
happening. In practice, however, anything I/O or state related needs to
wait until recovery is over.
This patch changes the behaviour of NFSv4 to match that of NFSv4.1 so that
we can simplify the state recovery code by assuming that we do not have to
deal with races between recovery and ordinary I/O.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Problem: When an operation like WRITE receives a BAD_STATEID, even though
recovery code clears the RECLAIM_NOGRACE recovery flag before recovering
the open state, because of clearing delegation state for the associated
inode, nfs_inode_find_state_and_recover() gets called and it makes the
same state with RECLAIM_NOGRACE flag again. As a results, when we restart
looking over the open states, we end up in the infinite loop instead of
breaking out in the next test of state flags.
Solution: unset the RECLAIM_NOGRACE set because of
calling of nfs_inode_find_state_and_recover() after returning from calling
recover_open() function.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Pull NFS client updates from Trond Myklebust:
"Another set of mainly bugfixes and a couple of cleanups. No new
functionality in this round.
Highlights include:
Stable patches:
- Fix a regression in /proc/self/mountstats
- Fix the pNFS flexfiles O_DIRECT support
- Fix high load average due to callback thread sleeping
Bugfixes:
- Various patches to fix the pNFS layoutcommit support
- Do not cache pNFS deviceids unless server notifications are enabled
- Fix a SUNRPC transport reconnection regression
- make debugfs file creation failure non-fatal in SUNRPC
- Another fix for circular directory warnings on NFSv4 "junctioned"
mountpoints
- Fix locking around NFSv4.2 fallocate() support
- Truncating NFSv4 file opens should also sync O_DIRECT writes
- Prevent infinite loop in rpcrdma_ep_create()
Features:
- Various improvements to the RDMA transport code's handling of
memory registration
- Various code cleanups"
* tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (55 commits)
fs/nfs: fix new compiler warning about boolean in switch
nfs: Remove unneeded casts in nfs
NFS: Don't attempt to decode missing directory entries
Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one"
NFS: Rename idmap.c to nfs4idmap.c
NFS: Move nfs_idmap.h into fs/nfs/
NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h
NFS: Add a stub for GETDEVICELIST
nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes
nfs: fix DIO good bytes calculation
nfs: Fetch MOUNTED_ON_FILEID when updating an inode
sunrpc: make debugfs file creation failure non-fatal
nfs: fix high load average due to callback thread sleeping
NFS: Reduce time spent holding the i_mutex during fallocate()
NFS: Don't zap caches on fallocate()
xprtrdma: Make rpcrdma_{un}map_one() into inline functions
xprtrdma: Handle non-SEND completions via a callout
xprtrdma: Add "open" memreg op
xprtrdma: Add "destroy MRs" memreg op
xprtrdma: Add "reset MRs" memreg op
...
|
|
This file is only used internally to the NFS v4 module, so it doesn't
need to be in the global include path. I also renamed it from
nfs_idmap.h to nfs4idmap.h to emphasize that it's an NFSv4-only include
file.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
If the call to exchange-id returns with the EXCHGID4_FLAG_CONFIRMED_R flag
set, then that means our lease was established by a previous mount instance.
Ensure that we detect this situation, and that we clear the state held by
that mount.
Reported-by: Jorge Mora <Jorge.Mora@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
We do not want to allow a race with another NFS mount to cause
nfs41_walk_client_list() to establish a lease on our nfs_client before
we're done checking for trunking.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Pull NFS client updates from Trond Myklebust:
"Highlights incluse:
Features:
- Removing the forced serialisation of open()/close() calls in
NFSv4.x (x>0) makes for a significant performance improvement in
metadata intensive workloads.
- Full support for the pNFS "flexible files" layout type
- Further RPC/RDMA client improvements from Chuck
Bugfixes:
- Stable fix: NFSv4.1 backchannel calls blocking operations with !TASK_RUNNING
- Stable fix: pnfs_generic_pg_init_read/write can be called with lseg == NULL
- Stable fix: Fix an Oopsable condition when nsm_mon_unmon is called
as part of the namespace cleanup,
- Stable fix: Ensure we reference the inode for return-on-close in
delegreturn
- Use SO_REUSEPORT to ensure that NFSv3 TCP connections can rebind to
the same source address/port combination during a disconnect/
reconnect event. This is a requirement imposed by most NFSv3
server duplicate reply cache implementations.
Optimisations:
- Ask for no NFSv4.1 delegations on OPEN if using O_DIRECT
Other:
- Add Anna Schumaker as co-maintainer for the NFS client"
* tag 'nfs-for-3.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (119 commits)
SUNRPC: Cleanup to remove xs_tcp_close()
pnfs: delete an unintended goto
pnfs/flexfiles: Do not dprintk after the free
SUNRPC: Fix stupid typo in xs_sock_set_reuseport
SUNRPC: Define xs_tcp_fin_timeout only if CONFIG_SUNRPC_DEBUG
SUNRPC: Handle connection reset more efficiently.
SUNRPC: Remove the redundant XPRT_CONNECTION_CLOSE flag
SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release
SUNRPC: Ensure xs_tcp_shutdown() requests a full close of the connection
SUNRPC: Cleanup to remove remaining uses of XPRT_CONNECTION_ABORT
SUNRPC: Remove TCP socket linger code
SUNRPC: Remove TCP client connection reset hack
SUNRPC: TCP/UDP always close the old socket before reconnecting
SUNRPC: Add helpers to prevent socket create from racing
SUNRPC: Ensure xs_reset_transport() resets the close connection flags
SUNRPC: Do not clear the source port in xs_reset_transport
SUNRPC: Handle EADDRINUSE on connect
SUNRPC: Set SO_REUSEPORT socket option for TCP connections
NFSv4.1: Fix pnfs_put_lseg races
NFSv4.1: pnfs_send_layoutreturn should use GFP_NOFS
...
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
When we relax the sequencing on the NFSv4.1 OPEN/CLOSE code, we will want
to use the value NULL to indicate that no sequencing is needed.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
We can now add a dedicated spinlock without expanding struct inode.
Change to using that to protect the various i_flctx lists.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>
|
|
* bugfixes:
NFSv4.1: Fix an NFSv4.1 state renewal regression
NFSv4: fix open/lock state recovery error handling
NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
NFS: Fabricate fscache server index key correctly
SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT
nfs: fix duplicate proc entries
|
|
The current open/lock state recovery unfortunately does not handle errors
such as NFS4ERR_CONN_NOT_BOUND_TO_SESSION correctly. Instead of looping,
just proceeds as if the state manager is finished recovering.
This patch ensures that we loop back, handle higher priority errors
and complete the open/lock state recovery.
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If a NFSv4.x server returns NFS4ERR_STALE_CLIENTID in response to a
CREATE_SESSION or SETCLIENTID_CONFIRM in order to tell us that it rebooted
a second time, then the client will currently take this to mean that it must
declare all locks to be stale, and hence ineligible for reboot recovery.
RFC3530 and RFC5661 both suggest that the client should instead rely on the
server to respond to inelegible open share, lock and delegation reclaim
requests with NFS4ERR_NO_GRACE in this situation.
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Commit c9fdeb28 removed a 'continue' after checking if the lease needs
to be renewed. However, if client hasn't moved, the code falls down to
starting reboot recovery erroneously (ie., sends open reclaim and gets
back stale_clientid error) before recovering from getting stale_clientid
on the renew operation.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Fixes: c9fdeb280b8c (NFS: Add basic migration support to state manager thread)
Cc: stable@vger.kernel.org # 3.13+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
This reverts commit 49a4bda22e186c4d0eb07f4a36b5b1a378f9398d.
Christoph reported an oops due to the above commit:
generic/089 242s ...[ 2187.041239] general protection fault: 0000 [#1]
SMP
[ 2187.042899] Modules linked in:
[ 2187.044000] CPU: 0 PID: 11913 Comm: kworker/0:1 Not tainted 3.16.0-rc6+ #1151
[ 2187.044287] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 2187.044287] Workqueue: nfsiod free_lock_state_work
[ 2187.044287] task: ffff880072b50cd0 ti: ffff88007a4ec000 task.ti: ffff88007a4ec000
[ 2187.044287] RIP: 0010:[<ffffffff81361ca6>] [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30
[ 2187.044287] RSP: 0018:ffff88007a4efd58 EFLAGS: 00010296
[ 2187.044287] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007a947ac0 RCX: 8000000000000000
[ 2187.044287] RDX: ffffffff826af9e0 RSI: ffff88007b093c00 RDI: ffff88007b093db8
[ 2187.044287] RBP: ffff88007a4efd58 R08: ffffffff832d3e10 R09: 000001c40efc0000
[ 2187.044287] R10: 0000000000000000 R11: 0000000000059e30 R12: ffff88007fc13240
[ 2187.044287] R13: ffff88007fc18b00 R14: ffff88007b093db8 R15: 0000000000000000
[ 2187.044287] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 2187.044287] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2187.044287] CR2: 00007f93ec33fb80 CR3: 0000000079dc2000 CR4: 00000000000006f0
[ 2187.044287] Stack:
[ 2187.044287] ffff88007a4efdd8 ffffffff810cc877 ffffffff810cc80d ffff88007fc13258
[ 2187.044287] 000000007a947af0 0000000000000000 ffffffff8353ccc8 ffffffff82b6f3d0
[ 2187.044287] 0000000000000000 ffffffff82267679 ffff88007a4efdd8 ffff88007fc13240
[ 2187.044287] Call Trace:
[ 2187.044287] [<ffffffff810cc877>] process_one_work+0x1c7/0x490
[ 2187.044287] [<ffffffff810cc80d>] ? process_one_work+0x15d/0x490
[ 2187.044287] [<ffffffff810cd569>] worker_thread+0x119/0x4f0
[ 2187.044287] [<ffffffff810fbbad>] ? trace_hardirqs_on+0xd/0x10
[ 2187.044287] [<ffffffff810cd450>] ? init_pwq+0x190/0x190
[ 2187.044287] [<ffffffff810d3c6f>] kthread+0xdf/0x100
[ 2187.044287] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70
[ 2187.044287] [<ffffffff81d9873c>] ret_from_fork+0x7c/0xb0
[ 2187.044287] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70
[ 2187.044287] Code: 0f 1f 44 00 00 31 c0 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d b7 48 fe ff ff 48 8b 87 58 fe ff ff 48 89 e5 48 8b 40 30 <48> 8b 00 48 8b 10 48 89 c7 48 8b 92 90 03 00 00 ff 52 28 5d c3
[ 2187.044287] RIP [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30
[ 2187.044287] RSP <ffff88007a4efd58>
[ 2187.103626] ---[ end trace 0f11326d28e5d8fa ]---
The original reason for this patch was because the fl_release_private
operation couldn't sleep. With commit ed9814d85810 (locks: defer freeing
locks in locks_delete_lock until after i_lock has been dropped), this is
no longer a problem so we can revert this patch.
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Pull NFS client updates from Trond Myklebust:
"Highlights include:
- stable fix for a bug in nfs3_list_one_acl()
- speed up NFS path walks by supporting LOOKUP_RCU
- more read/write code cleanups
- pNFS fixes for layout return on close
- fixes for the RCU handling in the rpcsec_gss code
- more NFS/RDMA fixes"
* tag 'nfs-for-3.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
nfs: reject changes to resvport and sharecache during remount
NFS: Avoid infinite loop when RELEASE_LOCKOWNER getting expired error
SUNRPC: remove all refcounting of groupinfo from rpcauth_lookupcred
NFS: fix two problems in lookup_revalidate in RCU-walk
NFS: allow lockless access to access_cache
NFS: teach nfs_lookup_verify_inode to handle LOOKUP_RCU
NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU
NFS: support RCU_WALK in nfs_permission()
sunrpc/auth: allow lockless (rcu) lookup of credential cache.
NFS: prepare for RCU-walk support but pushing tests later in code.
NFS: nfs4_lookup_revalidate: only evaluate parent if it will be used.
NFS: add checks for returned value of try_module_get()
nfs: clear_request_commit while holding i_lock
pnfs: add pnfs_put_lseg_async
pnfs: find swapped pages on pnfs commit lists too
nfs: fix comment and add warn_on for PG_INODE_REF
nfs: check wait_on_bit_lock err in page_group_lock
sunrpc: remove "ec" argument from encrypt_v2 operation
sunrpc: clean up sparse endianness warnings in gss_krb5_wrap.c
sunrpc: clean up sparse endianness warnings in gss_krb5_seal.c
...
|
|
The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().
So:
Rename wait_on_bit and wait_on_bit_lock to
wait_on_bit_action and wait_on_bit_lock_action
to make it explicit that they need an action function.
Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
which are *not* given an action function but implicitly use
a standard one.
The decision to error-out if a signal is pending is now made
based on the 'mode' argument rather than being encoded in the action
function.
All instances of the old wait_on_bit and wait_on_bit_lock which
can use the new version have been changed accordingly and their
action functions have been discarded.
wait_on_bit{_lock} does not return any specific error code in the
event of a signal so the caller must check for non-zero and
interpolate their own error code as appropriate.
The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible"
The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.
A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps'. (and /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack. So
the distinction will still be visible, only with different
function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).
Since first version of this patch (against 3.15) two new action
functions appeared, on in NFS and one in CIFS. CIFS also now
uses an action function that makes the same freezer aware
schedule call as NFS.
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We got a report of the following warning in Fedora:
BUG: sleeping function called from invalid context at mm/slub.c:969
in_atomic(): 1, irqs_disabled(): 0, pid: 533, name: bash
3 locks held by bash/533:
#0: (&sp->so_delegreturn_mutex){+.+...}, at: [<ffffffffa033da62>] nfs4_proc_lock+0x262/0x910 [nfsv4]
#1: (&nfsi->rwsem){.+.+.+}, at: [<ffffffffa033da6a>] nfs4_proc_lock+0x26a/0x910 [nfsv4]
#2: (&sb->s_type->i_lock_key#23){+.+...}, at: [<ffffffff812998dc>] flock_lock_file_wait+0x8c/0x3a0
CPU: 0 PID: 533 Comm: bash Not tainted 3.15.0-0.rc1.git1.1.fc21.x86_64 #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
0000000000000000 00000000d664ff3c ffff880078b69a70 ffffffff817e82e0
0000000000000000 ffff880078b69a98 ffffffff810cf1a4 0000000000000050
0000000000000050 ffff88007cc01a00 ffff880078b69ad8 ffffffff8121449e
Call Trace:
[<ffffffff817e82e0>] dump_stack+0x4d/0x66
[<ffffffff810cf1a4>] __might_sleep+0x184/0x240
[<ffffffff8121449e>] kmem_cache_alloc_trace+0x4e/0x330
[<ffffffffa0331124>] ? nfs4_release_lockowner+0x74/0x110 [nfsv4]
[<ffffffffa0331124>] nfs4_release_lockowner+0x74/0x110 [nfsv4]
[<ffffffffa0352340>] nfs4_put_lock_state+0x90/0xb0 [nfsv4]
[<ffffffffa0352375>] nfs4_fl_release_lock+0x15/0x20 [nfsv4]
[<ffffffff81297515>] locks_free_lock+0x45/0x90
[<ffffffff8129996c>] flock_lock_file_wait+0x11c/0x3a0
[<ffffffffa033da6a>] ? nfs4_proc_lock+0x26a/0x910 [nfsv4]
[<ffffffffa033301e>] do_vfs_lock+0x1e/0x30 [nfsv4]
[<ffffffffa033da79>] nfs4_proc_lock+0x279/0x910 [nfsv4]
[<ffffffff810dbb26>] ? local_clock+0x16/0x30
[<ffffffff810f5a3f>] ? lock_release_holdtime.part.28+0xf/0x200
[<ffffffffa02f820c>] do_unlk+0x8c/0xc0 [nfs]
[<ffffffffa02f85c5>] nfs_flock+0xa5/0xf0 [nfs]
[<ffffffff8129a6f6>] locks_remove_file+0xb6/0x1e0
[<ffffffff812159d8>] ? kfree+0xd8/0x2d0
[<ffffffff8123bc63>] __fput+0xd3/0x210
[<ffffffff8123bdee>] ____fput+0xe/0x10
[<ffffffff810bfb6d>] task_work_run+0xcd/0xf0
[<ffffffff81019cd1>] do_notify_resume+0x61/0x90
[<ffffffff817fbea2>] int_signal+0x12/0x17
The problem is that NFSv4 is trying to do an allocation from
fl_release_private (in order to send a RELEASE_LOCKOWNER call). That
function can be called while holding the inode->i_lock, and it's
currently set up to do __GFP_WAIT allocations. v4.1 code has a
similar problem.
This patch adds a work_struct to the nfs4_lock_state and has the code
queue the free_lock_state operation to nfsiod.
Reported-by: Josh Stone <jistone@redhat.com>
Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Do the following set of ops with a file on a NFSv4 mount:
exec 3>>/file/on/nfsv4
flock -x 3
exec 3>&-
You'll see the LOCK request go across the wire, but no LOCKU when the
file is closed.
What happens is that the fd is passed across a fork, and the final close
is done in a different process than the opener. That makes
__nfs4_find_lock_state miss finding the correct lock state because it
uses the fl_pid as a search key. A new one is created, and the locking
code treats it as a delegation stateid (because NFS_LOCK_INITIALIZED
isn't set).
The root cause of this breakage seems to be commit 77041ed9b49a9e
(NFSv4: Ensure the lockowners are labelled using the fl_owner and/or
fl_pid).
That changed it so that flock lockowners are allocated based on the
fl_pid. I think this is incorrect. flock locks should be "owned" by the
struct file, and that is already accounted for in the fl_owner field of
the lock request when it comes through nfs_flock.
This patch basically reverts the above commit and with it, a LOCKU is
sent in the above reproducer.
Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Pull NFS client updates from Trond Myklebust:
"Highlights include:
- massive cleanup of the NFS read/write code by Anna and Dros
- support multiple NFS read/write requests per page in order to deal
with non-page aligned pNFS striping. Also cleans up the r/wsize <
page size code nicely.
- stable fix for ensuring inode is declared uptodate only after all
the attributes have been checked.
- stable fix for a kernel Oops when remounting
- NFS over RDMA client fixes
- move the pNFS files layout driver into its own subdirectory"
* tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
NFS: populate ->net in mount data when remounting
pnfs: fix lockup caused by pnfs_generic_pg_test
NFSv4.1: Fix typo in dprintk
NFSv4.1: Comment is now wrong and redundant to code
NFS: Use raw_write_seqcount_begin/end int nfs4_reclaim_open_state
xprtrdma: Disconnect on registration failure
xprtrdma: Remove BUG_ON() call sites
xprtrdma: Avoid deadlock when credit window is reset
SUNRPC: Move congestion window constants to header file
xprtrdma: Reset connection timeout after successful reconnect
xprtrdma: Use macros for reconnection timeout constants
xprtrdma: Allocate missing pagelist
xprtrdma: Remove Tavor MTU setting
xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting
xprtrdma: Reduce the number of hardway buffer allocations
xprtrdma: Limit work done by completion handler
xprtrmda: Reduce calls to ib_poll_cq() in completion handlers
xprtrmda: Reduce lock contention in completion handlers
xprtrdma: Split the completion queue
xprtrdma: Make rpcrdma_ep_destroy() return void
...
|
|
The addition of lockdep code to write_seqcount_begin/end has lead to
a bunch of false positive claims of ABBA deadlocks with the so_lock
spinlock. Audits show that this simply cannot happen because the
read side code does not spin while holding so_lock.
Cc: <stable@vger.kernel.org> # 3.13.x
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Mostly scripted conversion of the smp_mb__* barriers.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Tested-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
|
|
In commit 5521abfdcf4d6 (NFSv4: Resend the READ/WRITE RPC call
if a stateid change causes an error), we overloaded the return value of
nfs4_select_rw_stateid() to cause it to return -EWOULDBLOCK if an RPC
call is outstanding that would cause the NFSv4 lock or open stateid
to change.
That is all redundant when we actually copy the stateid used in the
read/write RPC call that failed, and check that against the current
stateid. It is doubly so, when we consider that in the NFSv4.1 case,
we also set the stateid's seqid to the special value '0', which means
'match the current valid stateid'.
Reported-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1393954269-3974-1-git-send-email-andros@netapp.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
RFC3530 and RFC5661 both prescribe that the 'opaque' field of the
open stateid returned by new OPEN/OPEN_DOWNGRADE/CLOSE calls for
the same file and open owner should match.
If this is not the case, assume that the open state has been lost,
and that we need to recover it.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Do not return an error when nfs4_copy_delegation_stateid succeeds.
Signed-off-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1392737765-41942-1-git-send-email-andros@netapp.com
Fixes: ef1820f9be27b (NFSv4: Don't try to recover NFSv4 locks when...)
Cc: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Pull NFS client bugfixes:
- Stable fix for data corruption when retransmitting O_DIRECT writes
- Stable fix for a deep recursion/stack overflow bug in rpc_release_client
- Stable fix for infinite looping when mounting a NFSv4.x volume
- Fix a typo in the nfs mount option parser
- Allow pNFS layouts to be compiled into the kernel when NFSv4.1 is
* tag 'nfs-for-3.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: fix pnfs Kconfig defaults
NFS: correctly report misuse of "migration" mount option.
nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once
SUNRPC: Avoid deep recursion in rpc_release_client
SUNRPC: Fix a data corruption issue when retransmitting RPC calls
|
|
Use this new function to make code more comprehensible, since we are
reinitialzing the completion, not initializing.
[akpm@linux-foundation.org: linux-next resyncs]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org> (personally at LCE13)
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently, when we try to mount and get back NFS4ERR_CLID_IN_USE or
NFS4ERR_WRONGSEC, we create a new rpc_clnt and then try the call again.
There is no guarantee that doing so will work however, so we can end up
retrying the call in an infinite loop.
Worse yet, we create the new client using rpc_clone_client_set_auth,
which creates the new client as a child of the old one. Thus, we can end
up with a *very* long lineage of rpc_clnts. When we go to put all of the
references to them, we can end up with a long call chain that can smash
the stack as each rpc_free_client() call can recurse back into itself.
This patch fixes this by simply ensuring that the SETCLIENTID call will
only be retried in this situation if the last attempt did not use
RPC_AUTH_UNIX.
Note too that with this change, we don't need the (i > 2) check in the
-EACCES case since we now have a more reliable test as to whether we
should reattempt.
Cc: stable@vger.kernel.org # v3.10+
Cc: Chuck Lever <chuck.lever@oracle.com>
Tested-by/Acked-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
commit 6686390bab6a0e0 (NFS: remove incorrect "Lock reclaim failed!"
warning.) added a test for a delegation before checking to see if any
reclaimed locks failed. The test however is backward and is only doing
that check when a delegation is held instead of when one isn't.
Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Fixes: 6686390bab6a: NFS: remove incorrect "Lock reclaim failed!" warning.
Cc: stable@vger.kernel.org # 3.12
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
In nfs4_wait_clnt_recover(), hold a reference to the clp being
waited on. The state manager can reduce clp->cl_count to 1, in
which case the nfs_put_client() in nfs4_run_state_manager() can
free *clp before wait_on_bit() returns and allows
nfs4_wait_clnt_recover() to run again.
The behavior at that point is non-deterministic. If the waited-on
bit still happens to be zero, wait_on_bit() will wake the waiter as
expected. If the bit is set again (say, if the memory was poisoned
when freed) wait_on_bit() can leave the waiter asleep.
This is a narrow fix which ensures the safety of accessing *clp in
nfs4_wait_clnt_recover(), but does not address the continued use
of a possibly freed *clp after nfs4_wait_clnt_recover() returns
(see nfs_end_delegation_return(), for example).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
With the advent of NFSv4 sessions in NFSv4.1 and following, a "lease
moved" condition is reported differently than it is in NFSv4.0.
NFSv4 minor version 0 servers return an error status code,
NFS4ERR_LEASE_MOVED, to signal that a lease has moved. This error
causes the whole compound operation to fail. Normal compounds
against this server continue to fail until the client performs
migration recovery on the migrated share.
Minor version 1 and later servers assert a bit flag in the reply to
a compound's SEQUENCE operation to signal LEASE_MOVED. This is not
a fatal condition: operations against this server continue normally.
The server asserts this flag until the client performs migration
recovery on the migrated share.
Note that servers MUST NOT return NFS4ERR_LEASE_MOVED to NFSv4
clients not using NFSv4.0.
After the server asserts any of the sr_status_flags in the SEQUENCE
operation in a typical compound, our client initiates standard lease
recovery. For NFSv4.1+, a stand-alone SEQUENCE operation is
performed to discover what recovery is needed.
If SEQ4_STATUS_LEASE_MOVED is asserted in this stand-alone SEQUENCE
operation, our client attempts to discover which FSIDs have been
migrated, and then performs migration recovery on each.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
A migration on the FSID in play for the current NFS operation
is reported via the error status code NFS4ERR_MOVED.
"Lease moved" means that a migration has occurred on some other
FSID than the one for the current operation. It's a signal that
the client should take action immediately to handle a migration
that it may not have noticed otherwise. This is so that the
client's lease does not expire unnoticed on the destination server.
In NFSv4.0, a moved lease is reported with the NFS4ERR_LEASE_MOVED
error status code.
To recover from NFS4ERR_LEASE_MOVED, check each FSID for that server
to see if it is still present. Invoke nfs4_try_migration() if the
FSID is no longer present on the server.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Migration recovery and state recovery must be serialized, so handle
both in the state manager thread.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
As of commit 5d422301f97b821301efcdb6fc9d1a83a5c102d6 we no longer zero the
state.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|