Age | Commit message (Collapse) | Author |
|
The object and block layouts already exist in their own
subdirectories. This patch completes the set!
Note that as a layout denotes nfs4 already, I stripped
that prefix out of the file names.
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
Acked-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Return the NULL pointer when the allocation fails.
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Return the NULL pointer when the allocation fails.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: <stable@vger.kernel.org> # 3.5.x
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Those flags are obsolete and checking them can incorrectly cause
remount operations to fail.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Place the call to resend the failed GETATTR under the error handler so that
when appropriate, the GETATTR is retried more than once.
The server can fail the GETATTR op in the OPEN compound with a recoverable
error such as NFS4ERR_DELAY. In the case of an O_EXCL open, the server has
created the file, so a retrans of the OPEN call will fail with NFS4ERR_EXIST.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
We cannot allow nfs_page_group_lock to use TASK_KILLABLE here, since
the loop would cause a busy wait if somebody kills the task.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Handle the case where nfs_create_request() returns an error.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
nfs_read_completion relied on the fact that there was a 1:1 mapping
of page to nfs_request, but this has now changed.
Regions not covered by a request have already been zeroed elsewhere.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Use the new pg_test interface to adjust requests to fit in the current
stripe / segment.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Remove alignment checks that would revert to MDS and change pg_test
to return the max ammount left in the segment (or other pg_test call)
up to size of passed request, or 0 if no space is left.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Support direct requests that span multiple pnfs data servers by
comparing nfs_pgio_header->verf to a cached verf in pnfs_commit_bucket.
Continue to use dreq->verf if the MDS is used / non-pNFS.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Since the ability to split pages into subpage requests has been added,
nfs_pgio_header->rpc_list only ever has one pgio data.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Use the newly added support for multiple requests per page for
rsize/wsize < PAGE_SIZE, instead of having multiple read / write
data structures per pageio header.
This allows us to get rid of nfs_pgio_multi.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Now that pg_test can change the size of the request (by returning a non-zero
size smaller than the request), pg_test functions that call other
pg_test functions must return the minimum of the result - or 0 if any fail.
Also clean up the logic of some pg_test functions so that all checks are
for contitions where coalescing is not possible.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Remove check that the request covers a whole page.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Remove unneeded else statement and clean up how commit info
dataserver buckets are replaced.
Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Change how nfs_mark_uptodate checks to see if writes cover a whole page.
This patch should have no effect yet since all page groups currently
have one request, but will come into play when pg_test functions are
modified to split pages into sub-page regions.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Operations that modify state for a whole page must be syncronized across
all requests within a page group. In the write path, this is calling
end_page_writeback and removing the head request from an inode.
Both of these operations should not be called until all requests
in a page group have reached the point where they would call them.
This patch should have no effect yet since all page groups currently
have one request, but will come into play when pg_test functions are
modified to split pages into sub-page regions.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Operations that modify state for a whole page must be syncronized across
all requests within a page group. In the read path, this is calling
unlock_page and SetPageUptodate. Both of these functions should not be
called until all requests in a page group have reached the point where
they would call them.
This patch should have no effect yet since all page groups currently
have one request, but will come into play when pg_test functions are
modified to split pages into sub-page regions.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Add "page groups" - a circular list of nfs requests (struct nfs_page)
that all reference the same page. This gives nfs read and write paths
the ability to account for sub-page regions independently. This
somewhat follows the design of struct buffer_head's sub-page
accounting.
Only "head" requests are ever added/removed from the inode list in
the buffered write path. "head" and "sub" requests are treated the
same through the read path and the rest of the write/commit path.
Requests are given an extra reference across the life of the list.
Page groups are never rejoined after being split. If the read/write
request fails and the client falls back to another path (ie revert
to MDS in PNFS case), the already split requests are pushed through
the recoalescing code again, which may split them further and then
coalesce them into properly sized requests on the wire. Fragmentation
shouldn't be a problem with the current design, because we flush all
requests in page group when a non-contiguous request is added, so
the only time resplitting should occur is on a resend of a read or
write.
This patch lays the groundwork for sub-page splitting, but does not
actually do any splitting. For now all page groups have one request
as pg_test functions don't yet split pages. There are several related
patches that are needed support multiple requests per page group.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Call nfs_can_coalesce_requests for every request, even the first one.
This is needed for future patches to give pg_test a way to inform
add_request to reduce the size of the request.
Now @prev can be null in nfs_can_coalesce_requests and pg_test functions.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
This is a step toward allowing pg_test to inform the the
coalescing code to reduce the size of requests so they may fit in
whatever scheme the pg_test callback wants to define.
For now, just return the size of the request if there is space, or 0
if there is not. This shouldn't change any behavior as it acts
the same as when the pg_test functions returned bool.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
@inode is passed but not used.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Hold the lock while modifying commit info dataserver buckets.
The following oops can be reproduced by running iozone for a while against
a 2 DS pynfs filelayout server.
general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 nfs fscache
CPU: 0 PID: 903 Comm: iozone Not tainted 3.15.0-rc1-branch-dros_testing+ #44
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference
task: ffff880078164480 ti: ffff88006e972000 task.ti: ffff88006e972000
RIP: 0010:[<ffffffffa01936e1>] [<ffffffffa01936e1>] nfs_init_commit+0x22/0x
RSP: 0018:ffff88006e973d30 EFLAGS: 00010246
RAX: ffff88006e973e00 RBX: ffff88006e828800 RCX: ffff88006e973e10
RDX: 0000000000000000 RSI: ffff88006e973e00 RDI: dead4ead00000000
RBP: ffff88006e973d38 R08: ffff88006e8289d8 R09: 0000000000000000
R10: ffff88006e8289d8 R11: 0000000000016988 R12: ffff88006e973b98
R13: ffff88007a0a6648 R14: ffff88006e973e10 R15: ffff88006e828800
FS: 00007f2ce396b740(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f03278a1000 CR3: 0000000079043000 CR4: 00000000001407f0
Stack:
ffff88006e8289d8 ffff88006e973da8 ffffffffa00f144f ffff88006e9478c0
ffff88006e973e00 ffff88006de21080 0000000100000002 ffff880079be6c48
ffff88006e973d70 ffff88006e973d70 ffff88006e973e10 ffff88006de21080
Call Trace:
[<ffffffffa00f144f>] filelayout_commit_pagelist+0x1ae/0x34a [nfs_layout_nfsv
[<ffffffffa0194f72>] nfs_generic_commit_list+0x92/0xc4 [nfs]
[<ffffffffa0195053>] nfs_commit_inode+0xaf/0x114 [nfs]
[<ffffffffa01892bd>] nfs_file_fsync_commit+0x82/0xbe [nfs]
[<ffffffffa01ceb0d>] nfs4_file_fsync+0x59/0x9b [nfsv4]
[<ffffffff8114ee3c>] vfs_fsync_range+0x18/0x20
[<ffffffff8114ee60>] vfs_fsync+0x1c/0x1e
[<ffffffffa01891c2>] nfs_file_flush+0x7f/0x84 [nfs]
[<ffffffff81127a43>] filp_close+0x3c/0x72
[<ffffffff81140e12>] __close_fd+0x82/0x9a
[<ffffffff81127a9c>] SyS_close+0x23/0x4c
[<ffffffff814acd12>] system_call_fastpath+0x16/0x1b
Code: 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8
RIP [<ffffffffa01936e1>] nfs_init_commit+0x22/0xe1 [nfs]
RSP <ffff88006e973d30>
---[ end trace 732fe6419b235e2f ]---
Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
At this point the read and write structures look identical, so combine
them into something shared by both.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
What we have here is two functions that look identical. Let's share
some more code!
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Once again, these two functions look identical in the read and write
case. Time to combine them together!
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Most of this code is the same for both the read and write paths, so
combine everything and use the rw_ops when necessary.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
These functions are almost identical on both the read and write side.
FLUSH_COND_STABLE will never be set for the read path, so leaving it in
the generic code won't hurt anything.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
At this point, the read and write versions of this function look
identical so both should use the same function.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Write adds a little bit of code dealing with flush flags, but since
"how" will always be 0 when reading we can share the code.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The read and write paths set up this struct in exactly the same way, so
create a single shared struct.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Combining these functions will let me make a single nfs_rw_common_ops
struct (see the next patch).
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The read and write paths do exactly the same thing for the rpc_prepare
rpc_op. This patch combines them together into a single function.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
I create a new struct nfs_rw_ops to decide the differences between reads
and writes. This struct will be set when initializing a new
nfs_pgio_descriptor, and then passed on to the nfs_rw_header when a new
header is allocated.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
These functions are identical for the read and write paths so they can
be combined.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The header had a pointer to the verifier that was set from the old write
data struct. We don't need to keep the pointer around now that we have
shared structures.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The only difference is the write verifier field, but we can keep that
for a little bit longer.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
At this point, the only difference between nfs_read_data and
nfs_write_data is the write verifier.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Reads and writes have very similar results. This patch combines the two
structs together with comments to show where the differing fields are
used.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Reads and writes have very similar arguments. This patch combines them
together and documents the few fields used only by write.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The read_pageio_init method is just a very convoluted way to grab the
right nfs_pageio_ops vector. The vector to chose is not a choice of
protocol version, but just a pNFS vs MDS I/O choice that can simply be
done inside nfs_pageio_init_read based on the presence of a layout
driver, and a new force_mds flag to the special case of falling back
to MDS I/O on a pNFS-capable volume.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The write_pageio_init method is just a very convoluted way to grab the
right nfs_pageio_ops vector. The vector to chose is not a choice of
protocol version, but just a pNFS vs MDS I/O choice that can simply be
done inside nfs_pageio_init_write based on the presence of a layout
driver, and a new force_mds flag to the special case of falling back
to MDS I/O on a pNFS-capable volume.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
"fdatasync() is similar to fsync(), but does not flush modified metadata
unless that metadata is needed in order to allow a subsequent data
retrieval to be correctly handled."
We absolutely need to commit the layouts to be able to retrieve the data
in case either the client, the server or the storage subsystem go down.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If we suspect that the server may have cleared the suid/sgid bit,
then mark the inode for revalidation.
Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Fix a bug, whereby nfs_update_inode() was declaring the inode to be
up to date despite not having checked all the attributes.
The bug occurs because the temporary variable in which we cache
the validity information is 'sanitised' before reapplying to
nfsi->cache_validity.
Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Cc: stable@vger.kernel.org # 3.5+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
When double mounting same nfs filesystem, the devname saved in d_fsdata
will be lost.The second mount should not change the devname that
be saved in d_fsdata.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
On 32 bit, size_t is "unsigned int", not "unsigned long", causing the
following warning when comparing with PAGE_SIZE, which is always "unsigned
long":
fs/cifs/file.c: In function ‘cifs_readdata_to_iov’:
fs/cifs/file.c:2757: warning: comparison of distinct pointer types lacks a cast
Introduced by commit 7f25bba819a3 ("cifs_iovec_read: keep iov_iter
between the calls of cifs_readdata_to_iov()"), which changed the
signedness of "remaining" and the code from min_t() to min().
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull yet more networking updates from David Miller:
1) Various fixes to the new Redpine Signals wireless driver, from
Fariya Fatima.
2) L2TP PPP connect code takes PMTU from the wrong socket, fix from
Dmitry Petukhov.
3) UFO and TSO packets differ in whether they include the protocol
header in gso_size, account for that in skb_gso_transport_seglen().
From Florian Westphal.
4) If VLAN untagging fails, we double free the SKB in the bridging
output path. From Toshiaki Makita.
5) Several call sites of sk->sk_data_ready() were referencing an SKB
just added to the socket receive queue in order to calculate the
second argument via skb->len. This is dangerous because the moment
the skb is added to the receive queue it can be consumed in another
context and freed up.
It turns out also that none of the sk->sk_data_ready()
implementations even care about this second argument.
So just kill it off and thus fix all these use-after-free bugs as a
side effect.
6) Fix inverted test in tcp_v6_send_response(), from Lorenzo Colitti.
7) pktgen needs to do locking properly for LLTX devices, from Daniel
Borkmann.
8) xen-netfront driver initializes TX array entries in RX loop :-) From
Vincenzo Maffione.
9) After refactoring, some tunnel drivers allow a tunnel to be
configured on top itself. Fix from Nicolas Dichtel.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
vti: don't allow to add the same tunnel twice
gre: don't allow to add the same tunnel twice
drivers: net: xen-netfront: fix array initialization bug
pktgen: be friendly to LLTX devices
r8152: check RTL8152_UNPLUG
net: sun4i-emac: add promiscuous support
net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
net: ipv6: Fix oif in TCP SYN+ACK route lookup.
drivers: net: cpsw: enable interrupts after napi enable and clearing previous interrupts
drivers: net: cpsw: discard all packets received when interface is down
net: Fix use after free by removing length arg from sk_data_ready callbacks.
Drivers: net: hyperv: Address UDP checksum issues
Drivers: net: hyperv: Negotiate suitable ndis version for offload support
Drivers: net: hyperv: Allocate memory for all possible per-pecket information
bridge: Fix double free and memory leak around br_allowed_ingress
bonding: Remove debug_fs files when module init fails
i40evf: program RSS LUT correctly
i40evf: remove open-coded skb_cow_head
ixgb: remove open-coded skb_cow_head
igbvf: remove open-coded skb_cow_head
...
|
|
The vfs merge caused a latent bug to show up:
In file included from fs/ceph/super.h:4:0,
from fs/ceph/ioctl.c:3:
include/linux/ceph/ceph_debug.h:4:0: warning: "pr_fmt" redefined [enabled by default]
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
^
In file included from include/linux/kernel.h:13:0,
from include/linux/uio.h:12,
from include/linux/socket.h:7,
from include/uapi/linux/in.h:22,
from include/linux/in.h:23,
from fs/ceph/ioctl.c:1:
include/linux/printk.h:214:0: note: this is the location of the previous definition
#define pr_fmt(fmt) fmt
^
where the reason is that <linux/ceph_debug.h> is included much too late
for the "pr_fmt()" define.
The include of <linux/ceph_debug.h> needs to be the first include in the
file, but fs/ceph/ioctl.c had for some reason missed that, and it wasn't
noticeable until some unrelated header file changes brought in an
indirect earlier include of <linux/kernel.h>.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|