Age | Commit message (Collapse) | Author |
|
Prevent duplicate 'mds0 caps stale' message from spamming the console every
few seconds while the MDS restarts. Set s_renew_requested earlier, so that
we only print the message once, even if we don't send an actual request.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
The incremental map decoding of pg pool updates wasn't skipping
the snaps and removed_snaps vectors. This caused osd requests
to stall when pool snapshots were created or fs snapshots were
deleted. Use a common helper for full and incremental map
decoders that decodes pools properly.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
The wait_unsafe_requests() helper dropped the mdsc mutex to wait
for each request to complete, and then examined r_node to get the
next request after retaking the lock. But the request completion
removes the request from the tree, so r_node was always undefined
at this point. Since it's a small race, it usually led to a
valid request, but not always. The result was an occasional
crash in rb_next() while dereferencing node->rb_left.
Fix this by clearing the rb_node when removing the request from
the request tree, and not walking off into the weeds when we
are done waiting for a request. Since the request we waited on
will _always_ be out of the request tree, take a ref on the next
request, in the hopes that it won't be. But if it is, it's ok:
we can start over from the beginning (and traverse over older read
requests again).
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
We were releasing used caps (e.g. FILE_CACHE) from encode_inode_release
with MDS requests (e.g. setattr). We don't carry refs on most caps, so
this code worked most of the time, but for setattr (utimes) we try to
drop Fscr.
This causes cap state to get slightly out of sync with reality, and may
result in subsequent mds revoke messages getting ignored.
Fix by only releasing unused caps.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
Drop session mutex unconditionally in handle_cap_grant, and do the
check_caps from the handle_cap_grant helper. This avoids using a magic
return value.
Also avoid using a flag variable in the IMPORT case and call
check_caps at the appropriate point.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
Passing a session pointer to ceph_check_caps() used to mean it would leave
the session mutex locked. That wasn't always possible if it wasn't passed
CHECK_CAPS_AUTHONLY. If could unlock the passed session and lock a
differet session mutex, which was clearly wrong, and also emitted a
warning when it a racing CPU retook it and we did an unlock from the wrong
context.
This was only a problem when there was more than one MDS.
First, make ceph_check_caps unconditionally drop the session mutex, so that
it is free to lock other sessions as needed. Then adjust the one caller
that passes in a session (handle_cap_grant) accordingly.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
If we don't have the exported cap it's because we already released it. No
need to WARN.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
This causes an oops when debug output is enabled and we kick
an osd request with no current r_osd (sometime after an osd
failure). Check the pointer before dereferencing.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
Previously we would decode state directly into our current ticket_handler.
This is problematic if for some reason we fail to decode, because we end
up with half new state and half old state.
We are probably already in bad shape if we get an update we can't decode,
but we may as well be tidy anyway. Decode into new_* temporaries and
update the ticket_handler only on success.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
It seems clear from the surrounding code that xpermits is allowed to be
NULL here.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The reply parsing code attempts to decode the GETATTR response even if
the DELEGRETURN portion of the compound returned an error. The GETATTR
response won't actually exist if that's the case and we're asking the
parser to read past the end of the response.
This bug is fairly benign. The parser catches this without reading past
the end of the response and decode_getfattr returns -EIO. Earlier
kernels however had decode_op_hdr using the READ_BUF macro, and this
bug would make this printk pop any time the client got an error from
a delegreturn:
kernel: decode_op_hdr: reply buffer overflowed in line XXXX
More recent kernels seem to have replaced this printk with a dprintk.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Andreas Beckmann gave me a report that nilfs logged the following
warnings when it got a disk full:
nilfs_sufile_do_cancel_free: segment 0 must be clean
nilfs_sufile_do_cancel_free: segment 1 must be clean
These arise from a duplicate call to nilfs_segctor_cancel_freev in an
error path of log writer. This will fix the issue.
Reported-by: Andreas Beckmann <debian@abeckmann.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
Release the old ticket_blob buffer when we get an updated service ticket
from the monitor. Previously these were getting leaked.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
The buffer size was incorrectly calculated for the ceph_x_encrypt()
encapsulated ticket blob. Use a helper (with correct arithmetic) and
BUG out if we were wrong.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
We were failing to reconnect to services due to an old authenticator, even
though we had the new ticket, because we weren't properly retrying the
connect handshake, because we were calling an old/incorrect helper that
left in_base_pos incorrect. The result was a failure to reconnect to the
OSD or MDS (with an authentication error) if the MDS restarted after the
service had been up a few hours (long enough for the original authenticator
to be invalid). This was only a problem if the AUTH_X authentication was
enabled.
Now that the 'negotiate' and 'connect' stages are fully separated, we
should use the prepare_read_connect() helper instead, and remove the
obsolete one.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
When an inode was dropped while being migrated between two MDSs,
i_cap_exporting_issued was non-zero such that issue caps were non-zero and
__ceph_is_any_caps(ci) was true. This prevented the inode from being
removed from the snap realm, even as it was dropped from the cache.
Fix this by dropping any residual i_snap_realm ref in destroy_inode.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
All ci->i_snap_realm_item/realm->inodes_with_caps manipulation should be
protected by realm->inodes_with_caps_lock. This bug would have only bit
us in a rare race with a realm split (during some snap creations).
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
Added assertion, and cleared one case where the implemented caps were
not following the issued caps.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
You can't store a pointer that you haven't filled in yet and expect it
to work.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
|
When replacing a xattr's value, in some case we wipe its name/value
first and then re-add it. The wipe is done by
ocfs2_xa_block_wipe_namevalue() when the xattr is in the inode or
block. We currently adjust name_offset for all the entries which have
(offset < name_offset). This does not adjust the entrie we're replacing.
Since we are replacing the entry, we don't adjust the total entry count.
When we calculate a new namevalue location, we trust the entries
now-wrong offset in ocfs2_xa_get_free_start(). The solution is to
also adjust the name_offset for the replaced entry, allowing
ocfs2_xa_get_free_start() to calculate the new namevalue location
correctly.
The following script can trigger a kernel panic easily.
echo 'y'|mkfs.ocfs2 --fs-features=local,xattr -b 4K $DEVICE
mount -t ocfs2 $DEVICE $MNT_DIR
FILE=$MNT_DIR/$RANDOM
for((i=0;i<76;i++))
do
string_76="a$string_76"
done
string_78="aa$string_76"
string_82="aaaa$string_78"
touch $FILE
setfattr -n 'user.test1234567890' -v $string_76 $FILE
setfattr -n 'user.test1234567890' -v $string_78 $FILE
setfattr -n 'user.test1234567890' -v $string_82 $FILE
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
|
We should not attempt to free the page if __GFP_FS is not set. Otherwise we
can deadlock as per
http://bugzilla.kernel.org/show_bug.cgi?id=15578
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (205 commits)
ceph: update for write_inode API change
ceph: reset osd after relevant messages timed out
ceph: fix flush_dirty_caps race with caps migration
ceph: include migrating caps in issued set
ceph: fix osdmap decoding when pools include (removed) snaps
ceph: return EBADF if waiting for caps on closed file
ceph: set osd request message front length correctly
ceph: reset front len on return to msgpool; BUG on mismatched front iov
ceph: fix snaptrace decoding on cap migration between mds
ceph: use single osd op reply msg
ceph: reset bits on connection close
ceph: remove bogus mds forward warning
ceph: remove fragile __map_osds optimization
ceph: fix connection fault STANDBY check
ceph: invalidate_authorizer without con->mutex held
ceph: don't clobber write return value when using O_SYNC
ceph: fix client_request_forward decoding
ceph: drop messages on unregistered mds sessions; cleanup
ceph: fix comments, locking in destroy_inode
ceph: move dereference after NULL test
...
Fix trivial conflicts in Documentation/ioctl/ioctl-number.txt
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: trivial white space
[CIFS] checkpatch cleanup
cifs: add cifs_revalidate_file
cifs: add a CIFSSMBUnixQFileInfo function
cifs: add a CIFSSMBQFileInfo function
cifs: overhaul cifs_revalidate and rename to cifs_revalidate_dentry
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (30 commits)
Btrfs: fix the inode ref searches done by btrfs_search_path_in_tree
Btrfs: allow treeid==0 in the inode lookup ioctl
Btrfs: return keys for large items to the search ioctl
Btrfs: fix key checks and advance in the search ioctl
Btrfs: buffer results in the space_info ioctl
Btrfs: use __u64 types in ioctl.h
Btrfs: fix search_ioctl key advance
Btrfs: fix gfp flags masking in the compression code
Btrfs: don't look at bio flags after submit_bio
btrfs: using btrfs_stack_device_id() get devid
btrfs: use memparse
Btrfs: add a "df" ioctl for btrfs
Btrfs: cache the extent state everywhere we possibly can V2
Btrfs: cache ordered extent when completing io
Btrfs: cache extent state in find_delalloc_range
Btrfs: change the ordered tree to use a spinlock instead of a mutex
Btrfs: finish read pages in the order they are submitted
btrfs: fix btrfs_mkdir goto for no free objectids
Btrfs: flush data on snapshot creation
Btrfs: make df be a little bit more understandable
...
|
|
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: ensure bdi_unregister is called on mount failure.
NFS: Avoid a deadlock in nfs_release_page
NFSv4: Don't ignore the NFS_INO_REVAL_FORCED flag in nfs_revalidate_inode()
nfs4: Make the v4 callback service hidden
nfs: fix unlikely memory leak
rpc client can not deal with ENOSOCK, so translate it into ENOCONN
|
|
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: don't warn about page discards on shutdown
xfs: use scalable vmap API
xfs: remove old vmap cache
|
|
What we were doing before was to ask for the current window size as the
maximum allocation. This had the effect of limiting the amount of allocation
we could get for the local alloc during times when the window size was
shrunk due to fragmentation. In some cases, that could actually *increase*
fragmentation by artificially limiting the number of bits we can accept. So
while we still want to ask for a minimum number of bits equal to window
size, there is no reason why we should limit the number of bits the local
alloc should accept. Hence always allow the maximum number of local alloc
bits.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
|
This is used by the inode lookup ioctl to follow all the backrefs up
to the subvol root. But the search being done would sometimes land one
past the last item in the leaf instead of finding the backref.
This changes the search to look for the highest possible backref and hop
back one item. It also fixes a leaked path on failure to find the root.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
When a root id of 0 is sent to the inode lookup ioctl, it will
use the root of the file we're ioctling and pass the root id
back to userland along with the results.
This allows userland to do searches based on that root later on.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
The search ioctl was skipping large items entirely (ones that are too
big for the results buffer). This changes things to at least copy
the item header so that we can send information about the item back to
userland.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
The search ioctl was working well for finding tree roots, but using it for
generic searches requires a few changes to how the keys are advanced.
This treats the search control min fields for objectid, type and offset
more like a key, where we drop the offset to zero once we bump the type,
etc.
The downside of this is that we are changing the min_type and min_offset
fields during the search, and so the ioctl caller needs extra checks to make sure
the keys in the result are the ones it wanted.
This also changes key_in_sk to use btrfs_comp_cpu_keys, just to make
things more readable.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Use bitmap_weight() instead of doing hweight32() for each u32 element in
the page.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
jffs2 uses rb_node = NULL; to zero rb_root.
The problem with this is that 17d9ddc72fb8bba0d4f678 ("rbtree: Add
support for augmented rbtrees") in the linux-next tree adds a new field
to that struct which needs to be NULL as well. This patch uses RB_ROOT
as the intializer so all of the relevant fields will be NULL'd.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Eric Paris <eparis@redhat.com>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ocfs2_set_acl() and ocfs2_init_acl() were setting i_mode on the in-memory
inode, but never setting it on the disk copy. Thus, acls were some times not
getting propagated between nodes. This patch fixes the issue by adding a
helper function ocfs2_acl_set_mode() which does this the right way.
ocfs2_set_acl() and ocfs2_init_acl() are then updated to call
ocfs2_acl_set_mode().
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
|
In reflink, we need to upate i_blocks for the target inode.
Reported-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
|
In ocfs2_validate_gd_parent, we check bg_chain against the
cl_next_free_rec of the dinode. Actually in resize, we have
the chance of bg_chain == cl_next_free_rec. So add some
additional condition check for it.
I also rename paramter "clean_error" to "resize", since the
old one is not clearly enough to indicate that we should only
meet with this case in resize.
btw, the correpsonding bug is
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1230.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
|
ocfs2_lock() will skip locks on file which has mode set to 02666. This
is a problem in cases where the mode of the file is changed after a
process has obtained a lock on the file.
ocfs2_lock() should skip the check for mandatory locks when unlocking a
file.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
|
If we are doing a forced shutdown, we can get lots of noise about
delalloc pages being discarded. This is happens by design during a
forced shutdown, so don't spam the logs with these messages.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
Re-apply a commit that had been reverted due to regressions
that have since been fixed.
From 95f8e302c04c0b0c6de35ab399a5551605eeb006 Mon Sep 17 00:00:00 2001
From: Nick Piggin <npiggin@suse.de>
Date: Tue, 6 Jan 2009 14:43:09 +1100
Implement XFS's large buffer support with the new vmap APIs. See the vmap
rewrite (db64fe02) for some numbers. The biggest improvement that comes from
using the new APIs is avoiding the global KVA allocation lock on every call.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Only modifications here were a minor reformat, plus making the patch
apply given the new use of xfs_buf_is_vmapped().
Modified-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
Re-apply a commit that had been reverted due to regressions
that have since been fixed.
Original commit: d2859751cd0bf586941ffa7308635a293f943c17
Author: Nick Piggin <npiggin@suse.de>
Date: Tue, 6 Jan 2009 14:40:44 +1100
XFS's vmap batching simply defers a number (up to 64) of vunmaps,
and keeps track of them in a list. To purge the batch, it just goes
through the list and calls vunamp on each one. This is pretty poor:
a global TLB flush is generally still performed on each vunmap, with
the most expensive parts of the operation being the broadcast IPIs
and locking involved in the SMP callouts, and the locking involved
in the vmap management -- none of these are avoided by just batching
up the calls. I'm actually surprised it ever made much difference.
(Now that the lazy vmap allocator is upstream, this description is
not quite right, but the vunmap batching still doesn't seem to do
much).
Rip all this logic out of XFS completely. I will improve vmap
performance and scalability directly in subsequent patch.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
The only change I made was to use the "new" xfs_buf_is_vmapped()
function in a place it had been open-coded in the original.
Modified-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
The space_info ioctl was using copy_to_user inside rcu_read_lock. This
commit changes things to copy into a buffer first and then dump the
result down to userland.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
key->type is u8, not u64.
fs/btrfs/ioctl.c: In function 'copy_to_sk':
fs/btrfs/ioctl.c:1024: warning: comparison is always true due to limited range of data type
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
bdi_unregister is called by nfs_put_super which is only called by
generic_shutdown_super if ->s_root is not NULL. So if we error out
in a circumstance where we called nfs_bdi_register (i.e. server !=
NULL) but have not set s_root, then we need to call bdi_unregister
explicitly in nfs_get_sb and various other *_get_sb() functions.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
I fixed the indent level.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
|
|
GFP_FS must be masked out, NOFS can't be or'd in.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
After callling submit_bio, the bio can be freed at any time. The
btrfs submission thread helper was checking the bio flags too late,
which might not give the correct answer.
When CONFIG_DEBUG_PAGE_ALLOC is turned on, it can lead to oopsen.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
We can use btrfs_stack_device_id() to get dev_item->devid
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Use memparse() instead of its own private implementation.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
df is a very loaded question in btrfs. This gives us a way to get the per-space
usage information so we can tell exactly what is in use where. This will help
us figure out ENOSPC problems, and help users better understand where their disk
space is going.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|