summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2010-02-03GFS2: Wait for unlock completion on umountSteven Whitehouse
This patch adds a wait on umount between the point at which we dispose of all glocks and the point at which we unmount the lock protocol. This ensures that we've received all the replies to our unlock requests before we stop the locking. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reported-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2010-02-02ocfs2: Remove overzealous BUG_ON during blocked lock processingSunil Mushran
During blocked lock processing, we should consider the possibility that the lock is no longer blocking. Joel Becker <joel.becker@oracle.com> assisted in fixing this issue. Reported-by: David Teigland <teigland@redhat.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02ocfs2: Do not downconvert if the lock level is already compatibleSunil Mushran
During upconvert, if the master were to send a BAST, dlmglue will detect the upconversion in process and send a cancel convert to the master. Upon receiving the AST for the cancel convert, it will re-process the lock resource to determine whether it needs downconverting. Say, the up was from PR to EX and the BAST was for EX. After the cancel convert, it will need to downconvert to NL. However, if the node was originally upconverting from NL to EX, then there would be no reason to downconvert (assuming the same message sequence). This patch makes dlmglue consider the possibility that the current lock level is already compatible and that downconverting is not required. Joel Becker <joel.becker@oracle.com> assisted in fixing this issue. Fixes ossbz#1178 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1178 Reported-by: Coly Li <coly.li@suse.de> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02ocfs2: Prevent a livelock in dlmglueSunil Mushran
There is possibility of a livelock in __ocfs2_cluster_lock(). If a node were to get an ast for an upconvert request, followed immediately by a bast, there is a small window where the fs may downconvert the lock before the process requesting the upconvert is able to take the lock. This patch adds a new flag to indicate that the upconvert is still in progress and that the dc thread should not downconvert it right now. Wengang Wang <wen.gang.wang@oracle.com> and Joel Becker <joel.becker@oracle.com> contributed heavily to this patch. Reported-by: David Teigland <teigland@redhat.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02ocfs2: Fix setting of OCFS2_LOCK_BLOCKED during bastWengang Wang
During bast, set the OCFS2_LOCK_BLOCKED flag only if the lock needs to downconverted. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Acked-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02ocfs2: Use compat_ptr in reflink_arguments.Tao Ma
Although we use u64 to pass userspace pointers to the kernel to avoid compat_ioctl, it doesn't work in some ppc platform. So wrap them with compat_ptr and add compat_ioctl. The detailed discussion about compat_ptr can be found in thread http://lkml.org/lkml/2009/10/27/423. We indeed met with a bug when testing on ppc(-EFAULT is returned when using old_path). This patch try to fix this. I have tested in ppc64(with 32 bit reflink) and x86_64(with i686 reflink), both works. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02ocfs2/dlm: Handle EAGAIN for compatibility - v2Sunil Mushran
Mainline commit aad1b15310b9bcd59fa81ab8f2b1513b59553ea8 made the dlm_begin_reco_handler() return -EAGAIN instead of EAGAIN. As this error is transmitted over the wire, we want the receiver, dlm_send_begin_reco_message(), to understand both the older EAGAIN and the newer -EAGAIN, to allow rolling upgrade of the cluster nodes. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02ocfs2: Add parenthesis to wrap the check for O_DIRECT.Tao Ma
Add parenthesis to wrap the check for O_DIRECT. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02ocfs2: Only bug out when page size is larger than cluster size.Tao Ma
In CoW, we have to make sure that the page is already written out to the disk. So we have a BUG_ON(PageDirty(page)). In ppc platform we have pagesize=64K, so if the cs=4K, if the file have fragmented clusters, we will map the page many times. See this file as an example. Tree Depth: 0 Count: 19 Next Free Rec: 14 ## Offset Clusters Block# Flags 0 0 4 2164864 0x2 Refcounted 1 4 2 9302792 0x2 Refcounted ... We have to replace the extent recs one by one, so the page with index 0 will be mapped and dirtied twice. I'd like to leave the BUG_ON there while adding a check so that in case we meet with an error in other platforms, we can find it easily. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02ocfs2: Fix memory overflow in cow_by_page.Tao Ma
In ocfs2_duplicate_clusters_by_page, we calculate map_end by shifting page_index. But actually in case we meet with a large offset(say in a i686 box, poff_t is only 32 bits and page_index=2056240), we will overflow. So change the type of page_index to loff_t. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02mm: flush dcache before writing into page to avoid aliasanfei zhou
The cache alias problem will happen if the changes of user shared mapping is not flushed before copying, then user and kernel mapping may be mapped into two different cache line, it is impossible to guarantee the coherence after iov_iter_copy_from_user_atomic. So the right steps should be: flush_dcache_page(page); kmap_atomic(page); write to page; kunmap_atomic(page); flush_dcache_page(page); More precisely, we might create two new APIs flush_dcache_user_page and flush_dcache_kern_page to replace the two flush_dcache_page accordingly. Here is a snippet tested on omap2430 with VIPT cache, and I think it is not ARM-specific: int val = 0x11111111; fd = open("abc", O_RDWR); addr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); *(addr+0) = 0x44444444; tmp = *(addr+0); *(addr+1) = 0x77777777; write(fd, &val, sizeof(int)); close(fd); The results are not always 0x11111111 0x77777777 at the beginning as expected. Sometimes we see 0x44444444 0x77777777. Signed-off-by: Anfei <anfei.zhou@gmail.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: <linux-arch@vger.kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-02Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: cfq-iosched: Do not idle on async queues blk-cgroup: Fix potential deadlock in blk-cgroup block: fix bugs in bio-integrity mempool usage block: fix bio_add_page for non trivial merge_bvec_fn case drbd: null dereference bug drbd: fix max_segment_size initialization
2010-02-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixesLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Use GFP_NOFS for alloc structure GFS2: Fix previous patch GFS2: Don't withdraw on partial rindex entries GFS2: Fix refcnt leak on gfs2_follow_link() error path
2010-02-02Fix 'flush_old_exec()/setup_new_exec()' splitLinus Torvalds
Commit 221af7f87b9 ("Split 'flush_old_exec' into two functions") split the function at the point of no return - ie right where there were no more error cases to check. That made sense from a technical standpoint, but when we then also combined it with the actual personality setting going in between flush_old_exec() and setup_new_exec(), it needs to be a bit more careful. In particular, we need to make sure that we really flush the old personality bits in the 'flush' stage, rather than later in the 'setup' stage, since otherwise we might be flushing the _new_ personality state that we're just setting up. So this moves the flags and personality flushing (and 'flush_thread()', which is the arch-specific function that generally resets lazy FP state etc) of the old process into flush_old_exec(), so that it doesn't affect any state that execve() is setting up for the new process environment. This was reported by Michal Simek as breaking his Microblaze qemu environment. Reported-and-tested-by: Michal Simek <michal.simek@petalogix.com> Cc: Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-01Merge branch 'reiserfs/kill-bkl' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing * 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: reiserfs: Fix vmalloc call under reiserfs lock
2010-02-01GFS2: Use GFP_NOFS for alloc structureSteven Whitehouse
This is called under a glock, so its a good plan to use GFP_NOFS Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-02-01GFS2: Fix previous patchSteven Whitehouse
The do_div() call needs to remain. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-02-01GFS2: Don't withdraw on partial rindex entriesBenjamin Marzinski
ince gfs2 writes the rindex file a block at a time, and releases the exclusive lock after each block, it is possible that another process will grab the lock in the middle of the write. Since rindex entries are not an even divisor of blocks, that other process may see partial entries. On grows, this is fine. The process can simply ignore the the partial entires. Previously, the code withdrew when it saw partial entries. Now it simply ignores them. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-01-31nilfs2: fix potential leak of dirty data on umountRyusuke Konishi
This fixes incorrect usage of nilfs_segctor_confirm() test function in nilfs_segctor_destroy(); nilfs_segctor_confirm() returns zero if the filesystem is not clean, so its use in nilfs_segctor_destroy() needs inversion. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-01-30block: fix bugs in bio-integrity mempool usageChuck Ebbert
Fix two bugs in the bio integrity code: use_bip_pool() always returns 0 because it checks against the wrong limit, causing the mempool to be used only when regular allocation fails. When the mempool is used as a fallback we don't free the data properly. Signed-Off-By: Chuck Ebbert <cebbert@redhat.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-01-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: check total number of devices when removing missing Btrfs: check return value of open_bdev_exclusive properly Btrfs: do not mark the chunk as readonly if in degraded mode Btrfs: run orphan cleanup on default fs root Btrfs: fix a memory leak in btrfs_init_acl Btrfs: Use correct values when updating inode i_size on fallocate Btrfs: remove tree_search() in extent_map.c Btrfs: Add mount -o compress-force
2010-01-29Split 'flush_old_exec' into two functionsLinus Torvalds
'flush_old_exec()' is the point of no return when doing an execve(), and it is pretty badly misnamed. It doesn't just flush the old executable environment, it also starts up the new one. Which is very inconvenient for things like setting up the new personality, because we want the new personality to affect the starting of the new environment, but at the same time we do _not_ want the new personality to take effect if flushing the old one fails. As a result, the x86-64 '32-bit' personality is actually done using this insane "I'm going to change the ABI, but I haven't done it yet" bit (TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the personality, but just the "pending" bit, so that "flush_thread()" can do the actual personality magic. This patch in no way changes any of that insanity, but it does split the 'flush_old_exec()' function up into a preparatory part that can fail (still called flush_old_exec()), and a new part that will actually set up the new exec environment (setup_new_exec()). All callers are changed to trivially comply with the new world order. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-28Btrfs: check total number of devices when removing missingJosef Bacik
If you have a disk failure in RAID1 and then add a new disk to the array, and then try to remove the missing volume, it will fail. The reason is the sanity check only looks at the total number of rw devices, which is just 2 because we have 2 good disks and 1 bad one. Instead check the total number of devices in the array to make sure we can actually remove the device. Tested this with a failed disk setup and with this test we can now run btrfs-vol -r missing /mount/point and it works fine. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: check return value of open_bdev_exclusive properlyJosef Bacik
Hit this problem while testing RAID1 failure stuff. open_bdev_exclusive returns ERR_PTR(), not NULL. So change the return value properly. This is important if you accidently specify a device that doesn't exist when trying to add a new device to an array, you will panic the box dereferencing bdev. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: do not mark the chunk as readonly if in degraded modeJosef Bacik
If a RAID setup has chunks that span multiple disks, and one of those disks has failed, btrfs_chunk_readonly will return 1 since one of the disks in that chunk's stripes is dead and therefore not writeable. So instead if we are in degraded mode, return 0 so we can go ahead and allocate stuff. Without this patch all of the block groups in a RAID1 setup will end up read-only, which will mean we can't add new disks to the array since we won't be able to make allocations. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: run orphan cleanup on default fs rootJosef Bacik
This patch revert's commit 6c090a11e1c403b727a6a8eff0b97d5fb9e95cb5 Since it introduces this problem where we can run orphan cleanup on a volume that can have orphan entries re-added. Instead of my original fix, Yan Zheng pointed out that we can just revert my original fix and then run the orphan cleanup in open_ctree after we look up the fs_root. I have tested this with all the tests that gave me problems and this patch fixes both problems. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: fix a memory leak in btrfs_init_aclYang Hongyang
In btrfs_init_acl() cloned acl is not released Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: Use correct values when updating inode i_size on fallocateAneesh Kumar K.V
commit f2bc9dd07e3424c4ec5f3949961fe053d47bc825 Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Date: Wed Jan 20 12:57:53 2010 +0530 Btrfs: Use correct values when updating inode i_size on fallocate Even though we allocate more, we should be updating inode i_size as per the arguments passed Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: remove tree_search() in extent_map.cMiao Xie
This patch removes tree_search() in extent_map.c because it is not called by anything. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: Add mount -o compress-forceChris Mason
The default btrfs mount -o compress mode will quickly back off compressing a file if it notices that compression does not reduce the size of the data being written. This can save considerable CPU because all future writes to the file go through uncompressed. But some files are both very large and have mixed data stored in them. In that case, we want to add the ability to always try compressing data before writing it. This commit adds mount -o compress-force. A later commit will add a new inode flag that does the same thing. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28block: fix bio_add_page for non trivial merge_bvec_fn caseDmitry Monakhov
We have to properly decrease bi_size in order to merge_bvec_fn return right result. Otherwise this result in false merge rejects for two absolutely valid bio_vecs. This may cause significant performance penalty for example fs_block_size == 1k and block device is raid0 with small chunk_size = 8k. Then it is impossible to merge 7-th fs-block in to bio which already has 6 fs-blocks. Cc: <stable@kernel.org> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-01-28reiserfs: Fix vmalloc call under reiserfs lockFrederic Weisbecker
Vmalloc is called to allocate journal->j_cnode_free_list but we hold the reiserfs lock at this time, which raises a {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} lock inversion. Just drop the reiserfs lock at this time, as it's not even needed but kept for paranoid reasons. This fixes: [ INFO: inconsistent lock state ] 2.6.33-rc5 #1 --------------------------------- inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. kswapd0/313 [HC0[0]:SC0[0]:HE1:SE1] takes: (&REISERFS_SB(s)->lock){+.+.?.}, at: [<c11118c8>] reiserfs_write_lock_once+0x28/0x50 {RECLAIM_FS-ON-W} state was registered at: [<c104ee32>] mark_held_locks+0x62/0x90 [<c104eefa>] lockdep_trace_alloc+0x9a/0xc0 [<c108f7b6>] kmem_cache_alloc+0x26/0xf0 [<c108621c>] __get_vm_area_node+0x6c/0xf0 [<c108690e>] __vmalloc_node+0x7e/0xa0 [<c1086aab>] vmalloc+0x2b/0x30 [<c110e1fb>] journal_init+0x6cb/0xa10 [<c10f90a2>] reiserfs_fill_super+0x342/0xb80 [<c1095665>] get_sb_bdev+0x145/0x180 [<c10f68e1>] get_super_block+0x21/0x30 [<c1094520>] vfs_kern_mount+0x40/0xd0 [<c1094609>] do_kern_mount+0x39/0xd0 [<c10aaa97>] do_mount+0x2c7/0x6d0 [<c10aaf06>] sys_mount+0x66/0xa0 [<c16198a7>] mount_block_root+0xc4/0x245 [<c1619a81>] mount_root+0x59/0x5f [<c1619b98>] prepare_namespace+0x111/0x14b [<c1619269>] kernel_init+0xcf/0xdb [<c100303a>] kernel_thread_helper+0x6/0x1c irq event stamp: 63236801 hardirqs last enabled at (63236801): [<c134e7fa>] __mutex_unlock_slowpath+0x9a/0x120 hardirqs last disabled at (63236800): [<c134e799>] __mutex_unlock_slowpath+0x39/0x120 softirqs last enabled at (63218800): [<c102f451>] __do_softirq+0xc1/0x110 softirqs last disabled at (63218789): [<c102f4ed>] do_softirq+0x4d/0x60 other info that might help us debug this: 2 locks held by kswapd0/313: #0: (shrinker_rwsem){++++..}, at: [<c1074bb4>] shrink_slab+0x24/0x170 #1: (&type->s_umount_key#19){++++..}, at: [<c10a2edd>] shrink_dcache_memory+0xfd/0x1a0 stack backtrace: Pid: 313, comm: kswapd0 Not tainted 2.6.33-rc5 #1 Call Trace: [<c134db2c>] ? printk+0x18/0x1c [<c104e7ef>] print_usage_bug+0x15f/0x1a0 [<c104ebcf>] mark_lock+0x39f/0x5a0 [<c104d66b>] ? trace_hardirqs_off+0xb/0x10 [<c1052c50>] ? check_usage_forwards+0x0/0xf0 [<c1050c24>] __lock_acquire+0x214/0xa70 [<c10438c5>] ? sched_clock_cpu+0x95/0x110 [<c10514fa>] lock_acquire+0x7a/0xa0 [<c11118c8>] ? reiserfs_write_lock_once+0x28/0x50 [<c134f03f>] mutex_lock_nested+0x5f/0x2b0 [<c11118c8>] ? reiserfs_write_lock_once+0x28/0x50 [<c11118c8>] ? reiserfs_write_lock_once+0x28/0x50 [<c11118c8>] reiserfs_write_lock_once+0x28/0x50 [<c10f05b0>] reiserfs_delete_inode+0x50/0x140 [<c10a653f>] ? generic_delete_inode+0x5f/0x150 [<c10f0560>] ? reiserfs_delete_inode+0x0/0x140 [<c10a657c>] generic_delete_inode+0x9c/0x150 [<c10a666d>] generic_drop_inode+0x3d/0x60 [<c10a5597>] iput+0x47/0x50 [<c10a2a4f>] dentry_iput+0x6f/0xf0 [<c10a2af4>] d_kill+0x24/0x50 [<c10a2d3d>] __shrink_dcache_sb+0x21d/0x2b0 [<c10a2f0f>] shrink_dcache_memory+0x12f/0x1a0 [<c1074c9e>] shrink_slab+0x10e/0x170 [<c1075177>] kswapd+0x477/0x6a0 [<c1072d10>] ? isolate_pages_global+0x0/0x1b0 [<c103e160>] ? autoremove_wake_function+0x0/0x40 [<c1074d00>] ? kswapd+0x0/0x6a0 [<c103de6c>] kthread+0x6c/0x80 [<c103de00>] ? kthread+0x0/0x80 [<c100303a>] kernel_thread_helper+0x6/0x1c Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Chris Mason <chris.mason@oracle.com>
2010-01-26fix oops in fs/9p late mount failureAl Viro
if 9P ->get_sb() fails late (at root inode or root dentry allocation), we'll hit its ->kill_sb() with NULL ->s_root Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26fix leak in romfs_fill_super()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26get rid of pointless checks after simple_pin_fs()Al Viro
if we'd just got success from it, vfsmount won't be NULL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26Fix failure exits in bfs_fill_super()Al Viro
double iput(), leaks... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26fix affs parse_options()Al Viro
Error handling in that sucker got broken back in 2003. If function returns 0 on failure, it's not nice to add return -EINVAL into it. Adding return 1 on other failure exits is also not a good thing (and yes, original success exits with 1 and some of failure exits with 0 are still there; so's the original logics in callers). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26Fix remount races with symlink handling in affsAl Viro
A couple of fields in affs_sb_info is used in follow_link() and symlink() for handling AFFS "absolute" symlinks. Need locking against affs_remount() updates. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26Fix a leak in affs_fill_super()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26fnctl: f_modown should call write_lock_irqsave/restoreGreg Kroah-Hartman
Commit 703625118069f9f8960d356676662d3db5a9d116 exposed that f_modown() should call write_lock_irqsave instead of just write_lock_irq so that because a caller could have a spinlock held and it would not be good to renable interrupts. Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Tavis Ormandy <taviso@google.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-26NFS: Ensure that we handle NFS4ERR_STALE_STATEID correctlyTrond Myklebust
Even if the server is crazy, we should be able to mark the stateid as being bad, to ensure it gets recovered. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26NFSv4.1: Don't call nfs4_schedule_state_recovery() unnecessarilyTrond Myklebust
Currently, nfs4_handle_exception() will call it twice if called with an error of -NFS4ERR_STALE_CLIENTID, -NFS4ERR_STALE_STATEID or -NFS4ERR_EXPIRED. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26NFSv4: Don't allow posix locking against servers that don't support itTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26NFSv4: Ensure that the NFSv4 locking can recover from stateid errorsTrond Myklebust
In most cases, we just want to mark the lock_stateid sequence id as being uninitialised. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26NFS: Avoid warnings when CONFIG_NFS_V4=nDavid Howells
Avoid the following warnings when CONFIG_NFS_V4=n: fs/nfs/sysctl.c:19: warning: unused variable `nfs_set_port_max' fs/nfs/sysctl.c:18: warning: unused variable `nfs_set_port_min' by making those variables contingent on NFSv4 being configured. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26NFS: Make nfs_commitdata_release staticH Hartley Sweeten
The symbol nfs_commitdata_release is only used locally in this file. Make it static to prevent the following sparse warning: warning: symbol 'nfs_commitdata_release' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26NFS: Try to commit unstable writes in nfs_release_page()Trond Myklebust
If someone calls nfs_release_page(), we presumably already know that the page is clean, however it may be holding an unstable write. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-26NFS: Fix a reference leak in nfs_wb_cancel_page()Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
2010-01-25ocfs2/dlm: Print more messages during lock migrationSunil Mushran
When a lock resource is migrated, the dlm compares the migrated locks with that that was already existing on the new node. If the comparison fails, it BUGs. This patch prints more messages when the comparison fails inorder to help with the root cause analyis. http://oss.oracle.com/bugzilla/show_bug.cgi?id=1206 This does not fix bz1206. However, if we run into it again, we will have more information to chew on. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-01-25ocfs2/dlm: Ignore LVBs of locks in the Blocked listSunil Mushran
During lock resource migration, o2dlm fills the packet with a LVB from the first valid lock. For sanity, it ensures that the other valid locks have the same LVB. If not, it BUGs. The valid locks are ones that have granted EX or PR lock levels and are either on the Granted or Converting lists. Locks in the Blocked list cannot have a valid LVB. This patch ensures that we skip the locks in the Blocked list. Fixes oss bugzilla#1202 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1202 Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>