summaryrefslogtreecommitdiff
path: root/fs/ocfs2/dcache.c
AgeCommit message (Collapse)Author
2015-04-15VFS: normal filesystems (and lustre): d_inode() annotationsDavid Howells
that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19assorted conversions to %p[dD]Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-03move d_rcu from overlapping d_child to overlapping d_aliasAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-03ocfs2: revert iput deferring code in ocfs2_drop_dentry_lockGoldwyn Rodrigues
The following patches are reverted in this patch because these patches caused performance regression in the remote unlink() calls. ea455f8ab683 - ocfs2: Push out dropping of dentry lock to ocfs2_wq f7b1aa69be13 - ocfs2: Fix deadlock on umount 5fd131893793 - ocfs2: Don't oops in ocfs2_kill_sb on a failed mount Previous patches in this series removed the possible deadlocks from downconvert thread so the above patches shouldn't be needed anymore. The regression is caused because these patches delay the iput() in case of dentry unlocks. This also delays the unlocking of the open lockres. The open lockresource is required to test if the inode can be wiped from disk or not. When the deleting node does not get the open lock, it marks it as orphan (even though it is not in use by another node/process) and causes a journal checkpoint. This delays operations following the inode eviction. This also moves the inode to the orphaned inode which further causes more I/O and a lot of unneccessary orphans. The following script can be used to generate the load causing issues: declare -a create declare -a remove declare -a iterations=(1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384) unique="`mktemp -u XXXXX`" script="/tmp/idontknow-${unique}.sh" cat <<EOF > "${script}" for n in {1..8}; do mkdir -p test/dir\${n} eval touch test/dir\${n}/foo{1.."\$1"} done EOF chmod 700 "${script}" function fcreate () { exec 2>&1 /usr/bin/time --format=%E "${script}" "$1" } function fremove () { exec 2>&1 /usr/bin/time --format=%E ssh node2 "cd `pwd`; rm -Rf test*" } function fcp () { exec 2>&1 /usr/bin/time --format=%E ssh node3 "cd `pwd`; cp -R test test.new" } echo ------------------------------------------------- echo "| # files | create #s | copy #s | remove #s |" echo ------------------------------------------------- for ((x=0; x < ${#iterations[*]} ; x++)) do create[$x]="`fcreate ${iterations[$x]}`" copy[$x]="`fcp ${iterations[$x]}`" remove[$x]="`fremove`" printf "| %8d | %9s | %9s | %9s |\n" ${iterations[$x]} ${create[$x]} ${copy[$x]} ${remove[$x]} done rm "${script}" echo "------------------------" Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-29ocfs2: needs ->d_lock to poke in ->d_parent->d_inode from ->d_revalidate()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-27hlist: drop the node parameter from iteratorsSasha Levin
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-14stop passing nameidata * to ->d_revalidate()Al Viro
Just the lookup flags. Die, bastard, die... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14vfs: switch i_dentry/d_alias to hlistAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14ocfs2: use list_for_each_entry in ocfs2_find_local_alias()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-02-23ocfs2: Remove masklog ML_DCACHE.Tao Ma
Remove mlog(0) from fs/ocfs2/dcache.c and the masklog DCACHE. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2011-03-07ocfs2: Remove EXIT from masklog.Tao Ma
mlog_exit is used to record the exit status of a function. But because it is added in so many functions, if we enable it, the system logs get filled up quickly and cause too much I/O. So actually no one can open it for a production system or even for a test. This patch just try to remove it or change it. So: 1. if all the error paths already use mlog_errno, it is just removed. Otherwise, it will be replaced by mlog_errno. 2. if it is used to print some return value, it is replaced with mlog(0,...). mlog_exit_ptr is changed to mlog(0. All those mlog(0,...) will be replaced with trace events later. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2011-02-21ocfs2: Remove ENTRY from masklog.Tao Ma
ENTRY is used to record the entry of a function. But because it is added in so many functions, if we enable it, the system logs get filled up quickly and cause too much I/O. So actually no one can open it for a production system or even for a test. So for mlog_entry_void, we just remove it. for mlog_entry(...), we replace it with mlog(0,...), and they will be replace by trace event later. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2011-03-10ocfs2: fix d_revalidate oopsen on NFS exportsAl Viro
can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-07fs: dcache per-inode inode alias lockingNick Piggin
dcache_inode_lock can be replaced with per-inode locking. Use existing inode->i_lock for this. This is slightly non-trivial because we sometimes need to find the inode from the dentry, which requires d_inode to be stabilised (either with refcount or d_lock). Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: rcu-walk aware d_revalidate methodNick Piggin
Require filesystems be aware of .d_revalidate being called in rcu-walk mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning -ECHILD from all implementations. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache rationalise dget variantsNick Piggin
dget_locked was a shortcut to avoid the lazy lru manipulation when we already held dcache_lock (lru manipulation was relatively cheap at that point). However, how that the lru lock is an innermost one, we never hold it at any caller, so the lock cost can now be avoided. We already have well working lazy dcache LRU, so it should be fine to defer LRU manipulations to scan time. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache remove dcache_lockNick Piggin
dcache_lock no longer protects anything. remove it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: scale inode alias listNick Piggin
Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list from concurrent modification. d_alias is also protected by d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache scale d_unhashedNick Piggin
Protect d_unhashed(dentry) condition with d_lock. This means keeping DCACHE_UNHASHED bit in synch with hash manipulations. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2010-11-18Ocfs2: Stop tracking a negative dentry after dentry_iput().Tristan Ye
I suddenly hit the problem during 2.6.37-rc1 regression test, which was introduced by commit '5e98d492406818e6a94c0ba54c61f59d40cefa4a'(Track negative entries v3), following scenario reproduces the issue easily: Node A Node B ================ ============ $touch testfile $ls testfile $rm -rf testfile $touch testfile $ls testfile ls: cannot access testfile: No such file or directory This patch stops tracking the dentry which was negativated by a inode deletion, so as to force the revaliation in next lookup, in case we'll touch the inode again in the same node. It didn't hurt the performance of multiple lookup for none-existed files anyway, while regresses a bit in the first try after a file deletion. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-10Track negative entries v3Goldwyn Rodrigues
Track negative dentries by recording the generation number of the parent directory in d_fsdata. The generation number for the parent directory is recorded in the inode_info, which increments every time the lock on the directory is dropped. If the generation number of the parent directory and the negative dentry matches, there is no need to perform the revalidate, else a revalidate is forced. This improves performance in situations where nodes look for the same non-existent file multiple times. Thanks Mark for explaining the DLM sequence. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-08-27ocfs2: invalidate dentry if its dentry_lock isn't initialized.Tao Ma
In commit a5a0a630922a2f6a774b6dac19f70cb5abd86bb0, when ocfs2_attch_dentry_lock fails, we call an extra iput and reset dentry->d_fsdata to NULL. This resolve a bug, but it isn't completed and the dentry is still there. When we want to use it again, ocfs2_dentry_revalidate doesn't catch it and return true. That make future ocfs2_dentry_lock panic out. One bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1162. The resolution is to add a check for dentry->d_fsdata in revalidate process and return false if dentry->d_fsdata is NULL, so that a new ocfs2_lookup will be called again. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-21ocfs2: Fix deadlock on umountJan Kara
In commit ea455f8ab68338ba69f5d3362b342c115bea8e13, we moved the dentry lock put process into ocfs2_wq. This causes problems during umount because ocfs2_wq can drop references to inodes while they are being invalidated by invalidate_inodes() causing all sorts of nasty things (invalidate_inodes() ending in an infinite loop, "Busy inodes after umount" messages etc.). We fix the problem by stopping ocfs2_wq from doing any further releasing of inode references on the superblock being unmounted, wait until it finishes the current round of releasing and finally cleaning up all the references in dentry_lock_list from ocfs2_put_super(). The issue was tracked down by Tao Ma <tao.ma@oracle.com>. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-04-23ocfs2: Add missing iput() during error handling in ocfs2_dentry_attach_lock()Sunil Mushran
In ocfs2_dentry_attach_lock(), if unable to get the dentry lock, we need to call iput(inode) because a failure here means no d_instantiate(), which means the normally matching iput() will not be called during dput(dentry). This patch fixes the oops that accompanies the following message: (3996,1):dlm_empty_lockres:2708 ERROR: lockres W00000000000000000a1046b06a4382 still has local locks! kernel BUG in dlm_empty_lockres at /rpmbuild/smushran/BUILD/ocfs2-1.4.2/fs/ocfs2/dlm/dlmmaster.c:2709! Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-03-27constify dentry_operations: OCFS2Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-02-02ocfs2: Push out dropping of dentry lock to ocfs2_wqJan Kara
Dropping of last reference to dentry lock is a complicated operation involving dropping of reference to inode. This can get complicated and quota code in particular needs to obtain some quota locks which leads to potential deadlock. Thus we defer dropping of inode reference to ocfs2_wq. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-01-25ocfs2: Remove mount/unmount votesMark Fasheh
The node maps that are set/unset by these votes are no longer relevant, thus we can remove the mount and umount votes. Since those are the last two remaining votes, we can also remove the entire vote infrastructure. The vote thread has been renamed to the downconvert thread, and the small amount of functionality related to managing it has been moved into fs/ocfs2/dlmglue.c. All references to votes have been removed or updated. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27ocfs2: Remove bug statement in ocfs2_dentry_iput()Mark Fasheh
The existing bug statement didn't take into account unhashed dentries which might not have a cluster lock on them. This could happen if a node exporting the file system via NFS is rebooted, re-exported to nfs clients and then unmounted. It's fine in this case to not have a dentry cluster lock. Just remove the bug statement and replace it with an error print, which does the proper checks. Though we want to know if something has happened which might have prevented a cluster lock from being created, it's definitely not necessary to panic the machine for this. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-06ocfs2: Re-order iput in ocfs2_drop_dentry_lockMark Fasheh
Do this to avoid a theoretical (I haven't seen this in practice) race where the downconvert thread might drop the dentry lock, allowing a remote unlink to proceed before dropping the inode locks. This could bounce access to the orphan dir between nodes. There doesn't seem to be a need to do the same in ocfs2_dentry_iput() as that's never called for the last ref drop from the downconvert thread. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-19Fix misspellings of "system", "controller", "interrupt" and "necessary".Robert P. J. Day
Fix the various misspellings of "system", controller", "interrupt" and "[un]necessary". Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2006-09-24ocfs2: Remove special casing for inode creation in ocfs2_dentry_attach_lock()Mark Fasheh
We can't use LKM_LOCAL for new dentry locks because an unlink and subsequent re-create of a name/inode pair may result in the lock still being mastered somewhere in the cluster. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: manually d_move() during ocfs2_rename()Mark Fasheh
Make use of FS_RENAME_DOES_D_MOVE to avoid a race condition that can occur during ->rename() if we d_move() outside of the parent directory cluster locks, and another node discovers the new name (created during the rename) and unlinks it. d_move() will unconditionally rehash a dentry - which will leave stale data in the system. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: Add dentry tracking APIMark Fasheh
Replace the dentry vote mechanism with a cluster lock which covers a set of dentries. This allows us to force d_delete() only on nodes which actually care about an unlink. Every node that does a ->lookup() gets a read only lock on the dentry, until an unlink during which the unlinking node, will request an exclusive lock, forcing the other nodes who care about that dentry to d_delete() it. The effect is that we retain a very lightweight ->d_revalidate(), and at the same time get to make large improvements to the average case performance of the ocfs2 unlink and rename operations. This patch adds the higher level API and the dentry manipulation code. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: Add new cluster lock typeMark Fasheh
Replace the dentry vote mechanism with a cluster lock which covers a set of dentries. This allows us to force d_delete() only on nodes which actually care about an unlink. Every node that does a ->lookup() gets a read only lock on the dentry, until an unlink during which the unlinking node, will request an exclusive lock, forcing the other nodes who care about that dentry to d_delete() it. The effect is that we retain a very lightweight ->d_revalidate(), and at the same time get to make large improvements to the average case performance of the ocfs2 unlink and rename operations. This patch adds the cluster lock type which OCFS2 can attach to dentries. A small number of fs/ocfs2/dcache.c functions are stubbed out so that this change can compile. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-03-24ocfs2: don't use MLF* in the file systemMark Fasheh
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-01-03[PATCH] OCFS2: The Second Oracle Cluster FilesystemMark Fasheh
The OCFS2 file system module. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>