path: root/fs/btrfs/locking.c
Age  Commit message  Author
2019-09-09  btrfs: move cond_wake_up functions out of ctree  (David Sterba)
The file ctree.h serves as a header for everything and has become quite bloated. Split some helpers that are generic and create a new file that should be the catch-all for code that's not btrfs-specific. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09  btrfs: Remove unused locking functions  (Nikolay Borisov)
Those were split out of btrfs_clear_lock_blocking_rw by aa12c02778a9 ("btrfs: split btrfs_clear_lock_blocking_rw to read and write helpers") however at that time this function was unused due to commit 523983401644 ("Btrfs: kill btrfs_clear_path_blocking"). Put the final nail in the coffin of those 2 functions. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-25  btrfs: Fix deadlock caused by missing memory barrier  (Nikolay Borisov)
Commit 06297d8cefca ("btrfs: switch extent_buffer blocking_writers from atomic to int") changed the type of blocking_writers but forgot to adjust the relevant code in btrfs_tree_unlock by converting the smp_mb__after_atomic to smp_mb. This opened up the possibility of a deadlock due to re-ordering of setting blocking_writers and checking/waking up the waiter. This particular lockup is explained in a comment above the waitqueue_active() function. Fix it by converting the memory barrier to a full smp_mb, accounting for the fact that blocking_writers is a simple integer. Fixes: 06297d8cefca ("btrfs: switch extent_buffer blocking_writers from atomic to int") Tested-by: Johannes Thumshirn <jthumshirn@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
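The ordering hazard can be sketched roughly as follows; this is an illustrative unlock path, with field and waitqueue names taken from the surrounding commits rather than the verbatim kernel code:

	static void tree_unlock_wake_sketch(struct extent_buffer *eb)
	{
		eb->blocking_writers = 0;	/* plain store, not an atomic RMW */
		/*
		 * A full barrier is needed before waitqueue_active():
		 * smp_mb__after_atomic() is too weak after a plain store, so the
		 * check below could be reordered before the store and the wakeup
		 * could be lost, leaving the waiter stuck.
		 */
		smp_mb();
		if (waitqueue_active(&eb->write_lock_wq))
			wake_up(&eb->write_lock_wq);
	}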
2019-07-02  btrfs: switch extent_buffer write_locks from atomic to int  (David Sterba)
The write_locks counter is either 0 or 1 and is always updated under the lock, so we don't need the atomic_t semantics. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02  btrfs: switch extent_buffer spinning_writers from atomic to int  (David Sterba)
The spinning_writers counter is either 0 or 1 and is always updated under the lock, so we don't need the atomic_t semantics. Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02  btrfs: switch extent_buffer blocking_writers from atomic to int  (David Sterba)
The blocking_writers counter is either 0 or 1 and is always updated under the lock, so we don't need the atomic_t semantics. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
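The shape of these three conversions is roughly the following; the snippet is illustrative, not the exact diff:

	/* before: a 0/1 counter carrying atomic_t semantics it does not need */
	atomic_t blocking_writers;

	/* after: a plain int, only ever updated while the eb lock is held,
	 * so writes like atomic_set(&eb->blocking_writers, 1) become plain
	 * assignments such as eb->blocking_writers = 1 */
	int blocking_writers;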
2019-04-29  btrfs: trace: Introduce trace events for all btrfs tree locking events  (Qu Wenruo)
Unlike btrfs_tree_lock() and btrfs_tree_read_lock(), the remaining functions in locking.c will not sleep, so it doesn't make much sense to record their execution time. These events are introduced mainly for user space tools to audit and detect lock leakage or deadlocks. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29  btrfs: trace: Introduce trace events for sleepable tree lock  (Qu Wenruo)
There are two tree lock events which can sleep:
- btrfs_tree_read_lock()
- btrfs_tree_lock()
Sometimes we may need to look into the concurrency picture of the fs. For that case, we need the execution time of the above two functions and the owner of @eb. Here we introduce trace events for user space tools like bcc, to get the execution time of the above two functions and to get detailed owner info where eBPF code can't. All the overhead is hidden behind the trace events, so if the events are not enabled, there is no overhead. These trace events also output bytenr and generation, allowing them to be paired with unlock events to pin down deadlocks. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29  btrfs: switch extent_buffer::lock_nested to bool  (David Sterba)
The member is tracking simple status of the lock, we can use bool for that and make some room for further space reduction in the structure. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29  btrfs: use assertion helpers for extent buffer write lock counters  (David Sterba)
Use the helpers where open coded. On non-debug builds, the warnings will not trigger and extent_buffer::write_locks becomes unused and can be moved to the appropriate section, saving a few bytes. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29  btrfs: add assertion helpers for extent buffer write lock counters  (David Sterba)
write_locks is a simple counter used to track locking balance and to assert tree locks. Add helpers to make it conditionally work only in DEBUG builds. Will be used in followup patches. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
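DEBUG-only counter helpers of this kind usually follow the pattern below; the helper names are illustrative, not necessarily the ones added by the patch, while CONFIG_BTRFS_DEBUG is the existing btrfs debug config symbol:

	#ifdef CONFIG_BTRFS_DEBUG
	static void btrfs_assert_tree_write_locks_get(struct extent_buffer *eb)
	{
		atomic_inc(&eb->write_locks);
	}
	static void btrfs_assert_tree_write_locks_put(struct extent_buffer *eb)
	{
		atomic_dec(&eb->write_locks);
	}
	static void btrfs_assert_tree_write_locked(struct extent_buffer *eb)
	{
		WARN_ON(!atomic_read(&eb->write_locks));
	}
	#else
	static void btrfs_assert_tree_write_locks_get(struct extent_buffer *eb) { }
	static void btrfs_assert_tree_write_locks_put(struct extent_buffer *eb) { }
	static void btrfs_assert_tree_write_locked(struct extent_buffer *eb) { }
	#endif

With empty stubs in the non-debug case, the compiler drops both the calls and the counter updates, which is what lets the counter itself move out of the hot part of the structure.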
2019-04-29  btrfs: use assertion helpers for extent buffer read lock counters  (David Sterba)
Use the helpers where open coded. On non-debug builds, the warnings will not trigger and extent_buffer::read_locks becomes unused and can be moved to the appropriate section, saving a few bytes. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29  btrfs: add assertion helpers for extent buffer read lock counters  (David Sterba)
read_locks is a simple counter used to track locking balance and to assert tree locks. Add helpers to make it conditionally work only in DEBUG builds. Will be used in followup patches. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29  btrfs: use assertion helpers for spinning readers  (David Sterba)
Use the helpers where open coded. On non-debug builds, the warnings will not trigger and extent_buffer::spinning_readers becomes unused and can be moved to the appropriate section, saving a few bytes. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29  btrfs: add assertion helpers for spinning readers  (David Sterba)
Add helpers for conditional DEBUG build to assert that the extent buffer spinning_readers constraints are met. Will be used in followup patches. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29  btrfs: use assertion helpers for spinning writers  (David Sterba)
Use the helpers where open coded. On non-debug builds, the warnings will not trigger and extent_buffer::spinning_writers becomes unused and can be moved to the appropriate section, saving a few bytes. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29  btrfs: add assertion helpers for spinning writers  (David Sterba)
Add helpers for conditional DEBUG build to assert that the extent buffer spinning_writers constraints are met. Will be used in followup patches. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25  btrfs: simplify waiting loop in btrfs_tree_lock  (David Sterba)
Currently, the number of readers and writers is checked and in case there are any, we wait and redo the locks. There's some duplication before the branches go back to the again label, e.g. calling wait_event on blocking_readers twice. The sequence is transformed into a loop:
* wait for readers
* wait for writers
* write_lock
* check readers, unlock and wait for readers, loop
* check writers, unlock and wait for writers, loop
The new sequence is not exactly the same due to the simplification; for readers it's slightly faster. For the writers, the original code does:
* wait for writers
* (loop) wait for readers
* wait for writers -- again
while the new code goes directly to the reader check. This should behave the same on a contended lock with multiple writers and readers, but can reduce the number of times we're waiting on something. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
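In rough terms the reworked loop reads like the following simplified sketch (not the verbatim function; field names follow the other commits in this log):

	static void tree_lock_sketch(struct extent_buffer *eb)
	{
	again:
		wait_event(eb->read_lock_wq, atomic_read(&eb->blocking_readers) == 0);
		wait_event(eb->write_lock_wq, atomic_read(&eb->blocking_writers) == 0);
		write_lock(&eb->lock);
		if (atomic_read(&eb->blocking_readers) ||
		    atomic_read(&eb->blocking_writers)) {
			/* someone went blocking between the waits and the lock: retry */
			write_unlock(&eb->lock);
			goto again;
		}
		/* the spinning write lock is now held */
	}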
2019-02-25  btrfs: split btrfs_clear_lock_blocking_rw to read and write helpers  (David Sterba)
There are many callers that hardcode the desired lock type so we can avoid the switch and call them directly. Split the current function to two. There are no remaining users of btrfs_clear_lock_blocking_rw so it's removed. The call sites will be converted in followup patches. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25  btrfs: split btrfs_set_lock_blocking_rw to read and write helpers  (David Sterba)
There are many callers that hardcode the desired lock type so we can avoid the switch and call them directly. Split the current function to two but leave a helper that still takes the variable lock type to make current code compile. The call sites will be converted in followup patches. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28  btrfs: replace waitqueue_active with cond_wake_up  (David Sterba)
Use the wrappers and reduce the amount of low-level details about the waitqueue management. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
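The wrapper hides the usual barrier-plus-check idiom; a rough sketch of what such a helper does (simplified, hence the _sketch suffix, and not the exact definition in the btrfs headers):

	static inline void cond_wake_up_sketch(struct wait_queue_head *wq)
	{
		/* barrier so the waker observes the waiter's queue entry,
		 * see the comment above waitqueue_active() */
		smp_mb();
		if (waitqueue_active(wq))
			wake_up(wq);
	}

Callers then only pay for the waitqueue spinlock when someone is actually waiting, without open coding the barrier at every call site.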
2018-04-12  btrfs: replace GPL boilerplate by SPDX -- sources  (David Sterba)
Remove the GPL boilerplate text (long, short, one-line) and keep the rest, i.e. personal, company or original source copyright statements. Add the SPDX header. Signed-off-by: David Sterba <dsterba@suse.com>
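In practice the license boilerplate at the top of each source file collapses to a single identifier line; for example (the copyright line is shown only as an illustration of what is kept):

	// SPDX-License-Identifier: GPL-2.0
	/*
	 * Copyright (C) 2008 Oracle.  All rights reserved.
	 */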
2018-03-31  btrfs: Relax memory barrier in btrfs_tree_unlock  (Nikolay Borisov)
When performing an unlock on an extent buffer we'd like to order the decrement of extent_buffer::blocking_writers with waking up any waiters. In such situations it's sufficient to use smp_mb__after_atomic rather than the heavy smp_mb. On architectures where atomic operations are fully ordered (such as x86 or s390), unconditionally executing a heavyweight smp_mb instruction causes a severe hit to performance while bringing no improvements in terms of correctness. The better option is to use the appropriate smp_mb__after_atomic routine which will do the correct thing (invoke a full smp_mb or, in the case of ordered atomics, insert a compiler barrier). Put another way, an RMW atomic op + smp_mb__after_atomic is semantically equivalent to a full smp_mb. This ensures that none of the problems described in the accompanying comment of waitqueue_active occur. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
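The relaxed pattern being argued for, sketched with illustrative names (blocking_writers was still an atomic_t at this point, so the decrement is an atomic RMW):

	static void tree_unlock_relaxed_sketch(struct extent_buffer *eb)
	{
		if (atomic_dec_and_test(&eb->blocking_writers)) {
			/* a full smp_mb on weakly ordered CPUs, only a compiler
			 * barrier where atomics are already fully ordered */
			smp_mb__after_atomic();
			if (waitqueue_active(&eb->write_lock_wq))
				wake_up(&eb->write_lock_wq);
		}
	}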
2016-01-07  btrfs: cleanup, remove stray return statements  (David Sterba)
Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-10  btrfs: comment the rest of implicit barriers before waitqueue_active  (David Sterba)
There are atomic operations that imply the barrier needed by waitqueue_active but are mixed into an if-condition. Signed-off-by: David Sterba <dsterba@suse.com>
2015-10-10  btrfs: add comments to barriers before waitqueue_active  (David Sterba)
Reduce number of undocumented barriers out there. Signed-off-by: David Sterba <dsterba@suse.com>
2015-08-09  btrfs: Add WARN_ON() for double lock in btrfs_tree_lock()  (Zhaolei)
When a task tries to double-lock an extent buffer, there is no lockdep warning about it because the lock may be in the "blocking_lock" state, which makes this hard to debug. This patch adds a WARN_ON() for the above condition; it cannot report all deadlock cases (such as locks between tasks), but it at least helps some. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
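The added check amounts to something like the following, assuming the extent buffer records its current write-lock owner in a lock_owner field (a sketch of the idea, not the exact hunk):

	static void tree_lock_entry_sketch(struct extent_buffer *eb)
	{
		/* warn if the calling task already owns this eb's write lock */
		WARN_ON(eb->lock_owner == current->pid);
	}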
2014-11-19  btrfs: fix lockups from btrfs_clear_path_blocking  (Chris Mason)
The fair reader/writer locks mean that btrfs_clear_path_blocking needs to strictly follow lock ordering rules even when we already have blocking locks on a given path. Before we can clear a blocking lock on the path, we need to make sure all of the locks have been converted to blocking. This will remove lock inversions against anyone spinning in write_lock() against the buffers we're trying to get read locks on. These inversions didn't exist before the fair read/writer locks, but now we need to be more careful. We papered over this deadlock in the past by changing btrfs_try_read_lock() to be a true trylock against both the spinlock and the blocking lock. This was slower, and not sufficient to fix all the deadlocks. This patch adds a btrfs_tree_read_lock_atomic(), which basically means get the spinlock but trylock on the blocking lock. Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reported-by: Patrick Schmid <schmid@phys.ethz.ch> cc: stable@vger.kernel.org #v3.15+
2014-06-19  Btrfs: fix deadlocks with trylock on tree nodes  (Chris Mason)
The Btrfs tree trylock function is poorly named. It always takes the spinlock and backs off if the blocking lock is held. This can lead to surprising lockups because people expect it to really be a trylock. This commit makes it a pure trylock, both for the spinlock and the blocking lock. It also reworks the nested lock handling slightly to avoid taking the read lock while a spinning write lock might be held. Signed-off-by: Chris Mason <clm@fb.com>
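A pure trylock in the sense described above backs off from both the spinlock and the blocking lock; roughly, with bookkeeping of reader counters omitted and a deliberately hypothetical name:

	static int try_tree_read_lock_sketch(struct extent_buffer *eb)
	{
		if (atomic_read(&eb->blocking_writers))
			return 0;
		if (!read_trylock(&eb->lock))
			return 0;
		if (atomic_read(&eb->blocking_writers)) {
			/* a writer went blocking after we got the spinlock: back off */
			read_unlock(&eb->lock);
			return 0;
		}
		return 1;	/* read lock held, caller may proceed */
	}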
2013-05-06  btrfs: make static code static & remove dead code  (Eric Sandeen)
Big patch, but all it does is add statics to functions which are in fact static, then remove the associated dead-code fallout. removed functions: btrfs_iref_to_path() __btrfs_lookup_delayed_deletion_item() __btrfs_search_delayed_insertion_item() __btrfs_search_delayed_deletion_item() find_eb_for_page() btrfs_find_block_group() range_straddles_pages() extent_range_uptodate() btrfs_file_extent_length() btrfs_scrub_cancel_devid() btrfs_start_transaction_lflush() btrfs_print_tree() is left because it is used for debugging. btrfs_start_transaction_lflush() and btrfs_reada_detach() are left for symmetry. ulist.c functions are left, another patch will take care of those. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20  Btrfs: save us a read_lock  (Liu Bo)
This does not change the logic of code, but can save us a read_lock. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-08-28  Btrfs: fix a misplaced address operator in a condition  (Stefan Behrens)
This should obviously not be "if (&flag)" but "if (flag)". Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-07-23  Btrfs: reduce calls to wake_up on uncontended locks  (Chris Mason)
The btrfs locks were unconditionally calling wake_up as the locks were released. This led to extra thrashing on the waitqueue, especially for locks that were dominated by readers. Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-03-22  btrfs: return void in functions without error conditions  (Jeff Mahoney)
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
2012-01-04  Btrfs: add nested locking mode for paths  (Arne Jansen)
This patch adds the possibility to read-lock an extent even if it is already write-locked from the same thread. btrfs_find_all_roots() needs this capability. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-07-27  Btrfs: switch the btrfs tree locks to reader/writer  (Chris Mason)
The btrfs metadata btree is the source of significant lock contention, especially in the root node. This commit changes our locking to use a reader/writer lock. The lock is built on top of rw spinlocks, and it extends the lock tracking to remember if we have a read lock or a write lock when we go to blocking. Atomics count the number of blocking readers or writers at any given time. It removes all of the adaptive spinning from the old code and uses only the spinning/blocking hints inside of btrfs to decide when it should continue spinning. In read heavy workloads this is dramatically faster. In write heavy workloads we're still faster because of less contention on the root node lock. We suffer slightly in dbench because we schedule more often during write locks, but all other benchmarks so far are improved. Signed-off-by: Chris Mason <chris.mason@oracle.com>
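The lock state this describes boils down to a handful of per-extent-buffer fields, roughly as below; the field names match those used in the later commits in this log and are shown for orientation, not as the exact struct layout:

	struct extent_buffer_lock_state_sketch {
		rwlock_t lock;				/* the underlying rw spinlock */
		atomic_t blocking_writers;		/* writers that switched to blocking */
		atomic_t blocking_readers;		/* readers that switched to blocking */
		atomic_t spinning_writers;		/* writers still holding the spinlock */
		atomic_t spinning_readers;		/* readers still holding the spinlock */
		wait_queue_head_t write_lock_wq;	/* writers wait here for blockers */
		wait_queue_head_t read_lock_wq;		/* readers wait here for blocking writers */
	};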
2011-05-06  btrfs: remove all unused functions  (David Sterba)
Remove static and global declarations and/or definitions. Reduces size of btrfs.ko by ~3.4kB.
   text    data     bss     dec     hex filename
 402081    7464     200  409745   64091 btrfs.ko.base
 398620    7144     200  405964   631cc btrfs.ko.remove-all
Signed-off-by: David Sterba <dsterba@suse.cz>
2010-03-30  include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h  (Tejun Heo)
percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h, making everything defined by the two files universally available and complicating inclusion dependencies. The percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities to include those headers directly instead of assuming availability. As this conversion needs to touch a large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the following:
* Scan files for gfp and slab usages and update includes such that only the necessary includes are there, i.e. if only gfp is used, gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include blocks and tries to put the new include such that its order conforms to its surroundings. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly because the file doesn't have a fitting include block), it prints out an error message indicating which .h file needs to be added to the file.
The conversion was done in the following steps:
1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files.
2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to the implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files.
3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed, e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically editing them, as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually widely available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as a bisection point.
Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2009-04-02  Btrfs: fix typos in comments  (Wu Fengguang)
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-03-24  Btrfs: leave btree locks spinning more often  (Chris Mason)
btrfs_mark_buffer_dirty would set dirty bits in the extent_io tree for the buffers it was dirtying. This may require a kmalloc and it was not atomic. So, anyone who called btrfs_mark_buffer_dirty had to set any btree locks they were holding to blocking first. This commit changes dirty tracking for extent buffers to just use a flag in the extent buffer. Now that we have one and only one extent buffer per page, this can be safely done without losing dirty bits along the way. This also introduces a path->leave_spinning flag that callers of btrfs_search_slot can use to indicate they will properly deal with a path returned where all the locks are spinning instead of blocking. Many of the btree search callers now expect spinning paths, resulting in better btree concurrency overall. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-03-24  Btrfs: Check for a blocking lock before taking the spin  (Chris Mason)
This reduces contention on the extent buffer spin locks by testing for a blocking lock before trying to take the spinlock. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-03-09  Btrfs: fix spinlock assertions on UP systems  (Chris Mason)
btrfs_tree_locked was being used to make sure a given extent_buffer was properly locked in a few places. But, it wasn't correct for UP compiled kernels. This switches it to using assert_spin_locked instead, and renames it to btrfs_assert_tree_locked to better reflect how it was really being used. Signed-off-by: Chris Mason <chris.mason@oracle.com>
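The renamed helper is essentially a thin wrapper that also behaves correctly on UP builds; a sketch of the idea (eb->lock was a plain spinlock at this point):

	static inline void btrfs_assert_tree_locked(struct extent_buffer *eb)
	{
		/* assert_spin_locked() works on UP kernels, unlike peeking at the raw lock word */
		assert_spin_locked(&eb->lock);
	}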
2009-02-12  Btrfs: make a lockdep class for the extent buffer locks  (Chris Mason)
Btrfs is currently using spin_lock_nested with a nested value based on the tree depth of the block. But, this doesn't quite work because the max tree depth is bigger than what spin_lock_nested can deal with, and because locks are sometimes taken before the level field is filled in. The solution here is to use lockdep_set_class_and_name instead, and to set the class before unlocking the pages when the block is read from the disk and just after init of a freshly allocated tree block. btrfs_clear_path_blocking is also changed to take the locks in the proper order, and it also makes sure all the locks currently held are properly set to blocking before it tries to retake the spinlocks. Otherwise, lockdep gets upset about bad lock ordering. The lockdep magic came from Peter Zijlstra <peterz@infradead.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
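The lockdep_set_class_and_name approach looks roughly like this; the key table and the name string are illustrative, while BTRFS_MAX_LEVEL is the existing bound on tree depth:

	static struct lock_class_key btrfs_eb_lock_keys[BTRFS_MAX_LEVEL];

	static void set_buffer_lockdep_class_sketch(struct extent_buffer *eb, int level)
	{
		/* one lockdep class per tree level, assigned before the buffer is first locked */
		lockdep_set_class_and_name(&eb->lock, &btrfs_eb_lock_keys[level],
					   "btrfs-tree-lock");
	}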
2009-02-09  Btrfs: don't use spin_is_contended  (Chris Mason)
Btrfs was using spin_is_contended to see if it should drop locks before doing extent allocations during btrfs_search_slot. The idea was to avoid expensive searches in the tree unless the lock was actually contended. But, spin_is_contended is specific to the ticket spinlocks on x86, so this is causing compile errors everywhere else. In practice, the contention could easily appear some time after we started doing the extent allocation, and it makes more sense to always drop the lock instead. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04  Btrfs: Change btree locking to use explicit blocking points  (Chris Mason)
Most of the btrfs metadata operations can be protected by a spinlock, but some operations still need to schedule. So far, btrfs has been using a mutex along with a trylock loop; most of the time it is able to avoid going for the full mutex, so the trylock loop is a big performance gain. This commit is step one for getting rid of the blocking locks entirely. btrfs_tree_lock takes a spinlock, and the code explicitly switches to a blocking lock when it starts an operation that can schedule. We'll be able to get rid of the blocking locks in smaller pieces over time. Tracing allows us to find the most common cause of blocking, so we can start with the hot spots first. The basic idea is:
btrfs_tree_lock() returns with the spin lock held.
btrfs_set_lock_blocking() sets the EXTENT_BUFFER_BLOCKING bit in the extent buffer flags, and then drops the spin lock. The buffer is still considered locked by all of the btrfs code. If btrfs_tree_lock gets the spinlock but finds the blocking bit set, it drops the spin lock and waits on a wait queue for the blocking bit to go away. Much of the code that needs to set the blocking bit finishes without actually blocking a good percentage of the time, so an adaptive spin is still used against the blocking bit to avoid very high context switch rates.
btrfs_clear_lock_blocking() clears the blocking bit and returns with the spinlock held again.
btrfs_tree_unlock() can be called on either blocking or spinning locks; it does the right thing based on the blocking bit.
ctree.c has a helper function to set/clear all the locked buffers in a path as blocking. Signed-off-by: Chris Mason <chris.mason@oracle.com>
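The handoff from spinning to blocking described above can be sketched like this; the bit and field names are taken from the description, not from the verbatim patch:

	static void set_lock_blocking_sketch(struct extent_buffer *eb)
	{
		/* the caller holds eb's spinlock; mark the buffer as blocking-locked
		 * and drop the spinlock before code that may schedule */
		if (!test_and_set_bit(EXTENT_BUFFER_BLOCKING, &eb->bflags))
			spin_unlock(&eb->lock);
		/* the buffer is still "locked" as far as the rest of btrfs is concerned */
	}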
2009-01-05  Btrfs: Fix checkpatch.pl warnings  (Chris Mason)
There were many, most are fixed now. struct-funcs.c generates some warnings but these are bogus. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-29  Btrfs: add and improve comments  (Chris Mason)
This improves the comments at the top of many functions. It didn't dive into the guts of functions because I was trying to avoid merging problems with the new allocator and back reference work. extent-tree.c and volumes.c were both skipped, and there is definitely more work to do in cleaning and commenting the code. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25  btrfs_search_slot: reduce lock contention by cowing in two stages  (Chris Mason)
A btree block cow has two parts: the first is to allocate a destination block and the second is to copy the old block over. The first part needs locks in the extent allocation tree, and may need to do IO. This changeset splits that into a separate function that can be called without any tree locks held. btrfs_search_slot is changed to drop its path and start over if it has to COW a contended block. This often means that many writers will pre-alloc a new destination for the same contended block, but they cache their prealloc for later use on lower levels in the tree. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25  Btrfs: implement memory reclaim for leaf reference cache  (Yan)
The memory reclaiming issue happens when a snapshot exists. In that case, some cache entries may not be used during old snapshot dropping, so they will remain in the cache until umount. The patch adds a field to struct btrfs_leaf_ref to record create time. Besides, the patch makes all dead roots of a given snapshot linked together in order of create time. After an old snapshot is completely dropped, we check the dead root list and remove all cache entries created before the oldest dead root in the list. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25  Btrfs: Fix some build problems on 2.6.18 based enterprise kernels  (Chris Mason)
Signed-off-by: Chris Mason <chris.mason@oracle.com>