path: root/fs/btrfs/locking.h
Age  Commit message  Author
2020-12-08  btrfs: remove the recurse parameter from __btrfs_tree_read_lock  Josef Bacik
It is completely unused now, remove it. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08  btrfs: merge back btrfs_read_lock_root_node helpers  Josef Bacik
We no longer have recursive locking and there's no need for separate helpers that allowed the transition to rwsem with minimal code changes. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08  btrfs: locking: remove all the blocking helpers  Josef Bacik
Now that we're using a rw_semaphore we no longer need to indicate if a lock is blocking or not, nor do we need to flip the entire path from blocking to spinning. Remove these helpers and all the places they are called. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08  btrfs: switch extent buffer tree lock to rw_semaphore  Josef Bacik
Historically we've implemented our own locking because we wanted to be able to selectively spin or sleep based on what we were doing in the tree. For instance, if all of our nodes were in cache then there's rarely a reason to need to sleep waiting for node locks, as they'll likely become available soon. At the time this code was written the rw_semaphore didn't do adaptive spinning, and thus was orders of magnitude slower than our home grown locking.

However now the opposite is the case. There are a few problems with how we implement blocking locks, namely that we use a normal waitqueue and simply wake everybody up in reverse sleep order. This leads to some suboptimal performance behavior, and a lot of context switches in highly contended cases. The rw_semaphores actually do this properly, and also have adaptive spinning that works relatively well.

The locking code is also a bit of a bear to understand, and we lose the benefit of lockdep for the most part because the blocking states of the lock are simply ad-hoc and not mapped into lockdep.

So rework the locking code to drop all of this custom locking stuff, and simply use a rw_semaphore for everything. This makes the locking much simpler for everything, as we can now drop a lot of cruft and blocking transitions. The performance numbers vary depending on the workload, because generally speaking there doesn't tend to be a lot of contention on the btree. However, on my test system which is an 80 core single socket system with 256GiB of RAM and a 2TiB NVMe drive I get the following results (with all debug options off):

  dbench 200 baseline
  Throughput 216.056 MB/sec 200 clients 200 procs max_latency=1471.197 ms

  dbench 200 with patch
  Throughput 737.188 MB/sec 200 clients 200 procs max_latency=714.346 ms

Previously we also used fs_mark to test this sort of contention, and those results are far less impressive, mostly because there's not enough tasks to really stress the locking

  fs_mark -d /d[0-15] -S 0 -L 20 -n 100000 -s 0 -t 16

  baseline
    Average Files/sec: 160166.7
    p50 Files/sec: 165832
    p90 Files/sec: 123886
    p99 Files/sec: 123495

    real 3m26.527s
    user 2m19.223s
    sys 48m21.856s

  patched
    Average Files/sec: 164135.7
    p50 Files/sec: 171095
    p90 Files/sec: 122889
    p99 Files/sec: 113819

    real 3m29.660s
    user 2m19.990s
    sys 44m12.259s

Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07  btrfs: introduce BTRFS_NESTING_NEW_ROOT for adding new roots  Josef Bacik
The way we add new roots is confusing from a locking perspective for lockdep. We generally have the rule that we lock things in order from highest level to lowest, but in the case of adding a new level to the tree we actually allocate a new block for the root, which makes the locking go in reverse. A similar issue exists for snapshotting: we cow the original root for the root of a new tree, but both are at the same level. Address this by using BTRFS_NESTING_NEW_ROOT for these operations. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07  btrfs: introduce BTRFS_NESTING_SPLIT for split blocks  Josef Bacik
If we are splitting a leaf/node, we could do something like the following

  lock(leaf) BTRFS_NESTING_NORMAL
  lock(left) BTRFS_NESTING_LEFT + BTRFS_NESTING_COW
  push from leaf -> left
  reset path to point to left
  split left
  allocate new block, lock block BTRFS_NESTING_SPLIT

At the new block point we need to have a different nesting level, because we have already used either BTRFS_NESTING_LEFT or BTRFS_NESTING_RIGHT when pushing items from the original leaf into the adjacent leaves.

Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07  btrfs: introduce BTRFS_NESTING_LEFT/RIGHT_COW  Josef Bacik
For similar reasons as BTRFS_NESTING_COW, we need BTRFS_NESTING_LEFT/RIGHT_COW. The pattern is this

  lock leaf -> BTRFS_NESTING_NORMAL
  cow leaf -> BTRFS_NESTING_COW
  split leaf
  lock left -> BTRFS_NESTING_LEFT
  cow left -> BTRFS_NESTING_LEFT_COW

We need this in order to indicate to lockdep that these locks are discrete and are being taken in a safe order.

Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07  btrfs: introduce BTRFS_NESTING_LEFT/BTRFS_NESTING_RIGHT  Josef Bacik
Our lockdep maps are based on rootid+level, however in some cases we will lock adjacent blocks on the same level, namely in searching forward or in split/balance. Because of this lockdep will complain, so we need a separate subclass to indicate to lockdep that these are different locks.

  lock leaf -> BTRFS_NESTING_NORMAL
  cow leaf -> BTRFS_NESTING_COW
  split leaf
  lock left -> BTRFS_NESTING_LEFT
  lock right -> BTRFS_NESTING_RIGHT

The above graph illustrates the need for this new nesting subclass.

Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
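Why a subclass silences the complaint can be shown with a toy checker. The sketch below is greatly simplified and is not how lockdep works internally (real lockdep tracks dependency chains between classes). It only assumes that a lock class is keyed by root id, level and nesting subclass, and that taking a second lock of an identical class is reported. The enum mirrors the subclass names from this series; its ordering and the toy_ names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Subclass names follow the commit series; the ordering is assumed. */
enum toy_lock_nesting {
    TOY_NESTING_NORMAL,
    TOY_NESTING_COW,
    TOY_NESTING_LEFT,
    TOY_NESTING_LEFT_COW,
    TOY_NESTING_RIGHT,
    TOY_NESTING_RIGHT_COW,
    TOY_NESTING_SPLIT,
    TOY_NESTING_NEW_ROOT,
    TOY_NESTING_MAX,
};

#define TOY_MAX_HELD 16

/* A lock class is identified by (root id, level, subclass). */
struct toy_lock_class {
    unsigned long rootid;
    int level;
    enum toy_lock_nesting nest;
};

/* Classes currently held by one task. */
struct toy_held_locks {
    struct toy_lock_class held[TOY_MAX_HELD];
    size_t nr;
};

/*
 * Record an acquisition. Returns false (a "complaint") when a lock of the
 * same class is already held; a distinct subclass makes it a distinct class.
 */
static bool toy_acquire(struct toy_held_locks *h, unsigned long rootid,
                        int level, enum toy_lock_nesting nest)
{
    assert(h->nr < TOY_MAX_HELD);
    assert(nest < TOY_NESTING_MAX);

    for (size_t i = 0; i < h->nr; i++) {
        const struct toy_lock_class *c = &h->held[i];

        if (c->rootid == rootid && c->level == level && c->nest == nest)
            return false;
    }
    h->held[h->nr++] = (struct toy_lock_class){ rootid, level, nest };
    return true;
}
```

In this model, locking a leaf and then its left and right neighbours on the same level passes only when the neighbours are tagged with the LEFT and RIGHT subclasses; tagging all three as NORMAL triggers the complaint on the second acquisition.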
2020-10-07  btrfs: introduce BTRFS_NESTING_COW for cow'ing blocks  Josef Bacik
When we COW a block we are holding a lock on the original block, and then we lock the new COW block. Because our lockdep maps are based on root + level, this will make lockdep complain. We need a way to indicate a subclass for locking the COW'ed block, so plumb through our btrfs_lock_nesting from btrfs_cow_block down to the btrfs_init_buffer, and then introduce BTRFS_NESTING_COW to be used for cow'ing blocks. The reason I've added all this extra infrastructure is because there will be need of different nesting classes in follow up patches. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07  btrfs: add nesting tags to the locking helpers  Josef Bacik
We will need these when we switch to an rwsem, so plumb in the infrastructure here to use later on. I violate the 80 character limit some here because it'll be cleaned up later. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07  btrfs: introduce btrfs_path::recurse  Josef Bacik
Our current tree locking stuff allows us to recurse with read locks if we're already holding the write lock. This is necessary for the space cache inode, as we could be holding a lock on the root_tree root when we need to cache a block group, and thus need to be able to read down the root_tree to read in the inode cache. We can get away with this in our current locking, but we won't be able to with a rwsem. Handle this by purposefully annotating the places where we require recursion, so that in the future we can maybe come up with a way to avoid the recursion. In the case of the free space inode, this will be superseded by the free space tree. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23  btrfs: Implement DREW lock  Nikolay Borisov
A (D)ouble (R)eader (W)riter (E)xclusion lock is a locking primitive that allows multiple readers or multiple writers, but not readers and writers, to hold it concurrently. The code is factored out from the existing open-coded locking scheme used to exclude pending snapshots from nocow writers and vice-versa. The current implementation actually favors readers (that is, snapshot creators) over writers (nocow writers of the filesystem). The API provides lock/unlock/trylock for reads and writes. A formal specification in TLA+, provided by Valentin Schneider, is at https://lore.kernel.org/linux-btrfs/2dcaf81c-f0d3-409e-cb29-733d8b3b4cc9@arm.com/ Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
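The exclusion rule of a DREW lock can be sketched in userspace with a mutex and a condition variable. This is a minimal model under stated assumptions, not the kernel implementation: all toy_ names are hypothetical, and reader preference is modelled simply by letting a reader announce itself before it waits for writers to drain, so that new writers are held off.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace DREW lock: many readers OR many writers. */
struct toy_drew_lock {
    pthread_mutex_t mtx;
    pthread_cond_t cond;
    int readers;   /* readers holding or waiting for the lock */
    int writers;   /* writers holding the lock */
};

static void toy_drew_init(struct toy_drew_lock *l)
{
    pthread_mutex_init(&l->mtx, NULL);
    pthread_cond_init(&l->cond, NULL);
    l->readers = 0;
    l->writers = 0;
}

/* Succeeds only while no reader holds or waits for the lock. */
static bool toy_drew_try_write_lock(struct toy_drew_lock *l)
{
    bool ok;

    pthread_mutex_lock(&l->mtx);
    ok = (l->readers == 0);
    if (ok)
        l->writers++;
    pthread_mutex_unlock(&l->mtx);
    return ok;
}

static void toy_drew_write_lock(struct toy_drew_lock *l)
{
    pthread_mutex_lock(&l->mtx);
    while (l->readers > 0)
        pthread_cond_wait(&l->cond, &l->mtx);
    l->writers++;
    pthread_mutex_unlock(&l->mtx);
}

static void toy_drew_write_unlock(struct toy_drew_lock *l)
{
    pthread_mutex_lock(&l->mtx);
    assert(l->writers > 0);
    if (--l->writers == 0)
        pthread_cond_broadcast(&l->cond);   /* let waiting readers in */
    pthread_mutex_unlock(&l->mtx);
}

/* Non-blocking read attempt (hypothetical helper for this sketch). */
static bool toy_drew_try_read_lock(struct toy_drew_lock *l)
{
    bool ok;

    pthread_mutex_lock(&l->mtx);
    ok = (l->writers == 0);
    if (ok)
        l->readers++;
    pthread_mutex_unlock(&l->mtx);
    return ok;
}

/* Announce the reader first so new writers back off, then wait. */
static void toy_drew_read_lock(struct toy_drew_lock *l)
{
    pthread_mutex_lock(&l->mtx);
    l->readers++;
    while (l->writers > 0)
        pthread_cond_wait(&l->cond, &l->mtx);
    pthread_mutex_unlock(&l->mtx);
}

static void toy_drew_read_unlock(struct toy_drew_lock *l)
{
    pthread_mutex_lock(&l->mtx);
    assert(l->readers > 0);
    if (--l->readers == 0)
        pthread_cond_broadcast(&l->cond);   /* let waiting writers in */
    pthread_mutex_unlock(&l->mtx);
}
```

Two writers can hold the lock together, as can two readers, but a write attempt fails while any reader is present and a read attempt fails while any writer is present.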
2019-11-18  btrfs: move btrfs_unlock_up_safe to other locking functions  David Sterba
The function belongs to the family of locking functions, so move it there. The 'noinline' keyword is dropped as it's now an exported function that does not need it. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18  btrfs: move btrfs_set_path_blocking to other locking functions  David Sterba
The function belongs to the family of locking functions, so move it there. The 'noinline' keyword is dropped as it's now an exported function that does not need it. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18  btrfs: make btrfs_assert_tree_locked static inline  David Sterba
The function btrfs_assert_tree_locked is used outside of the locking code so it is exported; however, we can make it static inline as it's fairly trivial. This is the only locking assertion used in release builds; inlining improves the text size by 174 bytes and reduces stack consumption in the callers. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09  btrfs: Remove unused locking functions  Nikolay Borisov
Those were split out of btrfs_clear_lock_blocking_rw by aa12c02778a9 ("btrfs: split btrfs_clear_lock_blocking_rw to read and write helpers") however at that time this function was unused due to commit 523983401644 ("Btrfs: kill btrfs_clear_path_blocking"). Put the final nail in the coffin of those 2 functions. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25  btrfs: merge btrfs_set_lock_blocking_rw with it's caller  David Sterba
The last caller that does not have a fixed value of lock is btrfs_set_path_blocking, which actually does the same conditional switch by the lock type, so we can merge the branches together and remove the helper. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25  btrfs: open code now trivial btrfs_set_lock_blocking  David Sterba
btrfs_set_lock_blocking is now only a simple wrapper around btrfs_set_lock_blocking_write. The name does not bring any semantic value that could not be inferred from the new function so there's no point keeping it. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25  btrfs: replace btrfs_set_lock_blocking_rw with appropriate helpers  David Sterba
We can use the right helper where the lock type is a fixed parameter. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25  btrfs: split btrfs_clear_lock_blocking_rw to read and write helpers  David Sterba
There are many callers that hardcode the desired lock type so we can avoid the switch and call them directly. Split the current function to two. There are no remaining users of btrfs_clear_lock_blocking_rw so it's removed. The call sites will be converted in followup patches. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25  btrfs: split btrfs_set_lock_blocking_rw to read and write helpers  David Sterba
There are many callers that hardcode the desired lock type so we can avoid the switch and call them directly. Split the current function to two but leave a helper that still takes the variable lock type to make current code compile. The call sites will be converted in followup patches. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-12  btrfs: replace GPL boilerplate by SPDX -- headers  David Sterba
Remove GPL boilerplate text (long, short, one-line) and keep the rest, ie. personal, company or original source copyright statements. Add the SPDX header. Unify the include protection macros to match the file names. Signed-off-by: David Sterba <dsterba@suse.com>
2014-11-19  btrfs: fix lockups from btrfs_clear_path_blocking  Chris Mason
The fair reader/writer locks mean that btrfs_clear_path_blocking needs to strictly follow lock ordering rules even when we already have blocking locks on a given path. Before we can clear a blocking lock on the path, we need to make sure all of the locks have been converted to blocking. This will remove lock inversions against anyone spinning in write_lock() against the buffers we're trying to get read locks on. These inversions didn't exist before the fair read/writer locks, but now we need to be more careful. We papered over this deadlock in the past by changing btrfs_try_read_lock() to be a true trylock against both the spinlock and the blocking lock. This was slower, and not sufficient to fix all the deadlocks. This patch adds a btrfs_tree_read_lock_atomic(), which basically means get the spinlock but trylock on the blocking lock. Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reported-by: Patrick Schmid <schmid@phys.ethz.ch> cc: stable@vger.kernel.org #v3.15+
2013-03-14  Btrfs: remove btrfs_try_spin_lock  Liu Bo
Remove a useless function declaration Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-03-22  btrfs: return void in functions without error conditions  Jeff Mahoney
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
2011-07-27  Btrfs: switch the btrfs tree locks to reader/writer  Chris Mason
The btrfs metadata btree is the source of significant lock contention, especially in the root node. This commit changes our locking to use a reader/writer lock. The lock is built on top of rw spinlocks, and it extends the lock tracking to remember if we have a read lock or a write lock when we go to blocking. Atomics count the number of blocking readers or writers at any given time. It removes all of the adaptive spinning from the old code and uses only the spinning/blocking hints inside of btrfs to decide when it should continue spinning. In read heavy workloads this is dramatically faster. In write heavy workloads we're still faster because of less contention on the root node lock. We suffer slightly in dbench because we schedule more often during write locks, but all other benchmarks so far are improved. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-05-06  btrfs: remove all unused functions  David Sterba
Remove static and global declarations and/or definitions. Reduces size of btrfs.ko by ~3.4kB.

    text    data     bss     dec     hex filename
  402081    7464     200  409745   64091 btrfs.ko.base
  398620    7144     200  405964   631cc btrfs.ko.remove-all

Signed-off-by: David Sterba <dsterba@suse.cz>
2009-03-09  Btrfs: fix spinlock assertions on UP systems  Chris Mason
btrfs_tree_locked was being used to make sure a given extent_buffer was properly locked in a few places. But, it wasn't correct for UP compiled kernels. This switches it to using assert_spin_locked instead, and renames it to btrfs_assert_tree_locked to better reflect how it was really being used. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-09  Btrfs: don't use spin_is_contended  Chris Mason
Btrfs was using spin_is_contended to see if it should drop locks before doing extent allocations during btrfs_search_slot. The idea was to avoid expensive searches in the tree unless the lock was actually contended. But, spin_is_contended is specific to the ticket spinlocks on x86, so this is causing compile errors everywhere else. In practice, the contention could easily appear some time after we started doing the extent allocation, and it makes more sense to always drop the lock instead. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04  Btrfs: Change btree locking to use explicit blocking points  Chris Mason
Most of the btrfs metadata operations can be protected by a spinlock, but some operations still need to schedule.

So far, btrfs has been using a mutex along with a trylock loop; most of the time it is able to avoid going for the full mutex, so the trylock loop is a big performance gain.

This commit is step one for getting rid of the blocking locks entirely. btrfs_tree_lock takes a spinlock, and the code explicitly switches to a blocking lock when it starts an operation that can schedule.

We'll be able to get rid of the blocking locks in smaller pieces over time. Tracing allows us to find the most common cause of blocking, so we can start with the hot spots first.

The basic idea is:

btrfs_tree_lock() returns with the spin lock held

btrfs_set_lock_blocking() sets the EXTENT_BUFFER_BLOCKING bit in the extent buffer flags, and then drops the spin lock. The buffer is still considered locked by all of the btrfs code.

If btrfs_tree_lock gets the spinlock but finds the blocking bit set, it drops the spin lock and waits on a wait queue for the blocking bit to go away.

Much of the code that needs to set the blocking bit finishes without actually blocking a good percentage of the time. So, an adaptive spin is still used against the blocking bit to avoid very high context switch rates.

btrfs_clear_lock_blocking() clears the blocking bit and returns with the spinlock held again.

btrfs_tree_unlock() can be called on either blocking or spinning locks; it does the right thing based on the blocking bit.

ctree.c has a helper function to set/clear all the locked buffers in a path as blocking.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
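The spinning/blocking state machine from that commit message can be modelled in userspace. The sketch below rests on several assumptions: a pthread mutex stands in for the kernel spinlock, a condition variable stands in for the wait queue, a plain bool stands in for the EXTENT_BUFFER_BLOCKING bit, and the adaptive spin against the blocking bit is left out. All toy_ names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of one lockable tree block. */
struct toy_eb {
    pthread_mutex_t spin;   /* stand-in for the spin lock */
    pthread_cond_t wq;      /* stand-in for the wait queue */
    bool blocking;          /* stand-in for EXTENT_BUFFER_BLOCKING */
};

static void toy_eb_init(struct toy_eb *eb)
{
    pthread_mutex_init(&eb->spin, NULL);
    pthread_cond_init(&eb->wq, NULL);
    eb->blocking = false;
}

/* Returns with the "spin lock" held and the blocking bit clear. */
static void toy_tree_lock(struct toy_eb *eb)
{
    pthread_mutex_lock(&eb->spin);
    /* cond_wait drops the mutex while waiting for the bit to go away. */
    while (eb->blocking)
        pthread_cond_wait(&eb->wq, &eb->spin);
}

/* Non-blocking attempt: fails if held in either spinning or blocking mode. */
static bool toy_try_tree_lock(struct toy_eb *eb)
{
    if (pthread_mutex_trylock(&eb->spin) != 0)
        return false;
    if (eb->blocking) {
        pthread_mutex_unlock(&eb->spin);
        return false;
    }
    return true;
}

/* Stay the logical owner, but drop the spin lock so the owner may sleep. */
static void toy_set_lock_blocking(struct toy_eb *eb)
{
    eb->blocking = true;
    pthread_mutex_unlock(&eb->spin);
}

/* Clear the bit and return with the spin lock held again. */
static void toy_clear_lock_blocking(struct toy_eb *eb)
{
    pthread_mutex_lock(&eb->spin);
    assert(eb->blocking);
    eb->blocking = false;
    /* Waiters wake up, then queue on the mutex until the owner unlocks. */
    pthread_cond_broadcast(&eb->wq);
}

/* Works for both spinning and blocking holders. */
static void toy_tree_unlock(struct toy_eb *eb)
{
    if (eb->blocking) {
        pthread_mutex_lock(&eb->spin);
        eb->blocking = false;
        pthread_cond_broadcast(&eb->wq);
    }
    pthread_mutex_unlock(&eb->spin);
}
```

The point of the model is that the buffer counts as locked in both states: while the bit is set the mutex is free, yet toy_try_tree_lock() still refuses the lock.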
2008-09-25  btrfs_search_slot: reduce lock contention by cowing in two stages  Chris Mason
A btree block cow has two parts: the first is to allocate a destination block and the second is to copy the old block over. The first part needs locks in the extent allocation tree, and may need to do IO. This changeset splits that into a separate function that can be called without any tree locks held. btrfs_search_slot is changed to drop its path and start over if it has to COW a contended block. This often means that many writers will pre-alloc a new destination for the same contended block, but they cache their prealloc for later use on lower levels in the tree. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25  Btrfs: Start btree concurrency work.  Chris Mason
The allocation trees and the chunk trees are serialized via their own dedicated mutexes. This means allocation location is still not very fine grained. The main FS btree is protected by locks on each block in the btree. Locks are taken top / down, and as processing finishes on a given level of the tree, the lock is released after locking the lower level. The end result of a search is now a path where only the lowest level is locked. Releasing or freeing the path drops any locks held. Signed-off-by: Chris Mason <chris.mason@oracle.com>
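The top-down scheme in that message (lock the lower level, then release the level above, ending with only the lowest level locked) is a form of lock coupling. A minimal userspace sketch follows. It assumes one child per node to keep the walk short, uses a pthread mutex per block, and keeps a locked flag purely so the result can be inspected; the toy_ names and the path layout are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LEVEL 8

/* Hypothetical tree block; level 0 is the leaf level. */
struct toy_node {
    pthread_mutex_t lock;
    bool locked;              /* bookkeeping only, for inspection */
    int level;
    struct toy_node *child;   /* block one level down, NULL at a leaf */
};

/* Result of a search: the block visited at each level and its lock state. */
struct toy_path {
    struct toy_node *nodes[TOY_MAX_LEVEL];
    bool locks[TOY_MAX_LEVEL];
};

static void toy_node_init(struct toy_node *n, int level, struct toy_node *child)
{
    assert(level >= 0 && level < TOY_MAX_LEVEL);
    pthread_mutex_init(&n->lock, NULL);
    n->locked = false;
    n->level = level;
    n->child = child;
}

static void toy_node_lock(struct toy_node *n)
{
    pthread_mutex_lock(&n->lock);
    n->locked = true;
}

static void toy_node_unlock(struct toy_node *n)
{
    n->locked = false;
    pthread_mutex_unlock(&n->lock);
}

/* Walk from the root to a leaf, holding at most two locks at a time. */
static void toy_search(struct toy_node *root, struct toy_path *p)
{
    struct toy_node *cur = root;

    for (int i = 0; i < TOY_MAX_LEVEL; i++) {
        p->nodes[i] = NULL;
        p->locks[i] = false;
    }

    toy_node_lock(cur);
    for (;;) {
        struct toy_node *next = cur->child;

        p->nodes[cur->level] = cur;
        p->locks[cur->level] = true;
        if (!next)
            break;
        toy_node_lock(next);            /* lock the lower level first */
        toy_node_unlock(cur);           /* then release the level above */
        p->locks[cur->level] = false;
        cur = next;
    }
}

/* Releasing the path drops any locks it still holds. */
static void toy_release_path(struct toy_path *p)
{
    for (int i = 0; i < TOY_MAX_LEVEL; i++) {
        if (p->locks[i]) {
            toy_node_unlock(p->nodes[i]);
            p->locks[i] = false;
        }
    }
}
```

Because a parent is released only after its child is locked, no other task can slip in between the two levels during the descent.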