diff options
author | Filipe Manana <fdmanana@suse.com> | 2021-01-27 10:35:00 +0000 |
---|---|---|
committer | David Sterba <dsterba@suse.com> | 2021-02-08 22:59:01 +0100 |
commit | d0c2f4fa555e70324ec2a129b822ab58f172cc62 (patch) | |
tree | bb295da4e494d897cb6429f8594f128a23c6cba9 /fs | |
parent | 64d6b281ba4db044c946158387c74e1149b9487e (diff) |
btrfs: make concurrent fsyncs wait less when waiting for a transaction commit
Often an fsync needs to fallback to a transaction commit for several
reasons (to ensure consistency after a power failure, a new block group
was allocated or a temporary error such as ENOMEM or ENOSPC happened).
In that case the log is marked as needing a full commit and any concurrent
tasks attempting to log inodes or commit the log will also fallback to the
transaction commit. When this happens they all wait for the task that first
started the transaction commit to finish the transaction commit - however
they wait until the full transaction commit happens, which is not needed,
as they only need to wait for the superblocks to be persisted and not for
unpinning all the extents pinned during the transaction's lifetime, which
even for short lived transactions can be a few thousand and take some
significant amount of time to complete - for dbench workloads I have
observed up to 4~5 milliseconds of time spent unpinning extents in the
worst cases, and the number of pinned extents was between 2 to 3 thousand.
So allow fsync tasks to skip waiting for the unpinning of extents when
they call btrfs_commit_transaction() and they were not the task that
started the transaction commit (that one has to do it, the alternative
would be to offload the transaction commit to another task so that it
could avoid waiting for the extent unpinning or offload the extent
unpinning to another task).
This patch is part of a patchset comprised of the following patches:
btrfs: remove unnecessary directory inode item update when deleting dir entry
btrfs: stop setting nbytes when filling inode item for logging
btrfs: avoid logging new ancestor inodes when logging new inode
btrfs: skip logging directories already logged when logging all parents
btrfs: skip logging inodes already logged when logging new entries
btrfs: remove unnecessary check_parent_dirs_for_sync()
btrfs: make concurrent fsyncs wait less when waiting for a transaction commit
After applying the entire patchset, dbench shows improvements in respect
to throughput and latency. The script used to measure it is the following:
$ cat dbench-test.sh
#!/bin/bash
DEV=/dev/sdk
MNT=/mnt/sdk
MOUNT_OPTIONS="-o ssd"
MKFS_OPTIONS="-m single -d single"
echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
umount $DEV &> /dev/null
mkfs.btrfs -f $MKFS_OPTIONS $DEV
mount $MOUNT_OPTIONS $DEV $MNT
dbench -D $MNT -t 300 64
umount $MNT
The test was run on a physical machine with 12 cores (Intel corei7), 64G
of ram, using a NVMe device and a non-debug kernel configuration (Debian's
default configuration).
Before applying patchset, 32 clients:
Operation Count AvgLat MaxLat
----------------------------------------
NTCreateX 9627107 0.153 61.938
Close 7072076 0.001 3.175
Rename 407633 1.222 44.439
Unlink 1943895 0.658 44.440
Deltree 256 17.339 110.891
Mkdir 128 0.003 0.009
Qpathinfo 8725406 0.064 17.850
Qfileinfo 1529516 0.001 2.188
Qfsinfo 1599884 0.002 1.457
Sfileinfo 784200 0.005 3.562
Find 3373513 0.411 30.312
WriteX 4802132 0.053 29.054
ReadX 15089959 0.002 5.801
LockX 31344 0.002 0.425
UnlockX 31344 0.001 0.173
Flush 674724 5.952 341.830
Throughput 1008.02 MB/sec 32 clients 32 procs max_latency=341.833 ms
After applying patchset, 32 clients:
After patchset, with 32 clients:
Operation Count AvgLat MaxLat
----------------------------------------
NTCreateX 9931568 0.111 25.597
Close 7295730 0.001 2.171
Rename 420549 0.982 49.714
Unlink 2005366 0.497 39.015
Deltree 256 11.149 89.242
Mkdir 128 0.002 0.014
Qpathinfo 9001863 0.049 20.761
Qfileinfo 1577730 0.001 2.546
Qfsinfo 1650508 0.002 3.531
Sfileinfo 809031 0.005 5.846
Find 3480259 0.309 23.977
WriteX 4952505 0.043 41.283
ReadX 15568127 0.002 5.476
LockX 32338 0.002 0.978
UnlockX 32338 0.001 2.032
Flush 696017 7.485 228.835
Throughput 1049.91 MB/sec 32 clients 32 procs max_latency=228.847 ms
--> +4.1% throughput, -39.6% max latency
Before applying patchset, 64 clients:
Operation Count AvgLat MaxLat
----------------------------------------
NTCreateX 8956748 0.342 108.312
Close 6579660 0.001 3.823
Rename 379209 2.396 81.897
Unlink 1808625 1.108 131.148
Deltree 256 25.632 172.176
Mkdir 128 0.003 0.018
Qpathinfo 8117615 0.131 55.916
Qfileinfo 1423495 0.001 2.635
Qfsinfo 1488496 0.002 5.412
Sfileinfo 729472 0.007 8.643
Find 3138598 0.855 78.321
WriteX 4470783 0.102 79.442
ReadX 14038139 0.002 7.578
LockX 29158 0.002 0.844
UnlockX 29158 0.001 0.567
Flush 627746 14.168 506.151
Throughput 924.738 MB/sec 64 clients 64 procs max_latency=506.154 ms
After applying patchset, 64 clients:
Operation Count AvgLat MaxLat
----------------------------------------
NTCreateX 9069003 0.303 43.193
Close 6662328 0.001 3.888
Rename 383976 2.194 46.418
Unlink 1831080 1.022 43.873
Deltree 256 24.037 155.763
Mkdir 128 0.002 0.005
Qpathinfo 8219173 0.137 30.233
Qfileinfo 1441203 0.001 3.204
Qfsinfo 1507092 0.002 4.055
Sfileinfo 738775 0.006 5.431
Find 3177874 0.936 38.170
WriteX 4526152 0.084 39.518
ReadX 14213562 0.002 24.760
LockX 29522 0.002 1.221
UnlockX 29522 0.001 0.694
Flush 635652 14.358 422.039
Throughput 990.13 MB/sec 64 clients 64 procs max_latency=422.043 ms
--> +6.8% throughput, -18.1% max latency
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Diffstat (limited to 'fs')
-rw-r--r-- | fs/btrfs/file.c | 1 | ||||
-rw-r--r-- | fs/btrfs/transaction.c | 39 | ||||
-rw-r--r-- | fs/btrfs/transaction.h | 2 |
3 files changed, 34 insertions, 8 deletions
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index d81ae1f518f2..be5350f5bedf 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2238,6 +2238,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) ret = PTR_ERR(trans); goto out_release_extents; } + trans->in_fsync = true; ret = btrfs_log_dentry_safe(trans, dentry, &ctx); btrfs_release_log_ctx_extents(&ctx); diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index b83e8ae38cfc..00c0680dac3a 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -107,6 +107,11 @@ static const unsigned int btrfs_blocked_trans_types[TRANS_STATE_MAX] = { __TRANS_JOIN | __TRANS_JOIN_NOLOCK | __TRANS_JOIN_NOSTART), + [TRANS_STATE_SUPER_COMMITTED] = (__TRANS_START | + __TRANS_ATTACH | + __TRANS_JOIN | + __TRANS_JOIN_NOLOCK | + __TRANS_JOIN_NOSTART), [TRANS_STATE_COMPLETED] = (__TRANS_START | __TRANS_ATTACH | __TRANS_JOIN | @@ -826,10 +831,11 @@ btrfs_attach_transaction_barrier(struct btrfs_root *root) return trans; } -/* wait for a transaction commit to be fully complete */ -static noinline void wait_for_commit(struct btrfs_transaction *commit) +/* Wait for a transaction commit to reach at least the given state. */ +static noinline void wait_for_commit(struct btrfs_transaction *commit, + const enum btrfs_trans_state min_state) { - wait_event(commit->commit_wait, commit->state == TRANS_STATE_COMPLETED); + wait_event(commit->commit_wait, commit->state >= min_state); } int btrfs_wait_for_commit(struct btrfs_fs_info *fs_info, u64 transid) @@ -884,7 +890,7 @@ int btrfs_wait_for_commit(struct btrfs_fs_info *fs_info, u64 transid) goto out; /* nothing committing|committed */ } - wait_for_commit(cur_trans); + wait_for_commit(cur_trans, TRANS_STATE_COMPLETED); btrfs_put_transaction(cur_trans); out: return ret; @@ -2100,11 +2106,15 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) spin_lock(&fs_info->trans_lock); if (cur_trans->state >= TRANS_STATE_COMMIT_START) { + enum btrfs_trans_state want_state = TRANS_STATE_COMPLETED; + spin_unlock(&fs_info->trans_lock); refcount_inc(&cur_trans->use_count); - ret = btrfs_end_transaction(trans); - wait_for_commit(cur_trans); + if (trans->in_fsync) + want_state = TRANS_STATE_SUPER_COMMITTED; + ret = btrfs_end_transaction(trans); + wait_for_commit(cur_trans, want_state); if (TRANS_ABORTED(cur_trans)) ret = cur_trans->aborted; @@ -2118,13 +2128,19 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) wake_up(&fs_info->transaction_blocked_wait); if (cur_trans->list.prev != &fs_info->trans_list) { + enum btrfs_trans_state want_state = TRANS_STATE_COMPLETED; + + if (trans->in_fsync) + want_state = TRANS_STATE_SUPER_COMMITTED; + prev_trans = list_entry(cur_trans->list.prev, struct btrfs_transaction, list); - if (prev_trans->state != TRANS_STATE_COMPLETED) { + if (prev_trans->state < want_state) { refcount_inc(&prev_trans->use_count); spin_unlock(&fs_info->trans_lock); - wait_for_commit(prev_trans); + wait_for_commit(prev_trans, want_state); + ret = READ_ONCE(prev_trans->aborted); btrfs_put_transaction(prev_trans); @@ -2343,6 +2359,13 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans) if (ret) goto scrub_continue; + /* + * We needn't acquire the lock here because there is no other task + * which can change it. + */ + cur_trans->state = TRANS_STATE_SUPER_COMMITTED; + wake_up(&cur_trans->commit_wait); + btrfs_finish_extent_commit(trans); if (test_bit(BTRFS_TRANS_HAVE_FREE_BGS, &cur_trans->flags)) diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index 31ca81bad822..935bd6958a8a 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -16,6 +16,7 @@ enum btrfs_trans_state { TRANS_STATE_COMMIT_START, TRANS_STATE_COMMIT_DOING, TRANS_STATE_UNBLOCKED, + TRANS_STATE_SUPER_COMMITTED, TRANS_STATE_COMPLETED, TRANS_STATE_MAX, }; @@ -133,6 +134,7 @@ struct btrfs_trans_handle { bool can_flush_pending_bgs; bool reloc_reserved; bool dirty; + bool in_fsync; struct btrfs_root *root; struct btrfs_fs_info *fs_info; struct list_head new_bgs; |