summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2020-04-27io_uring: statx must grab the file table for valid fdJens Axboe
Clay reports that OP_STATX fails for a test case with a valid fd and empty path: -- Test 0: statx:fd 3: SUCCEED, file mode 100755 -- Test 1: statx:path ./uring_statx: SUCCEED, file mode 100755 -- Test 2: io_uring_statx:fd 3: FAIL, errno 9: Bad file descriptor -- Test 3: io_uring_statx:path ./uring_statx: SUCCEED, file mode 100755 This is due to statx not grabbing the process file table, hence we can't lookup the fd in async context. If the fd is valid, ensure that we grab the file table so we can grab the file from async context. Cc: stable@vger.kernel.org # v5.6 Reported-by: Clay Harris <bugs@claycon.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-27btrfs: transaction: Avoid deadlock due to bad initialization timing of ↵Qu Wenruo
fs_info::journal_info [BUG] One run of btrfs/063 triggered the following lockdep warning: ============================================ WARNING: possible recursive locking detected 5.6.0-rc7-custom+ #48 Not tainted -------------------------------------------- kworker/u24:0/7 is trying to acquire lock: ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs] but task is already holding lock: ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(sb_internal#2); lock(sb_internal#2); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by kworker/u24:0/7: #0: ffff88817b495948 ((wq_completion)btrfs-endio-write){+.+.}, at: process_one_work+0x557/0xb80 #1: ffff888189ea7db8 ((work_completion)(&work->normal_work)){+.+.}, at: process_one_work+0x557/0xb80 #2: ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs] #3: ffff888174ca4da8 (&fs_info->reloc_mutex){+.+.}, at: btrfs_record_root_in_trans+0x83/0xd0 [btrfs] stack backtrace: CPU: 0 PID: 7 Comm: kworker/u24:0 Not tainted 5.6.0-rc7-custom+ #48 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Workqueue: btrfs-endio-write btrfs_work_helper [btrfs] Call Trace: dump_stack+0xc2/0x11a __lock_acquire.cold+0xce/0x214 lock_acquire+0xe6/0x210 __sb_start_write+0x14e/0x290 start_transaction+0x66c/0x890 [btrfs] btrfs_join_transaction+0x1d/0x20 [btrfs] find_free_extent+0x1504/0x1a50 [btrfs] btrfs_reserve_extent+0xd5/0x1f0 [btrfs] btrfs_alloc_tree_block+0x1ac/0x570 [btrfs] btrfs_copy_root+0x213/0x580 [btrfs] create_reloc_root+0x3bd/0x470 [btrfs] btrfs_init_reloc_root+0x2d2/0x310 [btrfs] record_root_in_trans+0x191/0x1d0 [btrfs] btrfs_record_root_in_trans+0x90/0xd0 [btrfs] start_transaction+0x16e/0x890 [btrfs] btrfs_join_transaction+0x1d/0x20 [btrfs] btrfs_finish_ordered_io+0x55d/0xcd0 [btrfs] finish_ordered_fn+0x15/0x20 [btrfs] btrfs_work_helper+0x116/0x9a0 [btrfs] process_one_work+0x632/0xb80 worker_thread+0x80/0x690 kthread+0x1a3/0x1f0 ret_from_fork+0x27/0x50 It's pretty hard to reproduce, only one hit so far. [CAUSE] This is because we're calling btrfs_join_transaction() without re-using the current running one: btrfs_finish_ordered_io() |- btrfs_join_transaction() <<< Call #1 |- btrfs_record_root_in_trans() |- btrfs_reserve_extent() |- btrfs_join_transaction() <<< Call #2 Normally such btrfs_join_transaction() call should re-use the existing one, without trying to re-start a transaction. But the problem is, in btrfs_join_transaction() call #1, we call btrfs_record_root_in_trans() before initializing current::journal_info. And in btrfs_join_transaction() call #2, we're relying on current::journal_info to avoid such deadlock. [FIX] Call btrfs_record_root_in_trans() after we have initialized current::journal_info. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-27btrfs: fix partial loss of prealloc extent past i_size after fsyncFilipe Manana
When we have an inode with a prealloc extent that starts at an offset lower than the i_size and there is another prealloc extent that starts at an offset beyond i_size, we can end up losing part of the first prealloc extent (the part that starts at i_size) and have an implicit hole if we fsync the file and then have a power failure. Consider the following example with comments explaining how and why it happens. $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt # Create our test file with 2 consecutive prealloc extents, each with a # size of 128Kb, and covering the range from 0 to 256Kb, with a file # size of 0. $ xfs_io -f -c "falloc -k 0 128K" /mnt/foo $ xfs_io -c "falloc -k 128K 128K" /mnt/foo # Fsync the file to record both extents in the log tree. $ xfs_io -c "fsync" /mnt/foo # Now do a redudant extent allocation for the range from 0 to 64Kb. # This will merely increase the file size from 0 to 64Kb. Instead we # could also do a truncate to set the file size to 64Kb. $ xfs_io -c "falloc 0 64K" /mnt/foo # Fsync the file, so we update the inode item in the log tree with the # new file size (64Kb). This also ends up setting the number of bytes # for the first prealloc extent to 64Kb. This is done by the truncation # at btrfs_log_prealloc_extents(). # This means that if a power failure happens after this, a write into # the file range 64Kb to 128Kb will not use the prealloc extent and # will result in allocation of a new extent. $ xfs_io -c "fsync" /mnt/foo # Now set the file size to 256K with a truncate and then fsync the file. # Since no changes happened to the extents, the fsync only updates the # i_size in the inode item at the log tree. This results in an implicit # hole for the file range from 64Kb to 128Kb, something which fsck will # complain when not using the NO_HOLES feature if we replay the log # after a power failure. $ xfs_io -c "truncate 256K" -c "fsync" /mnt/foo So instead of always truncating the log to the inode's current i_size at btrfs_log_prealloc_extents(), check first if there's a prealloc extent that starts at an offset lower than the i_size and with a length that crosses the i_size - if there is one, just make sure we truncate to a size that corresponds to the end offset of that prealloc extent, so that we don't lose the part of that extent that starts at i_size if a power failure happens. A test case for fstests follows soon. Fixes: 31d11b83b96f ("Btrfs: fix duplicate extents after fsync of file with prealloc extents") CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-27propagate_one(): mnt_set_mountpoint() needs mount_lockAl Viro
... to protect the modification of mp->m_count done by it. Most of the places that modify that thing also have namespace_lock held, but not all of them can do so, so we really need mount_lock here. Kudos to Piotr Krysiuk <piotras@gmail.com>, who'd spotted a related bug in pivot_root(2) (fixed unnoticed in 5.3); search for other similar turds has caught out this one. Cc: stable@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-26Merge tag '5.7-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Five cifs/smb3 fixes:two for DFS reconnect failover, one lease fix for stable and the others to fix a missing spinlock during reconnect" * tag '5.7-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix uninitialised lease_key in open_shroot() cifs: ensure correct super block for DFS reconnect cifs: do not share tcons with DFS cifs: minor update to comments around the cifs_tcp_ses_lock mutex cifs: protect updating server->dstaddr with a spinlock
2020-04-26Merge tag 'driver-core-5.7-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are some small firmware/driver core/debugfs fixes for 5.7-rc3. The debugfs change is now possible as now the last users of debugfs_create_u32() have been fixed up in the different trees that got merged into 5.7-rc1, and I don't want it creeping back in. The firmware changes did cause a regression in linux-next, so the final patch here reverts part of that, re-exporting the symbol to resolve that issue. All of these patches, with the exception of the final one, have been in linux-next with only that one reported issue" * tag 'driver-core-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: firmware_loader: revert removal of the fw_fallback_config export debugfs: remove return value of debugfs_create_u32() firmware_loader: remove unused exports firmware: imx: fix compile-testing
2020-04-25Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull pid leak fix from Eric Biederman: "Oleg noticed that put_pid(thread_pid) was not getting called when proc was not compiled in. Let's get that fixed before 5.7 is released and causes problems for anyone" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: proc: Put thread_pid in release_task not proc_flush_pid
2020-04-25NFSv4: Remove unreachable error condition due to rpc_run_task()Xiyu Yang
nfs4_proc_layoutget() invokes rpc_run_task(), which return the value to "task". Since rpc_run_task() is impossible to return an ERR pointer, there is no need to add the IS_ERR() condition on "task" here. So we need to remove it. Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-24proc: Put thread_pid in release_task not proc_flush_pidEric W. Biederman
Oleg pointed out that in the unlikely event the kernel is compiled with CONFIG_PROC_FS unset that release_task will now leak the pid. Move the put_pid out of proc_flush_pid into release_task to fix this and to guarantee I don't make that mistake again. When possible it makes sense to keep get and put in the same function so it can easily been seen how they pair up. Fixes: 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc") Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-04-24Merge tag 'io_uring-5.7-2020-04-24' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fix from Jens Axboe: "Single fixup for a change that went into -rc2" * tag 'io_uring-5.7-2020-04-24' of git://git.kernel.dk/linux-block: io_uring: only restore req->work for req that needs do completion
2020-04-24Merge tag 'block-5.7-2020-04-24' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A few fixes/changes that should go into this release: - null_blk zoned fixes (Damien) - blkdev_close() sync improvement (Douglas) - Fix regression in blk-iocost that impacted (at least) systemtap (Waiman) - Comment fix, header removal (Zhiqiang, Jianpeng)" * tag 'block-5.7-2020-04-24' of git://git.kernel.dk/linux-block: null_blk: Cleanup zoned device initialization null_blk: Fix zoned command handling block: remove unused header blk-iocost: Fix error on iocost_ioc_vrate_adj bdev: Reduce time holding bd_mutex in sync in blkdev_close() buffer: remove useless comment and WB_REASON_FREE_MORE_MEM, reason.
2020-04-24Merge tag 'afs-fixes-20200424' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull misc AFS fixes from David Howells: "Three miscellaneous fixes to the afs filesystem: - Remove some struct members that aren't used, aren't set or aren't read, plus a wake up that nothing ever waits for. - Actually set the AFS_SERVER_FL_HAVE_EPOCH flag so that the code that depends on it can work. - Make a couple of waits uninterruptible if they're done for an operation that isn't supposed to be interruptible" * tag 'afs-fixes-20200424' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Make record checking use TASK_UNINTERRUPTIBLE when appropriate afs: Fix to actually set AFS_SERVER_FL_HAVE_EPOCH afs: Remove some unused bits
2020-04-24afs: Make record checking use TASK_UNINTERRUPTIBLE when appropriateDavid Howells
When an operation is meant to be done uninterruptibly (such as FS.StoreData), we should not be allowing volume and server record checking to be interrupted. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-24afs: Fix to actually set AFS_SERVER_FL_HAVE_EPOCHDavid Howells
AFS keeps track of the epoch value from the rxrpc protocol to note (a) when a fileserver appears to have restarted and (b) when different endpoints of a fileserver do not appear to be associated with the same fileserver (ie. all probes back from a fileserver from all of its interfaces should carry the same epoch). However, the AFS_SERVER_FL_HAVE_EPOCH flag that indicates that we've received the server's epoch is never set, though it is used. Fix this to set the flag when we first receive an epoch value from a probe sent to the filesystem client from the fileserver. Fixes: 3bf0fb6f33dd ("afs: Probe multiple fileservers simultaneously") Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-24afs: Remove some unused bitsDavid Howells
Remove three bits: (1) afs_server::no_epoch is neither set nor used. (2) afs_server::have_result is set and a wakeup is applied to it, but nothing looks at it or waits on it. (3) afs_vl_dump_edestaddrreq() prints afs_addr_list::probed, but nothing sets it for VL servers. Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-23dlmfs_file_write(): fix the bogosity in handling non-zero *pposAl Viro
'count' is how much you want written, not the final position. Moreover, it can legitimately be less than the current position... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-23Merge tag 'nfsd-5.7-rc-1' of git://git.linux-nfs.org/projects/cel/cel-2.6Linus Torvalds
Pull nfsd fixes from Chuck Lever: "The first set of 5.7-rc fixes for NFS server issues. These were all unresolved at the time the 5.7 window opened, and needed some additional time to ensure they were correctly addressed. They are ready now. At the moment I know of one more urgent issue regarding the NFS server. A fix has been tested and is under review. I expect to send one more pull request, containing this fix (which now consists of 3 patches). Fixes: - Address several use-after-free and memory leak bugs - Prevent a backchannel livelock" * tag 'nfsd-5.7-rc-1' of git://git.linux-nfs.org/projects/cel/cel-2.6: svcrdma: Fix leak of svc_rdma_recv_ctxt objects svcrdma: Fix trace point use-after-free race SUNRPC: Fix backchannel RPC soft lockups SUNRPC/cache: Fix unsafe traverse caused double-free in cache_purge nfsd: memory corruption in nfsd4_lock()
2020-04-23Merge tag 'for-5.7-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat fixes from Namjae Jeon: - several bug fixes(broken mount discard option, remount failure, memory leak) - add missing MODULE_ALIAS_FS for automatically loading exfat module. - set s_time_gran and truncate atime with exfat timestamp granularity. * tag 'for-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: truncate atimes to 2s granularity exfat: properly set s_time_gran exfat: remove 'bps' mount-option exfat: Unify access to the boot sector exfat: add missing MODULE_ALIAS_FS() exfat: Fix discard support
2020-04-23btrfs: fix transaction leak in btrfs_recover_relocationXiyu Yang
btrfs_recover_relocation() invokes btrfs_join_transaction(), which joins a btrfs_trans_handle object into transactions and returns a reference of it with increased refcount to "trans". When btrfs_recover_relocation() returns, "trans" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one exception handling path of btrfs_recover_relocation(). When read_fs_root() failed, the refcnt increased by btrfs_join_transaction() is not decreased, causing a refcnt leak. Fix this issue by calling btrfs_end_transaction() on this error path when read_fs_root() failed. Fixes: 79787eaab461 ("btrfs: replace many BUG_ONs with proper error handling") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-23btrfs: fix block group leak when removing failsXiyu Yang
btrfs_remove_block_group() invokes btrfs_lookup_block_group(), which returns a local reference of the block group that contains the given bytenr to "block_group" with increased refcount. When btrfs_remove_block_group() returns, "block_group" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in several exception handling paths of btrfs_remove_block_group(). When those error scenarios occur such as btrfs_alloc_path() returns NULL, the function forgets to decrease its refcnt increased by btrfs_lookup_block_group() and will cause a refcnt leak. Fix this issue by jumping to "out_put_group" label and calling btrfs_put_block_group() when those error scenarios occur. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-23btrfs: drop logs when we've aborted a transactionJosef Bacik
Dave reported a problem where we were panicing with generic/475 with misc-5.7. This is because we were doing IO after we had stopped all of the worker threads, because we do the log tree cleanup on roots at drop time. Cleaning up the log tree will always need to do reads if we happened to have evicted the blocks from memory. Because of this simply add a helper to btrfs_cleanup_transaction() that will go through and drop all of the log roots. This gets run before we do the close_ctree() work, and thus we are allowed to do any reads that we would need. I ran this through many iterations of generic/475 with constrained memory and I did not see the issue. general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI CPU: 2 PID: 12359 Comm: umount Tainted: G W 5.6.0-rc7-btrfs-next-58 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_queue_work+0x33/0x1c0 [btrfs] RSP: 0018:ffff9cfb015937d8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8eb5e339ed80 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff8eb5eb33b770 RDI: ffff8eb5e37a0460 RBP: ffff8eb5eb33b770 R08: 000000000000020c R09: ffffffff9fc09ac0 R10: 0000000000000007 R11: 0000000000000000 R12: 6b6b6b6b6b6b6b6b R13: ffff9cfb00229040 R14: 0000000000000008 R15: ffff8eb5d3868000 FS: 00007f167ea022c0(0000) GS:ffff8eb5fae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f167e5e0cb1 CR3: 0000000138c18004 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btrfs_end_bio+0x81/0x130 [btrfs] __split_and_process_bio+0xaf/0x4e0 [dm_mod] ? percpu_counter_add_batch+0xa3/0x120 dm_process_bio+0x98/0x290 [dm_mod] ? generic_make_request+0xfb/0x410 dm_make_request+0x4d/0x120 [dm_mod] ? generic_make_request+0xfb/0x410 generic_make_request+0x12a/0x410 ? submit_bio+0x38/0x160 submit_bio+0x38/0x160 ? percpu_counter_add_batch+0xa3/0x120 btrfs_map_bio+0x289/0x570 [btrfs] ? kmem_cache_alloc+0x24d/0x300 btree_submit_bio_hook+0x79/0xc0 [btrfs] submit_one_bio+0x31/0x50 [btrfs] read_extent_buffer_pages+0x2fe/0x450 [btrfs] btree_read_extent_buffer_pages+0x7e/0x170 [btrfs] walk_down_log_tree+0x343/0x690 [btrfs] ? walk_log_tree+0x3d/0x380 [btrfs] walk_log_tree+0xf7/0x380 [btrfs] ? plist_requeue+0xf0/0xf0 ? delete_node+0x4b/0x230 free_log_tree+0x4c/0x130 [btrfs] ? wait_log_commit+0x140/0x140 [btrfs] btrfs_free_log+0x17/0x30 [btrfs] btrfs_drop_and_free_fs_root+0xb0/0xd0 [btrfs] btrfs_free_fs_roots+0x10c/0x190 [btrfs] ? do_raw_spin_unlock+0x49/0xc0 ? _raw_spin_unlock+0x29/0x40 ? release_extent_buffer+0x121/0x170 [btrfs] close_ctree+0x289/0x2e6 [btrfs] generic_shutdown_super+0x6c/0x110 kill_anon_super+0xe/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x3a/0x70 Reported-by: David Sterba <dsterba@suse.com> Fixes: 8c38938c7bb096 ("btrfs: move the root freeing stuff into btrfs_put_root") Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-23btrfs: fix memory leak of transaction when deleting unused block groupFilipe Manana
When cleaning pinned extents right before deleting an unused block group, we check if there's still a previous transaction running and if so we increment its reference count before using it for cleaning pinned ranges in its pinned extents iotree. However we ended up never decrementing the reference count after using the transaction, resulting in a memory leak. Fix it by decrementing the reference count. Fixes: fe119a6eeb6705 ("btrfs: switch to per-transaction pinned extents") Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-22cifs: fix uninitialised lease_key in open_shroot()Paulo Alcantara
SMB2_open_init() expects a pre-initialised lease_key when opening a file with a lease, so set pfid->lease_key prior to calling it in open_shroot(). This issue was observed when performing some DFS failover tests and the lease key was never randomly generated. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> CC: Stable <stable@vger.kernel.org>
2020-04-22cifs: ensure correct super block for DFS reconnectPaulo Alcantara
This patch is basically fixing the lookup of tcons (DFS specific) during reconnect (smb2pdu.c:__smb2_reconnect) to update their prefix paths. Previously, we relied on the TCP_Server_Info pointer (misc.c:tcp_super_cb) to determine which tcon to update the prefix path We could not rely on TCP server pointer to determine which super block to update the prefix path when reconnecting tcons since it might map to different tcons that share same TCP connection. Instead, walk through all cifs super blocks and compare their DFS full paths with the tcon being updated to. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2020-04-22cifs: do not share tcons with DFSPaulo Alcantara
This disables tcon re-use for DFS shares. tcon->dfs_path stores the path that the tcon should connect to when doing failing over. If that tcon is used multiple times e.g. 2 mounts using it with different prefixpath, each will need a different dfs_path but there is only one tcon. The other solution would be to split the tcon in 2 tcons during failover but that is much harder. tcons could not be shared with DFS in cifs.ko because in a DFS namespace like: //domain/dfsroot -> /serverA/dfsroot, /serverB/dfsroot //serverA/dfsroot/link -> /serverA/target1/aa/bb //serverA/dfsroot/link2 -> /serverA/target1/cc/dd you can see that link and link2 are two DFS links that both resolve to the same target share (/serverA/target1), so cifs.ko will only contain a single tcon for both link and link2. The problem with that is, if we (auto)mount "link" and "link2", cifs.ko will only contain a single tcon for both DFS links so we couldn't perform failover or refresh the DFS cache for both links because tcon->dfs_path was set to either "link" or "link2", but not both -- which is wrong. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-04-22exfat: truncate atimes to 2s granularityEric Sandeen
The timestamp for access_time has double seconds granularity(There is no 10msIncrement field for access_time unlike create/modify_time). exfat's atimes are restricted to only 2s granularity so after we set an atime, round it down to the nearest 2s and set the sub-second component of the timestamp to 0. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-04-22exfat: properly set s_time_granEric Sandeen
The s_time_gran superblock field indicates the on-disk nanosecond granularity of timestamps, and for exfat that seems to be 10ms, so set s_time_gran to 10000000ns. Without this, in-memory timestamps change when they get re-read from disk. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-04-22exfat: remove 'bps' mount-optionTetsuhiro Kohada
remount fails because exfat_show_options() returns unsupported option 'bps'. > # mount -o ro,remount > exfat: Unknown parameter 'bps' To fix the problem, just remove 'bps' option from exfat_show_options(). Signed-off-by: Tetsuhiro Kohada <Kohada.Tetsuhiro@dc.MitsubishiElectric.co.jp> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-04-22exfat: Unify access to the boot sectorTetsuhiro Kohada
Unify access to boot sector via 'sbi->pbr_bh'. This fixes vol_flags inconsistency at read failed in fs_set_vol_flags(), and buffer_head leak in __exfat_fill_super(). Signed-off-by: Tetsuhiro Kohada <Kohada.Tetsuhiro@dc.MitsubishiElectric.co.jp> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-04-22exfat: add missing MODULE_ALIAS_FS()Thomas Backlund
This adds the necessary MODULE_ALIAS_FS() to exfat so the module gets automatically loaded when an exfat filesystem is mounted. Signed-off-by: Thomas Backlund <tmb@mageia.org> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-04-22exfat: Fix discard supportPali Rohár
Discard support was always unconditionally disabled. Now it is disabled only in the case when blk_queue_discard() returns false. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-04-21cifs: minor update to comments around the cifs_tcp_ses_lock mutexSteve French
Update comment to note that it protects server->dstaddr Signed-off-by: Steve French <stfrench@microsoft.com>
2020-04-21coredump: fix null pointer dereference on coredumpSudip Mukherjee
If the core_pattern is set to "|" and any process segfaults then we get a null pointer derefernce while trying to coredump. The call stack shows: RIP: do_coredump+0x628/0x11c0 When the core_pattern has only "|" there is no use of trying the coredump and we can check that while formating the corename and exit with an error. After this change I get: format_corename failed Aborting core Fixes: 315c69261dd3 ("coredump: split pipe command whitespace before expanding template") Reported-by: Matthew Ruffell <matthew.ruffell@canonical.com> Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Wise <pabs3@bonedaddy.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200416194612.21418-1-sudipm.mukherjee@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-21vmalloc: fix remap_vmalloc_range() bounds checksJann Horn
remap_vmalloc_range() has had various issues with the bounds checks it promises to perform ("This function checks that addr is a valid vmalloc'ed area, and that it is big enough to cover the vma") over time, e.g.: - not detecting pgoff<<PAGE_SHIFT overflow - not detecting (pgoff<<PAGE_SHIFT)+usize overflow - not checking whether addr and addr+(pgoff<<PAGE_SHIFT) are the same vmalloc allocation - comparing a potentially wildly out-of-bounds pointer with the end of the vmalloc region In particular, since commit fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY"), unprivileged users can cause kernel null pointer dereferences by calling mmap() on a BPF map with a size that is bigger than the distance from the start of the BPF map to the end of the address space. This could theoretically be used as a kernel ASLR bypass, by using whether mmap() with a given offset oopses or returns an error code to perform a binary search over the possible address range. To allow remap_vmalloc_range_partial() to verify that addr and addr+(pgoff<<PAGE_SHIFT) are in the same vmalloc region, pass the offset to remap_vmalloc_range_partial() instead of adding it to the pointer in remap_vmalloc_range(). In remap_vmalloc_range_partial(), fix the check against get_vm_area_size() by using size comparisons instead of pointer comparisons, and add checks for pgoff. Fixes: 833423143c3a ("[PATCH] mm: introduce remap_vmalloc_range()") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@chromium.org> Link: http://lkml.kernel.org/r/20200415222312.236431-1-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-21block: remove unused headerMa, Jianpeng
Dax related code already removed from this file. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-21cifs: protect updating server->dstaddr with a spinlockRonnie Sahlberg
We use a spinlock while we are reading and accessing the destination address for a server. We need to also use this spinlock to protect when we are modifying this address from reconn_set_ipaddr(). Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-04-20btrfs: discard: Use the correct style for SPDX License IdentifierNishad Kamdar
This patch corrects the SPDX License Identifier style in header file related to Btrfs File System support. For C header files Documentation/process/license-rules.rst mandates C-like comments (opposed to C source files where C++ style should be used). Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-20bdev: Reduce time holding bd_mutex in sync in blkdev_close()Douglas Anderson
While trying to "dd" to the block device for a USB stick, I encountered a hung task warning (blocked for > 120 seconds). I managed to come up with an easy way to reproduce this on my system (where /dev/sdb is the block device for my USB stick) with: while true; do dd if=/dev/zero of=/dev/sdb bs=4M; done With my reproduction here are the relevant bits from the hung task detector: INFO: task udevd:294 blocked for more than 122 seconds. ... udevd D 0 294 1 0x00400008 Call trace: ... mutex_lock_nested+0x40/0x50 __blkdev_get+0x7c/0x3d4 blkdev_get+0x118/0x138 blkdev_open+0x94/0xa8 do_dentry_open+0x268/0x3a0 vfs_open+0x34/0x40 path_openat+0x39c/0xdf4 do_filp_open+0x90/0x10c do_sys_open+0x150/0x3c8 ... ... Showing all locks held in the system: ... 1 lock held by dd/2798: #0: ffffff814ac1a3b8 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0x50/0x204 ... dd D 0 2798 2764 0x00400208 Call trace: ... schedule+0x8c/0xbc io_schedule+0x1c/0x40 wait_on_page_bit_common+0x238/0x338 __lock_page+0x5c/0x68 write_cache_pages+0x194/0x500 generic_writepages+0x64/0xa4 blkdev_writepages+0x24/0x30 do_writepages+0x48/0xa8 __filemap_fdatawrite_range+0xac/0xd8 filemap_write_and_wait+0x30/0x84 __blkdev_put+0x88/0x204 blkdev_put+0xc4/0xe4 blkdev_close+0x28/0x38 __fput+0xe0/0x238 ____fput+0x1c/0x28 task_work_run+0xb0/0xe4 do_notify_resume+0xfc0/0x14bc work_pending+0x8/0x14 The problem appears related to the fact that my USB disk is terribly slow and that I have a lot of RAM in my system to cache things. Specifically my writes seem to be happening at ~15 MB/s and I've got ~4 GB of RAM in my system that can be used for buffering. To write 4 GB of buffer to disk thus takes ~4000 MB / ~15 MB/s = ~267 seconds. The 267 second number is a problem because in __blkdev_put() we call sync_blockdev() while holding the bd_mutex. Any other callers who want the bd_mutex will be blocked for the whole time. The problem is made worse because I believe blkdev_put() specifically tells other tasks (namely udev) to go try to access the device at right around the same time we're going to hold the mutex for a long time. Putting some traces around this (after disabling the hung task detector), I could confirm: dd: 437.608600: __blkdev_put() right before sync_blockdev() for sdb udevd: 437.623901: blkdev_open() right before blkdev_get() for sdb dd: 661.468451: __blkdev_put() right after sync_blockdev() for sdb udevd: 663.820426: blkdev_open() right after blkdev_get() for sdb A simple fix for this is to realize that sync_blockdev() works fine if you're not holding the mutex. Also, it's not the end of the world if you sync a little early (though it can have performance impacts). Thus we can make a guess that we're going to need to do the sync and then do it without holding the mutex. We still do one last sync with the mutex but it should be much, much faster. With this, my hung task warnings for my test case are gone. Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-20nfs: Fix potential posix_acl refcnt leak in nfs3_set_aclAndreas Gruenbacher
nfs3_set_acl keeps track of the acl it allocated locally to determine if an acl needs to be released at the end. This results in a memory leak when the function allocates an acl as well as a default acl. Fix by releasing acls that differ from the acl originally passed into nfs3_set_acl. Fixes: b7fa0554cf1b ("[PATCH] NFS: Add support for NFSv3 ACLs") Reported-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-19NFS/pnfs: Fix a credential use-after-free issue in pnfs_roc()Trond Myklebust
If the credential returned by pnfs_prepare_layoutreturn() does not match the credential of the RPC call, then we do end up calling pnfs_send_layoutreturn() with that credential, so don't free it! Fixes: 44ea8dfce021 ("NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-19NFS/pnfs: Ensure that _pnfs_return_layout() waits for layoutreturn completionTrond Myklebust
We require that any outstanding layout return completes before we can free up the inode so that the layout itself can be freed. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-19io_uring: only restore req->work for req that needs do completionXiaoguang Wang
When testing io_uring IORING_FEAT_FAST_POLL feature, I got below panic: BUG: kernel NULL pointer dereference, address: 0000000000000030 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 5 PID: 2154 Comm: io_uring_echo_s Not tainted 5.6.0+ #359 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:io_wq_submit_work+0xf/0xa0 Code: ff ff ff be 02 00 00 00 e8 ae c9 19 00 e9 58 ff ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 8b 2f <8b> 45 30 48 8d 9d 48 ff ff ff 25 01 01 00 00 83 f8 01 75 07 eb 2a RSP: 0018:ffffbef543e93d58 EFLAGS: 00010286 RAX: ffffffff84364f50 RBX: ffffa3eb50f046b8 RCX: 0000000000000000 RDX: ffffa3eb0efc1840 RSI: 0000000000000006 RDI: ffffa3eb50f046b8 RBP: 0000000000000000 R08: 00000000fffd070d R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffa3eb50f046b8 R13: ffffa3eb0efc2088 R14: ffffffff85b69be0 R15: ffffa3eb0effa4b8 FS: 00007fe9f69cc4c0(0000) GS:ffffa3eb5ef40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000030 CR3: 0000000020410000 CR4: 00000000000006e0 Call Trace: task_work_run+0x6d/0xa0 do_exit+0x39a/0xb80 ? get_signal+0xfe/0xbc0 do_group_exit+0x47/0xb0 get_signal+0x14b/0xbc0 ? __x64_sys_io_uring_enter+0x1b7/0x450 do_signal+0x2c/0x260 ? __x64_sys_io_uring_enter+0x228/0x450 exit_to_usermode_loop+0x87/0xf0 do_syscall_64+0x209/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x7fe9f64f8df9 Code: Bad RIP value. task_work_run calls io_wq_submit_work unexpectedly, it's obvious that struct callback_head's func member has been changed. After looking into codes, I found this issue is still due to the union definition: union { /* * Only commands that never go async can use the below fields, * obviously. Right now only IORING_OP_POLL_ADD uses them, and * async armed poll handlers for regular commands. The latter * restore the work, if needed. */ struct { struct callback_head task_work; struct hlist_node hash_node; struct async_poll *apoll; }; struct io_wq_work work; }; When task_work_run has multiple work to execute, the work that calls io_poll_remove_all() will do req->work restore for non-poll request always, but indeed if a non-poll request has been added to a new callback_head, subsequent callback will call io_async_task_func() to handle this request, that means we should not do the restore work for such non-poll request. Meanwhile in io_async_task_func(), we should drop submit ref when req has been canceled. Fix both issues. Fixes: b1f573bd15fd ("io_uring: restore req->work when canceling poll request") Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Use io_double_put_req() Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-19Merge tag 'timers-urgent-2020-04-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull time namespace fix from Thomas Gleixner: "An update for the proc interface of time namespaces: Use symbolic names instead of clockid numbers. The usability nuisance of numbers was noticed by Michael when polishing the man page" * tag 'timers-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: proc, time/namespace: Show clock symbolic names in /proc/pid/timens_offsets
2020-04-19Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Miscellaneous bug fixes and cleanups for ext4, including a fix for generic/388 in data=journal mode, removing some BUG_ON's, and cleaning up some compiler warnings" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: convert BUG_ON's to WARN_ON's in mballoc.c ext4: increase wait time needed before reuse of deleted inode numbers ext4: remove set but not used variable 'es' in ext4_jbd2.c ext4: remove set but not used variable 'es' ext4: do not zeroout extents beyond i_disksize ext4: fix return-value types in several function comments ext4: use non-movable memory for superblock readahead ext4: use matching invalidatepage in ext4_writepage
2020-04-19Merge tag '5.7-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Three small smb3 fixes: two debug related (helping network tracing for SMB2 mounts, and the other removing an unintended debug line on signing failures), and one fixing a performance problem with 64K pages" * tag '5.7-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3: remove overly noisy debug line in signing errors cifs: improve read performance for page size 64KB & cache=strict & vers=2.1+ cifs: dump the session id and keys also for SMB2 sessions
2020-04-18Merge tag 'xfs-5.7-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: "The three commits here fix some livelocks and other clashes with fsfreeze, a potential corruption problem, and a minor race between processes freeing and allocating space when the filesystem is near ENOSPC. Summary: - Fix a partially uninitialized variable. - Teach the background gc threads to apply for fsfreeze protection. - Fix some scaling problems when multiple threads try to flush the filesystem when we're about to hit ENOSPC" * tag 'xfs-5.7-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: move inode flush to the sync workqueue xfs: fix partially uninitialized structure in xfs_reflink_remap_extent xfs: acquire superblock freeze protection on eofblocks scans
2020-04-17buffer: remove useless comment and WB_REASON_FREE_MORE_MEM, reason.Zhiqiang Liu
free_more_memory func has been completely removed in commit bc48f001de12 ("buffer: eliminate the need to call free_more_memory() in __getblk_slow()") So comment and `WB_REASON_FREE_MORE_MEM` reason about free_more_memory are no longer needed. Fixes: bc48f001de12 ("buffer: eliminate the need to call free_more_memory() in __getblk_slow()") Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull proc fix from Eric Biederman: "While running syzbot happened to spot one more oversight in my rework of proc_flush_task. The fields proc_self and proc_thread_self were not being reinitialized when proc was unmounted, which could cause problems if the mount of proc fails" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: proc: Handle umounts cleanly
2020-04-17Merge tag 'io_uring-5.7-2020-04-17' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: - wrap up the init/setup cleanup (Pavel) - fix some issues around deferral sequences (Pavel) - fix splice punt check using the wrong struct file member - apply poll re-arm logic for pollable retry too - pollable retry should honor cancelation - fix setup time error handling syzbot reported crash - restore work state when poll is canceled * tag 'io_uring-5.7-2020-04-17' of git://git.kernel.dk/linux-block: io_uring: don't count rqs failed after current one io_uring: kill already cached timeout.seq_offset io_uring: fix cached_sq_head in io_timeout() io_uring: only post events in io_poll_remove_all() if we completed some io_uring: io_async_task_func() should check and honor cancelation io_uring: check for need to re-wait in polled async handling io_uring: correct O_NONBLOCK check for splice punt io_uring: restore req->work when canceling poll request io_uring: move all request init code in one place io_uring: keep all sqe->flags in req->flags io_uring: early submission req fail code io_uring: track mm through current->mm io_uring: remove obsolete @mm_fault
2020-04-17Merge tag 'for-5.7-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "A regression fix for a warning caused by running balance and snapshot creation in parallel" * tag 'for-5.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix setting last_trans for reloc roots