summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-09-09btrfs: Remove leftover of in-band dedupeNikolay Borisov
It's unlikely in-band dedupe is going to land so just remove any leftovers - dedupe.h header as well as the 'dedupe' parameter to btrfs_set_extent_delalloc. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: Remove delalloc_end argument from extent_clear_unlock_delallocNikolay Borisov
It was added in ba8b04c1d4ad ("btrfs: extend btrfs_set_extent_delalloc and its friends to support in-band dedupe and subpage size patchset") as a preparatory patch for in-band and subapge block size patchsets. However neither of those are likely to be merged anytime soon and the code has diverged significantly from the last public post of either of those patchsets. It's unlikely either of the patchests are going to use those preparatory steps so just remove the variables. Since cow_file_range also took delalloc_end to pass it to extent_clear_unlock_delalloc remove the parameter from that function as well. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: Move free_pages_out label in inline extent handling branch in ↵Nikolay Borisov
compress_file_range This label is only executed if compress_file_range fails to create an inline extent. So move its code in the semantically related inline extent handling branch. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: Return number of compressed extents directly in compress_file_rangeNikolay Borisov
compress_file_range returns a void, yet uses a function parameter as a return value. Make that more idiomatic by simply returning the number of compressed extents directly. Also track such extents in more aptly named variables. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: use common vfs LABEL ioctl definitionsEric Sandeen
I lifted the btrfs label get/set ioctls to the vfs some time ago, but never followed up to use those common definitions directly in btrfs. This patch does that. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: Remove unused locking functionsNikolay Borisov
Those were split out of btrfs_clear_lock_blocking_rw by aa12c02778a9 ("btrfs: split btrfs_clear_lock_blocking_rw to read and write helpers") however at that time this function was unused due to commit 523983401644 ("Btrfs: kill btrfs_clear_path_blocking"). Put the final nail in the coffin of those 2 functions. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: reduce stack usage for btrfsic_process_written_blockArnd Bergmann
btrfsic_process_written_block() cals btrfsic_process_metablock(), which has a fairly large stack usage due to the btrfsic_stack_frame variable. It also calls btrfsic_test_for_metadata(), which now needs several hundreds of bytes for its SHASH_DESC_ON_STACK(). In some configurations, we end up with both functions on the same stack, and gcc warns about the excessive stack usage that might cause the available stack space to run out: fs/btrfs/check-integrity.c:1743:13: error: stack frame size of 1152 bytes in function 'btrfsic_process_written_block' [-Werror,-Wframe-larger-than=] Marking both child functions as noinline_for_stack helps because this guarantees that the large variables are not on the same stack frame. Fixes: d5178578bcd4 ("btrfs: directly call into crypto framework for checksumming") Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: remove set but not used variable 'offset'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: fs/btrfs/volumes.c: In function __btrfs_map_block: fs/btrfs/volumes.c:6023:6: warning: variable offset set but not used [-Wunused-but-set-variable] It is not used any more since commit 343abd1c0ca9 ("btrfs: Use btrfs_get_io_geometry appropriately") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning extentsFilipe Manana
When cloning extents (or deduplicating) we create a transaction with a space reservation that considers we will drop or update a single file extent item of the destination inode (that we modify a single leaf). That is fine for the vast majority of scenarios, however it might happen that we need to drop many file extent items, and adjust at most two file extent items, in the destination root, which can span multiple leafs. This will lead to either the call to btrfs_drop_extents() to fail with ENOSPC or the subsequent calls to btrfs_insert_empty_item() or btrfs_update_inode() (called through clone_finish_inode_update()) to fail with ENOSPC. Such failure results in a transaction abort, leaving the filesystem in a read-only mode. In order to fix this we need to follow the same approach as the hole punching code, where we create a local reservation with 1 unit and keep ending and starting transactions, after balancing the btree inode, when __btrfs_drop_extents() returns ENOSPC. So fix this by making the extent cloning call calls the recently added btrfs_punch_hole_range() helper, which is what does the mentioned work for hole punching, and make sure whenever we drop extent items in a transaction, we also add a replacing file extent item, to avoid corruption (a hole) if after ending a transaction and before starting a new one, the old transaction gets committed and a power failure happens before we finish cloning. A test case for fstests follows soon. Reported-by: David Goodwin <david@codepoets.co.uk> Link: https://lore.kernel.org/linux-btrfs/a4a4cf31-9cf4-e52c-1f86-c62d336c9cd1@codepoets.co.uk/ Reported-by: Sam Tygier <sam@tygier.co.uk> Link: https://lore.kernel.org/linux-btrfs/82aace9f-a1e3-1f0b-055f-3ea75f7a41a0@tygier.co.uk/ Fixes: b6f3409b2197e8f ("Btrfs: reserve sufficient space for ioctl clone") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09Btrfs: factor out extent dropping code from hole punch handlerFilipe Manana
Move the code that is responsible for dropping extents in a range out of btrfs_punch_hole() into a new helper function, btrfs_punch_hole_range(), so that later it can be used by the reflinking (extent cloning and dedup) code to fix a ENOSPC bug. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-06Merge tag 'configfs-for-5.3' of git://git.infradead.org/users/hch/configfsLinus Torvalds
Pull configfs fixes from Christoph Hellwig: "Late configfs fixes from Al that fix pretty nasty removal vs attribute access races" * tag 'configfs-for-5.3' of git://git.infradead.org/users/hch/configfs: configfs: provide exclusion between IO and removals configfs: new object reprsenting tree fragments configfs_register_group() shouldn't be (and isn't) called in rmdirable parts configfs: stash the data we need into configfs_buffer at open time
2019-09-04configfs: provide exclusion between IO and removalsAl Viro
Make sure that attribute methods are not called after the item has been removed from the tree. To do so, we * at the point of no return in removals, grab ->frag_sem exclusive and mark the fragment dead. * call the methods of attributes with ->frag_sem taken shared and only after having verified that the fragment is still alive. The main benefit is for method instances - they are guaranteed that the objects they are accessing *and* all ancestors are still there. Another win is that we don't need to bother with extra refcount on config_item when opening a file - the item will be alive for as long as it stays in the tree, and we won't touch it/attributes/any associated data after it's been removed from the tree. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-09-02configfs: new object reprsenting tree fragmentsAl Viro
Refcounted, hangs of configfs_dirent, created by operations that add fragments to configfs tree (mkdir and configfs_register_{subsystem,group}). Will be used in the next commit to provide exclusion between fragment removal and ->show/->store calls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-09-02configfs_register_group() shouldn't be (and isn't) called in rmdirable partsAl Viro
revert cc57c07343bd "configfs: fix registered group removal" It was an attempt to handle something that fundamentally doesn't work - configfs_register_group() should never be done in a part of tree that can be rmdir'ed. And in mainline it never had been, so let's not borrow trouble; the fix was racy anyway, it would take a lot more to make that work and desired semantics is not clear. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-09-02configfs: stash the data we need into configfs_buffer at open timeAl Viro
simplifies the ->read()/->write()/->release() instances nicely Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-09-02NFS: Fix inode fileid checks in attribute revalidation codeTrond Myklebust
We want to throw out the attrbute if it refers to the mounted on fileid, and not the real fileid. However we do not want to block cache consistency updates from NFSv4 writes. Reported-by: Murphy Zhou <jencce.kernel@gmail.com> Fixes: 7e10cc25bfa0 ("NFS: Don't refresh attributes with mounted-on-file...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-29Merge tag '5.3-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "A few small SMB3 fixes, and a larger one to fix various older string handling functions" * tag '5.3-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal module number cifs: replace various strncpy with strscpy and similar cifs: Use kzfree() to zero out the password cifs: set domainName when a domain-key is used in multiuser
2019-08-27cifs: update internal module numberSteve French
To 2.22 Signed-off-by: Steve French <stfrench@microsoft.com>
2019-08-27cifs: replace various strncpy with strscpy and similarRonnie Sahlberg
Using strscpy is cleaner, and avoids some problems with handling maximum length strings. Linus noticed the original problem and Aurelien pointed out some additional problems. Fortunately most of this is SMB1 code (and in particular the ASCII string handling older, which is less common). Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-08-27cifs: Use kzfree() to zero out the passwordDan Carpenter
It's safer to zero out the password so that it can never be disclosed. Fixes: 0c219f5799c7 ("cifs: set domainName when a domain-key is used in multiuser") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-08-27cifs: set domainName when a domain-key is used in multiuserRonnie Sahlberg
RHBZ: 1710429 When we use a domain-key to authenticate using multiuser we must also set the domainnmame for the new volume as it will be used and passed to the server in the NTLMSSP Domain-name. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-08-27Merge tag 'nfs-for-5.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable fixes: - Fix a page lock leak in nfs_pageio_resend() - Ensure O_DIRECT reports an error if the bytes read/written is 0 - Don't handle errors if the bind/connect succeeded - Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidat ed" Bugfixes: - Don't refresh attributes with mounted-on-file information - Fix return values for nfs4_file_open() and nfs_finish_open() - Fix pnfs layoutstats reporting of I/O errors - Don't use soft RPC calls for pNFS/flexfiles I/O, and don't abort for soft I/O errors when the user specifies a hard mount. - Various fixes to the error handling in sunrpc - Don't report writepage()/writepages() errors twice" * tag 'nfs-for-5.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: remove set but not used variable 'mapping' NFSv2: Fix write regression NFSv2: Fix eof handling NFS: Fix writepage(s) error handling to not report errors twice NFS: Fix spurious EIO read errors pNFS/flexfiles: Don't time out requests on hard mounts SUNRPC: Handle connection breakages correctly in call_status() Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated" SUNRPC: Handle EADDRINUSE and ENOBUFS correctly pNFS/flexfiles: Turn off soft RPC calls SUNRPC: Don't handle errors if the bind/connect succeeded NFS: On fatal writeback errors, we need to call nfs_inode_remove_request() NFS: Fix initialisation of I/O result struct in nfs_pgio_rpcsetup NFS: Ensure O_DIRECT reports an error if the bytes read/written is 0 NFSv4/pnfs: Fix a page lock leak in nfs_pageio_resend() NFSv4: Fix return value in nfs_finish_open() NFSv4: Fix return values for nfs4_file_open() NFS: Don't refresh attributes with mounted-on-file information
2019-08-27NFS: remove set but not used variable 'mapping'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: fs/nfs/write.c: In function nfs_page_async_flush: fs/nfs/write.c:609:24: warning: variable mapping set but not used [-Wunused-but-set-variable] It is not use since commit aefb623c422e ("NFS: Fix writepage(s) error handling to not report errors twice") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-27NFSv2: Fix write regressionTrond Myklebust
Ensure we update the write result count on success, since the RPC call itself does not do so. Reported-by: Jan Stancek <jstancek@redhat.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Jan Stancek <jstancek@redhat.com>
2019-08-27NFSv2: Fix eof handlingTrond Myklebust
If we received a reply from the server with a zero length read and no error, then that implies we are at eof. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26NFS: Fix writepage(s) error handling to not report errors twiceTrond Myklebust
If writepage()/writepages() saw an error, but handled it without reporting it, we should not be re-reporting that error on exit. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26NFS: Fix spurious EIO read errorsTrond Myklebust
If the client attempts to read a page, but the read fails due to some spurious error (e.g. an ACCESS error or a timeout, ...) then we need to allow other processes to retry. Also try to report errors correctly when doing a synchronous readpage. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26pNFS/flexfiles: Don't time out requests on hard mountsTrond Myklebust
If the mount is hard, we should ignore the 'io_maxretrans' module parameter so that we always keep retrying. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated"Trond Myklebust
This reverts commit a79f194aa4879e9baad118c3f8bb2ca24dbef765. The mechanism for aborting I/O is racy, since we are not guaranteed that the request is asleep while we're changing both task->tk_status and task->tk_action. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v5.1
2019-08-26pNFS/flexfiles: Turn off soft RPC callsTrond Myklebust
The pNFS/flexfiles I/O requests are sent with the SOFTCONN flag set, so they automatically time out if the connection breaks. It should therefore not be necessary to have the soft flag set in addition. Fixes: 5f01d9539496 ("nfs41: create NFSv3 DS connection if specified") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-25Merge tag 'for-linus-5.3-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBIFS and JFFS2 fixes from Richard Weinberger: "UBIFS: - Don't block too long in writeback_inodes_sb() - Fix for a possible overrun of the log head - Fix double unlock in orphan_delete() JFFS2: - Remove C++ style from UAPI header and unbreak picky toolchains" * tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubifs: Limit the number of pages in shrink_liability ubifs: Correctly initialize c->min_log_bytes ubifs: Fix double unlock around orphan_delete() jffs2: Remove C++ style comments from uapi header
2019-08-24userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctxOleg Nesterov
userfaultfd_release() should clear vm_flags/vm_userfaultfd_ctx even if mm->core_state != NULL. Otherwise a page fault can see userfaultfd_missing() == T and use an already freed userfaultfd_ctx. Link: http://lkml.kernel.org/r/20190820160237.GB4983@redhat.com Fixes: 04f5866e41fb ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Peter Xu <peterx@redhat.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24Merge tag 'xfs-5.3-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fix from Darrick Wong: "A single patch that fixes a xfs lockup problem when a chown/chgrp operation fails due to running out of quota. It has survived the usual xfstests runs and merges cleanly with this morning's master: - Fix a forgotten inode unlock when chown/chgrp fail due to quota" * tag 'xfs-5.3-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix missing ILOCK unlock when xfs_setattr_nonsize fails due to EDQUOT
2019-08-23Merge tag 'for-linus-20190823' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Here's a set of fixes that should go into this release. This contains: - Three minor fixes for NVMe. - Three minor tweaks for the io_uring polling logic. - Officially mark Song as the MD maintainer, after he's been filling that role sucessfully for the last 6 months or so" * tag 'for-linus-20190823' of git://git.kernel.dk/linux-block: io_uring: add need_resched() check in inner poll loop md: update MAINTAINERS info io_uring: don't enter poll loop if we have CQEs pending nvme: Add quirk for LiteON CL1 devices running FW 22301111 nvme: Fix cntlid validation when not using NVMEoF nvme-multipath: fix possible I/O hang when paths are updated io_uring: fix potential hang with polled IO
2019-08-23Merge tag 'xfs-5.3-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: "Here are a few more bug fixes that trickled in since the last pull. They've survived the usual xfstests runs and merge cleanly with this morning's master. I expect there to be one more pull request tomorrow for the fix to that quota related inode unlock bug that we were reviewing last night, but it will continue to soak in the testing machine for several more hours. - Fix missing compat ioctl handling for get/setlabel - Fix missing ioctl pointer sanitization on s390 - Fix a page locking deadlock in the dedupe comparison code - Fix inadequate locking in reflink code w.r.t. concurrent directio - Fix broken error detection when breaking layouts" * tag 'xfs-5.3-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: fs/xfs: Fix return code of xfs_break_leased_layouts() xfs: fix reflink source file racing with directio writes vfs: fix page locking deadlocks when deduping files xfs: compat_ioctl: use compat_ptr() xfs: fall back to native ioctls for unhandled compat ones
2019-08-23Merge tag 'ceph-for-5.3-rc6' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "Three important fixes tagged for stable (an indefinite hang, a crash on an assert and a NULL pointer dereference) plus a small series from Luis fixing instances of vfree() under spinlock" * tag 'ceph-for-5.3-rc6' of git://github.com/ceph/ceph-client: libceph: fix PG split vs OSD (re)connect race ceph: don't try fill file_lock on unsuccessful GETFILELOCK reply ceph: clear page dirty before invalidate page ceph: fix buffer free while holding i_ceph_lock in fill_inode() ceph: fix buffer free while holding i_ceph_lock in __ceph_build_xattrs_blob() ceph: fix buffer free while holding i_ceph_lock in __ceph_setxattr() libceph: allow ceph_buffer_put() to receive a NULL ceph_buffer
2019-08-22xfs: fix missing ILOCK unlock when xfs_setattr_nonsize fails due to EDQUOTDarrick J. Wong
Benjamin Moody reported to Debian that XFS partially wedges when a chgrp fails on account of being out of disk quota. I ran his reproducer script: # adduser dummy # adduser dummy plugdev # dd if=/dev/zero bs=1M count=100 of=test.img # mkfs.xfs test.img # mount -t xfs -o gquota test.img /mnt # mkdir -p /mnt/dummy # chown -c dummy /mnt/dummy # xfs_quota -xc 'limit -g bsoft=100k bhard=100k plugdev' /mnt (and then as user dummy) $ dd if=/dev/urandom bs=1M count=50 of=/mnt/dummy/foo $ chgrp plugdev /mnt/dummy/foo and saw: ================================================ WARNING: lock held when returning to user space! 5.3.0-rc5 #rc5 Tainted: G W ------------------------------------------------ chgrp/47006 is leaving the kernel with locks still held! 1 lock held by chgrp/47006: #0: 000000006664ea2d (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0xd2/0x290 [xfs] ...which is clearly caused by xfs_setattr_nonsize failing to unlock the ILOCK after the xfs_qm_vop_chown_reserve call fails. Add the missing unlock. Reported-by: benjamin.moody@gmail.com Fixes: 253f4911f297 ("xfs: better xfs_trans_alloc interface") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Tested-by: Salvatore Bonaccorso <carnil@debian.org>
2019-08-22io_uring: add need_resched() check in inner poll loopJens Axboe
The outer poll loop checks for whether we need to reschedule, and returns to userspace if we do. However, it's possible to get stuck in the inner loop as well, if the CPU we are running on needs to reschedule to finish the IO work. Add the need_resched() check in the inner loop as well. This fixes a potential hang if the kernel is configured with CONFIG_PREEMPT_VOLUNTARY=y. Reported-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-22Merge tag 'afs-fixes-20190822' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS fixes from David Howells: - Fix a cell record leak due to the default error not being cleared. - Fix an oops in tracepoint due to a pointer that may contain an error. - Fix the ACL storage op for YFS where the wrong op definition is being used. By luck, this only actually affects the information appearing in traces. * tag 'afs-fixes-20190822' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: use correct afs_call_type in yfs_fs_store_opaque_acl2 afs: Fix possible oops in afs_lookup trace event afs: Fix leak in afs_lookup_cell_rcu()
2019-08-22ubifs: Limit the number of pages in shrink_liabilityLiu Song
If the number of dirty pages to be written back is large, then writeback_inodes_sb will block waiting for a long time, causing hung task detection alarm. Therefore, we should limit the maximum number of pages written back this time, which let the budget be completed faster. The remaining dirty pages tend to rely on the writeback mechanism to complete the synchronization. Fixes: b6e51316daed ("writeback: separate starting of sync vs opportunistic writeback") Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-08-22ubifs: Correctly initialize c->min_log_bytesRichard Weinberger
Currently on a freshly mounted UBIFS, c->min_log_bytes is 0. This can lead to a log overrun and make commits fail. Recent kernels will report the following assert: UBIFS assert failed: c->lhead_lnum != c->ltail_lnum, in fs/ubifs/log.c:412 c->min_log_bytes can have two states, 0 and c->leb_size. It controls how much bytes of the log area are reserved for non-bud nodes such as commit nodes. After a commit it has to be set to c->leb_size such that we have always enough space for a commit. While a commit runs it can be 0 to make the remaining bytes of the log available to writers. Having it set to 0 right after mount is wrong since no space for commits is reserved. Fixes: 1e51764a3c2ac ("UBIFS: add new flash file system") Reported-and-tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-08-22ubifs: Fix double unlock around orphan_delete()Richard Weinberger
We unlock after orphan_delete(), so no need to unlock in the function too. Reported-by: Han Xu <han.xu@nxp.com> Fixes: 8009ce956c3d ("ubifs: Don't leak orphans on memory during commit") Signed-off-by: Richard Weinberger <richard@nod.at>
2019-08-22afs: use correct afs_call_type in yfs_fs_store_opaque_acl2YueHaibing
It seems that 'yfs_RXYFSStoreOpaqueACL2' should be use in yfs_fs_store_opaque_acl2(). Fixes: f5e4546347bc ("afs: Implement YFS ACL setting") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-22afs: Fix possible oops in afs_lookup trace eventMarc Dionne
The afs_lookup trace event can cause the following: [ 216.576777] BUG: kernel NULL pointer dereference, address: 000000000000023b [ 216.576803] #PF: supervisor read access in kernel mode [ 216.576813] #PF: error_code(0x0000) - not-present page ... [ 216.576913] RIP: 0010:trace_event_raw_event_afs_lookup+0x9e/0x1c0 [kafs] If the inode from afs_do_lookup() is an error other than ENOENT, or if it is ENOENT and afs_try_auto_mntpt() returns an error, the trace event will try to dereference the error pointer as a valid pointer. Use IS_ERR_OR_NULL to only pass a valid pointer for the trace, or NULL. Ideally the trace would include the error value, but for now just avoid the oops. Fixes: 80548b03991f ("afs: Add more tracepoints") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-22afs: Fix leak in afs_lookup_cell_rcu()David Howells
Fix a leak on the cell refcount in afs_lookup_cell_rcu() due to non-clearance of the default error in the case a NULL cell name is passed and the workstation default cell is used. Also put a bit at the end to make sure we don't leak a cell ref if we're going to be returning an error. This leak results in an assertion like the following when the kafs module is unloaded: AFS: Assertion failed 2 == 1 is false 0x2 == 0x1 is false ------------[ cut here ]------------ kernel BUG at fs/afs/cell.c:770! ... RIP: 0010:afs_manage_cells+0x220/0x42f [kafs] ... process_one_work+0x4c2/0x82c ? pool_mayday_timeout+0x1e1/0x1e1 ? do_raw_spin_lock+0x134/0x175 worker_thread+0x336/0x4a6 ? rescuer_thread+0x4af/0x4af kthread+0x1de/0x1ee ? kthread_park+0xd4/0xd4 ret_from_fork+0x24/0x30 Fixes: 989782dcdc91 ("afs: Overhaul cell database management") Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-22ceph: don't try fill file_lock on unsuccessful GETFILELOCK replyJeff Layton
When ceph_mdsc_do_request returns an error, we can't assume that the filelock_reply pointer will be set. Only try to fetch fields out of the r_reply_info when it returns success. Cc: stable@vger.kernel.org Reported-by: Hector Martin <hector@marcansoft.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-08-22ceph: clear page dirty before invalidate pageErqi Chen
clear_page_dirty_for_io(page) before mapping->a_ops->invalidatepage(). invalidatepage() clears page's private flag, if dirty flag is not cleared, the page may cause BUG_ON failure in ceph_set_page_dirty(). Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/40862 Signed-off-by: Erqi Chen <chenerqi@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-08-22ceph: fix buffer free while holding i_ceph_lock in fill_inode()Luis Henriques
Calling ceph_buffer_put() in fill_inode() may result in freeing the i_xattrs.blob buffer while holding the i_ceph_lock. This can be fixed by postponing the call until later, when the lock is released. The following backtrace was triggered by fstests generic/070. BUG: sleeping function called from invalid context at mm/vmalloc.c:2283 in_atomic(): 1, irqs_disabled(): 0, pid: 3852, name: kworker/0:4 6 locks held by kworker/0:4/3852: #0: 000000004270f6bb ((wq_completion)ceph-msgr){+.+.}, at: process_one_work+0x1b8/0x5f0 #1: 00000000eb420803 ((work_completion)(&(&con->work)->work)){+.+.}, at: process_one_work+0x1b8/0x5f0 #2: 00000000be1c53a4 (&s->s_mutex){+.+.}, at: dispatch+0x288/0x1476 #3: 00000000559cb958 (&mdsc->snap_rwsem){++++}, at: dispatch+0x2eb/0x1476 #4: 000000000d5ebbae (&req->r_fill_mutex){+.+.}, at: dispatch+0x2fc/0x1476 #5: 00000000a83d0514 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: fill_inode.isra.0+0xf8/0xf70 CPU: 0 PID: 3852 Comm: kworker/0:4 Not tainted 5.2.0+ #441 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014 Workqueue: ceph-msgr ceph_con_workfn Call Trace: dump_stack+0x67/0x90 ___might_sleep.cold+0x9f/0xb1 vfree+0x4b/0x60 ceph_buffer_release+0x1b/0x60 fill_inode.isra.0+0xa9b/0xf70 ceph_fill_trace+0x13b/0xc70 ? dispatch+0x2eb/0x1476 dispatch+0x320/0x1476 ? __mutex_unlock_slowpath+0x4d/0x2a0 ceph_con_workfn+0xc97/0x2ec0 ? process_one_work+0x1b8/0x5f0 process_one_work+0x244/0x5f0 worker_thread+0x4d/0x3e0 kthread+0x105/0x140 ? process_one_work+0x5f0/0x5f0 ? kthread_park+0x90/0x90 ret_from_fork+0x3a/0x50 Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-08-22ceph: fix buffer free while holding i_ceph_lock in __ceph_build_xattrs_blob()Luis Henriques
Calling ceph_buffer_put() in __ceph_build_xattrs_blob() may result in freeing the i_xattrs.blob buffer while holding the i_ceph_lock. This can be fixed by having this function returning the old blob buffer and have the callers of this function freeing it when the lock is released. The following backtrace was triggered by fstests generic/117. BUG: sleeping function called from invalid context at mm/vmalloc.c:2283 in_atomic(): 1, irqs_disabled(): 0, pid: 649, name: fsstress 4 locks held by fsstress/649: #0: 00000000a7478e7e (&type->s_umount_key#19){++++}, at: iterate_supers+0x77/0xf0 #1: 00000000f8de1423 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: ceph_check_caps+0x7b/0xc60 #2: 00000000562f2b27 (&s->s_mutex){+.+.}, at: ceph_check_caps+0x3bd/0xc60 #3: 00000000f83ce16a (&mdsc->snap_rwsem){++++}, at: ceph_check_caps+0x3ed/0xc60 CPU: 1 PID: 649 Comm: fsstress Not tainted 5.2.0+ #439 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x67/0x90 ___might_sleep.cold+0x9f/0xb1 vfree+0x4b/0x60 ceph_buffer_release+0x1b/0x60 __ceph_build_xattrs_blob+0x12b/0x170 __send_cap+0x302/0x540 ? __lock_acquire+0x23c/0x1e40 ? __mark_caps_flushing+0x15c/0x280 ? _raw_spin_unlock+0x24/0x30 ceph_check_caps+0x5f0/0xc60 ceph_flush_dirty_caps+0x7c/0x150 ? __ia32_sys_fdatasync+0x20/0x20 ceph_sync_fs+0x5a/0x130 iterate_supers+0x8f/0xf0 ksys_sync+0x4f/0xb0 __ia32_sys_sync+0xa/0x10 do_syscall_64+0x50/0x1c0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fc6409ab617 Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-08-22ceph: fix buffer free while holding i_ceph_lock in __ceph_setxattr()Luis Henriques
Calling ceph_buffer_put() in __ceph_setxattr() may end up freeing the i_xattrs.prealloc_blob buffer while holding the i_ceph_lock. This can be fixed by postponing the call until later, when the lock is released. The following backtrace was triggered by fstests generic/117. BUG: sleeping function called from invalid context at mm/vmalloc.c:2283 in_atomic(): 1, irqs_disabled(): 0, pid: 650, name: fsstress 3 locks held by fsstress/650: #0: 00000000870a0fe8 (sb_writers#8){.+.+}, at: mnt_want_write+0x20/0x50 #1: 00000000ba0c4c74 (&type->i_mutex_dir_key#6){++++}, at: vfs_setxattr+0x55/0xa0 #2: 000000008dfbb3f2 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: __ceph_setxattr+0x297/0x810 CPU: 1 PID: 650 Comm: fsstress Not tainted 5.2.0+ #437 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x67/0x90 ___might_sleep.cold+0x9f/0xb1 vfree+0x4b/0x60 ceph_buffer_release+0x1b/0x60 __ceph_setxattr+0x2b4/0x810 __vfs_setxattr+0x66/0x80 __vfs_setxattr_noperm+0x59/0xf0 vfs_setxattr+0x81/0xa0 setxattr+0x115/0x230 ? filename_lookup+0xc9/0x140 ? rcu_read_lock_sched_held+0x74/0x80 ? rcu_sync_lockdep_assert+0x2e/0x60 ? __sb_start_write+0x142/0x1a0 ? mnt_want_write+0x20/0x50 path_setxattr+0xba/0xd0 __x64_sys_lsetxattr+0x24/0x30 do_syscall_64+0x50/0x1c0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7ff23514359a Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>