summaryrefslogtreecommitdiff
path: root/fs/xfs
AgeCommit message (Collapse)Author
2019-08-29xfs: remove all *_ITER_ABORT valuesDarrick J. Wong
Use -ECANCELED to signal "stop iterating" instead of these magical *_ITER_ABORT values, since it's duplicative. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-08-28xfs: log proper length of btree block in scrub/repairEric Sandeen
xfs_trans_log_buf() takes a final argument of the last byte to log in the buffer; b_length is in basic blocks, so this isn't the correct last byte. Fix it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-28xfs: reinitialize rm_flags when unpacking an offset into an rmap irecDarrick J. Wong
In xfs_rmap_irec_offset_unpack, we should always clear the contents of rm_flags before we begin unpacking the encoded (ondisk) offset into the incore rm_offset and incore rm_flags fields. Remove the open-coded field zeroing as this encourages api misuse. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-08-28xfs: remove unnecessary int returns from deferred bmap functionsDarrick J. Wong
Remove the return value from the functions that schedule deferred bmap operations since they never fail and do not return status. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-08-28xfs: remove unnecessary int returns from deferred refcount functionsDarrick J. Wong
Remove the return value from the functions that schedule deferred refcount operations since they never fail and do not return status. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-08-28xfs: remove unnecessary int returns from deferred rmap functionsDarrick J. Wong
Remove the return value from the functions that schedule deferred rmap operations since they never fail and do not return status. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-08-28xfs: remove unnecessary parameter from xfs_iext_inc_seqDarrick J. Wong
This function doesn't use the @state parameter, so get rid of it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-08-28xfs: fix sign handling problem in xfs_bmbt_diff_two_keysDarrick J. Wong
In xfs_bmbt_diff_two_keys, we perform a signed int64_t subtraction with two unsigned 64-bit quantities. If the second quantity is actually the "maximum" key (all ones) as used in _query_all, the subtraction effectively becomes addition of two positive numbers and the function returns incorrect results. Fix this with explicit comparisons of the unsigned values. Nobody needs this now, but the online repair patches will need this to work properly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-08-28xfs: don't return _QUERY_ABORT from xfs_rmap_has_other_keysDarrick J. Wong
The xfs_rmap_has_other_keys helper aborts the iteration as soon as it has an answer. Don't let this abort leak out to callers. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-08-28xfs: fix maxicount division by zero errorDarrick J. Wong
In xfs_ialloc_setup_geometry, it's possible for a malicious/corrupt fs image to set an unreasonably large value for sb_inopblog which will cause ialloc_blks to be zero. If sb_imax_pct is also set, this results in a division by zero error in the second do_div call. Therefore, force maxicount to zero if ialloc_blks is zero. Note that the kernel metadata verifiers will catch the garbage inopblog value and abort the fs mount long before it tries to set up the inode geometry; this is needed to avoid a crash in xfs_db while setting up the xfs_mount structure. Found by fuzzing sb_inopblog to 122 in xfs/350. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2019-08-26xfs: bmap scrub should only scrub records onceDarrick J. Wong
The inode block mapping scrub function does more work for btree format extent maps than is absolutely necessary -- first it will walk the bmbt and check all the entries, and then it will load the incore tree and check every entry in that tree, possibly for a second time. Simplify the code and decrease check runtime by separating the two responsibilities. The bmbt walk will make sure the incore extent mappings are loaded, check the shape of the bmap btree (via xchk_btree) and check that every bmbt record has a corresponding incore extent map; and the incore extent map walk takes all the responsibility for checking the mapping records and cross referencing them with other AG metadata. This enables us to clean up some messy parameter handling and reduce redundant code. Rename a few functions to make the split of responsibilities clearer. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-08-26xfs: remove excess function parameter description in ↵zhengbin
'xfs_btree_sblock_v5hdr_verify' Fixes gcc warning: fs/xfs/libxfs/xfs_btree.c:4475: warning: Excess function parameter 'max_recs' description in 'xfs_btree_sblock_v5hdr_verify' fs/xfs/libxfs/xfs_btree.c:4475: warning: Excess function parameter 'pag_max_level' description in 'xfs_btree_sblock_v5hdr_verify' Fixes: c5ab131ba0df ("libxfs: refactor short btree block verification") Signed-off-by: zhengbin <zhengbin13@huawei.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-26xfs: add kmem_alloc_io()Dave Chinner
Memory we use to submit for IO needs strict alignment to the underlying driver contraints. Worst case, this is 512 bytes. Given that all allocations for IO are always a power of 2 multiple of 512 bytes, the kernel heap provides natural alignment for objects of these sizes and that suffices. Until, of course, memory debugging of some kind is turned on (e.g. red zones, poisoning, KASAN) and then the alignment of the heap objects is thrown out the window. Then we get weird IO errors and data corruption problems because drivers don't validate alignment and do the wrong thing when passed unaligned memory buffers in bios. TO fix this, introduce kmem_alloc_io(), which will guaranteeat least 512 byte alignment of buffers for IO, even if memory debugging options are turned on. It is assumed that the minimum allocation size will be 512 bytes, and that sizes will be power of 2 mulitples of 512 bytes. Use this everywhere we allocate buffers for IO. This no longer fails with log recovery errors when KASAN is enabled due to the brd driver not handling unaligned memory buffers: # mkfs.xfs -f /dev/ram0 ; mount /dev/ram0 /mnt/test Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-26xfs: get allocation alignment from the buftargDave Chinner
Needed to feed into the allocation routine to guarantee the memory buffers we add to bios are correctly aligned to the underlying device. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-26xfs: add kmem allocation trace pointsDave Chinner
When trying to correlate XFS kernel allocations to memory reclaim behaviour, it is useful to know what allocations XFS is actually attempting. This information is not directly available from tracepoints in the generic memory allocation and reclaim tracepoints, so these new trace points provide a high level indication of what the XFS memory demand actually is. There is no per-filesystem context in this code, so we just trace the type of allocation, the size and the allocation constraints. The kmem code also doesn't include much of the common XFS headers, so there are a few definitions that need to be added to the trace headers and a couple of types that need to be made common to avoid needing to include the whole world in the kmem code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-26fs: xfs: Remove KM_NOSLEEP and KM_SLEEP.Tetsuo Handa
Since no caller is using KM_NOSLEEP and no callee branches on KM_SLEEP, we can remove KM_NOSLEEP and replace KM_SLEEP with 0. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-22xfs: fix missing ILOCK unlock when xfs_setattr_nonsize fails due to EDQUOTDarrick J. Wong
Benjamin Moody reported to Debian that XFS partially wedges when a chgrp fails on account of being out of disk quota. I ran his reproducer script: # adduser dummy # adduser dummy plugdev # dd if=/dev/zero bs=1M count=100 of=test.img # mkfs.xfs test.img # mount -t xfs -o gquota test.img /mnt # mkdir -p /mnt/dummy # chown -c dummy /mnt/dummy # xfs_quota -xc 'limit -g bsoft=100k bhard=100k plugdev' /mnt (and then as user dummy) $ dd if=/dev/urandom bs=1M count=50 of=/mnt/dummy/foo $ chgrp plugdev /mnt/dummy/foo and saw: ================================================ WARNING: lock held when returning to user space! 5.3.0-rc5 #rc5 Tainted: G W ------------------------------------------------ chgrp/47006 is leaving the kernel with locks still held! 1 lock held by chgrp/47006: #0: 000000006664ea2d (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0xd2/0x290 [xfs] ...which is clearly caused by xfs_setattr_nonsize failing to unlock the ILOCK after the xfs_qm_vop_chown_reserve call fails. Add the missing unlock. Reported-by: benjamin.moody@gmail.com Fixes: 253f4911f297 ("xfs: better xfs_trans_alloc interface") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Tested-by: Salvatore Bonaccorso <carnil@debian.org>
2019-08-19fs/xfs: Fix return code of xfs_break_leased_layouts()Ira Weiny
The parens used in the while loop would result in error being assigned the value 1 rather than the intended errno value. This is required to return -ETXTBSY from follow on break_layout() changes. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-18xfs: fix reflink source file racing with directio writesDarrick J. Wong
While trawling through the dedupe file comparison code trying to fix page deadlocking problems, Dave Chinner noticed that the reflink code only takes shared IOLOCK/MMAPLOCKs on the source file. Because page_mkwrite and directio writes do not take the EXCL versions of those locks, this means that reflink can race with writer processes. For pure remapping this can lead to undefined behavior and file corruption; for dedupe this means that we cannot be sure that the contents are identical when we decide to go ahead with the remapping. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-08-16xfs: compat_ioctl: use compat_ptr()Christoph Hellwig
For 31-bit s390 user space, we have to pass pointer arguments through compat_ptr() in the compat_ioctl handler. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-16xfs: fall back to native ioctls for unhandled compat onesChristoph Hellwig
Always try the native ioctl if we don't have a compat handler. This removes a lot of boilerplate code as 'modern' ioctls should generally be compat clean, and fixes the missing entries for the recently added FS_IOC_GETFSLABEL/FS_IOC_SETFSLABEL ioctls. Fixes: f7664b31975b ("xfs: implement online get/set fs label") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-12xfs: don't crash on null attr fork xfs_bmapi_readDarrick J. Wong
Zorro Lang reported a crash in generic/475 if we try to inactivate a corrupt inode with a NULL attr fork (stack trace shortened somewhat): RIP: 0010:xfs_bmapi_read+0x311/0xb00 [xfs] RSP: 0018:ffff888047f9ed68 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff888047f9f038 RCX: 1ffffffff5f99f51 RDX: 0000000000000002 RSI: 0000000000000008 RDI: 0000000000000012 RBP: ffff888002a41f00 R08: ffffed10005483f0 R09: ffffed10005483ef R10: ffffed10005483ef R11: ffff888002a41f7f R12: 0000000000000004 R13: ffffe8fff53b5768 R14: 0000000000000005 R15: 0000000000000001 FS: 00007f11d44b5b80(0000) GS:ffff888114200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000ef6000 CR3: 000000002e176003 CR4: 00000000001606e0 Call Trace: xfs_dabuf_map.constprop.18+0x696/0xe50 [xfs] xfs_da_read_buf+0xf5/0x2c0 [xfs] xfs_da3_node_read+0x1d/0x230 [xfs] xfs_attr_inactive+0x3cc/0x5e0 [xfs] xfs_inactive+0x4c8/0x5b0 [xfs] xfs_fs_destroy_inode+0x31b/0x8e0 [xfs] destroy_inode+0xbc/0x190 xfs_bulkstat_one_int+0xa8c/0x1200 [xfs] xfs_bulkstat_one+0x16/0x20 [xfs] xfs_bulkstat+0x6fa/0xf20 [xfs] xfs_ioc_bulkstat+0x182/0x2b0 [xfs] xfs_file_ioctl+0xee0/0x12a0 [xfs] do_vfs_ioctl+0x193/0x1000 ksys_ioctl+0x60/0x90 __x64_sys_ioctl+0x6f/0xb0 do_syscall_64+0x9f/0x4d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f11d39a3e5b The "obvious" cause is that the attr ifork is null despite the inode claiming an attr fork having at least one extent, but it's not so obvious why we ended up with an inode in that state. Reported-by: Zorro Lang <zlang@redhat.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204031 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2019-08-12xfs: remove more ondisk directory corruption assertsDarrick J. Wong
Continue our game of replacing ASSERTs for corrupt ondisk metadata with EFSCORRUPTED returns. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2019-08-03fs: xfs: xfs_log: Don't use KM_MAYFAIL at xfs_log_reserve().Tetsuo Handa
When the system is close-to-OOM, fsync() may fail due to -ENOMEM because xfs_log_reserve() is using KM_MAYFAIL. It is a bad thing to fail writeback operation due to user-triggerable OOM condition. Since we are not using KM_MAYFAIL at xfs_trans_alloc() before calling xfs_log_reserve(), let's use the same flags at xfs_log_reserve(). oom-torture: page allocation failure: order:0, mode:0x46c40(GFP_NOFS|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_COMP), nodemask=(null) CPU: 7 PID: 1662 Comm: oom-torture Kdump: loaded Not tainted 5.3.0-rc2+ #925 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 Call Trace: dump_stack+0x67/0x95 warn_alloc+0xa9/0x140 __alloc_pages_slowpath+0x9a8/0xbce __alloc_pages_nodemask+0x372/0x3b0 alloc_slab_page+0x3a/0x8d0 new_slab+0x330/0x420 ___slab_alloc.constprop.94+0x879/0xb00 __slab_alloc.isra.89.constprop.93+0x43/0x6f kmem_cache_alloc+0x331/0x390 kmem_zone_alloc+0x9f/0x110 [xfs] kmem_zone_alloc+0x9f/0x110 [xfs] xlog_ticket_alloc+0x33/0xd0 [xfs] xfs_log_reserve+0xb4/0x410 [xfs] xfs_trans_reserve+0x1d1/0x2b0 [xfs] xfs_trans_alloc+0xc9/0x250 [xfs] xfs_setfilesize_trans_alloc.isra.27+0x44/0xc0 [xfs] xfs_submit_ioend.isra.28+0xa5/0x180 [xfs] xfs_vm_writepages+0x76/0xa0 [xfs] do_writepages+0x17/0x80 __filemap_fdatawrite_range+0xc1/0xf0 file_write_and_wait_range+0x53/0xa0 xfs_file_fsync+0x87/0x290 [xfs] vfs_fsync_range+0x37/0x80 do_fsync+0x38/0x60 __x64_sys_fsync+0xf/0x20 do_syscall_64+0x4a/0x1c0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: eb01c9cd87 ("[XFS] Remove the xlog_ticket allocator") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-07-30xfs: Fix possible null-pointer dereferences in ↵Jia-Ju Bai
xchk_da_btree_block_check_sibling() In xchk_da_btree_block_check_sibling(), there is an if statement on line 274 to check whether ds->state->altpath.blk[level].bp is NULL: if (ds->state->altpath.blk[level].bp) When ds->state->altpath.blk[level].bp is NULL, it is used on line 281: xfs_trans_brelse(..., ds->state->altpath.blk[level].bp); struct xfs_buf_log_item *bip = bp->b_log_item; ASSERT(bp->b_transp == tp); Thus, possible null-pointer dereferences may occur. To fix these bugs, ds->state->altpath.blk[level].bp is checked before being used. These bugs are found by a static analysis tool STCheck written by us. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-07-28xfs: fix stack contents leakage in the v1 inumber ioctlsDarrick J. Wong
Explicitly initialize the onstack structures to zero so we don't leak kernel memory into userspace when converting the in-core inumbers structure to the v1 inogrp ioctl structure. Add a comment about why we have to use memset to ensure that the padding holes in the structures are set to zero. Fixes: 5f19c7fc6873351 ("xfs: introduce v5 inode group structure") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2019-07-18Merge tag 'xfs-5.3-merge-13' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs cleanups from Darrick Wong: "We had a few more lateish cleanup patches come in for 5.3 -- a couple of syncups with the userspace libxfs code and a conversion of the XFS administrator's guide to ReST format. Summary: - Bring fs/xfs/libxfs/xfs_trans_inode.c in sync with userspace libxfs. - Convert the xfs administrator guide to rst and move it into the official admin guide under Documentation" * tag 'xfs-5.3-merge-13' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: Documentation: filesystem: Convert xfs.txt to ReST xfs: sync up xfs_trans_inode with userspace xfs: move xfs_trans_inode.c to libxfs/
2019-07-18Merge tag 'libnvdimm-for-5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "Primarily just the virtio_pmem driver: - virtio_pmem The new virtio_pmem facility introduces a paravirtualized persistent memory device that allows a guest VM to use DAX mechanisms to access a host-file with host-page-cache. It arranges for MAP_SYNC to be disabled and instead triggers a host fsync() when a 'write-cache flush' command is sent to the virtual disk device. - Miscellaneous small fixups" * tag 'libnvdimm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: virtio_pmem: fix sparse warning xfs: disable map_sync for async flush ext4: disable map_sync for async flush dax: check synchronous mapping is supported dm: enable synchronous dax libnvdimm: add dax_dev sync flag virtio-pmem: Add virtio pmem driver libnvdimm: nd_region flush callback support libnvdimm, namespace: Drop uuid_t implementation detail
2019-07-15Merge tag 'for-linus-20190715' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull more block updates from Jens Axboe: "A later pull request with some followup items. I had some vacation coming up to the merge window, so certain things items were delayed a bit. This pull request also contains fixes that came in within the last few days of the merge window, which I didn't want to push right before sending you a pull request. This contains: - NVMe pull request, mostly fixes, but also a few minor items on the feature side that were timing constrained (Christoph et al) - Report zones fixes (Damien) - Removal of dead code (Damien) - Turn on cgroup psi memstall (Josef) - block cgroup MAINTAINERS entry (Konstantin) - Flush init fix (Josef) - blk-throttle low iops timing fix (Konstantin) - nbd resize fixes (Mike) - nbd 0 blocksize crash fix (Xiubo) - block integrity error leak fix (Wenwen) - blk-cgroup writeback and priority inheritance fixes (Tejun)" * tag 'for-linus-20190715' of git://git.kernel.dk/linux-block: (42 commits) MAINTAINERS: add entry for block io cgroup null_blk: fixup ->report_zones() for !CONFIG_BLK_DEV_ZONED block: Limit zone array allocation size sd_zbc: Fix report zones buffer allocation block: Kill gfp_t argument of blkdev_report_zones() block: Allow mapping of vmalloc-ed buffers block/bio-integrity: fix a memory leak bug nvme: fix NULL deref for fabrics options nbd: add netlink reconfigure resize support nbd: fix crash when the blksize is zero block: Disable write plugging for zoned block devices block: Fix elevator name declaration block: Remove unused definitions nvme: fix regression upon hot device removal and insertion blk-throttle: fix zero wait time for iops throttled group block: Fix potential overflow in blk_report_zones() blkcg: implement REQ_CGROUP_PUNT blkcg, writeback: Implement wbc_blkcg_css() blkcg, writeback: Add wbc->no_cgroup_owner blkcg, writeback: Rename wbc_account_io() to wbc_account_cgroup_owner() ...
2019-07-15xfs: sync up xfs_trans_inode with userspaceEric Sandeen
Add an XFS_ICHGTIME_CREATE case to xfs_trans_ichgtime() to keep in sync with userspace. (Currently no kernel caller sends this flag.) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-07-15xfs: move xfs_trans_inode.c to libxfs/Eric Sandeen
Userspace now has an identical xfs_trans_inode.c which it has already moved to libxfs/ so do the same move for kernelspace. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-07-12Merge tag 'xfs-5.3-merge-12' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Darrick Wong: "In this release there are a significant amounts of consolidations and cleanups in the log code; restructuring of the log to issue struct bios directly; new bulkstat ioctls to return v5 fs inode information (and fix all the padding problems of the old ioctl); the beginnings of multithreaded inode walks (e.g. quotacheck); and a reduction in memory usage in the online scrub code leading to reduced runtimes. - Refactor inode geometry calculation into a single structure instead of open-coding pieces everywhere. - Add online repair to build options. - Remove unnecessary function call flags and functions. - Claim maintainership of various loose xfs documentation and header files. - Use struct bio directly for log buffer IOs instead of struct xfs_buf. - Reduce log item boilerplate code requirements. - Merge log item code spread across too many files. - Further distinguish between log item commits and cancellations. - Various small cleanups to the ag small allocator. - Support cgroup-aware writeback - libxfs refactoring for mkfs cleanup - Remove unneeded #includes - Fix a memory allocation miscalculation in the new log bio code - Fix bisection problems - Fix a crash in ioend processing caused by tripping over freeing of preallocated transactions - Split out a generic inode walk mechanism from the bulkstat code, hook up all the internal users to use the walking code, then clean up bulkstat to serve only the bulkstat ioctls. - Add a multithreaded iwalk implementation to speed up quotacheck on fast storage with many CPUs. - Remove unnecessary return values in logging teardown functions. - Supplement the bstat and inogrp structures with new bulkstat and inumbers structures that have all the fields we need for v5 filesystem features and none of the padding problems of their predecessors. - Wire up new ioctls that use the new structures with a much simpler bulk_ireq structure at the head instead of the pointerhappy mess we had before. - Enable userspace to constrain bulkstat returns to a single AG or a single special inode so that we can phase out a lot of geometry guesswork in userspace. - Reduce memory consumption and zeroing overhead in extended attribute scrub code. - Fix some behavioral regressions in the new bulkstat backend code. - Fix some behavioral regressions in the new log bio code" * tag 'xfs-5.3-merge-12' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (100 commits) xfs: chain bios the right way around in xfs_rw_bdev xfs: bump INUMBERS cursor correctly in xfs_inumbers_walk xfs: don't update lastino for FSBULKSTAT_SINGLE xfs: online scrub needn't bother zeroing its temporary buffer xfs: only allocate memory for scrubbing attributes when we need it xfs: refactor attr scrub memory allocation function xfs: refactor extended attribute buffer pointer functions xfs: attribute scrub should use seen_enough to pass error values xfs: allow single bulkstat of special inodes xfs: specify AG in bulk req xfs: wire up the v5 inumbers ioctl xfs: wire up new v5 bulkstat ioctls xfs: introduce v5 inode group structure xfs: introduce new v5 bulkstat structure xfs: rename bulkstat functions xfs: remove various bulk request typedef usage fs: xfs: xfs_log: Change return type from int to void xfs: poll waiting for quotacheck xfs: multithreaded iwalk implementation xfs: refactor INUMBERS to use iwalk functions ...
2019-07-12Merge tag 'vfs-fix-ioctl-checking-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull common SETFLAGS/FSSETXATTR parameter checking from Darrick Wong: "Here's a patch series that sets up common parameter checking functions for the FS_IOC_SETFLAGS and FS_IOC_FSSETXATTR ioctl implementations. The goal here is to reduce the amount of behaviorial variance between the filesystems where those ioctls originated (ext2 and XFS, respectively) and everybody else. - Standardize parameter checking for the SETFLAGS and FSSETXATTR ioctls (which were the file attribute setters for ext4 and xfs and have now been hoisted to the vfs) - Only allow the DAX flag to be set on files and directories" * tag 'vfs-fix-ioctl-checking-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: vfs: only allow FSSETXATTR to set DAX flag on files and dirs vfs: teach vfs_ioc_fssetxattr_check to check extent size hints vfs: teach vfs_ioc_fssetxattr_check to check project id info vfs: create a generic checking function for FS_IOC_FSSETXATTR vfs: create a generic checking and prep function for FS_IOC_SETFLAGS
2019-07-10Merge tag 'copy-file-range-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull copy_file_range updates from Darrick Wong: "This fixes numerous parameter checking problems and inconsistent behaviors in the new(ish) copy_file_range system call. Now the system call will actually check its range parameters correctly; refuse to copy into files for which the caller does not have sufficient privileges; update mtime and strip setuid like file writes are supposed to do; and allows copying up to the EOF of the source file instead of failing the call like we used to. Summary: - Create a generic copy_file_range handler and make individual filesystems responsible for calling it (i.e. no more assuming that do_splice_direct will work or is appropriate) - Refactor copy_file_range and remap_range parameter checking where they are the same - Install missing copy_file_range parameter checking(!) - Remove suid/sgid and update mtime like any other file write - Change the behavior so that a copy range crossing the source file's eof will result in a short copy to the source file's eof instead of EINVAL - Permit filesystems to decide if they want to handle cross-superblock copy_file_range in their local handlers" * tag 'copy-file-range-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: fuse: copy_file_range needs to strip setuid bits and update timestamps vfs: allow copy_file_range to copy across devices xfs: use file_modified() helper vfs: introduce file_modified() helper vfs: add missing checks to copy_file_range vfs: remove redundant checks from generic_remap_checks() vfs: introduce generic_file_rw_checks() vfs: no fallback for ->copy_file_range vfs: introduce generic_copy_file_range()
2019-07-10xfs: chain bios the right way around in xfs_rw_bdevChristoph Hellwig
We need to chain the earlier bios to the later ones, so that submit_bio_wait waits on the bio that all the completions are dispatched to. Fixes: 6ad5b3255b9e ("xfs: use bios directly to read and write the log recovery buffers") Reported-by: Dave Chinner <david@fromorbit.com> Tested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-07-09xfs: bump INUMBERS cursor correctly in xfs_inumbers_walkDarrick J. Wong
There's a subtle unit conversion error when we increment the INUMBERS cursor at the end of xfs_inumbers_walk. If there's an inode chunk at the very end of the AG /and/ the AG size is a perfect power of two, the startino of that last chunk (which is in units of AG inodes) will be 63 less than (1 << agino_log). If we add XFS_INODES_PER_CHUNK to the startino, we end up with a startino that's larger than (1 << agino_log) and when we convert that back to fs inode units we'll rip off that upper bit and wind up back at the start of the AG. Fix this by converting to units of fs inodes before adding XFS_INODES_PER_CHUNK so that we'll harmlessly end up pointing to the next AG. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-06xfs: don't update lastino for FSBULKSTAT_SINGLEDarrick J. Wong
The kernel test robot found a regression of xfs/054 in the conversion of bulkstat to use the new iwalk infrastructure -- if a caller set *lastip = 128 and invoked FSBULKSTAT_SINGLE, the bstat info would be for inode 128, but *lastip would be increased by the kernel to 129. FSBULKSTAT_SINGLE never incremented lastip before, so it's incorrect to make such an update to the internal lastino value now. Fixes: 2810bd6840e463 ("xfs: convert bulkstat to new iwalk infrastructure") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-05xfs: disable map_sync for async flushPankaj Gupta
Dont support 'MAP_SYNC' with non-DAX files and DAX files with asynchronous dax_device. Virtio pmem provides asynchronous host page cache flush mechanism. We don't support 'MAP_SYNC' with virtio pmem and xfs. Signed-off-by: Pankaj Gupta <pagupta@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-07-05xfs: online scrub needn't bother zeroing its temporary bufferDarrick J. Wong
The xattr scrubber functions use the temporary memory buffer either for storing bitmaps or for testing if attribute value extraction works. The bitmap code always zeroes what it needs and the value extraction sets the buffer contents, so it's not necessary to waste CPU time zeroing on allocation. Note that while we never read the contents that the attr value extraction function sets, we do need to call it to check the remote attribute header and CRCs to check for corruption. A flame graph analysis showed that we were spending 7% of a xfs_scrub run (the whole program, not just the attr scrubber itself) allocating and zeroing 64k segments needlessly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-05xfs: only allocate memory for scrubbing attributes when we need itDarrick J. Wong
In examining a flame graph of time spent running xfs_scrub on various filesystems, I noticed that we spent nearly 7% of the total runtime on allocating a zeroed 65k buffer for every SCRUB_TYPE_XATTR invocation. We do this even if none of the attribute values were anywhere near 64k in size, even if there were no attribute blocks to check space on, and even if it just turns out there are no attributes at all. Therefore, rearrange the xattr buffer setup code to support reallocating with a bigger buffer and redistribute the callers of that function so that we only allocate memory just prior to needing it, and only allocate as much as we need. If we can't get memory with the ILOCK held we'll bail out with EDEADLOCK which will allocate the maximum memory. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-05xfs: refactor attr scrub memory allocation functionDarrick J. Wong
Move the code that allocates memory buffers for the extended attribute scrub code into a separate function so we can reduce memory allocations in the next patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-05xfs: refactor extended attribute buffer pointer functionsDarrick J. Wong
Replace the open-coded attribute buffer pointer calculations with helper functions to make it more obvious what we're doing with our freeform memory allocation w.r.t. either storing xattr values or computing btree block free space. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-05xfs: attribute scrub should use seen_enough to pass error valuesDarrick J. Wong
When we're iterating all the attributes using the built-in xattr iterator, we can use the seen_enough variable to pass error codes back to the main scrub function instead of flattening them into 0/1. This will be used in a more exciting fashion in upcoming patches. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-04xfs: allow single bulkstat of special inodesDarrick J. Wong
Create a new bulk ireq flag that enables userspace to ask us for a special inode number instead of interpreting @ino as a literal inode number. This enables us to query the root inode easily. The reason for adding the ability to query specifically the root directory inode is that certain programs (xfsdump and xfsrestore) want to confirm when they've been pointed to the root directory. The userspace code assumes the root directory is always the first result from calling bulkstat with lastino == 0, but this isn't true if the (initial btree roots + initial AGFL + inode alignment padding) is itself long enough to be allocated to new inodes if all of those blocks should happen to be free at the same time. Rather than make userspace guess at internal filesystem state, we provide a direct query. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-04xfs: specify AG in bulk reqDarrick J. Wong
Add a new xfs_bulk_ireq flag to constrain the iteration to a single AG. If the passed-in startino value is zero then we start with the first inode in the AG that the user passes in; otherwise, we iterate only within the same AG as the passed-in inode. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-03xfs: wire up the v5 inumbers ioctlDarrick J. Wong
Wire up the v5 INUMBERS ioctl. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-03xfs: wire up new v5 bulkstat ioctlsDarrick J. Wong
Wire up the new v5 BULKSTAT ioctl. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-03xfs: introduce v5 inode group structureDarrick J. Wong
Introduce a new "v5" inode group structure that fixes the alignment and padding problems of the existing structure. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-03xfs: introduce new v5 bulkstat structureDarrick J. Wong
Introduce a new version of the in-core bulkstat structure that supports our new v5 format features. This structure also fills the gaps in the previous structure. We leave wiring up the ioctls for the next patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-03xfs: rename bulkstat functionsDarrick J. Wong
Rename the bulkstat functions to 'fsbulkstat' so that they match the ioctl names. We will be introducing a new set of bulkstat/inumbers ioctls soon, and it will be important to keep the names straight. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>