summaryrefslogtreecommitdiff
path: root/fs/xfs/xfs_super.c
AgeCommit message (Collapse)Author
2020-12-09xfs: show the proper user quota optionsKaixu Xia
The quota option 'usrquota' should be shown if both the XFS_UQUOTA_ACCT and XFS_UQUOTA_ENFD flags are set. The option 'uqnoenforce' should be shown when only the XFS_UQUOTA_ACCT flag is set. The current code logic seems wrong, Fix it and show proper options. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-12-09xfs: rename xfs_fc_* back to xfs_fs_*Darrick J. Wong
Get rid of this one-off namespace since we're done converting things to fscontext now. Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-12-09xfs: refactor file range validationDarrick J. Wong
Refactor all the open-coded validation of file block ranges into a single helper, and teach the bmap scrubber to check the ranges. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-12-09xfs: define a new "needrepair" featureDarrick J. Wong
Define an incompat feature flag to indicate that the filesystem needs to be repaired. While libxfs will recognize this feature, the kernel will refuse to mount if the feature flag is set, and only xfs_repair will be able to clear the flag. The goal here is to force the admin to run xfs_repair to completion after upgrading the filesystem, or if we otherwise detect anomalies. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2020-12-08xfs: move kernel-specific superblock validation out of libxfsDarrick J. Wong
A couple of the superblock validation checks apply only to the kernel, so move them to xfs_fc_fill_super before we add the needsrepair "feature", which will prevent the kernel (but not xfsprogs) from mounting the filesystem. This also reduces the diff between kernel and userspace libxfs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2020-10-24Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff all over the place (the largest group here is Christoph's stat cleanups)" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: remove KSTAT_QUERY_FLAGS fs: remove vfs_stat_set_lookup_flags fs: move vfs_fstatat out of line fs: implement vfs_stat and vfs_lstat in terms of vfs_fstatat fs: remove vfs_statx_fd fs: omfs: use kmemdup() rather than kmalloc+memcpy [PATCH] reduce boilerplate in fsid handling fs: Remove duplicated flag O_NDELAY occurring twice in VALID_OPEN_FLAGS selftests: mount: add nosymfollow tests Add a "nosymfollow" mount option.
2020-09-25xfs: remove deprecated mount optionsPavel Reichl
ikeep/noikeep was a workaround for old DMAPI code which is no longer relevant. attr2/noattr2 - is for controlling upgrade behaviour from fixed attribute fork sizes in the inode (attr1) and dynamic attribute fork sizes (attr2). mkfs has defaulted to setting attr2 since 2007, hence just about every XFS filesystem out there in production right now uses attr2. Signed-off-by: Pavel Reichl <preichl@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix minor typos] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-18[PATCH] reduce boilerplate in fsid handlingAl Viro
Get rid of boilerplate in most of ->statfs() instances... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-15xfs: deprecate the V4 formatDarrick J. Wong
The V4 filesystem format contains known weaknesses in the on-disk format that make metadata verification diffiult. In addition, the format does not support dates past 2038 and will not be upgraded to do so. We should start the process of retiring the old format to close off attack surfaces and to encourage users to migrate onto V5. Therefore, make XFS V4 support a configurable option. For the first period it will be default Y in case some distributors want to withdraw support early; for the second period it will be default N so that anyone who wishes to continue support can do so; and after that, support will be removed from the kernel. Dates for these events have been added to the upstream kernel. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2020-09-15xfs: trace timestamp limitsDarrick J. Wong
Add a couple of tracepoints so that we can check the timestamp limits being set on inodes and quotas. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-09-15xfs: widen ondisk inode timestamps to deal with y2038+Darrick J. Wong
Redesign the ondisk inode timestamps to be a simple unsigned 64-bit counter of nanoseconds since 14 Dec 1901 (i.e. the minimum time in the 32-bit unix time epoch). This enables us to handle dates up to 2486, which solves the y2038 problem. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-09-15xfs: explicitly define inode timestamp rangeDarrick J. Wong
Formally define the inode timestamp ranges that existing filesystems support, and switch the vfs timetamp ranges to use it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-09-15xfs: store inode btree block counts in AGI headerDarrick J. Wong
Add a btree block usage counters for both inode btrees to the AGI header so that we don't have to walk the entire finobt at mount time to create the per-AG reservations. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-09-06xfs: xfs_iflock is no longer a completionDave Chinner
With the recent rework of the inode cluster flushing, we no longer ever wait on the the inode flush "lock". It was never a lock in the first place, just a completion to allow callers to wait for inode IO to complete. We now never wait for flush completion as all inode flushing is non-blocking. Hence we can get rid of all the iflock infrastructure and instead just set and check a state flag. Rename the XFS_IFLOCK flag to XFS_IFLUSHING, convert all the xfs_iflock_nowait() test-and-set operations on that flag, and replace all the xfs_ifunlock() calls to clear operations. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-07-17xfs: preserve inode versioning across remountsEric Sandeen
The MS_I_VERSION mount flag is exposed via the VFS, as documented in the mount manpages etc; see the iversion and noiversion mount options in mount(8). As a result, mount -o remount looks for this option in /proc/mounts and will only send the I_VERSION flag back in during remount it it is present. Since it's not there, a remount will /remove/ the I_VERSION flag at the vfs level, and iversion functionality is lost. xfs v5 superblocks intend to always have i_version enabled; it is set as a default at mount time, but is lost during remount for the reasons above. The generic fix would be to expose this documented option in /proc/mounts, but since that was rejected, fix it up again in the xfs remount path instead, so that at least xfs won't suffer from this misbehavior. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-07-09xfs: Fix false positive lockdep warning with sb_internal & fs_reclaimWaiman Long
Depending on the workloads, the following circular locking dependency warning between sb_internal (a percpu rwsem) and fs_reclaim (a pseudo lock) may show up: ====================================================== WARNING: possible circular locking dependency detected 5.0.0-rc1+ #60 Tainted: G W ------------------------------------------------------ fsfreeze/4346 is trying to acquire lock: 0000000026f1d784 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.19+0x5/0x30 but task is already holding lock: 0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650 which lock already depends on the new lock. : Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sb_internal); lock(fs_reclaim); lock(sb_internal); lock(fs_reclaim); *** DEADLOCK *** 4 locks held by fsfreeze/4346: #0: 00000000b478ef56 (sb_writers#8){++++}, at: percpu_down_write+0xb4/0x650 #1: 000000001ec487a9 (&type->s_umount_key#28){++++}, at: freeze_super+0xda/0x290 #2: 000000003edbd5a0 (sb_pagefaults){++++}, at: percpu_down_write+0xb4/0x650 #3: 0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650 stack backtrace: Call Trace: dump_stack+0xe0/0x19a print_circular_bug.isra.10.cold.34+0x2f4/0x435 check_prev_add.constprop.19+0xca1/0x15f0 validate_chain.isra.14+0x11af/0x3b50 __lock_acquire+0x728/0x1200 lock_acquire+0x269/0x5a0 fs_reclaim_acquire.part.19+0x29/0x30 fs_reclaim_acquire+0x19/0x20 kmem_cache_alloc+0x3e/0x3f0 kmem_zone_alloc+0x79/0x150 xfs_trans_alloc+0xfa/0x9d0 xfs_sync_sb+0x86/0x170 xfs_log_sbcount+0x10f/0x140 xfs_quiesce_attr+0x134/0x270 xfs_fs_freeze+0x4a/0x70 freeze_super+0x1af/0x290 do_vfs_ioctl+0xedc/0x16c0 ksys_ioctl+0x41/0x80 __x64_sys_ioctl+0x73/0xa9 do_syscall_64+0x18f/0xd23 entry_SYSCALL_64_after_hwframe+0x49/0xbe This is a false positive as all the dirty pages are flushed out before the filesystem can be frozen. One way to avoid this splat is to add GFP_NOFS to the affected allocation calls by using the memalloc_nofs_save()/memalloc_nofs_restore() pair. This shouldn't matter unless the system is really running out of memory. In that particular case, the filesystem freeze operation may fail while it was succeeding previously. Without this patch, the command sequence below will show that the lock dependency chain sb_internal -> fs_reclaim exists. # fsfreeze -f /home # fsfreeze --unfreeze /home # grep -i fs_reclaim -C 3 /proc/lockdep_chains | grep -C 5 sb_internal After applying the patch, such sb_internal -> fs_reclaim lock dependency chain can no longer be found. Because of that, the locking dependency warning will not be shown. Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-07-07xfs: remove SYNC_WAIT from xfs_reclaim_inodes()Dave Chinner
Clean up xfs_reclaim_inodes() callers. Most callers want blocking behaviour, so just make the existing SYNC_WAIT behaviour the default. For the xfs_reclaim_worker(), just call xfs_reclaim_inodes_ag() directly because we just want optimistic clean inode reclaim to be done in the background. For xfs_quiesce_attr() we can just remove the inode reclaim calls as they are a historic relic that was required to flush dirty inodes that contained unlogged changes. We now log all changes to the inodes, so the sync AIL push from xfs_log_quiesce() called by xfs_quiesce_attr() will do all the required inode writeback for freeze. Seeing as we now want to loop until all reclaimable inodes have been reclaimed, make xfs_reclaim_inodes() loop on the XFS_ICI_RECLAIM_TAG tag rather than having xfs_reclaim_inodes_ag() tell it that inodes were skipped. This is much more reliable and will always loop until all reclaimable inodes are reclaimed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-06-02Merge tag 'vfs-5.8-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull DAX updates part two from Darrick Wong: "This time around, we're hoisting the DONTCACHE flag from XFS into the VFS so that we can make the incore DAX mode changes become effective sooner. We can't change the file data access mode on a live inode because we don't have a safe way to change the file ops pointers. The incore state change becomes effective at inode loading time, which can happen if the inode is evicted. Therefore, we're making it so that filesystems can ask the VFS to evict the inode as soon as the last holder drops. The per-fs changes to make this call this will be in subsequent pull requests from Ted and myself. Summary: - Introduce DONTCACHE flags for dentries and inodes. This hint will cause the VFS to drop the associated objects immediately after the last put, so that we can change the file access mode (DAX or page cache) on the fly" * tag 'vfs-5.8-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: fs: Introduce DCACHE_DONTCACHE fs: Lift XFS_IDONTCACHE to the VFS layer
2020-06-02Merge tag 'xfs-5.8-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Darrick Wong: "Most of the changes this cycle are refactoring of existing code in preparation for things landing in the future. We also fixed various problems and deficiencies in the quota implementation, and (I hope) the last of the stale read vectors by forcing write allocations to go through the unwritten state until the write completes. Summary: - Various cleanups to remove dead code, unnecessary conditionals, asserts, etc. - Fix a linker warning caused by xfs stuffing '-g' into CFLAGS redundantly. - Tighten up our dmesg logging to ensure that everything is prefixed with 'XFS' for easier grepping. - Kill a bunch of typedefs. - Refactor the deferred ops code to reduce indirect function calls. - Increase type-safety with the deferred ops code. - Make the DAX mount options a tri-state. - Fix some error handling problems in the inode flush code and clean up other inode flush warts. - Refactor log recovery so that each log item recovery functions now live with the other log item processing code. - Fix some SPDX forms. - Fix quota counter corruption if the fs crashes after running quotacheck but before any dquots get logged. - Don't fail metadata verification on zero-entry attr leaf blocks, since they're just part of the disk format now due to a historic lack of log atomicity. - Don't allow SWAPEXT between files with different [ugp]id when quotas are enabled. - Refactor inode fork reading and verification to run directly from the inode-from-disk function. This means that we now actually guarantee that _iget'ted inodes are totally verified and ready to go. - Move the incore inode fork format and extent counts to the ifork structure. - Scalability improvements by reducing cacheline pingponging in struct xfs_mount. - More scalability improvements by removing m_active_trans from the hot path. - Fix inode counter update sanity checking to run /only/ on debug kernels. - Fix longstanding inconsistency in what error code we return when a program hits project quota limits (ENOSPC). - Fix group quota returning the wrong error code when a program hits group quota limits. - Fix per-type quota limits and grace periods for group and project quotas so that they actually work. - Allow extension of individual grace periods. - Refactor the non-reclaim inode radix tree walking code to remove a bunch of stupid little functions and straighten out the inconsistent naming schemes. - Fix a bug in speculative preallocation where we measured a new allocation based on the last extent mapping in the file instead of looking farther for the last contiguous space allocation. - Force delalloc writes to unwritten extents. This closes a stale disk contents exposure vector if the system goes down before the write completes. - More lockdep whackamole" * tag 'xfs-5.8-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (129 commits) xfs: more lockdep whackamole with kmem_alloc* xfs: force writes to delalloc regions to unwritten xfs: refactor xfs_iomap_prealloc_size xfs: measure all contiguous previous extents for prealloc size xfs: don't fail unwritten extent conversion on writeback due to edquot xfs: rearrange xfs_inode_walk_ag parameters xfs: straighten out all the naming around incore inode tree walks xfs: move xfs_inode_ag_iterator to be closer to the perag walking code xfs: use bool for done in xfs_inode_ag_walk xfs: fix inode ag walk predicate function return values xfs: refactor eofb matching into a single helper xfs: remove __xfs_icache_free_eofblocks xfs: remove flags argument from xfs_inode_ag_walk xfs: remove xfs_inode_ag_iterator_flags xfs: remove unused xfs_inode_ag_iterator function xfs: replace open-coded XFS_ICI_NO_TAG xfs: move eofblocks conversion function to xfs_ioctl.c xfs: allow individual quota grace period extension xfs: per-type quota timers and warn limits xfs: switch xfs_get_defquota to take explicit type ...
2020-05-27xfs: remove the m_active_trans counterDave Chinner
It's a global atomic counter, and we are hitting it at a rate of half a million transactions a second, so it's bouncing the counter cacheline all over the place on large machines. We don't actually need it anymore - it used to be required because the VFS freeze code could not track/prevent filesystem transactions that were running, but that problem no longer exists. Hence to remove the counter, we simply have to ensure that nothing calls xfs_sync_sb() while we are trying to quiesce the filesytem. That only happens if the log worker is still running when we call xfs_quiesce_attr(). The log worker is cancelled at the end of xfs_quiesce_attr() by calling xfs_log_quiesce(), so just call it early here and then we can remove the counter altogether. Concurrent create, 50 million inodes, identical 16p/16GB virtual machines on different physical hosts. Machine A has twice the CPU cores per socket of machine B: unpatched patched machine A: 3m16s 2m00s machine B: 4m04s 4m05s Create rates: unpatched patched machine A: 282k+/-31k 468k+/-21k machine B: 231k+/-8k 233k+/-11k Concurrent rm of same 50 million inodes: unpatched patched machine A: 6m42s 2m33s machine B: 4m47s 4m47s The transaction rate on the fast machine went from just under 300k/sec to 700k/sec, which indicates just how much of a bottleneck this atomic counter was. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-22block: remove the error_sector argument to blkdev_issue_flushChristoph Hellwig
The argument isn't used by any caller, and drivers don't fill out bi_sector for flush requests either. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-13xfs: ensure f_bfree returned by statfs() is non-negativeZheng Bin
Construct an img like this: dd if=/dev/zero of=xfs.img bs=1M count=20 mkfs.xfs -d agcount=1 xfs.img xfs_db -x xfs.img sb 0 write fdblocks 0 agf 0 write freeblks 0 write longest 0 quit mount it, df -h /mnt(xfs mount point), will show this: Filesystem Size Used Avail Use% Mounted on /dev/loop0 17M -64Z -32K 100% /mnt Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-13fs: Lift XFS_IDONTCACHE to the VFS layerIra Weiny
DAX effective mode (S_DAX) changes requires inode eviction. XFS has an advisory flag (XFS_IDONTCACHE) to prevent caching of the inode if no other additional references are taken. We lift this flag to the VFS layer and change the behavior slightly by allowing the flag to remain even if multiple references are taken. This will expedite the eviction of inodes to change S_DAX. Cc: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04fs/xfs: Make DAX mount option a tri-stateIra Weiny
As agreed upon[1]. We make the dax mount option a tri-state. '-o dax' continues to operate the same. We add 'always', 'never', and 'inode' (default). [1] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/ Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYSIra Weiny
In prep for the new tri-state mount option which then introduces XFS_MOUNT_DAX_NEVER. Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-04-16xfs: move inode flush to the sync workqueueDarrick J. Wong
Move the inode dirty data flushing to a workqueue so that multiple threads can take advantage of a single thread's flushing work. The ratelimiting technique used in bdd4ee4 was not successful, because threads that skipped the inode flush scan due to ratelimiting would ENOSPC early, which caused occasional (but noticeable) changes in behavior and sporadic fstest regressions. Therefore, make all the writer threads wait on a single inode flush, which eliminates both the stampeding hordes of flushers and the small window in which a write could fail with ENOSPC because it lost the ratelimit race after even another thread freed space. Fixes: c6425702f21e ("xfs: ratelimit inode flush on buffered write ENOSPC") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-31xfs: ratelimit inode flush on buffered write ENOSPCDarrick J. Wong
A customer reported rcu stalls and softlockup warnings on a computer with many CPU cores and many many more IO threads trying to write to a filesystem that is totally out of space. Subsequent analysis pointed to the many many IO threads calling xfs_flush_inodes -> sync_inodes_sb, which causes a lot of wb_writeback_work to be queued. The writeback worker spends so much time trying to wake the many many threads waiting for writeback completion that it trips the softlockup detector, and (in this case) the system automatically reboots. In addition, they complain that the lengthy xfs_flush_inodes scan traps all of those threads in uninterruptible sleep, which hampers their ability to kill the program or do anything else to escape the situation. If there's thousands of threads trying to write to files on a full filesystem, each of those threads will start separate copies of the inode flush scan. This is kind of pointless since we only need one scan, so rate limit the inode flush. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-03-27xfs: correctly acount for reclaimable slabsDave Chinner
The XFS inode item slab actually reclaimed by inode shrinker callbacks from the memory reclaim subsystem. These should be marked as reclaimable so the mm subsystem has the full picture of how much memory it can actually reclaim from the XFS slab caches. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-02-08Merge branch 'merge.nfs-fs_parse.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs file system parameter updates from Al Viro: "Saner fs_parser.c guts and data structures. The system-wide registry of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is the horror switch() in fs_parse() that would have to grow another case every time something got added to that system-wide registry. New syntax types can be added by filesystems easily now, and their namespace is that of functions - not of system-wide enum members. IOW, they can be shared or kept private and if some turn out to be widely useful, we can make them common library helpers, etc., without having to do anything whatsoever to fs_parse() itself. And we already get that kind of requests - the thing that finally pushed me into doing that was "oh, and let's add one for timeouts - things like 15s or 2h". If some filesystem really wants that, let them do it. Without somebody having to play gatekeeper for the variants blessed by direct support in fs_parse(), TYVM. Quite a bit of boilerplate is gone. And IMO the data structures make a lot more sense now. -200LoC, while we are at it" * 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits) tmpfs: switch to use of invalfc() cgroup1: switch to use of errorfc() et.al. procfs: switch to use of invalfc() hugetlbfs: switch to use of invalfc() cramfs: switch to use of errofc() et.al. gfs2: switch to use of errorfc() et.al. fuse: switch to use errorfc() et.al. ceph: use errorfc() and friends instead of spelling the prefix out prefix-handling analogues of errorf() and friends turn fs_param_is_... into functions fs_parse: handle optional arguments sanely fs_parse: fold fs_parameter_desc/fs_parameter_spec fs_parser: remove fs_parameter_description name field add prefix to fs_context->log ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log new primitive: __fs_parse() switch rbd and libceph to p_log-based primitives struct p_log, variants of warnf() et.al. taking that one instead teach logfc() to handle prefices, give it saner calling conventions get rid of cg_invalf() ...
2020-02-07fs_parse: fold fs_parameter_desc/fs_parameter_specAl Viro
The former contains nothing but a pointer to an array of the latter... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07fs_parser: remove fs_parameter_description name fieldEric Sandeen
Unused now. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-14xfs: fix s_maxbytes computation on 32-bit kernelsDarrick J. Wong
I observed a hang in generic/308 while running fstests on a i686 kernel. The hang occurred when trying to purge the pagecache on a large sparse file that had a page created past MAX_LFS_FILESIZE, which caused an integer overflow in the pagecache xarray and resulted in an infinite loop. I then noticed that Linus changed the definition of MAX_LFS_FILESIZE in commit 0cc3b0ec23ce ("Clarify (and fix) MAX_LFS_FILESIZE macros") so that it is now one page short of the maximum page index on 32-bit kernels. Because the XFS function to compute max offset open-codes the 2005-era MAX_LFS_FILESIZE computation and neither the vfs nor mm perform any sanity checking of s_maxbytes, the code in generic/308 can create a page above the pagecache's limit and kaboom. Fix all this by setting s_maxbytes to MAX_LFS_FILESIZE directly and aborting the mount with a warning if our assumptions ever break. I have no answer for why this seems to have been broken for years and nobody noticed. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-11-18xfs: Remove kmem_zone_destroy() wrapperCarlos Maiolino
Use kmem_cache_destroy directly Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-18xfs: Remove slab init wrappersCarlos Maiolino
Remove kmem_zone_init() and kmem_zone_init_flags() together with their specific KM_* to SLAB_* flag wrappers. Use kmem_cache_create() directly. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-10xfs: remove a stray tab in xfs_remount_rw()Dan Carpenter
The extra tab makes the code slightly confusing. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ian Kent <raven@themaw.net> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-06xfs: remove redundant assignment to variable errorColin Ian King
Variable error is being initialized with a value that is never read and is being re-assigned a couple of statements later on. The assignment is redundant and hence can be removed. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: fold xfs_mount-alloc() into xfs_init_fs_context()Ian Kent
After switching to use the mount-api the only remaining caller of xfs_mount_alloc() is xfs_init_fs_context(), so fold xfs_mount_alloc() into it. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: move xfs_fc_parse_param() above xfs_fc_get_tree()Ian Kent
Grouping the options parsing and mount handling functions above the struct fs_context_operations but below the struct super_operations should improve (some) the grouping of the super operations while also improving the grouping of the options parsing and mount handling code. Lastly move xfs_fc_parse_param() and related functions down to above xfs_fc_get_tree() and it's related functions. But leave the options enum, struct fs_parameter_spec and the struct fs_parameter_description declarations at the top since that's the logical place for them. This is a straight code move, there aren't any functional changes. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: move xfs_fc_get_tree() above xfs_fc_reconfigure()Ian Kent
Grouping the options parsing and mount handling functions above the struct fs_context_operations but below the struct super_operations should improve (some) the grouping of the super operations while also improving the grouping of the options parsing and mount handling code. Now move xfs_fc_get_tree() and friends, also take the oppertunity to change STATIC to static for the xfs_fs_put_super() function. This is a straight code move, there aren't any functional changes. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: move xfs_fc_reconfigure() above xfs_fc_free()Ian Kent
Grouping the options parsing and mount handling functions above the struct fs_context_operations but below the struct super_operations should improve (some) the grouping of the super operations while also improving the grouping of the options parsing and mount handling code. Start by moving xfs_fc_reconfigure() and friends. This is a straight code move, there aren't any functional changes. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: switch to use the new mount-apiIan Kent
Define the struct fs_parameter_spec table that's used by the new mount-api for options parsing. Create the various fs context operations methods and define the fs_context_operations struct. Create the fs context initialization method and update the struct file_system_type to utilize it. The initialization function is responsible for working storage initialization, allocation and initialization of file system private information storage and for setting the operations in the fs context. Also set struct file_system_type .parameters to the newly defined struct fs_parameter_spec options parsing table for use by the fs context methods and remove unused code. [darrick: add a comment pointing out the one place where mp->m_super is null] Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: dont set sb in xfs_mount_alloc()Ian Kent
When changing to use the new mount api the super block won't be available when the xfs_mount struct is allocated so move setting the super block in xfs_mount to xfs_fs_fill_super(). Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: move xfs_parseargs() validation to a helperIan Kent
Move the validation code of xfs_parseargs() into a helper for later use within the mount context methods. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: refactor xfs_parseags()Ian Kent
Refactor xfs_parseags(), move the entire token case block to a separate function in an attempt to highlight the code that actually changes in converting to use the new mount api. Also change the break in the switch to a return in the factored out xfs_fc_parse_param() function. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: avoid redundant checks when options is emptyIan Kent
When options passed to xfs_parseargs() is NULL the checks performed after taking the branch are made with the initial values of dsunit, dswidth and iosizelog. But all the checks do nothing in this case so return immediately instead. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: refactor suffix_kstrtoint()Ian Kent
The mount-api doesn't have a "human unit" parse type yet so the options that have values like "10k" etc. still need to be converted by the fs. But the value comes to the fs as a string (not a substring_t type) so there's a need to change the conversion function to take a character string instead. When xfs is switched to use the new mount-api match_kstrtoint() will no longer be used and will be removed. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: add xfs_remount_ro() helperIan Kent
Factor the remount read only code into a helper to simplify the subsequent change from the super block method .remount_fs to the mount-api fs_context_operations method .reconfigure. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: add xfs_remount_rw() helperIan Kent
Factor the remount read write code into a helper to simplify the subsequent change from the super block method .remount_fs to the mount-api fs_context_operations method .reconfigure. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: merge freeing of mp names and mpIan Kent
In all cases when struct xfs_mount (mp) fields m_rtname and m_logname are freed mp is also freed, so merge these into a single function xfs_mount_free() Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: use kmem functions for struct xfs_mountIan Kent
The remount function uses the kmem functions for allocating and freeing struct xfs_mount, for consistency use the kmem functions everwhere for struct xfs_mount. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>