summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2021-05-04Merge tag 'for-linus-5.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull JFFS2, UBI and UBIFS updates from Richard Weinberger: "JFFS2: - Use splice_write() - Fix for a slab-out-of-bounds bug UBI: - Fix for clang related warnings - Code cleanup UBIFS: - Fix for inode rebirth at replay - Set s_uuid - Use zstd for default filesystem" * tag 'for-linus-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubi: Remove unnecessary struct declaration jffs2: Hook up splice_write callback jffs2: avoid Wempty-body warnings jffs2: Fix kasan slab-out-of-bounds problem ubi: Fix fall-through warnings for Clang ubifs: Report max LEB count at mount time ubifs: Set s_uuid in super block to support ima/evm uuid options ubifs: Default to zstd compression ubifs: Only check replay with inode type to judge if inode linked
2021-05-04Merge tag 'f2fs-for-5.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we added a new mount option, "checkpoint_merge", which introduces a kernel thread dealing with the f2fs checkpoints. Once we start to manage the IO priority along with blk-cgroup, the checkpoint operation can be processed in a lower priority under the process context. Since the checkpoint holds all the filesystem operations, we give a higher priority to the checkpoint thread all the time. Enhancements: - introduce gc_merge mount option to introduce a checkpoint thread - improve to run discard thread efficiently - allow modular compression algorithms - expose # of overprivision segments to sysfs - expose runtime compression stat to sysfs Bug fixes: - fix OOB memory access by the node id lookup - avoid touching checkpointed data in the checkpoint-disabled mode - fix the resizing flow to avoid kernel panic and race conditions - fix block allocation issues on pinned files - address some swapfile issues - fix hugtask problem and kernel panic during atomic write operations - don't start checkpoint thread in RO And, we've cleaned up some kernel coding style and build warnings. In addition, we fixed some minor race conditions and error handling routines" * tag 'f2fs-for-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (48 commits) f2fs: drop inplace IO if fs status is abnormal f2fs: compress: remove unneed check condition f2fs: clean up left deprecated IO trace codes f2fs: avoid using native allocate_segment_by_default() f2fs: remove unnecessary struct declaration f2fs: fix to avoid NULL pointer dereference f2fs: avoid duplicated codes for cleanup f2fs: document: add description about compressed space handling f2fs: clean up build warnings f2fs: fix the periodic wakeups of discard thread f2fs: fix to avoid accessing invalid fio in f2fs_allocate_data_block() f2fs: fix to avoid GC/mmap race with f2fs_truncate() f2fs: set checkpoint_merge by default f2fs: Fix a hungtask problem in atomic write f2fs: fix to restrict mount condition on readonly block device f2fs: introduce gc_merge mount option f2fs: fix to cover __allocate_new_section() with curseg_lock f2fs: fix wrong alloc_type in f2fs_do_replace_block f2fs: delete empty compress.h f2fs: fix a typo in inode.c ...
2021-05-04Merge tag 'm68knommu-for-v5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Pull m68knommu updates from Greg Ungerer: - a fix for interrupt number range checking for the ColdFire SIMR interrupt controller. - changes for the binfmt_flat binary loader to allow RISC-V nommu support it needs to be able to accept flat binaries that have no gap between the text and data sections. * tag 'm68knommu-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k: coldfire: fix irq ranges riscv: Disable data start offset in flat binaries binfmt_flat: allow not offsetting data start
2021-05-03Merge tag 'trace-v5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "New feature: - A new "func-no-repeats" option in tracefs/options directory. When set the function tracer will detect if the current function being traced is the same as the previous one, and instead of recording it, it will keep track of the number of times that the function is repeated in a row. And when another function is recorded, it will write a new event that shows the function that repeated, the number of times it repeated and the time stamp of when the last repeated function occurred. Enhancements: - In order to implement the above "func-no-repeats" option, the ring buffer timestamp can now give the accurate timestamp of the event as it is being recorded, instead of having to record an absolute timestamp for all events. This helps the histogram code which no longer needs to waste ring buffer space. - New validation logic to make sure all trace events that access dereferenced pointers do so in a safe way, and will warn otherwise. Fixes: - No longer limit the PIDs of tasks that are recorded for "saved_cmdlines" to PID_MAX_DEFAULT (32768), as systemd now allows for a much larger range. This caused the mapping of PIDs to the task names to be dropped for all tasks with a PID greater than 32768. - Change trace_clock_global() to never block. This caused a deadlock. Clean ups: - Typos, prototype fixes, and removing of duplicate or unused code. - Better management of ftrace_page allocations" * tag 'trace-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (32 commits) tracing: Restructure trace_clock_global() to never block tracing: Map all PIDs to command lines ftrace: Reuse the output of the function tracer for func_repeats tracing: Add "func_no_repeats" option for function tracing tracing: Unify the logic for function tracing options tracing: Add method for recording "func_repeats" events tracing: Add "last_func_repeats" to struct trace_array tracing: Define new ftrace event "func_repeats" tracing: Define static void trace_print_time() ftrace: Simplify the calculation of page number for ftrace_page->records some more ftrace: Store the order of pages allocated in ftrace_page tracing: Remove unused argument from "ring_buffer_time_stamp() tracing: Remove duplicate struct declaration in trace_events.h tracing: Update create_system_filter() kernel-doc comment tracing: A minor cleanup for create_system_filter() kernel: trace: Mundane typo fixes in the file trace_events_filter.c tracing: Fix various typos in comments scripts/recordmcount.pl: Make vim and emacs indent the same scripts/recordmcount.pl: Make indent spacing consistent tracing: Add a verifier to check string pointers for trace events ...
2021-05-03Merge branch 'work.file' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull receive_fd update from Al Viro: "Cleanup of receive_fd mess" * 'work.file' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: split receive_fd_replace from __receive_fd
2021-05-02Merge tag 'for-linus-5.13-ofs-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "orangefs: implement orangefs_readahead mm/readahead.c/read_pages was quite a bit different back when I put my open-coded readahead logic into orangefs_readpage. That logic seemed to work as designed back then, it is a trainwreck now. This implements orangefs_readahead using the new xarray and readahead_expand features and removes all my open-coded readahead logic. This results in an extreme read performance improvement, these sample numbers are from my test VM: Here's an example of what's upstream in 5.11.8-200.fc33.x86_64: 30+0 records in 30+0 records out 125829120 bytes (126 MB, 120 MiB) copied, 5.77943 s, 21.8 MB/s And here's this version of orangefs_readahead on top of 5.12.0-rc4: 30+0 records in 30+0 records out 125829120 bytes (126 MB, 120 MiB) copied, 0.325919 s, 386 MB/s There are four xfstest regressions with this patch. David Howells and Matthew Wilcox have been helping me work with this code" * tag 'for-linus-5.13-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: leave files in the page cache for a few micro seconds at least Orangef: implement orangefs_readahead.
2021-05-02Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff all over the place" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: useful constants: struct qstr for ".." hostfs_open(): don't open-code file_dentry() whack-a-mole: kill strlen_user() (again) autofs: should_expire() argument is guaranteed to be positive apparmor:match_mn() - constify devpath argument buffer: a small optimization in grow_buffers get rid of autofs_getpath() constify dentry argument of dentry_path()/dentry_path_raw()
2021-05-02Merge branch 'work.ecryptfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull exryptfs updates from Al Viro: "The interesting part here is (ecryptfs) lock_parent() fixes - its treatment of ->d_parent had been very wrong. The rest is trivial cleanups" * 'work.ecryptfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ecryptfs: ecryptfs_dentry_info->crypt_stat is never used ecryptfs: get rid of unused accessors ecryptfs: saner API for lock_parent() ecryptfs: get rid of pointless dget/dput in ->symlink() and ->link()
2021-05-01Merge tag 'landlock_v34' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull Landlock LSM from James Morris: "Add Landlock, a new LSM from Mickaël Salaün. Briefly, Landlock provides for unprivileged application sandboxing. From Mickaël's cover letter: "The goal of Landlock is to enable to restrict ambient rights (e.g. global filesystem access) for a set of processes. Because Landlock is a stackable LSM [1], it makes possible to create safe security sandboxes as new security layers in addition to the existing system-wide access-controls. This kind of sandbox is expected to help mitigate the security impact of bugs or unexpected/malicious behaviors in user-space applications. Landlock empowers any process, including unprivileged ones, to securely restrict themselves. Landlock is inspired by seccomp-bpf but instead of filtering syscalls and their raw arguments, a Landlock rule can restrict the use of kernel objects like file hierarchies, according to the kernel semantic. Landlock also takes inspiration from other OS sandbox mechanisms: XNU Sandbox, FreeBSD Capsicum or OpenBSD Pledge/Unveil. In this current form, Landlock misses some access-control features. This enables to minimize this patch series and ease review. This series still addresses multiple use cases, especially with the combined use of seccomp-bpf: applications with built-in sandboxing, init systems, security sandbox tools and security-oriented APIs [2]" The cover letter and v34 posting is here: https://lore.kernel.org/linux-security-module/20210422154123.13086-1-mic@digikod.net/ See also: https://landlock.io/ This code has had extensive design discussion and review over several years" Link: https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler-ca.com/ [1] Link: https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.net/ [2] * tag 'landlock_v34' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: landlock: Enable user space to infer supported features landlock: Add user and kernel documentation samples/landlock: Add a sandbox manager example selftests/landlock: Add user space tests landlock: Add syscall implementations arch: Wire up Landlock syscalls fs,security: Add sb_delete hook landlock: Support filesystem access-control LSM: Infrastructure management of the superblock landlock: Add ptrace restrictions landlock: Set up the security framework and manage credentials landlock: Add ruleset and domain management landlock: Add object management
2021-05-01afs: Fix speculative status fetchesDavid Howells
The generic/464 xfstest causes kAFS to emit occasional warnings of the form: kAFS: vnode modified {100055:8a} 30->31 YFS.StoreData64 (c=6015) This indicates that the data version received back from the server did not match the expected value (the DV should be incremented monotonically for each individual modification op committed to a vnode). What is happening is that a lookup call is doing a bulk status fetch speculatively on a bunch of vnodes in a directory besides getting the status of the vnode it's actually interested in. This is racing with a StoreData operation (though it could also occur with, say, a MakeDir op). On the client, a modification operation locks the vnode, but the bulk status fetch only locks the parent directory, so no ordering is imposed there (thereby avoiding an avenue to deadlock). On the server, the StoreData op handler doesn't lock the vnode until it's received all the request data, and downgrades the lock after committing the data until it has finished sending change notifications to other clients - which allows the status fetch to occur before it has finished. This means that: - a status fetch can access the target vnode either side of the exclusive section of the modification - the status fetch could start before the modification, yet finish after, and vice-versa. - the status fetch and the modification RPCs can complete in either order. - the status fetch can return either the before or the after DV from the modification. - the status fetch might regress the locally cached DV. Some of these are handled by the previous fix[1], but that's not sufficient because it checks the DV it received against the DV it cached at the start of the op, but the DV might've been updated in the meantime by a locally generated modification op. Fix this by the following means: (1) Keep track of when we're performing a modification operation on a vnode. This is done by marking vnode parameters with a 'modification' note that causes the AFS_VNODE_MODIFYING flag to be set on the vnode for the duration. (2) Alter the speculation race detection to ignore speculative status fetches if either the vnode is marked as being modified or the data version number is not what we expected. Note that whilst the "vnode modified" warning does get recovered from as it causes the client to refetch the status at the next opportunity, it will also invalidate the pagecache, so changes might get lost. Fixes: a9e5c87ca744 ("afs: Fix speculative status fetch going out of order wrt to modifications") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-and-reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/160605082531.252452.14708077925602709042.stgit@warthog.procyon.org.uk/ [1] Link: https://lore.kernel.org/linux-fsdevel/161961335926.39335.2552653972195467566.stgit@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "New features for ext4 this cycle include support for encrypted casefold, ensure that deleted file names are cleared in directory blocks by zeroing directory entries when they are unlinked or moved as part of a hash tree node split. We also improve the block allocator's performance on a freshly mounted file system by prefetching block bitmaps. There are also the usual cleanups and bug fixes, including fixing a page cache invalidation race when there is mixed buffered and direct I/O and the block size is less than page size, and allow the dax flag to be set and cleared on inline directories" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (32 commits) ext4: wipe ext4_dir_entry2 upon file deletion ext4: Fix occasional generic/418 failure fs: fix reporting supported extra file attributes for statx() ext4: allow the dax flag to be set and cleared on inline directories ext4: fix debug format string warning ext4: fix trailing whitespace ext4: fix various seppling typos ext4: fix error return code in ext4_fc_perform_commit() ext4: annotate data race in jbd2_journal_dirty_metadata() ext4: annotate data race in start_this_handle() ext4: fix ext4_error_err save negative errno into superblock ext4: fix error code in ext4_commit_super ext4: always panic when errors=panic is specified ext4: delete redundant uptodate check for buffer ext4: do not set SB_ACTIVE in ext4_orphan_cleanup() ext4: make prefetch_block_bitmaps default ext4: add proc files to monitor new structures ext4: improve cr 0 / cr 1 group scanning ext4: add MB_NUM_ORDERS macro ext4: add mballoc stats proc file ...
2021-04-30Merge tag 'dlm-5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This includes more dlm networking cleanups and improvements for making dlm shutdowns more robust" * tag 'dlm-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: fs: dlm: fix missing unlock on error in accept_from_sock() fs: dlm: add shutdown hook fs: dlm: flush swork on shutdown fs: dlm: remove unaligned memory access handling fs: dlm: check on minimum msglen size fs: dlm: simplify writequeue handling fs: dlm: use GFP_ZERO for page buffer fs: dlm: change allocation limits fs: dlm: add check if dlm is currently running fs: dlm: add errno handling to check callback fs: dlm: set subclass for othercon sock_mutex fs: dlm: set connected bit after accept fs: dlm: fix mark setting deadlock fs: dlm: fix debugfs dump
2021-04-30Merge tag 'fuse-update-5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Fix a page locking bug in write (introduced in 2.6.26) - Allow sgid bit to be killed in setacl() - Miscellaneous fixes and cleanups * tag 'fuse-update-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: cuse: simplify refcount cuse: prevent clone virtiofs: fix userns virtiofs: remove useless function virtiofs: split requests that exceed virtqueue size virtiofs: fix memory leak in virtio_fs_probe() fuse: invalidate attrs when page writeback completes fuse: add a flag FUSE_SETXATTR_ACL_KILL_SGID to kill SGID fuse: extend FUSE_SETXATTR request fuse: fix matching of FUSE_DEV_IOC_CLONE command fuse: fix a typo fuse: don't zero pages twice fuse: fix typo for fuse_conn.max_pages comment fuse: fix write deadlock
2021-04-30Merge tag 'ovl-update-5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs update from Miklos Szeredi: - Fix a regression introduced in 5.2 that resulted in valid overlayfs mounts being rejected with ELOOP (Too many levels of symbolic links) - Fix bugs found by various tools - Miscellaneous improvements and cleanups * tag 'ovl-update-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: add debug print to ovl_do_getxattr() ovl: invalidate readdir cache on changes to dir with origin ovl: allow upperdir inside lowerdir ovl: show "userxattr" in the mount data ovl: trivial typo fixes in the file inode.c ovl: fix misspellings using codespell tool ovl: do not copy attr several times ovl: remove ovl_map_dev_ino() return value ovl: fix error for ovl_fill_super() ovl: fix missing revert_creds() on error path ovl: fix leaked dentry ovl: restrict lower null uuid for "xino=auto" ovl: check that upperdir path is not on a read-only mount ovl: plumb through flush method
2021-04-30Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio"Brian Geffon
This reverts commit cd544fd1dc9293c6702fab6effa63dac1cc67e99. As discussed in [1] this commit was a no-op because the mapping type was checked in vma_to_resize before move_vma is ever called. This meant that vm_ops->mremap() would never be called on such mappings. Furthermore, we've since expanded support of MREMAP_DONTUNMAP to non-anonymous mappings, and these special mappings are still protected by the existing check of !VM_DONTEXPAND and !VM_PFNMAP which will result in a -EINVAL. 1. https://lkml.org/lkml/2020/12/28/2340 Link: https://lkml.kernel.org/r/20210323182520.2712101-2-bgeffon@google.com Signed-off-by: Brian Geffon <bgeffon@google.com> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Alejandro Colomar <alx.manpages@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Michael S . Tsirkin" <mst@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30iomap: use filemap_range_needs_writeback() for O_DIRECT readsJens Axboe
For reads, use the better variant of checking for the need to call filemap_write_and_wait_range() when doing O_DIRECT. This avoids falling back to the slow path for IOCB_NOWAIT, if there are no pages to wait for (or write out). Link: https://lkml.kernel.org/r/20210224164455.1096727-4-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30vfs: fs_parser: clean up kernel-doc warningsRandy Dunlap
Fix kernel-doc notation function arguments to eliminate two kernel-doc warnings: fs_parser.c:322: warning: Excess function parameter 'name' description in 'validate_constant_table' fs_parser.c:367: warning: Function parameter or member 'name' not described in 'fs_validate_description' Link: https://lkml.kernel.org/r/20210407033743.9701-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30ocfs2/dlm: remove unused functionJiapeng Chong
Fix the following clang warning: fs/ocfs2/dlm/dlmrecovery.c:129:20: warning: unused function 'dlm_reset_recovery' [-Wunused-function]. Link: https://lkml.kernel.org/r/1618382761-5784-1-git-send-email-jiapeng.chong@linux.alibaba.com Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reported-by: Abaci Robot <abaci@linux.alibaba.com> Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30ocfs2: fix a typoBhaskar Chowdhury
s/cluter/cluster/ Link: https://lkml.kernel.org/r/20210324072931.5056-1-unixbhaskar@gmail.com Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30ocfs2: map flags directly in flags_to_o2dlm()Joseph Qi
Use macro map_flag() is tricky and coccicheck outputs the following warning: fs/ocfs2/stack_o2cb.c:69:5-16: Unneeded variable: "o2dlm_flags" So map flags directly in flags_to_o2dlm() to make coccicheck happy. And remove BUG_ON() here as well to simplify code since it runs well a long time. Link: https://lkml.kernel.org/r/1616138664-35935-1-git-send-email-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30ocfs2: replace DEFINE_SIMPLE_ATTRIBUTE with DEFINE_DEBUGFS_ATTRIBUTEYang Li
Fix the following coccicheck warning: fs/ocfs2/blockcheck.c:232:0-23: WARNING: blockcheck_fops should be defined with DEFINE_DEBUGFS_ATTRIBUTE Link: https://lkml.kernel.org/r/1614155230-57292-1-git-send-email-yang.lee@linux.alibaba.com Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reported-by: Abaci Robot <abaci@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-29Merge tag 'kbuild-v5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Evaluate $(call cc-option,...) etc. only for build targets - Add CONFIG_VMLINUX_MAP to generate .map file when linking vmlinux - Remove unnecessary --gcc-toolchains Clang flag because the --prefix flag finds the toolchains - Do not pass Clang's --prefix flag when using the integrated as - Check the assembler version in Kconfig time - Add new CONFIG options, AS_VERSION, AS_IS_GNU, AS_IS_LLVM to clean up some dependencies in Kconfig - Fix invalid Module.symvers creation when building only modules without vmlinux - Fix false-positive modpost warnings when CONFIG_TRIM_UNUSED_KSYMS is set, but there is no module to build - Refactor module installation Makefile - Support zstd for module compression - Convert alpha and ia64 to use generic shell scripts to generate the syscall headers - Add a new elfnote to indicate if the kernel was built with LTO, which will be used by pahole - Flatten the directory structure under include/config/ so CONFIG options and filenames match - Change the deb source package name from linux-$(KERNELRELEASE) to linux-upstream * tag 'kbuild-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (42 commits) kbuild: Add $(KBUILD_HOSTLDFLAGS) to 'has_libelf' test kbuild: deb-pkg: change the source package name to linux-upstream tools: do not include scripts/Kbuild.include kbuild: redo fake deps at include/config/*.h kbuild: remove TMPO from try-run MAINTAINERS: add pattern for dummy-tools kbuild: add an elfnote for whether vmlinux is built with lto ia64: syscalls: switch to generic syscallhdr.sh ia64: syscalls: switch to generic syscalltbl.sh alpha: syscalls: switch to generic syscallhdr.sh alpha: syscalls: switch to generic syscalltbl.sh sysctl: use min() helper for namecmp() kbuild: add support for zstd compressed modules kbuild: remove CONFIG_MODULE_COMPRESS kbuild: merge scripts/Makefile.modsign to scripts/Makefile.modinst kbuild: move module strip/compression code into scripts/Makefile.modinst kbuild: refactor scripts/Makefile.modinst kbuild: rename extmod-prefix to extmod_prefix kbuild: check module name conflict for external modules as well kbuild: show the target directory for depmod log ...
2021-04-29Merge tag 'net-next-5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - bpf: - allow bpf programs calling kernel functions (initially to reuse TCP congestion control implementations) - enable task local storage for tracing programs - remove the need to store per-task state in hash maps, and allow tracing programs access to task local storage previously added for BPF_LSM - add bpf_for_each_map_elem() helper, allowing programs to walk all map elements in a more robust and easier to verify fashion - sockmap: support UDP and cross-protocol BPF_SK_SKB_VERDICT redirection - lpm: add support for batched ops in LPM trie - add BTF_KIND_FLOAT support - mostly to allow use of BTF on s390 which has floats in its headers files - improve BPF syscall documentation and extend the use of kdoc parsing scripts we already employ for bpf-helpers - libbpf, bpftool: support static linking of BPF ELF files - improve support for encapsulation of L2 packets - xdp: restructure redirect actions to avoid a runtime lookup, improving performance by 4-8% in microbenchmarks - xsk: build skb by page (aka generic zerocopy xmit) - improve performance of software AF_XDP path by 33% for devices which don't need headers in the linear skb part (e.g. virtio) - nexthop: resilient next-hop groups - improve path stability on next-hops group changes (incl. offload for mlxsw) - ipv6: segment routing: add support for IPv4 decapsulation - icmp: add support for RFC 8335 extended PROBE messages - inet: use bigger hash table for IP ID generation - tcp: deal better with delayed TX completions - make sure we don't give up on fast TCP retransmissions only because driver is slow in reporting that it completed transmitting the original - tcp: reorder tcp_congestion_ops for better cache locality - mptcp: - add sockopt support for common TCP options - add support for common TCP msg flags - include multiple address ids in RM_ADDR - add reset option support for resetting one subflow - udp: GRO L4 improvements - improve 'forward' / 'frag_list' co-existence with UDP tunnel GRO, allowing the first to take place correctly even for encapsulated UDP traffic - micro-optimize dev_gro_receive() and flow dissection, avoid retpoline overhead on VLAN and TEB GRO - use less memory for sysctls, add a new sysctl type, to allow using u8 instead of "int" and "long" and shrink networking sysctls - veth: allow GRO without XDP - this allows aggregating UDP packets before handing them off to routing, bridge, OvS, etc. - allow specifing ifindex when device is moved to another namespace - netfilter: - nft_socket: add support for cgroupsv2 - nftables: add catch-all set element - special element used to define a default action in case normal lookup missed - use net_generic infra in many modules to avoid allocating per-ns memory unnecessarily - xps: improve the xps handling to avoid potential out-of-bound accesses and use-after-free when XPS change race with other re-configuration under traffic - add a config knob to turn off per-cpu netdev refcnt to catch underflows in testing Device APIs: - add WWAN subsystem to organize the WWAN interfaces better and hopefully start driving towards more unified and vendor- independent APIs - ethtool: - add interface for reading IEEE MIB stats (incl. mlx5 and bnxt support) - allow network drivers to dump arbitrary SFP EEPROM data, current offset+length API was a poor fit for modern SFP which define EEPROM in terms of pages (incl. mlx5 support) - act_police, flow_offload: add support for packet-per-second policing (incl. offload for nfp) - psample: add additional metadata attributes like transit delay for packets sampled from switch HW (and corresponding egress and policy-based sampling in the mlxsw driver) - dsa: improve support for sandwiched LAGs with bridge and DSA - netfilter: - flowtable: use direct xmit in topologies with IP forwarding, bridging, vlans etc. - nftables: counter hardware offload support - Bluetooth: - improvements for firmware download w/ Intel devices - add support for reading AOSP vendor capabilities - add support for virtio transport driver - mac80211: - allow concurrent monitor iface and ethernet rx decap - set priority and queue mapping for injected frames - phy: add support for Clause-45 PHY Loopback - pci/iov: add sysfs MSI-X vector assignment interface to distribute MSI-X resources to VFs (incl. mlx5 support) New hardware/drivers: - dsa: mv88e6xxx: add support for Marvell mv88e6393x - 11-port Ethernet switch with 8x 1-Gigabit Ethernet and 3x 10-Gigabit interfaces. - dsa: support for legacy Broadcom tags used on BCM5325, BCM5365 and BCM63xx switches - Microchip KSZ8863 and KSZ8873; 3x 10/100Mbps Ethernet switches - ath11k: support for QCN9074 a 802.11ax device - Bluetooth: Broadcom BCM4330 and BMC4334 - phy: Marvell 88X2222 transceiver support - mdio: add BCM6368 MDIO mux bus controller - r8152: support RTL8153 and RTL8156 (USB Ethernet) chips - mana: driver for Microsoft Azure Network Adapter (MANA) - Actions Semi Owl Ethernet MAC - can: driver for ETAS ES58X CAN/USB interfaces Pure driver changes: - add XDP support to: enetc, igc, stmmac - add AF_XDP support to: stmmac - virtio: - page_to_skb() use build_skb when there's sufficient tailroom (21% improvement for 1000B UDP frames) - support XDP even without dedicated Tx queues - share the Tx queues with the stack when necessary - mlx5: - flow rules: add support for mirroring with conntrack, matching on ICMP, GTP, flex filters and more - support packet sampling with flow offloads - persist uplink representor netdev across eswitch mode changes - allow coexistence of CQE compression and HW time-stamping - add ethtool extended link error state reporting - ice, iavf: support flow filters, UDP Segmentation Offload - dpaa2-switch: - move the driver out of staging - add spanning tree (STP) support - add rx copybreak support - add tc flower hardware offload on ingress traffic - ionic: - implement Rx page reuse - support HW PTP time-stamping - octeon: support TC hardware offloads - flower matching on ingress and egress ratelimitting. - stmmac: - add RX frame steering based on VLAN priority in tc flower - support frame preemption (FPE) - intel: add cross time-stamping freq difference adjustment - ocelot: - support forwarding of MRP frames in HW - support multiple bridges - support PTP Sync one-step timestamping - dsa: mv88e6xxx, dpaa2-switch: offload bridge port flags like learning, flooding etc. - ipa: add IPA v4.5, v4.9 and v4.11 support (Qualcomm SDX55, SM8350, SC7280 SoCs) - mt7601u: enable TDLS support - mt76: - add support for 802.3 rx frames (mt7915/mt7615) - mt7915 flash pre-calibration support - mt7921/mt7663 runtime power management fixes" * tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2451 commits) net: selftest: fix build issue if INET is disabled net: netrom: nr_in: Remove redundant assignment to ns net: tun: Remove redundant assignment to ret net: phy: marvell: add downshift support for M88E1240 net: dsa: ksz: Make reg_mib_cnt a u8 as it never exceeds 255 net/sched: act_ct: Remove redundant ct get and check icmp: standardize naming of RFC 8335 PROBE constants bpf, selftests: Update array map tests for per-cpu batched ops bpf: Add batched ops support for percpu array bpf: Implement formatted output helpers with bstr_printf seq_file: Add a seq_bprintf function sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues net:nfc:digital: Fix a double free in digital_tg_recv_dep_req net: fix a concurrency bug in l2tp_tunnel_register() net/smc: Remove redundant assignment to rc mpls: Remove redundant assignment to err llc2: Remove redundant assignment to rc net/tls: Remove redundant initialization of record rds: Remove redundant assignment to nr_sig dt-bindings: net: mdio-gpio: add compatible for microchip,mdio-smi0 ...
2021-04-29Merge tag 'fsnotify_for_v5.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: - support for limited fanotify functionality for unpriviledged users - faster merging of fanotify events - a few smaller fsnotify improvements * tag 'fsnotify_for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: shmem: allow reporting fanotify events with file handles on tmpfs fs: introduce a wrapper uuid_to_fsid() fanotify_user: use upper_32_bits() to verify mask fanotify: support limited functionality for unprivileged users fanotify: configurable limits via sysfs fanotify: limit number of event merge attempts fsnotify: use hash table for faster events merge fanotify: mix event info and pid into merge key hash fanotify: reduce event objectid to 29-bit hash fsnotify: allow fsnotify_{peek,remove}_first_event with empty queue
2021-04-29Merge tag 'for_v5.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota, ext2, reiserfs updates from Jan Kara: - support for path (instead of device) based quotactl syscall (quotactl_path(2)) - ext2 conversion to kmap_local() - other minor cleanups & fixes * tag 'for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fs/reiserfs/journal.c: delete useless variables fs/ext2: Replace kmap() with kmap_local_page() ext2: Match up ext2_put_page() with ext2_dotdot() and ext2_find_entry() fs/ext2/: fix misspellings using codespell tool quota: report warning limits for realtime space quotas quota: wire up quotactl_path quota: Add mountpath based quota support
2021-04-29Merge tag 'xfs-5.13-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Darrick Wong: "The notable user-visible addition this cycle is ability to remove space from the last AG in a filesystem. This is the first of many changes needed for full-fledged support for shrinking a filesystem. Still needed are (a) the ability to reorganize files and metadata away from the end of the fs; (b) the ability to remove entire allocation groups; (c) shrink support for realtime volumes; and (d) thorough testing of (a-c). There are a number of performance improvements in this code drop: Dave streamlined various parts of the buffer logging code and reduced the cost of various debugging checks, and added the ability to pre-create the xattr structures while creating files. Brian eliminated transaction reservations that were being held across writeback (thus reducing livelock potential. Other random pieces: Pavel fixed the repetitve warnings about deprecated mount options, I fixed online fsck to behave itself when a readonly remount comes in during scrub, and refactored various other parts of that code, Christoph contributed a lot of refactoring this cycle. The xfs_icdinode structure has been absorbed into the (incore) xfs_inode structure, and the format and flags handling around xfs_inode_fork structures has been simplified. Chandan provided a number of fixes for extent count overflow related problems that have been shaken out by debugging knobs added during 5.12. Summary: - Various minor fixes in online scrub. - Prevent metadata files from being automatically inactivated. - Validate btree heights by the computed per-btree limits. - Don't warn about remounting with deprecated mount options. - Initialize attr forks at create time if we suspect we're going to need to store them. - Reduce memory reallocation workouts in the logging code. - Fix some theoretical math calculation errors in logged buffers that span multiple discontig memory ranges but contiguous ondisk regions. - Speedups in dirty buffer bitmap handling. - Make type verifier functions more inline-happy to reduce overhead. - Reduce debug overhead in directory checking code. - Many many typo fixes. - Begin to handle the permanent loss of the very end of a filesystem. - Fold struct xfs_icdinode into xfs_inode. - Deprecate the long defunct BMV_IF_NO_DMAPI_READ from the bmapx ioctl. - Remove a broken directory block format check from online scrub. - Fix a bug where we could produce an unnecessarily tall data fork btree when creating an attr fork. - Fix scrub and readonly remounts racing. - Fix a writeback ioend log deadlock problem by dropping the behavior where we could preallocate a setfilesize transaction. - Fix some bugs in the new extent count checking code. - Fix some bugs in the attr fork preallocation code. - Refactor if_flags out of the incore inode fork data structure" * tag 'xfs-5.13-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (77 commits) xfs: remove xfs_quiesce_attr declaration xfs: remove XFS_IFEXTENTS xfs: remove XFS_IFINLINE xfs: remove XFS_IFBROOT xfs: only look at the fork format in xfs_idestroy_fork xfs: simplify xfs_attr_remove_args xfs: rename and simplify xfs_bmap_one_block xfs: move the XFS_IFEXTENTS check into xfs_iread_extents xfs: drop unnecessary setfilesize helper xfs: drop unused ioend private merge and setfilesize code xfs: open code ioend needs workqueue helper xfs: drop submit side trans alloc for append ioends xfs: fix return of uninitialized value in variable error xfs: get rid of the ip parameter to xchk_setup_* xfs: fix scrub and remount-ro protection when running scrub xfs: move the check for post-EOF mappings into xfs_can_free_eofblocks xfs: move the xfs_can_free_eofblocks call under the IOLOCK xfs: precalculate default inode attribute offset xfs: default attr fork size does not handle device inodes xfs: inode fork allocation depends on XFS_IFEXTENT flag ...
2021-04-29Merge tag 'gfs2-for-5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Fix some compiler and kernel-doc warnings - Various minor cleanups and optimizations - Add a new sysfs gfs2 status file with some filesystem wide information * tag 'gfs2-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix fall-through warnings for Clang gfs2: Fix a number of kernel-doc warnings gfs2: Make gfs2_setattr_simple static gfs2: Add new sysfs file for gfs2 status gfs2: Silence possible null pointer dereference warning gfs2: Turn gfs2_meta_indirect_buffer into gfs2_meta_buffer gfs2: Replace gfs2_lblk_to_dblk with gfs2_get_extent gfs2: Turn gfs2_extent_map into gfs2_{get,alloc}_extent gfs2: Add new gfs2_iomap_get helper gfs2: Remove unused variable sb_format gfs2: Fix dir.c function parameter descriptions gfs2: Eliminate gh parameter from go_xmote_bh func gfs2: don't create empty buffers for NO_CREATE
2021-04-29Merge tag 'exfat-for-5.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Improve write performance with dirsync mount option - Improve lookup performance - Add support for FITRIM ioctl - Fix a bug with discard option * tag 'exfat-for-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: speed up iterate/lookup by fixing start point of traversing cluster chain exfat: improve write performance when dirsync enabled exfat: add support ioctl and FITRIM function exfat: introduce bitmap_lock for cluster bitmap access exfat: fix erroneous discard when clear cluster bit
2021-04-29orangefs: leave files in the page cache for a few micro seconds at leastMike Marshall
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2021-04-28Merge tag 'for-5.13/io_uring-2021-04-27' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring updates from Jens Axboe: - Support for multi-shot mode for POLL requests - More efficient reference counting. This is shamelessly stolen from the mm side. Even though referencing is mostly single/dual user, the 128 count was retained to keep the code the same. Maybe this should/could be made generic at some point. - Removal of the need to have a manager thread for each ring. The manager threads only job was checking and creating new io-threads as needed, instead we handle this from the queue path. - Allow SQPOLL without CAP_SYS_ADMIN or CAP_SYS_NICE. Since 5.12, this thread is "just" a regular application thread, so no need to restrict use of it anymore. - Cleanup of how internal async poll data lifetime is managed. - Fix for syzbot reported crash on SQPOLL cancelation. - Make buffer registration more like file registrations, which includes flexibility in avoiding full set unregistration and re-registration. - Fix for io-wq affinity setting. - Be a bit more defensive in task->pf_io_worker setup. - Various SQPOLL fixes. - Cleanup of SQPOLL creds handling. - Improvements to in-flight request tracking. - File registration cleanups. - Tons of cleanups and little fixes * tag 'for-5.13/io_uring-2021-04-27' of git://git.kernel.dk/linux-block: (156 commits) io_uring: maintain drain logic for multishot poll requests io_uring: Check current->io_uring in io_uring_cancel_sqpoll io_uring: fix NULL reg-buffer io_uring: simplify SQPOLL cancellations io_uring: fix work_exit sqpoll cancellations io_uring: Fix uninitialized variable up.resv io_uring: fix invalid error check after malloc io_uring: io_sq_thread() no longer needs to reset current->pf_io_worker kernel: always initialize task->pf_io_worker to NULL io_uring: update sq_thread_idle after ctx deleted io_uring: add full-fledged dynamic buffers support io_uring: implement fixed buffers registration similar to fixed files io_uring: prepare fixed rw for dynanic buffers io_uring: keep table of pointers to ubufs io_uring: add generic rsrc update with tags io_uring: add IORING_REGISTER_RSRC io_uring: enumerate dynamic resources io_uring: add generic path for rsrc update io_uring: preparation for rsrc tagging io_uring: decouple CQE filling from requests ...
2021-04-28Merge tag 'for-5.13/drivers-2021-04-27' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: - MD changes via Song: - raid5 POWER fix - raid1 failure fix - UAF fix for md cluster - mddev_find_or_alloc() clean up - Fix NULL pointer deref with external bitmap - Performance improvement for raid10 discard requests - Fix missing information of /proc/mdstat - rsxx const qualifier removal (Arnd) - Expose allocated brd pages (Calvin) - rnbd via Gioh Kim: - Change maintainer - Change domain address of maintainers' email - Add polling IO mode and document update - Fix memory leak and some bug detected by static code analysis tools - Code refactoring - Series of floppy cleanups/fixes (Denis) - s390 dasd fixes (Julian) - kerneldoc fixes (Lee) - null_blk double free (Lv) - null_blk virtual boundary addition (Max) - Remove xsysace driver (Michal) - umem driver removal (Davidlohr) - ataflop fixes (Dan) - Revalidate disk removal (Christoph) - Bounce buffer cleanups (Christoph) - Mark lightnvm as deprecated (Christoph) - mtip32xx init cleanups (Shixin) - Various fixes (Tian, Gustavo, Coly, Yang, Zhang, Zhiqiang) * tag 'for-5.13/drivers-2021-04-27' of git://git.kernel.dk/linux-block: (143 commits) async_xor: increase src_offs when dropping destination page drivers/block/null_blk/main: Fix a double free in null_init. md/raid1: properly indicate failure when ending a failed write request md-cluster: fix use-after-free issue when removing rdev nvme: introduce generic per-namespace chardev nvme: cleanup nvme_configure_apst nvme: do not try to reconfigure APST when the controller is not live nvme: add 'kato' sysfs attribute nvme: sanitize KATO setting nvmet: avoid queuing keep-alive timer if it is disabled brd: expose number of allocated pages in debugfs ataflop: fix off by one in ataflop_probe() ataflop: potential out of bounds in do_format() drbd: Fix fall-through warnings for Clang block/rnbd: Use strscpy instead of strlcpy block/rnbd-clt-sysfs: Remove copy buffer overlap in rnbd_clt_get_path_name block/rnbd-clt: Remove max_segment_size block/rnbd-clt: Generate kobject_uevent when the rnbd device state changes block/rnbd-srv: Remove unused arguments of rnbd_srv_rdma_ev Documentation/ABI/rnbd-clt: Add description for nr_poll_queues ...
2021-04-28Merge tag 'for-5.13/block-2021-04-27' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: "Pretty quiet round this time, which is nice. In detail: - Series revamping bounce buffer support (Christoph) - Dead code removal (Christoph, Bart) - Partition iteration revamp, now using xarray (Christoph) - Passthrough request scheduler improvements (Lin) - Series of BFQ improvements (Paolo) - Fix ioprio task iteration (Peter) - Various little tweaks and fixes (Tejun, Saravanan, Bhaskar, Max, Nikolay)" * tag 'for-5.13/block-2021-04-27' of git://git.kernel.dk/linux-block: (41 commits) blk-iocost: don't ignore vrate_min on QD contention blk-mq: Fix spurious debugfs directory creation during initialization bfq/mq-deadline: remove redundant check for passthrough request blk-mq: bypass IO scheduler's limit_depth for passthrough request block: Remove an obsolete comment from sg_io() block: move bio_list_copy_data to pktcdvd block: remove zero_fill_bio_iter block: add queue_to_disk() to get gendisk from request_queue block: remove an incorrect check from blk_rq_append_bio block: initialize ret in bdev_disk_changed block: Fix sys_ioprio_set(.which=IOPRIO_WHO_PGRP) task iteration block: remove disk_part_iter block: simplify diskstats_show block: simplify show_partition block: simplify printk_all_partitions block: simplify partition_overlaps block: simplify partition removal block: take bd_mutex around delete_partitions in del_gendisk block: refactor blk_drop_partitions block: move more syncing and invalidation to delete_partition ...
2021-04-28Merge tag 'sched-core-2021-04-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Clean up SCHED_DEBUG: move the decades old mess of sysctl, procfs and debugfs interfaces to a unified debugfs interface. - Signals: Allow caching one sigqueue object per task, to improve performance & latencies. - Improve newidle_balance() irq-off latencies on systems with a large number of CPU cgroups. - Improve energy-aware scheduling - Improve the PELT metrics for certain workloads - Reintroduce select_idle_smt() to improve load-balancing locality - but without the previous regressions - Add 'scheduler latency debugging': warn after long periods of pending need_resched. This is an opt-in feature that requires the enabling of the LATENCY_WARN scheduler feature, or the use of the resched_latency_warn_ms=xx boot parameter. - CPU hotplug fixes for HP-rollback, and for the 'fail' interface. Fix remaining balance_push() vs. hotplug holes/races - PSI fixes, plus allow /proc/pressure/ files to be written by CAP_SYS_RESOURCE tasks as well - Fix/improve various load-balancing corner cases vs. capacity margins - Fix sched topology on systems with NUMA diameter of 3 or above - Fix PF_KTHREAD vs to_kthread() race - Minor rseq optimizations - Misc cleanups, optimizations, fixes and smaller updates * tag 'sched-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits) cpumask/hotplug: Fix cpu_dying() state tracking kthread: Fix PF_KTHREAD vs to_kthread() race sched/debug: Fix cgroup_path[] serialization sched,psi: Handle potential task count underflow bugs more gracefully sched: Warn on long periods of pending need_resched sched/fair: Move update_nohz_stats() to the CONFIG_NO_HZ_COMMON block to simplify the code & fix an unused function warning sched/debug: Rename the sched_debug parameter to sched_verbose sched,fair: Alternative sched_slice() sched: Move /proc/sched_debug to debugfs sched,debug: Convert sysctl sched_domains to debugfs debugfs: Implement debugfs_create_str() sched,preempt: Move preempt_dynamic to debug.c sched: Move SCHED_DEBUG sysctl to debugfs sched: Don't make LATENCYTOP select SCHED_DEBUG sched: Remove sched_schedstats sysctl out from under SCHED_DEBUG sched/numa: Allow runtime enabling/disabling of NUMA balance without SCHED_DEBUG sched: Use cpu_dying() to fix balance_push vs hotplug-rollback cpumask: Introduce DYING mask cpumask: Make cpu_{online,possible,present,active}() inline rseq: Optimise rseq_get_rseq_cs() and clear_rseq_cs() ...
2021-04-28Merge tag 'perf-core-2021-04-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event updates from Ingo Molnar: - Improve Intel uncore PMU support: - Parse uncore 'discovery tables' - a new hardware capability enumeration method introduced on the latest Intel platforms. This table is in a well-defined PCI namespace location and is read via MMIO. It is organized in an rbtree. These uncore tables will allow the discovery of standard counter blocks, but fancier counters still need to be enumerated explicitly. - Add Alder Lake support - Improve IIO stacks to PMON mapping support on Skylake servers - Add Intel Alder Lake PMU support - which requires the introduction of 'hybrid' CPUs and PMUs. Alder Lake is a mix of Golden Cove ('big') and Gracemont ('small' - Atom derived) cores. The CPU-side feature set is entirely symmetrical - but on the PMU side there's core type dependent PMU functionality. - Reduce data loss with CPU level hardware tracing on Intel PT / AUX profiling, by fixing the AUX allocation watermark logic. - Improve ring buffer allocation on NUMA systems - Put 'struct perf_event' into their separate kmem_cache pool - Add support for synchronous signals for select perf events. The immediate motivation is to support low-overhead sampling-based race detection for user-space code. The feature consists of the following main changes: - Add thread-only event inheritance via perf_event_attr::inherit_thread, which limits inheritance of events to CLONE_THREAD. - Add the ability for events to not leak through exec(), via perf_event_attr::remove_on_exec. - Allow the generation of SIGTRAP via perf_event_attr::sigtrap, extend siginfo with an u64 ::si_perf, and add the breakpoint information to ::si_addr and ::si_perf if the event is PERF_TYPE_BREAKPOINT. The siginfo support is adequate for breakpoints right now - but the new field can be used to introduce support for other types of metadata passed over siginfo as well. - Misc fixes, cleanups and smaller updates. * tag 'perf-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) signal, perf: Add missing TRAP_PERF case in siginfo_layout() signal, perf: Fix siginfo_t by avoiding u64 on 32-bit architectures perf/x86: Allow for 8<num_fixed_counters<16 perf/x86/rapl: Add support for Intel Alder Lake perf/x86/cstate: Add Alder Lake CPU support perf/x86/msr: Add Alder Lake CPU support perf/x86/intel/uncore: Add Alder Lake support perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE perf/x86/intel: Add Alder Lake Hybrid support perf/x86: Support filter_match callback perf/x86/intel: Add attr_update for Hybrid PMUs perf/x86: Add structures for the attributes of Hybrid PMUs perf/x86: Register hybrid PMUs perf/x86: Factor out x86_pmu_show_pmu_cap perf/x86: Remove temporary pmu assignment in event_init perf/x86/intel: Factor out intel_pmu_check_extra_regs perf/x86/intel: Factor out intel_pmu_check_event_constraints perf/x86/intel: Factor out intel_pmu_check_num_counters perf/x86: Hybrid PMU support for extra_regs perf/x86: Hybrid PMU support for event constraints ...
2021-04-28Orangef: implement orangefs_readahead.Mike Marshall
Also remove open-coded readahead logic from orangefs_readpage. Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2021-04-27Merge tag 'printk-for-5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Stop synchronizing kernel log buffer readers by logbuf_lock. As a result, the access to the buffer is fully lockless now. Note that printk() itself still uses locks because it tries to flush the messages to the console immediately. Also the per-CPU temporary buffers are still there because they prevent infinite recursion and serialize backtraces from NMI. All this is going to change in the future. - kmsg_dump API rework and cleanup as a side effect of the logbuf_lock removal. - Make bstr_printf() aware that %pf and %pF formats could deference the given pointer. - Show also page flags by %pGp format. - Clarify the documentation for plain pointer printing. - Do not show no_hash_pointers warning multiple times. - Update Senozhatsky email address. - Some clean up. * tag 'printk-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (24 commits) lib/vsprintf.c: remove leftover 'f' and 'F' cases from bstr_printf() printk: clarify the documentation for plain pointer printing kernel/printk.c: Fixed mundane typos printk: rename vprintk_func to vprintk vsprintf: dump full information of page flags in pGp mm, slub: don't combine pr_err with INFO mm, slub: use pGp to print page flags MAINTAINERS: update Senozhatsky email address lib/vsprintf: do not show no_hash_pointers message multiple times printk: console: remove unnecessary safe buffer usage printk: kmsg_dump: remove _nolock() variants printk: remove logbuf_lock printk: introduce a kmsg_dump iterator printk: kmsg_dumper: remove @active field printk: add syslog_lock printk: use atomic64_t for devkmsg_user.seq printk: use seqcount_latch for clear_seq printk: introduce CONSOLE_LOG_MAX printk: consolidate kmsg_dump_get_buffer/syslog_print_all code printk: refactor kmsg_dump_get_buffer() ...
2021-04-27seq_file: Add a seq_bprintf functionFlorent Revest
Similarly to seq_buf_bprintf in lib/seq_buf.c, this function writes a printf formatted string with arguments provided in a "binary representation" built by functions such as vbin_printf. Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210427174313.860948-2-revest@chromium.org
2021-04-27Merge tag 'selinux-pr-20210426' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: - Add support for measuring the SELinux state and policy capabilities using IMA. - A handful of SELinux/NFS patches to compare the SELinux state of one mount with a set of mount options. Olga goes into more detail in the patch descriptions, but this is important as it allows more flexibility when using NFS and SELinux context mounts. - Properly differentiate between the subjective and objective LSM credentials; including support for the SELinux and Smack. My clumsy attempt at a proper fix for AppArmor didn't quite pass muster so John is working on a proper AppArmor patch, in the meantime this set of patches shouldn't change the behavior of AppArmor in any way. This change explains the bulk of the diffstat beyond security/. - Fix a problem where we were not properly terminating the permission list for two SELinux object classes. * tag 'selinux-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: add proper NULL termination to the secclass_map permissions smack: differentiate between subjective and objective task credentials selinux: clarify task subjective and objective credentials lsm: separate security_task_getsecid() into subjective and objective variants nfs: account for selinux security context when deciding to share superblock nfs: remove unneeded null check in nfs_fill_super() lsm,selinux: add new hook to compare new mount to an existing mount selinux: fix misspellings using codespell tool selinux: fix misspellings using codespell tool selinux: measure state and policy capabilities selinux: Allow context mounts for unpriviliged overlayfs
2021-04-27Merge tag 'afs-netfs-lib-20210426' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS updates from David Howells: "Use the new netfs lib. Begin the process of overhauling the use of the fscache API by AFS and the introduction of support for features such as Transparent Huge Pages (THPs). - Add some support for THPs, including using core VM helper functions to find details of pages. - Use the ITER_XARRAY I/O iterator to mediate access to the pagecache as this handles THPs and doesn't require allocation of large bvec arrays. - Delegate address_space read/pre-write I/O methods for AFS to the netfs helper library. A method is provided to the library that allows it to issue a read against the server. This includes a change in use for PG_fscache (it now indicates a DIO write in progress from the marked page), so a number of waits need to be deployed for it. - Split the core AFS writeback function to make it easier to modify in future patches to handle writing to the cache. [This might feasibly make more sense moved out into my fscache-iter branch]. I've tested these with "xfstests -g quick" against an AFS volume (xfstests needs patching to make it work). With this, AFS without a cache passes all expected xfstests; with a cache, there's an extra failure, but that's also there before these patches. Fixing that probably requires a greater overhaul (as can be found on my fscache-iter branch, but that's for a later time). Thanks should go to Marc Dionne and Jeff Altman of AuriStor for exercising the patches in their test farm also" Link: https://lore.kernel.org/lkml/3785063.1619482429@warthog.procyon.org.uk/ * tag 'afs-netfs-lib-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Use the netfs_write_begin() helper afs: Use new netfs lib read helper API afs: Use the fs operation ops to handle FetchData completion afs: Prepare for use of THPs afs: Extract writeback extension into its own function afs: Wait on PG_fscache before modifying/releasing a page afs: Use ITER_XARRAY for writing afs: Set up the iov_iter before calling afs_extract_data() afs: Log remote unmarshalling errors afs: Don't truncate iter during data fetch afs: Move key to afs_read struct afs: Print the operation debug_id when logging an unexpected data version afs: Pass page into dirty region helpers to provide THP size afs: Disable use of the fscache I/O routines
2021-04-27Merge tag 'netfs-lib-20210426' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull network filesystem helper library updates from David Howells: "Here's a set of patches for 5.13 to begin the process of overhauling the local caching API for network filesystems. This set consists of two parts: (1) Add a helper library to handle the new VM readahead interface. This is intended to be used unconditionally by the filesystem (whether or not caching is enabled) and provides a common framework for doing caching, transparent huge pages and, in the future, possibly fscrypt and read bandwidth maximisation. It also allows the netfs and the cache to align, expand and slice up a read request from the VM in various ways; the netfs need only provide a function to read a stretch of data to the pagecache and the helper takes care of the rest. (2) Add an alternative fscache/cachfiles I/O API that uses the kiocb facility to do async DIO to transfer data to/from the netfs's pages, rather than using readpage with wait queue snooping on one side and vfs_write() on the other. It also uses less memory, since it doesn't do buffered I/O on the backing file. Note that this uses SEEK_HOLE/SEEK_DATA to locate the data available to be read from the cache. Whilst this is an improvement from the bmap interface, it still has a problem with regard to a modern extent-based filesystem inserting or removing bridging blocks of zeros. Fixing that requires a much greater overhaul. This is a step towards overhauling the fscache API. The change is opt-in on the part of the network filesystem. A netfs should not try to mix the old and the new API because of conflicting ways of handling pages and the PG_fscache page flag and because it would be mixing DIO with buffered I/O. Further, the helper library can't be used with the old API. This does not change any of the fscache cookie handling APIs or the way invalidation is done at this time. In the near term, I intend to deprecate and remove the old I/O API (fscache_allocate_page{,s}(), fscache_read_or_alloc_page{,s}(), fscache_write_page() and fscache_uncache_page()) and eventually replace most of fscache/cachefiles with something simpler and easier to follow. This patchset contains the following parts: - Some helper patches, including provision of an ITER_XARRAY iov iterator and a function to do readahead expansion. - Patches to add the netfs helper library. - A patch to add the fscache/cachefiles kiocb API. - A pair of patches to fix some review issues in the ITER_XARRAY and read helpers as spotted by Al and Willy. Jeff Layton has patches to add support in Ceph for this that he intends for this merge window. I have a set of patches to support AFS that I will post a separate pull request for. With this, AFS without a cache passes all expected xfstests; with a cache, there's an extra failure, but that's also there before these patches. Fixing that probably requires a greater overhaul. Ceph also passes the expected tests. I also have patches in a separate branch to tidy up the handling of PG_fscache/PG_private_2 and their contribution to page refcounting in the core kernel here, but I haven't included them in this set and will route them separately" Link: https://lore.kernel.org/lkml/3779937.1619478404@warthog.procyon.org.uk/ * tag 'netfs-lib-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: netfs: Miscellaneous fixes iov_iter: Four fixes for ITER_XARRAY fscache, cachefiles: Add alternate API to use kiocb for read/write to cache netfs: Add a tracepoint to log failures that would be otherwise unseen netfs: Define an interface to talk to a cache netfs: Add write_begin helper netfs: Gather stats netfs: Add tracepoints netfs: Provide readahead and readpage netfs helpers netfs, mm: Add set/end/wait_on_page_fscache() aliases netfs, mm: Move PG_fscache helper funcs to linux/netfs.h netfs: Documentation for helper library netfs: Make a netfs helper module mm: Implement readahead_control pageset expansion mm/readahead: Handle ractl nr_pages being modified fs: Document file_ra_state mm/filemap: Pass the file_ra_state in the ractl mm: Add set/end/wait functions for PG_private_2 iov_iter: Add ITER_XARRAY
2021-04-27Merge tag 'fs.idmapped.helpers.v5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull fs mapping helper updates from Christian Brauner: "This adds kernel-doc to all new idmapping helpers and improves their naming which was triggered by a discussion with some fs developers. Some of the names are based on suggestions by Vivek and Al. Also remove the open-coded permission checking in a few places with simple helpers. Overall this should lead to more clarity and make it easier to maintain" * tag 'fs.idmapped.helpers.v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: fs: introduce two inode i_{u,g}id initialization helpers fs: introduce fsuidgid_has_mapping() helper fs: document and rename fsid helpers fs: document mapping helpers
2021-04-27Merge tag 'fs.idmapped.docs.v5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull fs helper kernel-doc updates from Christian Brauner: "In the last cycles we forgot to update the kernel-docs in some places that were changed during the idmapped mount work. Lukas and Randy took the chance to not just fixup those places but also fixup and expand kernel-docs for some additional helpers. No functional changes" * tag 'fs.idmapped.docs.v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: fs: update kernel-doc for vfs_rename() fs: turn some comments into kernel-doc xattr: fix kernel-doc for mnt_userns and vfs xattr helpers namei: fix kernel-doc for struct renamedata and more libfs: fix kernel-doc for mnt_userns
2021-04-27Merge tag 'iomap-5.13-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull iomap update from Darrick Wong: "A single patch to the iomap code, which augments what gets logged when someone tries to swapon an unacceptable swap file. (Yes, this is a continuation of the swapfile drama from last season...)" * tag 'iomap-5.13-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: improve the warnings from iomap_swapfile_activate
2021-04-27Merge branch 'miklos.fileattr' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fileattr conversion updates from Miklos Szeredi via Al Viro: "This splits the handling of FS_IOC_[GS]ETFLAGS from ->ioctl() into a separate method. The interface is reasonably uniform across the filesystems that support it and gives nice boilerplate removal" * 'miklos.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits) ovl: remove unneeded ioctls fuse: convert to fileattr fuse: add internal open/release helpers fuse: unsigned open flags fuse: move ioctl to separate source file vfs: remove unused ioctl helpers ubifs: convert to fileattr reiserfs: convert to fileattr ocfs2: convert to fileattr nilfs2: convert to fileattr jfs: convert to fileattr hfsplus: convert to fileattr efivars: convert to fileattr xfs: convert to fileattr orangefs: convert to fileattr gfs2: convert to fileattr f2fs: convert to fileattr ext4: convert to fileattr ext2: convert to fileattr btrfs: convert to fileattr ...
2021-04-27Merge branch 'work.coredump' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull coredump updates from Al Viro: "Just a couple of patches this cycle: use of seek + write instead of expanding truncate and minor header cleanup" * 'work.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: coredump.h: move CONFIG_COREDUMP-only stuff inside the ifdef coredump: don't bother with do_truncate()
2021-04-27Merge branch 'work.inode-type-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs inode type handling updates from Al Viro: "We should never change the type bits of ->i_mode or the method tables (->i_op and ->i_fop) of a live inode. Unfortunately, not all filesystems took care to prevent that" * 'work.inode-type-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: spufs: fix bogosity in S_ISGID handling 9p: missing chunk of "fs/9p: Don't update file type when updating file attributes" openpromfs: don't do unlock_new_inode() until the new inode is set up hostfs_mknod(): don't bother with init_special_inode() cifs: have cifs_fattr_to_inode() refuse to change type on live inode cifs: have ->mkdir() handle race with another client sanely do_cifs_create(): don't set ->i_mode of something we had not created gfs2: be careful with inode refresh ocfs2_inode_lock_update(): make sure we don't change the type bits of i_mode orangefs_inode_is_stale(): i_mode type bits do *not* form a bitmap... vboxsf: don't allow to change the inode type afs: Fix updating of i_mode due to 3rd party change ceph: don't allow type or device number to change on non-I_NEW inodes ceph: fix up error handling with snapdirs new helper: inode_wrong_type()
2021-04-27Merge tag 'cfi-v5.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull CFI on arm64 support from Kees Cook: "This builds on last cycle's LTO work, and allows the arm64 kernels to be built with Clang's Control Flow Integrity feature. This feature has happily lived in Android kernels for almost 3 years[1], so I'm excited to have it ready for upstream. The wide diffstat is mainly due to the treewide fixing of mismatched list_sort prototypes. Other things in core kernel are to address various CFI corner cases. The largest code portion is the CFI runtime implementation itself (which will be shared by all architectures implementing support for CFI). The arm64 pieces are Acked by arm64 maintainers rather than coming through the arm64 tree since carrying this tree over there was going to be awkward. CFI support for x86 is still under development, but is pretty close. There are a handful of corner cases on x86 that need some improvements to Clang and objtool, but otherwise works well. Summary: - Clean up list_sort prototypes (Sami Tolvanen) - Introduce CONFIG_CFI_CLANG for arm64 (Sami Tolvanen)" * tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: arm64: allow CONFIG_CFI_CLANG to be selected KVM: arm64: Disable CFI for nVHE arm64: ftrace: use function_nocfi for ftrace_call arm64: add __nocfi to __apply_alternatives arm64: add __nocfi to functions that jump to a physical address arm64: use function_nocfi with __pa_symbol arm64: implement function_nocfi psci: use function_nocfi for cpu_resume lkdtm: use function_nocfi treewide: Change list_sort to use const pointers bpf: disable CFI in dispatcher functions kallsyms: strip ThinLTO hashes from static functions kthread: use WARN_ON_FUNCTION_MISMATCH workqueue: use WARN_ON_FUNCTION_MISMATCH module: ensure __cfi_check alignment mm: add generic function_nocfi macro cfi: add __cficanonical add support for Clang CFI
2021-04-27Merge tag 'pstore-v5.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore update from Kees Cook: - Add mem_type property to expand support for >2 memory types (Mukesh Ojha) * tag 'pstore-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore: Add mem_type property DT parsing support
2021-04-27Merge tag 'seccomp-v5.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: - Fix "cacheable" typo in comments (Cui GaoSheng) - Fix CONFIG for /proc/$pid/status Seccomp_filters (Kenta.Tada@sony.com) * tag 'seccomp-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: seccomp: Fix "cacheable" typo in comments seccomp: Fix CONFIG tests for Seccomp_filters
2021-04-27io_uring: maintain drain logic for multishot poll requestsHao Xu
Now that we have multishot poll requests, one SQE can emit multiple CQEs. given below example: sqe0(multishot poll)-->sqe1-->sqe2(drain req) sqe2 is designed to issue after sqe0 and sqe1 completed, but since sqe0 is a multishot poll request, sqe2 may be issued after sqe0's event triggered twice before sqe1 completed. This isn't what users leverage drain requests for. Here the solution is to wait for multishot poll requests fully completed. To achieve this, we should reconsider the req_need_defer equation, the original one is: all_sqes(excluding dropped ones) == all_cqes(including dropped ones) This means we issue a drain request when all the previous submitted SQEs have generated their CQEs. Now we should consider multishot requests, we deduct all the multishot CQEs except the cancellation one, In this way a multishot poll request behave like a normal request, so: all_sqes == all_cqes - multishot_cqes(except cancellations) Here we introduce cq_extra for it. Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/1618298439-136286-1-git-send-email-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>