summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2021-05-25fs: dlm: add midcomms debugfs functionalityAlexander Aring
This patch adds functionality to debug midcomms per connection state inside a comms directory which is similar like dlm configfs. Currently there exists the possibility to read out two attributes which is the send queue counter and the version of each midcomms node state. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25fs: dlm: add reliable connection if reconnectAlexander Aring
This patch introduce to make a tcp lowcomms connection reliable even if reconnects occurs. This is done by an application layer re-transmission handling and sequence numbers in dlm protocols. There are three new dlm commands: DLM_OPTS: This will encapsulate an existing dlm message (and rcom message if they don't have an own application side re-transmission handling). As optional handling additional tlv's (type length fields) can be appended. This can be for example a sequence number field. However because in DLM_OPTS the lockspace field is unused and a sequence number is a mandatory field it isn't made as a tlv and we put the sequence number inside the lockspace id. The possibility to add optional options are still there for future purposes. DLM_ACK: Just a dlm header to acknowledge the receive of a DLM_OPTS message to it's sender. DLM_FIN: This provides a 4 way handshake for connection termination inclusive support for half-closed connections. It's provided on application layer because SCTP doesn't support half-closed sockets, the shutdown() call can interrupted by e.g. TCP resets itself and a hard logic to implement it because the othercon paradigm in lowcomms. The 4-way termination handshake also solve problems to synchronize peer EOF arrival and that the cluster manager removes the peer in the node membership handling of DLM. In some cases messages can be still transmitted in this time and we need to wait for the node membership event. To provide a reliable connection the node will retransmit all unacknowledges message to it's peer on reconnect. The receiver will then filtering out the next received message and drop all messages which are duplicates. As RCOM_STATUS and RCOM_NAMES messages are the first messages which are exchanged and they have they own re-transmission handling, there exists logic that these messages must be first. If these messages arrives we store the dlm version field. This handling is on DLM 3.1 and after this patch 3.2 the same. A backwards compatibility handling has been added which seems to work on tests without tcpkill, however it's not recommended to use DLM 3.1 and 3.2 at the same time, because DLM 3.2 tries to fix long term bugs in the DLM protocol. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25fs: dlm: add union in dlm header for lockspace idAlexander Aring
This patch adds union inside the lockspace id to handle it also for another use case for a different dlm command. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25fs: dlm: move out some hash functionalityAlexander Aring
This patch moves out some lowcomms hash functionality into lowcomms header to provide them to other layers like midcomms as well. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25fs: dlm: add functionality to re-transmit a messageAlexander Aring
This patch introduces a retransmit functionality for a lowcomms message handle. It's just allocates a new buffer and transmit it again, no special handling about prioritize it because keeping bytestream in order. To avoid another connection look some refactor was done to make a new buffer allocation with a preexisting connection pointer. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25fs: dlm: make buffer handling per msgAlexander Aring
This patch makes the void pointer handle for lowcomms functionality per message and not per page allocation entry. A refcount handling for the handle was added to keep the message alive until the user doesn't need it anymore. There exists now a per message callback which will be called when allocating a new buffer. This callback will be guaranteed to be called according the order of the sending buffer, which can be used that the caller increments a sequence number for the dlm message handle. For transition process we cast the dlm_mhandle to dlm_msg and vice versa until the midcomms layer will implement a specific dlm_mhandle structure. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25fs: dlm: add more midcomms hooksAlexander Aring
This patch prepares hooks to redirect to the midcomms layer which will be used by the midcomms re-transmit handling. There exists the new concept of stateless buffers allocation and commits. This can be used to bypass the midcomms re-transmit handling. It is used by RCOM_STATUS and RCOM_NAMES messages, because they have their own ping-like re-transmit handling. As well these two messages will be used to determine the DLM version per node, because these two messages are per observation the first messages which are exchanged. Cluster manager events for node membership are added to add support for half-closed connections in cases that the peer connection get to an end of file but DLM still holds membership of the node. In this time DLM can still trigger new message which we should allow. After the cluster manager node removal event occurs it safe to close the connection. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25fs: dlm: public header in out utilityAlexander Aring
This patch allows to use header_out() and header_in() outside of dlm util functionality. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25fs: dlm: fix connection tcp EOF handlingAlexander Aring
This patch fixes the EOF handling for TCP that if and EOF is received we will close the socket next time the writequeue runs empty. This is a half-closed socket functionality which doesn't exists in SCTP. The midcomms layer will do a half closed socket functionality on DLM side to solve this problem for the SCTP case. However there is still the last ack flying around but other reset functionality will take care of it if it got lost. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25fs: dlm: cancel work sync otherconAlexander Aring
These rx tx flags arguments are for signaling close_connection() from which worker they are called. Obviously the receive worker cannot cancel itself and vice versa for swork. For the othercon the receive worker should only be used, however to avoid deadlocks we should pass the same flags as the original close_connection() was called. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25fs: dlm: reconnect if socket error report occursAlexander Aring
This patch will change the reconnect handling that if an error occurs if a socket error callback is occurred. This will also handle reconnects in a non blocking connecting case which is currently missing. If error ECONNREFUSED is reported we delay the reconnect by one second. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25fs: dlm: set is othercon flagAlexander Aring
There is a is othercon flag which is never used, this patch will set it and printout a warning if the othercon ever sends a dlm message which should never be the case. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25fs: dlm: fix srcu read lock usageAlexander Aring
This patch holds the srcu connection read lock in cases where we lookup the connections and accessing it. We don't hold the srcu lock in workers function where the scheduled worker is part of the connection itself. The connection should not be freed if any worker is scheduled or pending. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25fs: dlm: add dlm macros for ratelimit logAlexander Aring
This patch add ratelimit macro to dlm subsystem and will set the connecting log message to ratelimit. In non blocking connecting cases it will print out this message a lot. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25fs: dlm: always run complete for possible waitersAlexander Aring
This patch changes the ping_members() result that we always run complete() for possible waiters. We handle the -EINTR error code as successful. This error code is returned if the recovery is stopped which is likely that a new recovery is triggered with a new members configuration and ping_members() runs again. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-09Merge tag '5.13-rc-smb3-part3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Three small SMB3 chmultichannel related changesets (also for stable) from the SMB3 test event this week. The other fixes are still in review/testing" * tag '5.13-rc-smb3-part3' of git://git.samba.org/sfrench/cifs-2.6: smb3: if max_channels set to more than one channel request multichannel smb3: do not attempt multichannel to server which does not support it smb3: when mounting with multichannel include it in requested capabilities
2021-05-08Merge tag 'kbuild-v5.13-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - Convert sh and sparc to use generic shell scripts to generate the syscall headers - refactor .gitignore files - Update kernel/config_data.gz only when the content of the .config is really changed, which avoids the unneeded re-link of vmlinux - move "remove stale files" workarounds to scripts/remove-stale-files - suppress unused-but-set-variable warnings by default for Clang as well - fix locale setting LANG=C to LC_ALL=C - improve 'make distclean' - always keep intermediate objects from scripts/link-vmlinux.sh - move IF_ENABLED out of <linux/kconfig.h> to make it self-contained - misc cleanups * tag 'kbuild-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits) linux/kconfig.h: replace IF_ENABLED() with PTR_IF() in <linux/kernel.h> kbuild: Don't remove link-vmlinux temporary files on exit/signal kbuild: remove the unneeded comments for external module builds kbuild: make distclean remove tag files in sub-directories kbuild: make distclean work against $(objtree) instead of $(srctree) kbuild: refactor modname-multi by using suffix-search kbuild: refactor fdtoverlay rule kbuild: parameterize the .o part of suffix-search arch: use cross_compiling to check whether it is a cross build or not kbuild: remove ARCH=sh64 support from top Makefile .gitignore: prefix local generated files with a slash kbuild: replace LANG=C with LC_ALL=C Makefile: Move -Wno-unused-but-set-variable out of GCC only block kbuild: add a script to remove stale generated files kbuild: update config_data.gz only when the content of .config is changed .gitignore: ignore only top-level modules.builtin .gitignore: move tags and TAGS close to other tag files kernel/.gitgnore: remove stale timeconst.h and hz.bc usr/include: refactor .gitignore genksyms: fix stale comment ...
2021-05-08smb3: if max_channels set to more than one channel request multichannelSteve French
Mounting with "multichannel" is obviously implied if user requested more than one channel on mount (ie mount parm max_channels>1). Currently both have to be specified. Fix that so that if max_channels is greater than 1 on mount, enable multichannel rather than silently falling back to non-multichannel. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-By: Tom Talpey <tom@talpey.com> Cc: <stable@vger.kernel.org> # v5.11+ Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
2021-05-08smb3: do not attempt multichannel to server which does not support itSteve French
We were ignoring CAP_MULTI_CHANNEL in the server response - if the server doesn't support multichannel we should not be attempting it. See MS-SMB2 section 3.2.5.2 Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-By: Tom Talpey <tom@talpey.com> Cc: <stable@vger.kernel.org> # v5.8+ Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-08smb3: when mounting with multichannel include it in requested capabilitiesSteve French
In the SMB3/SMB3.1.1 negotiate protocol request, we are supposed to advertise CAP_MULTICHANNEL capability when establishing multiple channels has been requested by the user doing the mount. See MS-SMB2 sections 2.2.3 and 3.2.5.2 Without setting it there is some risk that multichannel could fail if the server interpreted the field strictly. Reviewed-By: Tom Talpey <tom@talpey.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Cc: <stable@vger.kernel.org> # v5.8+ Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-07Merge tag 'block-5.13-2021-05-07' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - dasd spelling fixes (Bhaskar) - Limit bio max size on multi-page bvecs to the hardware limit, to avoid overly large bio's (and hence latencies). Originally queued for the merge window, but needed a fix and was dropped from the initial pull (Changheun) - NVMe pull request (Christoph): - reset the bdev to ns head when failover (Daniel Wagner) - remove unsupported command noise (Keith Busch) - misc passthrough improvements (Kanchan Joshi) - fix controller ioctl through ns_head (Minwoo Im) - fix controller timeouts during reset (Tao Chiu) - rnbd fixes/cleanups (Gioh, Md, Dima) - Fix iov_iter re-expansion (yangerkun) * tag 'block-5.13-2021-05-07' of git://git.kernel.dk/linux-block: block: reexpand iov_iter after read/write nvmet: remove unsupported command noise nvme-multipath: reset bdev to ns head when failover nvme-pci: fix controller reset hang when racing with nvme_timeout nvme: move the fabrics queue ready check routines to core nvme: avoid memset for passthrough requests nvme: add nvme_get_ns helper nvme: fix controller ioctl through ns_head bio: limit bio max size RDMA/rtrs: fix uninitialized symbol 'cnt' s390: dasd: Mundane spelling fixes block/rnbd: Remove all likely and unlikely block/rnbd-clt: Check the return value of the function rtrs_clt_query block/rnbd: Fix style issues block/rnbd-clt: Change queue_depth type in rnbd_clt_session to size_t
2021-05-07Merge tag 'io_uring-5.13-2021-05-07' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "Mostly fixes for merge window merged code. In detail: - Error case memory leak fixes (Colin, Zqiang) - Add the tools/io_uring/ to the list of maintained files (Lukas) - Set of fixes for the modified buffer registration API (Pavel) - Sanitize io thread setup on x86 (Stefan) - Ensure we truncate transfer count for registered buffers (Thadeu)" * tag 'io_uring-5.13-2021-05-07' of git://git.kernel.dk/linux-block: x86/process: setup io_threads more like normal user space threads MAINTAINERS: add io_uring tool to IO_URING io_uring: truncate lengths larger than MAX_RW_COUNT on provide buffers io_uring: Fix memory leak in io_sqe_buffers_register() io_uring: Fix premature return from loop and memory leak io_uring: fix unchecked error in switch_start() io_uring: allow empty slots for reg buffers io_uring: add more build check for uapi io_uring: dont overlap internal and user req flags io_uring: fix drain with rsrc CQEs
2021-05-07Merge tag 'nfs-for-5.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - Add validation of the UDP retrans parameter to prevent shift out-of-bounds - Don't discard pNFS layout segments that are marked for return Bugfixes: - Fix a NULL dereference crash in xprt_complete_bc_request() when the NFSv4.1 server misbehaves. - Fix the handling of NFS READDIR cookie verifiers - Sundry fixes to ensure attribute revalidation works correctly when the server does not return post-op attributes. - nfs4_bitmask_adjust() must not change the server global bitmasks - Fix major timeout handling in the RPC code. - NFSv4.2 fallocate() fixes. - Fix the NFSv4.2 SEEK_HOLE/SEEK_DATA end-of-file handling - Copy offload attribute revalidation fixes - Fix an incorrect filehandle size check in the pNFS flexfiles driver - Fix several RDMA transport setup/teardown races - Fix several RDMA queue wrapping issues - Fix a misplaced memory read barrier in sunrpc's call_decode() Features: - Micro optimisation of the TCP transmission queue using TCP_CORK - statx() performance improvements by further splitting up the tracking of invalid cached file metadata. - Support the NFSv4.2 'change_attr_type' attribute and use it to optimise handling of change attribute updates" * tag 'nfs-for-5.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (85 commits) xprtrdma: Fix a NULL dereference in frwr_unmap_sync() sunrpc: Fix misplaced barrier in call_decode NFSv4.2: Remove ifdef CONFIG_NFSD from NFSv4.2 client SSC code. xprtrdma: Move fr_mr field to struct rpcrdma_mr xprtrdma: Move the Work Request union to struct rpcrdma_mr xprtrdma: Move fr_linv_done field to struct rpcrdma_mr xprtrdma: Move cqe to struct rpcrdma_mr xprtrdma: Move fr_cid to struct rpcrdma_mr xprtrdma: Remove the RPC/RDMA QP event handler xprtrdma: Don't display r_xprt memory addresses in tracepoints xprtrdma: Add an rpcrdma_mr_completion_class xprtrdma: Add tracepoints showing FastReg WRs and remote invalidation xprtrdma: Avoid Send Queue wrapping xprtrdma: Do not wake RPC consumer on a failed LocalInv xprtrdma: Do not recycle MR after FastReg/LocalInv flushes xprtrdma: Clarify use of barrier in frwr_wc_localinv_done() xprtrdma: Rename frwr_release_mr() xprtrdma: rpcrdma_mr_pop() already does list_del_init() xprtrdma: Delete rpcrdma_recv_buffer_put() xprtrdma: Fix cwnd update ordering ...
2021-05-07Merge tag '9p-for-5.13-rc1' of git://github.com/martinetd/linuxLinus Torvalds
Pull 9p updates from Dominique Martinet: "An error handling fix and constification" * tag '9p-for-5.13-rc1' of git://github.com/martinetd/linux: fs: 9p: fix v9fs_file_open writeback fid error check 9p: Constify static struct v9fs_attr_group
2021-05-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge yet more updates from Andrew Morton: "This is everything else from -mm for this merge window. 90 patches. Subsystems affected by this patch series: mm (cleanups and slub), alpha, procfs, sysctl, misc, core-kernel, bitmap, lib, compat, checkpatch, epoll, isofs, nilfs2, hpfs, exit, fork, kexec, gcov, panic, delayacct, gdb, resource, selftests, async, initramfs, ipc, drivers/char, and spelling" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (90 commits) mm: fix typos in comments mm: fix typos in comments treewide: remove editor modelines and cruft ipc/sem.c: spelling fix fs: fat: fix spelling typo of values kernel/sys.c: fix typo kernel/up.c: fix typo kernel/user_namespace.c: fix typos kernel/umh.c: fix some spelling mistakes include/linux/pgtable.h: few spelling fixes mm/slab.c: fix spelling mistake "disired" -> "desired" scripts/spelling.txt: add "overflw" scripts/spelling.txt: Add "diabled" typo scripts/spelling.txt: add "overlfow" arm: print alloc free paths for address in registers mm/vmalloc: remove vwrite() mm: remove xlate_dev_kmem_ptr() drivers/char: remove /dev/kmem for good mm: fix some typos and code style problems ipc/sem.c: mundane typo fixes ...
2021-05-07treewide: remove editor modelines and cruftMasahiro Yamada
The section "19) Editor modelines and other cruft" in Documentation/process/coding-style.rst clearly says, "Do not include any of these in source files." I recently receive a patch to explicitly add a new one. Let's do treewide cleanups, otherwise some people follow the existing code and attempt to upstream their favoriate editor setups. It is even nicer if scripts/checkpatch.pl can check it. If we like to impose coding style in an editor-independent manner, I think editorconfig (patch [1]) is a saner solution. [1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/ Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> [auxdisplay] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07fs: fat: fix spelling typo of valuesdingsenjie
vaules -> values Link: https://lkml.kernel.org/r/20210302034817.30384-1-dingsenjie@163.com Signed-off-by: dingsenjie <dingsenjie@yulong.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-06Merge tag 'iomap-5.13-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull more iomap updates from Darrick Wong: "Remove the now unused 'io_private' field from struct iomap_ioend, for a modest savings in memory allocation" * tag 'iomap-5.13-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: remove unused private field from ioend
2021-05-06Merge tag 'xfs-5.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull more xfs updates from Darrick Wong: "Except for the timestamp struct renaming patches, everything else in here are bug fixes: - Rename the log timestamp struct. - Remove broken transaction counter debugging that wasn't working correctly on very old filesystems. - Various fixes to make pre-lazysbcount filesystems work properly again. - Fix a free space accounting problem where we neglected to consider free space btree blocks that track metadata reservation space when deciding whether or not to allow caller to reserve space for a metadata update. - Fix incorrect pagecache clearing behavior during FUNSHARE ops. - Don't allow log writes if the data device is readonly" * tag 'xfs-5.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: don't allow log writes if the data device is readonly xfs: fix xfs_reflink_unshare usage of filemap_write_and_wait_range xfs: set aside allocation btree blocks from block reservation xfs: introduce in-core global counter of allocbt blocks xfs: unconditionally read all AGFs on mounts with perag reservation xfs: count free space btree blocks when scrubbing pre-lazysbcount fses xfs: update superblock counters correctly for !lazysbcount xfs: don't check agf_btreeblks on pre-lazysbcount filesystems xfs: remove obsolete AGF counter debugging xfs: rename struct xfs_legacy_ictimestamp xfs: rename xfs_ictimestamp_t
2021-05-06hpfs: replace one-element array with flexible-array memberGustavo A. R. Silva
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. Also, this helps with the ongoing efforts to enable -Warray-bounds by fixing the following warning: CC [M] fs/hpfs/dir.o fs/hpfs/dir.c: In function `hpfs_readdir': fs/hpfs/dir.c:163:41: warning: array subscript 1 is above array bounds of `u8[1]' {aka `unsigned char[1]'} [-Warray-bounds] 163 | || de ->name[0] != 1 || de->name[1] != 1)) | ~~~~~~~~^~~ [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/79 Link: https://github.com/KSPP/linux/issues/109 Link: https://lkml.kernel.org/r/20210326173510.GA81212@embeddedor Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-06nilfs2: fix typos in commentsLu Jialin
numer -> number in fs/nilfs2/cpfile.c Decription -> Description in fs/nilfs2/ioctl.c isntance -> instance in fs/nilfs2/the_nilfs.c Link: https://lkml.kernel.org/r/1617942951-14631-1-git-send-email-konishi.ryusuke@gmail.com Link: https://lore.kernel.org/r/20210409022519.176988-1-lujialin4@huawei.com Signed-off-by: Lu Jialin <lujialin4@huawei.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-06fs/nilfs2: fix misspellings using codespell toolLiu xuzhi
Two typos are found out by codespell tool \ in 2217th and 2254th lines of segment.c: $ codespell ./fs/nilfs2/ ./segment.c:2217 :retured ==> returned ./segment.c:2254: retured ==> returned Fix two typos found by codespell. Link: https://lkml.kernel.org/r/1617864087-8198-1-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Liu xuzhi <liu.xuzhi@zte.com.cn> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-06isofs: fix fall-through warnings for ClangGustavo A. R. Silva
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by explicitly adding a break statement instead of just letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Link: https://lkml.kernel.org/r/5b7caa73958588065fabc59032c340179b409ef5.1605896059.git.gustavoars@kernel.org Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-06fs/epoll: restore waking from ep_done_scan()Davidlohr Bueso
Commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") changed the userspace visible behavior of exclusive waiters blocked on a common epoll descriptor upon a single event becoming ready. Previously, all tasks doing epoll_wait would awake, and now only one is awoken, potentially causing missed wakeups on applications that rely on this behavior, such as Apache Qpid. While the aforementioned commit aims at having only a wakeup single path in ep_poll_callback (with the exceptions of epoll_ctl cases), we need to restore the wakeup in what was the old ep_scan_ready_list() such that the next thread can be awoken, in a cascading style, after the waker's corresponding ep_send_events(). Link: https://lkml.kernel.org/r/20210405231025.33829-3-dave@stgolabs.net Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jason Baron <jbaron@akamai.com> Cc: Roman Penyaev <rpenyaev@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-06proc/sysctl: fix function name error in commentszhouchuangao
The function name should be modified to register_sysctl_paths instead of register_sysctl_table_path. Link: https://lkml.kernel.org/r/1615807194-79646-1-git-send-email-zhouchuangao@vivo.com Signed-off-by: zhouchuangao <zhouchuangao@vivo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-06proc: delete redundant subset=pid checkAlexey Dobriyan
Two checks in lookup and readdir code should be enough to not have third check in open code. Can't open what can't be looked up? Link: https://lkml.kernel.org/r/YFYYwIBIkytqnkxP@localhost.localdomain Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-06proc: mandate ->proc_lseek in "struct proc_ops"Alexey Dobriyan
Now that proc_ops are separate from file_operations and other operations it easy to check all instances to have ->proc_lseek hook and remove check in main code. Note: nonseekable_open() files naturally don't require ->proc_lseek. Garbage collect pde_lseek() function. [adobriyan@gmail.com: smoke test lseek()] Link: https://lkml.kernel.org/r/YG4OIhChOrVTPgdN@localhost.localdomain Link: https://lkml.kernel.org/r/YFYX0Bzwxlc7aBa/@localhost.localdomain Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-06proc: save LOC in __xlate_proc_name()Alexey Dobriyan
Can't look at this verbosity anymore. Link: https://lkml.kernel.org/r/YFYXAp/fgq405qcy@localhost.localdomain Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-06fs/proc/generic.c: fix incorrect pde_is_permanent checkColin Ian King
Currently the pde_is_permanent() check is being run on root multiple times rather than on the next proc directory entry. This looks like a copy-paste error. Fix this by replacing root with next. Addresses-Coverity: ("Copy-paste error") Link: https://lkml.kernel.org/r/20210318122633.14222-1-colin.king@canonical.com Fixes: d919b33dafb3 ("proc: faster open/read/close with "permanent" files") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Greg Kroah-Hartman <gregkh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-06Merge tag 'ceph-for-5.13-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "Notable items here are - a series to take advantage of David Howells' netfs helper library from Jeff - three new filesystem client metrics from Xiubo - ceph.dir.rsnaps vxattr from Yanhu - two auth-related fixes from myself, marked for stable. Interspersed is a smattering of assorted fixes and cleanups across the filesystem" * tag 'ceph-for-5.13-rc1' of git://github.com/ceph/ceph-client: (24 commits) libceph: allow addrvecs with a single NONE/blank address libceph: don't set global_id until we get an auth ticket libceph: bump CephXAuthenticate encoding version ceph: don't allow access to MDS-private inodes ceph: fix up some bare fetches of i_size ceph: convert some PAGE_SIZE invocations to thp_size() ceph: support getting ceph.dir.rsnaps vxattr ceph: drop pinned_page parameter from ceph_get_caps ceph: fix inode leak on getattr error in __fh_to_dentry ceph: only check pool permissions for regular files ceph: send opened files/pinned caps/opened inodes metrics to MDS daemon ceph: avoid counting the same request twice or more ceph: rename the metric helpers ceph: fix kerneldoc copypasta over ceph_start_io_direct ceph: use attach/detach_page_private for tracking snap context ceph: don't use d_add in ceph_handle_snapdir ceph: don't clobber i_snap_caps on non-I_NEW inode ceph: fix fall-through warnings for Clang ceph: convert ceph_readpages to ceph_readahead ceph: convert ceph_write_begin to netfs_write_begin ...
2021-05-06Merge tag 'ecryptfs-5.13-rc1-updates' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull ecryptfs updates from Tyler Hicks: "Code cleanups and a bug fix - W=1 compiler warning cleanups - Mutex initialization simplification - Protect against NULL pointer exception during mount" * tag 'ecryptfs-5.13-rc1-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: ecryptfs: fix kernel panic with null dev_name ecryptfs: remove unused helpers ecryptfs: Fix typo in message eCryptfs: Use DEFINE_MUTEX() for mutex lock ecryptfs: keystore: Fix some kernel-doc issues and demote non-conformant headers ecryptfs: inode: Help out nearly-there header and demote non-conformant ones ecryptfs: mmap: Help out one function header and demote other abuses ecryptfs: crypto: Supply some missing param descriptions and demote abuses ecryptfs: miscdev: File headers are not good kernel-doc candidates ecryptfs: main: Demote a bunch of non-conformant kernel-doc headers ecryptfs: messaging: Add missing param descriptions and demote abuses ecryptfs: super: Fix formatting, naming and kernel-doc abuses ecryptfs: file: Demote kernel-doc abuses ecryptfs: kthread: Demote file header and provide description for 'cred' ecryptfs: dentry: File headers are not good candidates for kernel-doc ecryptfs: debug: Demote a couple of kernel-doc abuses ecryptfs: read_write: File headers do not make good candidates for kernel-doc ecryptfs: use DEFINE_MUTEX() for mutex lock eCryptfs: add a semicolon
2021-05-06block: reexpand iov_iter after read/writeyangerkun
We get a bug: BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 Read of size 8 at addr ffff0000d3fb11f8 by task CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted 5.10.0-00843-g352c8610ccd2 #2 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132 show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x110/0x164 lib/dump_stack.c:118 print_address_description+0x78/0x5c8 mm/kasan/report.c:385 __kasan_report mm/kasan/report.c:545 [inline] kasan_report+0x148/0x1e4 mm/kasan/report.c:562 check_memory_region_inline mm/kasan/generic.c:183 [inline] __asan_load8+0xb4/0xbc mm/kasan/generic.c:252 iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 io_read fs/io_uring.c:3421 [inline] io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943 __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 io_submit_sqe fs/io_uring.c:6395 [inline] io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 Allocated by task 12570: stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461 kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475 __kmalloc+0x23c/0x334 mm/slub.c:3970 kmalloc include/linux/slab.h:557 [inline] __io_alloc_async_data+0x68/0x9c fs/io_uring.c:3210 io_setup_async_rw fs/io_uring.c:3229 [inline] io_read fs/io_uring.c:3436 [inline] io_issue_sqe+0x2954/0x2d64 fs/io_uring.c:5943 __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 io_submit_sqe fs/io_uring.c:6395 [inline] io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 Freed by task 12570: stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track+0x38/0x6c mm/kasan/common.c:56 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355 __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422 kasan_slab_free+0x10/0x1c mm/kasan/common.c:431 slab_free_hook mm/slub.c:1544 [inline] slab_free_freelist_hook mm/slub.c:1577 [inline] slab_free mm/slub.c:3142 [inline] kfree+0x104/0x38c mm/slub.c:4124 io_dismantle_req fs/io_uring.c:1855 [inline] __io_free_req+0x70/0x254 fs/io_uring.c:1867 io_put_req_find_next fs/io_uring.c:2173 [inline] __io_queue_sqe+0x1fc/0x520 fs/io_uring.c:6279 __io_req_task_submit+0x154/0x21c fs/io_uring.c:2051 io_req_task_submit+0x2c/0x44 fs/io_uring.c:2063 task_work_run+0xdc/0x128 kernel/task_work.c:151 get_signal+0x6f8/0x980 kernel/signal.c:2562 do_signal+0x108/0x3a4 arch/arm64/kernel/signal.c:658 do_notify_resume+0xbc/0x25c arch/arm64/kernel/signal.c:722 work_pending+0xc/0x180 blkdev_read_iter can truncate iov_iter's count since the count + pos may exceed the size of the blkdev. This will confuse io_read that we have consume the iovec. And once we do the iov_iter_revert in io_read, we will trigger the slab-out-of-bounds. Fix it by reexpand the count with size has been truncated. blkdev_write_iter can trigger the problem too. Signed-off-by: yangerkun <yangerkun@huawei.com> Acked-by: Pavel Begunkov <asml.silencec@gmail.com> Link: https://lore.kernel.org/r/20210401071807.3328235-1-yangerkun@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-05io_uring: truncate lengths larger than MAX_RW_COUNT on provide buffersThadeu Lima de Souza Cascardo
Read and write operations are capped to MAX_RW_COUNT. Some read ops rely on that limit, and that is not guaranteed by the IORING_OP_PROVIDE_BUFFERS. Truncate those lengths when doing io_add_buffers, so buffer addresses still use the uncapped length. Also, take the chance and change struct io_buffer len member to __u32, so it matches struct io_provide_buffer len member. This fixes CVE-2021-3491, also reported as ZDI-CAN-13546. Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS") Reported-by: Billy Jheng Bing-Jhong (@st424204) Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-05Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "The remainder of the main mm/ queue. 143 patches. Subsystems affected by this patch series (all mm): pagecache, hugetlb, userfaultfd, vmscan, compaction, migration, cma, ksm, vmstat, mmap, kconfig, util, memory-hotplug, zswap, zsmalloc, highmem, cleanups, and kfence" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (143 commits) kfence: use power-efficient work queue to run delayed work kfence: maximize allocation wait timeout duration kfence: await for allocation using wait_event kfence: zero guard page after out-of-bounds access mm/process_vm_access.c: remove duplicate include mm/mempool: minor coding style tweaks mm/highmem.c: fix coding style issue btrfs: use memzero_page() instead of open coded kmap pattern iov_iter: lift memzero_page() to highmem.h mm/zsmalloc: use BUG_ON instead of if condition followed by BUG. mm/zswap.c: switch from strlcpy to strscpy arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE mm,memory_hotplug: add kernel boot option to enable memmap_on_memory acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported mm,memory_hotplug: allocate memmap from the added memory range mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count() mm,memory_hotplug: relax fully spanned sections check drivers/base/memory: introduce memory_block_{online,offline} mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove ...
2021-05-05Merge tag 'nfsd-5.13-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull more nfsd updates from Chuck Lever: "Additional fixes and clean-ups for NFSD since tags/nfsd-5.13, including a fix to grant read delegations for files open for writing" * tag 'nfsd-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Fix null pointer dereference in svc_rqst_free() SUNRPC: fix ternary sign expansion bug in tracing nfsd: Fix fall-through warnings for Clang nfsd: grant read delegations to clients holding writes nfsd: reshuffle some code nfsd: track filehandle aliasing in nfs4_files nfsd: hash nfs4_files by inode number nfsd: ensure new clients break delegations nfsd: removed unused argument in nfsd_startup_generic() nfsd: remove unused function svcrdma: Pass a useful error code to the send_err tracepoint svcrdma: Rename goto labels in svc_rdma_sendto() svcrdma: Don't leak send_ctxt on Send errors
2021-05-05Merge tag '5.13-rc-smb3-part2' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs updates from Steve French: "Ten CIFS/SMB3 changes - including two marked for stable - including some important multichannel fixes, as well as support for handle leases (deferred close) and shutdown support: - some important multichannel fixes - support for handle leases (deferred close) - shutdown support (which is also helpful since it enables multiple xfstests) - enable negotiating stronger encryption by default (GCM256) - improve wireshark debugging by allowing more options for root to dump decryption keys SambaXP and the SMB3 Plugfest test event are going on now so I am expecting more patches over the next few days due to extra testing (including more multichannel fixes)" * tag '5.13-rc-smb3-part2' of git://git.samba.org/sfrench/cifs-2.6: fs/cifs: Fix resource leak Cifs: Fix kernel oops caused by deferred close for files. cifs: fix regression when mounting shares with prefix paths cifs: use echo_interval even when connection not ready. cifs: detect dead connections only when echoes are enabled. smb3.1.1: allow dumping keys for multiuser mounts smb3.1.1: allow dumping GCM256 keys to improve debugging of encrypted shares cifs: add shutdown support cifs: Deferred close for files smb3.1.1: enable negotiating stronger encryption by default
2021-05-05btrfs: use memzero_page() instead of open coded kmap patternIra Weiny
There are many places where kmap/memset/kunmap patterns occur. Use the newly lifted memzero_page() to eliminate direct uses of kmap and leverage the new core functions use of kmap_local_page(). The development of this patch was aided by the following coccinelle script: // <smpl> // SPDX-License-Identifier: GPL-2.0-only // Find kmap/memset/kunmap pattern and replace with memset*page calls // // NOTE: Offsets and other expressions may be more complex than what the script // will automatically generate. Therefore a catchall rule is provided to find // the pattern which then must be evaluated by hand. // // Confidence: Low // Copyright: (C) 2021 Intel Corporation // URL: http://coccinelle.lip6.fr/ // Comments: // Options: // // Then the memset pattern // @ memset_rule1 @ expression page, V, L, Off; identifier ptr; type VP; @@ ( -VP ptr = kmap(page); | -ptr = kmap(page); | -VP ptr = kmap_atomic(page); | -ptr = kmap_atomic(page); ) <+... ( -memset(ptr, 0, L); +memzero_page(page, 0, L); | -memset(ptr + Off, 0, L); +memzero_page(page, Off, L); | -memset(ptr, V, L); +memset_page(page, V, 0, L); | -memset(ptr + Off, V, L); +memset_page(page, V, Off, L); ) ...+> ( -kunmap(page); | -kunmap_atomic(ptr); ) // Remove any pointers left unused @ depends on memset_rule1 @ identifier memset_rule1.ptr; type VP, VP1; @@ -VP ptr; ... when != ptr; ? VP1 ptr; // // Catch all // @ memset_rule2 @ expression page; identifier ptr; expression GenTo, GenSize, GenValue; type VP; @@ ( -VP ptr = kmap(page); | -ptr = kmap(page); | -VP ptr = kmap_atomic(page); | -ptr = kmap_atomic(page); ) <+... ( // // Some call sites have complex expressions within the memset/memcpy // The follow are catch alls which need to be evaluated by hand. // -memset(GenTo, 0, GenSize); +memzero_pageExtra(page, GenTo, GenSize); | -memset(GenTo, GenValue, GenSize); +memset_pageExtra(page, GenValue, GenTo, GenSize); ) ...+> ( -kunmap(page); | -kunmap_atomic(ptr); ) // Remove any pointers left unused @ depends on memset_rule2 @ identifier memset_rule2.ptr; type VP, VP1; @@ -VP ptr; ... when != ptr; ? VP1 ptr; // </smpl> Link: https://lkml.kernel.org/r/20210309212137.2610186-4-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: David Sterba <dsterba@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05mm: generalize SYS_SUPPORTS_HUGETLBFS (rename as ARCH_SUPPORTS_HUGETLBFS)Anshuman Khandual
SYS_SUPPORTS_HUGETLBFS config has duplicate definitions on platforms that subscribe it. Instead, just make it a generic option which can be selected on applicable platforms. Also rename it as ARCH_SUPPORTS_HUGETLBFS instead. This reduces code duplication and makes it cleaner. Link: https://lkml.kernel.org/r/1617259448-22529-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> [riscv] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05mm: fs: invalidate BH LRU during page migrationMinchan Kim
Pages containing buffer_heads that are in one of the per-CPU buffer_head LRU caches will be pinned and thus cannot be migrated. This can prevent CMA allocations from succeeding, which are often used on platforms with co-processors (such as a DSP) that can only use physically contiguous memory. It can also prevent memory hot-unplugging from succeeding, which involves migrating at least MIN_MEMORY_BLOCK_SIZE bytes of memory, which ranges from 8 MiB to 1 GiB based on the architecture in use. Correspondingly, invalidate the BH LRU caches before a migration starts and stop any buffer_head from being cached in the LRU caches, until migration has finished. Link: https://lkml.kernel.org/r/20210319175127.886124-3-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Chris Goldsworthy <cgoldswo@codeaurora.org> Reported-by: Laura Abbott <labbott@kernel.org> Tested-by: Oliver Sang <oliver.sang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: John Dias <joaodias@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05userfaultfd: add UFFDIO_CONTINUE ioctlAxel Rasmussen
This ioctl is how userspace ought to resolve "minor" userfaults. The idea is, userspace is notified that a minor fault has occurred. It might change the contents of the page using its second non-UFFD mapping, or not. Then, it calls UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". Note that it doesn't make much sense to use UFFDIO_{COPY,ZEROPAGE} for MINOR registered VMAs. ZEROPAGE maps the VMA to the zero page; but in the minor fault case, we already have some pre-existing underlying page. Likewise, UFFDIO_COPY isn't useful if we have a second non-UFFD mapping. We'd just use memcpy() or similar instead. It turns out hugetlb_mcopy_atomic_pte() already does very close to what we want, if an existing page is provided via `struct page **pagep`. We already special-case the behavior a bit for the UFFDIO_ZEROPAGE case, so just extend that design: add an enum for the three modes of operation, and make the small adjustments needed for the MCOPY_ATOMIC_CONTINUE case. (Basically, look up the existing page, and avoid adding the existing page to the page cache or calling set_page_huge_active() on it.) Link: https://lkml.kernel.org/r/20210301222728.176417-5-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: David Rientjes <rientjes@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oliver Upton <oupton@google.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Price <steven.price@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>