path: root/fs
Age  Commit message  Author
2020-08-07Merge branch 'hch.init_path' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull init and set_fs() cleanups from Al Viro: "Christoph's 'getting rid of ksys_...() uses under KERNEL_DS' series" * 'hch.init_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (50 commits) init: add an init_dup helper init: add an init_utimes helper init: add an init_stat helper init: add an init_mknod helper init: add an init_mkdir helper init: add an init_symlink helper init: add an init_link helper init: add an init_eaccess helper init: add an init_chmod helper init: add an init_chown helper init: add an init_chroot helper init: add an init_chdir helper init: add an init_rmdir helper init: add an init_unlink helper init: add an init_umount helper init: add an init_mount helper init: mark create_dev as __init init: mark console_on_rootfs as __init init: initialize ramdisk_execute_command at compile time devtmpfs: refactor devtmpfsd() ...
2020-08-07Merge branch 'work.regset' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull ptrace regset updates from Al Viro: "Internal regset API changes: - regularize copy_regset_{to,from}_user() callers - switch to saner calling conventions for ->get() - kill user_regset_copyout() The ->put() side of things will have to wait for the next cycle, unfortunately. The balance is about -1KLoC and replacements for ->get() instances are a lot saner" * 'work.regset' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits) regset: kill user_regset_copyout{,_zero}() regset(): kill ->get_size() regset: kill ->get() csky: switch to ->regset_get() xtensa: switch to ->regset_get() parisc: switch to ->regset_get() nds32: switch to ->regset_get() nios2: switch to ->regset_get() hexagon: switch to ->regset_get() h8300: switch to ->regset_get() openrisc: switch to ->regset_get() riscv: switch to ->regset_get() c6x: switch to ->regset_get() ia64: switch to ->regset_get() arc: switch to ->regset_get() arm: switch to ->regset_get() sh: convert to ->regset_get() arm64: switch to ->regset_get() mips: switch to ->regset_get() sparc: switch to ->regset_get() ...
2020-08-07gfs2: When gfs2_dirty_inode gets a glock error, dump the glockBob Peterson
Before this patch, if function gfs2_dirty_inode got an error when trying to lock the inode glock, it complained, but it didn't say what glock or inode had the problem. In this case, it almost always means that dinode_in found an error with the dinode in the file system. So it makes sense to dump the glock, which tells us the location of the dinode in the file system. That will allow us to analyze the corruption from the metadata. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-08-07gfs2: Never call gfs2_block_zero_range with an open transactionBob Peterson
Before this patch, some functions started transactions then they called gfs2_block_zero_range. However, gfs2_block_zero_range, like writes, can start transactions, which results in a recursive transaction error. For example: do_shrink trunc_start gfs2_trans_begin <------------------------------------------------ gfs2_block_zero_range iomap_zero_range(inode, from, length, NULL, &gfs2_iomap_ops); iomap_apply ... iomap_zero_range_actor iomap_begin gfs2_iomap_begin gfs2_iomap_begin_write actor (iomap_zero_range_actor) iomap_zero iomap_write_begin gfs2_iomap_page_prepare gfs2_trans_begin <------------------------ This patch reorders the callers of gfs2_block_zero_range so that they only start their transactions after the call. It also adds a BUG_ON to ensure this doesn't happen again. Fixes: 2257e468a63b ("gfs2: implement gfs2_block_zero_range using iomap_zero_range") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
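The ordering rule described in this commit can be illustrated with a small userspace model. This is only a sketch, not gfs2 code: the `Journal` class and the `shrink_old`/`shrink_new` helpers are hypothetical stand-ins, and the explicit check in `block_zero_range` plays the role of the BUG_ON that the patch adds.

```python
class Journal:
    """Toy model of a per-task transaction slot (hypothetical, not gfs2 code)."""

    def __init__(self):
        self.active = False

    def trans_begin(self):
        # A task may not nest transactions.
        if self.active:
            raise RuntimeError("recursive transaction")
        self.active = True

    def trans_end(self):
        self.active = False


def block_zero_range(journal):
    # Stand-in for the added BUG_ON: callers must not hold a
    # transaction, because zeroing may start one of its own.
    if journal.active:
        raise RuntimeError("called with an open transaction")
    journal.trans_begin()
    journal.trans_end()


def shrink_old(journal):
    # Old ordering: transaction first, then zeroing, which recurses.
    journal.trans_begin()
    try:
        block_zero_range(journal)
    finally:
        journal.trans_end()


def shrink_new(journal):
    # New ordering: zero first, start the caller's transaction afterwards.
    block_zero_range(journal)
    journal.trans_begin()
    journal.trans_end()
```

In this model `shrink_new` completes cleanly, while `shrink_old` trips the guard, which mirrors the reordering of callers described above.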
2020-08-07gfs2: print details on transactions that aren't properly endedBob Peterson
If function gfs2_trans_begin is called with another transaction active it BUGs out, but it doesn't give any details about the duplicate. This patch moves function gfs2_print_trans and calls it when this situation arises for better debugging. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-08-07gfs2: Fix inaccurate commentBob Peterson
The comment regarding journal flush thresholds is wrong. This patch fixes it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-08-06cifs: Fix an error pointer dereference in cifs_mount()Dan Carpenter
The error handling calls kfree(full_path) so we can't let it be an error pointer. There used to be a NULL assignment here but we accidentally deleted it. Add it back. Fixes: 7efd08158261 ("cifs: document and cleanup dfs mount") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
2020-08-06Merge tag 'dlm-5.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This set includes some improvements to the dlm networking layer: improving the ability to trace dlm messages for debugging, and improved handling of bad messages or disrupted connections" * tag 'dlm-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: fs: dlm: implement tcp graceful shutdown fs: dlm: change handling of reconnects fs: dlm: don't close socket on invalid message fs: dlm: set skb mark per peer socket fs: dlm: set skb mark for listen socket net: sock: add sock_set_mark dlm: Fix kobject memleak
2020-08-06Merge tag 'iomap-5.9-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull iomap updates from Darrick Wong: "The most notable changes are: - iomap no longer invalidates the page cache when performing a direct read, since doing so is unnecessary and the old directio code doesn't do that either. - iomap embraced the use of returning ENOTBLK from a direct write to trigger falling back to a buffered write since ext4 already did this and btrfs wants it for their port. - iomap falls back to buffered writes if we're doing a direct write and the page cache invalidation after the flush fails; this was necessary to handle a corner case in the btrfs port. - Remove email virus scanner detritus that was accidentally included in yesterday's pull request. Clearly I need(ed) to update my git branch checker scripts. :(" * tag 'iomap-5.9-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: fall back to buffered writes for invalidation failures xfs: use ENOTBLK for direct I/O to buffered I/O fallback iomap: Only invalidate page cache pages on direct IO writes iomap: Make sure iomap_end is called after iomap_begin
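The ENOTBLK fallback described in this pull message can be sketched as a userspace model. Everything here is an assumption made for illustration: the dictionaries standing in for the page cache and the disk, the function names, and the `invalidate_ok` switch are hypothetical and do not reflect the iomap API.

```python
import errno

# 15 on Linux; the fallback value covers platforms lacking the constant.
ENOTBLK = getattr(errno, "ENOTBLK", 15)


def direct_write(cache, disk, offset, data, invalidate_ok=True):
    """Hypothetical direct write: returns bytes written or -ENOTBLK."""
    if not invalidate_ok:
        # The page cache could not be invalidated: refuse, rather than
        # leave cached pages and disk contents incoherent.
        return -ENOTBLK
    disk[offset] = data
    cache.pop(offset, None)  # drop the stale cached copy
    return len(data)


def buffered_write(cache, offset, data):
    """Hypothetical buffered write: data lands in the page cache only."""
    cache[offset] = data
    return len(data)


def write(cache, disk, offset, data, invalidate_ok=True):
    ret = direct_write(cache, disk, offset, data, invalidate_ok)
    if ret == -ENOTBLK:
        # Caller-side fallback to a buffered write.
        ret = buffered_write(cache, offset, data)
    return ret
```

The point of the sketch is that the caller, not the direct I/O path, decides to retry through the page cache when it sees the sentinel error.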
2020-08-06Merge tag 'fsnotify_for_v5.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: - fanotify fix for softlockups when there are many queued events - performance improvement to reduce fsnotify overhead when not used - Amir's implementation of fanotify events with names. With these you can now efficiently monitor whole filesystem, eg to mirror changes to another machine. * tag 'fsnotify_for_v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (37 commits) fanotify: compare fsid when merging name event fsnotify: create method handle_inode_event() in fsnotify_operations fanotify: report parent fid + child fid fanotify: report parent fid + name + child fid fanotify: add support for FAN_REPORT_NAME fanotify: report events with parent dir fid to sb/mount/non-dir marks fanotify: add basic support for FAN_REPORT_DIR_FID fsnotify: remove check that source dentry is positive fsnotify: send event with parent/name info to sb/mount/non-dir marks audit: do not set FS_EVENT_ON_CHILD in audit marks mask inotify: do not set FS_EVENT_ON_CHILD in non-dir mark mask fsnotify: pass dir and inode arguments to fsnotify() fsnotify: create helper fsnotify_inode() fsnotify: send event to parent and child with single callback inotify: report both events on parent and child with single callback dnotify: report both events on parent and child with single callback fanotify: no external fh buffer in fanotify_name_event fanotify: use struct fanotify_info to parcel the variable size buffer fsnotify: add object type "child" to object type iterator fanotify: use FAN_EVENT_ON_CHILD as implicit flag on sb/mount/non-dir marks ...
2020-08-06Merge tag 'for_v5.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, udf, reiserfs, quota cleanups and minor fixes from Jan Kara: "A few ext2 fixups and then several (mostly comment and documentation) cleanups in ext2, udf, reiserfs, and quota" * tag 'for_v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: reiserfs: delete duplicated words udf: osta_udf.h: delete a duplicated word reiserfs: reiserfs.h: delete a duplicated word ext2: ext2.h: fix duplicated word + typos udf: Replace HTTP links with HTTPS ones quota: Fixup http links in quota doc Replace HTTP links with HTTPS ones: DISKQUOTA ext2: initialize quota info in ext2_xattr_set() ext2: fix some incorrect comments in inode.c ext2: remove nocheck option ext2: fix missing percpu_counter_inc ext2: ext2_find_entry() return -ENOENT if no entry found ext2: propagate errors up to ext2_find_entry()'s callers ext2: fix improper assignment for e_value_offs
2020-08-06Merge tag 'erofs-for-5.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "This cycle mainly addresses an issue arising from extended inodes at designated locations, which are not generated by the current mkfs but need to be handled at runtime anyway. The others are quite trivial ones. - use HTTPS links instead of insecure HTTP ones; - fix crossing page boundary on specific extended inodes; - remove useless WQ_CPU_INTENSIVE flag for unbound wq; - minor cleanup" * tag 'erofs-for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: remove WQ_CPU_INTENSIVE flag from unbound wq's erofs: fold in used-once helper erofs_workgroup_unfreeze_final() erofs: fix extended inode could cross boundary erofs: Replace HTTP links with HTTPS ones
2020-08-06Merge tag '5.9-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs updates from Steve French: "16 cifs/smb3 fixes, about half DFS related, two fixes for stable. Still working on and testing an additional set of fixes (including updates to mount, and some fallocate scenario improvements) for later in the merge window" * tag '5.9-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: cifs: document and cleanup dfs mount cifs: only update prefix path of DFS links in cifs_tree_connect() cifs: fix double free error on share and prefix cifs: handle RESP_GET_DFS_REFERRAL.PathConsumed in reconnect cifs: handle empty list of targets in cifs_reconnect() cifs: rename reconn_inval_dfs_target() cifs: reduce number of referral requests in DFS link lookups cifs: merge __{cifs,smb2}_reconnect[_tcon]() into cifs_tree_connect() cifs: convert to use be32_add_cpu() cifs: delete duplicated words in header files cifs: Remove the superfluous break cifs: smb1: Try failing back to SetFileInfo if SetPathInfo fails cifs: handle ERRBaduid for SMB1 cifs: remove unused variable 'server' smb3: warn on confusing error scenario with sec=krb5 cifs: Fix leak when handling lease break for cached root fid
2020-08-06fs: dlm: implement tcp graceful shutdownAlexander Aring
During my code inspection I saw that there is no implementation of a graceful shutdown for tcp. This patch introduces a graceful shutdown for tcp connections. The shutdown is performed synchronously when dlm_lowcomms_stop() is called to end all dlm communication. After the shutdown is done, a lot of flush and closing functionality will be called. However, I don't see a problem with that. The waitqueue used to synchronize the shutdown has a timeout of 10 seconds; if it times out, a force close will be executed. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
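The wait-then-force pattern in this commit message can be sketched in userspace as follows. This is an illustration under assumptions, not the dlm lowcomms code: `graceful_shutdown`, `peer_closed`, and `force_close` are hypothetical names, and a `threading.Event` stands in for the kernel waitqueue.

```python
import threading


def graceful_shutdown(peer_closed, force_close, timeout=10.0):
    """Wait for the peer to finish closing; force the close on timeout.

    peer_closed: threading.Event set once the peer has completed shutdown.
    force_close: callable invoked only if the wait times out.
    Returns True for a graceful shutdown, False if it had to be forced.
    """
    if peer_closed.wait(timeout):
        return True
    force_close()
    return False
```

The default of 10 seconds matches the timeout named in the message; a caller would pass whatever teardown routine it uses as `force_close`.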
2020-08-06fs: dlm: change handling of reconnectsAlexander Aring
This patch changes the handling of reconnects. At first we only close the connection related to the communication failure. If we get a new connection for an already existing connection we close the existing connection and take the new one. This patch significantly improves the stability of tcp connections while running "tcpkill -9 -i $IFACE port 21064" while generating a lot of dlm messages e.g. on a gfs2 mount with many files. My test setup shows that a deadlock is less likely. Before this patch I always hit a deadlock within 5 seconds. After this patch my observation is that it's more likely to survive for 5 seconds and more, but a deadlock still occurs after a certain time. My guess is that there are still "segments" inside the tcp writequeue or retransmit queue which get dropped when receiving a tcp reset [1]. This is hard to reproduce because the right message needs to be inside these queues, which might even happen in the first 5 seconds with this patch. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/tcp_input.c?h=v5.8-rc6#n4122 Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-06fs: dlm: don't close socket on invalid messageAlexander Aring
With this patch, the socket is no longer closed when an invalid dlm message is received. The connection would probably reconnect anyway, so not closing it reduces the number of possible failures. As we don't have a different strategy for reacting to such a scenario, just keep the connection going and ignore the message. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-06fs: dlm: set skb mark per peer socketAlexander Aring
This patch adds support to set the skb mark value for the DLM tcp and sctp socket per peer. The mark value will be offered as per comm value of configfs. At creation time of the peer socket it will be set as socket option. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-06fs: dlm: set skb mark for listen socketAlexander Aring
This patch adds support to set the skb mark value for the DLM listen tcp and sctp sockets. The mark value will be offered as cluster configuration. At creation time of the listen socket it will be set as socket option. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-06dlm: Fix kobject memleakWang Hai
Currently the error return path from kobject_init_and_add() is not followed by a call to kobject_put() - which means we are leaking the kobject. Set do_unreg = 1 before kobject_init_and_add() to ensure that kobject_put() can be called in its error path. Fixes: 901195ed7f4b ("Kobject: change GFS2 to use kobject_init_and_add") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-06io_uring: account locked memory before potential error caseJens Axboe
The tear down path will always unaccount the memory, so ensure that we have accounted it before hitting any of them. Reported-by: Tomáš Chaloupka <chalucha@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-06io_uring: set ctx sq/cq entry count earlierJens Axboe
If we hit an earlier error path in io_uring_create(), then we will have accounted memory, but not set ctx->{sq,cq}_entries yet. Then when the ring is torn down in error, we use those values to unaccount the memory. Ensure we set the ctx entries before we're able to hit a potential error path. Cc: stable@vger.kernel.org Reported-by: Tomáš Chaloupka <chalucha@gmail.com> Tested-by: Tomáš Chaloupka <chalucha@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-06Merge branch 'WIP.locking/seqlocks' into locking/urgentIngo Molnar
Pick up the full seqlock series PeterZ is working on. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-08-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan. 2) Support UDP segmentation in code TSO code, from Eric Dumazet. 3) Allow flashing different flash images in cxgb4 driver, from Vishal Kulkarni. 4) Add drop frames counter and flow status to tc flower offloading, from Po Liu. 5) Support n-tuple filters in cxgb4, from Vishal Kulkarni. 6) Various new indirect call avoidance, from Eric Dumazet and Brian Vazquez. 7) Fix BPF verifier failures on 32-bit pointer arithmetic, from Yonghong Song. 8) Support querying and setting hardware address of a port function via devlink, use this in mlx5, from Parav Pandit. 9) Support hw ipsec offload on bonding slaves, from Jarod Wilson. 10) Switch qca8k driver over to phylink, from Jonathan McDowell. 11) In bpftool, show list of processes holding BPF FD references to maps, programs, links, and btf objects. From Andrii Nakryiko. 12) Several conversions over to generic power management, from Vaibhav Gupta. 13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry Yakunin. 14) Various https url conversions, from Alexander A. Klimov. 15) Timestamping and PHC support for mscc PHY driver, from Antoine Tenart. 16) Support bpf iterating over tcp and udp sockets, from Yonghong Song. 17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov. 18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan. 19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several drivers. From Luc Van Oostenryck. 20) XDP support for xen-netfront, from Denis Kirjanov. 21) Support receive buffer autotuning in MPTCP, from Florian Westphal. 22) Support EF100 chip in sfc driver, from Edward Cree. 23) Add XDP support to mvpp2 driver, from Matteo Croce. 24) Support MPTCP in sock_diag, from Paolo Abeni. 25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic infrastructure, from Jakub Kicinski. 26) Several pci_ --> dma_ API conversions, from Christophe JAILLET. 
27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel. 28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki. 29) Refactor a lot of networking socket option handling code in order to avoid set_fs() calls, from Christoph Hellwig. 30) Add rfc4884 support to icmp code, from Willem de Bruijn. 31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei. 32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin. 33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin. 34) Support TCP syncookies in MPTCP, from Florian Westphal. 35) Fix several tricky cases of PMTU handling wrt. bridging, from Stefano Brivio. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits) net: thunderx: initialize VF's mailbox mutex before first usage usb: hso: remove bogus check for EINPROGRESS usb: hso: no complaint about kmalloc failure hso: fix bailout in error case of probe ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM selftests/net: relax cpu affinity requirement in msg_zerocopy test mptcp: be careful on subflow creation selftests: rtnetlink: make kci_test_encap() return sub-test result selftests: rtnetlink: correct the final return value for the test net: dsa: sja1105: use detected device id instead of DT one on mismatch tipc: set ub->ifindex for local ipv6 address ipv6: add ipv6_dev_find() net: openvswitch: silence suspicious RCU usage warning Revert "vxlan: fix tos value before xmit" ptp: only allow phase values lower than 1 period farsync: switch from 'pci_' to 'dma_' API wan: wanxl: switch from 'pci_' to 'dma_' API hv_netvsc: do not use VF device if link is down dpaa2-eth: Fix passing zero to 'PTR_ERR' warning net: macb: Properly handle phylink on at91sam9x ...
2020-08-05Merge tag 'driver-core-5.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the "big" set of changes to the driver core, and some drivers using the changes, for 5.9-rc1. "Biggest" thing in here is the device link exposure in sysfs, to help to tame the madness that is SoC device tree representations and driver interactions with it. Other stuff in here that is interesting is: - device probe log helper so that drivers can report problems in a unified way easier. - devres functions added - DEVICE_ATTR_ADMIN_* macro added to make it harder to write incorrect sysfs file permissions - documentation cleanups - ability for debugfs to be present in the kernel, yet not exposed to userspace. Needed for systems that want it enabled, but do not trust users, so they can still use some kernel functions that were otherwise disabled. - other minor fixes and cleanups The patches outside of drivers/base/ all have acks from the respective subsystem maintainers to go through this tree instead of theirs. All of these have been in linux-next with no reported issues" * tag 'driver-core-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (39 commits) drm/bridge: lvds-codec: simplify error handling drm/bridge/sii8620: fix resource acquisition error handling driver core: add deferring probe reason to devices_deferred property driver core: add device probe log helper driver core: Avoid binding drivers to dead devices Revert "test_firmware: Test platform fw loading on non-EFI systems" firmware_loader: EFI firmware loader must handle pre-allocated buffer selftest/firmware: Add selftest timeout in settings test_firmware: Test platform fw loading on non-EFI systems driver core: Change delimiter in devlink device's name to "--" debugfs: Add access restriction option tracefs: Remove unnecessary debug_fs checks. 
driver core: Fix probe_count imbalance in really_probe() kobject: remove unused KOBJ_MAX action driver core: Fix sleeping in invalid context during device link deletion driver core: Add waiting_for_supplier sysfs file for devices driver core: Add state_synced sysfs file for devices that support it driver core: Expose device link details in sysfs driver core: Drop mention of obsolete bus rwsem from kernel-doc debugfs: file: Remove unnecessary cast in kfree() ...
2020-08-05iomap: fall back to buffered writes for invalidation failuresChristoph Hellwig
Failing to invalidate the page cache means data is incoherent, which is a very bad state for the system. Always fall back to buffered I/O through the page cache if we can't invalidate mappings. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> # for ext4 Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> # for gfs2 Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
2020-08-05xfs: use ENOTBLK for direct I/O to buffered I/O fallbackChristoph Hellwig
This is what the classic fs/direct-io.c implementation and thus other file systems use. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-08-05iomap: Only invalidate page cache pages on direct IO writesDave Chinner
The historic requirement for XFS to invalidate cached pages on direct IO reads has been lost in the twisty pages of history - it was inherited from Irix, which implemented page cache invalidation on read as a method of working around problems synchronising page cache state with uncached IO. XFS has carried this ever since. In the initial linux ports it was necessary to get mmap and DIO to play "ok" together and not immediately corrupt data. This was the state of play until the linux kernel had infrastructure to track unwritten extents and synchronise page faults with allocations and unwritten extent conversions (->page_mkwrite infrastructure). IOWs, the page cache invalidation on DIO read was necessary to prevent trivial data corruptions. This didn't solve all the problems, though. There were performance problems if we didn't invalidate the entire page cache over the file on read - we couldn't easily determine if the cached pages were over the range of the IO, and invalidation required taking a serialising lock (i_mutex) on the inode. This serialising lock was an issue for XFS, as it was the only exclusive lock in the direct IO read path. Hence if there were any cached pages, we'd just invalidate the entire file in one go so that subsequent IOs didn't need to take the serialising lock. This was a problem that prevented ranged invalidation from being particularly useful for avoiding the remaining coherency issues. This was solved with the conversion of i_mutex to i_rwsem and the conversion of the XFS inode IO lock to use i_rwsem. Hence we could now just do ranged invalidation and the performance problem went away. However, page cache invalidation was still needed to serialise sub-page/sub-block zeroing via direct IO against buffered IO because bufferhead state attached to the cached page could get out of whack when direct IOs were issued. 
We've removed bufferheads from the XFS code, and we don't carry any extent state on the cached pages anymore, and so this problem has gone away, too. IOWs, it would appear that we don't have any good reason to be invalidating the page cache on DIO reads anymore. Hence remove the invalidation on read because it is unnecessary overhead, not needed to maintain coherency between mmap/buffered access and direct IO anymore, and prevents anyone from using direct IO reads from intentionally invalidating the page cache of a file. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-08-05xfs: delete duplicated words + other fixesRandy Dunlap
Delete repeated words in fs/xfs/. {we, that, the, a, to, fork} Change "it it" to "it is" in one location. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> To: linux-fsdevel@vger.kernel.org Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: linux-xfs@vger.kernel.org Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-08-05ceph: handle zero-length feature mask in session messagesJeff Layton
Most session messages contain a feature mask, but the MDS will routinely send a REJECT message with one that is zero-length. Commit 0fa8263367db ("ceph: fix endianness bug when handling MDS session feature bits") fixed the decoding of the feature mask, but failed to account for the MDS sending a zero-length feature mask. This causes REJECT message decoding to fail. Skip trying to decode a feature mask if the word count is zero. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/46823 Fixes: 0fa8263367db ("ceph: fix endianness bug when handling MDS session feature bits") Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Patrick Donnelly <pdonnell@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
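The "skip decoding when the count is zero" rule can be sketched as below. The wire layout used here is an assumption made for illustration (a 32-bit little-endian byte count followed by the mask bytes) and is not taken from the ceph protocol definition; `decode_feature_mask` is a hypothetical name.

```python
import struct


def decode_feature_mask(buf):
    """Decode a length-prefixed little-endian bitmask (hypothetical layout).

    Assumed layout: a 32-bit little-endian byte count, followed by that
    many bytes of mask data. A zero count yields an empty mask (0).
    """
    (length,) = struct.unpack_from("<I", buf, 0)
    if length == 0:
        # Nothing to decode; skip instead of treating this as an error.
        return 0
    payload = buf[4:4 + length]
    if len(payload) != length:
        raise ValueError("truncated feature mask")
    return int.from_bytes(payload, "little")
```

A message carrying only the zero count decodes to an empty mask, so a REJECT-style message with no feature bits is accepted rather than failing the decode.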
2020-08-05virtio_fs: convert to LE accessorsMichael S. Tsirkin
Virtio fs is modern-only. Use LE accessors for config space. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
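As a userspace illustration of what a little-endian accessor does, the sketch below reads a 32-bit field from a byte blob with an explicit byte order, so the result does not depend on the host's endianness. `read_config_le32` is a hypothetical helper, not a virtio API.

```python
import struct


def read_config_le32(config, offset):
    """Read a 32-bit field from a config blob as little-endian,
    regardless of the host's native byte order."""
    return struct.unpack_from("<I", config, offset)[0]
```

For example, the bytes `78 56 34 12` at the given offset are read as `0x12345678` on any host.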
2020-08-05io_uring: Fix NULL pointer dereference in loop_rw_iter()Guoyu Huang
loop_rw_iter() does not check whether the file has a read or write function. This can lead to NULL pointer dereference when the user passes in a file descriptor that does not have read or write function. The crash log looks like this: [ 99.834071] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 99.835364] #PF: supervisor instruction fetch in kernel mode [ 99.836522] #PF: error_code(0x0010) - not-present page [ 99.837771] PGD 8000000079d62067 P4D 8000000079d62067 PUD 79d8c067 PMD 0 [ 99.839649] Oops: 0010 [#2] SMP PTI [ 99.840591] CPU: 1 PID: 333 Comm: io_wqe_worker-0 Tainted: G D 5.8.0 #2 [ 99.842622] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 [ 99.845140] RIP: 0010:0x0 [ 99.845840] Code: Bad RIP value. [ 99.846672] RSP: 0018:ffffa1c7c01ebc08 EFLAGS: 00010202 [ 99.848018] RAX: 0000000000000000 RBX: ffff92363bd67300 RCX: ffff92363d461208 [ 99.849854] RDX: 0000000000000010 RSI: 00007ffdbf696bb0 RDI: ffff92363bd67300 [ 99.851743] RBP: ffffa1c7c01ebc40 R08: 0000000000000000 R09: 0000000000000000 [ 99.853394] R10: ffffffff9ec692a0 R11: 0000000000000000 R12: 0000000000000010 [ 99.855148] R13: 0000000000000000 R14: ffff92363d461208 R15: ffffa1c7c01ebc68 [ 99.856914] FS: 0000000000000000(0000) GS:ffff92363dd00000(0000) knlGS:0000000000000000 [ 99.858651] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 99.860032] CR2: ffffffffffffffd6 CR3: 000000007ac66000 CR4: 00000000000006e0 [ 99.861979] Call Trace: [ 99.862617] loop_rw_iter.part.0+0xad/0x110 [ 99.863838] io_write+0x2ae/0x380 [ 99.864644] ? kvm_sched_clock_read+0x11/0x20 [ 99.865595] ? sched_clock+0x9/0x10 [ 99.866453] ? sched_clock_cpu+0x11/0xb0 [ 99.867326] ? newidle_balance+0x1d4/0x3c0 [ 99.868283] io_issue_sqe+0xd8f/0x1340 [ 99.869216] ? __switch_to+0x7f/0x450 [ 99.870280] ? __switch_to_asm+0x42/0x70 [ 99.871254] ? __switch_to_asm+0x36/0x70 [ 99.872133] ? lock_timer_base+0x72/0xa0 [ 99.873155] ? 
switch_mm_irqs_off+0x1bf/0x420 [ 99.874152] io_wq_submit_work+0x64/0x180 [ 99.875192] ? kthread_use_mm+0x71/0x100 [ 99.876132] io_worker_handle_work+0x267/0x440 [ 99.877233] io_wqe_worker+0x297/0x350 [ 99.878145] kthread+0x112/0x150 [ 99.878849] ? __io_worker_unuse+0x100/0x100 [ 99.879935] ? kthread_park+0x90/0x90 [ 99.880874] ret_from_fork+0x22/0x30 [ 99.881679] Modules linked in: [ 99.882493] CR2: 0000000000000000 [ 99.883324] ---[ end trace 4453745f4673190b ]--- [ 99.884289] RIP: 0010:0x0 [ 99.884837] Code: Bad RIP value. [ 99.885492] RSP: 0018:ffffa1c7c01ebc08 EFLAGS: 00010202 [ 99.886851] RAX: 0000000000000000 RBX: ffff92363acd7f00 RCX: ffff92363d461608 [ 99.888561] RDX: 0000000000000010 RSI: 00007ffe040d9e10 RDI: ffff92363acd7f00 [ 99.890203] RBP: ffffa1c7c01ebc40 R08: 0000000000000000 R09: 0000000000000000 [ 99.891907] R10: ffffffff9ec692a0 R11: 0000000000000000 R12: 0000000000000010 [ 99.894106] R13: 0000000000000000 R14: ffff92363d461608 R15: ffffa1c7c01ebc68 [ 99.896079] FS: 0000000000000000(0000) GS:ffff92363dd00000(0000) knlGS:0000000000000000 [ 99.898017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 99.899197] CR2: ffffffffffffffd6 CR3: 000000007ac66000 CR4: 00000000000006e0 Fixes: 32960613b7c3 ("io_uring: correctly handle non ->{read,write}_iter() file_operations") Cc: stable@vger.kernel.org Signed-off-by: Guoyu Huang <hgy5945@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
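The missing check behind this crash can be modelled in userspace: before looping over the chunks, confirm the file actually provides the callback that is about to be invoked. This is a sketch with hypothetical names (`FileOps`, `loop_rw`), and returning `-EINVAL` is an assumption of the sketch rather than something stated in the message above.

```python
import errno


class FileOps:
    """Toy file_operations table; either callback may be missing (None)."""

    def __init__(self, read=None, write=None):
        self.read = read
        self.write = write


def loop_rw(ops, is_write, chunks):
    """Call the per-chunk callback for each chunk, checking it exists first.

    Returns total bytes transferred, or -EINVAL if the file has no
    matching callback (the error code is an assumption of this sketch).
    """
    fn = ops.write if is_write else ops.read
    if fn is None:
        return -errno.EINVAL
    total = 0
    for chunk in chunks:
        total += fn(chunk)
    return total
```

Without the `fn is None` check, the call through a missing callback is the analogue of the jump to address 0 shown in the oops (RIP: 0010:0x0).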
2020-08-05NFS: Add tracepoints for layouterror and layoutstats.Trond Myklebust
Allow tracing of the NFSv4.2 layouterror and layoutstats operations. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-05NFS: Report the stateid + status in trace_nfs4_layoutreturn_on_close()Trond Myklebust
Ensure we correctly report the stateid and status in the layoutreturn on close tracepoint. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-05reiserfs: delete duplicated wordsRandy Dunlap
Delete repeated words in fs/reiserfs/. {from, not, we, are} Link: https://lore.kernel.org/r/20200805024925.12281-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> To: linux-fsdevel@vger.kernel.org Cc: Jan Kara <jack@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: reiserfs-devel@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2020-08-04Merge tag 'docs-5.9' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "It's been a busy cycle for documentation - hopefully the busiest for a while to come. Changes include: - Some new Chinese translations - Progress on the battle against double words words and non-HTTPS URLs - Some block-mq documentation - More RST conversions from Mauro. At this point, that task is essentially complete, so we shouldn't see this kind of churn again for a while. Unless we decide to switch to asciidoc or something...:) - Lots of typo fixes, warning fixes, and more" * tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits) scripts/kernel-doc: optionally treat warnings as errors docs: ia64: correct typo mailmap: add entry for <alobakin@marvell.com> doc/zh_CN: add cpu-load Chinese version Documentation/admin-guide: tainted-kernels: fix spelling mistake MAINTAINERS: adjust kprobes.rst entry to new location devices.txt: document rfkill allocation PCI: correct flag name docs: filesystems: vfs: correct flag name docs: filesystems: vfs: correct sync_mode flag names docs: path-lookup: markup fixes for emphasis docs: path-lookup: more markup fixes docs: path-lookup: fix HTML entity mojibake CREDITS: Replace HTTP links with HTTPS ones docs: process: Add an example for creating a fixes tag doc/zh_CN: add Chinese translation prefer section doc/zh_CN: add clearing-warn-once Chinese version doc/zh_CN: add admin-guide index doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label futex: MAINTAINERS: Re-add selftests directory ...
2020-08-04nfs: nfs_file_write() should check for writeback errorsScott Mayhew
The NFS_CONTEXT_ERROR_WRITE flag (as well as the check of said flag) was removed by commit 6fbda89b257f. The absence of an error check allows writes to be continually queued up for a server that may no longer be able to handle them. Fix it by adding an error check using the generic error reporting functions. Fixes: 6fbda89b257f ("NFS: Replace custom error reporting mechanism with generic one") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-04init: add an init_dup helperChristoph Hellwig
Add a simple helper to grab a reference to a file and install it at the next available fd, and switch the early init code over to it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-08-04Merge tag 'close-range-v5.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull close_range() implementation from Christian Brauner: "This adds the close_range() syscall. It allows to efficiently close a range of file descriptors up to all file descriptors of a calling task. This is coordinated with the FreeBSD folks which have copied our version of this syscall and in the meantime have already merged it in April 2019: https://reviews.freebsd.org/D21627 https://svnweb.freebsd.org/base?view=revision&revision=359836 The syscall originally came up in a discussion around the new mount API and making new file descriptor types cloexec by default. During this discussion, Al suggested the close_range() syscall. First, it helps to close all file descriptors of an exec()ing task. This can be done safely via (quoting Al's example from [1] verbatim): /* that exec is sensitive */ unshare(CLONE_FILES); /* we don't want anything past stderr here */ close_range(3, ~0U); execve(....); The code snippet above is one way of working around the problem that file descriptors are not cloexec by default. This is aggravated by the fact that we can't just switch them over without massively regressing userspace. For a whole class of programs having an in-kernel method of closing all file descriptors is very helpful (e.g. daemons, service managers, programming language standard libraries, container managers etc.). Second, it allows userspace to avoid implementing closing all file descriptors by parsing through /proc/<pid>/fd/* and calling close() on each file descriptor and other hacks. From looking at various large(ish) userspace code bases this or similar patterns are very common in service managers, container runtimes, and programming language runtimes/standard libraries such as Python or Rust. In addition, the syscall will also work for tasks that do not have procfs mounted and on kernels that do not have procfs support compiled in. 
In such situations the only way to make sure that all file descriptors are closed is to call close() on each file descriptor up to UINT_MAX or RLIMIT_NOFILE, OPEN_MAX trickery. Based on Linus' suggestion close_range() also comes with a new flag CLOSE_RANGE_UNSHARE to more elegantly handle file descriptor dropping right before exec. This would usually be expressed in the sequence: unshare(CLONE_FILES); close_range(3, ~0U); as pointed out by Linus it might be desirable to have this be a part of close_range() itself under a new flag CLOSE_RANGE_UNSHARE which gets especially handy when we're closing all file descriptors above a certain threshold. Test-suite as always included" * tag 'close-range-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: tests: add CLOSE_RANGE_UNSHARE tests close_range: add CLOSE_RANGE_UNSHARE tests: add close_range() tests arch: wire-up close_range() open: add close_range()
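A userspace caller might wrap the new syscall roughly as follows. This is a minimal sketch, not code from the series: `close_fd_range()` is a hypothetical helper name, the raw `syscall()` invocation is used on the assumption that the C library may not yet provide a wrapper, and the fallback loop is the close()-up-to-a-limit approach the message describes as the only alternative without procfs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Close every descriptor in [first, last].
 * Returns 0 on success, -1 with errno set to EINVAL for a bad range.
 */
static int close_fd_range(unsigned int first, unsigned int last)
{
    if (first > last) {
        errno = EINVAL;
        return -1;
    }
#ifdef SYS_close_range
    /* Preferred path: one syscall for the whole range (flags = 0). */
    if (syscall(SYS_close_range, first, last, 0U) == 0)
        return 0;
#endif
    /*
     * Fallback when the syscall is unavailable or blocked: close each
     * descriptor individually, bounded by the open-file limit.
     */
    long open_max = sysconf(_SC_OPEN_MAX);
    if (open_max <= 0)
        open_max = 1024; /* arbitrary assumed default */
    if (open_max > INT_MAX)
        open_max = INT_MAX;
    for (unsigned long fd = first;
         fd <= last && fd < (unsigned long)open_max; fd++)
        close((int)fd);
    return 0;
}
```

Calling `close_fd_range(3, ~0U)` before `execve()` corresponds to the "nothing past stderr" pattern quoted above. The sketch passes no flags; the CLOSE_RANGE_UNSHARE flag mentioned in the message would replace the separate `unshare(CLONE_FILES)` step.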
2020-08-04Merge tag 'cap-checkpoint-restore-v5.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull checkpoint-restore updates from Christian Brauner: "This enables unprivileged checkpoint/restore of processes. Given that this work has been going on for quite some time the first sentence in this summary is hopefully more exciting than the actual final code changes required. Unprivileged checkpoint/restore has seen a frequent increase in interest over the last two years and has thus been one of the main topics for the combined containers & checkpoint/restore microconference since at least 2018 (cf. [1]). Here are just the three most frequent use-cases that were brought forward: - The JVM developers are integrating checkpoint/restore into a Java VM to significantly decrease the startup time. - In high-performance computing environment a resource manager will typically be distributing jobs where users are always running as non-root. Long-running and "large" processes with significant startup times are supposed to be checkpointed and restored with CRIU. - Container migration as a non-root user. In all of these scenarios it is either desirable or required to run without CAP_SYS_ADMIN. The userspace implementation of checkpoint/restore CRIU already has the pull request for supporting unprivileged checkpoint/restore up (cf. [2]). To enable unprivileged checkpoint/restore a new dedicated capability CAP_CHECKPOINT_RESTORE is introduced. This solution has last been discussed in 2019 in a talk by Google at Linux Plumbers (cf. [1] "Update on Task Migration at Google Using CRIU") with Adrian and Nicolas providing the implementation now over the last months. In essence, this allows the CRIU binary to be installed with the CAP_CHECKPOINT_RESTORE vfs capability set thereby enabling unprivileged users to restore processes. To make this possible the following permissions are altered: - Selecting a specific PID via clone3() set_tid relaxed from userns CAP_SYS_ADMIN to CAP_CHECKPOINT_RESTORE. 
- Selecting a specific PID via /proc/sys/kernel/ns_last_pid relaxed from userns CAP_SYS_ADMIN to CAP_CHECKPOINT_RESTORE. - Accessing /proc/pid/map_files relaxed from init userns CAP_SYS_ADMIN to init userns CAP_CHECKPOINT_RESTORE. - Changing /proc/self/exe from userns CAP_SYS_ADMIN to userns CAP_CHECKPOINT_RESTORE. Of these four changes the /proc/self/exe change deserves a few words because the reasoning behind even restricting /proc/self/exe changes in the first place is just full of historical quirks and tracking this down was a questionable version of fun that I'd like to spare others. In short, it is trivial to change /proc/self/exe as an unprivileged user, i.e. without userns CAP_SYS_ADMIN right now. Either via ptrace() or by simply intercepting the elf loader in userspace during exec. Nicolas was nice enough to even provide a POC for the latter (cf. [3]) to illustrate this fact. The original patchset which introduced PR_SET_MM_MAP had no permissions around changing the exe link. They too argued that it is trivial to spoof the exe link already which is true. The argument brought up against this was that the Tomoyo LSM uses the exe link in tomoyo_manager() to detect whether the calling process is a policy manager. This caused changing the exe links to be guarded by userns CAP_SYS_ADMIN. All in all this rather seems like a "better guard it with something rather than nothing" argument which imho doesn't qualify as a great security policy. Again, because spoofing the exe link is possible for the calling process so even if this were security relevant it was broken back then and would be broken today. So technically, dropping all permissions around changing the exe link would probably be possible and would send a clearer message to any userspace that relies on /proc/self/exe for security reasons that they should stop doing this but for now we're only relaxing the exe link permissions from userns CAP_SYS_ADMIN to userns CAP_CHECKPOINT_RESTORE. 
There's a final uapi change in here. Changing the exe link used to accidentally return EINVAL when the caller lacked the necessary permissions instead of the more correct EPERM. This PR contains a commit fixing this. I assume that userspace won't notice or care and if they do I will revert this commit. But since we are changing the permissions anyway it seems like a good opportunity to try this fix. With these changes merged unprivileged checkpoint/restore will be possible and has already been tested by various users" [1] LPC 2018 1. "Task Migration at Google Using CRIU" https://www.youtube.com/watch?v=yI_1cuhoDgA&t=12095 2. "Securely Migrating Untrusted Workloads with CRIU" https://www.youtube.com/watch?v=yI_1cuhoDgA&t=14400 LPC 2019 1. "CRIU and the PID dance" https://www.youtube.com/watch?v=LN2CUgp8deo&list=PLVsQ_xZBEyN30ZA3Pc9MZMFzdjwyz26dO&index=9&t=2m48s 2. "Update on Task Migration at Google Using CRIU" https://www.youtube.com/watch?v=LN2CUgp8deo&list=PLVsQ_xZBEyN30ZA3Pc9MZMFzdjwyz26dO&index=9&t=1h2m8s [2] https://github.com/checkpoint-restore/criu/pull/1155 [3] https://github.com/nviennot/run_as_exe * tag 'cap-checkpoint-restore-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: selftests: add clone3() CAP_CHECKPOINT_RESTORE test prctl: exe link permission error changed from -EINVAL to -EPERM prctl: Allow local CAP_CHECKPOINT_RESTORE to change /proc/self/exe proc: allow access in init userns for map_files with CAP_CHECKPOINT_RESTORE pid_namespace: use checkpoint_restore_ns_capable() for ns_last_pid pid: use checkpoint_restore_ns_capable() for set_tid capabilities: Introduce CAP_CHECKPOINT_RESTORE
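The "relaxed from CAP_SYS_ADMIN to CAP_CHECKPOINT_RESTORE" checks listed above amount to accepting either capability. The sketch below models that on a plain 64-bit effective-capability mask in userspace. It is an illustration only: the `demo_` names are hypothetical, the mask is assumed to have been obtained elsewhere (for example from the CapEff line of /proc/self/status), and the capability numbers 21 and 40 are the values from the uapi capability header to the best of my knowledge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability numbers, believed to match linux/capability.h. */
#define DEMO_CAP_SYS_ADMIN          21
#define DEMO_CAP_CHECKPOINT_RESTORE 40

/* Test one capability bit in an effective-capability mask. */
static bool demo_has_cap(uint64_t effective, int cap)
{
    assert(cap >= 0 && cap < 64);
    return (effective >> cap) & 1U;
}

/*
 * Model of the relaxed check: the new, narrower capability is enough,
 * and holders of CAP_SYS_ADMIN keep the access they already had.
 */
static bool demo_checkpoint_restore_capable(uint64_t effective)
{
    return demo_has_cap(effective, DEMO_CAP_CHECKPOINT_RESTORE) ||
           demo_has_cap(effective, DEMO_CAP_SYS_ADMIN);
}
```

The point of the design is that a CRIU binary can carry only the narrow capability while existing privileged callers are unaffected.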
2020-08-04Merge branch 'exec-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull execve updates from Eric Biederman: "During the development of v5.7 I ran into bugs and quality of implementation issues related to exec that could not be easily fixed because of the way exec is implemented. So I have been digging into exec and cleaning up what I can. This cycle I have been looking at different ideas and different implementations to see what is possible to improve exec, and cleaning the way exec interfaces with in-kernel users. Only cleaning up the interfaces of exec with the rest of the kernel has managed to stabilize and make it through review in time for v5.9-rc1 resulting in 2 sets of changes this cycle. - Implement kernel_execve - Make the user mode driver code a better citizen With kernel_execve the code size got a little larger as the copying of parameters from userspace and copying of parameters from kernel space is now separate. The good news is kernel threads no longer need to play games with set_fs to use exec. 
Which when combined with the rest of Christoph's set_fs changes should make security bugs with set_fs much more difficult" * 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits) exec: Implement kernel_execve exec: Factor bprm_stack_limits out of prepare_arg_pages exec: Factor bprm_execve out of do_execve_common exec: Move bprm_mm_init into alloc_bprm exec: Move initialization of bprm->filename into alloc_bprm exec: Factor out alloc_bprm exec: Remove unnecessary spaces from binfmts.h umd: Stop using split_argv umd: Remove exit_umh bpfilter: Take advantage of the facilities of struct pid exit: Factor thread_group_exited out of pidfd_poll umd: Track user space drivers with struct pid bpfilter: Move bpfilter_umh back into init data exec: Remove do_execve_file umh: Stop calling do_execve_file umd: Transform fork_usermode_blob into fork_usermode_driver umd: Rename umd_info.cmdline umd_info.driver_name umd: For clarity rename umh_info umd_info umh: Separate the user mode driver and the user mode helper support umh: Remove call_usermodehelper_setup_file. ...
2020-08-04Merge tag 'seccomp-v5.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: "There are a bunch of clean ups and selftest improvements along with two major updates to the SECCOMP_RET_USER_NOTIF filter return: EPOLLHUP support to more easily detect the death of a monitored process, and being able to inject fds when intercepting syscalls that expect an fd-opening side-effect (needed by both container folks and Chrome). The latter continued the refactoring of __scm_install_fd() started by Christoph, and in the process found and fixed a handful of bugs in various callers. - Improved selftest coverage, timeouts, and reporting - Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner) - Refactor __scm_install_fd() into __receive_fd() and fix buggy callers - Introduce 'addfd' command for SECCOMP_RET_USER_NOTIF (Sargun Dhillon)" * tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits) selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD seccomp: Introduce addfd ioctl to seccomp user notifier fs: Expand __receive_fd() to accept existing fd pidfd: Replace open-coded receive_fd() fs: Add receive_fd() wrapper for __receive_fd() fs: Move __scm_install_fd() to __receive_fd() net/scm: Regularize compat handling of scm_detach_fds() pidfd: Add missing sock updates for pidfd_getfd() net/compat: Add missing sock updates for SCM_RIGHTS selftests/seccomp: Check ENOSYS under tracing selftests/seccomp: Refactor to use fixture variants selftests/harness: Clean up kern-doc for fixtures seccomp: Use -1 marker for end of mode 1 syscall list seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID selftests/seccomp: Rename user_trap_syscall() to user_notif_syscall() selftests/seccomp: Make kcmp() less required seccomp: Use pr_fmt selftests/seccomp: Improve calibration loop selftests/seccomp: use 90s as timeout selftests/seccomp: Expand benchmark to per-filter measurements ...
2020-08-04Merge tag 'uninit-macro-v5.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull uninitialized_var() macro removal from Kees Cook: "This is long overdue, and has hidden too many bugs over the years. The series has several "by hand" fixes, and then a trivial treewide replacement. - Clean up non-trivial uses of uninitialized_var() - Update documentation and checkpatch for uninitialized_var() removal - Treewide removal of uninitialized_var()" * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: compiler: Remove uninitialized_var() macro treewide: Remove uninitialized_var() usage checkpatch: Remove awareness of uninitialized_var() macro mm/debug_vm_pgtable: Remove uninitialized_var() usage f2fs: Eliminate usage of uninitialized_var() macro media: sur40: Remove uninitialized_var() usage KVM: PPC: Book3S PR: Remove uninitialized_var() usage clk: spear: Remove uninitialized_var() usage clk: st: Remove uninitialized_var() usage spi: davinci: Remove uninitialized_var() usage ide: Remove uninitialized_var() usage rtlwifi: rtl8192cu: Remove uninitialized_var() usage b43: Remove uninitialized_var() usage drbd: Remove uninitialized_var() usage x86/mm/numa: Remove uninitialized_var() usage docs: deprecated.rst: Add uninitialized_var()
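For readers unfamiliar with the macro: as far as I recall it expanded to a self-assignment (`x = x`), which silenced the compiler's "may be used uninitialized" warning without giving the variable a value. The usual replacement is an explicit initializer, sketched below. `first_negative_index()` is a hypothetical function made up for this illustration and does not come from any of the listed commits.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the first negative element, or -1 if none.
 *
 * With the removed macro one might have declared
 *     int uninitialized_var(index);
 * and relied on every path assigning it. The explicit initializer
 * below makes the "not found" result well defined instead.
 */
static int first_negative_index(const int *values, size_t count)
{
    int index = -1; /* explicit initial value replaces the macro */

    assert(values != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (values[i] < 0) {
            index = (int)i;
            break;
        }
    }
    return index;
}
```

If the loop never assigns `index`, the caller still gets a defined value, which is the class of bug the macro could hide.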
2020-08-04Merge tag 'pstore-v5.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore update from Kees Cook: "A tiny pstore update which fixes a very corner-case build failure: - Fix linking when crypto API disabled (Matteo Croce)" * tag 'pstore-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore: Fix linking when crypto API disabled
2020-08-04orangefs: remove unnecessary assignment to variable retJing Xiangfeng
The variable ret is guaranteed to be 0 in this if (), so we can remove this assignment. Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2020-08-04ceph: use frag's MDS in either modeYanhu Cao
When doing some tests with multiple mds, we were seeing many mds forwarding requests between them, causing clients to resend. If the request is a modification operation and the mode is set to USE_AUTH_MDS, then the auth mds should be selected to handle the request. If auth mds for frag is already set, then it should be returned directly without further processing. The current logic is wrong because it only returns directly if mode is USE_AUTH_MDS, but we want to do that for all modes. If we don't, then when the frag's mds is not equal to cap session's mds, the request will get sent to the wrong MDS needlessly. Drop the mode check in this condition. Signed-off-by: Yanhu Cao <gmayyyha@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
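The change in selection logic can be pictured with a small userspace model. This is a sketch of the behaviour as the message describes it, not the ceph client code: the `demo_` functions, their parameters, and the convention that a negative `frag_mds` means "not set" are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical request modes, named after the one in the message. */
enum demo_mds_mode { DEMO_USE_ANY_MDS, DEMO_USE_RANDOM_MDS, DEMO_USE_AUTH_MDS };

/* Behaviour described as wrong: the frag's MDS is honoured only in auth mode. */
static int demo_choose_mds_old(enum demo_mds_mode mode, int frag_mds,
                               int cap_session_mds)
{
    if (mode == DEMO_USE_AUTH_MDS && frag_mds >= 0)
        return frag_mds;
    return cap_session_mds;
}

/* Behaviour after dropping the mode check: a known frag MDS always wins. */
static int demo_choose_mds_new(enum demo_mds_mode mode, int frag_mds,
                               int cap_session_mds)
{
    (void)mode; /* the mode no longer influences this decision */
    if (frag_mds >= 0)
        return frag_mds;
    return cap_session_mds;
}
```

In the model, a request in a non-auth mode with frag MDS 2 and cap session MDS 5 goes to 5 under the old rule and to 2 under the new one, which is the needless forwarding the commit aims to avoid.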
2020-08-04ceph: move sb->wb_pagevec_pool to be a global mempoolJeff Layton
When doing some testing recently, I hit some page allocation failures on mount, when creating the wb_pagevec_pool for the mount. That requires 128k (32 contiguous pages), and after thrashing the memory during an xfstests run, sometimes that would fail. 128k for each mount seems like a lot to hold in reserve for a rainy day, so let's change this to a global mempool that gets allocated when the module is plugged in. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-04ceph: set sec_context xattr on symlink creationJeff Layton
Symlink inodes should have the security context set in their xattrs on creation. We already set the context on creation, but we don't attach the pagelist. The effect is that symlink inodes don't get an SELinux context set on them at creation, so they end up unlabeled instead of inheriting the proper context. Make it do so. Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03f2fs: prepare a waiter before entering io_scheduleJaegeuk Kim
This is to avoid sleep() in the waiter thread. [ 20.157753] ------------[ cut here ]------------ [ 20.158393] do not call blocking ops when !TASK_RUNNING; state=2 set at [<0000000096354225>] prepare_to_wait+0xcd/0x430 [ 20.159858] WARNING: CPU: 1 PID: 1152 at kernel/sched/core.c:7142 __might_sleep+0x149/0x1a0 ... [ 20.176110] __submit_merged_write_cond+0x191/0x310 [ 20.176739] f2fs_submit_merged_write+0x18/0x20 [ 20.177323] f2fs_wait_on_all_pages+0x269/0x2d0 [ 20.177899] ? block_operations+0x980/0x980 [ 20.178441] ? __kasan_check_read+0x11/0x20 [ 20.178975] ? finish_wait+0x260/0x260 [ 20.179488] ? percpu_counter_set+0x147/0x230 [ 20.180049] do_checkpoint+0x1757/0x2a50 [ 20.180558] f2fs_write_checkpoint+0x840/0xaf0 [ 20.181126] f2fs_sync_fs+0x287/0x4a0 Reported-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-08-03Merge tag 'pm-5.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "The most significant change here is the extension of the Energy Model to cover non-CPU devices (as well as CPUs) from Lukasz Luba. There is also some new hardware support (Ice Lake server idle states table for intel_idle, Sapphire Rapids and Power Limit 4 support in the RAPL driver), some new functionality in the existing drivers (eg. a new switch to disable/enable CPU energy-efficiency optimizations in intel_pstate, delayed timers in devfreq), some assorted fixes (cpufreq core, intel_pstate, intel_idle) and cleanups (eg. cpuidle-psci, devfreq), including the elimination of W=1 build warnings from cpufreq done by Lee Jones. Specifics: - Make the Energy Model cover non-CPU devices (Lukasz Luba). - Add Ice Lake server idle states table to the intel_idle driver and eliminate a redundant static variable from it (Chen Yu, Rafael Wysocki). - Eliminate all W=1 build warnings from cpufreq (Lee Jones). - Add support for Sapphire Rapids and for Power Limit 4 to the Intel RAPL power capping driver (Sumeet Pawnikar, Zhang Rui). - Fix function name in kerneldoc comments in the idle_inject power capping driver (Yangtao Li). - Fix locking issues with cpufreq governors and drop a redundant "weak" function definition from cpufreq (Viresh Kumar). - Rearrange cpufreq to register non-modular governors at the core_initcall level and allow the default cpufreq governor to be specified in the kernel command line (Quentin Perret). - Extend, fix and clean up the intel_pstate driver (Srinivas Pandruvada, Rafael Wysocki): * Add a new sysfs attribute for disabling/enabling CPU energy-efficiency optimizations in the processor. * Make the driver avoid enabling HWP if EPP is not supported. * Allow the driver to handle numeric EPP values in the sysfs interface and fix the setting of EPP via sysfs in the active mode. * Eliminate a static checker warning and clean up a kerneldoc comment. 
- Clean up some variable declarations in the powernv cpufreq driver (Wei Yongjun). - Fix up the ->enter_s2idle callback definition to cover the case when it points to the same function as ->idle correctly (Neal Liu). - Rearrange and clean up the PSCI cpuidle driver (Ulf Hansson). - Make the PM core emit "changed" uevent when adding/removing the "wakeup" sysfs attribute of devices (Abhishek Pandit-Subedi). - Add a helper macro for declaring PM callbacks and use it in the MMC jz4740 driver (Paul Cercueil). - Fix white space in some places in the hibernate code and make the system-wide PM code use "const char *" where appropriate (Xiang Chen, Alexey Dobriyan). - Add one more "unsafe" helper macro to the freezer to cover the NFS use case (He Zhe). - Change the language in the generic PM domains framework to use parent/child terminology and clean up a typo and some comment formatting in that code (Kees Cook, Geert Uytterhoeven). - Update the operating performance points OPP framework (Lukasz Luba, Andrew-sh.Cheng, Valdis Kletnieks): * Refactor dev_pm_opp_of_register_em() and update related drivers. * Add a missing function export. * Allow disabled OPPs in dev_pm_opp_get_freq(). - Update devfreq core and drivers (Chanwoo Choi, Lukasz Luba, Enric Balletbo i Serra, Dmitry Osipenko, Kieran Bingham, Marc Zyngier): * Add support for delayed timers to the devfreq core and make the Samsung exynos5422-dmc driver use it. * Unify sysfs interface to use "df-" as a prefix in instance names consistently. * Fix devfreq_summary debugfs node indentation. * Add the rockchip,pmu phandle to the rk3399_dmc driver DT bindings. * List Dmitry Osipenko as the Tegra devfreq driver maintainer. * Fix typos in the core devfreq code. - Update the pm-graph utility to version 5.7 including a number of fixes related to suspend-to-idle (Todd Brandt). - Fix coccicheck errors and warnings in the cpupower utility (Shuah Khan). - Replace HTTP links with HTTPs ones in multiple places (Alexander A. 
Klimov)" * tag 'pm-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (71 commits) cpuidle: ACPI: fix 'return' with no value build warning cpufreq: intel_pstate: Fix EPP setting via sysfs in active mode cpufreq: intel_pstate: Rearrange the storing of new EPP values intel_idle: Customize IceLake server support PM / devfreq: Fix the wrong end with semicolon PM / devfreq: Fix indentaion of devfreq_summary debugfs node PM / devfreq: Clean up the devfreq instance name in sysfs attr memory: samsung: exynos5422-dmc: Add module param to control IRQ mode memory: samsung: exynos5422-dmc: Adjust polling interval and uptreshold memory: samsung: exynos5422-dmc: Use delayed timer as default PM / devfreq: Add support delayed timer for polling mode dt-bindings: devfreq: rk3399_dmc: Add rockchip,pmu phandle PM / devfreq: tegra: Add Dmitry as a maintainer PM / devfreq: event: Fix trivial spelling PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent cpuidle: change enter_s2idle() prototype cpuidle: psci: Prevent domain idlestates until consumers are ready cpuidle: psci: Convert PM domain to platform driver cpuidle: psci: Fix error path via converting to a platform driver cpuidle: psci: Fail cpuidle registration if set OSI mode failed ...
2020-08-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-08-04 The following pull-request contains BPF updates for your *net-next* tree. We've added 73 non-merge commits during the last 9 day(s) which contain a total of 135 files changed, 4603 insertions(+), 1013 deletions(-). The main changes are: 1) Implement bpf_link support for XDP. Also add LINK_DETACH operation for the BPF syscall allowing processes with BPF link FD to force-detach, from Andrii Nakryiko. 2) Add BPF iterator for map elements and to iterate all BPF programs for efficient in-kernel inspection, from Yonghong Song and Alexei Starovoitov. 3) Separate bpf_get_{stack,stackid}() helpers for perf events in BPF to avoid unwinder errors, from Song Liu. 4) Allow cgroup local storage map to be shared between programs on the same cgroup. Also extend BPF selftests with coverage, from YiFei Zhu. 5) Add BPF exception tables to ARM64 JIT in order to be able to JIT BPF_PROBE_MEM load instructions, from Jean-Philippe Brucker. 6) Follow-up fixes on BPF socket lookup in combination with reuseport group handling. Also add related BPF selftests, from Jakub Sitnicki. 7) Allow to use socket storage in BPF_PROG_TYPE_CGROUP_SOCK-typed programs for socket create/release as well as bind functions, from Stanislav Fomichev. 8) Fix an info leak in xsk_getsockopt() when retrieving XDP stats via old struct xdp_statistics, from Peilin Ye. 9) Fix PT_REGS_RC{,_CORE}() macros in libbpf for MIPS arch, from Jerry Crunchtime. 10) Extend BPF kernel test infra with skb->family and skb->{local,remote}_ip{4,6} fields and allow user space to specify skb->dev via ifindex, from Dmitry Yakunin. 11) Fix a bpftool segfault due to missing program type name and make it more robust to prevent them in future gaps, from Quentin Monnet. 12) Consolidate cgroup helper functions across selftests and fix a v6 localhost resolver issue, from John Fastabend. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>