summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-02-22regulatory: add NUL to request alpha2Johannes Berg
Similar to the ancient commit a5fe8e7695dc ("regulatory: add NUL to alpha2"), add another byte to alpha2 in the request struct so that when we use nla_put_string(), we don't overrun anything. Fixes: 73d54c9e74c4 ("cfg80211: add regulatory netlink multicast group") Reported-by: Kees Cook <keescook@google.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-02-22Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Doug Ledford: "Nothing in this is overly interesting, it's mostly your garden variety fixes. There was some work in this merge cycle around the new ioctl kABI, so there are fixes in here related to that (probably with more to come). We've also recently added new netlink support with a goal of moving the primary means of configuring the entire subsystem to netlink (eventually, this is a long term project), so there are fixes for that. Then a few bnxt_re driver fixes, and a few minor WARN_ON removals, and that covers this pull request. There are already a few more fixes on the list as of this morning, so there will certainly be more to come in this rc cycle ;-) Summary: - Lots of fixes for the new IOCTL interface and general uverbs flow. Found through testing and syzkaller - Bugfixes for the new resource track netlink reporting - Remove some unneeded WARN_ONs that were triggering for some users in IPoIB - Various fixes for the bnxt_re driver" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (27 commits) RDMA/uverbs: Fix kernel panic while using XRC_TGT QP type RDMA/bnxt_re: Avoid system hang during device un-reg RDMA/bnxt_re: Fix system crash during load/unload RDMA/bnxt_re: Synchronize destroy_qp with poll_cq RDMA/bnxt_re: Unpin SQ and RQ memory if QP create fails RDMA/bnxt_re: Disable atomic capability on bnxt_re adapters RDMA/restrack: don't use uaccess_kernel() RDMA/verbs: Check existence of function prior to accessing it RDMA/vmw_pvrdma: Fix usage of user response structures in ABI file RDMA/uverbs: Sanitize user entered port numbers prior to access it RDMA/uverbs: Fix circular locking dependency RDMA/uverbs: Fix bad unlock balance in ib_uverbs_close_xrcd RDMA/restrack: Increment CQ restrack object before committing RDMA/uverbs: Protect from command mask overflow IB/uverbs: Fix unbalanced unlock on error path for rdma_explicit_destroy IB/uverbs: Improve lockdep_check RDMA/uverbs: Protect from races between lookup and destroy of uobjects IB/uverbs: Hold the uobj write lock after allocate IB/uverbs: Fix possible oops with duplicate ioctl attributes IB/uverbs: Add ioctl support for 32bit processes ...
2018-02-22Merge tag 'riscv-for-linus-4.16-rc3-riscv_cleanups' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux Pull RISC-V cleanups from Palmer Dabbelt: "This contains a handful of small cleanups. The only functional change is that IRQs are now enabled during exception handling, which was found when some warnings triggered with `CONFIG_DEBUG_ATOMIC_SLEEP=y`. The remaining fixes should have no functional change: `sbi_save()` has been renamed to `parse_dtb()` reflect what it actually does, and a handful of unused Kconfig entries have been removed" * tag 'riscv-for-linus-4.16-rc3-riscv_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: Rename sbi_save to parse_dtb to improve code readability RISC-V: Enable IRQ during exception handling riscv: Remove ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE select riscv: kconfig: Remove RISCV_IRQ_INTC select riscv: Remove ARCH_WANT_OPTIONAL_GPIOLIB select
2018-02-22ibmvnic: Fix early release of login bufferThomas Falcon
The login buffer is released before the driver can perform sanity checks between resources the driver requested and what firmware will provide. Don't release the login buffer until the sanity check is performed. Fixes: 34f0f4e3f488 ("ibmvnic: Fix login buffer memory leaks") Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-22net/smc9194: Remove bogus CONFIG_MAC referenceFinn Thain
AFAIK the only version of smc9194.c with Mac support is the one in the linux-mac68k CVS repo, which never made it to the mainline. Despite that, from v2.3.45, arch/m68k/config.in listed CONFIG_SMC9194 under CONFIG_MAC. This mistake got carried over into Kconfig in v2.5.55. (See pre-git era "[PATCH] add m68k dependencies to net driver config".) Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-22net: ipv4: Set addr_type in hash_keys for forwarded caseDavid Ahern
The result of the skb flow dissect is copied from keys to hash_keys to ensure only the intended data is hashed. The original L4 hash patch overlooked setting the addr_type for this case; add it. Fixes: bf4e0a3db97eb ("net: ipv4: add support for ECMP hash policy choice") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-22tcp_bbr: better deal with suboptimal GSOEric Dumazet
BBR uses tcp_tso_autosize() in an attempt to probe what would be the burst sizes and to adjust cwnd in bbr_target_cwnd() with following gold formula : /* Allow enough full-sized skbs in flight to utilize end systems. */ cwnd += 3 * bbr->tso_segs_goal; But GSO can be lacking or be constrained to very small units (ip link set dev ... gso_max_segs 2) What we really want is to have enough packets in flight so that both GSO and GRO are efficient. So in the case GSO is off or downgraded, we still want to have the same number of packets in flight as if GSO/TSO was fully operational, so that GRO can hopefully be working efficiently. To fix this issue, we make tcp_tso_autosize() unaware of sk->sk_gso_max_segs Only tcp_tso_segs() has to enforce the gso_max_segs limit. Tested: ethtool -K eth0 tso off gso off tc qd replace dev eth0 root pfifo_fast Before patch: for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done     691  (ss -temoi shows cwnd is stuck around 6 )     667     651     631     517 After patch : # for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done    1733 (ss -temoi shows cwnd is around 386 )    1778    1746    1781    1718 Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-22smsc75xx: fix smsc75xx_set_features()Eric Dumazet
If an attempt is made to disable RX checksums, USB adapter is changed but netdev->features is not, because smsc75xx_set_features() returns a non zero value. This throws errors from netdev_rx_csum_fault() : <devname>: hw csum failure Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steve Glendinning <steve.glendinning@shawell.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-22netlink: put module reference if dump start failsJason A. Donenfeld
Before, if cb->start() failed, the module reference would never be put, because cb->cb_running is intentionally false at this point. Users are generally annoyed by this because they can no longer unload modules that leak references. Also, it may be possible to tediously wrap a reference counter back to zero, especially since module.c still uses atomic_inc instead of refcount_inc. This patch expands the error path to simply call module_put if cb->start() fails. Fixes: 41c87425a1ac ("netlink: do not set cb_running if dump's start() errs") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-22Merge tag 'seccomp-v4.16-rc3' of ↵James Morris
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux into fixes-v4.16-rc3 - Fix seccomp GET_METADATA to deal with field sizes correctly (Tycho Andersen) - Add selftest to make sure GET_METADATA doesn't regress (Tycho Andersen)
2018-02-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "16 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: don't defer struct page initialization for Xen pv guests lib/Kconfig.debug: enable RUNTIME_TESTING_MENU vmalloc: fix __GFP_HIGHMEM usage for vmalloc_32 on 32b systems selftests/memfd: add run_fuse_test.sh to TEST_FILES bug.h: work around GCC PR82365 in BUG() mm/swap.c: make functions and their kernel-doc agree (again) mm/zpool.c: zpool_evictable: fix mismatch in parameter name and kernel-doc ida: do zeroing in ida_pre_get() mm, swap, frontswap: fix THP swap if frontswap enabled certs/blacklist_nohashes.c: fix const confusion in certs blacklist kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE mm, mlock, vmscan: no more skipping pagevecs mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats Kbuild: always define endianess in kconfig.h include/linux/sched/mm.h: re-inline mmdrop() tools: fix cross-compile var clobbering
2018-02-22efivarfs: Limit the rate for non-root to read filesLuck, Tony
Each read from a file in efivarfs results in two calls to EFI (one to get the file size, another to get the actual data). On X86 these EFI calls result in broadcast system management interrupts (SMI) which affect performance of the whole system. A malicious user can loop performing reads from efivarfs bringing the system to its knees. Linus suggested per-user rate limit to solve this. So we add a ratelimit structure to "user_struct" and initialize it for the root user for no limit. When allocating user_struct for other users we set the limit to 100 per second. This could be used for other places that want to limit the rate of some detrimental user action. In efivarfs if the limit is exceeded when reading, we take an interruptible nap for 50ms and check the rate limit again. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-22kconfig.h: Include compiler types to avoid missed struct attributesKees Cook
The header files for some structures could get included in such a way that struct attributes (specifically __randomize_layout from path.h) would be parsed as variable names instead of attributes. This could lead to some instances of a structure being unrandomized, causing nasty GPFs, etc. This patch makes sure the compiler_types.h header is included in kconfig.h so that we've always got types and struct attributes defined, since kconfig.h is included from the compiler command line. Reported-by: Patrick McLean <chutzpah@gentoo.org> Root-caused-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Fixes: 3859a271a003 ("randstruct: Mark various structs for randomization") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-22NFS: make struct nlmclnt_fl_close_lock_ops staticColin Ian King
The structure nlmclnt_fl_close_lock_ops s local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: fs/nfs/nfs3proc.c:876:33: warning: symbol 'nlmclnt_fl_close_lock_ops' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-02-22nfs: system crashes after NFS4ERR_MOVED recoveryBill.Baker@oracle.com
nfs4_update_server unconditionally releases the nfs_client for the source server. If migration fails, this can cause the source server's nfs_client struct to be left with a low reference count, resulting in use-after-free. Also, adjust reference count handling for ELOOP. NFS: state manager: migration failed on NFSv4 server nfsvmu10 with error 6 WARNING: CPU: 16 PID: 17960 at fs/nfs/client.c:281 nfs_put_client+0xfa/0x110 [nfs]() nfs_put_client+0xfa/0x110 [nfs] nfs4_run_state_manager+0x30/0x40 [nfsv4] kthread+0xd8/0xf0 BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8 nfs4_xdr_enc_write+0x6b/0x160 [nfsv4] rpcauth_wrap_req+0xac/0xf0 [sunrpc] call_transmit+0x18c/0x2c0 [sunrpc] __rpc_execute+0xa6/0x490 [sunrpc] rpc_async_schedule+0x15/0x20 [sunrpc] process_one_work+0x160/0x470 worker_thread+0x112/0x540 ? rescuer_thread+0x3f0/0x3f0 kthread+0xd8/0xf0 This bug was introduced by 32e62b7c ("NFS: Add nfs4_update_server"), but the fix applies cleanly to 52442f9b ("NFS4: Avoid migration loops") Reported-by: Helen Chao <helen.chao@oracle.com> Fixes: 52442f9b11b7 ("NFS4: Avoid migration loops") Signed-off-by: Bill Baker <bill.baker@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-02-22x86: Treat R_X86_64_PLT32 as R_X86_64_PC32H.J. Lu
On i386, there are 2 types of PLTs, PIC and non-PIC. PIE and shared objects must use PIC PLT. To use PIC PLT, you need to load _GLOBAL_OFFSET_TABLE_ into EBX first. There is no need for that on x86-64 since x86-64 uses PC-relative PLT. On x86-64, for 32-bit PC-relative branches, we can generate PLT32 relocation, instead of PC32 relocation, which can also be used as a marker for 32-bit PC-relative branches. Linker can always reduce PLT32 relocation to PC32 if function is defined locally. Local functions should use PC32 relocation. As far as Linux kernel is concerned, R_X86_64_PLT32 can be treated the same as R_X86_64_PC32 since Linux kernel doesn't use PLT. R_X86_64_PLT32 for 32-bit PC-relative branches has been enabled in binutils master branch which will become binutils 2.31. [ hjl is working on having better documentation on this all, but a few more notes from him: "PLT32 relocation is used as marker for PC-relative branches. Because of EBX, it looks odd to generate PLT32 relocation on i386 when EBX doesn't have GOT. As for symbol resolution, PLT32 and PC32 relocations are almost interchangeable. But when linker sees PLT32 relocation against a protected symbol, it can resolved locally at link-time since it is used on a branch instruction. Linker can't do that for PC32 relocation" but for the kernel use, the two are basically the same, and this commit gets things building and working with the current binutils master - Linus ] Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-22KEYS: Use individual pages in big_key for crypto buffersDavid Howells
kmalloc() can't always allocate large enough buffers for big_key to use for crypto (1MB + some metadata) so we cannot use that to allocate the buffer. Further, vmalloc'd pages can't be passed to sg_init_one() and the aead crypto accessors cannot be called progressively and must be passed all the data in one go (which means we can't pass the data in one block at a time). Fix this by allocating the buffer pages individually and passing them through a multientry scatterlist to the crypto layer. This has the bonus advantage that we don't have to allocate a contiguous series of pages. We then vmap() the page list and pass that through to the VFS read/write routines. This can trigger a warning: WARNING: CPU: 0 PID: 60912 at mm/page_alloc.c:3883 __alloc_pages_nodemask+0xb7c/0x15f8 ([<00000000002acbb6>] __alloc_pages_nodemask+0x1ee/0x15f8) [<00000000002dd356>] kmalloc_order+0x46/0x90 [<00000000002dd3e0>] kmalloc_order_trace+0x40/0x1f8 [<0000000000326a10>] __kmalloc+0x430/0x4c0 [<00000000004343e4>] big_key_preparse+0x7c/0x210 [<000000000042c040>] key_create_or_update+0x128/0x420 [<000000000042e52c>] SyS_add_key+0x124/0x220 [<00000000007bba2c>] system_call+0xc4/0x2b0 from the keyctl/padd/useradd test of the keyutils testsuite on s390x. Note that it might be better to shovel data through in page-sized lumps instead as there's no particular need to use a monolithic buffer unless the kernel itself wants to access the data. Fixes: 13100a72f40f ("Security: Keys: Big keys stored encrypted") Reported-by: Paul Bunyan <pbunyan@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Kirill Marinushkin <k.marinushkin@gmail.com>
2018-02-22X.509: fix NULL dereference when restricting key with unsupported_sigEric Biggers
The asymmetric key type allows an X.509 certificate to be added even if its signature's hash algorithm is not available in the crypto API. In that case 'payload.data[asym_auth]' will be NULL. But the key restriction code failed to check for this case before trying to use the signature, resulting in a NULL pointer dereference in key_or_keyring_common() or in restrict_link_by_signature(). Fix this by returning -ENOPKG when the signature is unsupported. Reproducer when all the CONFIG_CRYPTO_SHA512* options are disabled and keyctl has support for the 'restrict_keyring' command: keyctl new_session keyctl restrict_keyring @s asymmetric builtin_trusted openssl req -new -sha512 -x509 -batch -nodes -outform der \ | keyctl padd asymmetric desc @s Fixes: a511e1af8b12 ("KEYS: Move the point of trust determination to __key_link()") Cc: <stable@vger.kernel.org> # v4.7+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-02-22X.509: fix BUG_ON() when hash algorithm is unsupportedEric Biggers
The X.509 parser mishandles the case where the certificate's signature's hash algorithm is not available in the crypto API. In this case, x509_get_sig_params() doesn't allocate the cert->sig->digest buffer; this part seems to be intentional. However, public_key_verify_signature() is still called via x509_check_for_self_signed(), which triggers the 'BUG_ON(!sig->digest)'. Fix this by making public_key_verify_signature() return -ENOPKG if the hash buffer has not been allocated. Reproducer when all the CONFIG_CRYPTO_SHA512* options are disabled: openssl req -new -sha512 -x509 -batch -nodes -outform der \ | keyctl padd asymmetric desc @s Fixes: 6c2dc5ae4ab7 ("X.509: Extract signature digest and make self-signed cert checks earlier") Reported-by: Paolo Valente <paolo.valente@linaro.org> Cc: Paolo Valente <paolo.valente@linaro.org> Cc: <stable@vger.kernel.org> # v4.7+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-02-22PKCS#7: fix direct verification of SignerInfo signatureEric Biggers
If none of the certificates in a SignerInfo's certificate chain match a trusted key, nor is the last certificate signed by a trusted key, then pkcs7_validate_trust_one() tries to check whether the SignerInfo's signature was made directly by a trusted key. But, it actually fails to set the 'sig' variable correctly, so it actually verifies the last signature seen. That will only be the SignerInfo's signature if the certificate chain is empty; otherwise it will actually be the last certificate's signature. This is not by itself a security problem, since verifying any of the certificates in the chain should be sufficient to verify the SignerInfo. Still, it's not working as intended so it should be fixed. Fix it by setting 'sig' correctly for the direct verification case. Fixes: 757932e6da6d ("PKCS#7: Handle PKCS#7 messages that contain no X.509 certs") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-02-22PKCS#7: fix certificate blacklistingEric Biggers
If there is a blacklisted certificate in a SignerInfo's certificate chain, then pkcs7_verify_sig_chain() sets sinfo->blacklisted and returns 0. But, pkcs7_verify() fails to handle this case appropriately, as it actually continues on to the line 'actual_ret = 0;', indicating that the SignerInfo has passed verification. Consequently, PKCS#7 signature verification ignores the certificate blacklist. Fix this by not considering blacklisted SignerInfos to have passed verification. Also fix the function comment with regards to when 0 is returned. Fixes: 03bb79315ddc ("PKCS#7: Handle blacklisted certificates") Cc: <stable@vger.kernel.org> # v4.12+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-02-22PKCS#7: fix certificate chain verificationEric Biggers
When pkcs7_verify_sig_chain() is building the certificate chain for a SignerInfo using the certificates in the PKCS#7 message, it is passing the wrong arguments to public_key_verify_signature(). Consequently, when the next certificate is supposed to be used to verify the previous certificate, the next certificate is actually used to verify itself. An attacker can use this bug to create a bogus certificate chain that has no cryptographic relationship between the beginning and end. Fortunately I couldn't quite find a way to use this to bypass the overall signature verification, though it comes very close. Here's the reasoning: due to the bug, every certificate in the chain beyond the first actually has to be self-signed (where "self-signed" here refers to the actual key and signature; an attacker might still manipulate the certificate fields such that the self_signed flag doesn't actually get set, and thus the chain doesn't end immediately). But to pass trust validation (pkcs7_validate_trust()), either the SignerInfo or one of the certificates has to actually be signed by a trusted key. Since only self-signed certificates can be added to the chain, the only way for an attacker to introduce a trusted signature is to include a self-signed trusted certificate. But, when pkcs7_validate_trust_one() reaches that certificate, instead of trying to verify the signature on that certificate, it will actually look up the corresponding trusted key, which will succeed, and then try to verify the *previous* certificate, which will fail. Thus, disaster is narrowly averted (as far as I could tell). Fixes: 6c2dc5ae4ab7 ("X.509: Extract signature digest and make self-signed cert checks earlier") Cc: <stable@vger.kernel.org> # v4.7+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-02-22selftests/bpf/test_maps: exit child process without error in ENOMEM caseLi Zhijian
test_maps contains a series of stress tests, and previously it will break the rest tests when it failed to alloc memory. ----------------------- Failed to create hashmap key=8 value=262144 'Cannot allocate memory' Failed to create hashmap key=16 value=262144 'Cannot allocate memory' Failed to create hashmap key=8 value=262144 'Cannot allocate memory' Failed to create hashmap key=8 value=262144 'Cannot allocate memory' test_maps: test_maps.c:955: run_parallel: Assertion `status == 0' failed. Aborted not ok 1..3 selftests: test_maps [FAIL] ----------------------- after this patch, the rest tests will be continue when it occurs an ENOMEM failure CC: Alexei Starovoitov <alexei.starovoitov@gmail.com> CC: Philip Li <philip.li@intel.com> Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Li Zhijian <zhijianx.li@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-22arm64: Enforce BBM for huge IO/VMAP mappingsWill Deacon
ioremap_page_range doesn't honour break-before-make and attempts to put down huge mappings (using p*d_set_huge) over the top of pre-existing table entries. This leads to us leaking page table memory and also gives rise to TLB conflicts and spurious aborts, which have been seen in practice on Cortex-A75. Until this has been resolved, refuse to put block mappings when the existing entry is found to be present. Fixes: 324420bf91f60 ("arm64: add support for ioremap() block mappings") Reported-by: Hanjun Guo <hanjun.guo@linaro.org> Reported-by: Lei Li <lious.lilei@hisilicon.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-22i2c: designware: Consider SCL GPIO optionalAndy Shevchenko
GPIO library can return -ENOSYS for the failed request. Instead of failing ->probe() in this case override error code to 0. Fixes: ca382f5b38f3 ("i2c: designware: add i2c gpio recovery option") Reported-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-02-22i2c: busses: i2c-sirf: Fix spelling: "formular" -> "formula".Patryk Kocielnik
Fix spelling. Signed-off-by: Patryk Kocielnik <patryk.kocielnik@gmail.com> [wsa: fixed "Initialization", too] Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-02-22i2c: bcm2835: Set up the rising/falling edge delaysEric Anholt
We were leaving them in the power on state (or the state the firmware had set up for some client, if we were taking over from them). The boot state was 30 core clocks, when we actually want to sample some time after (to make sure that the new input bit has actually arrived). Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
2018-02-22treewide/trivial: Remove ';;$' typo noiseIngo Molnar
On lkml suggestions were made to split up such trivial typo fixes into per subsystem patches: --- a/arch/x86/boot/compressed/eboot.c +++ b/arch/x86/boot/compressed/eboot.c @@ -439,7 +439,7 @@ setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height) struct efi_uga_draw_protocol *uga = NULL, *first_uga; efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID; unsigned long nr_ugas; - u32 *handles = (u32 *)uga_handle;; + u32 *handles = (u32 *)uga_handle; efi_status_t status = EFI_INVALID_PARAMETER; int i; This patch is the result of the following script: $ sed -i 's/;;$/;/g' $(git grep -E ';;$' | grep "\.[ch]:" | grep -vwE 'for|ia64' | cut -d: -f1 | sort | uniq) ... followed by manual review to make sure it's all good. Splitting this up is just crazy talk, let's get over with this and just do it. Reported-by: Pavel Machek <pavel@ucw.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-22powerpc/bpf/jit: Fix 32-bit JIT for seccomp_data accessMark Lord
I am using SECCOMP to filter syscalls on a ppc32 platform, and noticed that the JIT compiler was failing on the BPF even though the interpreter was working fine. The issue was that the compiler was missing one of the instructions used by SECCOMP, so here is a patch to enable JIT for that instruction. Fixes: eb84bab0fb38 ("ppc: Kconfig: Enable BPF JIT on ppc32") Signed-off-by: Mark Lord <mlord@pobox.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-22powerpc/pseries: Revert support for ibm,drc-info devtree propertyMichael Bringmann
This reverts commit 02ef6dd8109b581343ebeb1c4c973513682535d6. The earlier patch tried to enable support for a new property "ibm,drc-info" on powerpc systems. Unfortunately, some errors in the associated patch set break things in some of the DLPAR operations. In particular when attempting to hot-add a new CPU or set of CPUs, the original patch failed to properly calculate the available resources, and aborted the operation. In addition, the original set missed several opportunities to compress and reuse common code. As the associated patch set was meant to provide an optimization of storage and performance of a set of device-tree properties for future systems with large amounts of resources, reverting just restores the previous behavior for existing systems. It seems unnecessary to enable this feature and introduce the consequent problems in the field that it will cause at this time, so please revert it for now until testing of the corrections are finished properly. Fixes: 02ef6dd8109b ("powerpc: Enable support for ibm,drc-info devtree property") Signed-off-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-02-22powerpc/pseries: Fix duplicate firmware feature for DRC_INFOMichael Ellerman
We had a mid-air collision between two new firmware features, DRMEM_V2 and DRC_INFO, and they ended up with the same value. No one's actually reported any problems, presumably because the new firmware that supports both properties is not widely available, and the two properties tend to be enabled together. Still if we ever had one enabled but not the other, the bugs that could result are many and varied. So fix it. Fixes: 3f38000eda48 ("powerpc/firmware: Add definitions for new drc-info firmware feature") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
2018-02-21seccomp: add a selftest for get_metadataTycho Andersen
Let's test that we get the flags correctly, and that we preserve the filter index across the ptrace(PTRACE_SECCOMP_GET_METADATA) correctly. Signed-off-by: Tycho Andersen <tycho@tycho.ws> CC: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-02-21ptrace, seccomp: tweak get_metadata behavior slightlyTycho Andersen
Previously if users passed a small size for the input structure size, they would get get odd behavior. It doesn't make sense to pass a structure smaller than at least filter_off size, so let's just give -EINVAL in this case. This changes userspace visible behavior, but was only introduced in commit 26500475ac1b ("ptrace, seccomp: add support for retrieving seccomp metadata") in 4.16-rc2, so should be safe to change if merged before then. Reported-by: Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by: Tycho Andersen <tycho@tycho.ws> CC: Kees Cook <keescook@chromium.org> CC: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-02-21seccomp, ptrace: switch get_metadata types to arch independentTycho Andersen
Commit 26500475ac1b ("ptrace, seccomp: add support for retrieving seccomp metadata") introduced `struct seccomp_metadata`, which contained unsigned longs that should be arch independent. The type of the flags member was chosen to match the corresponding argument to seccomp(), and so we need something at least as big as unsigned long. My understanding is that __u64 should fit the bill, so let's switch both types to that. While this is userspace facing, it was only introduced in 4.16-rc2, and so should be safe assuming it goes in before then. Reported-by: "Dmitry V. Levin" <ldv@altlinux.org> Signed-off-by: Tycho Andersen <tycho@tycho.ws> CC: Kees Cook <keescook@chromium.org> CC: Oleg Nesterov <oleg@redhat.com> Reviewed-by: "Dmitry V. Levin" <ldv@altlinux.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-02-22selftests/bpf: update gitignore with test_libbpf_openAnders Roxell
bpf builds a test program for loading BPF ELF files. Add the executable to the .gitignore list. Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Tested-by: Daniel Díaz <daniel.diaz@linaro.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-22selftests/bpf: tcpbpf_kern: use in6_* macros from glibcAnders Roxell
Both glibc and the kernel have in6_* macros definitions. Build fails because it picks up wrong in6_* macro from the kernel header and not the header from glibc. Fixes build error below: clang -I. -I./include/uapi -I../../../include/uapi -Wno-compare-distinct-pointer-types \ -O2 -target bpf -emit-llvm -c test_tcpbpf_kern.c -o - | \ llc -march=bpf -mcpu=generic -filetype=obj -o .../tools/testing/selftests/bpf/test_tcpbpf_kern.o In file included from test_tcpbpf_kern.c:12: .../netinet/in.h:101:5: error: expected identifier IPPROTO_HOPOPTS = 0, /* IPv6 Hop-by-Hop options. */ ^ .../linux/in6.h:131:26: note: expanded from macro 'IPPROTO_HOPOPTS' ^ In file included from test_tcpbpf_kern.c:12: /usr/include/netinet/in.h:103:5: error: expected identifier IPPROTO_ROUTING = 43, /* IPv6 routing header. */ ^ .../linux/in6.h:132:26: note: expanded from macro 'IPPROTO_ROUTING' ^ In file included from test_tcpbpf_kern.c:12: .../netinet/in.h:105:5: error: expected identifier IPPROTO_FRAGMENT = 44, /* IPv6 fragmentation header. */ ^ Since both glibc and the kernel have in6_* macros definitions, use the one from glibc. Kernel headers will check for previous libc definitions by including include/linux/libc-compat.h. Reported-by: Daniel Díaz <daniel.diaz@linaro.org> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Tested-by: Daniel Díaz <daniel.diaz@linaro.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-22bpf: clean up unused-variable warningArnd Bergmann
The only user of this variable is inside of an #ifdef, causing a warning without CONFIG_INET: net/core/filter.c: In function '____bpf_sock_ops_cb_flags_set': net/core/filter.c:3382:6: error: unused variable 'val' [-Werror=unused-variable] int val = argval & BPF_SOCK_OPS_ALL_CB_FLAGS; This replaces the #ifdef with a nicer IS_ENABLED() check that makes the code more readable and avoids the warning. Fixes: b13d88072172 ("bpf: Adds field bpf_sock_ops_cb_flags to tcp_sock") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-21mm: don't defer struct page initialization for Xen pv guestsJuergen Gross
Commit f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap") broke Xen pv domains in some configurations, as the "Pinned" information in struct page of early page tables could get lost. This will lead to the kernel trying to write directly into the page tables instead of asking the hypervisor to do so. The result is a crash like the following: BUG: unable to handle kernel paging request at ffff8801ead19008 IP: xen_set_pud+0x4e/0xd0 PGD 1c0a067 P4D 1c0a067 PUD 23a0067 PMD 1e9de0067 PTE 80100001ead19065 Oops: 0003 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-default+ #271 Hardware name: Dell Inc. Latitude E6440/0159N7, BIOS A07 06/26/2014 task: ffffffff81c10480 task.stack: ffffffff81c00000 RIP: e030:xen_set_pud+0x4e/0xd0 Call Trace: __pmd_alloc+0x128/0x140 ioremap_page_range+0x3f4/0x410 __ioremap_caller+0x1c3/0x2e0 acpi_os_map_iomem+0x175/0x1b0 acpi_tb_acquire_table+0x39/0x66 acpi_tb_validate_table+0x44/0x7c acpi_tb_verify_temp_table+0x45/0x304 acpi_reallocate_root_table+0x12d/0x141 acpi_early_init+0x4d/0x10a start_kernel+0x3eb/0x4a1 xen_start_kernel+0x528/0x532 Code: 48 01 e8 48 0f 42 15 a2 fd be 00 48 01 d0 48 ba 00 00 00 00 00 ea ff ff 48 c1 e8 0c 48 c1 e0 06 48 01 d0 48 8b 00 f6 c4 02 75 5d <4c> 89 65 00 5b 5d 41 5c c3 65 8b 05 52 9f fe 7e 89 c0 48 0f a3 RIP: xen_set_pud+0x4e/0xd0 RSP: ffffffff81c03cd8 CR2: ffff8801ead19008 ---[ end trace 38eca2e56f1b642e ]--- Avoid this problem by not deferring struct page initialization when running as Xen pv guest. Pavel said: : This is unique for Xen, so this particular issue won't effect other : configurations. I am going to investigate if there is a way to : re-enable deferred page initialization on xen guests. [akpm@linux-foundation.org: explicitly include xen.h] Link: http://lkml.kernel.org/r/20180216154101.22865-1-jgross@suse.com Fixes: f7f99100d8d95d ("mm: stop zeroing memory during allocation in vmemmap") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Bob Picco <bob.picco@oracle.com> Cc: <stable@vger.kernel.org> [4.15.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-21lib/Kconfig.debug: enable RUNTIME_TESTING_MENUAnders Roxell
Commit d3deafaa8b5c ("lib/: make RUNTIME_TESTS a menuconfig to ease disabling it all") causes a regression when using runtime tests due to it defaults RUNTIME_TESTING_MENU to not set. Link: http://lkml.kernel.org/r/20180214133015.10090-1-anders.roxell@linaro.org Fixes: d3deafaa8b5c ("lib/: make RUNTIME_TESTS a menuconfig to easedisabling it all") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Cc: Vincent Legoll <vincent.legoll@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-21vmalloc: fix __GFP_HIGHMEM usage for vmalloc_32 on 32b systemsMichal Hocko
Kai Heng Feng has noticed that BUG_ON(PageHighMem(pg)) triggers in drivers/media/common/saa7146/saa7146_core.c since 19809c2da28a ("mm, vmalloc: use __GFP_HIGHMEM implicitly"). saa7146_vmalloc_build_pgtable uses vmalloc_32 and it is reasonable to expect that the resulting page is not in highmem. The above commit aimed to add __GFP_HIGHMEM only for those requests which do not specify any zone modifier gfp flag. vmalloc_32 relies on GFP_VMALLOC32 which should do the right thing. Except it has been missed that GFP_VMALLOC32 is an alias for GFP_KERNEL on 32b architectures. Thanks to Matthew to notice this. Fix the problem by unconditionally setting GFP_DMA32 in GFP_VMALLOC32 for !64b arches (as a bailout). This should do the right thing and use ZONE_NORMAL which should be always below 4G on 32b systems. Debugged by Matthew Wilcox. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20180212095019.GX21609@dhcp22.suse.cz Fixes: 19809c2da28a ("mm, vmalloc: use __GFP_HIGHMEM implicitly”) Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Kai Heng Feng <kai.heng.feng@canonical.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Laura Abbott <labbott@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-21selftests/memfd: add run_fuse_test.sh to TEST_FILESAnders Roxell
While testing memfd tests, there is a missing script, as reported by kselftest: ./run_tests.sh: line 7: ./run_fuse_test.sh: No such file or directory Link: http://lkml.kernel.org/r/1517955779-11386-1-git-send-email-daniel.diaz@linaro.org Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Daniel Díaz <daniel.diaz@linaro.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-21bug.h: work around GCC PR82365 in BUG()Arnd Bergmann
Looking at functions with large stack frames across all architectures led me discovering that BUG() suffers from the same problem as fortify_panic(), which I've added a workaround for already. In short, variables that go out of scope by calling a noreturn function or __builtin_unreachable() keep using stack space in functions afterwards. A workaround that was identified is to insert an empty assembler statement just before calling the function that doesn't return. I'm adding a macro "barrier_before_unreachable()" to document this, and insert calls to that in all instances of BUG() that currently suffer from this problem. The files that saw the largest change from this had these frame sizes before, and much less with my patch: fs/ext4/inode.c:82:1: warning: the frame size of 1672 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/ext4/namei.c:434:1: warning: the frame size of 904 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/ext4/super.c:2279:1: warning: the frame size of 1160 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/ext4/xattr.c:146:1: warning: the frame size of 1168 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/f2fs/inode.c:152:1: warning: the frame size of 1424 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_core.c:1195:1: warning: the frame size of 1068 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_core.c:395:1: warning: the frame size of 1084 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_ftp.c:298:1: warning: the frame size of 928 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_ftp.c:418:1: warning: the frame size of 908 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_lblcr.c:718:1: warning: the frame size of 960 bytes is larger than 800 bytes [-Wframe-larger-than=] drivers/net/xen-netback/netback.c:1500:1: warning: the frame size of 1088 bytes is larger than 800 bytes [-Wframe-larger-than=] In case of ARC and CRIS, it turns out that the BUG() implementation actually does return (or at least the compiler thinks it does), resulting in lots of warnings about uninitialized variable use and leaving noreturn functions, such as: block/cfq-iosched.c: In function 'cfq_async_queue_prio': block/cfq-iosched.c:3804:1: error: control reaches end of non-void function [-Werror=return-type] include/linux/dmaengine.h: In function 'dma_maxpq': include/linux/dmaengine.h:1123:1: error: control reaches end of non-void function [-Werror=return-type] This makes them call __builtin_trap() instead, which should normally dump the stack and kill the current process, like some of the other architectures already do. I tried adding barrier_before_unreachable() to panic() and fortify_panic() as well, but that had very little effect, so I'm not submitting that patch. Vineet said: : For ARC, it is double win. : : 1. Fixes 3 -Wreturn-type warnings : : | ../net/core/ethtool.c:311:1: warning: control reaches end of non-void function : [-Wreturn-type] : | ../kernel/sched/core.c:3246:1: warning: control reaches end of non-void function : [-Wreturn-type] : | ../include/linux/sunrpc/svc_xprt.h:180:1: warning: control reaches end of : non-void function [-Wreturn-type] : : 2. bloat-o-meter reports code size improvements as gcc elides the : generated code for stack return. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365 Link: http://lkml.kernel.org/r/20171219114112.939391-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc] Tested-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc] Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Christopher Li <sparse@chrisli.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-21mm/swap.c: make functions and their kernel-doc agree (again)Mike Rapoport
There was a conflict between the commit e02a9f048ef7 ("mm/swap.c: make functions and their kernel-doc agree") and the commit f144c390f905 ("mm: docs: fix parameter names mismatch") that both tried to fix mismatch betweeen pagevec_lookup_entries() parameter names and their description. Since nr_entries is a better name for the parameter, fix the description again. Link: http://lkml.kernel.org/r/1518116946-20947-1-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-21mm/zpool.c: zpool_evictable: fix mismatch in parameter name and kernel-docMike Rapoport
[akpm@linux-foundation.org: add colon, per Randy] Link: http://lkml.kernel.org/r/1518116984-21141-1-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-21ida: do zeroing in ida_pre_get()Rasmus Villemoes
As far as I can tell, the only place the per-cpu ida_bitmap is populated is in ida_pre_get. The pre-allocated element is stolen in two places in ida_get_new_above, in both cases immediately followed by a memset(0). Since ida_get_new_above is called with locks held, do the zeroing in ida_pre_get, or rather let kmalloc() do it. Also, apparently gcc generates ~44 bytes of code to do a memset(, 0, 128): $ scripts/bloat-o-meter vmlinux.{0,1} add/remove: 0/0 grow/shrink: 2/1 up/down: 5/-88 (-83) Function old new delta ida_pre_get 115 119 +4 vermagic 27 28 +1 ida_get_new_above 715 627 -88 Link: http://lkml.kernel.org/r/20180108225634.15340-1-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Eric Biggers <ebiggers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-21mm, swap, frontswap: fix THP swap if frontswap enabledHuang Ying
It was reported by Sergey Senozhatsky that if THP (Transparent Huge Page) and frontswap (via zswap) are both enabled, when memory goes low so that swap is triggered, segfault and memory corruption will occur in random user space applications as follow, kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000] #0 0x00007fc08889ae0d _int_malloc (libc.so.6) #1 0x00007fc08889c2f3 malloc (libc.so.6) #2 0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt) #3 0x0000560e6005e75c n/a (urxvt) #4 0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt) #5 0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt) #6 0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt) #7 0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt) #8 0x0000560e6005cb55 ev_run (urxvt) #9 0x0000560e6003b9b9 main (urxvt) #10 0x00007fc08883af4a __libc_start_main (libc.so.6) #11 0x0000560e6003f9da _start (urxvt) After bisection, it was found the first bad commit is bd4c82c22c36 ("mm, THP, swap: delay splitting THP after swapped out"). The root cause is as follows: When the pages are written to swap device during swapping out in swap_writepage(), zswap (fontswap) is tried to compress the pages to improve performance. But zswap (frontswap) will treat THP as a normal page, so only the head page is saved. After swapping in, tail pages will not be restored to their original contents, causing memory corruption in the applications. This is fixed by refusing to save page in the frontswap store functions if the page is a THP. So that the THP will be swapped out to swap device. Another choice is to split THP if frontswap is enabled. But it is found that the frontswap enabling isn't flexible. For example, if CONFIG_ZSWAP=y (cannot be module), frontswap will be enabled even if zswap itself isn't enabled. Frontswap has multiple backends, to make it easy for one backend to enable THP support, the THP checking is put in backend frontswap store functions instead of the general interfaces. Link: http://lkml.kernel.org/r/20180209084947.22749-1-ying.huang@intel.com Fixes: bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped out") Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Suggested-by: Minchan Kim <minchan@kernel.org> [put THP checking in backend] Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Shaohua Li <shli@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Shakeel Butt <shakeelb@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: <stable@vger.kernel.org> [4.14] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-21certs/blacklist_nohashes.c: fix const confusion in certs blacklistAndi Kleen
const must be marked __initconst, not __initdata. Link: http://lkml.kernel.org/r/20171222001335.1987-1-andi@firstfloor.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-21kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZEDavid Rientjes
chan->n_subbufs is set by the user and relay_create_buf() does a kmalloc() of chan->n_subbufs * sizeof(size_t *). kmalloc_slab() will generate a warning when this fails if chan->subbufs * sizeof(size_t *) > KMALLOC_MAX_SIZE. Limit chan->n_subbufs to the maximum allowed kmalloc() size. Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1802061216100.122576@chino.kir.corp.google.com Fixes: f6302f1bcd75 ("relay: prevent integer overflow in relay_open()") Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-21mm, mlock, vmscan: no more skipping pagevecsShakeel Butt
When a thread mlocks an address space backed either by file pages which are currently not present in memory or swapped out anon pages (not in swapcache), a new page is allocated and added to the local pagevec (lru_add_pvec), I/O is triggered and the thread then sleeps on the page. On I/O completion, the thread can wake on a different CPU, the mlock syscall will then sets the PageMlocked() bit of the page but will not be able to put that page in unevictable LRU as the page is on the pagevec of a different CPU. Even on drain, that page will go to evictable LRU because the PageMlocked() bit is not checked on pagevec drain. The page will eventually go to right LRU on reclaim but the LRU stats will remain skewed for a long time. This patch puts all the pages, even unevictable, to the pagevecs and on the drain, the pages will be added on their LRUs correctly by checking their evictability. This resolves the mlocked pages on pagevec of other CPUs issue because when those pagevecs will be drained, the mlocked file pages will go to unevictable LRU. Also this makes the race with munlock easier to resolve because the pagevec drains happen in LRU lock. However there is still one place which makes a page evictable and does PageLRU check on that page without LRU lock and needs special attention. TestClearPageMlocked() and isolate_lru_page() in clear_page_mlock(). #0: __pagevec_lru_add_fn #1: clear_page_mlock SetPageLRU() if (!TestClearPageMlocked()) return smp_mb() // <--required // inside does PageLRU if (!PageMlocked()) if (isolate_lru_page()) move to evictable LRU putback_lru_page() else move to unevictable LRU In '#1', TestClearPageMlocked() provides full memory barrier semantics and thus the PageLRU check (inside isolate_lru_page) can not be reordered before it. In '#0', without explicit memory barrier, the PageMlocked() check can be reordered before SetPageLRU(). If that happens, '#0' can put a page in unevictable LRU and '#1' might have just cleared the Mlocked bit of that page but fails to isolate as PageLRU fails as '#0' still hasn't set PageLRU bit of that page. That page will be stranded on the unevictable LRU. There is one (good) side effect though. Without this patch, the pages allocated for System V shared memory segment are added to evictable LRUs even after shmctl(SHM_LOCK) on that segment. This patch will correctly put such pages to unevictable LRU. Link: http://lkml.kernel.org/r/20171121211241.18877-1-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shaohua Li <shli@fb.com> Cc: Jan Kara <jack@suse.cz> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-21mm: memcontrol: fix NR_WRITEBACK leak in memcg and system statsJohannes Weiner
After commit a983b5ebee57 ("mm: memcontrol: fix excessive complexity in memory.stat reporting"), we observed slowly upward creeping NR_WRITEBACK counts over the course of several days, both the per-memcg stats as well as the system counter in e.g. /proc/meminfo. The conversion from full per-cpu stat counts to per-cpu cached atomic stat counts introduced an irq-unsafe RMW operation into the updates. Most stat updates come from process context, but one notable exception is the NR_WRITEBACK counter. While writebacks are issued from process context, they are retired from (soft)irq context. When writeback completions interrupt the RMW counter updates of new writebacks being issued, the decs from the completions are lost. Since the global updates are routed through the joint lruvec API, both the memcg counters as well as the system counters are affected. This patch makes the joint stat and event API irq safe. Link: http://lkml.kernel.org/r/20180203082353.17284-1-hannes@cmpxchg.org Fixes: a983b5ebee57 ("mm: memcontrol: fix excessive complexity in memory.stat reporting") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Debugged-by: Tejun Heo <tj@kernel.org> Reviewed-by: Rik van Riel <riel@surriel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>