summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)Author
2018-12-28lib/ioremap: ensure phys_addr actually corresponds to a physical addressWill Deacon
The current ioremap() code uses a phys_addr variable at each level of page table, which is confusingly offset by subtracting the base virtual address being mapped so that adding the current virtual address back on when iterating through the page table entries gives back the corresponding physical address. This is fairly confusing and results in all users of phys_addr having to add the current virtual address back on. Instead, this patch just updates phys_addr when iterating over the page table entries, ensuring that it's always up-to-date and doesn't require explicit offsetting. Link: http://lkml.kernel.org/r/1544120495-17438-5-git-send-email-will.deacon@arm.com Signed-off-by: Will Deacon <will.deacon@arm.com> Tested-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28ioremap: rework pXd_free_pYd_page() APIWill Deacon
The recently merged API for ensuring break-before-make on page-table entries when installing huge mappings in the vmalloc/ioremap region is fairly counter-intuitive, resulting in the arch freeing functions (e.g. pmd_free_pte_page()) being called even on entries that aren't present. This resulted in a minor bug in the arm64 implementation, giving rise to spurious VM_WARN messages. This patch moves the pXd_present() checks out into the core code, refactoring the callsites at the same time so that we avoid the complex conjunctions when determining whether or not we can put down a huge mapping. Link: http://lkml.kernel.org/r/1544120495-17438-2-git-send-email-will.deacon@arm.com Signed-off-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michal Hocko <mhocko@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28lib/show_mem.c: drop pgdat_resize_lock in show_mem()Wei Yang
Function show_mem() is used to print system memory status when user requires or fail to allocate memory. Generally, this is a best effort information so any races with memory hotplug (or very theoretically an early initialization) should be tolerable and the worst that could happen is to print an imprecise node state. Drop the resize lock because this is the only place which might hold the lock from the interrupt context and so all other callers might use a simple spinlock. Even though this doesn't solve any real issue it makes the code easier to follow and tiny more effective. Link: http://lkml.kernel.org/r/20181129235532.9328-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28mm: convert zone->managed_pages to atomic variableArun KS
totalram_pages, zone->managed_pages and totalhigh_pages updates are protected by managed_page_count_lock, but readers never care about it. Convert these variables to atomic to avoid readers potentially seeing a store tear. This patch converts zone->managed_pages. Subsequent patches will convert totalram_panges, totalhigh_pages and eventually managed_page_count_lock will be removed. Main motivation was that managed_page_count_lock handling was complicating things. It was discussed in length here, https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemes better to remove the lock and convert variables to atomic, with preventing poteintial store-to-read tearing as a bonus. Link: http://lkml.kernel.org/r/1542090790-21750-3-git-send-email-arunks@codeaurora.org Signed-off-by: Arun KS <arunks@codeaurora.org> Suggested-by: Michal Hocko <mhocko@suse.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28debugobjects: call debug_objects_mem_init earilerQian Cai
The current value of the early boot static pool size, 1024 is not big enough for systems with large number of CPUs with timer or/and workqueue objects selected. As the results, systems have 60+ CPUs with both timer and workqueue objects enabled could trigger "ODEBUG: Out of memory. ODEBUG disabled". Some debug objects are allocated during the early boot. Enabling some options like timers or workqueue objects may increase the size required significantly with large number of CPUs. For example, CONFIG_DEBUG_OBJECTS_TIMERS: No. CPUs x 2 (worker pool) objects: start_kernel workqueue_init_early init_worker_pool init_timer_key debug_object_init plus No. CPUs objects (CONFIG_HIGH_RES_TIMERS): sched_init hrtick_rq_init hrtimer_init CONFIG_DEBUG_OBJECTS_WORK: No. CPUs objects: vmalloc_init __init_work plus No. CPUs x 6 (workqueue) objects: workqueue_init_early alloc_workqueue __alloc_workqueue_key alloc_and_link_pwqs init_pwq Also, plus No. CPUs objects: perf_event_init __init_srcu_struct init_srcu_struct_fields init_srcu_struct_nodes __init_work However, none of the things are actually used or required before debug_objects_mem_init() is invoked, so just move the call right before vmalloc_init(). According to tglx, "the reason why the call is at this place in start_kernel() is historical. It's because back in the days when debugobjects were added the memory allocator was enabled way later than today." Link: http://lkml.kernel.org/r/20181126102407.1836-1-cai@gmx.us Signed-off-by: Qian Cai <cai@gmx.us> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28kasan: add CONFIG_KASAN_GENERIC and CONFIG_KASAN_SW_TAGSAndrey Konovalov
This commit splits the current CONFIG_KASAN config option into two: 1. CONFIG_KASAN_GENERIC, that enables the generic KASAN mode (the one that exists now); 2. CONFIG_KASAN_SW_TAGS, that enables the software tag-based KASAN mode. The name CONFIG_KASAN_SW_TAGS is chosen as in the future we will have another hardware tag-based KASAN mode, that will rely on hardware memory tagging support in arm64. With CONFIG_KASAN_SW_TAGS enabled, compiler options are changed to instrument kernel files with -fsantize=kernel-hwaddress (except the ones for which KASAN_SANITIZE := n is set). Both CONFIG_KASAN_GENERIC and CONFIG_KASAN_SW_TAGS support both CONFIG_KASAN_INLINE and CONFIG_KASAN_OUTLINE instrumentation modes. This commit also adds empty placeholder (for now) implementation of tag-based KASAN specific hooks inserted by the compiler and adjusts common hooks implementation. While this commit adds the CONFIG_KASAN_SW_TAGS config option, this option is not selectable, as it depends on HAVE_ARCH_KASAN_SW_TAGS, which we will enable once all the infrastracture code has been added. Link: http://lkml.kernel.org/r/b2550106eb8a68b10fefbabce820910b115aa853.1544099024.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-27Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Add 1472-byte test to tcrypt for IPsec - Reintroduced crypto stats interface with numerous changes - Support incremental algorithm dumps Algorithms: - Add xchacha12/20 - Add nhpoly1305 - Add adiantum - Add streebog hash - Mark cts(cbc(aes)) as FIPS allowed Drivers: - Improve performance of arm64/chacha20 - Improve performance of x86/chacha20 - Add NEON-accelerated nhpoly1305 - Add SSE2 accelerated nhpoly1305 - Add AVX2 accelerated nhpoly1305 - Add support for 192/256-bit keys in gcmaes AVX - Add SG support in gcmaes AVX - ESN for inline IPsec tx in chcr - Add support for CryptoCell 703 in ccree - Add support for CryptoCell 713 in ccree - Add SM4 support in ccree - Add SM3 support in ccree - Add support for chacha20 in caam/qi2 - Add support for chacha20 + poly1305 in caam/jr - Add support for chacha20 + poly1305 in caam/qi2 - Add AEAD cipher support in cavium/nitrox" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (130 commits) crypto: skcipher - remove remnants of internal IV generators crypto: cavium/nitrox - Fix build with !CONFIG_DEBUG_FS crypto: salsa20-generic - don't unnecessarily use atomic walk crypto: skcipher - add might_sleep() to skcipher_walk_virt() crypto: x86/chacha - avoid sleeping under kernel_fpu_begin() crypto: cavium/nitrox - Added AEAD cipher support crypto: mxc-scc - fix build warnings on ARM64 crypto: api - document missing stats member crypto: user - remove unused dump functions crypto: chelsio - Fix wrong error counter increments crypto: chelsio - Reset counters on cxgb4 Detach crypto: chelsio - Handle PCI shutdown event crypto: chelsio - cleanup:send addr as value in function argument crypto: chelsio - Use same value for both channel in single WR crypto: chelsio - Swap location of AAD and IV sent in WR crypto: chelsio - remove set but not used variable 'kctx_len' crypto: ux500 - Use proper enum in hash_set_dma_transfer crypto: ux500 - Use proper enum in cryp_set_dma_transfer crypto: aesni - Add scatter/gather avx stubs, and use them in C crypto: aesni - Introduce partial block macro ..
2018-12-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) New ipset extensions for matching on destination MAC addresses, from Stefano Brivio. 2) Add ipv4 ttl and tos, plus ipv6 flow label and hop limit offloads to nfp driver. From Stefano Brivio. 3) Implement GRO for plain UDP sockets, from Paolo Abeni. 4) Lots of work from Michał Mirosław to eliminate the VLAN_TAG_PRESENT bit so that we could support the entire vlan_tci value. 5) Rework the IPSEC policy lookups to better optimize more usecases, from Florian Westphal. 6) Infrastructure changes eliminating direct manipulation of SKB lists wherever possible, and to always use the appropriate SKB list helpers. This work is still ongoing... 7) Lots of PHY driver and state machine improvements and simplifications, from Heiner Kallweit. 8) Various TSO deferral refinements, from Eric Dumazet. 9) Add ntuple filter support to aquantia driver, from Dmitry Bogdanov. 10) Batch dropping of XDP packets in tuntap, from Jason Wang. 11) Lots of cleanups and improvements to the r8169 driver from Heiner Kallweit, including support for ->xmit_more. This driver has been getting some much needed love since he started working on it. 12) Lots of new forwarding selftests from Petr Machata. 13) Enable VXLAN learning in mlxsw driver, from Ido Schimmel. 14) Packed ring support for virtio, from Tiwei Bie. 15) Add new Aquantia AQtion USB driver, from Dmitry Bezrukov. 16) Add XDP support to dpaa2-eth driver, from Ioana Ciocoi Radulescu. 17) Implement coalescing on TCP backlog queue, from Eric Dumazet. 18) Implement carrier change in tun driver, from Nicolas Dichtel. 19) Support msg_zerocopy in UDP, from Willem de Bruijn. 20) Significantly improve garbage collection of neighbor objects when the table has many PERMANENT entries, from David Ahern. 21) Remove egdev usage from nfp and mlx5, and remove the facility completely from the tree as it no longer has any users. From Oz Shlomo and others. 22) Add a NETDEV_PRE_CHANGEADDR so that drivers can veto the change and therefore abort the operation before the commit phase (which is the NETDEV_CHANGEADDR event). From Petr Machata. 23) Add indirect call wrappers to avoid retpoline overhead, and use them in the GRO code paths. From Paolo Abeni. 24) Add support for netlink FDB get operations, from Roopa Prabhu. 25) Support bloom filter in mlxsw driver, from Nir Dotan. 26) Add SKB extension infrastructure. This consolidates the handling of the auxiliary SKB data used by IPSEC and bridge netfilter, and is designed to support the needs to MPTCP which could be integrated in the future. 27) Lots of XDP TX optimizations in mlx5 from Tariq Toukan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1845 commits) net: dccp: fix kernel crash on module load drivers/net: appletalk/cops: remove redundant if statement and mask bnx2x: Fix NULL pointer dereference in bnx2x_del_all_vlans() on some hw net/net_namespace: Check the return value of register_pernet_subsys() net/netlink_compat: Fix a missing check of nla_parse_nested ieee802154: lowpan_header_create check must check daddr net/mlx4_core: drop useless LIST_HEAD mlxsw: spectrum: drop useless LIST_HEAD net/mlx5e: drop useless LIST_HEAD iptunnel: Set tun_flags in the iptunnel_metadata_reply from src net/mlx5e: fix semicolon.cocci warnings staging: octeon: fix build failure with XFRM enabled net: Revert recent Spectre-v1 patches. can: af_can: Fix Spectre v1 vulnerability packet: validate address length if non-zero nfc: af_nfc: Fix Spectre v1 vulnerability phonet: af_phonet: Fix Spectre v1 vulnerability net: core: Fix Spectre v1 vulnerability net: minor cleanup in skb_ext_add() net: drop the unused helper skb_ext_get() ...
2018-12-27Merge tag 'powerpc-4.21-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - Mitigations for Spectre v2 on some Freescale (NXP) CPUs. - A large series adding support for pass-through of Nvidia V100 GPUs to guests on Power9. - Another large series to enable hardware assistance for TLB table walk on MPC8xx CPUs. - Some preparatory changes to our DMA code, to make way for further cleanups from Christoph. - Several fixes for our Transactional Memory handling discovered by fuzzing the signal return path. - Support for generating our system call table(s) from a text file like other architectures. - A fix to our page fault handler so that instead of generating a WARN_ON_ONCE, user accesses of kernel addresses instead print a ratelimited and appropriately scary warning. - A cosmetic change to make our unhandled page fault messages more similar to other arches and also more compact and informative. - Freescale updates from Scott: "Highlights include elimination of legacy clock bindings use from dts files, an 83xx watchdog handler, fixes to old dts interrupt errors, and some minor cleanup." And many clean-ups, reworks and minor fixes etc. Thanks to: Alexandre Belloni, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Benjamin Herrenschmidt, Breno Leitao, Christian Lamparter, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Darren Stevens, David Gibson, Diana Craciun, Dmitry V. Levin, Firoz Khan, Geert Uytterhoeven, Greg Kurz, Gustavo Romero, Hari Bathini, Joel Stanley, Kees Cook, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring, Mathieu Malaterre, Michal Suchánek, Naveen N. Rao, Nick Desaulniers, Oliver O'Halloran, Paul Mackerras, Ram Pai, Ravi Bangoria, Rob Herring, Russell Currey, Sabyasachi Gupta, Sam Bobroff, Satheesh Rajendran, Scott Wood, Segher Boessenkool, Stephen Rothwell, Tang Yuantian, Thiago Jung Bauermann, Yangtao Li, Yuantian Tang, Yue Haibing" * tag 'powerpc-4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (201 commits) Revert "powerpc/fsl_pci: simplify fsl_pci_dma_set_mask" powerpc/zImage: Also check for stdout-path powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=y macintosh: Use of_node_name_{eq, prefix} for node name comparisons ide: Use of_node_name_eq for node name comparisons powerpc: Use of_node_name_eq for node name comparisons powerpc/pseries/pmem: Convert to %pOFn instead of device_node.name powerpc/mm: Remove very old comment in hash-4k.h powerpc/pseries: Fix node leak in update_lmb_associativity_index() powerpc/configs/85xx: Enable CONFIG_DEBUG_KERNEL powerpc/dts/fsl: Fix dtc-flagged interrupt errors clk: qoriq: add more compatibles strings powerpc/fsl: Use new clockgen binding powerpc/83xx: handle machine check caused by watchdog timer powerpc/fsl-rio: fix spelling mistake "reserverd" -> "reserved" powerpc/fsl_pci: simplify fsl_pci_dma_set_mask arch/powerpc/fsl_rmu: Use dma_zalloc_coherent vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver vfio_pci: Allow regions to add own capabilities vfio_pci: Allow mapping extra regions ...
2018-12-26Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The biggest RCU changes in this cycle were: - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar. - Replace calls of RCU-bh and RCU-sched update-side functions to their vanilla RCU counterparts. This series is a step towards complete removal of the RCU-bh and RCU-sched update-side functions. ( Note that some of these conversions are going upstream via their respective maintainers. ) - Documentation updates, including a number of flavor-consolidation updates from Joel Fernandes. - Miscellaneous fixes. - Automate generation of the initrd filesystem used for rcutorture testing. - Convert spin_is_locked() assertions to instead use lockdep. ( Note that some of these conversions are going upstream via their respective maintainers. ) - SRCU updates, especially including a fix from Dennis Krein for a bag-on-head-class bug. - RCU torture-test updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits) rcutorture: Don't do busted forward-progress testing rcutorture: Use 100ms buckets for forward-progress callback histograms rcutorture: Recover from OOM during forward-progress tests rcutorture: Print forward-progress test age upon failure rcutorture: Print time since GP end upon forward-progress failure rcutorture: Print histogram of CB invocation at OOM time rcutorture: Print GP age upon forward-progress failure rcu: Print per-CPU callback counts for forward-progress failures rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings rcutorture: Dump grace-period diagnostics upon forward-progress OOM rcutorture: Prepare for asynchronous access to rcu_fwd_startat torture: Remove unnecessary "ret" variables rcutorture: Affinity forward-progress test to avoid housekeeping CPUs rcutorture: Break up too-long rcu_torture_fwd_prog() function rcutorture: Remove cbflood facility torture: Bring any extra CPUs online during kernel startup rcutorture: Add call_rcu() flooding forward-progress tests rcutorture/formal: Replace synchronize_sched() with synchronize_rcu() tools/kernel.h: Replace synchronize_sched() with synchronize_rcu() net/decnet: Replace rcu_barrier_bh() with rcu_barrier() ...
2018-12-26Merge tag 'mips_4.21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS updates from Paul Burton: "Here's the main MIPS pull for Linux 4.21. Core architecture changes include: - Syscall tables & definitions for unistd.h are now generated by scripts, providing greater consistency with other architectures & making it easier to add new syscalls. - Support for building kernels with no floating point support, upon which any userland attempting to use floating point instructions will receive a SIGILL. Mostly useful to shrink the kernel & as preparation for nanoMIPS support which does not yet include FP. - MIPS SIMD Architecture (MSA) vector register context is now exposed by ptrace via a new NT_MIPS_MSA regset. - ASIDs are now stored as 64b values even for MIPS32 kernels, expanding the ASID version field sufficiently that we don't need to worry about overflow & avoiding rare issues with reused ASIDs that have been observed in the wild. - The branch delay slot "emulation" page is now mapped without write permission for the user, preventing its use as a nice location for attacks to execute malicious code from. - Support for ioremap_prot(), primarily to allow gdb or other ptrace users the ability to view their tracee's memory using the same cache coherency attribute. - Optimizations to more cpu_has_* macros, allowing more to be compile-time constant where possible. - Enable building the whole kernel with UBSAN instrumentation. - Enable building the kernel with link-time dead code & data elimination. Platform specific changes include: - The Boston board gains a workaround for DMA prefetching issues with the EG20T Platform Controller Hub that it uses. - Cleanups to Cavium Octeon code removing about 20k lines of redundant code, mostly unused or duplicate register definitions in headers. - defconfig updates for the DECstation machines, including new defconfigs for r4k & 64b machines. - Further work on Loongson 3 support. - DMA fixes for SiByte machines" * tag 'mips_4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (95 commits) MIPS: math-emu: Write-protect delay slot emulation pages MIPS: Remove struct mm_context_t fp_mode_switching field mips: generate uapi header and system call table files mips: add system call table generation support mips: remove syscall table entries mips: add +1 to __NR_syscalls in uapi header mips: rename scall64-64.S to scall64-n64.S mips: remove unused macros mips: add __NR_syscalls along with __NR_Linux_syscalls MIPS: Expand MIPS32 ASIDs to 64 bits MIPS: OCTEON: delete redundant register definitions MIPS: OCTEON: cvmx_gmxx_inf_mode: use oldest forward compatible definition MIPS: OCTEON: cvmx_mio_fus_dat3: use oldest forward compatible definition MIPS: OCTEON: cvmx_pko_mem_debug8: use oldest forward compatible definition MIPS: OCTEON: octeon-usb: use common gpio_bit definition MIPS: OCTEON: enable all OCTEON drivers in defconfig mips: annotate implicit fall throughs MIPS: Hardcode cpu_has_mips* where target ISA allows MIPS: MT: Remove norps command line parameter MIPS: Only include mmzone.h when CONFIG_NEED_MULTIPLE_NODES=y ...
2018-12-22seq_buf: Use size_t for len in seq_buf_puts()Michael Ellerman
Jann Horn points out that we're using unsigned int for len in seq_buf_puts(), which could potentially overflow if we're passed a UINT_MAX sized string. The rest of the code already uses size_t, so we should also use that in seq_buf_puts() to avoid any issues. Link: http://lkml.kernel.org/r/20181019042109.8064-2-mpe@ellerman.id.au Suggested-by: Jann Horn <jannh@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-22seq_buf: Make seq_buf_puts() null-terminate the bufferMichael Ellerman
Currently seq_buf_puts() will happily create a non null-terminated string for you in the buffer. This is particularly dangerous if the buffer is on the stack. For example: char buf[8]; char secret = "secret"; struct seq_buf s; seq_buf_init(&s, buf, sizeof(buf)); seq_buf_puts(&s, "foo"); printk("Message is %s\n", buf); Can result in: Message is fooªªªªªsecret We could require all users to memset() their buffer to zero before use. But that seems likely to be forgotten and lead to bugs. Instead we can change seq_buf_puts() to always leave the buffer in a null-terminated state. The only downside is that this makes the buffer 1 character smaller for seq_buf_puts(), but that seems like a good trade off. Link: http://lkml.kernel.org/r/20181019042109.8064-1-mpe@ellerman.id.au Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-22treewide: surround Kconfig file paths with double quotesMasahiro Yamada
The Kconfig lexer supports special characters such as '.' and '/' in the parameter context. In my understanding, the reason is just to support bare file paths in the source statement. I do not see a good reason to complicate Kconfig for the room of ambiguity. The majority of code already surrounds file paths with double quotes, and it makes sense since file paths are constant string literals. Make it treewide consistent now. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Wolfram Sang <wsa@the-dreams.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Ingo Molnar <mingo@kernel.org>
2018-12-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Lots of conflicts, by happily all cases of overlapping changes, parallel adds, things of that nature. Thanks to Stephen Rothwell, Saeed Mahameed, and others for their guidance in these resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20sbitmap: add helpers for add/del wait queue handlingJens Axboe
After commit 5d2ee7122c73, users of sbitmap that need wait queue handling must use the provided helpers. But we only added prepare_to_wait()/finish_wait() style helpers, add the equivalent add_wait_queue/list_del wrappers as we.. This is needed to ensure kyber plays by the sbitmap waitqueue rules. Tested-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-20lib/raid6: add option to skip algo benchmarkingDaniel Verkamp
This is helpful for systems where fast startup time is important. It is especially nice to avoid benchmarking RAID functions that are never used (for example, BTRFS selects RAID6_PQ even if the parity RAID mode is not in use). This saves 250+ milliseconds of boot time on modern x86 and ARM systems with a dozen or more available implementations. The new option is defaulted to 'y' to match the previous behavior of always benchmarking on init. Signed-off-by: Daniel Verkamp <dverkamp@chromium.org> Signed-off-by: Shaohua Li <shli@fb.com>
2018-12-20lib/raid6: sort algos in rough performance orderDaniel Verkamp
Sort the list of RAID6 algorithms in roughly decreasing order of expected performance: newer instruction sets first (within each architecture) and wider unrollings first. This doesn't make any difference right now, since all functions are benchmarked; a follow-up change will make use of this by optionally choosing the first valid function rather than testing all of them. The Itanium raid6_intx{16,32} entries are also moved down to be near the other raid6_intx entries for clarity. Signed-off-by: Daniel Verkamp <dverkamp@chromium.org> Signed-off-by: Shaohua Li <shli@fb.com>
2018-12-20lib/raid6: check for assembler SSSE3 supportDaniel Verkamp
Allow the x86 SSSE3 recovery function to be tested in raid6test. Signed-off-by: Daniel Verkamp <dverkamp@chromium.org> Signed-off-by: Shaohua Li <shli@fb.com>
2018-12-20raid6/ppc: Fix build for clangJoel Stanley
We cannot build these files with clang as it does not allow altivec instructions in assembly when -msoft-float is passed. Jinsong Ji <jji@us.ibm.com> wrote: > We currently disable Altivec/VSX support when enabling soft-float. So > any usage of vector builtins will break. > > Enable Altivec/VSX with soft-float may need quite some clean up work, so > I guess this is currently a limitation. > > Removing -msoft-float will make it work (and we are lucky that no > floating point instructions will be generated as well). This is a workaround until the issue is resolved in clang. Link: https://bugs.llvm.org/show_bug.cgi?id=31177 Link: https://github.com/ClangBuiltLinux/linux/issues/239 Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19Fonts: New Terminus large console fontAmanoel Dawod
This patch adds an option to compile-in a high resolution and large Terminus (ter16x32) bitmap console font for use with HiDPI and Retina screens. The font was convereted from standard Terminus ter-i32b.psf (size 16x32) with the help of psftools and minor hand editing deleting useless characters. This patch is non-intrusive, no options are enabled by default so most users won't notice a thing. I am placing my changes under the GPL 2.0 just as source Terminus font. Signed-off-by: Amanoel Dawod <amanoeladawod@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-18test_rhashtable: remove semaphore usageArnd Bergmann
This is one of only two files that initialize a semaphore to a negative value. We don't really need the two semaphores here at all, but can do the same thing in more conventional and more effient way, by using a single waitqueue and an atomic thread counter. This gets us a little bit closer to eliminating classic semaphores from the kernel. It also fixes a corner case where we fail to continue after one of the threads fails to start up. An alternative would be to use a split kthread_create()+wake_up_process() and completely eliminate the separate synchronization. Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17lib: fix build failure in CONFIG_DEBUG_VIRTUAL testChristophe Leroy
On several arches, virt_to_phys() is in io.h Build fails without it: CC lib/test_debug_virtual.o lib/test_debug_virtual.c: In function 'test_debug_virtual_init': lib/test_debug_virtual.c:26:7: error: implicit declaration of function 'virt_to_phys' [-Werror=implicit-function-declaration] pa = virt_to_phys(va); ^ Fixes: e4dace361552 ("lib: add test module for CONFIG_DEBUG_VIRTUAL") CC: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-13XArray: Fix xa_alloc when id exceeds maxMatthew Wilcox
Specifying a starting ID greater than the maximum ID isn't something attempted very often, but it should fail. It was succeeding due to xas_find_marked() returning the wrong error state, so add tests for both xa_alloc() and xas_find_marked(). Fixes: b803b42823d0 ("xarray: Add XArray iterators") Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-12-13iov_iter: introduce hash_and_copy_to_iter helperSagi Grimberg
Allow consumers that want to use iov iterator helpers and also update a predefined hash calculation online when copying data. This is useful when copying incoming network buffers to a local iterator and calculate a digest on the incoming stream. nvme-tcp host driver that will be introduced in following patches is the first consumer via skb_copy_and_hash_datagram_iter. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13iov_iter: pass void csum pointer to csum_and_copy_to_iterSagi Grimberg
The single caller to csum_and_copy_to_iter is skb_copy_and_csum_datagram and we are trying to unite its logic with skb_copy_datagram_iter by passing a callback to the copy function that we want to apply. Thus, we need to make the checksum pointer private to the function. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-11sbitmap: flush deferred clears for resize and shallow getsJens Axboe
We're missing a deferred clear off the shallow get, which can cause a hang. Additionally, when we resize the sbitmap, we should also flush deferred clears for good measure. Ensure we have full coverage on batch clears, even for paths where we would not be doing deferred clear. This makes it less error prone for future additions. Reported-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-10lib/vsprintf: Print time and date in human readable format via %ptAndy Shevchenko
There are users which print time and date represented by content of struct rtc_time in human readable format. Instead of open coding that each time introduce %ptR[dt][r] specifier. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Jonathan Hunter <jonathanh@nvidia.com> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2018-12-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Several conflicts, seemingly all over the place. I used Stephen Rothwell's sample resolutions for many of these, if not just to double check my own work, so definitely the credit largely goes to him. The NFP conflict consisted of a bug fix (moving operations past the rhashtable operation) while chaning the initial argument in the function call in the moved code. The net/dsa/master.c conflict had to do with a bug fix intermixing of making dsa_master_set_mtu() static with the fixing of the tagging attribute location. cls_flower had a conflict because the dup reject fix from Or overlapped with the addition of port range classifiction. __set_phy_supported()'s conflict was relatively easy to resolve because Andrew fixed it in both trees, so it was just a matter of taking the net-next copy. Or at least I think it was :-) Joe Stringer's fix to the handling of netns id 0 in bpf_sk_lookup() intermixed with changes on how the sdif and caller_net are calculated in these code paths in net-next. The remaining BPF conflicts were largely about the addition of the __bpf_md_ptr stuff in 'net' overlapping with adjustments and additions to the relevant data structure where the MD pointer macros are used. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-09sbitmap: silence bogus lockdep IRQ warningJens Axboe
Ming reports that lockdep spews the following trace. What this essentially says is that the sbitmap swap_lock was used inconsistently in IRQ enabled and disabled context, and that is usually indicative of a bug that will cause a deadlock. For this case, it's a false positive. The swap_lock is used from process context only, when we swap the bits in the word and cleared mask. We also end up doing that when we are getting a driver tag, from the blk_mq_mark_tag_wait(), and from there we hold the waitqueue lock with IRQs disabled. However, this isn't from an actual IRQ, it's still process context. In lieu of a better way to fix this, simply always disable interrupts when grabbing the swap_lock if lockdep is enabled. [ 100.967642] ================start test sanity/001================ [ 101.238280] null: module loaded [ 106.093735] [ 106.094012] ===================================================== [ 106.094854] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected [ 106.095759] 4.20.0-rc3_5d2ee7122c73_for-next+ #1 Not tainted [ 106.096551] ----------------------------------------------------- [ 106.097386] fio/1043 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: [ 106.098231] 000000004c43fa71 (&(&sb->map[i].swap_lock)->rlock){+.+.}, at: sbitmap_get+0xd5/0x22c [ 106.099431] [ 106.099431] and this task is already holding: [ 106.100229] 000000007eec8b2f (&(&hctx->dispatch_wait_lock)->rlock){....}, at: blk_mq_dispatch_rq_list+0x4c1/0xd7c [ 106.101630] which would create a new lock dependency: [ 106.102326] (&(&hctx->dispatch_wait_lock)->rlock){....} -> (&(&sb->map[i].swap_lock)->rlock){+.+.} [ 106.103553] [ 106.103553] but this new dependency connects a SOFTIRQ-irq-safe lock: [ 106.104580] (&sbq->ws[i].wait){..-.} [ 106.104582] [ 106.104582] ... which became SOFTIRQ-irq-safe at: [ 106.105751] _raw_spin_lock_irqsave+0x4b/0x82 [ 106.106284] __wake_up_common_lock+0x119/0x1b9 [ 106.106825] sbitmap_queue_wake_up+0x33f/0x383 [ 106.107456] sbitmap_queue_clear+0x4c/0x9a [ 106.108046] __blk_mq_free_request+0x188/0x1d3 [ 106.108581] blk_mq_free_request+0x23b/0x26b [ 106.109102] scsi_end_request+0x345/0x5d7 [ 106.109587] scsi_io_completion+0x4b5/0x8f0 [ 106.110099] scsi_finish_command+0x412/0x456 [ 106.110615] scsi_softirq_done+0x23f/0x29b [ 106.111115] blk_done_softirq+0x2a7/0x2e6 [ 106.111608] __do_softirq+0x360/0x6ad [ 106.112062] run_ksoftirqd+0x2f/0x5b [ 106.112499] smpboot_thread_fn+0x3a5/0x3db [ 106.113000] kthread+0x1d4/0x1e4 [ 106.113457] ret_from_fork+0x3a/0x50 [ 106.113969] [ 106.113969] to a SOFTIRQ-irq-unsafe lock: [ 106.114672] (&(&sb->map[i].swap_lock)->rlock){+.+.} [ 106.114674] [ 106.114674] ... which became SOFTIRQ-irq-unsafe at: [ 106.116000] ... [ 106.116003] _raw_spin_lock+0x33/0x64 [ 106.116676] sbitmap_get+0xd5/0x22c [ 106.117134] __sbitmap_queue_get+0xe8/0x177 [ 106.117731] __blk_mq_get_tag+0x1e6/0x22d [ 106.118286] blk_mq_get_tag+0x1db/0x6e4 [ 106.118756] blk_mq_get_driver_tag+0x161/0x258 [ 106.119383] blk_mq_dispatch_rq_list+0x28e/0xd7c [ 106.120043] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.120607] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.121234] __blk_mq_run_hw_queue+0x137/0x17e [ 106.121781] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.122366] blk_mq_run_hw_queue+0x151/0x187 [ 106.122887] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.123492] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.124042] blk_flush_plug_list+0x392/0x3d7 [ 106.124557] blk_finish_plug+0x37/0x4f [ 106.125019] read_pages+0x3ef/0x430 [ 106.125446] __do_page_cache_readahead+0x18e/0x2fc [ 106.126027] force_page_cache_readahead+0x121/0x133 [ 106.126621] page_cache_sync_readahead+0x35f/0x3bb [ 106.127229] generic_file_buffered_read+0x410/0x1860 [ 106.127932] __vfs_read+0x319/0x38f [ 106.128415] vfs_read+0xd2/0x19a [ 106.128817] ksys_read+0xb9/0x135 [ 106.129225] do_syscall_64+0x140/0x385 [ 106.129684] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.130292] [ 106.130292] other info that might help us debug this: [ 106.130292] [ 106.131226] Chain exists of: [ 106.131226] &sbq->ws[i].wait --> &(&hctx->dispatch_wait_lock)->rlock --> &(&sb->map[i].swap_lock)->rlock [ 106.131226] [ 106.132865] Possible interrupt unsafe locking scenario: [ 106.132865] [ 106.133659] CPU0 CPU1 [ 106.134194] ---- ---- [ 106.134733] lock(&(&sb->map[i].swap_lock)->rlock); [ 106.135318] local_irq_disable(); [ 106.136014] lock(&sbq->ws[i].wait); [ 106.136747] lock(&(&hctx->dispatch_wait_lock)->rlock); [ 106.137742] <Interrupt> [ 106.138110] lock(&sbq->ws[i].wait); [ 106.138625] [ 106.138625] *** DEADLOCK *** [ 106.138625] [ 106.139430] 3 locks held by fio/1043: [ 106.139947] #0: 0000000076ff0fd9 (rcu_read_lock){....}, at: hctx_lock+0x29/0xe8 [ 106.140813] #1: 000000002feb1016 (&sbq->ws[i].wait){..-.}, at: blk_mq_dispatch_rq_list+0x4ad/0xd7c [ 106.141877] #2: 000000007eec8b2f (&(&hctx->dispatch_wait_lock)->rlock){....}, at: blk_mq_dispatch_rq_list+0x4c1/0xd7c [ 106.143267] [ 106.143267] the dependencies between SOFTIRQ-irq-safe lock and the holding lock: [ 106.144351] -> (&sbq->ws[i].wait){..-.} ops: 82 { [ 106.144926] IN-SOFTIRQ-W at: [ 106.145314] _raw_spin_lock_irqsave+0x4b/0x82 [ 106.146042] __wake_up_common_lock+0x119/0x1b9 [ 106.146785] sbitmap_queue_wake_up+0x33f/0x383 [ 106.147567] sbitmap_queue_clear+0x4c/0x9a [ 106.148379] __blk_mq_free_request+0x188/0x1d3 [ 106.149148] blk_mq_free_request+0x23b/0x26b [ 106.149864] scsi_end_request+0x345/0x5d7 [ 106.150546] scsi_io_completion+0x4b5/0x8f0 [ 106.151367] scsi_finish_command+0x412/0x456 [ 106.152157] scsi_softirq_done+0x23f/0x29b [ 106.152855] blk_done_softirq+0x2a7/0x2e6 [ 106.153537] __do_softirq+0x360/0x6ad [ 106.154280] run_ksoftirqd+0x2f/0x5b [ 106.155020] smpboot_thread_fn+0x3a5/0x3db [ 106.155828] kthread+0x1d4/0x1e4 [ 106.156526] ret_from_fork+0x3a/0x50 [ 106.157267] INITIAL USE at: [ 106.157713] _raw_spin_lock_irqsave+0x4b/0x82 [ 106.158542] prepare_to_wait_exclusive+0xa8/0x215 [ 106.159421] blk_mq_get_tag+0x34f/0x6e4 [ 106.160186] blk_mq_get_request+0x48e/0xaef [ 106.160997] blk_mq_make_request+0x27e/0xbd2 [ 106.161828] generic_make_request+0x4d1/0x873 [ 106.162661] submit_bio+0x20c/0x253 [ 106.163379] mpage_bio_submit+0x44/0x4b [ 106.164142] mpage_readpages+0x3c2/0x407 [ 106.164919] read_pages+0x13a/0x430 [ 106.165633] __do_page_cache_readahead+0x18e/0x2fc [ 106.166530] force_page_cache_readahead+0x121/0x133 [ 106.167439] page_cache_sync_readahead+0x35f/0x3bb [ 106.168337] generic_file_buffered_read+0x410/0x1860 [ 106.169255] __vfs_read+0x319/0x38f [ 106.169977] vfs_read+0xd2/0x19a [ 106.170662] ksys_read+0xb9/0x135 [ 106.171356] do_syscall_64+0x140/0x385 [ 106.172120] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.173051] } [ 106.173308] ... key at: [<ffffffff85094600>] __key.26481+0x0/0x40 [ 106.174219] ... acquired at: [ 106.174646] _raw_spin_lock+0x33/0x64 [ 106.175183] blk_mq_dispatch_rq_list+0x4c1/0xd7c [ 106.175843] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.176518] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.177262] __blk_mq_run_hw_queue+0x137/0x17e [ 106.177900] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.178591] blk_mq_run_hw_queue+0x151/0x187 [ 106.179207] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.179926] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.180571] blk_flush_plug_list+0x392/0x3d7 [ 106.181187] blk_finish_plug+0x37/0x4f [ 106.181737] __se_sys_io_submit+0x171/0x304 [ 106.182346] do_syscall_64+0x140/0x385 [ 106.182895] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.183607] [ 106.183830] -> (&(&hctx->dispatch_wait_lock)->rlock){....} ops: 1 { [ 106.184691] INITIAL USE at: [ 106.185119] _raw_spin_lock+0x33/0x64 [ 106.185838] blk_mq_dispatch_rq_list+0x4c1/0xd7c [ 106.186697] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.187551] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.188481] __blk_mq_run_hw_queue+0x137/0x17e [ 106.189307] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.190189] blk_mq_run_hw_queue+0x151/0x187 [ 106.190989] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.191902] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.192739] blk_flush_plug_list+0x392/0x3d7 [ 106.193535] blk_finish_plug+0x37/0x4f [ 106.194269] __se_sys_io_submit+0x171/0x304 [ 106.195059] do_syscall_64+0x140/0x385 [ 106.195794] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.196705] } [ 106.196950] ... key at: [<ffffffff84880620>] __key.51231+0x0/0x40 [ 106.197853] ... acquired at: [ 106.198270] lock_acquire+0x280/0x2f3 [ 106.198806] _raw_spin_lock+0x33/0x64 [ 106.199337] sbitmap_get+0xd5/0x22c [ 106.199850] __sbitmap_queue_get+0xe8/0x177 [ 106.200450] __blk_mq_get_tag+0x1e6/0x22d [ 106.201035] blk_mq_get_tag+0x1db/0x6e4 [ 106.201589] blk_mq_get_driver_tag+0x161/0x258 [ 106.202237] blk_mq_dispatch_rq_list+0x5b9/0xd7c [ 106.202902] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.203572] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.204316] __blk_mq_run_hw_queue+0x137/0x17e [ 106.204956] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.205649] blk_mq_run_hw_queue+0x151/0x187 [ 106.206269] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.206997] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.207644] blk_flush_plug_list+0x392/0x3d7 [ 106.208264] blk_finish_plug+0x37/0x4f [ 106.208814] __se_sys_io_submit+0x171/0x304 [ 106.209415] do_syscall_64+0x140/0x385 [ 106.209965] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.210684] [ 106.210904] [ 106.210904] the dependencies between the lock to be acquired [ 106.210905] and SOFTIRQ-irq-unsafe lock: [ 106.212541] -> (&(&sb->map[i].swap_lock)->rlock){+.+.} ops: 1969 { [ 106.213393] HARDIRQ-ON-W at: [ 106.213840] _raw_spin_lock+0x33/0x64 [ 106.214570] sbitmap_get+0xd5/0x22c [ 106.215282] __sbitmap_queue_get+0xe8/0x177 [ 106.216086] __blk_mq_get_tag+0x1e6/0x22d [ 106.216876] blk_mq_get_tag+0x1db/0x6e4 [ 106.217627] blk_mq_get_driver_tag+0x161/0x258 [ 106.218465] blk_mq_dispatch_rq_list+0x28e/0xd7c [ 106.219326] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.220198] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.221138] __blk_mq_run_hw_queue+0x137/0x17e [ 106.221975] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.222874] blk_mq_run_hw_queue+0x151/0x187 [ 106.223686] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.224597] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.225444] blk_flush_plug_list+0x392/0x3d7 [ 106.226255] blk_finish_plug+0x37/0x4f [ 106.227006] read_pages+0x3ef/0x430 [ 106.227717] __do_page_cache_readahead+0x18e/0x2fc [ 106.228595] force_page_cache_readahead+0x121/0x133 [ 106.229491] page_cache_sync_readahead+0x35f/0x3bb [ 106.230373] generic_file_buffered_read+0x410/0x1860 [ 106.231277] __vfs_read+0x319/0x38f [ 106.231986] vfs_read+0xd2/0x19a [ 106.232666] ksys_read+0xb9/0x135 [ 106.233350] do_syscall_64+0x140/0x385 [ 106.234097] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.235012] SOFTIRQ-ON-W at: [ 106.235460] _raw_spin_lock+0x33/0x64 [ 106.236195] sbitmap_get+0xd5/0x22c [ 106.236913] __sbitmap_queue_get+0xe8/0x177 [ 106.237715] __blk_mq_get_tag+0x1e6/0x22d [ 106.238488] blk_mq_get_tag+0x1db/0x6e4 [ 106.239244] blk_mq_get_driver_tag+0x161/0x258 [ 106.240079] blk_mq_dispatch_rq_list+0x28e/0xd7c [ 106.240937] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.241806] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.242751] __blk_mq_run_hw_queue+0x137/0x17e [ 106.243579] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.244469] blk_mq_run_hw_queue+0x151/0x187 [ 106.245277] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.246191] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.247044] blk_flush_plug_list+0x392/0x3d7 [ 106.247859] blk_finish_plug+0x37/0x4f [ 106.248749] read_pages+0x3ef/0x430 [ 106.249463] __do_page_cache_readahead+0x18e/0x2fc [ 106.250357] force_page_cache_readahead+0x121/0x133 [ 106.251263] page_cache_sync_readahead+0x35f/0x3bb [ 106.252157] generic_file_buffered_read+0x410/0x1860 [ 106.253084] __vfs_read+0x319/0x38f [ 106.253808] vfs_read+0xd2/0x19a [ 106.254488] ksys_read+0xb9/0x135 [ 106.255186] do_syscall_64+0x140/0x385 [ 106.255943] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.256867] INITIAL USE at: [ 106.257300] _raw_spin_lock+0x33/0x64 [ 106.258033] sbitmap_get+0xd5/0x22c [ 106.258747] __sbitmap_queue_get+0xe8/0x177 [ 106.259542] __blk_mq_get_tag+0x1e6/0x22d [ 106.260320] blk_mq_get_tag+0x1db/0x6e4 [ 106.261072] blk_mq_get_driver_tag+0x161/0x258 [ 106.261902] blk_mq_dispatch_rq_list+0x28e/0xd7c [ 106.262762] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.263626] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.264571] __blk_mq_run_hw_queue+0x137/0x17e [ 106.265409] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.266302] blk_mq_run_hw_queue+0x151/0x187 [ 106.267111] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.268028] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.268878] blk_flush_plug_list+0x392/0x3d7 [ 106.269694] blk_finish_plug+0x37/0x4f [ 106.270432] read_pages+0x3ef/0x430 [ 106.271139] __do_page_cache_readahead+0x18e/0x2fc [ 106.272040] force_page_cache_readahead+0x121/0x133 [ 106.272932] page_cache_sync_readahead+0x35f/0x3bb [ 106.273811] generic_file_buffered_read+0x410/0x1860 [ 106.274709] __vfs_read+0x319/0x38f [ 106.275407] vfs_read+0xd2/0x19a [ 106.276074] ksys_read+0xb9/0x135 [ 106.276764] do_syscall_64+0x140/0x385 [ 106.277500] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.278417] } [ 106.278676] ... key at: [<ffffffff85094640>] __key.26212+0x0/0x40 [ 106.279586] ... acquired at: [ 106.280026] lock_acquire+0x280/0x2f3 [ 106.280559] _raw_spin_lock+0x33/0x64 [ 106.281101] sbitmap_get+0xd5/0x22c [ 106.281610] __sbitmap_queue_get+0xe8/0x177 [ 106.282221] __blk_mq_get_tag+0x1e6/0x22d [ 106.282809] blk_mq_get_tag+0x1db/0x6e4 [ 106.283368] blk_mq_get_driver_tag+0x161/0x258 [ 106.284018] blk_mq_dispatch_rq_list+0x5b9/0xd7c [ 106.284685] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.285371] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.286135] __blk_mq_run_hw_queue+0x137/0x17e [ 106.286806] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.287515] blk_mq_run_hw_queue+0x151/0x187 [ 106.288149] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.289041] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.289912] blk_flush_plug_list+0x392/0x3d7 [ 106.290590] blk_finish_plug+0x37/0x4f [ 106.291238] __se_sys_io_submit+0x171/0x304 [ 106.291864] do_syscall_64+0x140/0x385 [ 106.292534] entry_SYSCALL_64_after_hwframe+0x49/0xbe Reported-by: Ming Lei <ming.lei@redhat.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-06arch: switch the default on ARCH_HAS_SG_CHAINChristoph Hellwig
These days architectures are mostly out of the business of dealing with struct scatterlist at all, unless they have architecture specific iommu drivers. Replace the ARCH_HAS_SG_CHAIN symbol with a ARCH_NO_SG_CHAIN one only enabled for architectures with horrible legacy iommu drivers like alpha and parisc, and conditionally for arm which wants to keep it disable for legacy platforms. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
2018-12-06XArray tests: Check iterating over multiorder entriesMatthew Wilcox
There was no bug here, but there was no test coverage for this scenario. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-12-06XArray tests: Handle larger indices more elegantlyMatthew Wilcox
xa_mk_value() only handles values up to LONG_MAX. I successfully hid that inside xa_store_index() and xa_erase_index(), but it turned out I also needed it for testing xa_alloc() on 32-bit machines. So extract xa_mk_index() from the above two functions, and convert the non-constant users of xa_mk_value() to xa_mk_index(). Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-12-06radix tree: Don't return retry entries from lookupMatthew Wilcox
Commit 66ee620f06f9 ("idr: Permit any valid kernel pointer to be stored") changed the radix tree lookup so that it stops when reaching the bottom of the tree. However, the condition was added in the wrong place, making it possible to return retry entries to the caller. Reorder the tests to check for the retry entry before checking whether we're at the bottom of the tree. The retry entry should never be found in the tree root, so it's safe to defer the check until the end of the loop. Add a regression test to the test-suite to be sure this doesn't come back. Fixes: 66ee620f06f9 ("idr: Permit any valid kernel pointer to be stored") Reported-by: Greg Kurz <groug@kaod.org> Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-12-06kref/kobject: Improve documentationEzequiel Garcia
The current kref and kobject documentation may be insufficient to understand these common pitfalls regarding object lifetime and object releasing. Add a bit more documentation and improve the warnings seen by the user, pointing to the right piece of documentation. Also, it's important to understand that making fun of people publicly is not at all helpful, doesn't provide any value, and it's not a healthy way of encouraging developers to do better. "Mocking mercilessly" will, if anything, make developers feel bad and go away. This kind of behavior should not be encouraged or justified. Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com> Signed-off-by: Matthias Brugger <mbrugger@suse.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-04Merge tag 'v4.20-rc5' into for-4.21/blockJens Axboe
Pull in v4.20-rc5, solving a conflict we'll otherwise get in aio.c and also getting the merge fix that went into mainline that users are hitting testing for-4.21/block and/or for-next. * tag 'v4.20-rc5': (664 commits) Linux 4.20-rc5 PCI: Fix incorrect value returned from pcie_get_speed_cap() MAINTAINERS: Update linux-mips mailing list address ocfs2: fix potential use after free mm/khugepaged: fix the xas_create_range() error path mm/khugepaged: collapse_shmem() do not crash on Compound mm/khugepaged: collapse_shmem() without freezing new_page mm/khugepaged: minor reorderings in collapse_shmem() mm/khugepaged: collapse_shmem() remember to clear holes mm/khugepaged: fix crashes due to misaccounted holes mm/khugepaged: collapse_shmem() stop if punched or truncated mm/huge_memory: fix lockdep complaint on 32-bit i_size_read() mm/huge_memory: splitting set mapping+index before unfreeze mm/huge_memory: rename freeze_page() to unmap_page() initramfs: clean old path before creating a hardlink kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace psi: make disabling/enabling easier for vendor kernels proc: fixup map_files test on arm debugobjects: avoid recursive calls with kmemleak userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set ...
2018-12-04Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU changes from Paul E. McKenney: - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar. - Replace calls of RCU-bh and RCU-sched update-side functions to their vanilla RCU counterparts. This series is a step towards complete removal of the RCU-bh and RCU-sched update-side functions. ( Note that some of these conversions are going upstream via their respective maintainers. ) - Documentation updates, including a number of flavor-consolidation updates from Joel Fernandes. - Miscellaneous fixes. - Automate generation of the initrd filesystem used for rcutorture testing. - Convert spin_is_locked() assertions to instead use lockdep. ( Note that some of these conversions are going upstream via their respective maintainers. ) - SRCU updates, especially including a fix from Dennis Krein for a bag-on-head-class bug. - RCU torture-test updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03Merge tag 'wireless-drivers-next-for-davem-2018-11-30' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 4.21 First set of patches for 4.21. Most notable here is support for Quantenna's QSR1000/QSR2000 chipsets and more flexible ways to provide nvram files for brcmfmac. Major changes: brcmfmac * add support for first trying to get a board specific nvram file * add support for getting nvram contents from EFI variables qtnfmac * use single PCIe driver for all platforms and rename Kconfig option CONFIG_QTNFMAC_PEARL_PCIE to CONFIG_QTNFMAC_PCIE * add support for QSR1000/QSR2000 (Topaz) family of chipsets ath10k * add support for WCN3990 firmware crash recovery * add firmware memory dump support for QCA4019 wil6210 * add firmware error recovery while in AP mode ath9k * remove experimental notice from dynack feature iwlwifi * PCI IDs for some new 9000-series cards * improve antenna usage on connection problems * new firmware debugging infrastructure * some more work on 802.11ax * improve support for multiple RF modules with 22000 devices cordic * move cordic macros and defines to a public header file * convert brcmsmac and b43 to fully use cordic library ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03rhashtable: detect when object movement between tables might have ↵NeilBrown
invalidated a lookup Some users of rhashtables might need to move an object from one table to another - this appears to be the reason for the incomplete usage of NULLS markers. To support these, we store a unique NULLS_MARKER at the end of each chain, and when a search fails to find a match, we check if the NULLS marker found was the expected one. If not, the search may not have examined all objects in the target bucket, so it is repeated. The unique NULLS_MARKER is derived from the address of the head of the chain. As this cannot be derived at load-time the static rhnull in rht_bucket_nested() needs to be initialised at run time. Any caller of a lookup function must still be prepared for the possibility that the object returned is in a different table - it might have been there for some time. Note that this does NOT provide support for other uses of NULLS_MARKERs such as allocating with SLAB_TYPESAFE_BY_RCU or changing the key of an object and re-inserting it in the same table. These could only be done safely if new objects were inserted at the *start* of a hash chain, and that is not currently the case. Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03Merge 4.20-rc5 into driver-core-nextGreg Kroah-Hartman
We need the fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-30debugobjects: avoid recursive calls with kmemleakQian Cai
CONFIG_DEBUG_OBJECTS_RCU_HEAD does not play well with kmemleak due to recursive calls. fill_pool kmemleak_ignore make_black_object put_object __call_rcu (kernel/rcu/tree.c) debug_rcu_head_queue debug_object_activate debug_object_init fill_pool kmemleak_ignore make_black_object ... So add SLAB_NOLEAKTRACE to kmem_cache_create() to not register newly allocated debug objects at all. Link: http://lkml.kernel.org/r/20181126165343.2339-1-cai@gmx.us Signed-off-by: Qian Cai <cai@gmx.us> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Waiman Long <longman@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30lib/test_kmod.c: fix rmmod double freeLuis Chamberlain
We free the misc device string twice on rmmod; fix this. Without this we cannot remove the module without crashing. Link: http://lkml.kernel.org/r/20181124050500.5257-1-mcgrof@kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reported-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@vger.kernel.org> [4.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30sbitmap: optimize wakeup checkJens Axboe
Even if we have no waiters on any of the sbitmap_queue wait states, we still have to loop every entry to check. We do this for every IO, so the cost adds up. Shift a bit of the cost to the slow path, when we actually have waiters. Wrap prepare_to_wait_exclusive() and finish_wait(), so we can maintain an internal count of how many are currently active. Then we can simply check this count in sbq_wake_ptr() and not have to loop if we don't have any sleepers. Convert the two users of sbitmap with waiting, blk-mq-tag and iSCSI. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-30sbitmap: ammortize cost of clearing bitsJens Axboe
sbitmap maintains a set of words that we use to set and clear bits, with each bit representing a tag for blk-mq. Even though we spread the bits out and maintain a hint cache, one particular bit allocated will end up being cleared in the exact same spot. This introduces batched clearing of bits. Instead of clearing a given bit, the same bit is set in a cleared/free mask instead. If we fail allocating a bit from a given word, then we check the free mask, and batch move those cleared bits at that time. This trades 64 atomic bitops for 2 cmpxchg(). In a threaded poll test case, half the overhead of getting and clearing tags is removed with this change. On another poll test case with a single thread, performance is unchanged. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-30test_hexdump: use memcpy instead of strncpyLinus Torvalds
New versions of gcc reasonably warn about the odd pattern of strncpy(p, q, strlen(q)); which really doesn't make sense: the strncpy() ends up being just a slow and odd way to write memcpy() in this case. Apparently there was a patch for this floating around earlier, but it got lost. Acked-again-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fixes from Al Viro: "Assorted fixes all over the place. The iov_iter one is this cycle regression (splice from UDP triggering WARN_ON()), the rest is older" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: afs: Use d_instantiate() rather than d_add() and don't d_drop() afs: Fix missing net error handling afs: Fix validation/callback interaction iov_iter: teach csum_and_copy_to_iter() to handle pipe-backed ones exportfs: do not read dentry after free exportfs: fix 'passing zero to ERR_PTR()' warning aio: fix failure to put the file pointer sysv: return 'err' instead of 0 in __sysv_write_inode
2018-11-30s390: use common bust_spinlocks()Sergey Senozhatsky
s390 is the only architecture that is using own bust_spinlocks() variant, while other arch-s seem to be OK with the common implementation. Heiko Carstens [1] said he would prefer s390 to use the common bust_spinlocks() as well: I did some code archaeology and this function is unchanged since ~17 years. When it was introduced it was close to identical to the x86 variant. All other architectures use the common code variant in the meantime. So if we change this I'd prefer that we switch s390 to the common code variant as well. Right now I can't see a reason for not doing that This patch removes s390 bust_spinlocks() and drops the weak attribute from the common bust_spinlocks() version. [1] lkml.kernel.org/r/20181025062800.GB4037@osiris Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-11-29sbitmap: don't loop for find_next_zero_bit() for !round_robinJens Axboe
If we aren't forced to do round robin tag allocation, just use the allocation hint to find the index for the tag word, don't use it for the offset inside the word. This avoids a potential extra round trip in the bit looping, and since we're fetching this cacheline, we may as well check the whole word from the start. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-29lib: cordic: Move cordic macros and defines to header filePriit Laes
Now that these macros are in header file, we can eventually clean up the duplicate macros present in the drivers that utilize the same cordic algorithm implementation. Also add CORDIC_ prefix to nonprefixed macros. Reviewed-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Priit Laes <plaes@plaes.org> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-11-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Trivial conflict in net/core/filter.c, a locally computed 'sdif' is now an argument to the function. Signed-off-by: David S. Miller <davem@davemloft.net>