summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-12-22Revert "vfs: Allow userns root to call mknod on owned filesystems."Christian Brauner
This reverts commit 55956b59df336f6738da916dbb520b6e37df9fbd. commit 55956b59df33 ("vfs: Allow userns root to call mknod on owned filesystems.") enabled mknod() in user namespaces for userns root if CAP_MKNOD is available. However, these device nodes are useless since any filesystem mounted from a non-initial user namespace will set the SB_I_NODEV flag on the filesystem. Now, when a device node s created in a non-initial user namespace a call to open() on said device node will fail due to: bool may_open_dev(const struct path *path) { return !(path->mnt->mnt_flags & MNT_NODEV) && !(path->mnt->mnt_sb->s_iflags & SB_I_NODEV); } The problem with this is that as of the aforementioned commit mknod() creates partially functional device nodes in non-initial user namespaces. In particular, it has the consequence that as of the aforementioned commit open() will be more privileged with respect to device nodes than mknod(). Before it was the other way around. Specifically, if mknod() succeeded then it was transparent for any userspace application that a fatal error must have occured when open() failed. All of this breaks multiple userspace workloads and a widespread assumption about how to handle mknod(). Basically, all container runtimes and systemd live by the slogan "ask for forgiveness not permission" when running user namespace workloads. For mknod() the assumption is that if the syscall succeeds the device nodes are useable irrespective of whether it succeeds in a non-initial user namespace or not. This logic was chosen explicitly to allow for the glorious day when mknod() will actually be able to create fully functional device nodes in user namespaces. A specific problem people are already running into when running 4.18 rc kernels are failing systemd services. For any distro that is run in a container systemd services started with the PrivateDevices= property set will fail to start since the device nodes in question cannot be opened (cf. the arguments in [1]). Full disclosure, Seth made the very sound argument that it is already possible to end up with partially functional device nodes. Any filesystem mounted with MS_NODEV set will allow mknod() to succeed but will not allow open() to succeed. The difference to the case here is that the MS_NODEV case is transparent to userspace since it is an explicitly set mount option while the SB_I_NODEV case is an implicit property enforced by the kernel and hence opaque to userspace. [1]: https://github.com/systemd/systemd/pull/9483 Signed-off-by: Christian Brauner <christian@brauner.io> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Seth Forshee <seth.forshee@canonical.com> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-22x86/efi: Don't unmap EFI boot services code/data regions for EFI_OLD_MEMMAP ↵Sai Praneeth Prakhya
and EFI_MIXED_MODE The following commit: d5052a7130a6 ("x86/efi: Unmap EFI boot services code/data regions from efi_pgd") forgets to take two EFI modes into consideration, namely EFI_OLD_MEMMAP and EFI_MIXED_MODE: - EFI_OLD_MEMMAP is a legacy way of mapping EFI regions into swapper_pg_dir using ioremap() and init_memory_mapping(). This feature can be enabled by passing "efi=old_map" as kernel command line argument. But, efi_unmap_pages() unmaps EFI boot services code/data regions *only* from efi_pgd and hence cannot be used for unmapping EFI boot services code/data regions from swapper_pg_dir. Introduce a temporary fix to not unmap EFI boot services code/data regions when EFI_OLD_MEMMAP is enabled while working on a real fix. - EFI_MIXED_MODE is another feature where a 64-bit kernel runs on a 64-bit platform crippled by a 32-bit firmware. To support EFI_MIXED_MODE, all RAM (i.e. namely EFI regions like EFI_CONVENTIONAL_MEMORY, EFI_LOADER_<CODE/DATA>, EFI_BOOT_SERVICES_<CODE/DATA> and EFI_RUNTIME_CODE/DATA regions) is mapped into efi_pgd all the time to facilitate EFI runtime calls access it's arguments in 1:1 mode. Hence, don't unmap EFI boot services code/data regions when booted in mixed mode. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20181222022234.7573-1-sai.praneeth.prakhya@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-22dma-mapping: fix flags in dma_alloc_wcChristoph Hellwig
We really need the writecombine flag in dma_alloc_wc, fix a stupid oversight. Fixes: 7ed1d91a9e ("dma-mapping: translate __GFP_NOFAIL to DMA_ATTR_NO_WARN") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "4 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm, page_alloc: fix has_unmovable_pages for HugePages fork,memcg: fix crash in free_thread_stack on memcg charge fail mm: thp: fix flags for pmd migration when split mm, memory_hotplug: initialize struct pages for the full memory section
2018-12-21mm, page_alloc: fix has_unmovable_pages for HugePagesOscar Salvador
While playing with gigantic hugepages and memory_hotplug, I triggered the following #PF when "cat memoryX/removable": BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 1481 Comm: cat Tainted: G E 4.20.0-rc6-mm1-1-default+ #18 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:has_unmovable_pages+0x154/0x210 Call Trace: is_mem_section_removable+0x7d/0x100 removable_show+0x90/0xb0 dev_attr_show+0x1c/0x50 sysfs_kf_seq_show+0xca/0x1b0 seq_read+0x133/0x380 __vfs_read+0x26/0x180 vfs_read+0x89/0x140 ksys_read+0x42/0x90 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The reason is we do not pass the Head to page_hstate(), and so, the call to compound_order() in page_hstate() returns 0, so we end up checking all hstates's size to match PAGE_SIZE. Obviously, we do not find any hstate matching that size, and we return NULL. Then, we dereference that NULL pointer in hugepage_migration_supported() and we got the #PF from above. Fix that by getting the head page before calling page_hstate(). Also, since gigantic pages span several pageblocks, re-adjust the logic for skipping pages. While are it, we can also get rid of the round_up(). [osalvador@suse.de: remove round_up(), adjust skip pages logic per Michal] Link: http://lkml.kernel.org/r/20181221062809.31771-1-osalvador@suse.de Link: http://lkml.kernel.org/r/20181217225113.17864-1-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21fork,memcg: fix crash in free_thread_stack on memcg charge failRik van Riel
Commit 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting") will result in fork failing if allocating a kernel stack for a task in dup_task_struct exceeds the kernel memory allowance for that cgroup. Unfortunately, it also results in a crash. This is due to the code jumping to free_stack and calling free_thread_stack when the memcg kernel stack charge fails, but without tsk->stack pointing at the freshly allocated stack. This in turn results in the vfree_atomic in free_thread_stack oopsing with a backtrace like this: #5 [ffffc900244efc88] die at ffffffff8101f0ab #6 [ffffc900244efcb8] do_general_protection at ffffffff8101cb86 #7 [ffffc900244efce0] general_protection at ffffffff818ff082 [exception RIP: llist_add_batch+7] RIP: ffffffff8150d487 RSP: ffffc900244efd98 RFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88085ef55980 RCX: 0000000000000000 RDX: ffff88085ef55980 RSI: 343834343531203a RDI: 343834343531203a RBP: ffffc900244efd98 R8: 0000000000000001 R9: ffff8808578c3600 R10: 0000000000000000 R11: 0000000000000001 R12: ffff88029f6c21c0 R13: 0000000000000286 R14: ffff880147759b00 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ffffc900244efda0] vfree_atomic at ffffffff811df2c7 #9 [ffffc900244efdb8] copy_process at ffffffff81086e37 #10 [ffffc900244efe98] _do_fork at ffffffff810884e0 #11 [ffffc900244eff10] sys_vfork at ffffffff810887ff #12 [ffffc900244eff20] do_syscall_64 at ffffffff81002a43 RIP: 000000000049b948 RSP: 00007ffcdb307830 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 0000000000896030 RCX: 000000000049b948 RDX: 0000000000000000 RSI: 00007ffcdb307790 RDI: 00000000005d7421 RBP: 000000000067370f R8: 00007ffcdb3077b0 R9: 000000000001ed00 R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000040 R13: 000000000000000f R14: 0000000000000000 R15: 000000000088d018 ORIG_RAX: 000000000000003a CS: 0033 SS: 002b The simplest fix is to assign tsk->stack right where it is allocated. Link: http://lkml.kernel.org/r/20181214231726.7ee4843c@imladris.surriel.com Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting") Signed-off-by: Rik van Riel <riel@surriel.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21mm: thp: fix flags for pmd migration when splitPeter Xu
When splitting a huge migrating PMD, we'll transfer all the existing PMD bits and apply them again onto the small PTEs. However we are fetching the bits unconditionally via pmd_soft_dirty(), pmd_write() or pmd_yound() while actually they don't make sense at all when it's a migration entry. Fix them up. Since at it, drop the ifdef together as not needed. Note that if my understanding is correct about the problem then if without the patch there is chance to lose some of the dirty bits in the migrating pmd pages (on x86_64 we're fetching bit 11 which is part of swap offset instead of bit 2) and it could potentially corrupt the memory of an userspace program which depends on the dirty bit. Link: http://lkml.kernel.org/r/20181213051510.20306-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Souptick Joarder <jrdr.linux@gmail.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: <stable@vger.kernel.org> [4.14+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21mm, memory_hotplug: initialize struct pages for the full memory sectionMikhail Zaslonko
If memory end is not aligned with the sparse memory section boundary, the mapping of such a section is only partly initialized. This may lead to VM_BUG_ON due to uninitialized struct page access from is_mem_section_removable() or test_pages_in_a_zone() function triggered by memory_hotplug sysfs handlers: Here are the the panic examples: CONFIG_DEBUG_VM=y CONFIG_DEBUG_VM_PGFLAGS=y kernel parameter mem=2050M -------------------------- page:000003d082008000 is uninitialized and poisoned page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) Call Trace: ( test_pages_in_a_zone+0xde/0x160) show_valid_zones+0x5c/0x190 dev_attr_show+0x34/0x70 sysfs_kf_seq_show+0xc8/0x148 seq_read+0x204/0x480 __vfs_read+0x32/0x178 vfs_read+0x82/0x138 ksys_read+0x5a/0xb0 system_call+0xdc/0x2d8 Last Breaking-Event-Address: test_pages_in_a_zone+0xde/0x160 Kernel panic - not syncing: Fatal exception: panic_on_oops kernel parameter mem=3075M -------------------------- page:000003d08300c000 is uninitialized and poisoned page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) Call Trace: ( is_mem_section_removable+0xb4/0x190) show_mem_removable+0x9a/0xd8 dev_attr_show+0x34/0x70 sysfs_kf_seq_show+0xc8/0x148 seq_read+0x204/0x480 __vfs_read+0x32/0x178 vfs_read+0x82/0x138 ksys_read+0x5a/0xb0 system_call+0xdc/0x2d8 Last Breaking-Event-Address: is_mem_section_removable+0xb4/0x190 Kernel panic - not syncing: Fatal exception: panic_on_oops Fix the problem by initializing the last memory section of each zone in memmap_init_zone() till the very end, even if it goes beyond the zone end. Michal said: : This has alwways been problem AFAIU. It just went unnoticed because we : have zeroed memmaps during allocation before f7f99100d8d9 ("mm: stop : zeroing memory during allocation in vmemmap") and so the above test : would simply skip these ranges as belonging to zone 0 or provided a : garbage. : : So I guess we do care for post f7f99100d8d9 kernels mostly and : therefore Fixes: f7f99100d8d9 ("mm: stop zeroing memory during : allocation in vmemmap") Link: http://lkml.kernel.org/r/20181212172712.34019-2-zaslonko@linux.ibm.com Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap") Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Suggested-by: Michal Hocko <mhocko@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull sparc fixes from David Miller: "Just some small fixes here and there, and a refcount leak in a serial driver, nothing serious" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: serial/sunsu: fix refcount leak sparc: Set "ARCH: sunxx" information on the same line sparc: vdso: Drop implicit common-page-size linker flag
2018-12-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull more networking fixes from David Miller: "Some more bug fixes have trickled in, we have: 1) Local MAC entries properly in mscc driver, from Allan W. Nielsen. 2) Eric Dumazet found some more of the typical "pskb_may_pull() --> oops forgot to reload the header pointer" bugs in ipv6 tunnel handling. 3) Bad SKB socket pointer in ipv6 fragmentation handling, from Herbert Xu. 4) Overflow fix in sk_msg_clone(), from Vakul Garg. 5) Validate address lengths in AF_PACKET, from Willem de Bruijn" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: qmi_wwan: Fix qmap header retrieval in qmimux_rx_fixup qmi_wwan: Add support for Fibocom NL678 series tls: Do not call sk_memcopy_from_iter with zero length ipv6: tunnels: fix two use-after-free Prevent overflow of sk_msg in sk_msg_clone() packet: validate address length net: netxen: fix a missing check and an uninitialized use tcp: fix a race in inet_diag_dump_icsk() MAINTAINERS: update cxgb4 and cxgb3 maintainer ipv6: frags: Fix bogus skb->sk in reassembled packets mscc: Configured MAC entries should be locked.
2018-12-21auxdisplay: charlcd: fix x/y command parsingMans Rullgard
The x/y command parsing has been broken since commit 129957069e6a ("staging: panel: Fixed checkpatch warning about simple_strtoul()"). Commit b34050fadb86 ("auxdisplay: charlcd: Fix and clean up handling of x/y commands") fixed some problems by rewriting the parsing code, but also broke things further by removing the check for a complete command before attempting to parse it. As a result, parsing is terminated at the first x or y character. This reinstates the check for a final semicolon. Whereas the original code use strchr(), this is wasteful seeing as the semicolon is always at the end of the buffer. Thus check this character directly instead. Signed-off-by: Mans Rullgard <mans@mansr.com> Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2018-12-21serial/sunsu: fix refcount leakYangtao Li
The function of_find_node_by_path() acquires a reference to the node returned by it and that reference needs to be dropped by its caller. su_get_type() doesn't do that. The match node are used as an identifier to compare against the current node, so we can directly drop the refcount after getting the node from the path as it is not used as pointer. Fix this by use a single variable and drop the refcount right after of_find_node_by_path(). Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21sparc: Set "ARCH: sunxx" information on the same lineCorentin Labbe
While checking boot log from SPARC qemu, I saw that the "ARCH: sunxx" information was split on two different line. This patchs merge both line together. In the meantime, thoses information need to be printed via pr_info since printk print them by default via the warning loglevel. Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21sparc: vdso: Drop implicit common-page-size linker flagndesaulniers@google.com
GNU linker's -z common-page-size's default value is based on the target architecture. arch/sparc/vdso/Makefile sets it to the architecture default, which is implicit and redundant. Drop it. Link: https://lkml.kernel.org/r/20181206191231.192355-1-ndesaulniers@google.com Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fix from Paolo Bonzini: "A simple patch for a pretty bad bug: Unbreak AMD nested virtualization." * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: nSVM: fix switch to guest mmu
2018-12-21qmi_wwan: Fix qmap header retrieval in qmimux_rx_fixupDaniele Palmas
This patch fixes qmap header retrieval when modem is configured for dl data aggregation. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix a division by zero crash in the posix-timers code" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-timers: Fix division by zero bug
2018-12-21qmi_wwan: Add support for Fibocom NL678 seriesJörgen Storvist
Added support for Fibocom NL678 series cellular module QMI interface. Using QMI_QUIRK_SET_DTR required for Qualcomm MDM9x40 series chipsets. Signed-off-by: Jörgen Storvist <jorgen.storvist@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21tls: Do not call sk_memcopy_from_iter with zero lengthVakul Garg
In some conditions e.g. when tls_clone_plaintext_msg() returns -ENOSPC, the number of bytes to be copied using subsequent function sk_msg_memcopy_from_iter() becomes zero. This causes function sk_msg_memcopy_from_iter() to fail which in turn causes tls_sw_sendmsg() to return failure. To prevent it, do not call sk_msg_memcopy_from_iter() when number of bytes to copy (indicated by 'try_to_copy') is zero. Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull futex fix from Ingo Molnar: "A single fix for a robust futexes race between sys_exit() and sys_futex_lock_pi()" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Cure exit race
2018-12-21Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "The biggest part is a series of reverts for the macro based GCC inlining workarounds. It caused regressions in distro build and other kernel tooling environments, and the GCC project was very receptive to fixing the underlying inliner weaknesses - so as time ran out we decided to do a reasonably straightforward revert of the patches. The plan is to rely on the 'asm inline' GCC 9 feature, which might be backported to GCC 8 and could thus become reasonably widely available on modern distros. Other than those reverts, there's misc fixes from all around the place. I wish our final x86 pull request for v4.20 was smaller..." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs" Revert "x86/objtool: Use asm macros to work around GCC inlining bugs" Revert "x86/refcount: Work around GCC inlining bug" Revert "x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs" Revert "x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs" Revert "x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops" Revert "x86/extable: Macrofy inline assembly code to work around GCC inlining bugs" Revert "x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs" Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs" x86/mtrr: Don't copy uninitialized gentry fields back to userspace x86/fsgsbase/64: Fix the base write helper functions x86/mm/cpa: Fix cpa_flush_array() TLB invalidation x86/vdso: Pass --eh-frame-hdr to the linker x86/mm: Fix decoy address handling vs 32-bit builds x86/intel_rdt: Ensure a CPU remains online for the region's pseudo-locking sequence x86/dump_pagetables: Fix LDT remap address marker x86/mm: Fix guard hole handling
2018-12-21ipv6: tunnels: fix two use-after-freeEric Dumazet
xfrm6_policy_check() might have re-allocated skb->head, we need to reload ipv6 header pointer. sysbot reported : BUG: KASAN: use-after-free in __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40 Read of size 4 at addr ffff888191b8cb70 by task syz-executor2/1304 CPU: 0 PID: 1304 Comm: syz-executor2 Not tainted 4.20.0-rc7+ #356 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x244/0x39d lib/dump_stack.c:113 print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432 __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40 ipv6_addr_type include/net/ipv6.h:403 [inline] ip6_tnl_get_cap+0x27/0x190 net/ipv6/ip6_tunnel.c:727 ip6_tnl_rcv_ctl+0xdb/0x2a0 net/ipv6/ip6_tunnel.c:757 vti6_rcv+0x336/0x8f3 net/ipv6/ip6_vti.c:321 xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132 ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434 NF_HOOK include/linux/netfilter.h:289 [inline] ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443 IPVS: ftp: loaded support on port[0] = 21 ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537 dst_input include/net/dst.h:450 [inline] ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76 NF_HOOK include/linux/netfilter.h:289 [inline] ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083 process_backlog+0x24e/0x7a0 net/core/dev.c:5923 napi_poll net/core/dev.c:6346 [inline] net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412 __do_softirq+0x308/0xb7e kernel/softirq.c:292 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1027 </IRQ> do_softirq.part.14+0x126/0x160 kernel/softirq.c:337 do_softirq+0x19/0x20 kernel/softirq.c:340 netif_rx_ni+0x521/0x860 net/core/dev.c:4569 dev_loopback_xmit+0x287/0x8c0 net/core/dev.c:3576 NF_HOOK include/linux/netfilter.h:289 [inline] ip6_finish_output2+0x193a/0x2930 net/ipv6/ip6_output.c:84 ip6_fragment+0x2b06/0x3850 net/ipv6/ip6_output.c:727 ip6_finish_output+0x6b7/0xc50 net/ipv6/ip6_output.c:152 NF_HOOK_COND include/linux/netfilter.h:278 [inline] ip6_output+0x232/0x9d0 net/ipv6/ip6_output.c:171 dst_output include/net/dst.h:444 [inline] ip6_local_out+0xc5/0x1b0 net/ipv6/output_core.c:176 ip6_send_skb+0xbc/0x340 net/ipv6/ip6_output.c:1727 ip6_push_pending_frames+0xc5/0xf0 net/ipv6/ip6_output.c:1747 rawv6_push_pending_frames net/ipv6/raw.c:615 [inline] rawv6_sendmsg+0x3a3e/0x4b40 net/ipv6/raw.c:945 kobject: 'queues' (0000000089e6eea2): kobject_add_internal: parent: 'tunl0', set: '<NULL>' kobject: 'queues' (0000000089e6eea2): kobject_uevent_env inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798 kobject: 'queues' (0000000089e6eea2): kobject_uevent_env: filter function caused the event to drop! sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:631 sock_write_iter+0x35e/0x5c0 net/socket.c:900 call_write_iter include/linux/fs.h:1857 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x6b8/0x9f0 fs/read_write.c:487 kobject: 'rx-0' (00000000e2d902d9): kobject_add_internal: parent: 'queues', set: 'queues' kobject: 'rx-0' (00000000e2d902d9): kobject_uevent_env vfs_write+0x1fc/0x560 fs/read_write.c:549 ksys_write+0x101/0x260 fs/read_write.c:598 kobject: 'rx-0' (00000000e2d902d9): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/rx-0' __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 kobject: 'tx-0' (00000000443b70ac): kobject_add_internal: parent: 'queues', set: 'queues' entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457669 Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f9bd200bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457669 RDX: 000000000000058f RSI: 00000000200033c0 RDI: 0000000000000003 kobject: 'tx-0' (00000000443b70ac): kobject_uevent_env RBP: 000000000072bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9bd200c6d4 R13: 00000000004c2dcc R14: 00000000004da398 R15: 00000000ffffffff Allocated by task 1304: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553 __do_kmalloc_node mm/slab.c:3684 [inline] __kmalloc_node_track_caller+0x50/0x70 mm/slab.c:3698 __kmalloc_reserve.isra.41+0x41/0xe0 net/core/skbuff.c:140 __alloc_skb+0x155/0x760 net/core/skbuff.c:208 kobject: 'tx-0' (00000000443b70ac): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/tx-0' alloc_skb include/linux/skbuff.h:1011 [inline] __ip6_append_data.isra.49+0x2f1a/0x3f50 net/ipv6/ip6_output.c:1450 ip6_append_data+0x1bc/0x2d0 net/ipv6/ip6_output.c:1619 rawv6_sendmsg+0x15ab/0x4b40 net/ipv6/raw.c:938 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:631 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2116 __sys_sendmsg+0x11d/0x280 net/socket.c:2154 __do_sys_sendmsg net/socket.c:2163 [inline] __se_sys_sendmsg net/socket.c:2161 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2161 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe kobject: 'gre0' (00000000cb1b2d7b): kobject_add_internal: parent: 'net', set: 'devices' Freed by task 1304: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kfree+0xcf/0x230 mm/slab.c:3817 skb_free_head+0x93/0xb0 net/core/skbuff.c:553 pskb_expand_head+0x3b2/0x10d0 net/core/skbuff.c:1498 __pskb_pull_tail+0x156/0x18a0 net/core/skbuff.c:1896 pskb_may_pull include/linux/skbuff.h:2188 [inline] _decode_session6+0xd11/0x14d0 net/ipv6/xfrm6_policy.c:150 __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:3272 kobject: 'gre0' (00000000cb1b2d7b): kobject_uevent_env __xfrm_policy_check+0x380/0x2c40 net/xfrm/xfrm_policy.c:3322 __xfrm_policy_check2 include/net/xfrm.h:1170 [inline] xfrm_policy_check include/net/xfrm.h:1175 [inline] xfrm6_policy_check include/net/xfrm.h:1185 [inline] vti6_rcv+0x4bd/0x8f3 net/ipv6/ip6_vti.c:316 xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132 ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434 NF_HOOK include/linux/netfilter.h:289 [inline] ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443 ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537 dst_input include/net/dst.h:450 [inline] ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76 NF_HOOK include/linux/netfilter.h:289 [inline] ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083 process_backlog+0x24e/0x7a0 net/core/dev.c:5923 kobject: 'gre0' (00000000cb1b2d7b): fill_kobj_path: path = '/devices/virtual/net/gre0' napi_poll net/core/dev.c:6346 [inline] net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412 __do_softirq+0x308/0xb7e kernel/softirq.c:292 The buggy address belongs to the object at ffff888191b8cac0 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 176 bytes inside of 512-byte region [ffff888191b8cac0, ffff888191b8ccc0) The buggy address belongs to the page: page:ffffea000646e300 count:1 mapcount:0 mapping:ffff8881da800940 index:0x0 flags: 0x2fffc0000000200(slab) raw: 02fffc0000000200 ffffea0006eaaa48 ffffea00065356c8 ffff8881da800940 raw: 0000000000000000 ffff888191b8c0c0 0000000100000006 0000000000000000 page dumped because: kasan: bad access detected kobject: 'queues' (000000005fd6226e): kobject_add_internal: parent: 'gre0', set: '<NULL>' Memory state around the buggy address: ffff888191b8ca00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888191b8ca80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb >ffff888191b8cb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888191b8cb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888191b8cc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 0d3c703a9d17 ("ipv6: Cleanup IPv6 tunnel receive path") Fixes: ed1efb2aefbb ("ipv6: Add support for IPsec virtual tunnel interfaces") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21Merge tag 'drm-fixes-2018-12-21' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull final drm fix from Daniel Vetter: "Very calm week, so either everything perfect or everyone on holidays already. Just one array_index_nospec patch, also for stable" * tag 'drm-fixes-2018-12-21' of git://anongit.freedesktop.org/drm/drm: drm/ioctl: Fix Spectre v1 vulnerabilities
2018-12-21Prevent overflow of sk_msg in sk_msg_clone()Vakul Garg
Fixed function sk_msg_clone() to prevent overflow of 'dst' while adding pages in scatterlist entries. The overflow of 'dst' causes crash in kernel tls module while doing record encryption. Crash fixed by this patch. [ 78.796119] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [ 78.804900] Mem abort info: [ 78.807683] ESR = 0x96000004 [ 78.810744] Exception class = DABT (current EL), IL = 32 bits [ 78.816677] SET = 0, FnV = 0 [ 78.819727] EA = 0, S1PTW = 0 [ 78.822873] Data abort info: [ 78.825759] ISV = 0, ISS = 0x00000004 [ 78.829600] CM = 0, WnR = 0 [ 78.832576] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000bf8ee311 [ 78.839195] [0000000000000008] pgd=0000000000000000 [ 78.844081] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 78.849642] Modules linked in: tls xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_filter ip6_tables xt_CHECKSUM cpve cpufreq_conservative lm90 ina2xx crct10dif_ce [ 78.865377] CPU: 0 PID: 6007 Comm: openssl Not tainted 4.20.0-rc6-01647-g754d5da63145-dirty #107 [ 78.874149] Hardware name: LS1043A RDB Board (DT) [ 78.878844] pstate: 60000005 (nZCv daif -PAN -UAO) [ 78.883632] pc : scatterwalk_copychunks+0x164/0x1c8 [ 78.888500] lr : scatterwalk_copychunks+0x160/0x1c8 [ 78.893366] sp : ffff00001d04b600 [ 78.896668] x29: ffff00001d04b600 x28: ffff80006814c680 [ 78.901970] x27: 0000000000000000 x26: ffff80006c8de786 [ 78.907272] x25: ffff00001d04b760 x24: 000000000000001a [ 78.912573] x23: 0000000000000006 x22: ffff80006814e440 [ 78.917874] x21: 0000000000000100 x20: 0000000000000000 [ 78.923175] x19: 000081ffffffffff x18: 0000000000000400 [ 78.928476] x17: 0000000000000008 x16: 0000000000000000 [ 78.933778] x15: 0000000000000100 x14: 0000000000000001 [ 78.939079] x13: 0000000000001080 x12: 0000000000000020 [ 78.944381] x11: 0000000000001080 x10: 00000000ffff0002 [ 78.949683] x9 : ffff80006814c248 x8 : 00000000ffff0000 [ 78.954985] x7 : ffff80006814c318 x6 : ffff80006c8de786 [ 78.960286] x5 : 0000000000000f80 x4 : ffff80006c8de000 [ 78.965588] x3 : 0000000000000000 x2 : 0000000000001086 [ 78.970889] x1 : ffff7e0001b74e02 x0 : 0000000000000000 [ 78.976192] Process openssl (pid: 6007, stack limit = 0x00000000291367f9) [ 78.982968] Call trace: [ 78.985406] scatterwalk_copychunks+0x164/0x1c8 [ 78.989927] skcipher_walk_next+0x28c/0x448 [ 78.994099] skcipher_walk_done+0xfc/0x258 [ 78.998187] gcm_encrypt+0x434/0x4c0 [ 79.001758] tls_push_record+0x354/0xa58 [tls] [ 79.006194] bpf_exec_tx_verdict+0x1e4/0x3e8 [tls] [ 79.010978] tls_sw_sendmsg+0x650/0x780 [tls] [ 79.015326] inet_sendmsg+0x2c/0xf8 [ 79.018806] sock_sendmsg+0x18/0x30 [ 79.022284] __sys_sendto+0x104/0x138 [ 79.025935] __arm64_sys_sendto+0x24/0x30 [ 79.029936] el0_svc_common+0x60/0xe8 [ 79.033588] el0_svc_handler+0x2c/0x80 [ 79.037327] el0_svc+0x8/0xc [ 79.040200] Code: 6b01005f 54fff788 940169b1 f9000320 (b9400801) [ 79.046283] ---[ end trace 74db007d069c1cf7 ]--- Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21packet: validate address lengthWillem de Bruijn
Packet sockets with SOCK_DGRAM may pass an address for use in dev_hard_header. Ensure that it is of sufficient length. Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "Switching a few devices with Synaptics over to SMbus and disabling SMbus on a couple devices with Elan touchpads as they need more plumbing on PS/2 side" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: synaptics - enable SMBus for HP EliteBook 840 G4 Input: elantech - disable elan-i2c for P52 and P72 Input: synaptics - enable RMI on ThinkPad T560 Input: omap-keypad - fix idle configuration to not block SoC idle states
2018-12-21Merge tag 'gpio-v4.20-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "Hopefully last round of GPIO fixes. The ACPI patch is pretty important for some laptop users, the rest is driver-specific for embedded (mostly ARM) systems. I took out one ACPI patch that wasn't critical enough because I couldn't justify sending it at this point, and that is why the commit date is today, but the patches have been in linux-next. Sorry for not sending some of them earlier :( Notice that we have a co-maintainer for GPIO now, Bartosz Golaszewski, and he might jump in and make some pull requests at times when I am off. Summary: - ACPI IRQ request deferral - OMAP: revert deferred wakeup quirk - MAX7301: fix DMA safe memory handling - MVEBU: selective probe failure on missing clk" * tag 'gpio-v4.20-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: mvebu: only fail on missing clk if pwm is actually to be used gpio: max7301: fix driver for use with CONFIG_VMAP_STACK gpio: gpio-omap: Revert deferred wakeup quirk handling for regressions gpiolib-acpi: Only defer request_irq for GpioInt ACPI event handlers
2018-12-21net: netxen: fix a missing check and an uninitialized useKangjie Lu
When netxen_rom_fast_read() fails, "bios" is left uninitialized and may contain random value, thus should not be used. The fix ensures that if netxen_rom_fast_read() fails, we return "-EIO". Signed-off-by: Kangjie Lu <kjlu@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21Merge tag '4.20-rc7-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb3 fix from Steve French: "An important smb3 fix for an regression to some servers introduced by compounding optimization to rmdir. This fix has been tested by multiple developers (including me) with the usual private xfstesting, but also by the new cifs/smb3 "buildbot" xfstest VMs (thank you Ronnie and Aurelien for good work on this automation). The automated testing has been updated so that it will catch problems like this in the future. Note that Pavel discovered (very recently) some unrelated but extremely important bugs in credit handling (smb3 flow control problem that can lead to disconnects/reconnects) when compounding, that I would have liked to send in ASAP but the complete testing of those two fixes may not be done in time and have to wait for 4.21" * tag '4.20-rc7-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3: Fix rmdir compounding regression to strict servers
2018-12-21RISC-V: Select GENERIC_SCHED_CLOCK for clocksource driversAnup Patel
The riscv_timer driver can provide sched_clock using "rdtime" instruction but to achieve this we require generic sched_clock framework hence this patch selects GENERIC_SCHED_CLOCK for RISCV. Signed-off-by: Anup Patel <anup@brainfault.org> Reviewed-by: Palmer Dabbelt <palmer@sifive.com> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-12-21RISC-V: lib: minor asm cleanupOlof Johansson
Fix tab/space conversion and use ENTRY/ENDPROC macros. Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-12-21RISC-V: Move from EARLY_PRINTK to SBI earlyconPalmer Dabbelt
Now that we have earlycon support in the SBI console driver there is no reason to have our arch-specific early printk support. This patch set turns on SBI earlycon support and removes the old early printk.
2018-12-21RISC-V: Update Kconfig to better handle CMDLINENick Kossifidis
Added a menu to choose how the built-in command line will be used and CMDLINE_EXTEND for compatibility with FDT code. v2: Improved help messages, removed references to bootloader and made them more descriptive. I also asked help from a friend who's a language expert just in case. v3: This time used the corrected text v4: Copy the config strings from the arm32 port. v5: Actually copy the config strings from the arm32 port. Signed-off-by: Nick Kossifidis <mick@ics.forth.gr> Signed-off-by: Debbie Maliotaki <dmaliotaki@gmail.com> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-12-21riscv: remove unused variable in ftraceDavid Abdurachmanov
Noticed while building kernel-4.20.0-0.rc5.git2.1.fc30 for Fedora 30/RISCV. [..] BUILDSTDERR: arch/riscv/kernel/ftrace.c: In function 'prepare_ftrace_return': BUILDSTDERR: arch/riscv/kernel/ftrace.c:135:6: warning: unused variable 'err' [-Wunused-variable] BUILDSTDERR: int err; BUILDSTDERR: ^~~ [..] Signed-off-by: David Abdurachmanov <david.abdurachmanov@gmail.com> Fixes: e949b6db51dc1 ("riscv/function_graph: Simplify with function_graph_enter()") Reviewed-by: Olof Johansson <olof@lixom.net> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-12-21RISC-V: add of_node_put()Yangtao Li
use of_node_put() to release the refcount. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-12-21RISC-V: Fix of_node_* refcountAtish Patra
Fix of_node* refcount at various places by using of_node_put. Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-12-21riscv, atomic: Add #define's for the atomic_{cmp,}xchg_*() variantsAndrea Parri
If an architecture does not define the atomic_{cmp,}xchg_*() variants, the generic implementation defaults them to the fully-ordered version. riscv's had its own variants since "the beginning", but it never told (#define-d these for) the generic implementation: it is time to do so. Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-12-21Merge remote-tracking branch 'regulator/topic/coupled' into regulator-nextMark Brown
2018-12-21Merge branch 'regulator-4.21' into regulator-nextMark Brown
2018-12-21Merge branch 'regulator-4.20' into regulator-linusMark Brown
2018-12-21KVM: x86: Add CPUID support for new instruction WBNOINVDRobert Hoo
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21kvm: selftests: ucall: fix exit mmio address guessingAndrew Jones
Fix two more bugs in the exit_mmio address guessing. The first bug was that the start and step calculations were wrong since they were dividing the number of address bits instead of the address space. The second other bug was that the guessing algorithm wasn't considering the valid physical and virtual address ranges correctly for an identity map. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21Revert "compiler-gcc: disable -ftracer for __noclone functions"Sean Christopherson
The -ftracer optimization was disabled in __noclone as a workaround to GCC duplicating a blob of inline assembly that happened to define a global variable. It has been pointed out that no amount of workarounds can guarantee the compiler won't duplicate inline assembly[1], and that disabling the -ftracer optimization has several unintended and nasty side effects[2][3]. Now that the offending KVM code which required the workaround has been properly fixed and no longer uses __noclone, remove the -ftracer optimization tweak from __noclone. [1] https://lore.kernel.org/lkml/ri6y38lo23g.fsf@suse.cz/T/#u [2] https://lore.kernel.org/lkml/20181218140105.ajuiglkpvstt3qxs@treble/T/#u [3] https://patchwork.kernel.org/patch/8707981/#21817015 This reverts commit 95272c29378ee7dc15f43fa2758cb28a5913a06d. Suggested-by: Andi Kleen <ak@linux.intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Martin Jambor <mjambor@suse.cz> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21regulator: tps65910: fix a missing check of return valueKangjie Lu
tps65910_reg_set_bits() may fail. The fix checks if it fails, and if so, returns with its error code. Signed-off-by: Kangjie Lu <kjlu@umn.edu> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-12-21KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routinesSean Christopherson
Transitioning to/from a VMX guest requires KVM to manually save/load the bulk of CPU state that the guest is allowed to direclty access, e.g. XSAVE state, CR2, GPRs, etc... For obvious reasons, loading the guest's GPR snapshot prior to VM-Enter and saving the snapshot after VM-Exit is done via handcoded assembly. The assembly blob is written as inline asm so that it can easily access KVM-defined structs that are used to hold guest state, e.g. moving the blob to a standalone assembly file would require generating defines for struct offsets. The other relevant aspect of VMX transitions in KVM is the handling of VM-Exits. KVM doesn't employ a separate VM-Exit handler per se, but rather treats the VMX transition as a mega instruction (with many side effects), i.e. sets the VMCS.HOST_RIP to a label immediately following VMLAUNCH/VMRESUME. The label is then exposed to C code via a global variable definition in the inline assembly. Because of the global variable, KVM takes steps to (attempt to) ensure only a single instance of the owning C function, e.g. vmx_vcpu_run, is generated by the compiler. The earliest approach placed the inline assembly in a separate noinline function[1]. Later, the assembly was folded back into vmx_vcpu_run() and tagged with __noclone[2][3], which is still used today. After moving to __noclone, an edge case was encountered where GCC's -ftracer optimization resulted in the inline assembly blob being duplicated. This was "fixed" by explicitly disabling -ftracer in the __noclone definition[4]. Recently, it was found that disabling -ftracer causes build warnings for unsuspecting users of __noclone[5], and more importantly for KVM, prevents the compiler for properly optimizing vmx_vcpu_run()[6]. And perhaps most importantly of all, it was pointed out that there is no way to prevent duplication of a function with 100% reliability[7], i.e. more edge cases may be encountered in the future. So to summarize, the only way to prevent the compiler from duplicating the global variable definition is to move the variable out of inline assembly, which has been suggested several times over[1][7][8]. Resolve the aforementioned issues by moving the VMLAUNCH+VRESUME and VM-Exit "handler" to standalone assembly sub-routines. Moving only the core VMX transition codes allows the struct indexing to remain as inline assembly and also allows the sub-routines to be used by nested_vmx_check_vmentry_hw(). Reusing the sub-routines has a happy side-effect of eliminating two VMWRITEs in the nested_early_check path as there is no longer a need to dynamically change VMCS.HOST_RIP. Note that callers to vmx_vmenter() must account for the CALL modifying RSP, e.g. must subtract op-size from RSP when synchronizing RSP with VMCS.HOST_RSP and "restore" RSP prior to the CALL. There are no great alternatives to fudging RSP. Saving RSP in vmx_enter() is difficult because doing so requires a second register (VMWRITE does not provide an immediate encoding for the VMCS field and KVM supports Hyper-V's memory-based eVMCS ABI). The other more drastic alternative would be to use eschew VMCS.HOST_RSP and manually save/load RSP using a per-cpu variable (which can be encoded as e.g. gs:[imm]). But because a valid stack is needed at the time of VM-Exit (NMIs aren't blocked and a user could theoretically insert INT3/INT1ICEBRK at the VM-Exit handler), a dedicated per-cpu VM-Exit stack would be required. A dedicated stack isn't difficult to implement, but it would require at least one page per CPU and knowledge of the stack in the dumpstack routines. And in most cases there is essentially zero overhead in dynamically updating VMCS.HOST_RSP, e.g. the VMWRITE can be avoided for all but the first VMLAUNCH unless nested_early_check=1, which is not a fast path. In other words, avoiding the VMCS.HOST_RSP by using a dedicated stack would only make the code marginally less ugly while requiring at least one page per CPU and forcing the kernel to be aware (and approve) of the VM-Exit stack shenanigans. [1] cea15c24ca39 ("KVM: Move KVM context switch into own function") [2] a3b5ba49a8c5 ("KVM: VMX: add the __noclone attribute to vmx_vcpu_run") [3] 104f226bfd0a ("KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()") [4] 95272c29378e ("compiler-gcc: disable -ftracer for __noclone functions") [5] https://lkml.kernel.org/r/20181218140105.ajuiglkpvstt3qxs@treble [6] https://patchwork.kernel.org/patch/8707981/#21817015 [7] https://lkml.kernel.org/r/ri6y38lo23g.fsf@suse.cz [8] https://lkml.kernel.org/r/20181218212042.GE25620@tassilo.jf.intel.com Suggested-by: Andi Kleen <ak@linux.intel.com> Suggested-by: Martin Jambor <mjambor@suse.cz> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Martin Jambor <mjambor@suse.cz> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobsSean Christopherson
Use '%% " _ASM_CX"' instead of '%0' to dereference RCX, i.e. the 'struct vcpu_vmx' pointer, in the VM-Enter asm blobs of vmx_vcpu_run() and nested_vmx_check_vmentry_hw(). Using the symbolic name means that adding/removing an output parameter(s) requires "rewriting" almost all of the asm blob, which makes it nearly impossible to understand what's being changed in even the most minor patches. Opportunistically improve the code comments. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21regulator: mcp16502: Select REGMAP_I2C to fix build errorAxel Lin
Fix build error when CONFIG_REGMAP_I2C=m && CONFIG_REGULATOR_MCP16502=y. drivers/regulator/mcp16502.o: In function `mcp16502_probe': mcp16502.c:(.text+0xca): undefined reference to `__devm_regmap_init_i2c' Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-12-21Merge tag 'kvm-ppc-next-4.21-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-next Second PPC KVM update for 4.21 This has 5 commits that fix page dirty tracking when running nested HV KVM guests, from Suraj Jitindar Singh.
2018-12-21KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixupSean Christopherson
____kvm_handle_fault_on_reboot() provides a generic exception fixup handler that is used to cleanly handle faults on VMX/SVM instructions during reboot (or at least try to). If there isn't a reboot in progress, ____kvm_handle_fault_on_reboot() treats any exception as fatal to KVM and invokes kvm_spurious_fault(), which in turn generates a BUG() to get a stack trace and die. When it was originally added by commit 4ecac3fd6dc2 ("KVM: Handle virtualization instruction #UD faults during reboot"), the "call" to kvm_spurious_fault() was handcoded as PUSH+JMP, where the PUSH'd value is the RIP of the faulting instructing. The PUSH+JMP trickery is necessary because the exception fixup handler code lies outside of its associated function, e.g. right after the function. An actual CALL from the .fixup code would show a slightly bogus stack trace, e.g. an extra "random" function would be inserted into the trace, as the return RIP on the stack would point to no known function (and the unwinder will likely try to guess who owns the RIP). Unfortunately, the JMP was replaced with a CALL when the macro was reworked to not spin indefinitely during reboot (commit b7c4145ba2eb "KVM: Don't spin on virt instruction faults during reboot"). This causes the aforementioned behavior where a bogus function is inserted into the stack trace, e.g. my builds like to blame free_kvm_area(). Revert the CALL back to a JMP. The changelog for commit b7c4145ba2eb ("KVM: Don't spin on virt instruction faults during reboot") contains nothing that indicates the switch to CALL was deliberate. This is backed up by the fact that the PUSH <insn RIP> was left intact. Note that an alternative to the PUSH+JMP magic would be to JMP back to the "real" code and CALL from there, but that would require adding a JMP in the non-faulting path to avoid calling kvm_spurious_fault() and would add no value, i.e. the stack trace would be the same. Using CALL: ------------[ cut here ]------------ kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356! invalid opcode: 0000 [#1] SMP CPU: 4 PID: 1057 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm] Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41 RSP: 0018:ffffc900004bbcc8 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff888273fd8000 R08: 00000000000003e8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000371fb0 R13: 0000000000000000 R14: 000000026d763cf4 R15: ffff888273fd8000 FS: 00007f3d69691700(0000) GS:ffff888277800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f89bc56fe0 CR3: 0000000271a5a001 CR4: 0000000000362ee0 Call Trace: free_kvm_area+0x1044/0x43ea [kvm_intel] ? vmx_vcpu_run+0x156/0x630 [kvm_intel] ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? __set_task_blocked+0x38/0x90 ? __set_current_blocked+0x50/0x60 ? __fpu__restore_sig+0x97/0x490 ? do_vfs_ioctl+0xa1/0x620 ? __x64_sys_futex+0x89/0x180 ? ksys_ioctl+0x66/0x70 ? __x64_sys_ioctl+0x16/0x20 ? do_syscall_64+0x4f/0x100 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc ---[ end trace 9775b14b123b1713 ]--- Using JMP: ------------[ cut here ]------------ kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356! invalid opcode: 0000 [#1] SMP CPU: 6 PID: 1067 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm] Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41 RSP: 0018:ffffc90000497cd0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88827058bd40 R08: 00000000000003e8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000369fb0 R13: 0000000000000000 R14: 00000003c8fc6642 R15: ffff88827058bd40 FS: 00007f3d7219e700(0000) GS:ffff888277900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3d64001000 CR3: 0000000271c6b004 CR4: 0000000000362ee0 Call Trace: vmx_vcpu_run+0x156/0x630 [kvm_intel] ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? __set_task_blocked+0x38/0x90 ? __set_current_blocked+0x50/0x60 ? __fpu__restore_sig+0x97/0x490 ? do_vfs_ioctl+0xa1/0x620 ? __x64_sys_futex+0x89/0x180 ? ksys_ioctl+0x66/0x70 ? __x64_sys_ioctl+0x16/0x20 ? do_syscall_64+0x4f/0x100 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc ---[ end trace f9daedb85ab3ddba ]--- Fixes: b7c4145ba2eb ("KVM: Don't spin on virt instruction faults during reboot") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21MAINTAINERS: Add arch/x86/kvm sub-directories to existing KVM/x86 entrySean Christopherson
A series currently sitting in KVM's queue for 4.21 moves the bulk of KVM's VMX code to a dedicated VMX sub-directory[1]. As a result, get_maintainers.pl doesn't get any hits on the newly relocated VMX files when the script is run with --pattern-depth=1. Add all arch/x86/kvm sub-directories to the existing MAINTAINERS entry for KVM/x86 instead of arch/x86/kvm/vmx as other code, e.g. SVM, may get similar treatment in the near future. [1] https://patchwork.kernel.org/cover/10710751/ Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>