summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2019-04-05KVM: PPC: Book3S: Protect memslots while validating user addressAlexey Kardashevskiy
Guest physical to user address translation uses KVM memslots and reading these requires holding the kvm->srcu lock. However recently introduced kvmppc_tce_validate() broke the rule (see the lockdep warning below). This moves srcu_read_lock(&vcpu->kvm->srcu) earlier to protect kvmppc_tce_validate() as well. ============================= WARNING: suspicious RCU usage 5.1.0-rc2-le_nv2_aikATfstn1-p1 #380 Not tainted ----------------------------- include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by qemu-system-ppc/8020: #0: 0000000094972fe9 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0xdc/0x850 [kvm] stack backtrace: CPU: 44 PID: 8020 Comm: qemu-system-ppc Not tainted 5.1.0-rc2-le_nv2_aikATfstn1-p1 #380 Call Trace: [c000003fece8f740] [c000000000bcc134] dump_stack+0xe8/0x164 (unreliable) [c000003fece8f790] [c000000000181be0] lockdep_rcu_suspicious+0x130/0x170 [c000003fece8f810] [c0000000000d5f50] kvmppc_tce_to_ua+0x280/0x290 [c000003fece8f870] [c00800001a7e2c78] kvmppc_tce_validate+0x80/0x1b0 [kvm] [c000003fece8f8e0] [c00800001a7e3fac] kvmppc_h_put_tce+0x94/0x3e4 [kvm] [c000003fece8f9a0] [c00800001a8baac4] kvmppc_pseries_do_hcall+0x30c/0xce0 [kvm_hv] [c000003fece8fa10] [c00800001a8bd89c] kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv] [c000003fece8fae0] [c00800001a7d95dc] kvmppc_vcpu_run+0x34/0x48 [kvm] [c000003fece8fb00] [c00800001a7d56bc] kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm] [c000003fece8fb90] [c00800001a7c3618] kvm_vcpu_ioctl+0x460/0x850 [kvm] [c000003fece8fd00] [c00000000041c4f4] do_vfs_ioctl+0xe4/0x930 [c000003fece8fdb0] [c00000000041ce04] ksys_ioctl+0xc4/0x110 [c000003fece8fe00] [c00000000041ce78] sys_ioctl+0x28/0x80 [c000003fece8fe20] [c00000000000b5a4] system_call+0x5c/0x70 Fixes: 42de7b9e2167 ("KVM: PPC: Validate TCEs against preregistered memory page sizes", 2018-09-10) Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-04-05KVM: PPC: Book3S HV: Perserve PSSCR FAKE_SUSPEND bit on guest exitSuraj Jitindar Singh
There is a hardware bug in some POWER9 processors where a treclaim in fake suspend mode can cause an inconsistency in the XER[SO] bit across the threads of a core, the workaround being to force the core into SMT4 when doing the treclaim. The FAKE_SUSPEND bit (bit 10) in the PSSCR is used to control whether a thread is in fake suspend or real suspend. The important difference here being that thread reconfiguration is blocked in real suspend but not fake suspend mode. When we exit a guest which was in fake suspend mode, we force the core into SMT4 while we do the treclaim in kvmppc_save_tm_hv(). However on the new exit path introduced with the function kvmhv_run_single_vcpu() we restore the host PSSCR before calling kvmppc_save_tm_hv() which means that if we were in fake suspend mode we put the thread into real suspend mode when we clear the PSSCR[FAKE_SUSPEND] bit. This means that we block thread reconfiguration and the thread which is trying to get the core into SMT4 before it can do the treclaim spins forever since it itself is blocking thread reconfiguration. The result is that that core is essentially lost. This results in a trace such as: [ 93.512904] CPU: 7 PID: 13352 Comm: qemu-system-ppc Not tainted 5.0.0 #4 [ 93.512905] NIP: c000000000098a04 LR: c0000000000cc59c CTR: 0000000000000000 [ 93.512908] REGS: c000003fffd2bd70 TRAP: 0100 Not tainted (5.0.0) [ 93.512908] MSR: 9000000302883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE,TM[SE]> CR: 22222444 XER: 00000000 [ 93.512914] CFAR: c000000000098a5c IRQMASK: 3 [ 93.512915] PACATMSCRATCH: 0000000000000001 [ 93.512916] GPR00: 0000000000000001 c000003f6cc1b830 c000000001033100 0000000000000004 [ 93.512928] GPR04: 0000000000000004 0000000000000002 0000000000000004 0000000000000007 [ 93.512930] GPR08: 0000000000000000 0000000000000004 0000000000000000 0000000000000004 [ 93.512932] GPR12: c000203fff7fc000 c000003fffff9500 0000000000000000 0000000000000000 [ 93.512935] GPR16: 2000000000300375 000000000000059f 0000000000000000 0000000000000000 [ 93.512951] GPR20: 0000000000000000 0000000000080053 004000000256f41f c000003f6aa88ef0 [ 93.512953] GPR24: c000003f6aa89100 0000000000000010 0000000000000000 0000000000000000 [ 93.512956] GPR28: c000003f9e9a0800 0000000000000000 0000000000000001 c000203fff7fc000 [ 93.512959] NIP [c000000000098a04] pnv_power9_force_smt4_catch+0x1b4/0x2c0 [ 93.512960] LR [c0000000000cc59c] kvmppc_save_tm_hv+0x40/0x88 [ 93.512960] Call Trace: [ 93.512961] [c000003f6cc1b830] [0000000000080053] 0x80053 (unreliable) [ 93.512965] [c000003f6cc1b8a0] [c00800001e9cb030] kvmhv_p9_guest_entry+0x508/0x6b0 [kvm_hv] [ 93.512967] [c000003f6cc1b940] [c00800001e9cba44] kvmhv_run_single_vcpu+0x2dc/0xb90 [kvm_hv] [ 93.512968] [c000003f6cc1ba10] [c00800001e9cc948] kvmppc_vcpu_run_hv+0x650/0xb90 [kvm_hv] [ 93.512969] [c000003f6cc1bae0] [c00800001e8f620c] kvmppc_vcpu_run+0x34/0x48 [kvm] [ 93.512971] [c000003f6cc1bb00] [c00800001e8f2d4c] kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm] [ 93.512972] [c000003f6cc1bb90] [c00800001e8e3918] kvm_vcpu_ioctl+0x460/0x7d0 [kvm] [ 93.512974] [c000003f6cc1bd00] [c0000000003ae2c0] do_vfs_ioctl+0xe0/0x8e0 [ 93.512975] [c000003f6cc1bdb0] [c0000000003aeb24] ksys_ioctl+0x64/0xe0 [ 93.512978] [c000003f6cc1be00] [c0000000003aebc8] sys_ioctl+0x28/0x80 [ 93.512981] [c000003f6cc1be20] [c00000000000b3a4] system_call+0x5c/0x70 [ 93.512983] Instruction dump: [ 93.512986] 419dffbc e98c0000 2e8b0000 38000001 60000000 60000000 60000000 40950068 [ 93.512993] 392bffff 39400000 79290020 39290001 <7d2903a6> 60000000 60000000 7d235214 To fix this we preserve the PSSCR[FAKE_SUSPEND] bit until we call kvmppc_save_tm_hv() which will mean the core can get into SMT4 and perform the treclaim. Note kvmppc_save_tm_hv() clears the PSSCR[FAKE_SUSPEND] bit again so there is no need to explicitly do that. Fixes: 95a6432ce9038 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests") Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-03-28Merge tag 'kvmarm-fixes-for-5.1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/ARM fixes for 5.1 - Fix THP handling in the presence of pre-existing PTEs - Honor request for PTE mappings even when THPs are available - GICv4 performance improvement - Take the srcu lock when writing to guest-controlled ITS data structures - Reset the virtual PMU in preemptible context - Various cleanups
2019-03-28KVM: x86: update %rip after emulating IOSean Christopherson
Most (all?) x86 platforms provide a port IO based reset mechanism, e.g. OUT 92h or CF9h. Userspace may emulate said mechanism, i.e. reset a vCPU in response to KVM_EXIT_IO, without explicitly announcing to KVM that it is doing a reset, e.g. Qemu jams vCPU state and resumes running. To avoid corruping %rip after such a reset, commit 0967b7bf1c22 ("KVM: Skip pio instruction when it is emulated, not executed") changed the behavior of PIO handlers, i.e. today's "fast" PIO handling to skip the instruction prior to exiting to userspace. Full emulation doesn't need such tricks becase re-emulating the instruction will naturally handle %rip being changed to point at the reset vector. Updating %rip prior to executing to userspace has several drawbacks: - Userspace sees the wrong %rip on the exit, e.g. if PIO emulation fails it will likely yell about the wrong address. - Single step exits to userspace for are effectively dropped as KVM_EXIT_DEBUG is overwritten with KVM_EXIT_IO. - Behavior of PIO emulation is different depending on whether it goes down the fast path or the slow path. Rather than skip the PIO instruction before exiting to userspace, snapshot the linear %rip and cancel PIO completion if the current value does not match the snapshot. For a 64-bit vCPU, i.e. the most common scenario, the snapshot and comparison has negligible overhead as VMCS.GUEST_RIP will be cached regardless, i.e. there is no extra VMREAD in this case. All other alternatives to snapshotting the linear %rip that don't rely on an explicit reset announcenment suffer from one corner case or another. For example, canceling PIO completion on any write to %rip fails if userspace does a save/restore of %rip, and attempting to avoid that issue by canceling PIO only if %rip changed then fails if PIO collides with the reset %rip. Attempting to zero in on the exact reset vector won't work for APs, which means adding more hooks such as the vCPU's MP_STATE, and so on and so forth. Checking for a linear %rip match technically suffers from corner cases, e.g. userspace could theoretically rewrite the underlying code page and expect a different instruction to execute, or the guest hardcodes a PIO reset at 0xfffffff0, but those are far, far outside of what can be considered normal operation. Fixes: 432baf60eee3 ("KVM: VMX: use kvm_fast_pio_in for handling IN I/O") Cc: <stable@vger.kernel.org> Reported-by: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28x86/kvm/hyper-v: avoid spurious pending stimer on vCPU initVitaly Kuznetsov
When userspace initializes guest vCPUs it may want to zero all supported MSRs including Hyper-V related ones including HV_X64_MSR_STIMERn_CONFIG/ HV_X64_MSR_STIMERn_COUNT. With commit f3b138c5d89a ("kvm/x86: Update SynIC timers on guest entry only") we began doing stimer_mark_pending() unconditionally on every config change. The issue I'm observing manifests itself as following: - Qemu writes 0 to STIMERn_{CONFIG,COUNT} MSRs and marks all stimers as pending in stimer_pending_bitmap, arms KVM_REQ_HV_STIMER; - kvm_hv_has_stimer_pending() starts returning true; - kvm_vcpu_has_events() starts returning true; - kvm_arch_vcpu_runnable() starts returning true; - when kvm_arch_vcpu_ioctl_run() gets into (vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED) case: - kvm_vcpu_block() gets in 'kvm_vcpu_check_block(vcpu) < 0' and returns immediately, avoiding normal wait path; - -EAGAIN is returned from kvm_arch_vcpu_ioctl_run() immediately forcing userspace to retry. So instead of normal wait path we get a busy loop on all secondary vCPUs before they get INIT signal. This seems to be undesirable, especially given that this happens even when Hyper-V extensions are not used. Generally, it seems to be pointless to mark an stimer as pending in stimer_pending_bitmap and arm KVM_REQ_HV_STIMER as the only thing kvm_hv_process_stimers() will do is clear the corresponding bit. We may just not mark disabled timers as pending instead. Fixes: f3b138c5d89a ("kvm/x86: Update SynIC timers on guest entry only") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28kvm/x86: Move MSR_IA32_ARCH_CAPABILITIES to array emulated_msrsXiaoyao Li
Since MSR_IA32_ARCH_CAPABILITIES is emualted unconditionally even if host doesn't suppot it. We should move it to array emulated_msrs from arry msrs_to_save, to report to userspace that guest support this msr. Signed-off-by: Xiaoyao Li <xiaoyao.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hostsSean Christopherson
The CPUID flag ARCH_CAPABILITIES is unconditioinally exposed to host userspace for all x86 hosts, i.e. KVM advertises ARCH_CAPABILITIES regardless of hardware support under the pretense that KVM fully emulates MSR_IA32_ARCH_CAPABILITIES. Unfortunately, only VMX hosts handle accesses to MSR_IA32_ARCH_CAPABILITIES (despite KVM_GET_MSRS also reporting MSR_IA32_ARCH_CAPABILITIES for all hosts). Move the MSR_IA32_ARCH_CAPABILITIES handling to common x86 code so that it's emulated on AMD hosts. Fixes: 1eaafe91a0df4 ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported") Cc: stable@vger.kernel.org Reported-by: Xiaoyao Li <xiaoyao.li@linux.intel.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28kvm: mmu: Used range based flushing in slot_handle_level_rangeBen Gardon
Replace kvm_flush_remote_tlbs with kvm_flush_remote_tlbs_with_address in slot_handle_level_range. When range based flushes are not enabled kvm_flush_remote_tlbs_with_address falls back to kvm_flush_remote_tlbs. This changes the behavior of many functions that indirectly use slot_handle_level_range, iff the range based flushes are enabled. The only potential problem I see with this is that kvm->tlbs_dirty will be cleared less often, however the only caller of slot_handle_level_range that checks tlbs_dirty is kvm_mmu_notifier_invalidate_range_start which checks it and does a kvm_flush_remote_tlbs after calling kvm_unmap_hva_range anyway. Tested: Ran all kvm-unit-tests on a Intel Haswell machine with and without this patch. The patch introduced no new failures. Signed-off-by: Ben Gardon <bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supportedMasahiro Yamada
I do not see any consistency about headers_install of <linux/kvm_para.h> and <asm/kvm_para.h>. According to my analysis of Linux 5.1-rc1, there are 3 groups: [1] Both <linux/kvm_para.h> and <asm/kvm_para.h> are exported alpha, arm, hexagon, mips, powerpc, s390, sparc, x86 [2] <asm/kvm_para.h> is exported, but <linux/kvm_para.h> is not arc, arm64, c6x, h8300, ia64, m68k, microblaze, nios2, openrisc, parisc, sh, unicore32, xtensa [3] Neither <linux/kvm_para.h> nor <asm/kvm_para.h> is exported csky, nds32, riscv This does not match to the actual KVM support. At least, [2] is half-baked. Nor do arch maintainers look like they care about this. For example, commit 0add53713b1c ("microblaze: Add missing kvm_para.h to Kbuild") exported <asm/kvm_para.h> to user-space in order to fix an in-kernel build error. We have two ways to make this consistent: [A] export both <linux/kvm_para.h> and <asm/kvm_para.h> for all architectures, irrespective of the KVM support [B] Match the header export of <linux/kvm_para.h> and <asm/kvm_para.h> to the KVM support My first attempt was [A] because the code looks cleaner, but Paolo suggested [B]. So, this commit goes with [B]. For most architectures, <asm/kvm_para.h> was moved to the kernel-space. I changed include/uapi/linux/Kbuild so that it checks generated asm/kvm_para.h as well as check-in ones. After this commit, there will be two groups: [1] Both <linux/kvm_para.h> and <asm/kvm_para.h> are exported arm, arm64, mips, powerpc, s390, x86 [2] Neither <linux/kvm_para.h> nor <asm/kvm_para.h> is exported alpha, arc, c6x, csky, h8300, hexagon, ia64, m68k, microblaze, nds32, nios2, openrisc, parisc, riscv, sh, sparc, unicore32, xtensa Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region()Wei Yang
* nr_mmu_pages would be non-zero only if kvm->arch.n_requested_mmu_pages is non-zero. * nr_mmu_pages is always non-zero, since kvm_mmu_calculate_mmu_pages() never return zero. Based on these two reasons, we can merge the two *if* clause and use the return value from kvm_mmu_calculate_mmu_pages() directly. This simplify the code and also eliminate the possibility for reader to believe nr_mmu_pages would be zero. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28kvm: nVMX: Add a vmentry check for HOST_SYSENTER_ESP and HOST_SYSENTER_EIP ↵Krish Sadhukhan
fields According to section "Checks on VMX Controls" in Intel SDM vol 3C, the following check is performed on vmentry of L2 guests: On processors that support Intel 64 architecture, the IA32_SYSENTER_ESP field and the IA32_SYSENTER_EIP field must each contain a canonical address. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)Singh, Brijesh
Errata#1096: On a nested data page fault when CR.SMAP=1 and the guest data read generates a SMAP violation, GuestInstrBytes field of the VMCB on a VMEXIT will incorrectly return 0h instead the correct guest instruction bytes . Recommend Workaround: To determine what instruction the guest was executing the hypervisor will have to decode the instruction at the instruction pointer. The recommended workaround can not be implemented for the SEV guest because guest memory is encrypted with the guest specific key, and instruction decoder will not be able to decode the instruction bytes. If we hit this errata in the SEV guest then log the message and request a guest shutdown. Reported-by: Venkatesh Srinivas <venkateshs@google.com> Cc: Jim Mattson <jmattson@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size'Sean Christopherson
The cr4_pae flag is a bit of a misnomer, its purpose is really to track whether the guest PTE that is being shadowed is a 4-byte entry or an 8-byte entry. Prior to supporting nested EPT, the size of the gpte was reflected purely by CR4.PAE. KVM fudged things a bit for direct sptes, but it was mostly harmless since the size of the gpte never mattered. Now that a spte may be tracking an indirect EPT entry, relying on CR4.PAE is wrong and ill-named. For direct shadow pages, force the gpte_size to '1' as they are always 8-byte entries; EPT entries can only be 8-bytes and KVM always uses 8-byte entries for NPT and its identity map (when running with EPT but not unrestricted guest). Likewise, nested EPT entries are always 8-bytes. Nested EPT presents a unique scenario as the size of the entries are not dictated by CR4.PAE, but neither is the shadow page a direct map. To handle this scenario, set cr0_wp=1 and smap_andnot_wp=1, an otherwise impossible combination, to denote a nested EPT shadow page. Use the information to avoid incorrectly zapping an unsync'd indirect page in __kvm_sync_page(). Providing a consistent and accurate gpte_size fixes a bug reported by Vitaly where fast_cr3_switch() always fails when switching from L2 to L1 as kvm_mmu_get_page() would force role.cr4_pae=0 for direct pages, whereas kvm_calc_mmu_role_common() would set it according to CR4.PAE. Fixes: 7dcd575520082 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed") Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: nVMX: Do not inherit quadrant and invalid for the root shadow EPTSean Christopherson
Explicitly zero out quadrant and invalid instead of inheriting them from the root_mmu. Functionally, this patch is a nop as we (should) never set quadrant for a direct mapped (EPT) root_mmu and nested EPT is only allowed if EPT is used for L1, and the root_mmu will never be invalid at this point. Explicitly setting flags sets the stage for repurposing the legacy paging bits in role, e.g. nxe, cr0_wp, and sm{a,e}p_andnot_wp, at which point 'smm' would be the only flag to be inherited from root_mmu. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-24Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of x86 fixes: - Prevent potential NULL pointer dereferences in the HPET and HyperV code - Exclude the GART aperture from /proc/kcore to prevent kernel crashes on access - Use the correct macros for Cyrix I/O on Geode processors - Remove yet another kernel address printk leak - Announce microcode reload completion as requested by quite some people. Microcode loading has become popular recently. - Some 'Make Clang' happy fixlets - A few cleanups for recently added code" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/gart: Exclude GART aperture from kcore x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error x86/mm/pti: Make local symbols static x86/cpu/cyrix: Remove {get,set}Cx86_old macros used for Cyrix processors x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors x86/microcode: Announce reload operation's completion x86/hyperv: Prevent potential NULL pointer dereference x86/hpet: Prevent potential NULL pointer dereference x86/lib: Fix indentation issue, remove extra tab x86/boot: Restrict header scope to make Clang happy x86/mm: Don't leak kernel addresses x86/cpufeature: Fix various quality problems in the <asm/cpu_device_hd.h> header
2019-03-24Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of fixes for the interrupt subsystem: - Remove secondary GIC support on systems w/o device-tree support - A set of small fixlets in various irqchip drivers - static and fall-through annotations - Kernel doc and typo fixes" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Mark expected switch case fall-through genirq/devres: Remove excess parameter from kernel doc irqchip/irq-mvebu-sei: Make mvebu_sei_ap806_caps static irqchip/mbigen: Don't clear eventid when freeing an MSI irqchip/stm32: Don't set rising configuration registers at init irqchip/stm32: Don't clear rising/falling config registers at init dt-bindings: irqchip: renesas-irqc: Document r8a774c0 support irqchip/mmp: Make mmp_irq_domain_ops static irqchip/brcmstb-l2: Make two init functions static genirq: Fix typo in comment of IRQD_MOVE_PCNTXT irqchip/gic-v3-its: Fix comparison logic in lpi_range_cmp irqchip/gic: Drop support for secondary GIC in non-DT systems irqchip/imx-irqsteer: Fix of_property_read_u32() error handling
2019-03-23x86/gart: Exclude GART aperture from kcoreKairui Song
On machines where the GART aperture is mapped over physical RAM, /proc/kcore contains the GART aperture range. Accessing the GART range via /proc/kcore results in a kernel crash. vmcore used to have the same issue, until it was fixed with commit 2a3e83c6f96c ("x86/gart: Exclude GART aperture from vmcore")', leveraging existing hook infrastructure in vmcore to let /proc/vmcore return zeroes when attempting to read the aperture region, and so it won't read from the actual memory. Apply the same workaround for kcore. First implement the same hook infrastructure for kcore, then reuse the hook functions introduced in the previous vmcore fix. Just with some minor adjustment, rename some functions for more general usage, and simplify the hook infrastructure a bit as there is no module usage yet. Suggested-by: Baoquan He <bhe@redhat.com> Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Omar Sandoval <osandov@fb.com> Cc: Dave Young <dyoung@redhat.com> Link: https://lkml.kernel.org/r/20190308030508.13548-1-kasong@redhat.com
2019-03-22Merge tag 'powerpc-5.1-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "One fix for a boot failure on 32-bit, introduced during the merge window. A fix for our handling of CLOCK_MONOTONIC in the 64-bit VDSO. Changing the wall clock across the Y2038 boundary could cause CLOCK_MONOTONIC to jump forward and backward. Our spectre_v2 reporting was a bit confusing due to a bug I introduced. On some systems it was reporting that the count cache was disabled and also that we were flushing the count cache on context switch. Only the former is true, and given that the count cache is disabled it doesn't make any sense to flush it. No one reported it, so presumably the presence of any mitigation is all people check for. Finally a small build fix for zsmalloc on 32-bit. Thanks to: Ben Hutchings, Christophe Leroy, Diana Craciun, Guenter Roeck, Michael Neuling" * tag 'powerpc-5.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/security: Fix spectre_v2 reporting powerpc/mm: Only define MAX_PHYSMEM_BITS in SPARSEMEM configurations powerpc/6xx: fix setup and use of SPRN_SPRG_PGDIR for hash32 powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038
2019-03-22Merge branch 'x86/cpu' into x86/urgentThomas Gleixner
Merge the forgotten cleanup patch for the new file, so the mess does not propagate further.
2019-03-22x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return ↵Nathan Chancellor
an error When building with -Wsometimes-uninitialized, Clang warns: arch/x86/kernel/hw_breakpoint.c:355:2: warning: variable 'align' is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized] The default cannot be reached because arch_build_bp_info() initializes hw->len to one of the specified cases. Nevertheless the warning is valid and returning -EINVAL makes sure that this cannot be broken by future modifications. Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: clang-built-linux@googlegroups.com Link: https://github.com/ClangBuiltLinux/linux/issues/392 Link: https://lkml.kernel.org/r/20190307212756.4648-1-natechancellor@gmail.com
2019-03-22x86/mm/pti: Make local symbols staticValdis Kletnieks
With 'make C=2 W=1', sparse and gcc both complain: CHECK arch/x86/mm/pti.c arch/x86/mm/pti.c:84:3: warning: symbol 'pti_mode' was not declared. Should it be static? arch/x86/mm/pti.c:605:6: warning: symbol 'pti_set_kernel_image_nonglobal' was not declared. Should it be static? CC arch/x86/mm/pti.o arch/x86/mm/pti.c:605:6: warning: no previous prototype for 'pti_set_kernel_image_nonglobal' [-Wmissing-prototypes] 605 | void pti_set_kernel_image_nonglobal(void) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ pti_set_kernel_image_nonglobal() is only used locally. 'pti_mode' exists in drivers/hwtracing/intel_th/pti.c as well, but it's a completely unrelated local (static) symbol. Make both static. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/27680.1552376873@turing-police
2019-03-21Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: "Mostly fixes apart from the kprobe blacklist checking which was deferred because of conflicting with a fix merged after I pinned the arm64 for-next/core branch (f2b3d8566d81 "arm64: kprobe: Always blacklist the KVM world-switch code"). Summary: - Update the kprobe blacklist checking for arm64. This was supposed to be queued during the merging window but, due to conflicts, it was deferred post -rc1 - Extend the Fujitsu erratum 010001 workaround to A64FX v1r0 - Whitelist HiSilicon Taishan v110 CPUs as not susceptible to Meltdown - Export save_stack_trace_regs() - Remove obsolete selection of MULTI_IRQ_HANDLER" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: remove obsolete selection of MULTI_IRQ_HANDLER arm64: kpti: Whitelist HiSilicon Taishan v110 CPUs arm64: Add MIDR encoding for HiSilicon Taishan CPUs arm64/stacktrace: Export save_stack_trace_regs() arm64: apply workaround on A64FX v1r0 arm64: kprobes: Use arch_populate_kprobe_blacklist() arm64: kprobes: Move exception_text check in blacklist arm64: kprobes: Remove unneeded RODATA check arm64: kprobes: Move extable address check into arch_prepare_kprobe()
2019-03-21Merge tag 'irqchip-5.1-2' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip updates for 5.1 from Marc Zyngier: - irqsteer error handling fix - GICv3 range coalescing fix - stm32 coprocessor coexistence fixes - mbigen MSI teardown fix - non-DT secondary GIC infrastructure removed - various cleanups (brcmstb-l2, mmp) - new DT bindings (r8a774c0)
2019-03-21x86/cpu/cyrix: Remove {get,set}Cx86_old macros used for Cyrix processorsMatthew Whitehead
The getCx86_old() and setCx86_old() macros have been replaced with correctly working getCx86() and setCx86(), so remove these unused macros. Signed-off-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: luto@kernel.org Link: https://lkml.kernel.org/r/1552596361-8967-3-git-send-email-tedheadster@gmail.com
2019-03-21x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processorsMatthew Whitehead
There are comments in processor-cyrix.h advising you to _not_ make calls using the deprecated macros in this style: setCx86_old(CX86_CCR4, getCx86_old(CX86_CCR4) | 0x80); This is because it expands the macro into a non-functioning calling sequence. The calling order must be: outb(CX86_CCR2, 0x22); inb(0x23); From the comments: * When using the old macros a line like * setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88); * gets expanded to: * do { * outb((CX86_CCR2), 0x22); * outb((({ * outb((CX86_CCR2), 0x22); * inb(0x23); * }) | 0x88), 0x23); * } while (0); The new macros fix this problem, so use them instead. Tested on an actual Geode processor. Signed-off-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: luto@kernel.org Link: https://lkml.kernel.org/r/1552596361-8967-2-git-send-email-tedheadster@gmail.com
2019-03-21x86/microcode: Announce reload operation's completionBorislav Petkov
By popular demand, issue a single line to dmesg after the reload operation completes to let the user know that a reload has at least been attempted. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190313110022.8229-1-bp@alien8.de
2019-03-21x86/hyperv: Prevent potential NULL pointer dereferenceKangjie Lu
The page allocation in hv_cpu_init() can fail, but the code does not have a check for that. Add a check and return -ENOMEM when the allocation fails. [ tglx: Massaged changelog ] Signed-off-by: Kangjie Lu <kjlu@umn.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Acked-by: "K. Y. Srinivasan" <kys@microsoft.com> Cc: pakki001@umn.edu Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sasha Levin <sashal@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: linux-hyperv@vger.kernel.org Link: https://lkml.kernel.org/r/20190314054651.1315-1-kjlu@umn.edu
2019-03-21x86/hpet: Prevent potential NULL pointer dereferenceAditya Pakki
hpet_virt_address may be NULL when ioremap_nocache fail, but the code lacks a check. Add a check to prevent NULL pointer dereference. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: kjlu@umn.edu Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Joe Perches <joe@perches.com> Cc: Nicolai Stange <nstange@suse.de> Cc: Roland Dreier <roland@purestorage.com> Link: https://lkml.kernel.org/r/20190319021958.17275-1-pakki001@umn.edu
2019-03-21x86/lib: Fix indentation issue, remove extra tabColin Ian King
The increment of buff is indented one level too deeply, clean this up by removing a tab. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: kernel-janitors@vger.kernel.org Link: https://lkml.kernel.org/r/20190314230838.18256-1-colin.king@canonical.com
2019-03-21x86/boot: Restrict header scope to make Clang happyNick Desaulniers
The inclusion of <linux/kernel.h> was causing issue as the definition of __arch_hweight64 from arch/x86/include/asm/arch_hweight.h eventually gets included. The definition is problematic when compiled with -m16 (all code in arch/x86/boot/ is) as the "D" inline assembly constraint is rejected by both compilers when passed an argument of type long long (regardless of signedness, anything smaller is fine). Because GCC performs inlining before semantic analysis, and __arch_hweight64 is dead in this translation unit, GCC does not report any issues at compile time. Clang does the semantic analysis in the front end, before inlining (run in the middle) can determine the code is dead. I consider this another case of PR33587, which I think we can do more work to solve. It turns out that arch/x86/boot/string.c doesn't actually need linux/kernel.h, simply linux/limits.h and linux/compiler.h. Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Cc: bp@alien8.de Cc: niravd@google.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Chao Fan <fanc.fnst@cn.fujitsu.com> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://bugs.llvm.org/show_bug.cgi?id=33587 Link: https://github.com/ClangBuiltLinux/linux/issues/347 Link: https://lkml.kernel.org/r/20190314221458.83047-1-ndesaulniers@google.com
2019-03-21powerpc/security: Fix spectre_v2 reportingMichael Ellerman
When I updated the spectre_v2 reporting to handle software count cache flush I got the logic wrong when there's no software count cache enabled at all. The result is that on systems with the software count cache flush disabled we print: Mitigation: Indirect branch cache disabled, Software count cache flush Which correctly indicates that the count cache is disabled, but incorrectly says the software count cache flush is enabled. The root of the problem is that we are trying to handle all combinations of options. But we know now that we only expect to see the software count cache flush enabled if the other options are false. So split the two cases, which simplifies the logic and fixes the bug. We were also missing a space before "(hardware accelerated)". The result is we see one of: Mitigation: Indirect branch serialisation (kernel only) Mitigation: Indirect branch cache disabled Mitigation: Software count cache flush Mitigation: Software count cache flush (hardware accelerated) Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-03-20Merge tag 'arc-5.1-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC updates from Vineet Gupta: - unaligned access support for HS cores - Removed extra memory barrier around spinlock code - HSDK platform updates: enable dmac, reset - some more boot logging updates - misc minor fixes * tag 'arc-5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: arch: arc: Kconfig: pedantic formatting ARCv2: spinlock: remove the extra smp_mb before lock, after unlock ARC: unaligned: relax the check for gcc supporting -mno-unaligned-access ARC: boot log: cut down on verbosity ARCv2: boot log: refurbish HS core/release identification arc: hsdk_defconfig: Enable CONFIG_BLK_DEV_RAM ARC: u-boot args: check that magic number is correct ARC: perf: bpok condition only exists for ARCompact ARCv2: Add explcit unaligned access support (and ability to disable too) ARCv2: lib: introduce memcpy optimized for unaligned access ARC: [plat-hsdk]: Enable AXI DW DMAC support ARC: [plat-hsdk]: Add reset controller handle to manage USB reset ARC: DTB: [scripted] fix node name and address spelling
2019-03-20arm64: remove obsolete selection of MULTI_IRQ_HANDLERMatthias Kaehlcke
The arm64 config selects MULTI_IRQ_HANDLER, which was renamed to GENERIC_IRQ_MULTI_HANDLER by commit 4c301f9b6a94 ("ARM: Convert to GENERIC_IRQ_MULTI_HANDLER"). The 'new' option is already selected, so just remove the obsolete entry. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-03-20KVM: arm/arm64: Fix handling of stage2 huge mappingsSuzuki K Poulose
We rely on the mmu_notifier call backs to handle the split/merge of huge pages and thus we are guaranteed that, while creating a block mapping, either the entire block is unmapped at stage2 or it is missing permission. However, we miss a case where the block mapping is split for dirty logging case and then could later be made block mapping, if we cancel the dirty logging. This not only creates inconsistent TLB entries for the pages in the the block, but also leakes the table pages for PMD level. Handle this corner case for the huge mappings at stage2 by unmapping the non-huge mapping for the block. This could potentially release the upper level table. So we need to restart the table walk once we unmap the range. Fixes : ad361f093c1e31d ("KVM: ARM: Support hugetlbfs backed huge pages") Reported-by: Zheng Xiang <zhengxiang9@huawei.com> Cc: Zheng Xiang <zhengxiang9@huawei.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Cc: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-03-21powerpc/mm: Only define MAX_PHYSMEM_BITS in SPARSEMEM configurationsBen Hutchings
MAX_PHYSMEM_BITS only needs to be defined if CONFIG_SPARSEMEM is enabled, and that was the case before commit 4ffe713b7587 ("powerpc/mm: Increase the max addressable memory to 2PB"). On 32-bit systems, where CONFIG_SPARSEMEM is not enabled, we now define it as 46. That is larger than the real number of physical address bits, and breaks calculations in zsmalloc: mm/zsmalloc.c:130:49: warning: right shift count is negative MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS)) ^~ ... mm/zsmalloc.c:253:21: error: variably modified 'size_class' at file scope struct size_class *size_class[ZS_SIZE_CLASSES]; ^~~~~~~~~~ Fixes: 4ffe713b7587 ("powerpc/mm: Increase the max addressable memory to 2PB") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-03-19KVM: arm/arm64: vgic-its: Take the srcu lock when writing to guest memoryMarc Zyngier
When halting a guest, QEMU flushes the virtual ITS caches, which amounts to writing to the various tables that the guest has allocated. When doing this, we fail to take the srcu lock, and the kernel shouts loudly if running a lockdep kernel: [ 69.680416] ============================= [ 69.680819] WARNING: suspicious RCU usage [ 69.681526] 5.1.0-rc1-00008-g600025238f51-dirty #18 Not tainted [ 69.682096] ----------------------------- [ 69.682501] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage! [ 69.683225] [ 69.683225] other info that might help us debug this: [ 69.683225] [ 69.683975] [ 69.683975] rcu_scheduler_active = 2, debug_locks = 1 [ 69.684598] 6 locks held by qemu-system-aar/4097: [ 69.685059] #0: 0000000034196013 (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0 [ 69.686087] #1: 00000000f2ed935e (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0 [ 69.686919] #2: 000000005e71ea54 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [ 69.687698] #3: 00000000c17e548d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [ 69.688475] #4: 00000000ba386017 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [ 69.689978] #5: 00000000c2c3c335 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [ 69.690729] [ 69.690729] stack backtrace: [ 69.691151] CPU: 2 PID: 4097 Comm: qemu-system-aar Not tainted 5.1.0-rc1-00008-g600025238f51-dirty #18 [ 69.691984] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019 [ 69.692831] Call trace: [ 69.694072] lockdep_rcu_suspicious+0xcc/0x110 [ 69.694490] gfn_to_memslot+0x174/0x190 [ 69.694853] kvm_write_guest+0x50/0xb0 [ 69.695209] vgic_its_save_tables_v0+0x248/0x330 [ 69.695639] vgic_its_set_attr+0x298/0x3a0 [ 69.696024] kvm_device_ioctl_attr+0x9c/0xd8 [ 69.696424] kvm_device_ioctl+0x8c/0xf8 [ 69.696788] do_vfs_ioctl+0xc8/0x960 [ 69.697128] ksys_ioctl+0x8c/0xa0 [ 69.697445] __arm64_sys_ioctl+0x28/0x38 [ 69.697817] el0_svc_common+0xd8/0x138 [ 69.698173] el0_svc_handler+0x38/0x78 [ 69.698528] el0_svc+0x8/0xc The fix is to obviously take the srcu lock, just like we do on the read side of things since bf308242ab98. One wonders why this wasn't fixed at the same time, but hey... Fixes: bf308242ab98 ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock") Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-03-19KVM: arm64: Reset the PMU in preemptible contextMarc Zyngier
We've become very cautious to now always reset the vcpu when nothing is loaded on the physical CPU. To do so, we now disable preemption and do a kvm_arch_vcpu_put() to make sure we have all the state in memory (and that it won't be loaded behind out back). This now causes issues with resetting the PMU, which calls into perf. Perf itself uses mutexes, which clashes with the lack of preemption. It is worth realizing that the PMU is fully emulated, and that no PMU state is ever loaded on the physical CPU. This means we can perfectly reset the PMU outside of the non-preemptible section. Fixes: e761a927bc9a ("KVM: arm/arm64: Reset the VCPU without preemption and vcpu state loaded") Reported-by: Julien Grall <julien.grall@arm.com> Tested-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-03-19Merge tag 'mips_fixes_5.1_1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Paul Burton: "A small batch of MIPS fixes for 5.1: - An interrupt masking fix for Loongson-based Lemote 2F systems (fixing a regression from v3.19) - A relocation fix for configurations in which the devicetree is stored in an ELF section (fixing a regression from v4.7) - Fix jump labels for MIPSr6 kernels where they previously could inadvertently place a control transfer instruction in a forbidden slot & take unexpected exceptions (fixing MIPSr6 support added in v4.0) - Extend an existing USB power workaround for the Netgear WNDR3400 to v2 boards in addition to the v3 ones that already used it - Remove the custom MIPS32 definition of __kernel_fsid_t to make it consistent with MIPS64 & every other architecture, in particular resolving issues for code which tries to print the val field whose type previously differed (though had identical memory layout)" * tag 'mips_fixes_5.1_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: Remove custom MIPS32 __kernel_fsid_t type mips: bcm47xx: Enable USB power on Netgear WNDR3400v2 MIPS: Fix kernel crash for R6 in jump label branch function MIPS: Ensure ELF appended dtb is relocated mips: loongson64: lemote-2f: Add IRQF_NO_SUSPEND to "cascade" irqaction.
2019-03-19arm64: kpti: Whitelist HiSilicon Taishan v110 CPUsHanjun Guo
HiSilicon Taishan v110 CPUs didn't implement CSV3 field of the ID_AA64PFR0_EL1 and are not susceptible to Meltdown, so whitelist the MIDR in kpti_safe_list[] table. Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Zhangshaokun <zhangshaokun@hisilicon.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-03-19arm64: Add MIDR encoding for HiSilicon Taishan CPUsHanjun Guo
Adding the MIDR encodings for HiSilicon Taishan v110 CPUs, which is used in Kunpeng ARM64 server SoCs. TSV110 is the abbreviation of Taishan v110. Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Zhangshaokun <zhangshaokun@hisilicon.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-03-19arm64/stacktrace: Export save_stack_trace_regs()William Cohen
The ARM64 implements the save_stack_trace_regs function, but it is unusable for any diagnostic tooling compiled as a kernel module due the missing EXPORT_SYMBOL_GPL for the function. Export save_stack_trace_regs() to align with other architectures such as s390, openrisc, and powerpc. This is similar to the ARM64 export of save_stack_trace_tsk() added in git commit e27c7fa015d6. Signed-off-by: William Cohen <wcohen@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-03-19arm64: apply workaround on A64FX v1r0Mark Rutland
Fujitsu erratum 010001 applies to A64FX v0r0 and v1r0, and we try to handle either by masking MIDR with MIDR_FUJITSU_ERRATUM_010001_MASK before comparing it to MIDR_FUJITSU_ERRATUM_010001. Unfortunately, MIDR_FUJITSU_ERRATUM_010001 is constructed incorrectly using MIDR_VARIANT(), which is intended to extract the variant field from MIDR_EL1, rather than generate the field in-place. This results in MIDR_FUJITSU_ERRATUM_010001 being all-ones, and we only match A64FX v0r0. This patch uses MIDR_CPU_VAR_REV() to generate an in-place mask for the variant field, ensuring the we match both v0r0 and v1r0. Fixes: 3e32131abc311a5c ("arm64: Add workaround for Fujitsu A64FX erratum 010001") Reported-by: "Okamoto, Takayuki" <tokamoto@jp.fujitsu.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> [catalin.marinas@arm.com: fixed the patch author] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-03-19arm64: kprobes: Use arch_populate_kprobe_blacklist()Masami Hiramatsu
Use arch_populate_kprobe_blacklist() instead of arch_within_kprobe_blacklist() so that we can see the full blacklisted symbols under the debugfs. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> [catalin.marinas@arm.com: Add arch_populate_kprobe_blacklist() comment] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-03-19arm64: kprobes: Move exception_text check in blacklistMasami Hiramatsu
Move exception/irqentry text address check in blacklist, since those are symbol based rejection. If we prohibit probing on the symbols in exception_text, those should be blacklisted. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-03-19arm64: kprobes: Remove unneeded RODATA checkMasami Hiramatsu
Remove unneeded RODATA check from arch_prepare_kprobe(). Since check_kprobe_address_safe() already ensured that the probe address is in kernel text, we don't need to check whether the address in RODATA or not. That must be always false. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-03-19arm64: kprobes: Move extable address check into arch_prepare_kprobe()Masami Hiramatsu
Move extable address check into arch_prepare_kprobe() from arch_within_kprobe_blacklist(). The blacklist is exposed via debugfs as a list of symbols. The extable entries are smaller, so must be filtered out by arch_prepare_kprobe(). Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-03-19x86/mm: Don't leak kernel addressesMatteo Croce
Since commit: ad67b74d2469d9b8 ("printk: hash addresses printed with %p") at boot "____ptrval____" is printed instead of actual addresses: found SMP MP-table at [mem 0x000f5cc0-0x000f5ccf] mapped at [(____ptrval____)] Instead of changing the print to "%px", and leaking a kernel addresses, just remove the print completely, like in: 071929dbdd865f77 ("arm64: Stop printing the virtual memory layout"). Signed-off-by: Matteo Croce <mcroce@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-03-19powerpc/6xx: fix setup and use of SPRN_SPRG_PGDIR for hash32Christophe Leroy
Not only the 603 but all 6xx need SPRN_SPRG_PGDIR to be initialised at startup. This patch move it from __setup_cpu_603() to start_here() and __secondary_start(), close to the initialisation of SPRN_THREAD. Previously, virt addr of PGDIR was retrieved from thread struct. Now that it is the phys addr which is stored in SPRN_SPRG_PGDIR, hash_page() shall not convert it to phys anymore. This patch removes the conversion. Fixes: 93c4a162b014 ("powerpc/6xx: Store PGDIR physical address in a SPRG") Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-03-18powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038Michael Ellerman
Jakub Drnec reported: Setting the realtime clock can sometimes make the monotonic clock go back by over a hundred years. Decreasing the realtime clock across the y2k38 threshold is one reliable way to reproduce. Allegedly this can also happen just by running ntpd, I have not managed to reproduce that other than booting with rtc at >2038 and then running ntp. When this happens, anything with timers (e.g. openjdk) breaks rather badly. And included a test case (slightly edited for brevity): #define _POSIX_C_SOURCE 199309L #include <stdio.h> #include <time.h> #include <stdlib.h> #include <unistd.h> long get_time(void) { struct timespec tp; clock_gettime(CLOCK_MONOTONIC, &tp); return tp.tv_sec + tp.tv_nsec / 1000000000; } int main(void) { long last = get_time(); while(1) { long now = get_time(); if (now < last) { printf("clock went backwards by %ld seconds!\n", last - now); } last = now; sleep(1); } return 0; } Which when run concurrently with: # date -s 2040-1-1 # date -s 2037-1-1 Will detect the clock going backward. The root cause is that wtom_clock_sec in struct vdso_data is only a 32-bit signed value, even though we set its value to be equal to tk->wall_to_monotonic.tv_sec which is 64-bits. Because the monotonic clock starts at zero when the system boots the wall_to_montonic.tv_sec offset is negative for current and future dates. Currently on a freshly booted system the offset will be in the vicinity of negative 1.5 billion seconds. However if the wall clock is set past the Y2038 boundary, the offset from wall to monotonic becomes less than negative 2^31, and no longer fits in 32-bits. When that value is assigned to wtom_clock_sec it is truncated and becomes positive, causing the VDSO assembly code to calculate CLOCK_MONOTONIC incorrectly. That causes CLOCK_MONOTONIC to jump ahead by ~4 billion seconds which it is not meant to do. Worse, if the time is then set back before the Y2038 boundary CLOCK_MONOTONIC will jump backward. We can fix it simply by storing the full 64-bit offset in the vdso_data, and using that in the VDSO assembly code. We also shuffle some of the fields in vdso_data to avoid creating a hole. The original commit that added the CLOCK_MONOTONIC support to the VDSO did actually use a 64-bit value for wtom_clock_sec, see commit a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel") (Nov 2005). However just 3 days later it was converted to 32-bits in commit 0c37ec2aa88b ("[PATCH] powerpc: vdso fixes (take #2)"), and the bug has existed since then AFAICS. Fixes: 0c37ec2aa88b ("[PATCH] powerpc: vdso fixes (take #2)") Cc: stable@vger.kernel.org # v2.6.15+ Link: http://lkml.kernel.org/r/HaC.ZfES.62bwlnvAvMP.1STMMj@seznam.cz Reported-by: Jakub Drnec <jaydee@email.cz> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-03-17Merge tag 'kbuild-v5.1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - add more Build-Depends to Debian source package - prefix header search paths with $(srctree)/ - make modpost show verbose section mismatch warnings - avoid hard-coded CROSS_COMPILE for h8300 - fix regression for Debian make-kpkg command - add semantic patch to detect missing put_device() - fix some warnings of 'make deb-pkg' - optimize NOSTDINC_FLAGS evaluation - add warnings about redundant generic-y - clean up Makefiles and scripts * tag 'kbuild-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: remove stale lxdialog/.gitignore kbuild: force all architectures except um to include mandatory-y kbuild: warn redundant generic-y Revert "modsign: Abort modules_install when signing fails" kbuild: Make NOSTDINC_FLAGS a simply expanded variable kbuild: deb-pkg: avoid implicit effects coccinelle: semantic code search for missing put_device() kbuild: pkg: grep include/config/auto.conf instead of $KCONFIG_CONFIG kbuild: deb-pkg: introduce is_enabled and if_enabled_echo to builddeb kbuild: deb-pkg: add CONFIG_ prefix to kernel config options kbuild: add workaround for Debian make-kpkg kbuild: source include/config/auto.conf instead of ${KCONFIG_CONFIG} unicore32: simplify linker script generation for decompressor h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux- kbuild: move archive command to scripts/Makefile.lib modpost: always show verbose warning for section mismatch ia64: prefix header search path with $(srctree)/ libfdt: prefix header search paths with $(srctree)/ deb-pkg: generate correct build dependencies