summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-06-10s390/sclp: detect 64-bit-SCAO facilityDavid Hildenbrand
Let's correctly detect that facility, so we can correctly handle its abscence later on. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: interface to query and configure cpu subfunctionsDavid Hildenbrand
We have certain instructions that indicate available subfunctions via a query subfunction (crypto functions and ptff), or via a test bit function (plo). By exposing these "subfunction blocks" to user space, we allow user space to 1) query available subfunctions and make sure subfunctions won't get lost during migration - e.g. properly indicate them via a CPU model 2) change the subfunctions to be reported to the guest (even adding unavailable ones) This mechanism works just like the way we indicate the stfl(e) list to user space. This way, user space could even emulate some subfunctions in QEMU in the future. If this is ever applicable, we have to make sure later on, that unsupported subfunctions result in an intercept to QEMU. Please note that support to indicate them to the guest is still missing and requires hardware support. Usually, the IBC takes already care of these subfunctions for migration safety. QEMU should make sure to always set these bits properly according to the machine generation to be emulated. Available subfunctions are only valid in combination with STFLE bits retrieved via KVM_S390_VM_CPU_MACHINE and enabled via KVM_S390_VM_CPU_PROCESSOR. If the applicable bits are available, the indicated subfunctions are guaranteed to be correct. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10s390/crypto: allow to query all known cpacf functionsDavid Hildenbrand
KVM will have to query these functions, let's add at least the query capabilities. PCKMO has RRE format, as bit 16-31 are ignored, we can still use the existing function. As PCKMO won't touch the cc, let's force it to 0 upfront. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: gaccess: convert get_vcpu_asce()David Hildenbrand
Let's use our new function for preparing translation exceptions. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: gaccess: convert guest_page_range()David Hildenbrand
Let's use our new function for preparing translation exceptions. As we will need the correct ar, let's pass that to guest_page_range(). This will also make sure that the guest address is stored in the tec for applicable excptions. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: gaccess: convert guest_translate_address()David Hildenbrand
Let's use our new function for preparing translation exceptions. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: gaccess: convert kvm_s390_check_low_addr_prot_real()David Hildenbrand
Let's use our new function for preparing translation exceptions. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: gaccess: function for preparing translation exceptionsDavid Hildenbrand
Let's provide a function trans_exc() that can be used for handling preparation of translation exceptions on a central basis. We will use that function to replace existing code in gaccess. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: gaccess: store guest address on ALC prot exceptionsDavid Hildenbrand
Let's pass the effective guest address to get_vcpu_asce(), so we can properly set the guest address in case we inject an ALC protection exception. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: forward ESOP if availableDavid Hildenbrand
ESOP guarantees that during a protection exception, bit 61 of real location 168-175 will only be set to 1 if it was because of ALCP or DATP. If the exception is due to LAP or KCP, the bit will always be set to 0. The old SOP definition allowed bit 61 to be unpredictable in case of LAP or KCP in some conditions. So ESOP replaces this unpredictability by a guarantee. Therefore, we can directly forward ESOP if it is available on our machine. We don't have to do anything when ESOP is disabled - the guest will simply expect unpredictable values. Our guest access functions are already handling ESOP properly. Please note that future functionality in KVM will require knowledge about ESOP being enabled for a guest or not. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: interface to query and configure cpu featuresDavid Hildenbrand
For now, we only have an interface to query and configure facilities indicated via STFL(E). However, we also have features indicated via SCLP, that have to be indicated to the guest by user space and usually require KVM support. This patch allows user space to query and configure available cpu features for the guest. Please note that disabling a feature doesn't necessarily mean that it is completely disabled (e.g. ESOP is mostly handled by the SIE). We will try our best to disable it. Most features (e.g. SCLP) can't directly be forwarded, as most of them need in addition to hardware support, support in KVM. As we later on want to turn these features in KVM explicitly on/off (to simulate different behavior), we have to filter all features provided by the hardware and make them configurable. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: Add mnemonic print to kvm_s390_intercept_progAlexander Yarygin
We have a table of mnemonic names for intercepted program interruptions, let's print readable name of the interruption in the kvm_s390_intercept_prog trace event. Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: Limit sthyi executionJanosch Frank
Store hypervisor information is a valid instruction not only in supervisor state but also in problem state, i.e. the guest's userspace. Its execution is not only computational and memory intensive, but also has to get hold of the ipte lock to write to the guest's memory. This lock is not intended to be held often and long, especially not from the untrusted guest userspace. Therefore we apply rate limiting of sthyi executions per VM. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Acked-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: Add sthyi emulationJanosch Frank
Store Hypervisor Information is an emulated z/VM instruction that provides a guest with basic information about the layers it is running on. This includes information about the cpu configuration of both the machine and the lpar, as well as their names, machine model and machine type. This information enables an application to determine the maximum capacity of CPs and IFLs available to software. The instruction is available whenever the facility bit 74 is set, otherwise executing it results in an operation exception. It is important to check the validity flags in the sections before using data from any structure member. It is not guaranteed that all members will be valid on all machines / machine configurations. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: Extend diag 204 fieldsJanosch Frank
The new store hypervisor information instruction, which we are going to introduce, needs previously unused fields in diag 204 structures. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: Add operation exception interception handlerJanosch Frank
This commit introduces code that handles operation exception interceptions. With this handler we can emulate instructions by using illegal opcodes. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10s390: Make diag224 publicJanosch Frank
Diag204's cpu structures only contain the cpu type by means of an index in the diag224 name table. Hence, to be able to use diag204 in any meaningful way, we also need a usable diag224 interface. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10s390: Make cpc_name accessibleJanosch Frank
sclp_ocf.c is the only way to get the cpc name, as it registers the sole event handler for the ocf event. By creating a new global function that copies that name, we make it accessible to the world which longs to retrieve it. Additionally we now also store the cpc name as EBCDIC, so we don't have to convert it to and from ASCII if it is requested in native encoding. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10s390: hypfs: Move diag implementation and data definitionsJanosch Frank
Diag 204 data and function definitions currently live in the hypfs files. As KVM will be a consumer of this data, we need to make it publicly available and move it to the appropriate diag.{c,h} files. __attribute__ ((packed)) occurences were replaced with __packed for all moved structs. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-03kvm/x86: remove unnecessary header file inclusionKai Huang
arch/x86/kvm/iommu.c includes <linux/intel-iommu.h> and <linux/dmar.h>, which both are unnecessary, in fact incorrect to be here as they are intel specific. Building kvm on x86 passed after removing above inclusion. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-03KVM: x86: protect KVM_CREATE_PIT/KVM_CREATE_PIT2 with kvm->lockPaolo Bonzini
The syzkaller folks reported a NULL pointer dereference that seems to be cause by a race between KVM_CREATE_IRQCHIP and KVM_CREATE_PIT2. The former takes kvm->lock (except when registering the devices, which needs kvm->slots_lock); the latter takes kvm->slots_lock only. Change KVM_CREATE_PIT2 to follow the same model as KVM_CREATE_IRQCHIP. Testcase: #include <pthread.h> #include <linux/kvm.h> #include <fcntl.h> #include <sys/ioctl.h> #include <stdint.h> #include <string.h> #include <stdlib.h> #include <sys/syscall.h> #include <unistd.h> long r[23]; void* thr1(void* arg) { struct kvm_pit_config pitcfg = { .flags = 4 }; switch ((long)arg) { case 0: r[2] = open("/dev/kvm", O_RDONLY|O_ASYNC); break; case 1: r[3] = ioctl(r[2], KVM_CREATE_VM, 0); break; case 2: r[4] = ioctl(r[3], KVM_CREATE_IRQCHIP, 0); break; case 3: r[22] = ioctl(r[3], KVM_CREATE_PIT2, &pitcfg); break; } return 0; } int main(int argc, char **argv) { long i; pthread_t th[4]; memset(r, -1, sizeof(r)); for (i = 0; i < 4; i++) { pthread_create(&th[i], 0, thr, (void*)i); if (argc > 1 && rand()%2) usleep(rand()%1000); } usleep(20000); return 0; } Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-03KVM: x86: rename process_smi to enter_smm, process_smi_request to process_smiPaolo Bonzini
Make the function names more similar between KVM_REQ_NMI and KVM_REQ_SMI. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-03KVM: x86: avoid simultaneous queueing of both IRQ and SMIPaolo Bonzini
If the processor exits to KVM while delivering an interrupt, the hypervisor then requeues the interrupt for the next vmentry. Trying to enter SMM in this same window causes to enter non-root mode in emulated SMM (i.e. with IF=0) and with a request to inject an IRQ (i.e. with a valid VM-entry interrupt info field). This is invalid guest state (SDM 26.3.1.4 "Check on Guest RIP and RFLAGS") and the processor fails vmentry. The fix is to defer the injection from KVM_REQ_SMI to KVM_REQ_EVENT, like we already do for e.g. NMIs. This patch doesn't change the name of the process_smi function so that it can be applied to stable releases. The next patch will modify the names so that process_nmi and process_smi handle respectively KVM_REQ_NMI and KVM_REQ_SMI. This is especially common with Windows, probably due to the self-IPI trick that it uses to deliver deferred procedure calls (DPCs). Reported-by: Laszlo Ersek <lersek@redhat.com> Reported-by: Michał Zegan <webczat_200@poczta.onet.pl> Fixes: 64d6067057d9658acb8675afcfba549abdb7fc16 Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "ARM: - two fixes for 4.6 vgic [Christoffer] (cc stable) - six fixes for 4.7 vgic [Marc] x86: - six fixes from syzkaller reports [Paolo] (two of them cc stable) - allow OS X to boot [Dmitry] - don't trust compilers [Nadav]" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi KVM: fail KVM_SET_VCPU_EVENTS with invalid exception number KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID kvm: x86: avoid warning on repeated KVM_SET_TSS_ADDR KVM: Handle MSR_IA32_PERF_CTL KVM: x86: avoid write-tearing of TDP KVM: arm/arm64: vgic-new: Removel harmful BUG_ON arm64: KVM: vgic-v3: Relax synchronization when SRE==1 arm64: KVM: vgic-v3: Prevent the guest from messing with ICC_SRE_EL1 arm64: KVM: Make ICC_SRE_EL1 access return the configured SRE value KVM: arm/arm64: vgic-v3: Always resample level interrupts KVM: arm/arm64: vgic-v2: Always resample level interrupts KVM: arm/arm64: vgic-v3: Clear all dirty LRs KVM: arm/arm64: vgic-v2: Clear all dirty LRs
2016-06-02KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGSPaolo Bonzini
MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to any of bits 63:32. However, this is not detected at KVM_SET_DEBUGREGS time, and the next KVM_RUN oopses: general protection fault: 0000 [#1] SMP CPU: 2 PID: 14987 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1 Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012 [...] Call Trace: [<ffffffffa072c93d>] kvm_arch_vcpu_ioctl_run+0x141d/0x14e0 [kvm] [<ffffffffa071405d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm] [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480 [<ffffffff812418a9>] SyS_ioctl+0x79/0x90 [<ffffffff817a0f2e>] entry_SYSCALL_64_fastpath+0x12/0x71 Code: 55 83 ff 07 48 89 e5 77 27 89 ff ff 24 fd 90 87 80 81 0f 23 fe 5d c3 0f 23 c6 5d c3 0f 23 ce 5d c3 0f 23 d6 5d c3 0f 23 de 5d c3 <0f> 23 f6 5d c3 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 RIP [<ffffffff810639eb>] native_set_debugreg+0x2b/0x40 RSP <ffff88005836bd50> Testcase (beautified/reduced from syzkaller output): #include <unistd.h> #include <sys/syscall.h> #include <string.h> #include <stdint.h> #include <linux/kvm.h> #include <fcntl.h> #include <sys/ioctl.h> long r[8]; int main() { struct kvm_debugregs dr = { 0 }; r[2] = open("/dev/kvm", O_RDONLY); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7); memcpy(&dr, "\x5d\x6a\x6b\xe8\x57\x3b\x4b\x7e\xcf\x0d\xa1\x72" "\xa3\x4a\x29\x0c\xfc\x6d\x44\x00\xa7\x52\xc7\xd8" "\x00\xdb\x89\x9d\x78\xb5\x54\x6b\x6b\x13\x1c\xe9" "\x5e\xd3\x0e\x40\x6f\xb4\x66\xf7\x5b\xe3\x36\xcb", 48); r[7] = ioctl(r[4], KVM_SET_DEBUGREGS, &dr); r[6] = ioctl(r[4], KVM_RUN, 0); } Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUIDPaolo Bonzini
This causes an ugly dmesg splat. Beautified syzkaller testcase: #include <unistd.h> #include <sys/syscall.h> #include <sys/ioctl.h> #include <fcntl.h> #include <linux/kvm.h> long r[8]; int main() { struct kvm_irq_routing ir = { 0 }; r[2] = open("/dev/kvm", O_RDWR); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[4] = ioctl(r[3], KVM_SET_GSI_ROUTING, &ir); return 0; } Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsiPaolo Bonzini
Found by syzkaller: BUG: unable to handle kernel NULL pointer dereference at 0000000000000120 IP: [<ffffffffa0797202>] kvm_irq_map_gsi+0x12/0x90 [kvm] PGD 6f80b067 PUD b6535067 PMD 0 Oops: 0000 [#1] SMP CPU: 3 PID: 4988 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1 [...] Call Trace: [<ffffffffa0795f62>] irqfd_update+0x32/0xc0 [kvm] [<ffffffffa0796c7c>] kvm_irqfd+0x3dc/0x5b0 [kvm] [<ffffffffa07943f4>] kvm_vm_ioctl+0x164/0x6f0 [kvm] [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480 [<ffffffff812418a9>] SyS_ioctl+0x79/0x90 [<ffffffff817a1062>] tracesys_phase2+0x84/0x89 Code: b5 71 a7 e0 5b 41 5c 41 5d 5d f3 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 8f 10 2e 00 00 31 c0 48 89 e5 <39> 91 20 01 00 00 76 6a 48 63 d2 48 8b 94 d1 28 01 00 00 48 85 RIP [<ffffffffa0797202>] kvm_irq_map_gsi+0x12/0x90 [kvm] RSP <ffff8800926cbca8> CR2: 0000000000000120 Testcase: #include <unistd.h> #include <sys/syscall.h> #include <string.h> #include <stdint.h> #include <linux/kvm.h> #include <fcntl.h> #include <sys/ioctl.h> long r[26]; int main() { memset(r, -1, sizeof(r)); r[2] = open("/dev/kvm", 0); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); struct kvm_irqfd ifd; ifd.fd = syscall(SYS_eventfd2, 5, 0); ifd.gsi = 3; ifd.flags = 2; ifd.resamplefd = ifd.fd; r[25] = ioctl(r[3], KVM_IRQFD, &ifd); return 0; } Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02KVM: fail KVM_SET_VCPU_EVENTS with invalid exception numberPaolo Bonzini
This cannot be returned by KVM_GET_VCPU_EVENTS, so it is okay to return EINVAL. It causes a WARN from exception_type: WARNING: CPU: 3 PID: 16732 at arch/x86/kvm/x86.c:345 exception_type+0x49/0x50 [kvm]() CPU: 3 PID: 16732 Comm: a.out Tainted: G W 4.4.6-300.fc23.x86_64 #1 Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012 0000000000000286 000000006308a48b ffff8800bec7fcf8 ffffffff813b542e 0000000000000000 ffffffffa0966496 ffff8800bec7fd30 ffffffff810a40f2 ffff8800552a8000 0000000000000000 00000000002c267c 0000000000000001 Call Trace: [<ffffffff813b542e>] dump_stack+0x63/0x85 [<ffffffff810a40f2>] warn_slowpath_common+0x82/0xc0 [<ffffffff810a423a>] warn_slowpath_null+0x1a/0x20 [<ffffffffa0924809>] exception_type+0x49/0x50 [kvm] [<ffffffffa0934622>] kvm_arch_vcpu_ioctl_run+0x10a2/0x14e0 [kvm] [<ffffffffa091c04d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm] [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480 [<ffffffff812414a9>] SyS_ioctl+0x79/0x90 [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71 ---[ end trace b1a0391266848f50 ]--- Testcase (beautified/reduced from syzkaller output): #include <unistd.h> #include <sys/syscall.h> #include <string.h> #include <stdint.h> #include <fcntl.h> #include <sys/ioctl.h> #include <linux/kvm.h> long r[31]; int main() { memset(r, -1, sizeof(r)); r[2] = open("/dev/kvm", O_RDONLY); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[7] = ioctl(r[3], KVM_CREATE_VCPU, 0); struct kvm_vcpu_events ve = { .exception.injected = 1, .exception.nr = 0xd4 }; r[27] = ioctl(r[7], KVM_SET_VCPU_EVENTS, &ve); r[30] = ioctl(r[7], KVM_RUN, 0); return 0; } Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUIDPaolo Bonzini
This causes an ugly dmesg splat. Beautified syzkaller testcase: #include <unistd.h> #include <sys/syscall.h> #include <sys/ioctl.h> #include <fcntl.h> #include <linux/kvm.h> long r[8]; int main() { struct kvm_cpuid2 c = { 0 }; r[2] = open("/dev/kvm", O_RDWR); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[4] = ioctl(r[3], KVM_CREATE_VCPU, 0x8); r[7] = ioctl(r[4], KVM_SET_CPUID, &c); return 0; } Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02kvm: x86: avoid warning on repeated KVM_SET_TSS_ADDRPaolo Bonzini
Found by syzkaller: WARNING: CPU: 3 PID: 15175 at arch/x86/kvm/x86.c:7705 __x86_set_memory_region+0x1dc/0x1f0 [kvm]() CPU: 3 PID: 15175 Comm: a.out Tainted: G W 4.4.6-300.fc23.x86_64 #1 Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012 0000000000000286 00000000950899a7 ffff88011ab3fbf0 ffffffff813b542e 0000000000000000 ffffffffa0966496 ffff88011ab3fc28 ffffffff810a40f2 00000000000001fd 0000000000003000 ffff88014fc50000 0000000000000000 Call Trace: [<ffffffff813b542e>] dump_stack+0x63/0x85 [<ffffffff810a40f2>] warn_slowpath_common+0x82/0xc0 [<ffffffff810a423a>] warn_slowpath_null+0x1a/0x20 [<ffffffffa09251cc>] __x86_set_memory_region+0x1dc/0x1f0 [kvm] [<ffffffffa092521b>] x86_set_memory_region+0x3b/0x60 [kvm] [<ffffffffa09bb61c>] vmx_set_tss_addr+0x3c/0x150 [kvm_intel] [<ffffffffa092f4d4>] kvm_arch_vm_ioctl+0x654/0xbc0 [kvm] [<ffffffffa091d31a>] kvm_vm_ioctl+0x9a/0x6f0 [kvm] [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480 [<ffffffff812414a9>] SyS_ioctl+0x79/0x90 [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71 Testcase: #include <unistd.h> #include <sys/ioctl.h> #include <fcntl.h> #include <string.h> #include <linux/kvm.h> long r[8]; int main() { memset(r, -1, sizeof(r)); r[2] = open("/dev/kvm", O_RDONLY|O_TRUNC); r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul); r[5] = ioctl(r[3], KVM_SET_TSS_ADDR, 0x20000000ul); r[7] = ioctl(r[3], KVM_SET_TSS_ADDR, 0x20000000ul); return 0; } Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02KVM: Handle MSR_IA32_PERF_CTLDmitry Bilunov
Intel CPUs having Turbo Boost feature implement an MSR to provide a control interface via rdmsr/wrmsr instructions. One could detect the presence of this feature by issuing one of these instructions and handling the #GP exception which is generated in case the referenced MSR is not implemented by the CPU. KVM's vCPU model behaves exactly as a real CPU in this case by injecting a fault when MSR_IA32_PERF_CTL is called (which KVM does not support). However, some operating systems use this register during an early boot stage in which their kernel is not capable of handling #GP correctly, causing #DP and finally a triple fault effectively resetting the vCPU. This patch implements a dummy handler for MSR_IA32_PERF_CTL to avoid the crashes. Signed-off-by: Dmitry Bilunov <kmeaw@yandex-team.ru> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02KVM: x86: avoid write-tearing of TDPNadav Amit
In theory, nothing prevents the compiler from write-tearing PTEs, or split PTE writes. These partially-modified PTEs can be fetched by other cores and cause mayhem. I have not really encountered such case in real-life, but it does seem possible. For example, the compiler may try to do something creative for kvm_set_pte_rmapp() and perform multiple writes to the PTE. Signed-off-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02Merge tag 'kvm-arm-for-v4.7-rc2' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM Fixes for v4.7-rc2 Fixes for the vgic, 2 of the patches address a bug introduced in v4.6 while the rest are for the new vgic.
2016-06-02KVM: arm/arm64: vgic-new: Removel harmful BUG_ONMarc Zyngier
When changing the active bit from an MMIO trap, we decide to explode if the intid is that of a private interrupt. This flawed logic comes from the fact that we were assuming that kvm_vcpu_kick() as called by kvm_arm_halt_vcpu() would not return before the called vcpu responded, but this is not the case, so we need to perform this wait even for private interrupts. Dropping the BUG_ON seems like the right thing to do. [ Commit message tweaked by Christoffer ] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-01Merge tag 'pinctrl-v4.7-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Here are three pin control fixes for v4.7. Not much, and just driver fixes: - add device tree matches to MAINTAINERS - inversion bug in the Nomadik driver - dual edge handling bug in the mediatek driver" * tag 'pinctrl-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: mediatek: fix dual-edge code defect MAINTAINERS: Add file patterns for pinctrl device tree bindings pinctrl: nomadik: fix inversion of gpio direction
2016-06-01Merge tag 'dma-buf-for-4.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf Pull dma-buf updates from Sumit Semwal: - use of vma_pages instead of explicit computation - DocBook and headerdoc updates for dma-buf * tag 'dma-buf-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf: dma-buf: use vma_pages() fence: add missing descriptions for fence doc: update/fixup dma-buf related DocBook reservation: add headerdoc comments dma-buf: headerdoc fixes
2016-05-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix negative error code usage in ATM layer, from Stefan Hajnoczi. 2) If CONFIG_SYSCTL is disabled, the default TTL is not initialized properly. From Ezequiel Garcia. 3) Missing spinlock init in mvneta driver, from Gregory CLEMENT. 4) Missing unlocks in hwmb error paths, also from Gregory CLEMENT. 5) Fix deadlock on team->lock when propagating features, from Ivan Vecera. 6) Work around buffer offset hw bug in alx chips, from Feng Tang. 7) Fix double listing of SCTP entries in sctp_diag dumps, from Xin Long. 8) Various statistics bug fixes in mlx4 from Eric Dumazet. 9) Fix some randconfig build errors wrt fou ipv6 from Arnd Bergmann. 10) All of l2tp was namespace aware, but the ipv6 support code was not doing so. From Shmulik Ladkani. 11) Handle on-stack hrtimers properly in pktgen, from Guenter Roeck. 12) Propagate MAC changes properly through VLAN devices, from Mike Manning. 13) Fix memory leak in bnx2x_init_one(), from Vitaly Kuznetsov. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits) sfc: Track RPS flow IDs per channel instead of per function usbnet: smsc95xx: fix link detection for disabled autonegotiation virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv bnx2x: avoid leaking memory on bnx2x_init_one() failures fou: fix IPv6 Kconfig options openvswitch: update checksum in {push,pop}_mpls sctp: sctp_diag should dump sctp socket type net: fec: update dirty_tx even if no skb vlan: Propagate MAC address to VLANs atm: iphase: off by one in rx_pkt() atm: firestream: add more reserved strings vxlan: Accept user specified MTU value when create new vxlan link net: pktgen: Call destroy_hrtimer_on_stack() timer: Export destroy_hrtimer_on_stack() net: l2tp: Make l2tp_ip6 namespace aware Documentation: ip-sysctl.txt: clarify secure_redirects sfc: use flow dissector helpers for aRFS ieee802154: fix logic error in ieee802154_llsec_parse_dev_addr net: nps_enet: Disable interrupts before napi reschedule net/lapb: tuse %*ph to dump buffers ...
2016-05-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull sparc fixes from David Miller: "sparc64 mmu context allocation and trap return bug fixes" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Fix return from trap window fill crashes. sparc: Harden signal return frame checks. sparc64: Take ctx_alloc_lock properly in hugetlb_setup().
2016-05-31sfc: Track RPS flow IDs per channel instead of per functionJon Cooper
Otherwise we get confused when two flows on different channels get the same flow ID. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31usbnet: smsc95xx: fix link detection for disabled autonegotiationChristoph Fritz
To detect link status up/down for connections where autonegotiation is explicitly disabled, we don't get an irq but need to poll the status register for link up/down detection. This patch adds a workqueue to poll for link status. Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recvwangyunjian
In function virtnet_open() and virtnet_probe(), func try_fill_recv() may be executed at the same time. VQ in virtqueue_add() has not been protected well and BUG_ON will be triggered when virito_net.ko being removed. Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31bnx2x: avoid leaking memory on bnx2x_init_one() failuresVitaly Kuznetsov
bnx2x_init_bp() allocates memory with bnx2x_alloc_mem_bp() so if we fail later in bnx2x_init_one() we need to free this memory with bnx2x_free_mem_bp() to avoid leakages. E.g. I'm observing memory leaks reported by kmemleak when a failure (unrelated) happens in bnx2x_vfpf_acquire(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31fou: fix IPv6 Kconfig optionsArnd Bergmann
The Kconfig options I added to work around broken compilation ended up screwing up things more, as I used the wrong symbol to control compilation of the file, resulting in IPv6 fou support to never be built into the kernel. Changing CONFIG_NET_FOU_IPV6_TUNNELS to CONFIG_IPV6_FOU fixes that problem, I had renamed the symbol in one location but not the other, and as the file is never being used by other kernel code, this did not lead to a build failure that I would have caught. After that fix, another issue with the same patch becomes obvious, as we 'select INET6_TUNNEL', which is related to IPV6_TUNNEL, but not the same, and this can still cause the original build failure when IPV6_TUNNEL is not built-in but IPV6_FOU is. The fix is equally trivial, we just need to select the right symbol. I have successfully build 350 randconfig kernels with this patch and verified that the driver is now being built. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Valentin Rothberg <valentinrothberg@gmail.com> Fixes: fabb13db448e ("fou: add Kconfig options for IPv6 support") Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31openvswitch: update checksum in {push,pop}_mplsSimon Horman
In the case of CHECKSUM_COMPLETE the skb checksum should be updated in {push,pop}_mpls() as they the type in the ethernet header. As suggested by Pravin Shelar. Cc: Pravin Shelar <pshelar@nicira.com> Fixes: 25cd9ba0abc0 ("openvswitch: Add basic MPLS support to kernel") Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31sctp: sctp_diag should dump sctp socket typeXin Long
Now we cannot distinguish that one sk is a udp or sctp style when we use ss to dump sctp_info. it's necessary to dump it as well. For sctp_diag, ss support is not officially available, thus there are no official users of this yet, so we can add this field in the middle of sctp_info without breaking user API. v1->v2: - move 'sctpi_s_type' field to the end of struct sctp_info, so that it won't cause incompatibility with applications already built. - add __reserved3 in sctp_info to make sure sctp_info is 8-byte alignment. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31net: fec: update dirty_tx even if no skbTroy Kisky
If dirty_tx isn't updated, then dma_unmap_single can be called twice. This fixes a [ 58.420980] ------------[ cut here ]------------ [ 58.425667] WARNING: CPU: 0 PID: 377 at /home/schurig/d/mkarm/linux-4.5/lib/dma-debug.c:1096 check_unmap+0x9d0/0xab8() [ 58.436405] fec 2188000.ethernet: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000000] [size=66 bytes] encountered by Holger Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com> Tested-by: <holgerschurig@gmail.com> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31vlan: Propagate MAC address to VLANsMike Manning
The MAC address of the physical interface is only copied to the VLAN when it is first created, resulting in an inconsistency after MAC address changes of only newly created VLANs having an up-to-date MAC. The VLANs should continue inheriting the MAC address of the physical interface until the VLAN MAC address is explicitly set to any value. This allows IPv6 EUI64 addresses for the VLAN to reflect any changes to the MAC of the physical interface and thus for DAD to behave as expected. Signed-off-by: Mike Manning <mmanning@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31atm: iphase: off by one in rx_pkt()Dan Carpenter
The iadev->rx_open[] array holds "iadev->num_vc" pointers (this code assumes that pointers are 32 bits). So the > here should be >= or else we could end up reading a garbage pointer from one element beyond the end of the array. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31atm: firestream: add more reserved stringsDan Carpenter
This bug was there when the driver was first added in back in year 2000. It causes a Smatch warning: drivers/atm/firestream.c:849 process_incoming() error: buffer overflow 'res_strings' 60 <= 63 There are supposed to be 64 entries in this array and the missing strings are clearly in the 30 40 range. I added them as reserved 37 to reserved 40. It's possible that strings are really supposed to be added in the middle instead of at the end, but this approach is safe, in that it fixes the bug and doesn't break anything that wasn't already broken. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31vxlan: Accept user specified MTU value when create new vxlan linkChen Haiquan
When create a new vxlan link, example: ip link add vtap mtu 1440 type vxlan vni 1 dev eth0 The argument "mtu" has no effect, because it is not set to conf->mtu. The default value is used in vxlan_dev_configure function. This problem was introduced by commit 0dfbdf4102b9 (vxlan: Factor out device configuration). Fixes: 0dfbdf4102b9 (vxlan: Factor out device configuration) Signed-off-by: Chen Haiquan <oc@yunify.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>