summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2018-12-19powerpc/smp: Use code patching to restore reset vectorChristophe Leroy
Instead of hardcoding reset vector restore, use patch_instruction() Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19powerpc/44x: use patch_sites for TLB handlers patchingChristophe Leroy
Use patch sites and associated helpers to manage TLB handlers patching instead of hardcoding. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19powerpc/signal: Use code patching instead of hardcodingChristophe Leroy
Instead of hardcoding code modifications, use code patching functions. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19powerpc/8xx: use modify_instruction_site()Christophe Leroy
Instead of hardcoding the TLB handlers patching, use the newly created modify_instruction_site() helper. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19powerpc/book3s/32: Use patch_site to patch hash functionsChristophe Leroy
Use patch_sites and the new modify_instruction_site() function instead of hardcoding hash functions patching. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19powerpc/book3s/32: Use MMU_FTR_HPTE_TABLE in head_32.SChristophe Leroy
Instead of manually patching a blr at hash_page() entry in MMU_init_hw(), this patch adds a features section in head_32.S Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19powerpc/32: use patch_site_addr() in machine_init()Christophe Leroy
Use patch_site_addr() instead of hardcoding the address calculation in machine_init() Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19powerpc: add modify_instruction() and modify_instruction_site()Christophe Leroy
Add two helpers to avoid hardcoding of instructions modifications. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19powerpc: simplify patch_instruction_site() and patch_branch_site()Christophe Leroy
Using patch_site_addr() helper, patch_instruction_site() and patch_branch_site() can be simplified and inlined. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19powerpc/mm: remove unused variableChristophe Leroy
In file included from ./include/linux/hugetlb.h:445:0, from arch/powerpc/kernel/setup-common.c:37: ./arch/powerpc/include/asm/hugetlb.h: In function ‘huge_ptep_clear_flush’: ./arch/powerpc/include/asm/hugetlb.h:154:8: error: variable ‘pte’ set but not used [-Werror=unused-but-set-variable] Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19powerpc: implement CONFIG_DEBUG_VIRTUALChristophe Leroy
This patch implements CONFIG_DEBUG_VIRTUAL to warn about incorrect use of virt_to_phys() and page_to_phys() Below is the result of test_debug_virtual: [ 1.438746] WARNING: CPU: 0 PID: 1 at ./arch/powerpc/include/asm/io.h:808 test_debug_virtual_init+0x3c/0xd4 [ 1.448156] CPU: 0 PID: 1 Comm: swapper Not tainted 4.20.0-rc5-00560-g6bfb52e23a00-dirty #532 [ 1.457259] NIP: c066c550 LR: c0650ccc CTR: c066c514 [ 1.462257] REGS: c900bdb0 TRAP: 0700 Not tainted (4.20.0-rc5-00560-g6bfb52e23a00-dirty) [ 1.471184] MSR: 00029032 <EE,ME,IR,DR,RI> CR: 48000422 XER: 20000000 [ 1.477811] [ 1.477811] GPR00: c0650ccc c900be60 c60d0000 00000000 006000c0 c9000000 00009032 c7fa0020 [ 1.477811] GPR08: 00002400 00000001 09000000 00000000 c07b5d04 00000000 c00037d8 00000000 [ 1.477811] GPR16: 00000000 00000000 00000000 00000000 c0760000 c0740000 00000092 c0685bb0 [ 1.477811] GPR24: c065042c c068a734 c0685b8c 00000006 00000000 c0760000 c075c3c0 ffffffff [ 1.512711] NIP [c066c550] test_debug_virtual_init+0x3c/0xd4 [ 1.518315] LR [c0650ccc] do_one_initcall+0x8c/0x1cc [ 1.523163] Call Trace: [ 1.525595] [c900be60] [c0567340] 0xc0567340 (unreliable) [ 1.530954] [c900be90] [c0650ccc] do_one_initcall+0x8c/0x1cc [ 1.536551] [c900bef0] [c0651000] kernel_init_freeable+0x1f4/0x2cc [ 1.542658] [c900bf30] [c00037ec] kernel_init+0x14/0x110 [ 1.547913] [c900bf40] [c000e1d0] ret_from_kernel_thread+0x14/0x1c [ 1.553971] Instruction dump: [ 1.556909] 3ca50100 bfa10024 54a5000e 3fa0c076 7c0802a6 3d454000 813dc204 554893be [ 1.564566] 7d294010 7d294910 90010034 39290001 <0f090000> 7c3e0b78 955e0008 3fe0c062 [ 1.572425] ---[ end trace 6f6984225b280ad6 ]--- [ 1.577467] PA: 0x09000000 for VA: 0xc9000000 [ 1.581799] PA: 0x061e8f50 for VA: 0xc61e8f50 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-17powerpc/32: Move the old 6xx -mcpu logic before the TARGET_CPU logicMathieu Malaterre
The code: ifdef CONFIG_6xx KBUILD_CFLAGS += -mcpu=powerpc endif was added in 2006 in commit f48b8296b315 ("[PATCH] powerpc32: Set cpu explicitly in kernel compiles"). This change was acceptable since the TARGET_CPU logic was 64-bit only. Since commit 0e00a8c9fd92 ("powerpc: Allow CPU selection also on PPC32") this logic is no longer acceptable after the TARGET_CPU specific. It currently appends -mcpu=powerpc at the end of the command line, after any TARGET_CPU specific: gcc -Wp,-MD,init/.do_mounts.o.d ... -mcpu=powerpc -mbig-endian -m32 ... -mcpu=e300c2 ... -mcpu=powerpc ... ../init/do_mounts.c Fixes: 0e00a8c9fd92 ("powerpc: Allow CPU selection also on PPC32") Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-17powerpc/ipic: Remove unused ipic_set_priority()Michael Ellerman
ipic_set_priority() has been unused since 2006 when the last usage was removed in commit b9f0f1bb2bca ("[POWERPC] Adapt ipic driver to new host_ops interface, add set_irq_type to set IRQ sense"). Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-17Merge branch 'fixes' into nextMichael Ellerman
Merge our fixes branch again, this has a couple of build fixes and also a change to do_syscall_trace_enter() that will conflict with a patch we want to apply in next.
2018-12-17powerpc/iommu: Use device_iommu_mapped()Joerg Roedel
Use the new function to replace the open-coded iommu check. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell Currey <ruscur@russell.cc> Cc: Sam Bobroff <sbobroff@linux.ibm.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2018-12-17KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L3 guestSuraj Jitindar Singh
Previously when a device was being emulated by an L1 guest for an L2 guest, that device couldn't then be passed through to an L3 guest. This was because the L1 guest had no method for accessing L3 memory. The hcall H_COPY_TOFROM_GUEST provides this access. Thus this setup for passthrough can now be allowed. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17KVM: PPC: Book3S: Introduce new hcall H_COPY_TOFROM_GUEST to access ↵Suraj Jitindar Singh
quadrants 1 & 2 A guest cannot access quadrants 1 or 2 as this would result in an exception. Thus introduce the hcall H_COPY_TOFROM_GUEST to be used by a guest when it wants to perform an access to quadrants 1 or 2, for example when it wants to access memory for one of its nested guests. Also provide an implementation for the kvm-hv module. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L2 guestSuraj Jitindar Singh
Allow for a device which is being emulated at L0 (the host) for an L1 guest to be passed through to a nested (L2) guest. The existing kvmppc_hv_emulate_mmio function can be used here. The main challenge is that for a load the result must be stored into the L2 gpr, not an L1 gpr as would normally be the case after going out to qemu to complete the operation. This presents a challenge as at this point the L2 gpr state has been written back into L1 memory. To work around this we store the address in L1 memory of the L2 gpr where the result of the load is to be stored and use the new io_gpr value KVM_MMIO_REG_NESTED_GPR to indicate that this is a nested load for which completion must be done when returning back into the kernel. Then in kvmppc_complete_mmio_load() the resultant value is written into L1 memory at the location of the indicated L2 gpr. Note that we don't currently let an L1 guest emulate a device for an L2 guest which is then passed through to an L3 guest. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17KVM: PPC: Update kvmppc_st and kvmppc_ld to use quadrantsSuraj Jitindar Singh
The functions kvmppc_st and kvmppc_ld are used to access guest memory from the host using a guest effective address. They do so by translating through the process table to obtain a guest real address and then using kvm_read_guest or kvm_write_guest to make the access with the guest real address. This method of access however only works for L1 guests and will give the incorrect results for a nested guest. We can however use the store_to_eaddr and load_from_eaddr kvmppc_ops to perform the access for a nested guesti (and a L1 guest). So attempt this method first and fall back to the old method if this fails and we aren't running a nested guest. At this stage there is no fall back method to perform the access for a nested guest and this is left as a future improvement. For now we will return to the nested guest and rely on the fact that a translation should be faulted in before retrying the access. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17KVM: PPC: Add load_from_eaddr and store_to_eaddr to the kvmppc_ops structSuraj Jitindar Singh
The kvmppc_ops struct is used to store function pointers to kvm implementation specific functions. Introduce two new functions load_from_eaddr and store_to_eaddr to be used to load from and store to a guest effective address respectively. Also implement these for the kvm-hv module. If we are using the radix mmu then we can call the functions to access quadrant 1 and 2. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2Suraj Jitindar Singh
The POWER9 radix mmu has the concept of quadrants. The quadrant number is the two high bits of the effective address and determines the fully qualified address to be used for the translation. The fully qualified address consists of the effective lpid, the effective pid and the effective address. This gives then 4 possible quadrants 0, 1, 2, and 3. When accessing these quadrants the fully qualified address is obtained as follows: Quadrant | Hypervisor | Guest -------------------------------------------------------------------------- | EA[0:1] = 0b00 | EA[0:1] = 0b00 0 | effLPID = 0 | effLPID = LPIDR | effPID = PIDR | effPID = PIDR -------------------------------------------------------------------------- | EA[0:1] = 0b01 | 1 | effLPID = LPIDR | Invalid Access | effPID = PIDR | -------------------------------------------------------------------------- | EA[0:1] = 0b10 | 2 | effLPID = LPIDR | Invalid Access | effPID = 0 | -------------------------------------------------------------------------- | EA[0:1] = 0b11 | EA[0:1] = 0b11 3 | effLPID = 0 | effLPID = LPIDR | effPID = 0 | effPID = 0 -------------------------------------------------------------------------- In the Guest; Quadrant 3 is normally used to address the operating system since this uses effPID=0 and effLPID=LPIDR, meaning the PID register doesn't need to be switched. Quadrant 0 is normally used to address user space since the effLPID and effPID are taken from the corresponding registers. In the Host; Quadrant 0 and 3 are used as above, however the effLPID is always 0 to address the host. Quadrants 1 and 2 can be used by the host to address guest memory using a guest effective address. Since the effLPID comes from the LPID register, the host loads the LPID of the guest it would like to access (and the PID of the process) and can perform accesses to a guest effective address. This means quadrant 1 can be used to address the guest user space and quadrant 2 can be used to address the guest operating system from the hypervisor, using a guest effective address. Access to the quadrants can cause a Hypervisor Data Storage Interrupt (HDSI) due to being unable to perform partition scoped translation. Previously this could only be generated from a guest and so the code path expects us to take the KVM trampoline in the interrupt handler. This is no longer the case so we modify the handler to call bad_page_fault() to check if we were expecting this fault so we can handle it gracefully and just return with an error code. In the hash mmu case we still raise an unknown exception since quadrants aren't defined for the hash mmu. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17KVM: PPC: Book3S HV: Add function kvmhv_vcpu_is_radix()Suraj Jitindar Singh
There exists a function kvm_is_radix() which is used to determine if a kvm instance is using the radix mmu. However this only applies to the first level (L1) guest. Add a function kvmhv_vcpu_is_radix() which can be used to determine if the current execution context of the vcpu is radix, accounting for if the vcpu is running a nested guest. Currently all nested guests must be radix but this may change in the future. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17KVM: PPC: Book3S: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machinesSuraj Jitindar Singh
The kvm capability KVM_CAP_SPAPR_TCE_VFIO is used to indicate the availability of in kernel tce acceleration for vfio. However it is currently the case that this is only available on a powernv machine, not for a pseries machine. Thus make this capability dependent on having the cpu feature CPU_FTR_HVMODE. [paulus@ozlabs.org - fixed compilation for Book E.] Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17KVM: PPC: Book3S HV: Flush guest mappings when turning dirty tracking on/offPaul Mackerras
This adds code to flush the partition-scoped page tables for a radix guest when dirty tracking is turned on or off for a memslot. Only the guest real addresses covered by the memslot are flushed. The reason for this is to get rid of any 2M PTEs in the partition-scoped page tables that correspond to host transparent huge pages, so that page dirtiness is tracked at a system page (4k or 64k) granularity rather than a 2M granularity. The page tables are also flushed when turning dirty tracking off so that the memslot's address space can be repopulated with THPs if possible. To do this, we add a new function kvmppc_radix_flush_memslot(). Since this does what's needed for kvmppc_core_flush_memslot_hv() on a radix guest, we now make kvmppc_core_flush_memslot_hv() call the new kvmppc_radix_flush_memslot() rather than calling kvm_unmap_radix() for each page in the memslot. This has the effect of fixing a bug in that kvmppc_core_flush_memslot_hv() was previously calling kvm_unmap_radix() without holding the kvm->mmu_lock spinlock, which is required to be held. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17KVM: PPC: Book3S HV: Cleanups - constify memslots, fix commentsPaul Mackerras
This adds 'const' to the declarations for the struct kvm_memory_slot pointer parameters of some functions, which will make it possible to call those functions from kvmppc_core_commit_memory_region_hv() in the next patch. This also fixes some comments about locking. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17KVM: PPC: Book3S HV: Map single pages when doing dirty page loggingPaul Mackerras
For radix guests, this makes KVM map guest memory as individual pages when dirty page logging is enabled for the memslot corresponding to the guest real address. Having a separate partition-scoped PTE for each system page mapped to the guest means that we have a separate dirty bit for each page, thus making the reported dirty bitmap more accurate. Without this, if part of guest memory is backed by transparent huge pages, the dirty status is reported at a 2MB granularity rather than a 64kB (or 4kB) granularity for that part, causing userspace to have to transmit more data when migrating the guest. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17KVM: PPC: Pass change type down to memslot commit functionBharata B Rao
Currently, kvm_arch_commit_memory_region() gets called with a parameter indicating what type of change is being made to the memslot, but it doesn't pass it down to the platform-specific memslot commit functions. This adds the `change' parameter to the lower-level functions so that they can use it in future. [paulus@ozlabs.org - fix book E also.] Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-14Merge tag 'powerpc-4.20-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "One notable fix for our change to split pt_regs between user/kernel, we forgot to update BPF to use the user-visible type which was an ABI break for BPF programs. A slightly ugly but minimal fix to do_syscall_trace_enter() so that we use tracehook_report_syscall_entry() properly. We'll rework the code in next to avoid the empty if body. Seven commits fixing bugs in the new papr_scm (Storage Class Memory) driver. The driver was finally able to be tested on the other hypervisor which exposed several bugs. The fixes are all fairly minimal at least. Fix a crash in our MSI code if an MSI-capable device is plugged into a non-MSI capable PHB, only seen on older hardware (MPC8378). Fix our legacy serial code to look for "stdout-path" since the device trees were updated to use that instead of "linux,stdout-path". A change to the COFF zImage code to fix booting old powermacs. A couple of minor build fixes. Thanks to: Benjamin Herrenschmidt, Daniel Axtens, Dmitry V. Levin, Elvira Khabirova, Oliver O'Halloran, Paul Mackerras, Radu Rendec, Rob Herring, Sandipan Das" * tag 'powerpc-4.20-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/ptrace: replace ptrace_report_syscall() with a tracehook call powerpc/mm: Fallback to RAM if the altmap is unusable powerpc/papr_scm: Use ibm,unit-guid as the iset cookie powerpc/papr_scm: Fix DIMM device registration race powerpc/papr_scm: Remove endian conversions powerpc/papr_scm: Update DT properties powerpc/papr_scm: Fix resource end address powerpc/papr_scm: Use depend instead of select powerpc/bpf: Fix broken uapi for BPF_PROG_TYPE_PERF_EVENT powerpc/boot: Fix build failures with -j 1 powerpc: Look for "stdout-path" when setting up legacy consoles powerpc/msi: Fix NULL pointer access in teardown code powerpc/mm: Fix linux page tables build with some configs powerpc: Fix COFF zImage booting on old powermacs
2018-12-14kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnosticPaolo Bonzini
The first such capability to be handled in virt/kvm/ will be manual dirty page reprotection. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14KVM: PPC: Book3S PR: Set hflag to indicate that POWER9 supports 1T segmentsSuraj Jitindar Singh
When booting a kvm-pr guest on a POWER9 machine the following message is observed: "qemu-system-ppc64: KVM does not support 1TiB segments which guest expects" This is because the guest is expecting to be able to use 1T segments however we don't indicate support for it. This is because we don't set the BOOK3S_HFLAG_MULTI_PGSIZE flag in the hflags in kvmppc_set_pvr_pr() on POWER9. POWER9 does indeed have support for 1T segments, so add a case for POWER9 to the switch statement to ensure it is set. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-14KVM: PPC: Book3S HV: Change to use DEFINE_SHOW_ATTRIBUTE macroYangtao Li
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-14KVM: PPC: Book3S HV: Fix race between kvm_unmap_hva_range and MMU mode switchPaul Mackerras
Testing has revealed an occasional crash which appears to be caused by a race between kvmppc_switch_mmu_to_hpt and kvm_unmap_hva_range_hv. The symptom is a NULL pointer dereference in __find_linux_pte() called from kvm_unmap_radix() with kvm->arch.pgtable == NULL. Looking at kvmppc_switch_mmu_to_hpt(), it does indeed clear kvm->arch.pgtable (via kvmppc_free_radix()) before setting kvm->arch.radix to NULL, and there is nothing to prevent kvm_unmap_hva_range_hv() or the other MMU callback functions from being called concurrently with kvmppc_switch_mmu_to_hpt() or kvmppc_switch_mmu_to_radix(). This patch therefore adds calls to spin_lock/unlock on the kvm->mmu_lock around the assignments to kvm->arch.radix, and makes sure that the partition-scoped radix tree or HPT is only freed after changing kvm->arch.radix. This also takes the kvm->mmu_lock in kvmppc_rmap_reset() to make sure that the clearing of each rmap array (one per memslot) doesn't happen concurrently with use of the array in the kvm_unmap_hva_range_hv() or the other MMU callbacks. Fixes: 18c3640cefc7 ("KVM: PPC: Book3S HV: Add infrastructure for running HPT guests on radix host") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-13dma-direct: merge swiotlb_dma_ops into the dma_direct codeChristoph Hellwig
While the dma-direct code is (relatively) clean and simple we actually have to use the swiotlb ops for the mapping on many architectures due to devices with addressing limits. Instead of keeping two implementations around this commit allows the dma-direct implementation to call the swiotlb bounce buffering functions and thus share the guts of the mapping implementation. This also simplified the dma-mapping setup on a few architectures where we don't have to differenciate which implementation to use. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Tony Luck <tony.luck@intel.com>
2018-12-13dma-mapping: move various slow path functions out of lineChristoph Hellwig
There is no need to have all setup and coherent allocation / freeing routines inline. Move them out of line to keep the implemeation nicely encapsulated and save some kernel text size. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Tony Luck <tony.luck@intel.com>
2018-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-12-11 The following pull-request contains BPF updates for your *net-next* tree. It has three minor merge conflicts, resolutions: 1) tools/testing/selftests/bpf/test_verifier.c Take first chunk with alignment_prevented_execution. 2) net/core/filter.c [...] case bpf_ctx_range_ptr(struct __sk_buff, flow_keys): case bpf_ctx_range(struct __sk_buff, wire_len): return false; [...] 3) include/uapi/linux/bpf.h Take the second chunk for the two cases each. The main changes are: 1) Add support for BPF line info via BTF and extend libbpf as well as bpftool's program dump to annotate output with BPF C code to facilitate debugging and introspection, from Martin. 2) Add support for BPF_ALU | BPF_ARSH | BPF_{K,X} in interpreter and all JIT backends, from Jiong. 3) Improve BPF test coverage on archs with no efficient unaligned access by adding an "any alignment" flag to the BPF program load to forcefully disable verifier alignment checks, from David. 4) Add a new bpf_prog_test_run_xattr() API to libbpf which allows for proper use of BPF_PROG_TEST_RUN with data_out, from Lorenz. 5) Extend tc BPF programs to use a new __sk_buff field called wire_len for more accurate accounting of packets going to wire, from Petar. 6) Improve bpftool to allow dumping the trace pipe from it and add several improvements in bash completion and map/prog dump, from Quentin. 7) Optimize arm64 BPF JIT to always emit movn/movk/movk sequence for kernel addresses and add a dedicated BPF JIT backend allocator, from Ard. 8) Add a BPF helper function for IR remotes to report mouse movements, from Sean. 9) Various cleanups in BPF prog dump e.g. to make UAPI bpf_prog_info member naming consistent with existing conventions, from Yonghong and Song. 10) Misc cleanups and improvements in allowing to pass interface name via cmdline for xdp1 BPF example, from Matteo. 11) Fix a potential segfault in BPF sample loader's kprobes handling, from Daniel T. 12) Fix SPDX license in libbpf's README.rst, from Andrey. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Several conflicts, seemingly all over the place. I used Stephen Rothwell's sample resolutions for many of these, if not just to double check my own work, so definitely the credit largely goes to him. The NFP conflict consisted of a bug fix (moving operations past the rhashtable operation) while chaning the initial argument in the function call in the moved code. The net/dsa/master.c conflict had to do with a bug fix intermixing of making dsa_master_set_mtu() static with the fixing of the tagging attribute location. cls_flower had a conflict because the dup reject fix from Or overlapped with the addition of port range classifiction. __set_phy_supported()'s conflict was relatively easy to resolve because Andrew fixed it in both trees, so it was just a matter of taking the net-next copy. Or at least I think it was :-) Joe Stringer's fix to the handling of netns id 0 in bpf_sk_lookup() intermixed with changes on how the sdif and caller_net are calculated in these code paths in net-next. The remaining BPF conflicts were largely about the addition of the __bpf_md_ptr stuff in 'net' overlapping with adjustments and additions to the relevant data structure where the MD pointer macros are used. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-10powerpc/ptrace: replace ptrace_report_syscall() with a tracehook callElvira Khabirova
Arch code should use tracehook_*() helpers, as documented in include/linux/tracehook.h, ptrace_report_syscall() is not expected to be used outside that file. The patch does not look very nice, but at least it is correct and opens the way for PTRACE_GET_SYSCALL_INFO API. Co-authored-by: Dmitry V. Levin <ldv@altlinux.org> Fixes: 5521eb4bca2d ("powerpc/ptrace: Add support for PTRACE_SYSEMU") Signed-off-by: Elvira Khabirova <lineprinter@altlinux.org> Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> [mpe: Take this as a minimal fix for 4.20, we'll rework it later] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "A decent batch of fixes here. I'd say about half are for problems that have existed for a while, and half are for new regressions added in the 4.20 merge window. 1) Fix 10G SFP phy module detection in mvpp2, from Baruch Siach. 2) Revert bogus emac driver change, from Benjamin Herrenschmidt. 3) Handle BPF exported data structure with pointers when building 32-bit userland, from Daniel Borkmann. 4) Memory leak fix in act_police, from Davide Caratti. 5) Check RX checksum offload in RX descriptors properly in aquantia driver, from Dmitry Bogdanov. 6) SKB unlink fix in various spots, from Edward Cree. 7) ndo_dflt_fdb_dump() only works with ethernet, enforce this, from Eric Dumazet. 8) Fix FID leak in mlxsw driver, from Ido Schimmel. 9) IOTLB locking fix in vhost, from Jean-Philippe Brucker. 10) Fix SKB truesize accounting in ipv4/ipv6/netfilter frag memory limits otherwise namespace exit can hang. From Jiri Wiesner. 11) Address block parsing length fixes in x25 from Martin Schiller. 12) IRQ and ring accounting fixes in bnxt_en, from Michael Chan. 13) For tun interfaces, only iface delete works with rtnl ops, enforce this by disallowing add. From Nicolas Dichtel. 14) Use after free in liquidio, from Pan Bian. 15) Fix SKB use after passing to netif_receive_skb(), from Prashant Bhole. 16) Static key accounting and other fixes in XPS from Sabrina Dubroca. 17) Partially initialized flow key passed to ip6_route_output(), from Shmulik Ladkani. 18) Fix RTNL deadlock during reset in ibmvnic driver, from Thomas Falcon. 19) Several small TCP fixes (off-by-one on window probe abort, NULL deref in tail loss probe, SNMP mis-estimations) from Yuchung Cheng" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits) net/sched: cls_flower: Reject duplicated rules also under skip_sw bnxt_en: Fix _bnxt_get_max_rings() for 57500 chips. bnxt_en: Fix NQ/CP rings accounting on the new 57500 chips. bnxt_en: Keep track of reserved IRQs. bnxt_en: Fix CNP CoS queue regression. net/mlx4_core: Correctly set PFC param if global pause is turned off. Revert "net/ibm/emac: wrong bit is used for STA control" neighbour: Avoid writing before skb->head in neigh_hh_output() ipv6: Check available headroom in ip6_xmit() even without options tcp: lack of available data can also cause TSO defer ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output mlxsw: spectrum_switchdev: Fix VLAN device deletion via ioctl mlxsw: spectrum_router: Relax GRE decap matching check mlxsw: spectrum_switchdev: Avoid leaking FID's reference count mlxsw: spectrum_nve: Remove easily triggerable warnings ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes sctp: frag_point sanity check tcp: fix NULL ref in tail loss probe tcp: Do not underestimate rwnd_limited net: use skb_list_del_init() to remove from RX sublists ...
2018-12-09x86, powerpc: Remove -funit-at-a-time compiler option entirelyMasahiro Yamada
GCC 4.6 manual says: -funit-at-a-time This option is left for compatibility reasons. -funit-at-a-time has no effect, while -fno-unit-at-a-time implies -fno-toplevel-reorder and -fno-section-anchors. Enabled by default. Remove it. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Richard Weinberger <richard@sigma-star.at> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linuxppc-dev@lists.ozlabs.org Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1541990120-9643-3-git-send-email-yamada.masahiro@socionext.com
2018-12-09powerpc/mm: Fallback to RAM if the altmap is unusableOliver O'Halloran
The "altmap" is used to provide a pool of memory that is reserved for the vmemmap backing of hot-plugged memory. This is useful when adding large amount of ZONE_DEVICE memory to a system with a limited amount of normal memory. On ppc64 we use huge pages to map the vmemmap which requires the backing storage to be contigious and aligned to the hugepage size. The altmap implementation allows for the altmap provider to reserve a few PFNs at the start of the range for it's own uses and when this occurs the first chunk of the altmap is not usable for hugepage mappings. On hash there is no sane way to fall back to a normal sized page mapping so we fail the allocation. This results in memory hotplug failing with ENOMEM when the new range doesn't fall into an existing vmemmap block. This patch handles this case by falling back to using system memory rather than failing if we cannot allocate from the altmap. This fallback should only ever be used for the first vmemmap block so it should not cause excess memory consumption. Fixes: 7b73d978a5d0 ("mm: pass the vmem_altmap to vmemmap_populate") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-09powerpc/papr_scm: Use ibm,unit-guid as the iset cookieOliver O'Halloran
The interleave set cookie is used to determine if a label stored in the metadata space should be applied to the current region. This is important in the case of NVDIMMs since the firmware may change the interleaving configuration of a DIMM which would invalidate the existing labels. In our case the hypervisor hides those details from us so we don't really care, but libnvdimm still requires the interleave set cookie to be non-zero. For our purposes we just need the set cookie to be unique and fixed for a given PAPR SCM region and using the unit-guid (really a UUID) is fine for this purpose. Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [mpe: Use kernel types (u64)] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-09powerpc/papr_scm: Fix DIMM device registration raceOliver O'Halloran
When a new nvdimm device is registered with libnvdimm via nvdimm_create() it is added as a device on the nvdimm bus. The probe function for the DIMM driver is potentially quite slow so actually registering and probing the device is done in an async domain rather than immediately after device creation. This can result in a race where the region device (created 2nd) is probed first and fails to activate at boot. To fix this we use the same approach as the ACPI/NFIT driver which is to check that all the DIMM devices registered successfully. LibNVDIMM provides the nvdimm_bus_count_dimms() function which synchronises with the async domain and verifies that the dimm was successfully registered with the bus. If either of these does not occur then we bail. Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-09powerpc/papr_scm: Remove endian conversionsOliver O'Halloran
The return values of a h-call are returned in the CPU registers and written to the provided buffer by the plpar_hcall() wrapper. As a result the values written to memory are always in the native endian and should not be byte swapped. The inital implementation of the H-Call interface was done in qemu and the returned values were byte swapped unnecessarily in both the hypervisor and in the driver so this was only noticed when bringing up the PowerVM implementation. Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-09powerpc/papr_scm: Update DT propertiesOliver O'Halloran
The ibm,unit-sizes property was originally specified as an array of two u32s corresponding to the memory block size, and the number of blocks available in that region. A fairly last-minute change to the SCM DT specification was splitting that into two seperate u64 properties: ibm,block-sizes and ibm,number-of-blocks that convey the same information. No firmware / hypervisor that emitted the ibm,unit-size property ever appeared in the wild. Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [mpe: Use kernel types (u32/u64)] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-07ppc: bpf: implement jitting of BPF_ALU | BPF_ARSH | BPF_*Jiong Wang
This patch implements code-gen for BPF_ALU | BPF_ARSH | BPF_*. Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com> Cc: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-07powerpc/papr_scm: Fix resource end addressOliver O'Halloran
Fix an off-by-one error in the memory resource range. This resource is used to determine the address range of the memory to be hot-plugged as ZONE_DEVICE memory. The current end address results in the kernel attempting to map an additional memblock and the hypervisor may reject the mapping resulting in the entire hot-plug failing. Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-07powerpc/papr_scm: Use depend instead of selectOliver O'Halloran
Making PAPR_SCM select LIBNVDIMM results in circular dependencies in Kconfig when another symbol depends on it. Fix this by replacing the select with a depends. Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions") Reported-by: Alastair D'Silva <alastair@d-silva.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-07powerpc/bpf: Fix broken uapi for BPF_PROG_TYPE_PERF_EVENTSandipan Das
Now that there are different variants of pt_regs for userspace and kernel, the uapi for the BPF_PROG_TYPE_PERF_EVENT program type must be changed by exporting the user_pt_regs structure instead of the pt_regs structure that is in-kernel only. Fixes: 002af9391bfb ("powerpc: Split user/kernel definitions of struct pt_regs") Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-06arch: switch the default on ARCH_HAS_SG_CHAINChristoph Hellwig
These days architectures are mostly out of the business of dealing with struct scatterlist at all, unless they have architecture specific iommu drivers. Replace the ARCH_HAS_SG_CHAIN symbol with a ARCH_NO_SG_CHAIN one only enabled for architectures with horrible legacy iommu drivers like alpha and parisc, and conditionally for arm which wants to keep it disable for legacy platforms. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
2018-12-06powerpc/iommu: remove the mapping_error dma_map_ops methodChristoph Hellwig
The powerpc iommu code already returns (~(dma_addr_t)0x0) on mapping failures, so we can switch over to returning DMA_MAPPING_ERROR and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>