summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-04-30powerpc: Use encode avpn where we need only avpn valuesAneesh Kumar K.V
In all these cases we are doing something similar to HPTE_V_COMPARE(hpte_v, want_v) which ignores the HPTE_V_LARGE bit With MPSS support we would need actual page size to set HPTE_V_LARGE bit and that won't be available in most of these cases. Since we are ignoring HPTE_V_LARGE bit, use the avpn value instead. There should not be any change in behaviour after this patch. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: Reduce PTE table memory wastageAneesh Kumar K.V
We allocate one page for the last level of linux page table. With THP and large page size of 16MB, that would mean we are wasting large part of that page. To map 16MB area, we only need a PTE space of 2K with 64K page size. This patch reduce the space wastage by sharing the page allocated for the last level of linux page table with multiple pmd entries. We call these smaller chunks PTE page fragments and allocated page, PTE page. In order to support systems which doesn't have 64K HPTE support, we also add another 2K to PTE page fragment. The second half of the PTE fragments is used for storing slot and secondary bit information of an HPTE. With this we now have a 4K PTE fragment. We use a simple approach to share the PTE page. On allocation, we bump the PTE page refcount to 16 and share the PTE page with the next 16 pte alloc request. This should help in the node locality of the PTE page fragment, assuming that the immediate pte alloc request will mostly come from the same NUMA node. We don't try to reuse the freed PTE page fragment. Hence we could be waisting some space. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: Move the pte free routines from common headerAneesh Kumar K.V
Acked-by: Paul Mackerras <paulus@samba.org> This patch moves the common code to 32/64 bit headers and also duplicate 4K_PAGES and 64K_PAGES section. We will later change the 64 bit 64K_PAGES version to support smaller PTE fragments. The patch doesn't introduce any functional changes. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: Reduce the PTE_INDEX_SIZEAneesh Kumar K.V
This make one PMD cover 16MB range. That helps in easier implementation of THP on power. THP core code make use of one pmd entry to track the hugepage and the range mapped by a single pmd entry should be equal to the hugepage size supported by the hardware. This also switch PGD to cover 16GB. That is needed so that we can simplify the hugetlb page walking code so that we have same pte format for explicit hugepage and THP hugepage. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: Switch 16GB and 16MB explicit hugepages to a different page table ↵Aneesh Kumar K.V
format We will be switching PMD_SHIFT to 24 bits to facilitate THP impmenetation. With PMD_SHIFT set to 24, we now have 16MB huge pages allocated at PGD level. That means with 32 bit process we cannot allocate normal pages at all, because we cover the entire address space with one pgd entry. Fix this by switching to a new page table format for hugepages. With the new page table format for 16GB and 16MB hugepages we won't allocate hugepage directory. Instead we encode the PTE information directly at the directory level. This forces 16MB hugepage at PMD level. This will also make the page take walk much simpler later when we add the THP support. With the new table format we have 4 cases for pgds and pmds: (1) invalid (all zeroes) (2) pointer to next table, as normal; bottom 6 bits == 0 (3) leaf pte for huge page, bottom two bits != 00 (4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: New hugepage directory formatAneesh Kumar K.V
Change the hugepage directory format so that we can have leaf ptes directly at page directory avoiding the allocation of hugepage directory. With the new table format we have 3 cases for pgds and pmds: (1) invalid (all zeroes) (2) pointer to next table, as normal; bottom 6 bits == 0 (4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table Instead of storing shift value in hugepd pointer we use mmu_psize_def index so that we can fit all the supported hugepage size in 4 bits Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: Don't truncate pgd_index wronglyAneesh Kumar K.V
With PGD_INDEX_SIZE set to 12 the existing macro doesn't work. Fix it to use PTRS_PER_PGD The idea originally was to have one more bit in the result of pgd_index() than PGD_INDEX_SIZE, so that if one had an address corresponding to the last PGD entry, and then incremented that address by PGD_SIZE, and took pgd_index() of that, you wouldn't end up with zero. The commit that introduced that dates back to 2002, and the code that was sensitive to that edge case has long since been refactored (several times), so there is no need for it these days. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: Don't hard code the size of pte pageAneesh Kumar K.V
USE PTRS_PER_PTE to indicate the size of pte page. To support THP, later patches will be changing PTRS_PER_PTE value. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: Save DAR and DSISR in pt_regs on MCEAneesh Kumar K.V
We were not saving DAR and DSISR on MCE. Save then and also print the values along with exception details in xmon. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: Use signed formatting when printing errorAneesh Kumar K.V
PAPR defines these errors as negative values. So print them accordingly for easy debugging. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc/pseries: Correct builds break when CONFIG_SMP not definedNathan Fontenot
Correct build failure for powerpc/pseries builds with CONFIG_SMP not defined. The function cpu_sibling_mask has no meaning (or definition) when CONFIG_SMP is not defined. Additionally, the updating of NUMA affinity for a CPU in a UP system doesn't really make sense. This patch ifdef's out the code making the affinity updates for PRRN events to fix the following build break. arch/powerpc/mm/numa.c: In function ‘stage_topology_update’: arch/powerpc/mm/numa.c:1535: error: implicit declaration of function ‘cpu_sibling_mask’ arch/powerpc/mm/numa.c:1535: warning: passing argument 3 of ‘cpumask_or’ makes pointer from integer without a cast make[1]: *** [arch/powerpc/mm/numa.o] Error 1 Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc/booke: Remove obsolete macro FINISH_EXCEPTIONKevin Hao
This is stale and not used by anyone now. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc/rtas_flash: Fix bad memory accessVasant Hegde
We use kmem_cache_alloc() to allocate memory to hold the new firmware which will be flashed. kmem_cache_alloc() calls rtas_block_ctor() to set memory to NULL. But these constructor is called only for newly allocated slabs. If we run below command multiple time without rebooting, allocator may allocate memory from the area which was free'd by kmem_cache_free and it will not call constructor. In this situation we may hit kernel oops. dd if=<fw image> of=/proc/ppc64/rtas/firmware_flash bs=4096 oops message: ------------- [ 1602.399755] Oops: Kernel access of bad area, sig: 11 [#1] [ 1602.399772] SMP NR_CPUS=1024 NUMA pSeries [ 1602.399779] Modules linked in: rtas_flash nfsd lockd auth_rpcgss nfs_acl sunrpc fuse loop dm_mod sg ipv6 ses enclosure ehea ehci_pci ohci_hcd ehci_hcd usbcore sd_mod usb_common crc_t10dif scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh_rdac scsi_dh ipr libata scsi_mod [ 1602.399817] NIP: d00000000a170b9c LR: d00000000a170b64 CTR: c00000000079cd58 [ 1602.399823] REGS: c0000003b9937930 TRAP: 0300 Not tainted (3.9.0-rc4-0.27-ppc64) [ 1602.399828] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 22000428 XER: 20000000 [ 1602.399841] SOFTE: 1 [ 1602.399844] CFAR: c000000000005f24 [ 1602.399848] DAR: 8c2625a820631fef, DSISR: 40000000 [ 1602.399852] TASK = c0000003b4520760[3655] 'dd' THREAD: c0000003b9934000 CPU: 3 GPR00: 8c2625a820631fe7 c0000003b9937bb0 d00000000a179f28 d00000000a171f08 GPR04: 0000000010040000 0000000000001000 c0000003b9937df0 c0000003b5fb2080 GPR08: c0000003b58f7200 d00000000a179f28 c0000003b40058d4 c00000000079cd58 GPR12: d00000000a171450 c000000007f40900 0000000000000005 0000000010178d20 GPR16: 00000000100cb9d8 000000000000001d 0000000000000000 000000001003ffff GPR20: 0000000000000001 0000000000000000 00003fffa0b50d30 000000001001f010 GPR24: 0000000010020888 0000000010040000 d00000000a171f08 d00000000a172808 GPR28: 0000000000001000 0000000010040000 c0000003b4005880 8c2625a820631fe7 [ 1602.399924] NIP [d00000000a170b9c] .rtas_flash_write+0x7c/0x1e8 [rtas_flash] [ 1602.399930] LR [d00000000a170b64] .rtas_flash_write+0x44/0x1e8 [rtas_flash] [ 1602.399934] Call Trace: [ 1602.399939] [c0000003b9937bb0] [d00000000a170b64] .rtas_flash_write+0x44/0x1e8 [rtas_flash] (unreliable) [ 1602.399948] [c0000003b9937c60] [c000000000282830] .proc_reg_write+0x90/0xe0 [ 1602.399955] [c0000003b9937ce0] [c0000000001ff374] .vfs_write+0x114/0x238 [ 1602.399961] [c0000003b9937d80] [c0000000001ff5d8] .SyS_write+0x70/0xe8 [ 1602.399968] [c0000003b9937e30] [c000000000009cdc] syscall_exit+0x0/0xa0 [ 1602.399973] Instruction dump: [ 1602.399977] eb698010 801b0028 2f80dcd6 419e00a4 2fbc0000 419e009c ebfb0030 2fbf0000 [ 1602.399989] 409e0010 480000d8 60000000 7c1f0378 <e81f0008> 2fa00000 409efff4 e81f0000 [ 1602.400012] ---[ end trace b4136d115dc31dac ]--- [ 1602.402178] [ 1602.402185] Sending IPI to other CPUs [ 1602.403329] IPI complete This patch uses kmem_cache_zalloc() instead of kmem_cache_alloc() to allocate memory, which makes sure memory is set to 0 before using. Also removes rtas_block_ctor(), which is no longer required. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: Fix build failure after merge of the cgroup treeStephen Rothwell
After merging the cgroup tree, today's linux-next build (powerpc ppc64_defconfig) failed like this: arch/powerpc/mm/numa.c: In function 'arch_update_cpu_topology': arch/powerpc/mm/numa.c:1465:2: error: implicit declaration of function 'kzalloc' [-Werror=implicit-function-declaration] arch/powerpc/mm/numa.c:1465:10: error: assignment makes pointer from integer without a cast [-Werror] arch/powerpc/mm/numa.c:1497:2: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration] Caused by commit 30c05350c39d ("powerpc/pseries: Use stop machine to update cpu maps") from the powerpc tree interacting with (probably) commit ff794dea52ea ("cpuset: remove include of cgroup.h from cpuset.h") from the cgroup tree. Removing includes from header files is fraught with danger ... The former should have added an include of linux/slab.h to arch/powerpc/mm/numa.c. I have added the following merge fix patch for today (but it should be applied to the powerpc tree ASAP). From: Stephen Rothwell <sfr@canb.auug.org.au> Date: Mon, 29 Apr 2013 14:01:44 +1000 Subject: [PATCH] powerpc: numa.c: using kzalloc/kfree requires including slab.h fixes these build errors: arch/powerpc/mm/numa.c: In function 'arch_update_cpu_topology': arch/powerpc/mm/numa.c:1465:2: error: implicit declaration of function 'kzalloc' [-Werror=implicit-function-declaration] arch/powerpc/mm/numa.c:1465:10: error: assignment makes pointer from integer without a cast [-Werror] arch/powerpc/mm/numa.c:1497:2: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration] Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: Fix usage of setup_pci_atmu()Michael Neuling
Linux next is currently failing to compile mpc85xx_defconfig with: arch/powerpc/sysdev/fsl_pci.c:944:2: error: too many arguments to function 'setup_pci_atmu' This is caused by (from Kumar's next branch): commit 34642bbb3d12121333efcf4ea7dfe66685e403a1 Author: Kumar Gala <galak@kernel.crashing.org> powerpc/fsl-pci: Keep PCI SoC controller registers in pci_controller Which changed definition of setup_pci_atmu() but didn't update one of the callers. Below fixes this. Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30Merge remote-tracking branch 'kumar/next' into nextBenjamin Herrenschmidt
From Kumar Gala: << Add support for T4 and B4 SoC families from Freescale, e6500 altivec support, some various board fixes and other minor cleanups. >>
2013-04-30Merge remote-tracking branch 'agust/next' into nextBenjamin Herrenschmidt
From Anatolij Gustschin: << There are some changes for mpc5121 generic platform code to support mpc5125 SoC and DTS files for ac14xx and MPC5125-TWR boards. >>
2013-04-30mm: use vm_unmapped_area() on powerpc architectureMichel Lespinasse
Update the powerpc slice_get_unmapped_area function to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Tested-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30mm: remove free_area_cache use in powerpc architectureMichel Lespinasse
As all other architectures have been converted to use vm_unmapped_area(), we are about to retire the free_area_cache. This change simply removes the use of that cache in slice_get_unmapped_area(), which will most certainly have a performance cost. Next one will convert that function to use the vm_unmapped_area() infrastructure and regain the performance. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-29powerpc/fsl-booke: add the reg prop for pci bridge device node for T4/B4Kevin Hao
The reg property in the pci bridge device node is used to bind this device node to the pci bridge device. Then all the pci devices under this bridge could use the interrupt maps defined in this device node to do the irq translation. So if this property is missed, the pci traditional irq mechanism will not work. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2013-04-29powerpc/fsl-pci: don't unmap the PCI SoC controller registers in setup_pci_atmuKevin Hao
In patch 34642bbb (powerpc/fsl-pci: Keep PCI SoC controller registers in pci_controller) we choose to keep the map of the PCI SoC controller registers. But we missed to delete the unmap in setup_pci_atmu function. This will cause the following call trace once we access the PCI SoC controller registers later. Unable to handle kernel paging request for data at address 0x8000080080040f14 Faulting instruction address: 0xc00000000002ea58 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=24 T4240 QDS Modules linked in: NIP: c00000000002ea58 LR: c00000000002eaf4 CTR: c00000000002eac0 REGS: c00000017e10b4a0 TRAP: 0300 Not tainted (3.9.0-rc1-00052-gfa3529f-dirty) MSR: 0000000080029000 <CE,EE,ME> CR: 28adbe22 XER: 00000000 SOFTE: 0 DEAR: 8000080080040f14, ESR: 0000000000000000 TASK = c00000017e100000[1] 'swapper/0' THREAD: c00000017e108000 CPU: 2 GPR00: 0000000000000000 c00000017e10b720 c0000000009928d8 c00000017e578e00 GPR04: 0000000000000000 000000000000000c 0000000000000001 c00000017e10bb40 GPR08: 0000000000000000 8000080080040000 0000000000000000 0000000000000016 GPR12: 0000000088adbe22 c00000000fffa800 c000000000001ba0 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 c0000000008a5b70 GPR24: c0000000008af938 c0000000009a28d8 c0000000009bb5dc c00000017e10bb40 GPR28: c00000017e32a400 c00000017e10bc00 c00000017e32a400 c00000017e578e00 NIP [c00000000002ea58] .fsl_pcie_check_link+0x88/0xf0 LR [c00000000002eaf4] .fsl_indirect_read_config+0x34/0xb0 Call Trace: [c00000017e10b720] [c00000017e10b7a0] 0xc00000017e10b7a0 (unreliable) [c00000017e10ba30] [c00000000002eaf4] .fsl_indirect_read_config+0x34/0xb0 [c00000017e10bad0] [c00000000033aa08] .pci_bus_read_config_byte+0x88/0xd0 [c00000017e10bb90] [c00000000088d708] .pci_apply_final_quirks+0x9c/0x18c [c00000017e10bc40] [c0000000000013dc] .do_one_initcall+0x5c/0x1f0 [c00000017e10bcf0] [c00000000086ebac] .kernel_init_freeable+0x180/0x26c [c00000017e10bdb0] [c000000000001bbc] .kernel_init+0x1c/0x460 [c00000017e10be30] [c000000000000880] .ret_from_kernel_thread+0x64/0xe4 Instruction dump: 38210310 2b800015 4fdde842 7c600026 5463fffe e8010010 7c0803a6 4e800020 60000000 60000000 e92301d0 7c0004ac <80690f14> 0c030000 4c00012c 38210310 ---[ end trace 7a8fe0cbccb7d992 ]--- Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b Signed-off-by: Kevin Hao <haokexin@gmail.com> Acked-by: Roy Zang <tie-fei.zang@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2013-04-29powerpc/dts: Fix the dts for p1025rdb 36bitZhicheng Fan
Fix the following errors: Error: p1025rdb.dtsi:326.2-3 label or path, 'qe', not found Error: p1021si-post.dtsi:242.2-3 label or path, 'qe', not found FATAL ERROR: Syntax error parsing input tree Signed-off-by: Zhicheng Fan <B32736@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2013-04-26powerpc/perf: Enable branch stack sampling frameworkAnshuman Khandual
Provides basic enablement for perf branch stack sampling framework on POWER8 processor based platforms. Adds new BHRB related elements into cpu_hw_event structure to represent current BHRB config, BHRB filter configuration, manage context and to hold output BHRB buffer during PMU interrupt before passing to the user space. This also enables processing of BHRB data and converts them into generic perf branch stack data format. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Define BHRB generic functions, data and flags for POWER8Anshuman Khandual
This patch populates BHRB specific data for power_pmu structure. It also implements POWER8 specific BHRB filter and configuration functions. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Add new BHRB related generic functions, data and flagsAnshuman Khandual
This patch adds couple of generic functions to power_pmu structure which would configure the BHRB and it's filters. It also adds representation of the number of BHRB entries present on the PMU. A new PMU flag PPMU_BHRB would indicate presence of BHRB feature. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Add basic assembly code to read BHRB entries on POWER8Anshuman Khandual
This patch adds the basic assembly code to read BHRB buffer. BHRB entries are valid only after a PMU interrupt has happened (when MMCR0[PMAO]=1) and BHRB has been freezed. BHRB read should not be attempted when it is still enabled (MMCR0[PMAE]=1) and getting updated, as this can produce non-deterministic results. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Add new BHRB related instructions for POWER8Anshuman Khandual
This patch adds new POWER8 instruction encoding for reading and clearing Branch History Rolling Buffer entries. The new instruction 'mfbhrbe' (move from branch history rolling buffer entry) is used to read BHRB buffer entries and instruction 'clrbhrb' (clear branch history rolling buffer) is used to clear the entire buffer. The instruction 'clrbhrb' has straight forward encoding. But the instruction encoding format for reading the BHRB entries is like 'mfbhrbe RT, BHRBE' where it takes two arguments, i.e the index for the BHRB buffer entry to read and a general purpose register to put the value which was read from the buffer entry. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Power8 PMU supportMichael Ellerman
This patch adds support for the power8 PMU to perf. Work is ongoing to add generic cache events. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Add support for SIERMichael Ellerman
On power8 we have a new SIER (Sampled Instruction Event Register), which captures information about instructions when we have random sampling enabled. Add support for loading the SIER into pt_regs, overloading regs->dar. Also set the new NO_SIPR flag in regs->result if we don't have SIPR. Update regs_sihv/sipr() to look for SIPR/SIHV in SIER. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Add regs_no_sipr()Michael Ellerman
On power8 the presence or absence of SIPR depends on settings at runtime, so convert to using a dynamic flag for NO_SIPR. Existing backends that set NO_SIPR unconditionally set the dynamic flag obviously. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Add an accessor for regs->resultMichael Ellerman
Add an accessor for regs->result so we can use it to store more flags in future. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Convert mmcra_sipr/sihv() to regs_sipr/sihv()Michael Ellerman
On power8 the SIPR and SIHV are not in MMCRA, so convert the routines to take regs and change the names accordingly. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/perf: Add an explict flag indicating presence of SLOT fieldMichael Ellerman
In perf_ip_adjust() we potentially use the MMCRA[SLOT] field to adjust the reported IP of a sampled instruction. Currently the logic is written so that if the backend does NOT have the PPMU_ALT_SIPR flag set then we assume MMCRA[SLOT] exists. However on power8 we do not want to set ALT_SIPR (it's in a third location), and we also do not have MMCRA[SLOT]. So add a new flag which only indicates whether MMCRA[SLOT] exists. Naively we'd set it on everything except power6/7, because they set ALT_SIPR, and we've reversed the polarity of the flag. But it's more complicated than that. mpc7450 is 32-bit, and uses its own version of perf_ip_adjust() which doesn't use MMCRA[SLOT], so it doesn't need the new flag set and the behaviour is unchanged. PPC970 (and I assume power4) don't have MMCRA[SLOT], so shouldn't have the new flag set. This is a behaviour change on those cpus, though we were probably getting lucky and the bits in question were 0. power5 and power5+ set the new flag, behaviour unchanged. power6 & power7 do not set the new flag, behaviour unchanged. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc: Initialise PMU related regs on Power8Michael Ellerman
For both HV and guest kernels, intialise PMU regs to something sane. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/powernv: Fix invalid IOMMU tableGavin Shan
Ben found the root cause. Commit 37f02195bee9c25ce44e25204f40b7961a6d7c9d ("powerpc/pci: fix PCI-e devices rescan issue on powerpc platform") overwrites the IOMMU table of PCI device while enabling PCI device. The patch intends to fix the IOMMU table after that point. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/powernv: Build DMA space for PE on PHB3Gavin Shan
The patch intends to build 32-bits DMA space for individual PEs on PHB3. The TVE# is recognized by the combo of PE# and fixed bits from DMA address, which is zero for 32-bits DMA space. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/powernv: TCE invalidation for PHB3Gavin Shan
The TCE should be invalidated while it's created or free'd. The approach to do that for IODA1 and IODA2 compliant PHBs are different. So the patch differentiate them with different functions called to do that for IODA1 and IODA2 compliant PHBs. It's notable that the PCI address is used to invalidate the corresponding TCE on IODA2 compliant PHB3. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/powernv: Patch MSI EOI handler on P8Gavin Shan
The EOI handler of MSI/MSI-X interrupts for P8 (PHB3) need additional steps to handle the P/Q bits in IVE before EOIing the corresponding interrupt. The patch changes the EOI handler to cover that. we have individual IRQ chip in each PHB instance. During the MSI IRQ setup time, the IRQ chip is copied over from the original one for that IRQ, and the EOI handler is patched with the one that will handle the P/Q bits (As Ben suggested). Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/powernv: Add option CONFIG_POWERNV_MSIGavin Shan
As Michael Ellerman suggested, to add CONFIG_POWERNV_MSI for PowerNV platform. That's similar to CONFIG_PSERIES_MSI for pSeries platform. For now, we don't make it dependent on CONFIG_EEH since it's not ready to enable that yet. Apart from that, we also enable CONFIG_PPC_MSI_BITMAP on selecting CONFIG_POWERNV_MSI. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/powernv: Supports PHB3Gavin Shan
The patch intends to initialize PHB3 during system boot stage. The flag "PNV_PHB_MODEL_PHB3" is introduced to differentiate IODA2 compatible PHB3 from other types of PHBs. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc: Fix "attempt to move .org backwards" errorPaul Mackerras
Building a 64-bit powerpc kernel with PR KVM enabled currently gives this error: AS arch/powerpc/kernel/head_64.o arch/powerpc/kernel/exceptions-64s.S: Assembler messages: arch/powerpc/kernel/exceptions-64s.S:258: Error: attempt to move .org backwards make[2]: *** [arch/powerpc/kernel/head_64.o] Error 1 This happens because the MASKABLE_EXCEPTION_PSERIES macro turns into 33 instructions, but we only have space for 32 at the decrementer interrupt vector (from 0x900 to 0x980). In the code generated by the MASKABLE_EXCEPTION_PSERIES macro, we currently have two instances of the HMT_MEDIUM macro, which has the effect of setting the SMT thread priority to medium. One is the first instruction, and is overwritten by a no-op on processors where we save the PPR (processor priority register), that is, POWER7 or later. The other is after we have saved the PPR. In order to reduce the code at 0x900 by one instruction, we omit the first HMT_MEDIUM. On processors without SMT this will have no effect since HMT_MEDIUM is a no-op there. On POWER5 and RS64 machines this will mean that the first few instructions take a little longer in the case where a decrementer interrupt occurs when the hardware thread is running at low SMT priority. On POWER6 and later machines, the hardware automatically boosts the thread priority when a decrementer interrupt is taken if the thread priority was below medium, so this change won't make any difference. The alternative would be to branch out of line after saving the CFAR. However, that would incur an extra overhead on all processors, whereas the approach adopted here only adds overhead on older threaded processors. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/pseries: Add /proc interface to control topology updatesNathan Fontenot
There are instances in which we do not want topology updates to occur. In order to allow this a /proc interface (/proc/powerpc/topology_updates) is introduced so that topology updates can be enabled and disabled. This patch also adds a prrn_is_enabled() call so that PRRN events are handled in the kernel only if topology updating is enabled. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/pseries: Enable PRRN handlingNathan Fontenot
The Linux kernel and platform firmware negotiate their mutual support of the PRRN option via the ibm,client-architecture-support interface. This patch simply sets the appropriate fields in the client architecture vector to indicate Linux support for PRRN and will allow the firmware to report PRRN events via the RTAS event-scan mechanism. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/pseries: RE-enable Virtual Processor Home Node updatingJesse Larrew
The new PRRN firmware feature provides a more convenient and event-driven interface than VPHN for notifying Linux of changes to the NUMA affinity of platform resources. However, for practical reasons, it may not be feasible for some customers to update to the latest firmware. For these customers, the VPHN feature supported on previous firmware versions may still be the best option. The VPHN feature was previously disabled due to races with the load balancing code when accessing the NUMA cpu maps, but the new stop_machine() approach protects the NUMA cpu maps from these concurrent accesses. It should be safe to re-enable this feature now. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/pseries: Update NUMA VDSO information when updating CPU mapsJesse Larrew
The following patch adds vdso_getcpu_init(), which stores the NUMA node for a cpu in SPRG3: Commit 18ad51dd34 ("powerpc: Add VDSO version of getcpu") adds vdso_getcpu_init(), which stores the NUMA node for a cpu in SPRG3. This patch ensures that this information is also updated when the NUMA affinity of a cpu changes. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/pseries: Use stop machine to update cpu mapsNathan Fontenot
The new PRRN firmware feature allows CPU and memory resources to be transparently reassigned across NUMA boundaries. When this happens, the kernel must update the node maps to reflect the new affinity information. Although the NUMA maps can be protected by locking primitives during the update itself, this is insufficient to prevent concurrent accesses to these structures. Since cpumask_of_node() hands out a pointer to these structures, they can still be modified outside of the lock. Furthermore, tracking down each usage of these pointers and adding locks would be quite invasive and difficult to maintain. The approach used is to make a list of affected cpus and call stop_machine to have the update routine run on each of the affected cpus allowing them to update themselves. Each cpu finds itself in the list of cpus and makes the appropriate updates. We need to have each cpu do this for themselves to handle calls to vdso_getcpu_init() added in a subsequent patch. Situations like these are best handled using stop_machine(). Since the NUMA affinity updates are exceptionally rare events, this approach has the benefit of not adding any overhead while accessing the NUMA maps during normal operation. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/pseries: Update CPU maps when device tree is updatedJesse Larrew
Platform events such as partition migration or the new PRRN firmware feature can cause the NUMA characteristics of a CPU to change, and these changes will be reflected in the device tree nodes for the affected CPUs. This patch registers a handler for Open Firmware device tree updates and reconfigures the CPU and node maps whenever the associativity changes. Currently, this is accomplished by marking the affected CPUs in the cpu_associativity_changes_mask and allowing arch_update_cpu_topology() to retrieve the new associativity information using hcall_vphn(). Protecting the NUMA cpu maps from concurrent access during an update operation will be addressed in a subsequent patch in this series. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/pseries: Update numa.c to use updated firmware_has_feature()Nathan Fontenot
Update the numa code to use the updated firmware_has_feature() when checking for type 1 affinity. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/pseries: Update firmware_has_feature() to check architecture vector ↵Nathan Fontenot
5 bits The firmware_has_feature() function makes it easy to check for supported features of the hypervisor. This patch extends the capability of firmware_has_feature() to include checking for specified bits in vector 5 of the architecture vector as reported in the device tree. As part of this the #defines used for the architecture vector are re-defined such that each option has the index into vector 5 and the feature bit encoded into it. This makes checking for architecture bits when initiating data for firmware_has_feature much easier. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26powerpc/pseries: Use ARRAY_SIZE to iterate over firmware_features_table arrayNathan Fontenot
When iterating over the entries in firmware_features_table we only need to go over the actual number of entries in the array instead of declaring it to be bigger and checking to make sure there is a valid entry in every slot. This patch removes the FIRMWARE_MAX_FEATURES #define and replaces the array looping with the use of ARRAY_SIZE(). Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>