summaryrefslogtreecommitdiff
path: root/arch/powerpc/perf
AgeCommit message (Collapse)Author
2020-07-22powerpc/perf: BHRB control to disable BHRB logic when not usedAthira Rajeev
PowerISA v3.1 has few updates for the Branch History Rolling Buffer(BHRB). BHRB disable is controlled via Monitor Mode Control Register A (MMCRA) bit, namely "BHRB Recording Disable (BHRBRD)". This field controls whether BHRB entries are written when BHRB recording is enabled by other bits. This patch implements support for this BHRB disable bit. By setting 0b1 to this bit will disable the BHRB and by setting 0b0 to this bit will have BHRB enabled. This addresses backward compatibility (for older OS), since this bit will be cleared and hardware will be writing to BHRB by default. This patch addresses changes to set MMCRA (BHRBRD) at boot for power10 (there by the core will run faster) and enable this feature only on runtime ie, on explicit need from user. Also save/restore MMCRA in the restore path of state-loss idle state to make sure we keep BHRB disabled if it was not enabled on request at runtime. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-12-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22powerpc/perf: Add Power10 BHRB filter support for ↵Athira Rajeev
PERF_SAMPLE_BRANCH_IND_CALL/COND PowerISA v3.1 introduce filtering support for PERF_SAMPLE_BRANCH_IND_CALL/COND. The patch adds BHRB filter support for "ind_call" and "cond" in power10_bhrb_filter_map(). Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-11-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22powerpc/perf: Ignore the BHRB kernel address filtering for P10Athira Rajeev
Commit bb19af816025 ("powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer") added a check in bhrb_read() to filter the kernel address from BHRB buffer. This patch modified it to avoid that check for PowerISA v3.1 based processors, since PowerISA v3.1 allows only MSR[PR]=1 address to be written to BHRB buffer. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-10-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22powerpc/perf: power10 Performance Monitoring supportAthira Rajeev
Base enablement patch to register performance monitoring hardware support for power10. Patch introduce the raw event encoding format, defines the supported list of events, config fields for the event attributes and their corresponding bit values which are exported via sysfs. Patch also enhances the support function in isa207_common.c to include power10 pmu hardware. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-9-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22powerpc/perf: Add Power10 PMU feature to DT CPU featuresMadhavan Srinivasan
Add Power10 feature function to DT CPU features, along with a Power10 specific init() to initialize PMU SPRs, sets the oprofile_cpu_type and cpu_features. This will enable performance monitoring unit (PMU) for Power10 in CPU features with "performance-monitor-power10". For Power ISA v3.1, BHRB disable is controlled via Monitor Mode Control Register A (MMCRA) bit, namely "BHRB Recording Disable (BHRBRD)". This patch initializes MMCRA BHRBRD to disable BHRB feature at boot for Power10. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> [mpe: Move MMCRA_BHRB_DISABLE as noted by jpn, drop CPU setup changes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-8-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22powerpc/perf: Add support for ISA3.1 PMU SPRsMadhavan Srinivasan
PowerISA v3.1 includes new performance monitoring unit(PMU) special purpose registers (SPRs). They are Monitor Mode Control Register 3 (MMCR3) Sampled Instruction Event Register 2 (SIER2) Sampled Instruction Event Register 3 (SIER3) MMCR3 is added for further sampling related configuration control. SIER2/SIER3 are added to provide additional information about the sampled instruction. Patch adds new PPMU flag called "PPMU_ARCH_31" to support handling of these new SPRs, updates the struct thread_struct to include these new SPRs, include MMCR3 in struct mmcr_regs. This is needed to support programming of MMCR3 SPR during event_enable/disable. Patch also adds the sysfs support for the MMCR3 SPR along with SPRN_ macros for these new pmu SPRs. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> [mpe: Rename to PPMU_ARCH_31 as noted by jpn] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-5-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22powerpc/perf: Update Power PMU cache_events to u64 typeAthira Rajeev
Events of type PERF_TYPE_HW_CACHE was described for Power PMU as: int (*cache_events)[type][op][result]; where type, op, result values unpacked from the event attribute config value is used to generate the raw event code at runtime. So far the event code values which used to create these cache-related events were within 32 bit and `int` type worked. In power10, some of the event codes are of 64-bit value and hence update the Power PMU cache_events to `u64` type in `power_pmu` struct. Also propagate this change to existing all PMU driver code paths which are using ppmu->cache_events. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-4-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22powerpc/perf: Update cpu_hw_event to use `struct` for storing MMCR registersAthira Rajeev
core-book3s currently uses array to store the MMCR registers as part of per-cpu `cpu_hw_events`. This patch does a clean up to use `struct` to store mmcr regs instead of array. This will make code easier to read and reduces chance of any subtle bug that may come in the future, say when new registers are added. Patch updates all relevant code that was using MMCR array ( cpuhw->mmcr[x]) to use newly introduced `struct`. This includes the PMU driver code for supported platforms (power5 to power9) and ISA macros for counter support functions. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-2-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-16powerpc/perf: Add kernel support for new MSR[HV PR] bits in trace-imcAnju T Sudhakar
IMC trace-mode record has MSR[HV PR] bits added in the third DW. These bits can be used to set the cpumode for the instruction pointer captured in each sample. Add support in kernel to use these bits to set the cpumode for each sample. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200713144623.508695-1-maddy@linux.ibm.com
2020-07-16powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumaskKajol Jain
Patch here adds a cpumask attr to hv_24x7 pmu along with ABI documentation. Primary use to expose the cpumask is for the perf tool which has the capability to parse the driver sysfs folder and understand the cpumask file. Having cpumask file will reduce the number of perf command line parameters (will avoid "-C" option in the perf tool command line). It can also notify the user which is the current cpu used to retrieve the counter data. command:# cat /sys/devices/hv_24x7/interface/cpumask 0 Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709051836.723765-3-kjain@linux.ibm.com
2020-07-16powerpc/perf/hv-24x7: Add cpu hotplug supportKajol Jain
Patch here adds cpu hotplug functions to hv_24x7 pmu. A new cpuhp_state "CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE" enum is added. The online callback function updates the cpumask only if its empty. As the primary intention of adding hotplug support is to designate a CPU to make HCALL to collect the counter data. The offline function test and clear corresponding cpu in a cpumask and update cpumask to any other active cpu. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709051836.723765-2-kjain@linux.ibm.com
2020-06-17maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofaultChristoph Hellwig
Better describe what these functions do. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-17maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofaultChristoph Hellwig
Better describe what these functions do. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mm: don't include asm/pgtable.h if linux/mm.h is already includedMike Rapoport
Patch series "mm: consolidate definitions of page table accessors", v2. The low level page table accessors (pXY_index(), pXY_offset()) are duplicated across all architectures and sometimes more than once. For instance, we have 31 definition of pgd_offset() for 25 supported architectures. Most of these definitions are actually identical and typically it boils down to, e.g. static inline unsigned long pmd_index(unsigned long address) { return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1); } static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) { return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address); } These definitions can be shared among 90% of the arches provided XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined. For architectures that really need a custom version there is always possibility to override the generic version with the usual ifdefs magic. These patches introduce include/linux/pgtable.h that replaces include/asm-generic/pgtable.h and add the definitions of the page table accessors to the new header. This patch (of 12): The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the functions involving page table manipulations, e.g. pte_alloc() and pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h> in the files that include <linux/mm.h>. The include statements in such cases are remove with a simple loop: for f in $(git grep -l "include <linux/mm.h>") ; do sed -i -e '/include <asm\/pgtable.h>/ d' $f done Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08mm/gup.c: convert to use get_user_{page|pages}_fast_only()Souptick Joarder
API __get_user_pages_fast() renamed to get_user_pages_fast_only() to align with pin_user_pages_fast_only(). As part of this we will get rid of write parameter. Instead caller will pass FOLL_WRITE to get_user_pages_fast_only(). This will not change any existing functionality of the API. All the callers are changed to pass FOLL_WRITE. Also introduce get_user_page_fast_only(), and use it in a few places that hard-code nr_pages to 1. Updated the documentation of the API. Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> [arch/powerpc/kvm] Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Michal Suchanek <msuchanek@suse.de> Link: http://lkml.kernel.org/r/1590396812-31277-1-git-send-email-jrdr.linux@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-05Merge tag 'powerpc-5.8-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Support for userspace to send requests directly to the on-chip GZIP accelerator on Power9. - Rework of our lockless page table walking (__find_linux_pte()) to make it safe against parallel page table manipulations without relying on an IPI for serialisation. - A series of fixes & enhancements to make our machine check handling more robust. - Lots of plumbing to add support for "prefixed" (64-bit) instructions on Power10. - Support for using huge pages for the linear mapping on 8xx (32-bit). - Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound driver. - Removal of some obsolete 40x platforms and associated cruft. - Initial support for booting on Power10. - Lots of other small features, cleanups & fixes. Thanks to: Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Andrey Abramov, Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent Abali, Cédric Le Goater, Chen Zhou, Christian Zigotzky, Christophe JAILLET, Christophe Leroy, Dmitry Torokhov, Emmanuel Nicolet, Erhard F., Gautham R. Shenoy, Geoff Levand, George Spelvin, Greg Kurz, Gustavo A. R. Silva, Gustavo Walbon, Haren Myneni, Hari Bathini, Joel Stanley, Jordan Niethe, Kajol Jain, Kees Cook, Leonardo Bras, Madhavan Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Michal Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram Pai, Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler, Wolfram Sang, Xiongfeng Wang. * tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (299 commits) powerpc/pseries: Make vio and ibmebus initcalls pseries specific cxl: Remove dead Kconfig options powerpc: Add POWER10 architected mode powerpc/dt_cpu_ftrs: Add MMA feature powerpc/dt_cpu_ftrs: Enable Prefixed Instructions powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected powerpc: Add support for ISA v3.1 powerpc: Add new HWCAP bits powerpc/64s: Don't set FSCR bits in INIT_THREAD powerpc/64s: Save FSCR to init_task.thread.fscr after feature init powerpc/64s: Don't let DT CPU features set FSCR_DSCR powerpc/64s: Don't init FSCR_DSCR in __init_FSCR() powerpc/32s: Fix another build failure with CONFIG_PPC_KUAP_DEBUG powerpc/module_64: Use special stub for _mcount() with -mprofile-kernel powerpc/module_64: Simplify check for -mprofile-kernel ftrace relocations powerpc/module_64: Consolidate ftrace code powerpc/32: Disable KASAN with pages bigger than 16k powerpc/uaccess: Don't set KUEP by default on book3s/32 powerpc/uaccess: Don't set KUAP by default on book3s/32 powerpc/8xx: Reduce time spent in allow_user_access() and friends ...
2020-05-28powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show processor detailsKajol Jain
To expose the system dependent parameter like total number of sockets and numbers of chips per socket, patch adds two sysfs files. "sockets" and "chips" are added to /sys/devices/hv_24x7/interface/ of the "hv_24x7" pmu. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200525104308.9814-4-kjain@linux.ibm.com
2020-05-28powerpc/hv-24x7: Add rtas call in hv-24x7 driver to get processor detailsKajol Jain
For hv_24x7 socket/chip level events, specific chip-id to which the data requested should be added as part of pmu events. But number of chips/socket in the system details are not exposed. Patch implements read_24x7_sys_info() to get system parameter values like number of sockets, cores per chip and chips per socket. Rtas_call with token "PROCESSOR_MODULE_INFO" is used to get these values. Subsequent patch exports these values via sysfs. Patch also make these parameters default to 1. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200525104308.9814-3-kjain@linux.ibm.com
2020-05-28powerpc/perf/hv-24x7: Fix inconsistent output values incase multiple hv-24x7 ↵Kajol Jain
events run Commit 2b206ee6b0df ("powerpc/perf/hv-24x7: Display change in counter values")' added to print _change_ in the counter value rather then raw value for 24x7 counters. Incase of transactions, the event count is set to 0 at the beginning of the transaction. It also sets the event's prev_count to the raw value at the time of initialization. Because of setting event count to 0, we are seeing some weird behaviour, whenever we run multiple 24x7 events at a time. For example: command#: ./perf stat -e "{hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/, hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/}" -C 0 -I 1000 sleep 100 1.000121704 120 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ 1.000121704 5 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/ 2.000357733 8 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ 2.000357733 10 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/ 3.000495215 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ 3.000495215 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/ 4.000641884 56 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ 4.000641884 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/ 5.000791887 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ Getting these large values in case we do -I. As we are setting event_count to 0, for interval case, overall event_count is not coming in incremental order. As we may can get new delta lesser then previous count. Because of which when we print intervals, we are getting negative value which create these large values. This patch removes part where we set event_count to 0 in function 'h_24x7_event_read'. There won't be much impact as we do set event->hw.prev_count to the raw value at the time of initialization to print change value. With this patch In power9 platform command#: ./perf stat -e "{hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/, hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/}" -C 0 -I 1000 sleep 100 1.000117685 93 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ 1.000117685 1 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/ 2.000349331 98 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ 2.000349331 2 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/ 3.000495900 131 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ 3.000495900 4 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/ 4.000645920 204 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ 4.000645920 61 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/ 4.284169997 22 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ Suggested-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Tested-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200525104308.9814-2-kjain@linux.ibm.com
2020-05-26powerpc/8xx: Remove now unused TLB miss functionsChristophe Leroy
The code to setup linear and IMMR mapping via huge TLB entries is not called anymore. Remove it. Also remove the handling of removed code exits in the perf driver. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/75750d25849cb8e73ca519866bb892d7eb9649c0.1589866984.git.christophe.leroy@csgroup.eu
2020-05-19powerpc: Use a datatype for instructionsJordan Niethe
Currently unsigned ints are used to represent instructions on powerpc. This has worked well as instructions have always been 4 byte words. However, ISA v3.1 introduces some changes to instructions that mean this scheme will no longer work as well. This change is Prefixed Instructions. A prefixed instruction is made up of a word prefix followed by a word suffix to make an 8 byte double word instruction. No matter the endianness of the system the prefix always comes first. Prefixed instructions are only planned for powerpc64. Introduce a ppc_inst type to represent both prefixed and word instructions on powerpc64 while keeping it possible to exclusively have word instructions on powerpc32. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [mpe: Fix compile error in emulate_spe()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-12-jniethe5@gmail.com
2020-05-19powerpc: Use a macro for creating instructions from u32sJordan Niethe
In preparation for instructions having a more complex data type start using a macro, ppc_inst(), for making an instruction out of a u32. A macro is used so that instructions can be used as initializer elements. Currently this does nothing, but it will allow for creating a data type that can represent prefixed instructions. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [mpe: Change include guard to _ASM_POWERPC_INST_H] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Alistair Popple <alistair@popple.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-7-jniethe5@gmail.com
2020-05-05powerpc/perf/callchain: Use __get_user_pages_fast in read_user_stack_slowAneesh Kumar K.V
read_user_stack_slow is called with interrupts soft disabled and it copies contents from the page which we find mapped to a specific address. To convert userspace address to pfn, the kernel now uses lockless page table walk. The kernel needs to make sure the pfn value read remains stable and is not released and reused for another process while the contents are read from the page. This can only be achieved by holding a page reference. One of the first approaches I tried was to check the pte value after the kernel copies the contents from the page. But as shown below we can still get it wrong CPU0 CPU1 pte = READ_ONCE(*ptep); pte_clear(pte); put_page(page); page = alloc_page(); memcpy(page_address(page), "secret password", nr); memcpy(buf, kaddr + offset, nb); put_page(page); handle_mm_fault() page = alloc_page(); set_pte(pte, page); if (pte_val(pte) != pte_val(*ptep)) Hence switch to __get_user_pages_fast. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-8-aneesh.kumar@linux.ibm.com
2020-04-16powerpc/perf: open access for CAP_PERFMON privileged processAlexey Budankov
Open access to monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to the monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/ac98cd9f-b59e-673c-c70d-180b3e7695d2@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-09Merge tag 'powerpc-5.7-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull more powerpc updates from Michael Ellerman: "The bulk of this is the series to make CONFIG_COMPAT user-selectable, it's been around for a long time but was blocked behind the syscall-in-C series. Plus there's also a few fixes and other minor things. Summary: - A fix for a crash in machine check handling on pseries (ie. guests) - A small series to make it possible to disable CONFIG_COMPAT, and turn it off by default for ppc64le where it's not used. - A few other miscellaneous fixes and small improvements. Thanks to: Alexey Kardashevskiy, Anju T Sudhakar, Arnd Bergmann, Christophe Leroy, Dan Carpenter, Ganesh Goudar, Geert Uytterhoeven, Geoff Levand, Mahesh Salgaonkar, Markus Elfring, Michal Suchanek, Nicholas Piggin, Stephen Boyd, Wen Xiong" * tag 'powerpc-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: selftests/powerpc: Always build the tm-poison test 64-bit powerpc: Improve ppc_save_regs() Revert "powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled" powerpc/time: Replace <linux/clk-provider.h> by <linux/of_clk.h> powerpc/pseries/ddw: Extend upper limit for huge DMA window for persistent memory powerpc/perf: split callchain.c by bitness powerpc/64: Make COMPAT user-selectable disabled on littleendian by default. powerpc/64: make buildable without CONFIG_COMPAT powerpc/perf: consolidate valid_user_sp -> invalid_user_sp powerpc/perf: consolidate read_user_stack_32 powerpc: move common register copy functions from signal_32.c to signal.c powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro powerpc/ps3: Set CONFIG_UEVENT_HELPER=y in ps3_defconfig powerpc/ps3: Remove an unneeded NULL check powerpc/ps3: Remove duplicate error message powerpc/powernv: Re-enable imc trace-mode in kernel powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events. powerpc/pseries: Fix MCE handling on pseries selftests/eeh: Skip ahci adapters powerpc/64s: Fix doorbell wakeup msgclr optimisation
2020-04-03powerpc/perf: split callchain.c by bitnessMichal Suchanek
Building callchain.c with !COMPAT proved quite ugly with all the defines. Splitting out the 32bit and 64bit parts looks better. No code change intended. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a20027bf1074935a7934ee2a6757c99ea047e70d.1584699455.git.msuchanek@suse.de
2020-04-03powerpc/64: make buildable without CONFIG_COMPATMichal Suchanek
There are numerous references to 32bit functions in generic and 64bit code so ifdef them out. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e5619617020ef3a1f54f0c076e7d74cb9ec9f3bf.1584699455.git.msuchanek@suse.de
2020-04-03powerpc/perf: consolidate valid_user_sp -> invalid_user_spMichal Suchanek
Merge the 32bit and 64bit version. Halve the check constants on 32bit. Use STACK_TOP since it is defined. Passing is_64 is now redundant since is_32bit_task() is used to determine which callchain variant should be used. Use STACK_TOP and is_32bit_task() directly. This removes a page from the valid 32bit area on 64bit: #define TASK_SIZE_USER32 (0x0000000100000000UL - (1 * PAGE_SIZE)) #define STACK_TOP_USER32 TASK_SIZE_USER32 Change return value to bool. It is inverted by users anyway. Change to invalid_user_sp to avoid inverting the return value twice. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/be8e40fc0737fb28ad08b198552dee7cac1c5ce2.1584699455.git.msuchanek@suse.de
2020-04-03powerpc/perf: consolidate read_user_stack_32Michal Suchanek
There are two almost identical copies for 32bit and 64bit. The function is used only in 32bit code which will be split out in next patch so consolidate to one function. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0c21c919ed1296420199c78f7c3cfd29d3c7e909.1584699455.git.msuchanek@suse.de
2020-04-03powerpc/perf: Implement a global lock to avoid races between trace, core and ↵Anju T Sudhakar
thread imc events. IMC(In-memory Collection Counters) does performance monitoring in two different modes, i.e accumulation mode(core-imc and thread-imc events), and trace mode(trace-imc events). A cpu thread can either be in accumulation-mode or trace-mode at a time and this is done via the LDBAR register in POWER architecture. The current design does not address the races between thread-imc and trace-imc events. Patch implements a global id and lock to avoid the races between core, trace and thread imc events. With this global id-lock implementation, the system can either run core, thread or trace imc events at a time. i.e. to run any core-imc events, thread/trace imc events should not be enabled/monitored. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200313055238.8656-1-anju@linux.vnet.ibm.com
2020-02-11perf/core: Add new branch sample type for HW index of raw branch recordsKan Liang
The low level index is the index in the underlying hardware buffer of the most recently captured taken branch which is always saved in branch_entries[0]. It is very useful for reconstructing the call stack. For example, in Intel LBR call stack mode, the depth of reconstructed LBR call stack limits to the number of LBR registers. With the low level index information, perf tool may stitch the stacks of two samples. The reconstructed LBR call stack can break the HW limitation. Add a new branch sample type to retrieve low level index of raw branch records. The low level index is between -1 (unknown) and max depth which can be retrieved in /sys/devices/cpu/caps/branches. Only when the new branch sample type is set, the low level index information is dumped into the PERF_SAMPLE_BRANCH_STACK output. Perf tool should check the attr.branch_sample_type, and apply the corresponding format for PERF_SAMPLE_BRANCH_STACK samples. Otherwise, some user case may be broken. For example, users may parse a perf.data, which include the new branch sample type, with an old version perf tool (without the check). Users probably get incorrect information without any warning. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200127165355.27495-2-kan.liang@linux.intel.com
2020-01-27powerpc/8xx: Use alternative scratch registers in DTLB miss handlerChristophe Leroy
In preparation of handling CONFIG_VMAP_STACK, DTLB miss handler need to use different scratch registers than other exception handlers in order to not jeopardise exception entry on stack DTLB misses. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c5287ea59ae9630f505019b309bf94029241635f.1576916812.git.christophe.leroy@c-s.fr
2020-01-26powerpc: use probe_user_read() and probe_user_write()Christophe Leroy
Instead of opencoding, use probe_user_read() to failessly read a user location and probe_user_write() for writing to user. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e041f5eedb23f09ab553be8a91c3de2087147320.1579800517.git.christophe.leroy@c-s.fr
2019-11-30Merge tag 'powerpc-5.5-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights: - Infrastructure for secure boot on some bare metal Power9 machines. The firmware support is still in development, so the code here won't actually activate secure boot on any existing systems. - A change to xmon (our crash handler / pseudo-debugger) to restrict it to read-only mode when the kernel is lockdown'ed, otherwise it's trivial to drop into xmon and modify kernel data, such as the lockdown state. - Support for KASLR on 32-bit BookE machines (Freescale / NXP). - Fixes for our flush_icache_range() and __kernel_sync_dicache() (VDSO) to work with memory ranges >4GB. - Some reworks of the pseries CMM (Cooperative Memory Management) driver to make it behave more like other balloon drivers and enable some cleanups of generic mm code. - A series of fixes to our hardware breakpoint support to properly handle unaligned watchpoint addresses. Plus a bunch of other smaller improvements, fixes and cleanups. Thanks to: Alastair D'Silva, Andrew Donnellan, Aneesh Kumar K.V, Anthony Steinhauser, Cédric Le Goater, Chris Packham, Chris Smart, Christophe Leroy, Christopher M. Riedl, Christoph Hellwig, Claudio Carvalho, Daniel Axtens, David Hildenbrand, Deb McLemore, Diana Craciun, Eric Richter, Geert Uytterhoeven, Greg Kroah-Hartman, Greg Kurz, Gustavo L. F. Walbon, Hari Bathini, Harish, Jason Yan, Krzysztof Kozlowski, Leonardo Bras, Mathieu Malaterre, Mauro S. M. Rodrigues, Michal Suchanek, Mimi Zohar, Nathan Chancellor, Nathan Lynch, Nayna Jain, Nick Desaulniers, Oliver O'Halloran, Qian Cai, Rasmus Villemoes, Ravi Bangoria, Sam Bobroff, Santosh Sivaraj, Scott Wood, Thomas Huth, Tyrel Datwyler, Vaibhav Jain, Valentin Longchamp, YueHaibing" * tag 'powerpc-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (144 commits) powerpc/fixmap: fix crash with HIGHMEM x86/efi: remove unused variables powerpc: Define arch_is_kernel_initmem_freed() for lockdep powerpc/prom_init: Use -ffreestanding to avoid a reference to bcmp powerpc: Avoid clang warnings around setjmp and longjmp powerpc: Don't add -mabi= flags when building with Clang powerpc: Fix Kconfig indentation powerpc/fixmap: don't clear fixmap area in paging_init() selftests/powerpc: spectre_v2 test must be built 64-bit powerpc/powernv: Disable native PCIe port management powerpc/kexec: Move kexec files into a dedicated subdir. powerpc/32: Split kexec low level code out of misc_32.S powerpc/sysdev: drop simple gpio powerpc/83xx: map IMMR with a BAT. powerpc/32s: automatically allocate BAT in setbat() powerpc/ioremap: warn on early use of ioremap() powerpc: Add support for GENERIC_EARLY_IOREMAP powerpc/fixmap: Use __fix_to_virt() instead of fix_to_virt() powerpc/8xx: use the fixmapped IMMR in cpm_reset() powerpc/8xx: add __init to cpm1 init functions ...
2019-11-13powerpc/perf: remove current_is_64bit()Michal Suchanek
Since commit ed1cd6deb013 ("powerpc: Activate CONFIG_THREAD_INFO_IN_TASK") current_is_64bit() is quivalent to !is_32bit_task(). Remove the redundant function. Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190912194633.12045-1-msuchanek@suse.de
2019-10-17perf_event: Add support for LSM and SELinux checksJoel Fernandes (Google)
In current mainline, the degree of access to perf_event_open(2) system call depends on the perf_event_paranoid sysctl. This has a number of limitations: 1. The sysctl is only a single value. Many types of accesses are controlled based on the single value thus making the control very limited and coarse grained. 2. The sysctl is global, so if the sysctl is changed, then that means all processes get access to perf_event_open(2) opening the door to security issues. This patch adds LSM and SELinux access checking which will be used in Android to access perf_event_open(2) for the purposes of attaching BPF programs to tracepoints, perf profiling and other operations from userspace. These operations are intended for production systems. 5 new LSM hooks are added: 1. perf_event_open: This controls access during the perf_event_open(2) syscall itself. The hook is called from all the places that the perf_event_paranoid sysctl is checked to keep it consistent with the systctl. The hook gets passed a 'type' argument which controls CPU, kernel and tracepoint accesses (in this context, CPU, kernel and tracepoint have the same semantics as the perf_event_paranoid sysctl). Additionally, I added an 'open' type which is similar to perf_event_paranoid sysctl == 3 patch carried in Android and several other distros but was rejected in mainline [1] in 2016. 2. perf_event_alloc: This allocates a new security object for the event which stores the current SID within the event. It will be useful when the perf event's FD is passed through IPC to another process which may try to read the FD. Appropriate security checks will limit access. 3. perf_event_free: Called when the event is closed. 4. perf_event_read: Called from the read(2) and mmap(2) syscalls for the event. 5. perf_event_write: Called from the ioctl(2) syscalls for the event. [1] https://lwn.net/Articles/696240/ Since Peter had suggest LSM hooks in 2016 [1], I am adding his Suggested-by tag below. To use this patch, we set the perf_event_paranoid sysctl to -1 and then apply selinux checking as appropriate (default deny everything, and then add policy rules to give access to domains that need it). In the future we can remove the perf_event_paranoid sysctl altogether. Suggested-by: Peter Zijlstra <peterz@infradead.org> Co-developed-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: James Morris <jmorris@namei.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: rostedt@goodmis.org Cc: Yonghong Song <yhs@fb.com> Cc: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: jeffv@google.com Cc: Jiri Olsa <jolsa@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: primiano@google.com Cc: Song Liu <songliubraving@fb.com> Cc: rsavitski@google.com Cc: Namhyung Kim <namhyung@kernel.org> Cc: Matthew Garrett <matthewgarrett@google.com> Link: https://lkml.kernel.org/r/20191014170308.70668-1-joel@joelfernandes.org
2019-08-20powerpc/perf: fix imc allocation failure handlingNicholas Piggin
The alloc_pages_node return value should be tested for failure before being passed to page_address. Tested-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190724084638.24982-3-npiggin@gmail.com
2019-07-13Merge tag 'powerpc-5.3-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - Removal of the NPU DMA code, used by the out-of-tree Nvidia driver, as well as some other functions only used by drivers that haven't (yet?) made it upstream. - A fix for a bug in our handling of hardware watchpoints (eg. perf record -e mem: ...) which could lead to register corruption and kernel crashes. - Enable HAVE_ARCH_HUGE_VMAP, which allows us to use large pages for vmalloc when using the Radix MMU. - A large but incremental rewrite of our exception handling code to use gas macros rather than multiple levels of nested CPP macros. And the usual small fixes, cleanups and improvements. Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Cédric Le Goater, Christian Lamparter, Christophe Leroy, Christophe Lombard, Christoph Hellwig, Daniel Axtens, Denis Efremov, Enrico Weigelt, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geliang Tang, Gen Zhang, Greg Kroah-Hartman, Greg Kurz, Gustavo Romero, Krzysztof Kozlowski, Madhavan Srinivasan, Masahiro Yamada, Mathieu Malaterre, Michael Neuling, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nishad Kamdar, Oliver O'Halloran, Qian Cai, Ravi Bangoria, Sachin Sant, Sam Bobroff, Satheesh Rajendran, Segher Boessenkool, Shaokun Zhang, Shawn Anastasio, Stewart Smith, Suraj Jitindar Singh, Thiago Jung Bauermann, YueHaibing" * tag 'powerpc-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (163 commits) powerpc/powernv/idle: Fix restore of SPRN_LDBAR for POWER9 stop state. powerpc/eeh: Handle hugepages in ioremap space ocxl: Update for AFU descriptor template version 1.1 powerpc/boot: pass CONFIG options in a simpler and more robust way powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h powerpc/irq: Don't WARN continuously in arch_local_irq_restore() powerpc/module64: Use symbolic instructions names. powerpc/module32: Use symbolic instructions names. powerpc: Move PPC_HA() PPC_HI() and PPC_LO() to ppc-opcode.h powerpc/module64: Fix comment in R_PPC64_ENTRY handling powerpc/boot: Add lzo support for uImage powerpc/boot: Add lzma support for uImage powerpc/boot: don't force gzipped uImage powerpc/8xx: Add microcode patch to move SMC parameter RAM. powerpc/8xx: Use IO accessors in microcode programming. powerpc/8xx: replace #ifdefs by IS_ENABLED() in microcode.c powerpc/8xx: refactor programming of microcode CPM params. powerpc/8xx: refactor printing of microcode patch name. powerpc/8xx: Refactor microcode write powerpc/8xx: refactor writing of CPM microcode arrays ...
2019-07-04powerpc/perf/24x7: use rb_entryGeliang Tang
To make the code clearer, use rb_entry() instead of container_of() to deal with rbtree. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-19powerpc/perf: Use cpumask_last() to determine the designated cpu for ↵Anju T Sudhakar
nest/core units. Nest and core IMC (In-Memory Collection counters) assigns a particular cpu as the designated target for counter data collection. During system boot, the first online cpu in a chip gets assigned as the designated cpu for that chip(for nest-imc) and the first online cpu in a core gets assigned as the designated cpu for that core(for core-imc). If the designated cpu goes offline, the next online cpu from the same chip(for nest-imc)/core(for core-imc) is assigned as the next target, and the event context is migrated to the target cpu. Currently, cpumask_any_but() function is used to find the target cpu. Though this function is expected to return a `random` cpu, this always returns the next online cpu. If all cpus in a chip/core is offlined in a sequential manner, starting from the first cpu, the event migration has to happen for all the cpus which goes offline. Since the migration process involves a grace period, the total time taken to offline all the cpus will be significantly high. Example: In a system which has 2 sockets, with NUMA node0 CPU(s): 0-87 NUMA node8 CPU(s): 88-175 Time taken to offline cpu 88-175: real 2m56.099s user 0m0.191s sys 0m0.000s Use cpumask_last() to choose the target cpu, when the designated cpu goes online, so the migration will happen only when the last_cpu in the mask goes offline. This way the time taken to offline all cpus in a chip/core can be reduced. With the patch: Time taken to offline cpu 88-175: real 0m12.207s user 0m0.171s sys 0m0.000s Offlining all cpus in reverse order is also taken care because, cpumask_any_but() is used to find the designated cpu if the last cpu in the mask goes offline. Since cpumask_any_but() always return the first cpu in the mask, that becomes the designated cpu and migration will happen only when the first_cpu in the mask goes offline. Example: With the patch, Time taken to offline cpu from 175-88: real 0m9.330s user 0m0.110s sys 0m0.000s Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-02Merge tag 'powerpc-5.2-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "A minor fix to our IMC PMU code to print a less confusing error message when the driver can't initialise properly. A fix for a bug where a user requesting an unsupported branch sampling filter can corrupt PMU state, preventing the PMU from counting properly. And finally a fix for a bug in our support for kexec_file_load(), which prevented loading a kernel and initramfs. Most versions of kexec don't yet use kexec_file_load(). Thanks to: Anju T Sudhakar, Dave Young, Madhavan Srinivasan, Ravi Bangoria, Thiago Jung Bauermann" * tag 'powerpc-5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/kexec: Fix loading of kernel + initramfs with kexec_file_load() powerpc/perf: Fix MMCRA corruption by bhrb_filter powerpc/powernv: Return for invalid IMC domain
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 138Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190524100844.067492367@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 112Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 4 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190523091650.480557885@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22powerpc/perf: Fix MMCRA corruption by bhrb_filterRavi Bangoria
Consider a scenario where user creates two events: 1st event: attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY; fd = perf_event_open(attr, 0, 1, -1, 0); This sets cpuhw->bhrb_filter to 0 and returns valid fd. 2nd event: attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; attr.branch_sample_type = PERF_SAMPLE_BRANCH_CALL; fd = perf_event_open(attr, 0, 1, -1, 0); It overrides cpuhw->bhrb_filter to -1 and returns with error. Now if power_pmu_enable() gets called by any path other than power_pmu_add(), ppmu->config_bhrb(-1) will set MMCRA to -1. Fixes: 3925f46bb590 ("powerpc/perf: Enable branch stack sampling framework") Cc: stable@vger.kernel.org # v3.10+ Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-03powerpc/perf: Trace imc PMU functionsAnju T Sudhakar
Add PMU functions to support trace-imc. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-03powerpc/perf: Trace imc events detection and cpuhotplugAnju T Sudhakar
Patch detects trace-imc events, does memory initilizations for each online cpu, and registers cpuhotplug call-backs. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-03powerpc/perf: Add privileged access check for thread_imcMadhavan Srinivasan
Add code to restrict user access to thread_imc pmu since some event report privilege level information. Fixes: f74c89bd80fb3 ("powerpc/perf: Add thread IMC PMU support") Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-03powerpc/perf: Rearrange setting of ldbar for thread-imcAnju T Sudhakar
LDBAR holds the memory address allocated for each cpu. For thread-imc the mode bit (i.e bit 1) of LDBAR is set to accumulation. Currently, ldbar is loaded with per cpu memory address and mode set to accumulation at boot time. To enable trace-imc, the mode bit of ldbar should be set to 'trace'. So to accommodate trace-mode of IMC, reposition setting of ldbar for thread-imc to thread_imc_event_add(). Also reset ldbar at thread_imc_event_del(). Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-03powerpc/perf: Fix loop exit condition in nest_imc_event_initAnju T Sudhakar
The data structure (i.e struct imc_mem_info) to hold the memory address information for nest imc units is allocated based on the number of nodes in the system. nest_imc_event_init() traverse this struct array to calculate the memory base address for the event-cpu. If we fail to find a match for the event cpu's chip-id in imc_mem_info struct array, then the do-while loop will iterate until we crash. Fix this by changing the loop exit condition based on the number of non zero vbase elements in the array, since the allocation is done for nr_chips + 1. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 885dcd709ba91 ("powerpc/perf: Add nest IMC PMU support") Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>