Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- A big series refactoring parts of our KVM code, and converting some
to C.
- Support for ARCH_HAS_SET_MEMORY, and ARCH_HAS_STRICT_MODULE_RWX on
some CPUs.
- Support for the Microwatt soft-core.
- Optimisations to our interrupt return path on 64-bit.
- Support for userspace access to the NX GZIP accelerator on PowerVM on
Power10.
- Enable KUAP and KUEP by default on 32-bit Book3S CPUs.
- Other smaller features, fixes & cleanups.
Thanks to: Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Baokun Li, Benjamin Herrenschmidt, Bharata B Rao, Christophe
Leroy, Daniel Axtens, Daniel Henrique Barboza, Finn Thain, Geoff Levand,
Haren Myneni, Jason Wang, Jiapeng Chong, Joel Stanley, Jordan Niethe,
Kajol Jain, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas
Piggin, Nick Desaulniers, Paul Mackerras, Russell Currey, Sathvika
Vasireddy, Shaokun Zhang, Stephen Rothwell, Sudeep Holla, Suraj Jitindar
Singh, Tom Rix, Vaibhav Jain, YueHaibing, Zhang Jianhua, and Zhen Lei.
* tag 'powerpc-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (218 commits)
powerpc: Only build restart_table.c for 64s
powerpc/64s: move ret_from_fork etc above __end_soft_masked
powerpc/64s/interrupt: clean up interrupt return labels
powerpc/64/interrupt: add missing kprobe annotations on interrupt exit symbols
powerpc/64: enable MSR[EE] in irq replay pt_regs
powerpc/64s/interrupt: preserve regs->softe for NMI interrupts
powerpc/64s: add a table of implicit soft-masked addresses
powerpc/64e: remove implicit soft-masking and interrupt exit restart logic
powerpc/64e: fix CONFIG_RELOCATABLE build warnings
powerpc/64s: fix hash page fault interrupt handler
powerpc/4xx: Fix setup_kuep() on SMP
powerpc/32s: Fix setup_{kuap/kuep}() on SMP
powerpc/interrupt: Use names in check_return_regs_valid()
powerpc/interrupt: Also use exit_must_hard_disable() on PPC32
powerpc/sysfs: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE
powerpc/ptrace: Refactor regs_set_return_{msr/ip}
powerpc/ptrace: Move set_return_regs_changed() before regs_set_return_{msr/ip}
powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()
powerpc/pseries/vas: Include irqdomain.h
powerpc: mark local variables around longjmp as volatile
...
|
|
Merge misc updates from Andrew Morton:
"191 patches.
Subsystems affected by this patch series: kthread, ia64, scripts,
ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab,
slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap,
mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization,
pagealloc, and memory-failure)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (191 commits)
mm,hwpoison: make get_hwpoison_page() call get_any_page()
mm,hwpoison: send SIGBUS with error virutal address
mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes
mm/page_alloc: allow high-order pages to be stored on the per-cpu lists
mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM
mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA
docs: remove description of DISCONTIGMEM
arch, mm: remove stale mentions of DISCONIGMEM
mm: remove CONFIG_DISCONTIGMEM
m68k: remove support for DISCONTIGMEM
arc: remove support for DISCONTIGMEM
arc: update comment about HIGHMEM implementation
alpha: remove DISCONTIGMEM and NUMA
mm/page_alloc: move free_the_page
mm/page_alloc: fix counting of managed_pages
mm/page_alloc: improve memmap_pages dbg msg
mm: drop SECTION_SHIFT in code comments
mm/page_alloc: introduce vm.percpu_pagelist_high_fraction
mm/page_alloc: limit the number of pages on PCP lists when reclaim is active
mm/page_alloc: scale the number of pages that are batch freed
...
|
|
After removal of DISCINTIGMEM the NEED_MULTIPLE_NODES and NUMA
configuration options are equivalent.
Drop CONFIG_NEED_MULTIPLE_NODES and use CONFIG_NUMA instead.
Done with
$ sed -i 's/CONFIG_NEED_MULTIPLE_NODES/CONFIG_NUMA/' \
$(git grep -wl CONFIG_NEED_MULTIPLE_NODES)
$ sed -i 's/NEED_MULTIPLE_NODES/NUMA/' \
$(git grep -wl NEED_MULTIPLE_NODES)
with manual tweaks afterwards.
[rppt@linux.ibm.com: fix arm boot crash]
Link: https://lkml.kernel.org/r/YMj9vHhHOiCVN4BF@linux.ibm.com
Link: https://lkml.kernel.org/r/20210608091316.3622-9-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
printk_safe_flush_on_panic() has special lock breaking code for the case
where we panic()ed with the console lock held. It relies on panic IPI
causing other CPUs to mark themselves offline.
Do as most other architectures do.
This effectively reverts commit de6e5d38417e ("powerpc: smp_send_stop do
not offline stopped CPUs"), unfortunately it may result in some false
positive warnings, but the alternative is more situations where we can
crash without getting messages out.
Fixes: de6e5d38417e ("powerpc: smp_send_stop do not offline stopped CPUs")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210623041245.865134-1-npiggin@gmail.com
|
|
In order to selectively activate KUAP and KUEP in a following patch,
perform KUAP and KUEP initialisation in C.
Unlike PPC64, PPC32 doesn't have an early_setup_secondary(),
so do it in start_secondary().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/87be72023448dd4e476744ed279b8c04b8d08a1c.1622708530.git.christophe.leroy@csgroup.eu
|
|
As pointed out by commit
de9b8f5dcbd9 ("sched: Fix crash trying to dequeue/enqueue the idle thread")
init_idle() can and will be invoked more than once on the same idle
task. At boot time, it is invoked for the boot CPU thread by
sched_init(). Then smp_init() creates the threads for all the secondary
CPUs and invokes init_idle() on them.
As the hotplug machinery brings the secondaries to life, it will issue
calls to idle_thread_get(), which itself invokes init_idle() yet again.
In this case it's invoked twice more per secondary: at _cpu_up(), and at
bringup_cpu().
Given smp_init() already initializes the idle tasks for all *possible*
CPUs, no further initialization should be required. Now, removing
init_idle() from idle_thread_get() exposes some interesting expectations
with regards to the idle task's preempt_count: the secondary startup always
issues a preempt_disable(), requiring some reset of the preempt count to 0
between hot-unplug and hotplug, which is currently served by
idle_thread_get() -> idle_init().
Given the idle task is supposed to have preemption disabled once and never
see it re-enabled, it seems that what we actually want is to initialize its
preempt_count to PREEMPT_DISABLED and leave it there. Do that, and remove
init_idle() from idle_thread_get().
Secondary startups were patched via coccinelle:
@begone@
@@
-preempt_disable();
...
cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512094636.2958515-1-valentin.schneider@arm.com
|
|
Geethika reported a trace when doing a dlpar CPU add.
------------[ cut here ]------------
WARNING: CPU: 152 PID: 1134 at kernel/sched/topology.c:2057
CPU: 152 PID: 1134 Comm: kworker/152:1 Not tainted 5.12.0-rc5-master #5
Workqueue: events cpuset_hotplug_workfn
NIP: c0000000001cfc14 LR: c0000000001cfc10 CTR: c0000000007e3420
REGS: c0000034a08eb260 TRAP: 0700 Not tainted (5.12.0-rc5-master+)
MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 28828422 XER: 00000020
CFAR: c0000000001fd888 IRQMASK: 0 #012GPR00: c0000000001cfc10
c0000034a08eb500 c000000001f35400 0000000000000027 #012GPR04:
c0000035abaa8010 c0000035abb30a00 0000000000000027 c0000035abaa8018
#012GPR08: 0000000000000023 c0000035abaaef48 00000035aa540000
c0000035a49dffe8 #012GPR12: 0000000028828424 c0000035bf1a1c80
0000000000000497 0000000000000004 #012GPR16: c00000000347a258
0000000000000140 c00000000203d468 c000000001a1a490 #012GPR20:
c000000001f9c160 c0000034adf70920 c0000034aec9fd20 0000000100087bd3
#012GPR24: 0000000100087bd3 c0000035b3de09f8 0000000000000030
c0000035b3de09f8 #012GPR28: 0000000000000028 c00000000347a280
c0000034aefe0b00 c0000000010a2a68
NIP [c0000000001cfc14] build_sched_domains+0x6a4/0x1500
LR [c0000000001cfc10] build_sched_domains+0x6a0/0x1500
Call Trace:
[c0000034a08eb500] [c0000000001cfc10] build_sched_domains+0x6a0/0x1500 (unreliable)
[c0000034a08eb640] [c0000000001d1e6c] partition_sched_domains_locked+0x3ec/0x530
[c0000034a08eb6e0] [c0000000002936d4] rebuild_sched_domains_locked+0x524/0xbf0
[c0000034a08eb7e0] [c000000000296bb0] rebuild_sched_domains+0x40/0x70
[c0000034a08eb810] [c000000000296e74] cpuset_hotplug_workfn+0x294/0xe20
[c0000034a08ebc30] [c000000000178dd0] process_one_work+0x300/0x670
[c0000034a08ebd10] [c0000000001791b8] worker_thread+0x78/0x520
[c0000034a08ebda0] [c000000000185090] kthread+0x1a0/0x1b0
[c0000034a08ebe10] [c00000000000ccec] ret_from_kernel_thread+0x5c/0x70
Instruction dump:
7d2903a6 4e800421 e8410018 7f67db78 7fe6fb78 7f45d378 7f84e378 7c681b78
3c62ff1a 3863c6f8 4802dc35 60000000 <0fe00000> 3920fff4 f9210070 e86100a0
---[ end trace 532d9066d3d4d7ec ]---
Some of the per-CPU masks use cpu_cpu_mask as a filter to limit the search
for related CPUs. On a dlpar add of a CPU, update cpu_cpu_mask before
updating the per-CPU masks. This will ensure the cpu_cpu_mask is updated
correctly before its used in setting the masks. Setting the numa_node will
ensure that when cpu_cpu_mask() gets called, the correct node number is
used. This code movement helped fix the above call trace.
Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210401154200.150077-1-srikar@linux.vnet.ibm.com
|
|
On systems with large CPUs per node, even with the filtered matching of
related CPUs, there can be large number of calls to cpu_to_chip_id for
the same CPU. For example with 4096 vCPU, 1 node QEMU configuration,
with 4 threads per core, system could be see upto 1024 calls to
cpu_to_chip_id() for the same CPU. On a given system, cpu_to_chip_id()
for a given CPU would always return the same. Hence cache the result in
a lookup table for use in subsequent calls.
Since all CPUs sharing the same core will belong to the same chip, the
lookup_table has an entry for one CPU per core. chip_id_lookup_table is
not being freed and would be used on subsequent CPU online post CPU
offline.
Reported-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210415120934.232271-4-srikar@linux.vnet.ibm.com
|
|
Daniel reported that with Commit 4ca234a9cbd7 ("powerpc/smp: Stop
updating cpu_core_mask") QEMU was unable to set single NUMA node SMP
topologies such as:
-smp 8,maxcpus=8,cores=2,threads=2,sockets=2
i.e he expected 2 sockets in one NUMA node.
The above commit helped to reduce boot time on Large Systems for
example 4096 vCPU single socket QEMU instance. PAPR is silent on
having more than one socket within a NUMA node.
cpu_core_mask and cpu_cpu_mask for any CPU would be same unless the
number of sockets is different from the number of NUMA nodes.
One option is to reintroduce cpu_core_mask but use a slightly
different method to arrive at the cpu_core_mask. Previously each CPU's
chip-id would be compared with all other CPU's chip-id to verify if
both the CPUs were related at the chip level. Now if a CPU 'A' is
found related / (unrelated) to another CPU 'B', all the thread
siblings of 'A' and thread siblings of 'B' are automatically marked as
related / (unrelated).
Also if a platform doesn't support ibm,chip-id property, i.e its
cpu_to_chip_id returns -1, cpu_core_map holds a copy of
cpu_cpu_mask().
Fixes: 4ca234a9cbd7 ("powerpc/smp: Stop updating cpu_core_mask")
Reported-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210415120934.232271-2-srikar@linux.vnet.ibm.com
|
|
The sparse tool complains as follows:
arch/powerpc/kernel/smp.c:86:1: warning:
symbol '__pcpu_scope_cpu_coregroup_map' was not declared. Should it be static?
arch/powerpc/kernel/smp.c:125:1: warning:
symbol '__pcpu_scope_thread_group_l1_cache_map' was not declared. Should it be static?
arch/powerpc/kernel/smp.c:132:1: warning:
symbol '__pcpu_scope_thread_group_l2_cache_map' was not declared. Should it be static?
These symbols are not used outside of smp.c, so this
commit marks them static.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210407125903.4139663-1-yukuai3@huawei.com
|
|
debugger_ipi_callback() is a local routine used as a NMI IPI handler and
does not need to be external.
It fixes this W=1 compile error :
../arch/powerpc/kernel/smp.c:579:6: error: no previous prototype for ‘debugger_ipi_callback’ [-Werror=missing-prototypes]
579 | void debugger_ipi_callback(struct pt_regs *regs)
| ^~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210104143206.695198-10-clg@kaod.org
|
|
It fixes this W=1 compile error :
../arch/powerpc/kernel/smp.c:569:6: error: no previous prototype for ‘tick_broadcast’ [-Werror=missing-prototypes]
569 | void tick_broadcast(const struct cpumask *mask)
| ^~~~~~~~~~~~~~
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210104143206.695198-9-clg@kaod.org
|
|
It fixes this link warning:
WARNING: modpost: vmlinux.o(.text.unlikely+0x2d98): Section mismatch in reference from the function init_big_cores.isra.0() to the function .init.text:init_thread_group_cache_map()
The function init_big_cores.isra.0() references
the function __init init_thread_group_cache_map().
This is often because init_big_cores.isra.0 lacks a __init
annotation or the annotation of init_thread_group_cache_map is wrong.
Fixes: 425752c63b6f ("powerpc: Detect the presence of big-cores via "ibm, thread-groups"")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201221074154.403779-1-clg@kaod.org
|
|
On POWER systems, groups of threads within a core sharing the L2-cache
can be indicated by the "ibm,thread-groups" property array with the
identifier "2".
This patch adds support for detecting this, and when present, populate
the populating the cpu_l2_cache_mask of every CPU to the core-siblings
which share L2 with the CPU as specified in the by the
"ibm,thread-groups" property array.
On a platform with the following "ibm,thread-group" configuration
00000001 00000002 00000004 00000000
00000002 00000004 00000006 00000001
00000003 00000005 00000007 00000002
00000002 00000004 00000000 00000002
00000004 00000006 00000001 00000003
00000005 00000007
Without this patch, the sched-domain hierarchy for CPUs 0,1 would be
CPU0 attaching sched-domain(s):
domain-0: span=0,2,4,6 level=SMT
domain-1: span=0-7 level=CACHE
domain-2: span=0-15,24-39,48-55 level=MC
domain-3: span=0-55 level=DIE
CPU1 attaching sched-domain(s):
domain-0: span=1,3,5,7 level=SMT
domain-1: span=0-7 level=CACHE
domain-2: span=0-15,24-39,48-55 level=MC
domain-3: span=0-55 level=DIE
The CACHE domain at 0-7 is incorrect since the ibm,thread-groups
sub-array
[00000002 00000002 00000004
00000000 00000002 00000004 00000006
00000001 00000003 00000005 00000007]
indicates that L2 (Property "2") is shared only between the threads of a single
group. There are "2" groups of threads where each group contains "4"
threads each. The groups being {0,2,4,6} and {1,3,5,7}.
With this patch, the sched-domain hierarchy for CPUs 0,1 would be
CPU0 attaching sched-domain(s):
domain-0: span=0,2,4,6 level=SMT
domain-1: span=0-15,24-39,48-55 level=MC
domain-2: span=0-55 level=DIE
CPU1 attaching sched-domain(s):
domain-0: span=1,3,5,7 level=SMT
domain-1: span=0-15,24-39,48-55 level=MC
domain-2: span=0-55 level=DIE
The CACHE domain with span=0,2,4,6 for CPU 0 (span=1,3,5,7 for CPU 1
resp.) gets degenerated into the SMT domain. Furthermore, the
last-level-cache domain gets correctly set to the SMT sched-domain.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-5-git-send-email-ego@linux.vnet.ibm.com
|
|
init_thread_group_l1_cache_map() initializes the per-cpu cpumask
thread_group_l1_cache_map with the core-siblings which share L1 cache
with the CPU. Make this function generic to the cache-property (L1 or
L2) and update a suitable mask. This is a preparatory patch for the
next patch where we will introduce discovery of thread-groups that
share L2-cache.
No functional change.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-4-git-send-email-ego@linux.vnet.ibm.com
|
|
On platforms which have the "ibm,thread-groups" property, the per-cpu
variable cpu_l1_cache_map keeps a track of which group of threads
within the same core share the L1 cache, Instruction and Data flow.
This patch renames the variable to "thread_group_l1_cache_map" to make
it consistent with a subsequent patch which will introduce
thread_group_l2_cache_map.
This patch introduces no functional change.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-3-git-send-email-ego@linux.vnet.ibm.com
|
|
The "ibm,thread-groups" device-tree property is an array that is used
to indicate if groups of threads within a core share certain
properties. It provides details of which property is being shared by
which groups of threads. This array can encode information about
multiple properties being shared by different thread-groups within the
core.
Example: Suppose,
"ibm,thread-groups" = [1,2,4,8,10,12,14,9,11,13,15,2,2,4,8,10,12,14,9,11,13,15]
This can be decomposed up into two consecutive arrays:
a) [1,2,4,8,10,12,14,9,11,13,15]
b) [2,2,4,8,10,12,14,9,11,13,15]
where in,
a) provides information of Property "1" being shared by "2" groups,
each with "4" threads each. The "ibm,ppc-interrupt-server#s" of the
first group is {8,10,12,14} and the "ibm,ppc-interrupt-server#s" of
the second group is {9,11,13,15}. Property "1" is indicative of
the thread in the group sharing L1 cache, translation cache and
Instruction Data flow.
b) provides information of Property "2" being shared by "2" groups,
each group with "4" threads. The "ibm,ppc-interrupt-server#s" of
the first group is {8,10,12,14} and the
"ibm,ppc-interrupt-server#s" of the second group is
{9,11,13,15}. Property "2" indicates that the threads in each group
share the L2-cache.
The existing code assumes that the "ibm,thread-groups" encodes
information about only one property. Hence even on platforms which
encode information about multiple properties being shared by the
corresponding groups of threads, the current code will only pick the
first one. (In the above example, it will only consider
[1,2,4,8,10,12,14,9,11,13,15] but not [2,2,4,8,10,12,14,9,11,13,15]).
This patch extends the parsing support on platforms which encode
information about multiple properties being shared by the
corresponding groups of threads.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-2-git-send-email-ego@linux.vnet.ibm.com
|
|
The call to rcu_cpu_starting() in start_secondary() is not early
enough in the CPU-hotplug onlining process, which results in lockdep
splats as follows (with CONFIG_PROVE_RCU_LIST=y):
WARNING: suspicious RCU usage
-----------------------------
kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.
Call Trace:
dump_stack+0xec/0x144 (unreliable)
lockdep_rcu_suspicious+0x128/0x14c
__lock_acquire+0x1060/0x1c60
lock_acquire+0x140/0x5f0
_raw_spin_lock_irqsave+0x64/0xb0
clockevents_register_device+0x74/0x270
register_decrementer_clockevent+0x94/0x110
start_secondary+0x134/0x800
start_secondary_prolog+0x10/0x14
This is avoided by adding a call to rcu_cpu_starting() near the
beginning of the start_secondary() function. Note that the
raw_smp_processor_id() is required in order to avoid calling into
lockdep before RCU has declared the CPU to be watched for readers.
It's safe to call rcu_cpu_starting() in the arch code as well as later
in generic code, as explained by Paul:
It uses a per-CPU variable so that RCU pays attention only to the
first call to rcu_cpu_starting() if there is more than one of them.
This is even intentional, due to there being a generic
arch-independent call to rcu_cpu_starting() in
notify_cpu_starting().
So multiple calls to rcu_cpu_starting() are fine by design.
Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion")
Signed-off-by: Qian Cai <cai@redhat.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
[mpe: Add Fixes tag, reword slightly & expand change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028182334.13466-1-cai@redhat.com
|
|
Qian Cai reported a regression where CPU Hotplug fails with the latest
powerpc/next
BUG: sleeping function called from invalid context at mm/slab.h:494
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/88
no locks held by swapper/88/0.
irq event stamp: 18074448
hardirqs last enabled at (18074447): [<c0000000001a2a7c>] tick_nohz_idle_enter+0x9c/0x110
hardirqs last disabled at (18074448): [<c000000000106798>] do_idle+0x138/0x3b0
do_idle at kernel/sched/idle.c:253 (discriminator 1)
softirqs last enabled at (18074440): [<c0000000000bbec4>] irq_enter_rcu+0x94/0xa0
softirqs last disabled at (18074439): [<c0000000000bbea0>] irq_enter_rcu+0x70/0xa0
CPU: 88 PID: 0 Comm: swapper/88 Tainted: G W 5.9.0-rc8-next-20201007 #1
Call Trace:
[c00020000a4bfcf0] [c000000000649e98] dump_stack+0xec/0x144 (unreliable)
[c00020000a4bfd30] [c0000000000f6c34] ___might_sleep+0x2f4/0x310
[c00020000a4bfdb0] [c000000000354f94] slab_pre_alloc_hook.constprop.82+0x124/0x190
[c00020000a4bfe00] [c00000000035e9e8] __kmalloc_node+0x88/0x3a0
slab_alloc_node at mm/slub.c:2817
(inlined by) __kmalloc_node at mm/slub.c:4013
[c00020000a4bfe80] [c0000000006494d8] alloc_cpumask_var_node+0x38/0x80
kmalloc_node at include/linux/slab.h:577
(inlined by) alloc_cpumask_var_node at lib/cpumask.c:116
[c00020000a4bfef0] [c00000000003eedc] start_secondary+0x27c/0x800
update_mask_by_l2 at arch/powerpc/kernel/smp.c:1267
(inlined by) add_cpu_to_masks at arch/powerpc/kernel/smp.c:1387
(inlined by) start_secondary at arch/powerpc/kernel/smp.c:1420
[c00020000a4bff90] [c00000000000c468] start_secondary_resume+0x10/0x14
Allocating a temporary mask while performing a CPU Hotplug operation
with CONFIG_CPUMASK_OFFSTACK enabled, leads to calling a sleepable
function from a atomic context. Fix this by allocating the temporary
mask with GFP_ATOMIC flag. Also instead of having to allocate twice,
allocate the mask in the caller so that we only have to allocate once.
If the allocation fails, assume the mask to be same as sibling mask, which
will make the scheduler to drop this domain for this CPU.
Fixes: 70a94089d7f7 ("powerpc/smp: Optimize update_coregroup_mask")
Fixes: 3ab33d6dc3e9 ("powerpc/smp: Optimize update_mask_by_l2")
Reported-by: Qian Cai <cai@redhat.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201019042716.106234-3-srikar@linux.vnet.ibm.com
|
|
Commit 3ab33d6dc3e9 ("powerpc/smp: Optimize update_mask_by_l2")
introduced submask_fn in update_mask_by_l2 to track the right submask.
However commit f6606cfdfbcd ("powerpc/smp: Dont assume l2-cache to be
superset of sibling") introduced sibling_mask in update_mask_by_l2 to
track the same submask. Remove sibling_mask in favour of submask_fn.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201019042716.106234-2-srikar@linux.vnet.ibm.com
|
|
All threads of a SMT4/SMT8 core can either be part of CPU's coregroup
mask or outside the coregroup. Use this relation to reduce the
number of iterations needed to find all the CPUs that share the same
coregroup
Use a temporary mask to iterate through the CPUs that may share
coregroup mask. Also instead of setting one CPU at a time into
cpu_coregroup_mask, copy the SMT4/SMT8/submask at one shot.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200921095653.9701-12-srikar@linux.vnet.ibm.com
|
|
Move the logic for updating the coregroup mask of a CPU to its own
function. This will help in reworking the updation of coregroup mask in
subsequent patch.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200921095653.9701-11-srikar@linux.vnet.ibm.com
|
|
All threads of a SMT4 core can either be part of this CPU's l2-cache
mask or not related to this CPU l2-cache mask. Use this relation to
reduce the number of iterations needed to find all the CPUs that share
the same l2-cache.
Use a temporary mask to iterate through the CPUs that may share l2_cache
mask. Also instead of setting one CPU at a time into cpu_l2_cache_mask,
copy the SMT4/sub mask at one shot.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200921095653.9701-10-srikar@linux.vnet.ibm.com
|
|
CACHE and COREGROUP domains are now part of default topology. However on
systems that don't support CACHE or COREGROUP, these domains will
eventually be degenerated. The degeneration happens per CPU. Do note the
current fixup_topology() logic ensures that mask of a domain that is not
supported on the current platform is set to the previous domain.
Instead of waiting for the scheduler to degenerated try to consolidate
based on their masks and sd_flags. This is done just before setting
the scheduler topology.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200921095653.9701-9-srikar@linux.vnet.ibm.com
|
|
Currently on hotplug/hotunplug, CPU iterates through all the CPUs in
its core to find threads in its thread group. However this info is
already captured in cpu_l1_cache_map. Hence reduce iterations and
cleanup add_cpu_to_smallcore_masks function.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200921095653.9701-8-srikar@linux.vnet.ibm.com
|
|
update_mask_by_l2 is called only once. But it passes cpu_l2_cache_mask
as parameter. Instead of passing cpu_l2_cache_mask, use it directly in
update_mask_by_l2.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200921095653.9701-7-srikar@linux.vnet.ibm.com
|
|
All the arch specific topology cpumasks are within a node/DIE.
However when setting these per CPU cpumasks, system traverses through
all the online CPUs. This is redundant.
Reduce the traversal to only CPUs that are online in the node to which
the CPU belongs to.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200921095653.9701-6-srikar@linux.vnet.ibm.com
|
|
While offlining a CPU, system currently iterate through all the CPUs in
the DIE to clear sibling, l2_cache and smallcore maps. However if there
are more cores in a DIE, system can end up spending more time iterating
through CPUs which are completely unrelated.
Optimize this by only iterating through smaller but relevant cpumap.
If shared_cache is set, cpu_l2_cache_map should be relevant else
cpu_sibling_map would be relevant.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200921095653.9701-5-srikar@linux.vnet.ibm.com
|
|
Now that cpu_core_mask has been removed and topology_core_cpumask has
been updated to use cpu_cpu_mask, we no more need
get_physical_package_id.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200921095653.9701-4-srikar@linux.vnet.ibm.com
|
|
Anton Blanchard reported that his 4096 vcpu KVM guest took around 30
minutes to boot. He also analyzed it to the time taken to iterate while
setting the cpu_core_mask.
Further analysis shows that cpu_core_mask and cpu_cpu_mask for any CPU
would be equal on Power. However updating cpu_core_mask took forever to
update as its a per cpu cpumask variable. Instead cpu_cpu_mask was a per
NODE /per DIE cpumask that was shared by all the respective CPUs.
Also cpu_cpu_mask is needed from a scheduler perspective. However
cpu_core_map is an exported symbol. Hence stop updating cpu_core_map
and make it point to cpu_cpu_mask.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200921095653.9701-3-srikar@linux.vnet.ibm.com
|
|
We have smp_ops->cpu_die() and ppc_md.cpu_die(). One of them offlines
the current CPU and one offlines another CPU, can you guess which is
which? Also one is in smp_ops and one is in ppc_md?
So rename ppc_md.cpu_die(), to cpu_offline_self(), because that's what
it does. And move it into smp_ops where it belongs.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200819015634.1974478-3-mpe@ellerman.id.au
|
|
Avoid the eternal confusion between cpu_die() and __cpu_die() by
removing the former, folding it into its only caller.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200819015634.1974478-2-mpe@ellerman.id.au
|
|
arch_cpu_idle_dead() is in idle.c, which makes sense, but it's inside
a CONFIG_HOTPLUG_CPU block.
It would be more at home in smp.c, inside the existing
CONFIG_HOTPLUG_CPU block. Note that CONFIG_HOTPLUG_CPU depends on
CONFIG_SMP so even though smp.c is not built for SMP=n builds, that's
fine.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200819015634.1974478-1-mpe@ellerman.id.au
|
|
Add percpu coregroup maps and masks to create coregroup domain.
If a coregroup doesn't exist, the coregroup domain will be degenerated
in favour of SMT/CACHE domain. Do note this patch is only creating stubs
for cpu_to_coregroup_id. The actual cpu_to_coregroup_id implementation
would be in a subsequent patch.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200810071834.92514-10-srikar@linux.vnet.ibm.com
|
|
If allocated earlier and the search fails, then cpu_l1_cache_map cpumask
is unnecessarily cleared. However cpu_l1_cache_map can be allocated /
cleared after we search thread group.
Please note CONFIG_CPUMASK_OFFSTACK is not set on Powerpc. Hence cpumask
allocated by zalloc_cpumask_var_node is never freed.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200810071834.92514-9-srikar@linux.vnet.ibm.com
|
|
Add support for grouping cores based on the device-tree classification.
- The last domain in the associativity domains always refers to the
core.
- If primary reference domain happens to be the penultimate domain in
the associativity domains device-tree property, then there are no
coregroups. However if its not a penultimate domain, then there are
coregroups. There can be more than one coregroup. For now we would be
interested in the last or the smallest coregroups, i.e one sub-group
per DIE.
Currently there are no firmwares that are exposing this grouping. Hence
allow the basis for grouping to be abstract. Once the firmware starts
using this grouping, code would be added to detect the type of grouping
and adjust the sd domain flags accordingly.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200810071834.92514-8-srikar@linux.vnet.ibm.com
|
|
In start_secondary, even if shared_cache was already set, system does a
redundant match for cpumask. This redundant check can be removed by
checking if shared_cache is already set.
While here, localize the sibling_mask variable to within the if
condition.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200810071834.92514-7-srikar@linux.vnet.ibm.com
|
|
Current code assumes that cpumask of cpus sharing a l2-cache mask will
always be a superset of cpu_sibling_mask.
Lets stop that assumption. cpu_l2_cache_mask is a superset of
cpu_sibling_mask if and only if shared_caches is set.
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200913171038.GB11808@linux.vnet.ibm.com
|
|
Move topology fixup based on the platform attributes into its own
function which is called just before set_sched_topology.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200810071834.92514-5-srikar@linux.vnet.ibm.com
|
|
Just moving the powerpc_topology description above.
This will help in using functions in this file and avoid declarations.
No other functional changes
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200810071834.92514-4-srikar@linux.vnet.ibm.com
|
|
A new sched_domain_topology_level was added just for Power9. However the
same can be achieved by merging powerpc_topology with power9_topology
and makes the code more simpler especially when adding a new sched
domain.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200810071834.92514-3-srikar@linux.vnet.ibm.com
|
|
Fix a build warning in a non CONFIG_NEED_MULTIPLE_NODES
"error: _numa_cpu_lookup_table_ undeclared"
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200810071834.92514-2-srikar@linux.vnet.ibm.com
|
|
UAMOR values are not application-specific. The kernel initializes
its value based on different reserved keys. Remove the thread-specific
UAMOR value and don't switch the UAMOR on context switch.
Move UAMOR initialization to key initialization code and remove
thread_struct.uamor because it is not used anymore.
Before commit: 4a4a5e5d2aad ("powerpc/pkeys: key allocation/deallocation must not change pkey registers")
we used to update uamor based on key allocation and free.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-20-aneesh.kumar@linux.ibm.com
|
|
The replacement of <asm/pgrable.h> with <linux/pgtable.h> made the include
of the latter in the middle of asm includes. Fix this up with the aid of
the below script and manual adjustments here and there.
import sys
import re
if len(sys.argv) is not 3:
print "USAGE: %s <file> <header>" % (sys.argv[0])
sys.exit(1)
hdr_to_move="#include <linux/%s>" % sys.argv[2]
moved = False
in_hdrs = False
with open(sys.argv[1], "r") as f:
lines = f.readlines()
for _line in lines:
line = _line.rstrip('
')
if line == hdr_to_move:
continue
if line.startswith("#include <linux/"):
in_hdrs = True
elif not moved and in_hdrs:
moved = True
print hdr_to_move
print line
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The include/linux/pgtable.h is going to be the home of generic page table
manipulation functions.
Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
make the latter include asm/pgtable.h.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently when we boot on a big core system, we get this print:
[ 0.040500] Using small cores at SMT level
This is misleading as we've actually detected big cores.
This patch clears up the print to say we've detect big cores but are
using small cores for scheduling.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200528230731.1235752-1-mikey@neuling.org
|
|
We can avoid the #ifdef by using IS_ENABLED() in the existing
condition check.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200313112020.28235-2-mpe@ellerman.id.au
|
|
We don't need the NULL check of np, the result is the same because the
OF helpers cope with NULL, of_node_to_nid(NULL) == NUMA_NO_NODE (-1).
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200313112020.28235-1-mpe@ellerman.id.au
|
|
With commit ("powerpc/numa: Early request for home node associativity"),
commit 2ea626306810 ("powerpc/topology: Get topology for shared
processors at boot") which was requesting home node associativity
becomes redundant.
Hence remove the late request for home node associativity.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200129135301.24739-6-srikar@linux.vnet.ibm.com
|
|
package_id is to match cores that are part of the same chip. On
PowerNV machines, package_id defaults to chip_id. However ibm,chip_id
property is not present in device-tree of PowerVM LPARs. Hence lscpu
output shows one core per socket and multiple cores.
To overcome this, use nid as the package_id on PowerVM LPARs.
Before the patch:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 8
Core(s) per socket: 1 <----------------------
Socket(s): 16 <----------------------
NUMA node(s): 2
Model: 2.2 (pvr 004e 0202)
Model name: POWER9 (architected), altivec supported
Hypervisor vendor: pHyp
Virtualization type: para
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 10240K
NUMA node0 CPU(s): 0-63
NUMA node1 CPU(s): 64-127
#
# cat /sys/devices/system/cpu/cpu0/topology/physical_package_id
-1
After the patch:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 8 <---------------------
Core(s) per socket: 8 <---------------------
Socket(s): 2
NUMA node(s): 2
Model: 2.2 (pvr 004e 0202)
Model name: POWER9 (architected), altivec supported
Hypervisor vendor: pHyp
Virtualization type: para
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 10240K
NUMA node0 CPU(s): 0-63
NUMA node1 CPU(s): 64-127
#
# cat /sys/devices/system/cpu/cpu0/topology/physical_package_id
0
Now lscpu output is more in line with the system configuration.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
[mpe: Use pkg_id instead of ppid, tweak change log and comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200129135121.24617-1-srikar@linux.vnet.ibm.com
|