diff options
author | Nicholas Piggin <npiggin@gmail.com> | 2018-06-01 20:01:21 +1000 |
---|---|---|
committer | Michael Ellerman <mpe@ellerman.id.au> | 2018-06-03 20:40:36 +1000 |
commit | 0cef77c7798a7832769fbd25a4d0b0b3361cc6f0 (patch) | |
tree | ee531c9889105f2387e121f65628234a9622bdd6 /arch/powerpc/include | |
parent | 85bcfaf69cbd610fdfac3351cf385809a2f4a93b (diff) |
powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask
When a single-threaded process has a non-local mm_cpumask, try to use
that point to flush the TLBs out of other CPUs in the cpumask.
An IPI is used for clearing remote CPUs for a few reasons:
- An IPI can end lazy TLB use of the mm, which is required to prevent
TLB entries being created on the remote CPU. The alternative is to
drop lazy TLB switching completely, which costs 7.5% in a context
switch ping-pong test betwee a process and kernel idle thread.
- An IPI can have remote CPUs flush the entire PID, but the local CPU
can flush a specific VA. tlbie would require over-flushing of the
local CPU (where the process is running).
- A single threaded process that is migrated to a different CPU is
likely to have a relatively small mm_cpumask, so IPI is reasonable.
No other thread can concurrently switch to this mm, because it must
have been given a reference to mm_users by the current thread before it
can use_mm. mm_users can be asynchronously incremented (by
mm_activate or mmget_not_zero), but those users must use remote mm
access and can't use_mm or access user address space. Existing code
makes the this assumption already, for example sparc64 has reset
mm_cpumask using this condition since the start of history, see
arch/sparc/kernel/smp_64.c.
This reduces tlbies for a kernel compile workload from 0.90M to 0.12M,
tlbiels are increased significantly due to the PID flushing for the
cleaning up remote CPUs, and increased local flushes (PID flushes take
128 tlbiels vs 1 tlbie).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Diffstat (limited to 'arch/powerpc/include')
-rw-r--r-- | arch/powerpc/include/asm/tlb.h | 13 |
1 files changed, 13 insertions, 0 deletions
diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h index a7eabff27a0f..9138baccebb0 100644 --- a/arch/powerpc/include/asm/tlb.h +++ b/arch/powerpc/include/asm/tlb.h @@ -76,6 +76,19 @@ static inline int mm_is_thread_local(struct mm_struct *mm) return false; return cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm)); } +static inline void mm_reset_thread_local(struct mm_struct *mm) +{ + WARN_ON(atomic_read(&mm->context.copros) > 0); + /* + * It's possible for mm_access to take a reference on mm_users to + * access the remote mm from another thread, but it's not allowed + * to set mm_cpumask, so mm_users may be > 1 here. + */ + WARN_ON(current->mm != mm); + atomic_set(&mm->context.active_cpus, 1); + cpumask_clear(mm_cpumask(mm)); + cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm)); +} #else /* CONFIG_PPC_BOOK3S_64 */ static inline int mm_is_thread_local(struct mm_struct *mm) { |