Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"The major update to this release is that there's a new arch config
option called CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS.
Currently, only x86_64 enables it. All the ftrace callbacks now take a
struct ftrace_regs instead of a struct pt_regs. If the architecture
has HAVE_DYNAMIC_FTRACE_WITH_ARGS enabled, then the ftrace_regs will
have enough information to read the arguments of the function being
traced, as well as access to the stack pointer.
This way, if a user (like live kernel patching) only cares about the
arguments, then it can avoid using the heavier weight "regs" callback,
that puts in enough information in the struct ftrace_regs to simulate
a breakpoint exception (needed for kprobes).
A new config option that audits the timestamps of the ftrace ring
buffer at most every event recorded.
Ftrace recursion protection has been cleaned up to move the protection
to the callback itself (this saves on an extra function call for those
callbacks).
Perf now handles its own RCU protection and does not depend on ftrace
to do it for it (saving on that extra function call).
New debug option to add "recursed_functions" file to tracefs that
lists all the places that triggered the recursion protection of the
function tracer. This will show where things need to be fixed as
recursion slows down the function tracer.
The eval enum mapping updates done at boot up are now offloaded to a
work queue, as it caused a noticeable pause on slow embedded boards.
Various clean ups and last minute fixes"
* tag 'trace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
tracing: Offload eval map updates to a work queue
Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"
ring-buffer: Add rb_check_bpage in __rb_allocate_pages
ring-buffer: Fix two typos in comments
tracing: Drop unneeded assignment in ring_buffer_resize()
tracing: Disable ftrace selftests when any tracer is running
seq_buf: Avoid type mismatch for seq_buf_init
ring-buffer: Fix a typo in function description
ring-buffer: Remove obsolete rb_event_is_commit()
ring-buffer: Add test to validate the time stamp deltas
ftrace/documentation: Fix RST C code blocks
tracing: Clean up after filter logic rewriting
tracing: Remove the useless value assignment in test_create_synth_event()
livepatch: Use the default ftrace_ops instead of REGS when ARGS is available
ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default
ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs
MAINTAINERS: assign ./fs/tracefs to TRACING
tracing: Fix some typos in comments
ftrace: Remove unused varible 'ret'
ring-buffer: Add recording of ring buffer recursion into recursed_functions
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb update from Konrad Rzeszutek Wilk:
"A generic (but for right now engaged only with AMD SEV) mechanism to
adjust a larger size SWIOTLB based on the total memory of the SEV
guests which right now require the bounce buffer for interacting with
the outside world.
Normal knobs (swiotlb=XYZ) still work"
* 'stable/for-linus-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
x86,swiotlb: Adjust SWIOTLB bounce buffer size for SEV guests
|
|
Merge misc updates from Andrew Morton:
- a few random little subsystems
- almost all of the MM patches which are staged ahead of linux-next
material. I'll trickle to post-linux-next work in as the dependents
get merged up.
Subsystems affected by this patch series: kthread, kbuild, ide, ntfs,
ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache,
gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation,
kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc,
uaccess, zram, and cleanups).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (200 commits)
mm: cleanup kstrto*() usage
mm: fix fall-through warnings for Clang
mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
mm:backing-dev: use sysfs_emit in macro defining functions
mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
mm: use sysfs_emit for struct kobject * uses
mm: fix kernel-doc markups
zram: break the strict dependency from lzo
zram: add stat to gather incompressible pages since zram set up
zram: support page writeback
mm/process_vm_access: remove redundant initialization of iov_r
mm/zsmalloc.c: rework the list_add code in insert_zspage()
mm/zswap: move to use crypto_acomp API for hardware acceleration
mm/zswap: fix passing zero to 'PTR_ERR' warning
mm/zswap: make struct kernel_param_ops definitions const
userfaultfd/selftests: hint the test runner on required privilege
userfaultfd/selftests: fix retval check for userfaultfd_open()
userfaultfd/selftests: always dump something in modes
userfaultfd: selftests: make __{s,u}64 format specifiers portable
...
|
|
As kernel expect to see only one of such mappings, any further operations
on the VMA-copy may be unexpected by the kernel. Maybe it's being on the
safe side, but there doesn't seem to be any expected use-case for this, so
restrict it now.
Link: https://lkml.kernel.org/r/20201013013416.390574-4-dima@arista.com
Fixes: commit e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Since commit 70e806e4e645 ("mm: Do early cow for pinned pages during
fork() for ptes") pages under a FOLL_PIN will not be write protected
during COW for fork. This means that pages returned from
pin_user_pages(FOLL_WRITE) should not become write protected while the pin
is active.
However, there is a small race where get_user_pages_fast(FOLL_PIN) can
establish a FOLL_PIN at the same time copy_present_page() is write
protecting it:
CPU 0 CPU 1
get_user_pages_fast()
internal_get_user_pages_fast()
copy_page_range()
pte_alloc_map_lock()
copy_present_page()
atomic_read(has_pinned) == 0
page_maybe_dma_pinned() == false
atomic_set(has_pinned, 1);
gup_pgd_range()
gup_pte_range()
pte_t pte = gup_get_pte(ptep)
pte_access_permitted(pte)
try_grab_compound_head()
pte = pte_wrprotect(pte)
set_pte_at();
pte_unmap_unlock()
// GUP now returns with a write protected page
The first attempt to resolve this by using the write protect caused
problems (and was missing a barrrier), see commit f3c64eda3e50 ("mm: avoid
early COW write protect games during fork()")
Instead wrap copy_p4d_range() with the write side of a seqcount and check
the read side around gup_pgd_range(). If there is a collision then
get_user_pages_fast() fails and falls back to slow GUP.
Slow GUP is safe against this race because copy_page_range() is only
called while holding the exclusive side of the mmap_lock on the src
mm_struct.
[akpm@linux-foundation.org: coding style fixes]
Link: https://lore.kernel.org/r/CAHk-=wi=iCnYCARbPGjkVJu9eyYeZ13N64tZYLdOB8CP5Q_PLw@mail.gmail.com
Link: https://lkml.kernel.org/r/2-v4-908497cf359a+4782-gup_fork_jgg@nvidia.com
Fixes: f3c64eda3e50 ("mm: avoid early COW write protect games during fork()")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: "Ahmed S. Darwish" <a.darwish@linutronix.de> [seqcount_t parts]
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Thomas Gleixner:
"Yet another large set of x86 interrupt management updates:
- Simplification and distangling of the MSI related functionality
- Let IO/APIC construct the RTE entries from an MSI message instead
of having IO/APIC specific code in the interrupt remapping drivers
- Make the retrieval of the parent interrupt domain (vector or remap
unit) less hardcoded and use the relevant irqdomain callbacks for
selection.
- Allow the handling of more than 255 CPUs without a virtualized
IOMMU when the hypervisor supports it. This has made been possible
by the above modifications and also simplifies the existing
workaround in the HyperV specific virtual IOMMU.
- Cleanup of the historical timer_works() irq flags related
inconsistencies"
* tag 'x86-apic-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
x86/ioapic: Cleanup the timer_works() irqflags mess
iommu/hyper-v: Remove I/O-APIC ID check from hyperv_irq_remapping_select()
iommu/amd: Fix IOMMU interrupt generation in X2APIC mode
iommu/amd: Don't register interrupt remapping irqdomain when IR is disabled
iommu/amd: Fix union of bitfields in intcapxt support
x86/ioapic: Correct the PCI/ISA trigger type selection
x86/ioapic: Use I/O-APIC ID for finding irqdomain, not index
x86/hyperv: Enable 15-bit APIC ID if the hypervisor supports it
x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected
iommu/hyper-v: Disable IRQ pseudo-remapping if 15 bit APIC IDs are available
x86/apic: Support 15 bits of APIC ID in MSI where available
x86/ioapic: Handle Extended Destination ID field in RTE
iommu/vt-d: Simplify intel_irq_remapping_select()
x86: Kill all traces of irq_remapping_get_irq_domain()
x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain
x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain
iommu/hyper-v: Implement select() method on remapping irqdomain
iommu/vt-d: Implement select() method on remapping irqdomain
iommu/amd: Implement select() method on remapping irqdomain
x86/apic: Add select() method on vector irqdomain
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull kmap updates from Thomas Gleixner:
"The new preemtible kmap_local() implementation:
- Consolidate all kmap_atomic() internals into a generic
implementation which builds the base for the kmap_local() API and
make the kmap_atomic() interface wrappers which handle the
disabling/enabling of preemption and pagefaults.
- Switch the storage from per-CPU to per task and provide scheduler
support for clearing mapping when scheduling out and restoring them
when scheduling back in.
- Merge the migrate_disable/enable() code, which is also part of the
scheduler pull request. This was required to make the kmap_local()
interface available which does not disable preemption when a
mapping is established. It has to disable migration instead to
guarantee that the virtual address of the mapped slot is the same
across preemption.
- Provide better debug facilities: guard pages and enforced
utilization of the mapping mechanics on 64bit systems when the
architecture allows it.
- Provide the new kmap_local() API which can now be used to cleanup
the kmap_atomic() usage sites all over the place. Most of the usage
sites do not require the implicit disabling of preemption and
pagefaults so the penalty on 64bit and 32bit non-highmem systems is
removed and quite some of the code can be simplified. A wholesale
conversion is not possible because some usage depends on the
implicit side effects and some need to be cleaned up because they
work around these side effects.
The migrate disable side effect is only effective on highmem
systems and when enforced debugging is enabled. On 64bit and 32bit
non-highmem systems the overhead is completely avoided"
* tag 'core-mm-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
ARM: highmem: Fix cache_is_vivt() reference
x86/crashdump/32: Simplify copy_oldmem_page()
io-mapping: Provide iomap_local variant
mm/highmem: Provide kmap_local*
sched: highmem: Store local kmaps in task struct
x86: Support kmap_local() forced debugging
mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
mm/highmem: Provide and use CONFIG_DEBUG_KMAP_LOCAL
microblaze/mm/highmem: Add dropped #ifdef back
xtensa/mm/highmem: Make generic kmap_atomic() work correctly
mm/highmem: Take kmap_high_get() properly into account
highmem: High implementation details and document API
Documentation/io-mapping: Remove outdated blurb
io-mapping: Cleanup atomic iomap
mm/highmem: Remove the old kmap_atomic cruft
highmem: Get rid of kmap_types.h
xtensa/mm/highmem: Switch to generic kmap atomic
sparc/mm/highmem: Switch to generic kmap atomic
powerpc/mm/highmem: Switch to generic kmap atomic
nds32/mm/highmem: Switch to generic kmap atomic
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:
- migrate_disable/enable() support which originates from the RT tree
and is now a prerequisite for the new preemptible kmap_local() API
which aims to replace kmap_atomic().
- A fair amount of topology and NUMA related improvements
- Improvements for the frequency invariant calculations
- Enhanced robustness for the global CPU priority tracking and decision
making
- The usual small fixes and enhancements all over the place
* tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
sched/fair: Trivial correction of the newidle_balance() comment
sched/fair: Clear SMT siblings after determining the core is not idle
sched: Fix kernel-doc markup
x86: Print ratio freq_max/freq_base used in frequency invariance calculations
x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC
x86, sched: Calculate frequency invariance for AMD systems
irq_work: Optimize irq_work_single()
smp: Cleanup smp_call_function*()
irq_work: Cleanup
sched: Limit the amount of NUMA imbalance that can exist at fork time
sched/numa: Allow a floating imbalance between NUMA nodes
sched: Avoid unnecessary calculation of load imbalance at clone time
sched/numa: Rename nr_running and break out the magic number
sched: Make migrate_disable/enable() independent of RT
sched/topology: Condition EAS enablement on FIE support
arm64: Rebuild sched domains on invariance status changes
sched/topology,schedutil: Wrap sched domains rebuild
sched/uclamp: Allow to reset a task uclamp constraint value
sched/core: Fix typos in comments
Documentation: scheduler: fix information on arch SD flags, sched_domain and sched_debug
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Thomas Gleixner:
"Core:
- Better handling of page table leaves on archictectures which have
architectures have non-pagetable aligned huge/large pages. For such
architectures a leaf can actually be part of a larger entry.
- Prevent a deadlock vs exec_update_mutex
Architectures:
- The related updates for page size calculation of leaf entries
- The usual churn to support new CPUs
- Small fixes and improvements all over the place"
* tag 'perf-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
perf/x86/intel: Add Tremont Topdown support
uprobes/x86: Fix fall-through warnings for Clang
perf/x86: Fix fall-through warnings for Clang
kprobes/x86: Fix fall-through warnings for Clang
perf/x86/intel/lbr: Fix the return type of get_lbr_cycles()
perf/x86/intel: Fix rtm_abort_event encoding on Ice Lake
x86/kprobes: Restore BTF if the single-stepping is cancelled
perf: Break deadlock involving exec_update_mutex
sparc64/mm: Implement pXX_leaf_size() support
powerpc/8xx: Implement pXX_leaf_size() support
arm64/mm: Implement pXX_leaf_size() support
perf/core: Fix arch_perf_get_page_size()
mm: Introduce pXX_leaf_size()
mm/gup: Provide gup_get_pte() more generic
perf/x86/intel: Add event constraint for CYCLE_ACTIVITY.STALLS_MEM_ANY
perf/x86/intel/uncore: Add Rocket Lake support
perf/x86/msr: Add Rocket Lake CPU support
perf/x86/cstate: Add Rocket Lake CPU support
perf/x86/intel: Add Rocket Lake CPU support
perf,mm: Handle non-page-table-aligned hugetlbfs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Thomas Gleixner:
"RCU, LKMM and KCSAN updates collected by Paul McKenney.
RCU:
- Avoid cpuinfo-induced IPI pileups and idle-CPU IPIs
- Lockdep-RCU updates reducing the need for __maybe_unused
- Tasks-RCU updates
- Miscellaneous fixes
- Documentation updates
- Torture-test updates
KCSAN:
- updates for selftests, avoiding setting watchpoints on NULL pointers
- fix to watchpoint encoding
LKMM:
- updates for documentation along with some updates to example-code
litmus tests"
* tag 'core-rcu-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
srcu: Take early exit on memory-allocation failure
rcu/tree: Defer kvfree_rcu() allocation to a clean context
rcu: Do not report strict GPs for outgoing CPUs
rcu: Fix a typo in rcu_blocking_is_gp() header comment
rcu: Prevent lockdep-RCU splats on lock acquisition/release
rcu/tree: nocb: Avoid raising softirq for offloaded ready-to-execute CBs
rcu,ftrace: Fix ftrace recursion
rcu/tree: Make struct kernel_param_ops definitions const
rcu/tree: Add a warning if CPU being onlined did not report QS already
rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config
rcu: Fix single-CPU check in rcu_blocking_is_gp()
rcu: Implement rcu_segcblist_is_offloaded() config dependent
list.h: Update comment to explicitly note circular lists
rcu: Panic after fixed number of stalls
x86/smpboot: Move rcu_cpu_starting() earlier
rcu: Allow rcu_irq_enter_check_tick() from NMI
tools/memory-model: Label MP tests' producers and consumers
tools/memory-model: Use "buf" and "flag" for message-passing tests
tools/memory-model: Add types to litmus tests
tools/memory-model: Add a glossary of LKMM terms
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core entry/exit updates from Thomas Gleixner:
"A set of updates for entry/exit handling:
- More generalization of entry/exit functionality
- The consolidation work to reclaim TIF flags on x86 and also for
non-x86 specific TIF flags which are solely relevant for syscall
related work and have been moved into their own storage space. The
x86 specific part had to be merged in to avoid a major conflict.
- The TIF_NOTIFY_SIGNAL work which replaces the inefficient signal
delivery mode of task work and results in an impressive performance
improvement for io_uring. The non-x86 consolidation of this is
going to come seperate via Jens.
- The selective syscall redirection facility which provides a clean
and efficient way to support the non-Linux syscalls of WINE by
catching them at syscall entry and redirecting them to the user
space emulation. This can be utilized for other purposes as well
and has been designed carefully to avoid overhead for the regular
fastpath. This includes the core changes and the x86 support code.
- Simplification of the context tracking entry/exit handling for the
users of the generic entry code which guarantee the proper ordering
and protection.
- Preparatory changes to make the generic entry code accomodate S390
specific requirements which are mostly related to their syscall
restart mechanism"
* tag 'core-entry-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
entry: Add syscall_exit_to_user_mode_work()
entry: Add exit_to_user_mode() wrapper
entry_Add_enter_from_user_mode_wrapper
entry: Rename exit_to_user_mode()
entry: Rename enter_from_user_mode()
docs: Document Syscall User Dispatch
selftests: Add benchmark for syscall user dispatch
selftests: Add kselftest for syscall user dispatch
entry: Support Syscall User Dispatch on common syscall entry
kernel: Implement selective syscall userspace redirection
signal: Expose SYS_USER_DISPATCH si_code type
x86: vdso: Expose sigreturn address on vdso to the kernel
MAINTAINERS: Add entry for common entry code
entry: Fix boot for !CONFIG_GENERIC_ENTRY
x86: Support HAVE_CONTEXT_TRACKING_OFFSTACK
context_tracking: Only define schedule_user() on !HAVE_CONTEXT_TRACKING_OFFSTACK archs
sched: Detect call to schedule from critical entry code
context_tracking: Don't implement exception_enter/exit() on CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK
context_tracking: Introduce HAVE_CONTEXT_TRACKING_OFFSTACK
x86: Reclaim unused x86 TI flags
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- Expose tag address bits in siginfo. The original arm64 ABI did not
expose any of the bits 63:56 of a tagged address in siginfo. In the
presence of user ASAN or MTE, this information may be useful. The
implementation is generic to other architectures supporting tags
(like SPARC ADI, subject to wiring up the arch code). The user will
have to opt in via sigaction(SA_EXPOSE_TAGBITS) so that the extra
bits, if available, become visible in si_addr.
- Default to 32-bit wide ZONE_DMA. Previously, ZONE_DMA was set to the
lowest 1GB to cope with the Raspberry Pi 4 limitations, to the
detriment of other platforms. With these changes, the kernel scans
the Device Tree dma-ranges and the ACPI IORT information before
deciding on a smaller ZONE_DMA.
- Strengthen READ_ONCE() to acquire when CONFIG_LTO=y. When building
with LTO, there is an increased risk of the compiler converting an
address dependency headed by a READ_ONCE() invocation into a control
dependency and consequently allowing for harmful reordering by the
CPU.
- Add CPPC FFH support using arm64 AMU counters.
- set_fs() removal on arm64. This renders the User Access Override
(UAO) ARMv8 feature unnecessary.
- Perf updates: PMU driver for the ARM DMC-620 memory controller, sysfs
identifier file for SMMUv3, stop event counters support for i.MX8MP,
enable the perf events-based hard lockup detector.
- Reorganise the kernel VA space slightly so that 52-bit VA
configurations can use more virtual address space.
- Improve the robustness of the arm64 memory offline event notifier.
- Pad the Image header to 64K following the EFI header definition
updated recently to increase the section alignment to 64K.
- Support CONFIG_CMDLINE_EXTEND on arm64.
- Do not use tagged PC in the kernel (TCR_EL1.TBID1==1), freeing up 8
bits for PtrAuth.
- Switch to vmapped shadow call stacks.
- Miscellaneous clean-ups.
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (78 commits)
perf/imx_ddr: Add system PMU identifier for userspace
bindings: perf: imx-ddr: add compatible string
arm64: Fix build failure when HARDLOCKUP_DETECTOR_PERF is enabled
arm64: mte: fix prctl(PR_GET_TAGGED_ADDR_CTRL) if TCF0=NONE
arm64: mark __system_matches_cap as __maybe_unused
arm64: uaccess: remove vestigal UAO support
arm64: uaccess: remove redundant PAN toggling
arm64: uaccess: remove addr_limit_user_check()
arm64: uaccess: remove set_fs()
arm64: uaccess cleanup macro naming
arm64: uaccess: split user/kernel routines
arm64: uaccess: refactor __{get,put}_user
arm64: uaccess: simplify __copy_user_flushcache()
arm64: uaccess: rename privileged uaccess routines
arm64: sdei: explicitly simulate PAN/UAO entry
arm64: sdei: move uaccess logic to arch/arm64/
arm64: head.S: always initialize PSTATE
arm64: head.S: cleanup SCTLR_ELx initialization
arm64: head.S: rename el2_setup -> init_kernel_el
arm64: add C wrappers for SET_PSTATE_*()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Borislav Petkov:
"Two x86 build fixes:
- Fix the vmlinux size check on 64-bit along with adding useful
clarifications on the topic (Arvind Sankar)
- Remove -m16 workaround now that the GCC versions that need it are
unsupported (Nick Desaulniers)"
* tag 'x86_build_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Remove -m16 workaround for unsupported versions of GCC
x86/build: Fix vmlinux size check on 64-bit
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache resource control updates from Borislav Petkov:
- add logic to correct MBM total and local values fixing errata SKX99
and BDF102 (Fenghua Yu)
- cleanups
* tag 'x86_cache_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Clean up unused function parameter in rmdir path
x86/resctrl: Constify kernfs_ops
x86/resctrl: Correct MBM total and local values
Documentation/x86: Rename resctrl_ui.rst and add two errata to the file
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:
"Another branch with a nicely negative diffstat, just the way I
like 'em:
- Remove all uses of TIF_IA32 and TIF_X32 and reclaim the two bits in
the end (Gabriel Krisman Bertazi)
- All kinds of minor cleanups all over the tree"
* tag 'x86_cleanups_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/ia32_signal: Propagate __user annotation properly
x86/alternative: Update text_poke_bp() kernel-doc comment
x86/PCI: Make a kernel-doc comment a normal one
x86/asm: Drop unused RDPID macro
x86/boot/compressed/64: Use TEST %reg,%reg instead of CMP $0,%reg
x86/head64: Remove duplicate include
x86/mm: Declare 'start' variable where it is used
x86/head/64: Remove unused GET_CR2_INTO() macro
x86/boot: Remove unused finalize_identity_maps()
x86/uaccess: Document copy_from_user_nmi()
x86/dumpstack: Make show_trace_log_lvl() static
x86/mtrr: Fix a kernel-doc markup
x86/setup: Remove unused MCA variables
x86, libnvdimm/test: Remove COPY_MC_TEST
x86: Reclaim TIF_IA32 and TIF_X32
x86/mm: Convert mmu context ia32_compat into a proper flags field
x86/elf: Use e_machine to check for x32/ia32 in setup_additional_pages()
elf: Expose ELF header on arch_setup_additional_pages()
x86/elf: Use e_machine to select start_thread for x32
elf: Expose ELF header in compat_start_thread()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Borislav Petkov:
"The main part of this branch is the ongoing fight against windmills in
an attempt to have userspace tools not poke at naked MSRs.
This round deals with MSR_IA32_ENERGY_PERF_BIAS and removes direct
poking into it by our in-tree tools in favor of the proper
"energy_perf_bias" sysfs interface which we already have.
In addition, the msr.ko write filtering's error message points to a
new summary page which contains the info we collected from helpful
reporters about which userspace tools write MSRs:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/about
along with the current status of their conversion.
The rest is the usual small fixes and improvements"
* tag 'x86_misc_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/msr: Add a pointer to an URL which contains further details
x86/pci: Fix the function type for check_reserved_t
selftests/x86: Add missing .note.GNU-stack sections
selftests/x86/fsgsbase: Fix GS == 1, 2, and 3 tests
x86/msr: Downgrade unrecognized MSR message
x86/msr: Do not allow writes to MSR_IA32_ENERGY_PERF_BIAS
tools/power/x86_energy_perf_policy: Read energy_perf_bias from sysfs
tools/power/turbostat: Read energy_perf_bias from sysfs
tools/power/cpupower: Read energy_perf_bias from sysfs
MAINTAINERS: Cleanup SGI-related entries
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Borislav Petkov:
- add a new uv_sysfs driver and expose read-only information from UV
BIOS (Justin Ernst and Mike Travis)
- the usual set of small fixes
* tag 'x86_platform_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/uv: Update sysfs documentation
x86/platform/uv: Add deprecated messages to /proc info leaves
x86/platform/uv: Add sysfs hubless leaves
x86/platform/uv: Add sysfs leaves to replace those in procfs
x86/platform/uv: Add kernel interfaces for obtaining system info
x86/platform/uv: Make uv_pcibus_kset and uv_hubs_kset static
x86/platform/uv: Fix an error code in uv_hubs_init()
x86/platform/uv: Update MAINTAINERS for uv_sysfs driver
x86/platform/uv: Update ABI documentation of /sys/firmware/sgi_uv/
x86/platform/uv: Add new uv_sysfs platform driver
x86/platform/uv: Add and export uv_bios_* functions
x86/platform/uv: Remove existing /sys/firmware/sgi_uv/ interface
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Borislav Petkov:
"Only AMD-specific changes this time:
- Save the AMD physical die ID into cpuinfo_x86.cpu_die_id and
convert all code to use it (Yazen Ghannam)
- Remove a dead and unused TSEG region remapping workaround on AMD
(Arvind Sankar)"
* tag 'x86_cpu_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu/amd: Remove dead code for TSEG region remapping
x86/topology: Set cpu_die_id only if DIE_TYPE found
EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId
x86/CPU/AMD: Remove amd_get_nb_id()
x86/CPU/AMD: Save AMD NodeId as cpu_die_id
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SGC support from Borislav Petkov:
"Intel Software Guard eXtensions enablement. This has been long in the
making, we were one revision number short of 42. :)
Intel SGX is new hardware functionality that can be used by
applications to populate protected regions of user code and data
called enclaves. Once activated, the new hardware protects enclave
code and data from outside access and modification.
Enclaves provide a place to store secrets and process data with those
secrets. SGX has been used, for example, to decrypt video without
exposing the decryption keys to nosy debuggers that might be used to
subvert DRM. Software has generally been rewritten specifically to run
in enclaves, but there are also projects that try to run limited
unmodified software in enclaves.
Most of the functionality is concentrated into arch/x86/kernel/cpu/sgx/
except the addition of a new mprotect() hook to control enclave page
permissions and support for vDSO exceptions fixup which will is used
by SGX enclaves.
All this work by Sean Christopherson, Jarkko Sakkinen and many others"
* tag 'x86_sgx_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
x86/sgx: Return -EINVAL on a zero length buffer in sgx_ioc_enclave_add_pages()
x86/sgx: Fix a typo in kernel-doc markup
x86/sgx: Fix sgx_ioc_enclave_provision() kernel-doc comment
x86/sgx: Return -ERESTARTSYS in sgx_ioc_enclave_add_pages()
selftests/sgx: Use a statically generated 3072-bit RSA key
x86/sgx: Clarify 'laundry_list' locking
x86/sgx: Update MAINTAINERS
Documentation/x86: Document SGX kernel architecture
x86/sgx: Add ptrace() support for the SGX driver
x86/sgx: Add a page reclaimer
selftests/x86: Add a selftest for SGX
x86/vdso: Implement a vDSO for Intel SGX enclave call
x86/traps: Attempt to fixup exceptions in vDSO before signaling
x86/fault: Add a helper function to sanitize error code
x86/vdso: Add support for exception fixup in vDSO functions
x86/sgx: Add SGX_IOC_ENCLAVE_PROVISION
x86/sgx: Add SGX_IOC_ENCLAVE_INIT
x86/sgx: Add SGX_IOC_ENCLAVE_ADD_PAGES
x86/sgx: Add SGX_IOC_ENCLAVE_CREATE
x86/sgx: Add an SGX misc driver interface
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loader update from Borislav Petkov:
"This one wins the award for most boring pull request ever. But that's
a good thing - this is how I like 'em and the microcode loader
*should* be boring. :-)
A single cleanup removing "break" after a return statement (Tom Rix)"
* tag 'x86_microcode_update_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/amd: Remove unneeded break
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:
- Enable additional logging mode on older Xeons (Tony Luck)
- Pass error records logged by firmware through the MCE decoding chain
to provide human-readable error descriptions instead of raw values
(Smita Koralahalli)
- Some #MC handler fixes (Gabriele Paoloni)
- The usual small fixes and cleanups all over.
* tag 'ras_updates_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Rename kill_it to kill_current_task
x86/mce: Remove redundant call to irq_work_queue()
x86/mce: Panic for LMCE only if mca_cfg.tolerant < 3
x86/mce: Move the mce_panic() call and 'kill_it' assignments to the right places
x86/mce, cper: Pass x86 CPER through the MCA handling chain
x86/mce: Use "safe" MSR functions when enabling additional error logging
x86/mce: Correct the detection of invalid notifier priorities
x86/mce: Assign boolean values to a bool variable
x86/mce: Enable additional error logging on certain Intel CPUs
x86/mce: Remove unneeded break
|
|
Commit
7705dc855797 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes")
changed the padding bytes between functions from NOP to INT3. However,
when optprobe decodes a target function it finds INT3 and gives up the
jump optimization.
Instead of giving up any INT3 detection, check whether the rest of the
bytes to the end of the function are INT3. If all of them are INT3,
those come from the linker. In that case, continue the optprobe jump
optimization.
[ bp: Massage commit message. ]
Fixes: 7705dc855797 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes")
Reported-by: Adam Zabrocki <pi3@pi3.com.pl>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/160767025681.3880685.16021570341428835411.stgit@devnote2
|
|
For SEV, all DMA to and from guest has to use shared (un-encrypted) pages.
SEV uses SWIOTLB to make this happen without requiring changes to device
drivers. However, depending on the workload being run, the default 64MB
of it might not be enough and it may run out of buffers to use for DMA,
resulting in I/O errors and/or performance degradation for high
I/O workloads.
Adjust the default size of SWIOTLB for SEV guests using a
percentage of the total memory available to guest for the SWIOTLB buffers.
Adds a new sev_setup_arch() function which is invoked from setup_arch()
and it calls into a new swiotlb generic code function swiotlb_adjust_size()
to do the SWIOTLB buffer adjustment.
v5 fixed build errors and warnings as
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Co-developed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
The value freq_max/freq_base is a fundamental component of frequency
invariance calculations. It may come from a variety of sources such as MSRs
or ACPI data, tracking it down when troubleshooting a system could be
non-trivial. It is worth saving it in the kernel logs.
# dmesg | grep 'Estimated ratio of average max'
[ 14.024036] smpboot: Estimated ratio of average max frequency by base frequency (times 1024): 1289
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20201112182614.10700-4-ggherdovich@suse.cz
|
|
AMD EPYC
Frequency invariant accounting calculations need the ratio
freq_curr/freq_max, but freq_max is unknown as it depends on dynamic power
allocation between cores: AMD EPYC CPUs implement "Core Performance Boost".
Three candidates are considered to estimate this value:
- maximum non-boost frequency
- maximum boost frequency
- the mid point between the above two
Experimental data on an AMD EPYC Zen2 machine slightly favors the third
option, which is applied with this patch.
The analysis uses the ondemand cpufreq governor as baseline, and compares
it with schedutil in a number of configurations. Using the freq_max value
described above offers a moderate advantage in performance and efficiency:
sugov-max (freq_max=max_boost) performs the worst on tbench: less
throughput and reduced efficiency than the other invariant-schedutil
options (see "Data Overview" below). Consider that tbench is generally a
problematic case as no schedutil version currently is better than ondemand.
sugov-P0 (freq_max=max_P) is the worst on dbench, while the other sugov's
can surpass ondemand with less filesystem latency and slightly increased
efficiency.
1. DATA OVERVIEW
2. DETAILED PERFORMANCE TABLES
3. POWER CONSUMPTION TABLE
1. DATA OVERVIEW
================
sugov-noinv : non-invariant schedutil governor
sugov-max : invariant schedutil, freq_max=max_boost
sugov-mid : invariant schedutil, freq_max=midpoint
sugov-P0 : invariant schedutil, freq_max=max_P
perfgov : performance governor
driver : acpi_cpufreq
machine : AMD EPYC 7742 (Zen2, aka "Rome"), dual socket,
128 cores / 256 threads, SATA SSD storage, 250G of memory,
XFS filesystem
Benchmarks are described in the next section.
Tilde (~) means the value is the same as baseline.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ondemand perfgov sugov-noinv sugov-max sugov-mid sugov-P0 better if
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
PERFORMANCE RATIOS
tbench 1.00 1.44 0.90 0.87 0.93 0.93 higher
dbench 1.00 0.91 0.95 0.94 0.94 1.06 lower
kernbench 1.00 0.93 ~ ~ ~ 0.97 lower
gitsource 1.00 0.66 0.97 0.96 ~ 0.95 lower
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
PERFORMANCE-PER-WATT RATIOS
tbench 1.00 1.16 0.84 0.84 0.88 0.85 higher
dbench 1.00 1.03 1.02 1.02 1.02 0.93 higher
kernbench 1.00 1.05 ~ ~ ~ ~ higher
gitsource 1.00 1.46 1.04 1.04 ~ 1.05 higher
2. DETAILED PERFORMANCE TABLES
==============================
Benchmark : tbench4 (i.e. dbench4 over the network, actually loopback)
Varying parameter : number of clients
Unit : MB/sec (higher is better)
5.9.0-ondemand (BASELINE) 5.9.0-perfgov 5.9.0-sugov-noinv
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Hmean 1 427.19 +- 0.16% ( ) 778.35 +- 0.10% ( 82.20%) 346.92 +- 0.14% ( -18.79%)
Hmean 2 853.82 +- 0.09% ( ) 1536.23 +- 0.03% ( 79.93%) 694.36 +- 0.05% ( -18.68%)
Hmean 4 1657.54 +- 0.12% ( ) 2938.18 +- 0.12% ( 77.26%) 1362.81 +- 0.11% ( -17.78%)
Hmean 8 3301.87 +- 0.06% ( ) 5679.10 +- 0.04% ( 72.00%) 2693.35 +- 0.04% ( -18.43%)
Hmean 16 6139.65 +- 0.05% ( ) 9498.81 +- 0.04% ( 54.71%) 4889.97 +- 0.17% ( -20.35%)
Hmean 32 11170.28 +- 0.09% ( ) 17393.25 +- 0.08% ( 55.71%) 9104.55 +- 0.09% ( -18.49%)
Hmean 64 19322.97 +- 0.17% ( ) 31573.91 +- 0.08% ( 63.40%) 18552.52 +- 0.40% ( -3.99%)
Hmean 128 30383.71 +- 0.11% ( ) 37416.91 +- 0.15% ( 23.15%) 25938.70 +- 0.41% ( -14.63%)
Hmean 256 31143.96 +- 0.41% ( ) 30908.76 +- 0.88% ( -0.76%) 29754.32 +- 0.24% ( -4.46%)
Hmean 512 30858.49 +- 0.26% ( ) 38524.60 +- 1.19% ( 24.84%) 42080.39 +- 0.56% ( 36.37%)
Hmean 1024 39187.37 +- 0.19% ( ) 36213.86 +- 0.26% ( -7.59%) 39555.98 +- 0.12% ( 0.94%)
5.9.0-sugov-max 5.9.0-sugov-mid 5.9.0-sugov-P0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Hmean 1 352.59 +- 1.03% ( -17.46%) 352.08 +- 0.75% ( -17.58%) 352.31 +- 1.48% ( -17.53%)
Hmean 2 697.32 +- 0.08% ( -18.33%) 700.16 +- 0.20% ( -18.00%) 696.79 +- 0.06% ( -18.39%)
Hmean 4 1369.88 +- 0.04% ( -17.35%) 1369.72 +- 0.07% ( -17.36%) 1365.91 +- 0.05% ( -17.59%)
Hmean 8 2696.79 +- 0.04% ( -18.33%) 2711.06 +- 0.04% ( -17.89%) 2715.10 +- 0.61% ( -17.77%)
Hmean 16 4725.03 +- 0.03% ( -23.04%) 4875.65 +- 0.02% ( -20.59%) 4953.05 +- 0.28% ( -19.33%)
Hmean 32 9231.65 +- 0.10% ( -17.36%) 8704.89 +- 0.27% ( -22.07%) 10562.02 +- 0.36% ( -5.45%)
Hmean 64 15364.27 +- 0.19% ( -20.49%) 17786.64 +- 0.15% ( -7.95%) 19665.40 +- 0.22% ( 1.77%)
Hmean 128 42100.58 +- 0.13% ( 38.56%) 34946.28 +- 0.13% ( 15.02%) 38635.79 +- 0.06% ( 27.16%)
Hmean 256 30660.23 +- 1.08% ( -1.55%) 32307.67 +- 0.54% ( 3.74%) 31153.27 +- 0.12% ( 0.03%)
Hmean 512 24604.32 +- 0.14% ( -20.27%) 40408.50 +- 1.10% ( 30.95%) 38800.29 +- 1.23% ( 25.74%)
Hmean 1024 35535.47 +- 0.28% ( -9.32%) 41070.38 +- 2.56% ( 4.81%) 31308.29 +- 2.52% ( -20.11%)
Benchmark : dbench (filesystem stressor)
Varying parameter : number of clients
Unit : seconds (lower is better)
NOTE-1: This dbench version measures the average latency of a set of filesystem
operations, as we found the traditional dbench metric (throughput) to be
misleading.
NOTE-2: Due to high variability, we partition the original dataset and apply
statistical bootrapping (a resampling method). Accuracy is reported in the
form of 95% confidence intervals.
5.9.0-ondemand (BASELINE) 5.9.0-perfgov 5.9.0-sugov-noinv
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
SubAmean 1 98.79 +- 0.92 ( ) 83.36 +- 0.82 ( 15.62%) 84.82 +- 0.92 ( 14.14%)
SubAmean 2 116.00 +- 0.89 ( ) 102.12 +- 0.77 ( 11.96%) 109.63 +- 0.89 ( 5.49%)
SubAmean 4 149.90 +- 1.03 ( ) 132.12 +- 0.91 ( 11.86%) 143.90 +- 1.15 ( 4.00%)
SubAmean 8 182.41 +- 1.13 ( ) 159.86 +- 0.93 ( 12.36%) 165.82 +- 1.03 ( 9.10%)
SubAmean 16 237.83 +- 1.23 ( ) 219.46 +- 1.14 ( 7.72%) 229.28 +- 1.19 ( 3.59%)
SubAmean 32 334.34 +- 1.49 ( ) 309.94 +- 1.42 ( 7.30%) 321.19 +- 1.36 ( 3.93%)
SubAmean 64 576.61 +- 2.16 ( ) 540.75 +- 2.00 ( 6.22%) 551.27 +- 1.99 ( 4.39%)
SubAmean 128 1350.07 +- 4.14 ( ) 1205.47 +- 3.20 ( 10.71%) 1280.26 +- 3.75 ( 5.17%)
SubAmean 256 3444.42 +- 7.97 ( ) 3698.00 +- 27.43 ( -7.36%) 3494.14 +- 7.81 ( -1.44%)
SubAmean 2048 39457.89 +- 29.01 ( ) 34105.33 +- 41.85 ( 13.57%) 39688.52 +- 36.26 ( -0.58%)
5.9.0-sugov-max 5.9.0-sugov-mid 5.9.0-sugov-P0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
SubAmean 1 85.68 +- 1.04 ( 13.27%) 84.16 +- 0.84 ( 14.81%) 83.99 +- 0.90 ( 14.99%)
SubAmean 2 108.42 +- 0.95 ( 6.54%) 109.91 +- 1.39 ( 5.24%) 112.06 +- 0.91 ( 3.39%)
SubAmean 4 136.90 +- 1.04 ( 8.67%) 137.59 +- 0.93 ( 8.21%) 136.55 +- 0.95 ( 8.91%)
SubAmean 8 163.15 +- 0.96 ( 10.56%) 166.07 +- 1.02 ( 8.96%) 165.81 +- 0.99 ( 9.10%)
SubAmean 16 224.86 +- 1.12 ( 5.45%) 223.83 +- 1.06 ( 5.89%) 230.66 +- 1.19 ( 3.01%)
SubAmean 32 320.51 +- 1.38 ( 4.13%) 322.85 +- 1.49 ( 3.44%) 321.96 +- 1.46 ( 3.70%)
SubAmean 64 553.25 +- 1.93 ( 4.05%) 554.19 +- 2.08 ( 3.89%) 562.26 +- 2.22 ( 2.49%)
SubAmean 128 1264.35 +- 3.72 ( 6.35%) 1256.99 +- 3.46 ( 6.89%) 2018.97 +- 18.79 ( -49.55%)
SubAmean 256 3466.25 +- 8.25 ( -0.63%) 3450.58 +- 8.44 ( -0.18%) 5032.12 +- 38.74 ( -46.09%)
SubAmean 2048 39133.10 +- 45.71 ( 0.82%) 39905.95 +- 34.33 ( -1.14%) 53811.86 +-193.04 ( -36.38%)
Benchmark : kernbench (kernel compilation)
Varying parameter : number of jobs
Unit : seconds (lower is better)
5.9.0-ondemand (BASELINE) 5.9.0-perfgov 5.9.0-sugov-noinv
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Amean 2 471.71 +- 26.61% ( ) 409.88 +- 16.99% ( 13.11%) 430.63 +- 0.18% ( 8.71%)
Amean 4 211.87 +- 0.58% ( ) 194.03 +- 0.74% ( 8.42%) 215.33 +- 0.64% ( -1.63%)
Amean 8 109.79 +- 1.27% ( ) 101.43 +- 1.53% ( 7.61%) 111.05 +- 1.95% ( -1.15%)
Amean 16 59.50 +- 1.28% ( ) 55.61 +- 1.35% ( 6.55%) 59.65 +- 1.78% ( -0.24%)
Amean 32 34.94 +- 1.22% ( ) 32.36 +- 1.95% ( 7.41%) 35.44 +- 0.63% ( -1.43%)
Amean 64 22.58 +- 0.38% ( ) 20.97 +- 1.28% ( 7.11%) 22.41 +- 1.73% ( 0.74%)
Amean 128 17.72 +- 0.44% ( ) 16.68 +- 0.32% ( 5.88%) 17.65 +- 0.96% ( 0.37%)
Amean 256 16.44 +- 0.53% ( ) 15.76 +- 0.32% ( 4.18%) 16.76 +- 0.60% ( -1.93%)
Amean 512 16.54 +- 0.21% ( ) 15.62 +- 0.41% ( 5.53%) 16.84 +- 0.85% ( -1.83%)
5.9.0-sugov-max 5.9.0-sugov-mid 5.9.0-sugov-P0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Amean 2 421.30 +- 0.24% ( 10.69%) 419.26 +- 0.15% ( 11.12%) 414.38 +- 0.33% ( 12.15%)
Amean 4 217.81 +- 5.53% ( -2.80%) 211.63 +- 0.99% ( 0.12%) 208.43 +- 0.47% ( 1.63%)
Amean 8 108.80 +- 0.43% ( 0.90%) 108.48 +- 1.44% ( 1.19%) 108.59 +- 3.08% ( 1.09%)
Amean 16 58.84 +- 0.74% ( 1.12%) 58.37 +- 0.94% ( 1.91%) 57.78 +- 0.78% ( 2.90%)
Amean 32 34.04 +- 2.00% ( 2.59%) 34.28 +- 1.18% ( 1.91%) 33.98 +- 2.21% ( 2.75%)
Amean 64 22.22 +- 1.69% ( 1.60%) 22.27 +- 1.60% ( 1.38%) 22.25 +- 1.41% ( 1.47%)
Amean 128 17.55 +- 0.24% ( 0.97%) 17.53 +- 0.94% ( 1.04%) 17.49 +- 0.43% ( 1.30%)
Amean 256 16.51 +- 0.46% ( -0.40%) 16.48 +- 0.48% ( -0.19%) 16.44 +- 1.21% ( 0.00%)
Amean 512 16.50 +- 0.35% ( 0.19%) 16.35 +- 0.42% ( 1.14%) 16.37 +- 0.33% ( 0.99%)
Benchmark : gitsource (time to run the git unit test suite)
Varying parameter : none
Unit : seconds (lower is better)
5.9.0-ondemand (BASELINE) 5.9.0-perfgov 5.9.0-sugov-noinv
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Amean 1035.76 +- 0.30% ( ) 688.21 +- 0.04% ( 33.56%) 1003.85 +- 0.14% ( 3.08%)
5.9.0-sugov-max 5.9.0-sugov-mid 5.9.0-sugov-P0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Amean 995.82 +- 0.08% ( 3.86%) 1011.98 +- 0.03% ( 2.30%) 986.87 +- 0.19% ( 4.72%)
3. POWER CONSUMPTION TABLE
==========================
Average power consumption (watts).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ondemand perfgov sugov-noinv sugov-max sugov-mid sugov-P0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
tbench4 227.25 281.83 244.17 236.76 241.50 247.99
dbench4 151.97 161.87 157.08 158.10 158.06 153.73
kernbench 162.78 167.22 162.90 164.19 164.65 164.72
gitsource 133.65 139.00 133.04 134.43 134.18 134.32
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20201112182614.10700-3-ggherdovich@suse.cz
|
|
This is the first pass in creating the ability to calculate the
frequency invariance on AMD systems. This approach uses the CPPC
highest performance and nominal performance values that range from
0 - 255 instead of a high and base frquency. This is because we do
not have the ability on AMD to get a highest frequency value.
On AMD systems the highest performance and nominal performance
vaues do correspond to the highest and base frequencies for the system
so using them should produce an appropriate ratio but some tweaking
is likely necessary.
Due to CPPC being initialized later in boot than when the frequency
invariant calculation is currently made, I had to create a callback
from the CPPC init code to do the calculation after we have CPPC
data.
Special thanks to "kernel test robot <lkp@intel.com>" for reporting that
compilation of drivers/acpi/cppc_acpi.c is conditional to
CONFIG_ACPI_CPPC_LIB, not just CONFIG_ACPI.
[ ggherdovich@suse.cz: made safe under CPU hotplug, edited changelog. ]
Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20201112182614.10700-2-ggherdovich@suse.cz
|
|
Mark tripped over the creative irqflags handling in the IO-APIC timer
delivery check which ends up doing:
local_irq_save(flags);
local_irq_enable();
local_irq_restore(flags);
which triggered a new consistency check he's working on required for
replacing the POPF based restore with a conditional STI.
That code is a historical mess and none of this is needed. Make it
straightforward use local_irq_disable()/enable() as that's all what is
required. It is invoked from interrupt enabled code nowadays.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/87k0tpju47.fsf@nanos.tec.linutronix.de
|
|
Prarit reported that depending on the affinity setting the
' irq $N: Affinity broken due to vector space exhaustion.'
message is showing up in dmesg, but the vector space on the CPUs in the
affinity mask is definitely not exhausted.
Shung-Hsi provided traces and analysis which pinpoints the problem:
The ordering of trying to assign an interrupt vector in
assign_irq_vector_any_locked() is simply wrong if the interrupt data has a
valid node assigned. It does:
1) Try the intersection of affinity mask and node mask
2) Try the node mask
3) Try the full affinity mask
4) Try the full online mask
Obviously #2 and #3 are in the wrong order as the requested affinity
mask has to take precedence.
In the observed cases #1 failed because the affinity mask did not contain
CPUs from node 0. That made it allocate a vector from node 0, thereby
breaking affinity and emitting the misleading message.
Revert the order of #2 and #3 so the full affinity mask without the node
intersection is tried before actually affinity is broken.
If no node is assigned then only the full affinity mask and if that fails
the full online mask is tried.
Fixes: d6ffc6ac83b1 ("x86/vector: Respect affinity mask in irq descriptor")
Reported-by: Prarit Bhargava <prarit@redhat.com>
Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87ft4djtyp.fsf@nanos.tec.linutronix.de
|
|
The MBA software controller (mba_sc) is a feedback loop which
periodically reads MBM counters and tries to restrict the bandwidth
below a user-specified value. It tags along the MBM counter overflow
handler to do the updates with 1s interval in mbm_update() and
update_mba_bw().
The purpose of mbm_update() is to periodically read the MBM counters to
make sure that the hardware counter doesn't wrap around more than once
between user samplings. mbm_update() calls __mon_event_count() for local
bandwidth updating when mba_sc is not enabled, but calls mbm_bw_count()
instead when mba_sc is enabled. __mon_event_count() will not be called
for local bandwidth updating in MBM counter overflow handler, but it is
still called when reading MBM local bandwidth counter file
'mbm_local_bytes', the call path is as below:
rdtgroup_mondata_show()
mon_event_read()
mon_event_count()
__mon_event_count()
In __mon_event_count(), m->chunks is updated by delta chunks which is
calculated from previous MSR value (m->prev_msr) and current MSR value.
When mba_sc is enabled, m->chunks is also updated in mbm_update() by
mistake by the delta chunks which is calculated from m->prev_bw_msr
instead of m->prev_msr. But m->chunks is not used in update_mba_bw() in
the mba_sc feedback loop.
When reading MBM local bandwidth counter file, m->chunks was changed
unexpectedly by mbm_bw_count(). As a result, the incorrect local
bandwidth counter which calculated from incorrect m->chunks is shown to
the user.
Fix this by removing incorrect m->chunks updating in mbm_bw_count() in
MBM counter overflow handler, and always calling __mon_event_count() in
mbm_update() to make sure that the hardware local bandwidth counter
doesn't wrap around.
Test steps:
# Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat
&& make
./tools/membw/membw -c 0 -b 10000 --read
# Enable MBA software controller
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
# Create control group c1
mkdir /sys/fs/resctrl/c1
# Set MB throttle to 6 GB/s
echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata
# Write PID of the workload to tasks file
echo `pidof membw` > /sys/fs/resctrl/c1/tasks
# Read local bytes counters twice with 1s interval, the calculated
# local bandwidth is not as expected (approaching to 6 GB/s):
local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
sleep 1
local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
echo "local b/w (bytes/s):" `expr $local_2 - $local_1`
Before fix:
local b/w (bytes/s): 11076796416
After fix:
local b/w (bytes/s): 5465014272
Fixes: ba0f26d8529c (x86/intel_rdt/mba_sc: Prepare for feedback loop)
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/1607063279-19437-1-git-send-email-xiaochen.shen@intel.com
|
|
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of letting the code fall
through to the next case.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://github.com/KSPP/linux/issues/115
|
|
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of just letting the code
fall through to the next case.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://github.com/KSPP/linux/issues/115
|
|
Fix to restore BTF if single-stepping causes a page fault and
it is cancelled.
Usually the BTF flag was restored when the single stepping is done
(in resume_execution()). However, if a page fault happens on the
single stepping instruction, the fault handler is invoked and
the single stepping is cancelled. Thus, the BTF flag is not
restored.
Fixes: 1ecc798c6764 ("x86: debugctlmsr kprobes")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/160389546985.106936.12727996109376240993.stgit@devnote2
|
|
Commit
26bfa5f89486 ("x86, amd: Cleanup init_amd")
moved the code that remaps the TSEG region using 4k pages from
init_amd() to bsp_init_amd().
However, bsp_init_amd() is executed well before the direct mapping is
actually created:
setup_arch()
-> early_cpu_init()
-> early_identify_cpu()
-> this_cpu->c_bsp_init()
-> bsp_init_amd()
...
-> init_mem_mapping()
So the change effectively disabled the 4k remapping, because
pfn_range_is_mapped() is always false at this point.
It has been over six years since the commit, and no-one seems to have
noticed this, so just remove the code. The original code was also
incomplete, since it doesn't check how large the TSEG address range
actually is, so it might remap only part of it in any case.
Hygon has copied the incorrect version, so the code has never run on it
since the cpu support was added two years ago. Remove it from there as
well.
Committer notes:
This workaround is incomplete anyway:
1. The code must check MSRC001_0113.TValid (SMM TSeg Mask MSR) first, to
check whether the TSeg address range is enabled.
2. The code must check whether the range is not 2M aligned - if it is,
there's nothing to work around.
3. In all the BIOSes tested, the TSeg range is in a e820 reserved area
and those are not mapped anymore, after
66520ebc2df3 ("x86, mm: Only direct map addresses that are marked as E820_RAM")
which means, there's nothing to be worked around either.
So let's rip it out.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201127171324.1846019-1-nivedita@alum.mit.edu
|
|
After having collected the majority of reports about MSRs being written
by userspace tools and what tools those are, and all newer reports
mostly repeating, add an URL where detailed information is gathered and
kept up-to-date.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201205002825.19107-1-bp@alien8.de
|
|
Add "deprecated" message to any access to old /proc/sgi_uv/* leaves.
[ bp: Do not have a trailing function opening brace and the arguments
continuing on the next line and align them on the opening brace. ]
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lkml.kernel.org/r/20201128034227.120869-5-mike.travis@hpe.com
|
|
Add kernel interfaces used to obtain info for the uv_sysfs driver
to display.
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lkml.kernel.org/r/20201128034227.120869-2-mike.travis@hpe.com
|
|
Update kernel-doc parameter name after
c3d6324f841b ("x86/alternatives: Teach text_poke_bp() to emulate instructions")
changed the last parameter from @handler to @emulate.
[ bp: Make commit message more precise. ]
Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201203145020.2441-1-hqjagain@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Make the AMD L3 QoS code and data priorization enable/disable
mechanism work correctly.
The control bit was only set/cleared on one of the CPUs in a L3
domain, but it has to be modified on all CPUs in the domain. The
initial documentation was not clear about this, but the updated one
from Oct 2020 spells it out.
- Fix an off by one in the UV platform detection code which causes
the UV hubs to be identified wrongly.
The chip revisions start at 1 not at 0.
- Fix a long standing bug in the evaluation of prefixes in the
uprobes code which fails to handle repeated prefixes properly.
The aggregate size of the prefixes can be larger than the bytes
array but the code blindly iterated over the aggregate size beyond
the array boundary. Add a macro to handle this case properly and
use it at the affected places"
* tag 'x86-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev-es: Use new for_each_insn_prefix() macro to loop over prefixes bytes
x86/insn-eval: Use new for_each_insn_prefix() macro to loop over prefixes bytes
x86/uprobes: Do not use prefixes.nbytes when looping over prefixes.bytes
x86/platform/uv: Fix UV4 hub revision adjustment
x86/resctrl: Fix AMD L3 QOS CDP enable/disable
|
|
Since insn.prefixes.nbytes can be bigger than the size of
insn.prefixes.bytes[] when a prefix is repeated, the proper check must
be
insn.prefixes.bytes[i] != 0 and i < 4
instead of using insn.prefixes.nbytes.
Introduce a for_each_insn_prefix() macro for this purpose. Debugged by
Kees Cook <keescook@chromium.org>.
[ bp: Massage commit message, sync with the respective header in tools/
and drop "we". ]
Fixes: 2b1444983508 ("uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints")
Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/160697103739.3146288.7437620795200799020.stgit@devnote2
|
|
The sgx_enclave_add_pages.length field is documented as
* @length: length of the data (multiple of the page size)
Fail with -EINVAL, when the caller gives a zero length buffer of data
to be added as pages to an enclave. Right now 'ret' is returned as
uninitialized in that case.
[ bp: Flesh out commit message. ]
Fixes: c6d26d370767 ("x86/sgx: Add SGX_IOC_ENCLAVE_ADD_PAGES")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/linux-sgx/X8ehQssnslm194ld@mwanda/
Link: https://lkml.kernel.org/r/20201203183527.139317-1-jarkko@kernel.org
|
|
Currently, UV4 is incorrectly identified as UV4A and UV4A as UV5. Hub
chip starts with revision 1, fix it.
[ bp: Massage commit message. ]
Fixes: 647128f1536e ("x86/platform/uv: Update UV MMRs for UV5")
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Acked-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Link: https://lkml.kernel.org/r/20201203152252.371199-1-mike.travis@hpe.com
|
|
SYS_USER_DISPATCH will be triggered when a syscall is sent to userspace
by the Syscall User Dispatch mechanism. This adjusts eventual
BUILD_BUG_ON around the tree.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20201127193238.821364-3-krisman@collabora.com
|
|
Currently, if an MCE happens in user-mode or while the kernel is copying
data from user space, 'kill_it' is used to check if execution of the
interrupted task can be recovered or not; the flag name however is not
very meaningful, hence rename it to match its goal.
[ bp: Massage commit message, rename the queue_task_work() arg too. ]
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201127161819.3106432-6-gabriele.paoloni@intel.com
|
|
Currently, __mc_scan_banks() in do_machine_check() does the following
callchain:
__mc_scan_banks()->mce_log()->irq_work_queue(&mce_irq_work).
Hence, the call to irq_work_queue() below after __mc_scan_banks()
seems redundant. Just remove it.
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20201127161819.3106432-5-gabriele.paoloni@intel.com
|
|
Right now for LMCE, if no_way_out is set, mce_panic() is called
regardless of mca_cfg.tolerant. This is not correct as, if
mca_cfg.tolerant = 3, the code should never panic.
Add that check.
[ bp: use local ptr 'cfg'. ]
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20201127161819.3106432-4-gabriele.paoloni@intel.com
|
|
Right now, for local MCEs the machine calls panic(), if needed, right
after lmce is set. For MCE broadcasting, mce_reign() takes care of
calling mce_panic().
Hence:
- improve readability by moving the conditional evaluation of
tolerant up to when kill_it is set first;
- move the mce_panic() call up into the statement where mce_end()
fails.
[ bp: Massage, remove comment in the mce_end() failure case because it
is superfluous; use local ptr 'cfg' in both tests. ]
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20201127161819.3106432-3-gabriele.paoloni@intel.com
|
|
Merge the -rc6 tag to pick up dependent changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Commit
fd8d9db3559a ("x86/resctrl: Remove superfluous kernfs_get() calls to prevent refcount leak")
removed superfluous kernfs_get() calls in rdtgroup_ctrl_remove() and
rdtgroup_rmdir_ctrl(). That change resulted in an unused function
parameter to these two functions.
Clean up the unused function parameter in rdtgroup_ctrl_remove(),
rdtgroup_rmdir_mon() and their callers rdtgroup_rmdir_ctrl() and
rdtgroup_rmdir().
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/1606759618-13181-1-git-send-email-xiaochen.shen@intel.com
|
|
Merge -rc6 tag to pick up dependent changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
When the AMD QoS feature CDP (code and data prioritization) is enabled
or disabled, the CDP bit in MSR 0000_0C81 is written on one of the CPUs
in an L3 domain (core complex). That is not correct - the CDP bit needs
to be updated on all the logical CPUs in the domain.
This was not spelled out clearly in the spec earlier. The specification
has been updated and the updated document, "AMD64 Technology Platform
Quality of Service Extensions Publication # 56375 Revision: 1.02 Issue
Date: October 2020" is available now. Refer the section: Code and Data
Prioritization.
Fix the issue by adding a new flag arch_has_per_cpu_cfg in rdt_cache
data structure.
The documentation can be obtained at:
https://developer.amd.com/wp-content/resources/56375.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
[ bp: Massage commit message. ]
Fixes: 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/160675180380.15628.3309402017215002347.stgit@bmoger-ubuntu
|