summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2019-07-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Some highlights from this development cycle: 1) Big refactoring of ipv6 route and neigh handling to support nexthop objects configurable as units from userspace. From David Ahern. 2) Convert explored_states in BPF verifier into a hash table, significantly decreased state held for programs with bpf2bpf calls, from Alexei Starovoitov. 3) Implement bpf_send_signal() helper, from Yonghong Song. 4) Various classifier enhancements to mvpp2 driver, from Maxime Chevallier. 5) Add aRFS support to hns3 driver, from Jian Shen. 6) Fix use after free in inet frags by allocating fqdirs dynamically and reworking how rhashtable dismantle occurs, from Eric Dumazet. 7) Add act_ctinfo packet classifier action, from Kevin Darbyshire-Bryant. 8) Add TFO key backup infrastructure, from Jason Baron. 9) Remove several old and unused ISDN drivers, from Arnd Bergmann. 10) Add devlink notifications for flash update status to mlxsw driver, from Jiri Pirko. 11) Lots of kTLS offload infrastructure fixes, from Jakub Kicinski. 12) Add support for mv88e6250 DSA chips, from Rasmus Villemoes. 13) Various enhancements to ipv6 flow label handling, from Eric Dumazet and Willem de Bruijn. 14) Support TLS offload in nfp driver, from Jakub Kicinski, Dirk van der Merwe, and others. 15) Various improvements to axienet driver including converting it to phylink, from Robert Hancock. 16) Add PTP support to sja1105 DSA driver, from Vladimir Oltean. 17) Add mqprio qdisc offload support to dpaa2-eth, from Ioana Radulescu. 18) Add devlink health reporting to mlx5, from Moshe Shemesh. 19) Convert stmmac over to phylink, from Jose Abreu. 20) Add PTP PHC (Physical Hardware Clock) support to mlxsw, from Shalom Toledo. 21) Add nftables SYNPROXY support, from Fernando Fernandez Mancera. 22) Convert tcp_fastopen over to use SipHash, from Ard Biesheuvel. 23) Track spill/fill of constants in BPF verifier, from Alexei Starovoitov. 24) Support bounded loops in BPF, from Alexei Starovoitov. 25) Various page_pool API fixes and improvements, from Jesper Dangaard Brouer. 26) Just like ipv4, support ref-countless ipv6 route handling. From Wei Wang. 27) Support VLAN offloading in aquantia driver, from Igor Russkikh. 28) Add AF_XDP zero-copy support to mlx5, from Maxim Mikityanskiy. 29) Add flower GRE encap/decap support to nfp driver, from Pieter Jansen van Vuuren. 30) Protect against stack overflow when using act_mirred, from John Hurley. 31) Allow devmap map lookups from eBPF, from Toke Høiland-Jørgensen. 32) Use page_pool API in netsec driver, Ilias Apalodimas. 33) Add Google gve network driver, from Catherine Sullivan. 34) More indirect call avoidance, from Paolo Abeni. 35) Add kTLS TX HW offload support to mlx5, from Tariq Toukan. 36) Add XDP_REDIRECT support to bnxt_en, from Andy Gospodarek. 37) Add MPLS manipulation actions to TC, from John Hurley. 38) Add sending a packet to connection tracking from TC actions, and then allow flower classifier matching on conntrack state. From Paul Blakey. 39) Netfilter hw offload support, from Pablo Neira Ayuso" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2080 commits) net/mlx5e: Return in default case statement in tx_post_resync_params mlx5: Return -EINVAL when WARN_ON_ONCE triggers in mlx5e_tls_resync(). net: dsa: add support for BRIDGE_MROUTER attribute pkt_sched: Include const.h net: netsec: remove static declaration for netsec_set_tx_de() net: netsec: remove superfluous if statement netfilter: nf_tables: add hardware offload support net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload net: flow_offload: add flow_block_cb_is_busy() and use it net: sched: remove tcf block API drivers: net: use flow block API net: sched: use flow block API net: flow_offload: add flow_block_cb_{priv, incref, decref}() net: flow_offload: add list handling functions net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free() net: flow_offload: rename TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_* net: flow_offload: rename TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND net: flow_offload: add flow_block_cb_setup_simple() net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC ...
2019-07-11Merge tag 'clone3-v5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull clone3 system call from Christian Brauner: "This adds the clone3 syscall which is an extensible successor to clone after we snagged the last flag with CLONE_PIDFD during the 5.2 merge window for clone(). It cleanly supports all of the flags from clone() and thus all legacy workloads. There are few user visible differences between clone3 and clone. First, CLONE_DETACHED will cause EINVAL with clone3 so we can reuse this flag. Second, the CSIGNAL flag is deprecated and will cause EINVAL to be reported. It is superseeded by a dedicated "exit_signal" argument in struct clone_args thus freeing up even more flags. And third, clone3 gives CLONE_PIDFD a dedicated return argument in struct clone_args instead of abusing CLONE_PARENT_SETTID's parent_tidptr argument. The clone3 uapi is designed to be easy to handle on 32- and 64 bit: /* uapi */ struct clone_args { __aligned_u64 flags; __aligned_u64 pidfd; __aligned_u64 child_tid; __aligned_u64 parent_tid; __aligned_u64 exit_signal; __aligned_u64 stack; __aligned_u64 stack_size; __aligned_u64 tls; }; and a separate kernel struct is used that uses proper kernel typing: /* kernel internal */ struct kernel_clone_args { u64 flags; int __user *pidfd; int __user *child_tid; int __user *parent_tid; int exit_signal; unsigned long stack; unsigned long stack_size; unsigned long tls; }; The system call comes with a size argument which enables the kernel to detect what version of clone_args userspace is passing in. clone3 validates that any additional bytes a given kernel does not know about are set to zero and that the size never exceeds a page. A nice feature is that this patchset allowed us to cleanup and simplify various core kernel codepaths in kernel/fork.c by making the internal _do_fork() function take struct kernel_clone_args even for legacy clone(). This patch also unblocks the time namespace patchset which wants to introduce a new CLONE_TIMENS flag. Note, that clone3 has only been wired up for x86{_32,64}, arm{64}, and xtensa. These were the architectures that did not require special massaging. Other architectures treat fork-like system calls individually and after some back and forth neither Arnd nor I felt confident that we dared to add clone3 unconditionally to all architectures. We agreed to leave this up to individual architecture maintainers. This is why there's an additional patch that introduces __ARCH_WANT_SYS_CLONE3 which any architecture can set once it has implemented support for clone3. The patch also adds a cond_syscall(clone3) for architectures such as nios2 or h8300 that generate their syscall table by simply including asm-generic/unistd.h. The hope is to get rid of __ARCH_WANT_SYS_CLONE3 and cond_syscall() rather soon" * tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: arch: handle arches who do not yet define clone3 arch: wire-up clone3() syscall fork: add clone3
2019-07-10Merge tag 'pidfd-updates-v5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd updates from Christian Brauner: "This adds two main features. - First, it adds polling support for pidfds. This allows process managers to know when a (non-parent) process dies in a race-free way. The notification mechanism used follows the same logic that is currently used when the parent of a task is notified of a child's death. With this patchset it is possible to put pidfds in an {e}poll loop and get reliable notifications for process (i.e. thread-group) exit. - The second feature compliments the first one by making it possible to retrieve pollable pidfds for processes that were not created using CLONE_PIDFD. A lot of processes get created with traditional PID-based calls such as fork() or clone() (without CLONE_PIDFD). For these processes a caller can currently not create a pollable pidfd. This is a problem for Android's low memory killer (LMK) and service managers such as systemd. Both patchsets are accompanied by selftests. It's perhaps worth noting that the work done so far and the work done in this branch for pidfd_open() and polling support do already see some adoption: - Android is in the process of backporting this work to all their LTS kernels [1] - Service managers make use of pidfd_send_signal but will need to wait until we enable waiting on pidfds for full adoption. - And projects I maintain make use of both pidfd_send_signal and CLONE_PIDFD [2] and will use polling support and pidfd_open() too" [1] https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.9+backport%22 https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.14+backport%22 https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.19+backport%22 [2] https://github.com/lxc/lxc/blob/aab6e3eb73c343231cdde775db938994fc6f2803/src/lxc/start.c#L1753 * tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: tests: add pidfd_open() tests arch: wire-up pidfd_open() pid: add pidfd_open() pidfd: add polling selftests pidfd: add polling support
2019-07-09Merge tag 'docs-5.3' of git://git.lwn.net/linuxLinus Torvalds
Pull Documentation updates from Jonathan Corbet: "It's been a relatively busy cycle for docs: - A fair pile of RST conversions, many from Mauro. These create more than the usual number of simple but annoying merge conflicts with other trees, unfortunately. He has a lot more of these waiting on the wings that, I think, will go to you directly later on. - A new document on how to use merges and rebases in kernel repos, and one on Spectre vulnerabilities. - Various improvements to the build system, including automatic markup of function() references because some people, for reasons I will never understand, were of the opinion that :c:func:``function()`` is unattractive and not fun to type. - We now recommend using sphinx 1.7, but still support back to 1.4. - Lots of smaller improvements, warning fixes, typo fixes, etc" * tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits) docs: automarkup.py: ignore exceptions when seeking for xrefs docs: Move binderfs to admin-guide Disable Sphinx SmartyPants in HTML output doc: RCU callback locks need only _bh, not necessarily _irq docs: format kernel-parameters -- as code Doc : doc-guide : Fix a typo platform: x86: get rid of a non-existent document Add the RCU docs to the core-api manual Documentation: RCU: Add TOC tree hooks Documentation: RCU: Rename txt files to rst Documentation: RCU: Convert RCU UP systems to reST Documentation: RCU: Convert RCU linked list to reST Documentation: RCU: Convert RCU basic concepts to reST docs: filesystems: Remove uneeded .rst extension on toctables scripts/sphinx-pre-install: fix out-of-tree build docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/ Documentation: PGP: update for newer HW devices Documentation: Add section about CPU vulnerabilities for Spectre Documentation: platform: Delete x86-laptop-drivers.txt docs: Note that :c:func: should no longer be used ...
2019-07-09Merge branch 'x86-kdump-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x865 kdump updates from Thomas Gleixner: "Yet more kexec/kdump updates: - Properly support kexec when AMD's memory encryption (SME) is enabled - Pass reserved e820 ranges to the kexec kernel so both PCI and SME can work" * 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fs/proc/vmcore: Enable dumping of encrypted memory when SEV was active x86/kexec: Set the C-bit in the identity map page table when SEV is active x86/kexec: Do not map kexec area as decrypted when SEV is active x86/crash: Add e820 reserved ranges to kdump kernel's e820 table x86/mm: Rework ioremap resource mapping determination x86/e820, ioport: Add a new I/O resource descriptor IORES_DESC_RESERVED x86/mm: Create a workarea in the kernel for SME early encryption x86/mm: Identify the end of the kernel area to be reserved
2019-07-09Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Thomas Gleixner: "Assorted updates to kexec/kdump: - Proper kexec support for 4/5-level paging and jumping from a 5-level to a 4-level paging kernel. - Make the EFI support for kexec/kdump more robust - Enforce that the GDT is properly aligned instead of getting the alignment by chance" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kdump/64: Restrict kdump kernel reservation to <64TB x86/kexec/64: Prevent kexec from 5-level paging to a 4-level only kernel x86/boot: Add xloadflags bits to check for 5-level paging support x86/boot: Make the GDT 8-byte aligned x86/kexec: Add the ACPI NVS region to the ident map x86/boot: Call get_rsdp_addr() after console_init() Revert "x86/boot: Disable RSDP parsing temporarily" x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernels x86/kexec: Add the EFI system tables and ACPI tables to the ident map
2019-07-09Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main changes in this cycle on the kernel side were: - CPU PMU and uncore driver updates to Intel Snow Ridge, IceLake, KabyLake, AmberLake and WhiskeyLake CPUs. - Rework the MSR probing infrastructure to make it more robust, make it work better on virtualized systems and to better expose it on sysfs. - Rework PMU attributes group support based on the feedback from Greg. The core sysfs patch that adds sysfs_update_groups() was acked by Greg. There's a lot of perf tooling changes as well, all around the place: - vendor updates to Intel, cs-etm (ARM), ARM64, s390, - various enhancements to Intel PT tooling support: - Improve CBR (Core to Bus Ratio) packets support. - Export power and ptwrite events to sqlite and postgresql. - Add support for decoding PEBS via PT packets. - Add support for samples to contain IPC ratio, collecting cycles information from CYC packets, showing the IPC info periodically - Allow using time ranges - lots of updates to perf pmu, perf stat, perf trace, eBPF support, perf record, perf diff, etc. - please see the shortlog and Git log for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (252 commits) tools arch x86: Sync asm/cpufeatures.h with the with the kernel tools build: Check if gettid() is available before providing helper perf jvmti: Address gcc string overflow warning for strncpy() perf python: Remove -fstack-protector-strong if clang doesn't have it perf annotate TUI browser: Do not use member from variable within its own initialization perf tests: Fix record+probe_libc_inet_pton.sh for powerpc64 perf evsel: Do not rely on errno values for precise_ip fallback perf thread: Allow references to thread objects after machine__exit() perf header: Assign proper ff->ph in perf_event__synthesize_features() tools arch kvm: Sync kvm headers with the kernel sources perf script: Allow specifying the files to process guest samples perf tools metric: Don't include duration_time in group perf list: Avoid extra : for --raw metrics perf vendor events intel: Metric fixes for SKX/CLX perf tools: Fix typos / broken sentences perf jevents: Add support for Hisi hip08 L3C PMU aliasing perf jevents: Add support for Hisi hip08 HHA PMU aliasing perf jevents: Add support for Hisi hip08 DDRC PMU aliasing perf pmu: Support more complex PMU event aliasing perf diff: Documentation -c cycles option ...
2019-07-08Merge branch 'siginfo-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull force_sig() argument change from Eric Biederman: "A source of error over the years has been that force_sig has taken a task parameter when it is only safe to use force_sig with the current task. The force_sig function is built for delivering synchronous signals such as SIGSEGV where the userspace application caused a synchronous fault (such as a page fault) and the kernel responded with a signal. Because the name force_sig does not make this clear, and because the force_sig takes a task parameter the function force_sig has been abused for sending other kinds of signals over the years. Slowly those have been fixed when the oopses have been tracked down. This set of changes fixes the remaining abusers of force_sig and carefully rips out the task parameter from force_sig and friends making this kind of error almost impossible in the future" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits) signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus signal: Remove the signal number and task parameters from force_sig_info signal: Factor force_sig_info_to_task out of force_sig_info signal: Generate the siginfo in force_sig signal: Move the computation of force into send_signal and correct it. signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal signal: Remove the task parameter from force_sig_fault signal: Use force_sig_fault_to_task for the two calls that don't deliver to current signal: Explicitly call force_sig_fault on current signal/unicore32: Remove tsk parameter from __do_user_fault signal/arm: Remove tsk parameter from __do_user_fault signal/arm: Remove tsk parameter from ptrace_break signal/nds32: Remove tsk parameter from send_sigtrap signal/riscv: Remove tsk parameter from do_trap signal/sh: Remove tsk parameter from force_sig_info_fault signal/um: Remove task parameter from send_sigtrap signal/x86: Remove task parameter from send_sigtrap signal: Remove task parameter from force_sig_mceerr signal: Remove task parameter from force_sig signal: Remove task parameter from force_sigsegv ...
2019-07-08Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "Here is the crypto update for 5.3: API: - Test shash interface directly in testmgr - cra_driver_name is now mandatory Algorithms: - Replace arc4 crypto_cipher with library helper - Implement 5 way interleave for ECB, CBC and CTR on arm64 - Add xxhash - Add continuous self-test on noise source to drbg - Update jitter RNG Drivers: - Add support for SHA204A random number generator - Add support for 7211 in iproc-rng200 - Fix fuzz test failures in inside-secure - Fix fuzz test failures in talitos - Fix fuzz test failures in qat" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (143 commits) crypto: stm32/hash - remove interruptible condition for dma crypto: stm32/hash - Fix hmac issue more than 256 bytes crypto: stm32/crc32 - rename driver file crypto: amcc - remove memset after dma_alloc_coherent crypto: ccp - Switch to SPDX license identifiers crypto: ccp - Validate the the error value used to index error messages crypto: doc - Fix formatting of new crypto engine content crypto: doc - Add parameter documentation crypto: arm64/aes-ce - implement 5 way interleave for ECB, CBC and CTR crypto: arm64/aes-ce - add 5 way interleave routines crypto: talitos - drop icv_ool crypto: talitos - fix hash on SEC1. crypto: talitos - move struct talitos_edesc into talitos.h lib/scatterlist: Fix mapping iterator when sg->offset is greater than PAGE_SIZE crypto/NX: Set receive window credits to max number of CRBs in RxFIFO crypto: asymmetric_keys - select CRYPTO_HASH where needed crypto: serpent - mark __serpent_setkey_sbox noinline crypto: testmgr - dynamically allocate crypto_shash crypto: testmgr - dynamically allocate testvec_config crypto: talitos - eliminate unneeded 'done' functions at build time ...
2019-07-08Merge branch 'next-integrity' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity updates from Mimi Zohar: "Bug fixes, code clean up, and new features: - IMA policy rules can be defined in terms of LSM labels, making the IMA policy dependent on LSM policy label changes, in particular LSM label deletions. The new environment, in which IMA-appraisal is being used, frequently updates the LSM policy and permits LSM label deletions. - Prevent an mmap'ed shared file opened for write from also being mmap'ed execute. In the long term, making this and other similar changes at the VFS layer would be preferable. - The IMA per policy rule template format support is needed for a couple of new/proposed features (eg. kexec boot command line measurement, appended signatures, and VFS provided file hashes). - Other than the "boot-aggregate" record in the IMA measuremeent list, all other measurements are of file data. Measuring and storing the kexec boot command line in the IMA measurement list is the first buffer based measurement included in the measurement list" * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: integrity: Introduce struct evm_xattr ima: Update MAX_TEMPLATE_NAME_LEN to fit largest reasonable definition KEXEC: Call ima_kexec_cmdline to measure the boot command line args IMA: Define a new template field buf IMA: Define a new hook to measure the kexec boot command line arguments IMA: support for per policy rule template formats integrity: Fix __integrity_init_keyring() section mismatch ima: Use designated initializers for struct ima_event_data ima: use the lsm policy update notifier LSM: switch to blocking policy update notifiers x86/ima: fix the Kconfig dependency for IMA_ARCH_POLICY ima: Make arch_policy_entry static ima: prevent a file already mmap'ed write to be mmap'ed execute x86/ima: check EFI SetupMode too
2019-07-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two cases of overlapping changes, nothing fancy. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08Merge branch 'x86-topology-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 topology updates from Ingo Molnar: "Implement multi-die topology support on Intel CPUs and expose the die topology to user-space tooling, by Len Brown, Kan Liang and Zhang Rui. These changes should have no effect on the kernel's existing understanding of topologies, i.e. there should be no behavioral impact on cache, NUMA, scheduler, perf and other topologies and overall system performance" * 'x86-topology-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/rapl: Cosmetic rename internal variables in response to multi-die/pkg support perf/x86/intel/uncore: Cosmetic renames in response to multi-die/pkg support hwmon/coretemp: Cosmetic: Rename internal variables to zones from packages thermal/x86_pkg_temp_thermal: Cosmetic: Rename internal variables to zones from packages perf/x86/intel/cstate: Support multi-die/package perf/x86/intel/rapl: Support multi-die/package perf/x86/intel/uncore: Support multi-die/package topology: Create core_cpus and die_cpus sysfs attributes topology: Create package_cpus sysfs attribute hwmon/coretemp: Support multi-die/package powercap/intel_rapl: Update RAPL domain name and debug messages thermal/x86_pkg_temp_thermal: Support multi-die/package powercap/intel_rapl: Support multi-die/package powercap/intel_rapl: Simplify rapl_find_package() x86/topology: Define topology_logical_die_id() x86/topology: Define topology_die_id() cpu/topology: Export die_id x86/topology: Create topology_max_die_per_package() x86/topology: Add CPUID.1F multi-die/package support
2019-07-08Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updayes from Ingo Molnar: "Most of the commits add ACRN hypervisor guest support, plus two cleanups" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/jailhouse: Mark jailhouse_x2apic_available() as __init x86/platform/geode: Drop <linux/gpio.h> includes x86/acrn: Use HYPERVISOR_CALLBACK_VECTOR for ACRN guest upcall vector x86: Add support for Linux guests on an ACRN hypervisor x86/Kconfig: Add new X86_HV_CALLBACK_VECTOR config symbol
2019-07-08Merge branch 'x86-paravirt-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 paravirt updates from Ingo Molnar: "A handful of paravirt patching code enhancements to make it more robust against patching failures, and related cleanups and not so related cleanups - by Thomas Gleixner and myself" * 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/paravirt: Rename paravirt_patch_site::instrtype to paravirt_patch_site::type x86/paravirt: Standardize 'insn_buff' variable names x86/paravirt: Match paravirt patchlet field definition ordering to initialization ordering x86/paravirt: Replace the paravirt patch asm magic x86/paravirt: Unify the 32/64 bit paravirt patching code x86/paravirt: Detect over-sized patching bugs in paravirt_patch_call() x86/paravirt: Detect over-sized patching bugs in paravirt_patch_insns() x86/paravirt: Remove bogus extern declarations
2019-07-08Merge branch 'x86-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 AVX512 status update from Ingo Molnar: "This adds a new ABI that the main scheduler probably doesn't want to deal with but HPC job schedulers might want to use: the AVX512_elapsed_ms field in the new /proc/<pid>/arch_status task status file, which allows the user-space job scheduler to cluster such tasks, to avoid turbo frequency drops" * 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/filesystems/proc.txt: Add arch_status file x86/process: Add AVX-512 usage elapsed time to /proc/pid/arch_status proc: Add /proc/<pid>/arch_status
2019-07-08Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc small cleanups: removal of superfluous code and coding style cleanups mostly" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kexec: Make variable static and config dependent x86/defconfigs: Remove useless UEVENT_HELPER_PATH x86/amd_nb: Make hygon_nb_misc_ids static x86/tsc: Move inline keyword to the beginning of function declarations x86/io_delay: Define IO_DELAY macros in C instead of Kconfig x86/io_delay: Break instead of fallthrough in switch statement
2019-07-08Merge branch 'x86-cache-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache resource control update from Ingo Molnar: "Two cleanup patches" * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Cleanup cbm_ensure_valid() x86/resctrl: Use _ASM_BX to avoid ifdeffery
2019-07-08Merge branch 'x86-build-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Ingo Molnar: "Two kbuild enhancements by Masahiro Yamada" * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Remove redundant 'clean-files += capflags.c' x86/build: Add 'set -e' to mkcapflags.sh to delete broken capflags.c
2019-07-08Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "Most of the changes relate to Peter Zijlstra's cleanup of ptregs handling, in particular the i386 part is now much simplified and standardized - no more partial ptregs stack frames via the esp/ss oddity. This simplifies ftrace, kprobes, the unwinder, ptrace, kdump and kgdb. There's also a CR4 hardening enhancements by Kees Cook, to make the generic platform functions such as native_write_cr4() less useful as ROP gadgets that disable SMEP/SMAP. Also protect the WP bit of CR0 against similar attacks. The rest is smaller cleanups/fixes" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/alternatives: Add int3_emulate_call() selftest x86/stackframe/32: Allow int3_emulate_push() x86/stackframe/32: Provide consistent pt_regs x86/stackframe, x86/ftrace: Add pt_regs frame annotations x86/stackframe, x86/kprobes: Fix frame pointer annotations x86/stackframe: Move ENCODE_FRAME_POINTER to asm/frame.h x86/entry/32: Clean up return from interrupt preemption path x86/asm: Pin sensitive CR0 bits x86/asm: Pin sensitive CR4 bits Documentation/x86: Fix path to entry_32.S x86/asm: Remove unused TASK_TI_flags from asm-offsets.c
2019-07-08Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Remove the unused per rq load array and all its infrastructure, by Dietmar Eggemann. - Add utilization clamping support by Patrick Bellasi. This is a refinement of the energy aware scheduling framework with support for boosting of interactive and capping of background workloads: to make sure critical GUI threads get maximum frequency ASAP, and to make sure background processing doesn't unnecessarily move to cpufreq governor to higher frequencies and less energy efficient CPU modes. - Add the bare minimum of tracepoints required for LISA EAS regression testing, by Qais Yousef - which allows automated testing of various power management features, including energy aware scheduling. - Restructure the former tsk_nr_cpus_allowed() facility that the -rt kernel used to modify the scheduler's CPU affinity logic such as migrate_disable() - introduce the task->cpus_ptr value instead of taking the address of &task->cpus_allowed directly - by Sebastian Andrzej Siewior. - Misc optimizations, fixes, cleanups and small enhancements - see the Git log for details. * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) sched/uclamp: Add uclamp support to energy_compute() sched/uclamp: Add uclamp_util_with() sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks sched/uclamp: Set default clamps for RT tasks sched/uclamp: Reset uclamp values on RESET_ON_FORK sched/uclamp: Extend sched_setattr() to support utilization clamping sched/core: Allow sched_setattr() to use the current policy sched/uclamp: Add system default clamps sched/uclamp: Enforce last task's UCLAMP_MAX sched/uclamp: Add bucket local max tracking sched/uclamp: Add CPU's clamp buckets refcounting sched/fair: Rename weighted_cpuload() to cpu_runnable_load() sched/debug: Export the newly added tracepoints sched/debug: Add sched_overutilized tracepoint sched/debug: Add new tracepoint to track PELT at se level sched/debug: Add new tracepoints to track PELT at rq level sched/debug: Add a new sched_trace_*() helper functions sched/autogroup: Make autogroup_path() always available sched/wait: Deduplicate code with do-while sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity() ...
2019-07-08Merge branch 'ras-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "Boris is on vacation so I'm sending the RAS bits this time. The main changes were: - Various RAS/CEC improvements and fixes by Borislav Petkov: - error insertion fixes - offlining latency fix - memory leak fix - additional sanity checks - cleanups - debug output improvements - More SMCA enhancements by Yazen Ghannam: - make banks truly per-CPU which they are in the hardware - don't over-cache certain registers - make the number of MCA banks per-CPU variable The long term goal with these changes is to support future heterogenous SMCA extensions. - Misc fixes and improvements" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Do not check return value of debugfs_create functions x86/MCE: Determine MCA banks' init state properly x86/MCE: Make the number of MCA banks a per-CPU variable x86/MCE/AMD: Don't cache block addresses on SMCA systems x86/MCE: Make mce_banks a per-CPU array x86/MCE: Make struct mce_banks[] static RAS/CEC: Add copyright RAS/CEC: Add CONFIG_RAS_CEC_DEBUG and move CEC debug features there RAS/CEC: Dump the different array element sections RAS/CEC: Rename count_threshold to action_threshold RAS/CEC: Sanity-check array on every insertion RAS/CEC: Fix potential memory leak RAS/CEC: Do not set decay value on error RAS/CEC: Check count_threshold unconditionally RAS/CEC: Fix pfn insertion
2019-07-08Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle are: - rwsem scalability improvements, phase #2, by Waiman Long, which are rather impressive: "On a 2-socket 40-core 80-thread Skylake system with 40 reader and writer locking threads, the min/mean/max locking operations done in a 5-second testing window before the patchset were: 40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810 40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255 After the patchset, they became: 40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741 40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098" There's a lot of changes to the locking implementation that makes it similar to qrwlock, including owner handoff for more fair locking. Another microbenchmark shows how across the spectrum the improvements are: "With a locking microbenchmark running on 5.1 based kernel, the total locking rates (in kops/s) on a 2-socket Skylake system with equal numbers of readers and writers (mixed) before and after this patchset were: # of Threads Before Patch After Patch ------------ ------------ ----------- 2 2,618 4,193 4 1,202 3,726 8 802 3,622 16 729 3,359 32 319 2,826 64 102 2,744" The changes are extensive and the patch-set has been through several iterations addressing various locking workloads. There might be more regressions, but unless they are pathological I believe we want to use this new implementation as the baseline going forward. - jump-label optimizations by Daniel Bristot de Oliveira: the primary motivation was to remove IPI disturbance of isolated RT-workload CPUs, which resulted in the implementation of batched jump-label updates. Beyond the improvement of the real-time characteristics kernel, in one test this patchset improved static key update overhead from 57 msecs to just 1.4 msecs - which is a nice speedup as well. - atomic64_t cross-arch type cleanups by Mark Rutland: over the last ~10 years of atomic64_t existence the various types used by the APIs only had to be self-consistent within each architecture - which means they became wildly inconsistent across architectures. Mark puts and end to this by reworking all the atomic64 implementations to use 's64' as the base type for atomic64_t, and to ensure that this type is consistently used for parameters and return values in the API, avoiding further problems in this area. - A large set of small improvements to lockdep by Yuyang Du: type cleanups, output cleanups, function return type and othr cleanups all around the place. - A set of percpu ops cleanups and fixes by Peter Zijlstra. - Misc other changes - please see the Git log for more details" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits) locking/lockdep: increase size of counters for lockdep statistics locking/atomics: Use sed(1) instead of non-standard head(1) option locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING x86/jump_label: Make tp_vec_nr static x86/percpu: Optimize raw_cpu_xchg() x86/percpu, sched/fair: Avoid local_clock() x86/percpu, x86/irq: Relax {set,get}_irq_regs() x86/percpu: Relax smp_processor_id() x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}() locking/rwsem: Guard against making count negative locking/rwsem: Adaptive disabling of reader optimistic spinning locking/rwsem: Enable time-based spinning on reader-owned rwsem locking/rwsem: Make rwsem->owner an atomic_long_t locking/rwsem: Enable readers spinning on writer locking/rwsem: Clarify usage of owner's nonspinaable bit locking/rwsem: Wake up almost all readers in wait queue locking/rwsem: More optimal RT task handling of null owner locking/rwsem: Always release wait_lock before waking up tasks locking/rwsem: Implement lock handoff to prevent lock starvation locking/rwsem: Make rwsem_spin_on_owner() return owner state ...
2019-07-08Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 pti updates from Thomas Gleixner: "The speculative paranoia departement delivers a few more plugs for possible (probably theoretical) spectre/mds leaks" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tls: Fix possible spectre-v1 in do_get_thread_area() x86/ptrace: Fix possible spectre-v1 in ptrace_get_debugreg() x86/speculation/mds: Eliminate leaks by trace_hardirqs_on()
2019-07-08Merge branch 'x86-timers-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 timer updates from Thomas Gleixner: "A rather large series consolidating the HPET code, which was triggered by the attempt to bolt HPET NMI watchdog support on to the existing maze with the usual duct tape and super glue approach. This mainly removes two separate partially redundant storage layers and consolidates them into a single one which provides a consistent view of the different HPET channels and their usage and allows to integrate HPET NMI watchdog support (if it turns out to be feasible) in a non intrusive way" * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits) x86/hpet: Use channel for legacy clockevent storage x86/hpet: Use common init for legacy clockevent x86/hpet: Carve out shareable parts of init_one_hpet_msi_clockevent() x86/hpet: Consolidate clockevent functions x86/hpet: Wrap legacy clockevent in hpet_channel x86/hpet: Use cached info instead of extra flags x86/hpet: Move clockevents into channels x86/hpet: Rename variables to prepare for switching to channels x86/hpet: Add function to select a /dev/hpet channel x86/hpet: Add mode information to struct hpet_channel x86/hpet: Use cached channel data x86/hpet: Introduce struct hpet_base and struct hpet_channel x86/hpet: Coding style cleanup x86/hpet: Clean up comments x86/hpet: Make naming consistent x86/hpet: Remove not required includes x86/hpet: Decapitalize and rename EVT_TO_HPET_DEV x86/hpet: Simplify counter validation x86/hpet: Separate counter check out of clocksource register code x86/hpet: Shuffle code around for readability sake ...
2019-07-08Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 CPU feature updates from Thomas Gleixner: "Updates for x86 CPU features: - Support for UMWAIT/UMONITOR, which allows to use MWAIT and MONITOR instructions in user space to save power e.g. in HPC workloads which spin wait on synchronization points. The maximum time a MWAIT can halt in userspace is controlled by the kernel and can be adjusted by the sysadmin. - Speed up the MTRR handling code on CPUs which support cache self-snooping correctly. On those CPUs the wbinvd() invocations can be omitted which speeds up the MTRR setup by a factor of 50. - Support for the new x86 vendor Zhaoxin who develops processors based on the VIA Centaur technology. - Prevent 'cat /proc/cpuinfo' from affecting isolated NOHZ_FULL CPUs by sending IPIs to retrieve the CPU frequency and use the cached values instead. - The addition and late revert of the FSGSBASE support. The revert was required as it turned out that the code still has hard to diagnose issues. Yet another engineering trainwreck... - Small fixes, cleanups, improvements and the usual new Intel CPU family/model addons" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) x86/fsgsbase: Revert FSGSBASE support selftests/x86/fsgsbase: Fix some test case bugs x86/entry/64: Fix and clean up paranoid_exit x86/entry/64: Don't compile ignore_sysret if 32-bit emulation is enabled selftests/x86: Test SYSCALL and SYSENTER manually with TF set x86/mtrr: Skip cache flushes on CPUs with cache self-snooping x86/cpu/intel: Clear cache self-snoop capability in CPUs with known errata Documentation/ABI: Document umwait control sysfs interfaces x86/umwait: Add sysfs interface to control umwait maximum time x86/umwait: Add sysfs interface to control umwait C0.2 state x86/umwait: Initialize umwait control values x86/cpufeatures: Enumerate user wait instructions x86/cpu: Disable frequency requests via aperfmperf IPI for nohz_full CPUs x86/acpi/cstate: Add Zhaoxin processors support for cache flush policy in C3 ACPI, x86: Add Zhaoxin processors support for NONSTOP TSC x86/cpu: Create Zhaoxin processors architecture support file x86/cpu: Split Tremont based Atoms from the rest Documentation/x86/64: Add documentation for GS/FS addressing mode x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2 x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit ...
2019-07-08Merge branch 'x86-fpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FPU updates from Thomas Gleixner: "A small set of updates for the FPU code: - Make the no387/nofxsr command line options useful by restricting them to 32bit and actually clearing all dependencies to prevent random crashes and malfunction. - Simplify and cleanup the kernel_fpu_*() helpers" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Inline fpu__xstate_clear_all_cpu_caps() x86/fpu: Make 'no387' and 'nofxsr' command line options useful x86/fpu: Remove the fpu__save() export x86/fpu: Simplify kernel_fpu_begin() x86/fpu: Simplify kernel_fpu_end()
2019-07-08Merge branch 'x86-entry-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vsyscall updates from Thomas Gleixner: "Further hardening of the legacy vsyscall by providing support for execute only mode and switching the default to it. This prevents a certain class of attacks which rely on the vsyscall page being accessible at a fixed address in the canonical kernel address space" * 'x86-entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests/x86: Add a test for process_vm_readv() on the vsyscall page x86/vsyscall: Add __ro_after_init to global variables x86/vsyscall: Change the default vsyscall mode to xonly selftests/x86/vsyscall: Verify that vsyscall=none blocks execution x86/vsyscall: Document odd SIGSEGV error code for vsyscalls x86/vsyscall: Show something useful on a read fault x86/vsyscall: Add a new vsyscall=xonly mode Documentation/admin: Remove the vsyscall=native documentation
2019-07-08Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x96 apic updates from Thomas Gleixner: "Updates for the x86 APIC interrupt handling and APIC timer: - Fix a long standing issue with spurious interrupts which was caused by the big vector management rework a few years ago. Robert Hodaszi provided finally enough debug data and an excellent initial failure analysis which allowed to understand the underlying issues. This contains a change to the core interrupt management code which is required to handle this correctly for the APIC/IO_APIC. The core changes are NOOPs for most architectures except ARM64. ARM64 is not impacted by the change as confirmed by Marc Zyngier. - Newer systems allow to disable the PIT clock for power saving causing panic in the timer interrupt delivery check of the IO/APIC when the HPET timer is not enabled either. While the clock could be turned on this would cause an endless whack a mole game to chase the proper register in each affected chipset. These systems provide the relevant frequencies for TSC, CPU and the local APIC timer via CPUID and/or MSRs, which allows to avoid the PIT/HPET based calibration. As the calibration code is the only usage of the legacy timers on modern systems and is skipped anyway when the frequencies are known already, there is no point in setting up the PIT and actually checking for the interrupt delivery via IO/APIC. To achieve this on a wide variety of platforms, the CPUID/MSR based frequency readout has been made more robust, which also allowed to remove quite some workarounds which turned out to be not longer required. Thanks to Daniel Drake for analysis, patches and verification" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/irq: Seperate unused system vectors from spurious entry again x86/irq: Handle spurious interrupt after shutdown gracefully x86/ioapic: Implement irq_get_irqchip_state() callback genirq: Add optional hardware synchronization for shutdown genirq: Fix misleading synchronize_irq() documentation genirq: Delay deactivation in free_irq() x86/timer: Skip PIT initialization on modern chipsets x86/apic: Use non-atomic operations when possible x86/apic: Make apic_bsp_setup() static x86/tsc: Set LAPIC timer period to crystal clock frequency x86/apic: Rename 'lapic_timer_frequency' to 'lapic_timer_period' x86/tsc: Use CPUID.0x16 to calculate missing crystal frequency
2019-07-08Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The timer and timekeeping departement delivers: Core: - The consolidation of the VDSO code into a generic library including the conversion of x86 and ARM64. Conversion of ARM and MIPS are en route through the relevant maintainer trees and should end up in 5.4. This gets rid of the unnecessary different copies of the same code and brings all architectures on the same level of VDSO functionality. - Make the NTP user space interface more robust by restricting the TAI offset to prevent undefined behaviour. Includes a selftest. - Validate user input in the compat settimeofday() syscall to catch invalid values which would be turned into valid values by a multiplication overflow - Consolidate the time accessors - Small fixes, improvements and cleanups all over the place Drivers: - Support for the NXP system counter, TI davinci timer - Move the Microsoft HyperV clocksource/events code into the drivers/clocksource directory so it can be shared between x86 and ARM64. - Overhaul of the Tegra driver - Delay timer support for IXP4xx - Small fixes, improvements and cleanups as usual" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) time: Validate user input in compat_settimeofday() timer: Document TIMER_PINNED clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic clocksource/drivers: Make Hyper-V clocksource ISA agnostic MAINTAINERS: Fix Andy's surname and the directory entries of VDSO hrtimer: Use a bullet for the returns bullet list arm64: vdso: Fix compilation with clang older than 8 arm64: compat: Fix __arch_get_hw_counter() implementation arm64: Fix __arch_get_hw_counter() implementation lib/vdso: Make delta calculation work correctly MAINTAINERS: Add entry for the generic VDSO library arm64: compat: No need for pre-ARMv7 barriers on an ARMv8 system arm64: vdso: Remove unnecessary asm-offsets.c definitions vdso: Remove superfluous #ifdef __KERNEL__ in vdso/datapage.h clocksource/drivers/davinci: Add support for clocksource clocksource/drivers/davinci: Add support for clockevents clocksource/drivers/tegra: Set up maximum-ticks limit properly clocksource/drivers/tegra: Cycles can't be 0 clocksource/drivers/tegra: Restore base address before cleanup clocksource/drivers/tegra: Add verbose definition for 1MHz constant ...
2019-07-08Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP/hotplug updates from Thomas Gleixner: "A small set of updates for SMP and CPU hotplug: - Abort disabling secondary CPUs in the freezer when a wakeup is pending instead of evaluating it only after all CPUs have been offlined. - Remove the shared annotation for the strict per CPU cfd_data in the smp function call core code. - Remove the return values of smp_call_function() and on_each_cpu() as they are unconditionally 0. Fixup the few callers which actually bothered to check the return value" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp: Remove smp_call_function() and on_each_cpu() return values smp: Do not mark call_function_data as shared cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending cpu/hotplug: Fix notify_cpu_starting() reference in bringup_wait_for_ap()
2019-07-08Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP} - Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to manage the permissions of executable vmalloc regions more strictly - Slight performance improvement by keeping softirqs enabled while touching the FPSIMD/SVE state (kernel_neon_begin/end) - Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new XAFLAG and AXFLAG instructions for floating point comparison flags manipulation) and FRINT (rounding floating point numbers to integers) - Re-instate ARM64_PSEUDO_NMI support which was previously marked as BROKEN due to some bugs (now fixed) - Improve parking of stopped CPUs and implement an arm64-specific panic_smp_self_stop() to avoid warning on not being able to stop secondary CPUs during panic - perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI platforms - perf: DDR performance monitor support for iMX8QXP - cache_line_size() can now be set from DT or ACPI/PPTT if provided to cope with a system cache info not exposed via the CPUID registers - Avoid warning on hardware cache line size greater than ARCH_DMA_MINALIGN if the system is fully coherent - arm64 do_page_fault() and hugetlb cleanups - Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep) - Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the 'arm_boot_flags' introduced in 5.1) - CONFIG_RANDOMIZE_BASE now enabled in defconfig - Allow the selection of ARM64_MODULE_PLTS, currently only done via RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill over into the vmalloc area - Make ZONE_DMA32 configurable * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits) perf: arm_spe: Enable ACPI/Platform automatic module loading arm_pmu: acpi: spe: Add initial MADT/SPE probing ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens ACPI/PPTT: Modify node flag detection to find last IDENTICAL x86/entry: Simplify _TIF_SYSCALL_EMU handling arm64: rename dump_instr as dump_kernel_instr arm64/mm: Drop [PTE|PMD]_TYPE_FAULT arm64: Implement panic_smp_self_stop() arm64: Improve parking of stopped CPUs arm64: Expose FRINT capabilities to userspace arm64: Expose ARMv8.5 CondM capability to userspace arm64: defconfig: enable CONFIG_RANDOMIZE_BASE arm64: ARM64_MODULES_PLTS must depend on MODULES arm64: bpf: do not allocate executable memory arm64/kprobes: set VM_FLUSH_RESET_PERMS on kprobe instruction pages arm64/mm: wire up CONFIG_ARCH_HAS_SET_DIRECT_MAP arm64: module: create module allocations without exec permissions arm64: Allow user selection of ARM64_MODULE_PLTS acpi/arm64: ignore 5.1 FADTs that are reported as 5.0 arm64: Allow selecting Pseudo-NMI again ...
2019-07-08Merge tag 'v5.2' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-07x86/fpu: Inline fpu__xstate_clear_all_cpu_caps()Sebastian Andrzej Siewior
All fpu__xstate_clear_all_cpu_caps() does is to invoke one simple function since commit 73e3a7d2a7c3b ("x86/fpu: Remove the explicit clearing of XSAVE dependent features") so invoke that function directly and remove the wrapper. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190704060743.rvew4yrjd6n33uzx@linutronix.de
2019-07-07x86/fpu: Make 'no387' and 'nofxsr' command line options usefulSebastian Andrzej Siewior
The command line option `no387' is designed to disable the FPU entirely. This only 'works' with CONFIG_MATH_EMULATION enabled. But on 64bit this cannot work because user space expects SSE to work which required basic FPU support. MATH_EMULATION does not help because SSE is not emulated. The command line option `nofxsr' should also be limited to 32bit because FXSR is part of the required flags on 64bit so turning it off is not possible. Clearing X86_FEATURE_FPU without emulation enabled will not work anyway and hang in fpu__init_system_early_generic() before the console is enabled. Setting additioal dependencies, ensures that the CPU still boots on a modern CPU. Otherwise, dropping FPU will leave FXSR enabled causing the kernel to crash early in fpu__init_system_mxcsr(). With XSAVE support it will crash in fpu__init_cpu_xstate(). The problem is that xsetbv() with XMM set and SSE cleared is not allowed. That means XSAVE has to be disabled. The XSAVE support is disabled in fpu__init_system_xstate_size_legacy() but it is too late. It can be removed, it has been added in commit 1f999ab5a1360 ("x86, xsave: Disable xsave in i387 emulation mode") to use `no387' on a CPU with XSAVE support. All this happens before console output. After hat, the next possible crash is in RAID6 detect code because MMX remained enabled. With a 3DNOW enabled config it will explode in memcpy() for instance due to kernel_fpu_begin() but this is unconditionally enabled. This is enough to boot a Debian Wheezy on a 32bit qemu "host" CPU which supports everything up to XSAVES, AVX2 without 3DNOW. Later, Debian increased the minimum requirements to i686 which means it does not boot userland atleast due to CMOV. After masking the additional features it still keeps SSE4A and 3DNOW* enabled (if present on the host) but those are unused in the kernel. Restrict `no387' and `nofxsr' otions to 32bit only. Add dependencies for FPU, FXSR to additionaly mask CMOV, MMX, XSAVE if FXSR or FPU is cleared. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190703083247.57kjrmlxkai3vpw3@linutronix.de
2019-07-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "x86 bugfix patches and one compilation fix for ARM" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: arm64/sve: Fix vq_present() macro to yield a bool KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPIC KVM: nVMX: Change KVM_STATE_NESTED_EVMCS to signal vmcs12 is copied from eVMCS KVM: nVMX: Allow restore nested-state to enable eVMCS when vCPU in SMM KVM: x86: degrade WARN to pr_warn_ratelimited
2019-07-04Merge branch 'x86/cpu' into perf/core, to pick up revertIngo Molnar
perf/core has an earlier version of the x86/cpu tree merged, to avoid conflicts, and due to this we want to pick up this ABI impacting revert as well: 049331f277fe: ("x86/fsgsbase: Revert FSGSBASE support") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-04Merge tag 'trace-v5.2-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "This includes three fixes: - Fix a deadlock from a previous fix to keep module loading and function tracing text modifications from stepping on each other (this has a few patches to help document the issue in comments) - Fix a crash when the snapshot buffer gets out of sync with the main ring buffer - Fix a memory leak when reading the memory logs" * tag 'trace-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace/x86: Anotate text_mutex split between ftrace_arch_code_modify_post_process() and ftrace_arch_code_modify_prepare() tracing/snapshot: Resize spare buffer if size changed tracing: Fix memory leak in tracing_err_log_open() ftrace/x86: Add a comment to why we take text_mutex in ftrace_arch_code_modify_prepare() ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()
2019-07-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2019-07-03 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix the interpreter to properly handle BPF_ALU32 | BPF_ARSH on BE architectures, from Jiong. 2) Fix several bugs in the x32 BPF JIT for handling shifts by 0, from Luke and Xi. 3) Fix NULL pointer deref in btf_type_is_resolve_source_only(), from Stanislav. 4) Properly handle the check that forwarding is enabled on the device in bpf_ipv6_fib_lookup() helper code, from Anton. 5) Fix UAPI bpf_prog_info fields alignment for archs that have 16 bit alignment such as m68k, from Baruch. 6) Fix kernel hanging in unregister_netdevice loop while unregistering device bound to XDP socket, from Ilya. 7) Properly terminate tail update in xskq_produce_flush_desc(), from Nathan. 8) Fix broken always_inline handling in test_lwt_seg6local, from Jiri. 9) Fix bpftool to use correct argument in cgroup errors, from Jakub. 10) Fix detaching dummy prog in XDP redirect sample code, from Prashant. 11) Add Jonathan to AF_XDP reviewers, from Björn. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-03x86/fsgsbase: Revert FSGSBASE supportThomas Gleixner
The FSGSBASE series turned out to have serious bugs and there is still an open issue which is not fully understood yet. The confidence in those changes has become close to zero especially as the test cases which have been shipped with that series were obviously never run before sending the final series out to LKML. ./fsgsbase_64 >/dev/null Segmentation fault As the merge window is close, the only sane decision is to revert FSGSBASE support. The revert is necessary as this branch has been merged into perf/core already and rebasing all of that a few days before the merge window is not the most brilliant idea. I could definitely slap myself for not noticing the test case fail when merging that series, but TBH my expectations weren't that low back then. Won't happen again. Revert the following commits: 539bca535dec ("x86/entry/64: Fix and clean up paranoid_exit") 2c7b5ac5d5a9 ("Documentation/x86/64: Add documentation for GS/FS addressing mode") f987c955c745 ("x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2") 2032f1f96ee0 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit") 5bf0cab60ee2 ("x86/entry/64: Document GSBASE handling in the paranoid path") 708078f65721 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit") 79e1932fa3ce ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro") 1d07316b1363 ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry") f60a83df4593 ("x86/process/64: Use FSGSBASE instructions on thread copy and ptrace") 1ab5f3f7fe3d ("x86/process/64: Use FSBSBASE in switch_to() if available") a86b4625138d ("x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions") 8b71340d702e ("x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions") b64ed19b93c3 ("x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com>
2019-07-03bpf, x32: Fix bug with ALU64 {LSH, RSH, ARSH} BPF_K shift by 0Luke Nelson
The current x32 BPF JIT does not correctly compile shift operations when the immediate shift amount is 0. The expected behavior is for this to be a no-op. The following program demonstrates the bug. The expexceted result is 1, but the current JITed code returns 2. r0 = 1 r1 = 1 r1 <<= 0 if r1 == 1 goto end r0 = 2 end: exit This patch simplifies the code and fixes the bug. Fixes: 03f5781be2c7 ("bpf, x86_32: add eBPF JIT compiler for ia32") Co-developed-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03bpf, x32: Fix bug with ALU64 {LSH, RSH, ARSH} BPF_X shift by 0Luke Nelson
The current x32 BPF JIT for shift operations is not correct when the shift amount in a register is 0. The expected behavior is a no-op, whereas the current implementation changes bits in the destination register. The following example demonstrates the bug. The expected result of this program is 1, but the current JITed code returns 2. r0 = 1 r1 = 1 r2 = 0 r1 <<= r2 if r1 == 1 goto end r0 = 2 end: exit The bug is caused by an incorrect assumption by the JIT that a shift by 32 clear the register. On x32 however, shifts use the lower 5 bits of the source, making a shift by 32 equivalent to a shift by 0. This patch fixes the bug using double-precision shifts, which also simplifies the code. Fixes: 03f5781be2c7 ("bpf, x86_32: add eBPF JIT compiler for ia32") Co-developed-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-03clocksource/drivers: Continue making Hyper-V clocksource ISA agnosticMichael Kelley
Continue consolidating Hyper-V clock and timer code into an ISA independent Hyper-V clocksource driver. Move the existing clocksource code under drivers/hv and arch/x86 to the new clocksource driver while separating out the ISA dependencies. Update Hyper-V initialization to call initialization and cleanup routines since the Hyper-V synthetic clock is not independently enumerated in ACPI. Update Hyper-V clocksource users in KVM and VDSO to get definitions from the new include file. No behavior is changed and no new functionality is added. Suggested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: "bp@alien8.de" <bp@alien8.de> Cc: "will.deacon@arm.com" <will.deacon@arm.com> Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com> Cc: "mark.rutland@arm.com" <mark.rutland@arm.com> Cc: "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org> Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org> Cc: "linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org> Cc: "olaf@aepfle.de" <olaf@aepfle.de> Cc: "apw@canonical.com" <apw@canonical.com> Cc: "jasowang@redhat.com" <jasowang@redhat.com> Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com> Cc: Sunil Muthuswamy <sunilmut@microsoft.com> Cc: KY Srinivasan <kys@microsoft.com> Cc: "sashal@kernel.org" <sashal@kernel.org> Cc: "vincenzo.frascino@arm.com" <vincenzo.frascino@arm.com> Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org> Cc: "linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org> Cc: "linux-kselftest@vger.kernel.org" <linux-kselftest@vger.kernel.org> Cc: "arnd@arndb.de" <arnd@arndb.de> Cc: "linux@armlinux.org.uk" <linux@armlinux.org.uk> Cc: "ralf@linux-mips.org" <ralf@linux-mips.org> Cc: "paul.burton@mips.com" <paul.burton@mips.com> Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org> Cc: "salyzyn@android.com" <salyzyn@android.com> Cc: "pcc@google.com" <pcc@google.com> Cc: "shuah@kernel.org" <shuah@kernel.org> Cc: "0x7f454c46@gmail.com" <0x7f454c46@gmail.com> Cc: "linux@rasmusvillemoes.dk" <linux@rasmusvillemoes.dk> Cc: "huw@codeweavers.com" <huw@codeweavers.com> Cc: "sfr@canb.auug.org.au" <sfr@canb.auug.org.au> Cc: "pbonzini@redhat.com" <pbonzini@redhat.com> Cc: "rkrcmar@redhat.com" <rkrcmar@redhat.com> Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org> Link: https://lkml.kernel.org/r/1561955054-1838-3-git-send-email-mikelley@microsoft.com
2019-07-03clocksource/drivers: Make Hyper-V clocksource ISA agnosticMichael Kelley
Hyper-V clock/timer code and data structures are currently mixed in with other code in the ISA independent drivers/hv directory as well as the ISA dependent Hyper-V code under arch/x86. Consolidate this code and data structures into a Hyper-V clocksource driver to better follow the Linux model. In doing so, separate out the ISA dependent portions so the new clocksource driver works for x86 and for the in-process Hyper-V on ARM64 code. To start, move the existing clockevents code to create the new clocksource driver. Update the VMbus driver to call initialization and cleanup routines since the Hyper-V synthetic timers are not independently enumerated in ACPI. No behavior is changed and no new functionality is added. Suggested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: "bp@alien8.de" <bp@alien8.de> Cc: "will.deacon@arm.com" <will.deacon@arm.com> Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com> Cc: "mark.rutland@arm.com" <mark.rutland@arm.com> Cc: "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org> Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org> Cc: "linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org> Cc: "olaf@aepfle.de" <olaf@aepfle.de> Cc: "apw@canonical.com" <apw@canonical.com> Cc: "jasowang@redhat.com" <jasowang@redhat.com> Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com> Cc: Sunil Muthuswamy <sunilmut@microsoft.com> Cc: KY Srinivasan <kys@microsoft.com> Cc: "sashal@kernel.org" <sashal@kernel.org> Cc: "vincenzo.frascino@arm.com" <vincenzo.frascino@arm.com> Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org> Cc: "linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org> Cc: "linux-kselftest@vger.kernel.org" <linux-kselftest@vger.kernel.org> Cc: "arnd@arndb.de" <arnd@arndb.de> Cc: "linux@armlinux.org.uk" <linux@armlinux.org.uk> Cc: "ralf@linux-mips.org" <ralf@linux-mips.org> Cc: "paul.burton@mips.com" <paul.burton@mips.com> Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org> Cc: "salyzyn@android.com" <salyzyn@android.com> Cc: "pcc@google.com" <pcc@google.com> Cc: "shuah@kernel.org" <shuah@kernel.org> Cc: "0x7f454c46@gmail.com" <0x7f454c46@gmail.com> Cc: "linux@rasmusvillemoes.dk" <linux@rasmusvillemoes.dk> Cc: "huw@codeweavers.com" <huw@codeweavers.com> Cc: "sfr@canb.auug.org.au" <sfr@canb.auug.org.au> Cc: "pbonzini@redhat.com" <pbonzini@redhat.com> Cc: "rkrcmar@redhat.com" <rkrcmar@redhat.com> Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org> Link: https://lkml.kernel.org/r/1561955054-1838-2-git-send-email-mikelley@microsoft.com
2019-07-03Merge branch 'timers/vdso' into timers/coreThomas Gleixner
so the hyper-v clocksource update can be applied.
2019-07-03x86/irq: Seperate unused system vectors from spurious entry againThomas Gleixner
Quite some time ago the interrupt entry stubs for unused vectors in the system vector range got removed and directly mapped to the spurious interrupt vector entry point. Sounds reasonable, but it's subtly broken. The spurious interrupt vector entry point pushes vector number 0xFF on the stack which makes the whole logic in __smp_spurious_interrupt() pointless. As a consequence any spurious interrupt which comes from a vector != 0xFF is treated as a real spurious interrupt (vector 0xFF) and not acknowledged. That subsequently stalls all interrupt vectors of equal and lower priority, which brings the system to a grinding halt. This can happen because even on 64-bit the system vector space is not guaranteed to be fully populated. A full compile time handling of the unused vectors is not possible because quite some of them are conditonally populated at runtime. Bring the entry stubs back, which wastes 160 bytes if all stubs are unused, but gains the proper handling back. There is no point to selectively spare some of the stubs which are known at compile time as the required code in the IDT management would be way larger and convoluted. Do not route the spurious entries through common_interrupt and do_IRQ() as the original code did. Route it to smp_spurious_interrupt() which evaluates the vector number and acts accordingly now that the real vector numbers are handed in. Fixup the pr_warn so the actual spurious vector (0xff) is clearly distiguished from the other vectors and also note for the vectored case whether it was pending in the ISR or not. "Spurious APIC interrupt (vector 0xFF) on CPU#0, should never happen." "Spurious interrupt vector 0xed on CPU#1. Acked." "Spurious interrupt vector 0xee on CPU#1. Not pending!." Fixes: 2414e021ac8d ("x86: Avoid building unused IRQ entry stubs") Reported-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Jan Beulich <jbeulich@suse.com> Link: https://lkml.kernel.org/r/20190628111440.550568228@linutronix.de
2019-07-03x86/irq: Handle spurious interrupt after shutdown gracefullyThomas Gleixner
Since the rework of the vector management, warnings about spurious interrupts have been reported. Robert provided some more information and did an initial analysis. The following situation leads to these warnings: CPU 0 CPU 1 IO_APIC interrupt is raised sent to CPU1 Unable to handle immediately (interrupts off, deep idle delay) mask() ... free() shutdown() synchronize_irq() clear_vector() do_IRQ() -> vector is clear Before the rework the vector entries of legacy interrupts were statically assigned and occupied precious vector space while most of them were unused. Due to that the above situation was handled silently because the vector was handled and the core handler of the assigned interrupt descriptor noticed that it is shut down and returned. While this has been usually observed with legacy interrupts, this situation is not limited to them. Any other interrupt source, e.g. MSI, can cause the same issue. After adding proper synchronization for level triggered interrupts, this can only happen for edge triggered interrupts where the IO-APIC obviously cannot provide information about interrupts in flight. While the spurious warning is actually harmless in this case it worries users and driver developers. Handle it gracefully by marking the vector entry as VECTOR_SHUTDOWN instead of VECTOR_UNUSED when the vector is freed up. If that above late handling happens the spurious detector will not complain and switch the entry to VECTOR_UNUSED. Any subsequent spurious interrupt on that line will trigger the spurious warning as before. Fixes: 464d12309e1b ("x86/vector: Switch IOAPIC to global reservation mode") Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>- Tested-by: Robert Hodaszi <Robert.Hodaszi@digi.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/20190628111440.459647741@linutronix.de
2019-07-03x86/ioapic: Implement irq_get_irqchip_state() callbackThomas Gleixner
When an interrupt is shut down in free_irq() there might be an inflight interrupt pending in the IO-APIC remote IRR which is not yet serviced. That means the interrupt has been sent to the target CPUs local APIC, but the target CPU is in a state which delays the servicing. So free_irq() would proceed to free resources and to clear the vector because synchronize_hardirq() does not see an interrupt handler in progress. That can trigger a spurious interrupt warning, which is harmless and just confuses users, but it also can leave the remote IRR in a stale state because once the handler is invoked the interrupt resources might be freed already and therefore acknowledgement is not possible anymore. Implement the irq_get_irqchip_state() callback for the IO-APIC irq chip. The callback is invoked from free_irq() via __synchronize_hardirq(). Check the remote IRR bit of the interrupt and return 'in flight' if it is set and the interrupt is configured in level mode. For edge mode the remote IRR has no meaning. As this is only meaningful for level triggered interrupts this won't cure the potential spurious interrupt warning for edge triggered interrupts, but the edge trigger case does not result in stale hardware state. This has to be addressed at the vector/interrupt entry level seperately. Fixes: 464d12309e1b ("x86/vector: Switch IOAPIC to global reservation mode") Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/20190628111440.370295517@linutronix.de
2019-07-02ftrace/x86: Anotate text_mutex split between ↵Jiri Kosina
ftrace_arch_code_modify_post_process() and ftrace_arch_code_modify_prepare() ftrace_arch_code_modify_prepare() is acquiring text_mutex, while the corresponding release is happening in ftrace_arch_code_modify_post_process(). This has already been documented in the code, but let's also make the fact that this is intentional clear to the semantic analysis tools such as sparse. Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1906292321170.27227@cbobk.fhfr.pm Fixes: 39611265edc1a ("ftrace/x86: Add a comment to why we take text_mutex in ftrace_arch_code_modify_prepare()") Fixes: d5b844a2cf507 ("ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()") Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-02KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPICWanpeng Li
Thomas reported that: | Background: | | In preparation of supporting IPI shorthands I changed the CPU offline | code to software disable the local APIC instead of just masking it. | That's done by clearing the APIC_SPIV_APIC_ENABLED bit in the APIC_SPIV | register. | | Failure: | | When the CPU comes back online the startup code triggers occasionally | the warning in apic_pending_intr_clear(). That complains that the IRRs | are not empty. | | The offending vector is the local APIC timer vector who's IRR bit is set | and stays set. | | It took me quite some time to reproduce the issue locally, but now I can | see what happens. | | It requires apicv_enabled=0, i.e. full apic emulation. With apicv_enabled=1 | (and hardware support) it behaves correctly. | | Here is the series of events: | | Guest CPU | | goes down | | native_cpu_disable() | | apic_soft_disable(); | | play_dead() | | .... | | startup() | | if (apic_enabled()) | apic_pending_intr_clear() <- Not taken | | enable APIC | | apic_pending_intr_clear() <- Triggers warning because IRR is stale | | When this happens then the deadline timer or the regular APIC timer - | happens with both, has fired shortly before the APIC is disabled, but the | interrupt was not serviced because the guest CPU was in an interrupt | disabled region at that point. | | The state of the timer vector ISR/IRR bits: | | ISR IRR | before apic_soft_disable() 0 1 | after apic_soft_disable() 0 1 | | On startup 0 1 | | Now one would assume that the IRR is cleared after the INIT reset, but this | happens only on CPU0. | | Why? | | Because our CPU0 hotplug is just for testing to make sure nothing breaks | and goes through an NMI wakeup vehicle because INIT would send it through | the boots-trap code which is not really working if that CPU was not | physically unplugged. | | Now looking at a real world APIC the situation in that case is: | | ISR IRR | before apic_soft_disable() 0 1 | after apic_soft_disable() 0 1 | | On startup 0 0 | | Why? | | Once the dying CPU reenables interrupts the pending interrupt gets | delivered as a spurious interupt and then the state is clear. | | While that CPU0 hotplug test case is surely an esoteric issue, the APIC | emulation is still wrong, Even if the play_dead() code would not enable | interrupts then the pending IRR bit would turn into an ISR .. interrupt | when the APIC is reenabled on startup. From SDM 10.4.7.2 Local APIC State After It Has Been Software Disabled * Pending interrupts in the IRR and ISR registers are held and require masking or handling by the CPU. In Thomas's testing, hardware cpu will not respect soft disable LAPIC when IRR has already been set or APICv posted-interrupt is in flight, so we can skip soft disable APIC checking when clearing IRR and set ISR, continue to respect soft disable APIC when attempting to set IRR. Reported-by: Rong Chen <rong.a.chen@intel.com> Reported-by: Feng Tang <feng.tang@intel.com> Reported-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rong Chen <rong.a.chen@intel.com> Cc: Feng Tang <feng.tang@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-02KVM: nVMX: Change KVM_STATE_NESTED_EVMCS to signal vmcs12 is copied from eVMCSLiran Alon
Currently KVM_STATE_NESTED_EVMCS is used to signal that eVMCS capability is enabled on vCPU. As indicated by vmx->nested.enlightened_vmcs_enabled. This is quite bizarre as userspace VMM should make sure to expose same vCPU with same CPUID values in both source and destination. In case vCPU is exposed with eVMCS support on CPUID, it is also expected to enable KVM_CAP_HYPERV_ENLIGHTENED_VMCS capability. Therefore, KVM_STATE_NESTED_EVMCS is redundant. KVM_STATE_NESTED_EVMCS is currently used on restore path (vmx_set_nested_state()) only to enable eVMCS capability in KVM and to signal need_vmcs12_sync such that on next VMEntry to guest nested_sync_from_vmcs12() will be called to sync vmcs12 content into eVMCS in guest memory. However, because restore nested-state is rare enough, we could have just modified vmx_set_nested_state() to always signal need_vmcs12_sync. From all the above, it seems that we could have just removed the usage of KVM_STATE_NESTED_EVMCS. However, in order to preserve backwards migration compatibility, we cannot do that. (vmx_get_nested_state() needs to signal flag when migrating from new kernel to old kernel). Returning KVM_STATE_NESTED_EVMCS when just vCPU have eVMCS enabled have a bad side-effect of userspace VMM having to send nested-state from source to destination as part of migration stream. Even if guest have never used eVMCS as it doesn't even run a nested hypervisor workload. This requires destination userspace VMM and KVM to support setting nested-state. Which make it more difficult to migrate from new host to older host. To avoid this, change KVM_STATE_NESTED_EVMCS to signal eVMCS is not only enabled but also active. i.e. Guest have made some eVMCS active via an enlightened VMEntry. i.e. vmcs12 is copied from eVMCS and therefore should be restored into eVMCS resident in memory (by copy_vmcs12_to_enlightened()). Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Maran Wilson <maran.wilson@oracle.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>