Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 festive updates from Will Deacon:
"In the end, we ended up with quite a lot more than I expected:
- Support for ARMv8.3 Pointer Authentication in userspace (CRIU and
kernel-side support to come later)
- Support for per-thread stack canaries, pending an update to GCC
that is currently undergoing review
- Support for kexec_file_load(), which permits secure boot of a kexec
payload but also happens to improve the performance of kexec
dramatically because we can avoid the sucky purgatory code from
userspace. Kdump will come later (requires updates to libfdt).
- Optimisation of our dynamic CPU feature framework, so that all
detected features are enabled via a single stop_machine()
invocation
- KPTI whitelisting of Cortex-A CPUs unaffected by Meltdown, so that
they can benefit from global TLB entries when KASLR is not in use
- 52-bit virtual addressing for userspace (kernel remains 48-bit)
- Patch in LSE atomics for per-cpu atomic operations
- Custom preempt.h implementation to avoid unconditional calls to
preempt_schedule() from preempt_enable()
- Support for the new 'SB' Speculation Barrier instruction
- Vectorised implementation of XOR checksumming and CRC32
optimisations
- Workaround for Cortex-A76 erratum #1165522
- Improved compatibility with Clang/LLD
- Support for TX2 system PMUS for profiling the L3 cache and DMC
- Reflect read-only permissions in the linear map by default
- Ensure MMIO reads are ordered with subsequent calls to Xdelay()
- Initial support for memory hotplug
- Tweak the threshold when we invalidate the TLB by-ASID, so that
mremap() performance is improved for ranges spanning multiple PMDs.
- Minor refactoring and cleanups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (125 commits)
arm64: kaslr: print PHYS_OFFSET in dump_kernel_offset()
arm64: sysreg: Use _BITUL() when defining register bits
arm64: cpufeature: Rework ptr auth hwcaps using multi_entry_cap_matches
arm64: cpufeature: Reduce number of pointer auth CPU caps from 6 to 4
arm64: docs: document pointer authentication
arm64: ptr auth: Move per-thread keys from thread_info to thread_struct
arm64: enable pointer authentication
arm64: add prctl control for resetting ptrauth keys
arm64: perf: strip PAC when unwinding userspace
arm64: expose user PAC bit positions via ptrace
arm64: add basic pointer authentication support
arm64/cpufeature: detect pointer authentication
arm64: Don't trap host pointer auth use to EL2
arm64/kvm: hide ptrauth from guests
arm64/kvm: consistently handle host HCR_EL2 flags
arm64: add pointer authentication register bits
arm64: add comments about EC exception levels
arm64: perf: Treat EXCLUDE_EL* bit definitions as unsigned
arm64: kpti: Whitelist Cortex-A CPUs that don't implement the CSV3 field
arm64: enable per-task stack canaries
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"The timer department delivers the following christmas presents:
Core code:
- Use proper seqcount initializer to make lockdep happy
- SPDX annotations and cleanup of license boilerplates
- Use DEFINE_SHOW_ATTRIBUTE() instead of open coding it
- Minor cleanups
Driver code:
- Add the sched_clock for the arc timer (Alexey Brodkin)
- Change the file timer names for riscv, rockchip, tegra20, sun4i and
meson6 (Daniel Lezcano)
- Add the DT bindings for r8a7796, r8a77470 and r8a774a1 (Biju Das)
- Remove the early platform driver registration for timer-ti-dm
(Bartosz Golaszewski)
- Provide the sched_clock for the riscv timer (Anup Patel)
- Add support for ARM64 for the imx-gpt and convert the imx-tpm to
the timer-of API (Anson Huang)
- Remove useless irq protection for the imx-gpt (Clément Péron)
- Remove a duplicate function name for the vt8500 (Dan Carpenter)
- Remove obsolete inclusion of <asm/smp_twd.h> for the tegra20 (Geert
Uytterhoeven)
- Demote the prcmu and the custom sched_clock for the dbx500 and the
ux500 (Linus Walleij)
- Add a new timer clock for the RDA8810PL (Manivannan Sadhasivam)
- Rename the macro to stick to the register name and add the delay
timer (Martin Blumenstingl)
- Switch the bcm2835 to the SPDX identifier (Stefan Wahren)
- Fix the interrupt register access on the fttmr010 (Tao Ren)
- Add missing of_node_put in the initialization path on the
integrator-ap (Yangtao Li)"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
dt-bindings: timer: Document RDA8810PL SoC timer
clocksource/drivers/rda: Add clock driver for RDA8810PL SoC
clocksource/drivers/meson6: Change name meson6_timer timer-meson6
clocksource/drivers/sun4i: Change name sun4i_timer to timer-sun4i
clocksource/drivers/tegra20: Change name tegra20_timer to timer-tegra20
clocksource/drivers/rockchip: Change name rockchip_timer to timer-rockchip
clocksource/drivers/riscv: Change name riscv_timer to timer-riscv
clocksource/drivers/riscv_timer: Provide the sched_clock
clocksource/drivers/timer-imx-tpm: Specify clock name for timer-of
clocksource/drivers/fttmr010: Fix invalid interrupt register access
clocksource/drivers/integrator-ap: Add missing of_node_put()
clocksource/drivers/bcm2835: Switch to SPDX identifier
dt-bindings: timer: renesas, cmt: Document r8a774a1 CMT support
clocksource/drivers/timer-imx-tpm: Convert the driver to timer-of
clocksource/drivers/arc_timer: Utilize generic sched_clock
dt-bindings: timer: renesas, cmt: Document r8a77470 CMT support
dt-bindings: timer: renesas, cmt: Document r8a7796 CMT support
clocksource/drivers/imx-gpt: Remove unnecessary irq protection
clocksource/drivers/imx-gpt: Add support for ARM64
clocksource/drivers/meson6_timer: Implement the ARM delay timer
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"The interrupt department provides:
Core updates:
- Better spreading to NUMA nodes in the affinity management
- Support for more than one set of interrupts to spread out to allow
separate queues for separate functionality of a single device.
- Decouple the non queue interrupts from being managed. Those are
usually general interrupts for error handling etc. and those should
never be shut down. This also a preparation to utilize the
spreading mechanism for initial spreading of non-managed interrupts
later.
- Make the single CPU target selection in the matrix allocator more
balanced so interrupts won't accumulate on single CPUs in certain
situations.
- A large spell checking patch so we don't end up fixing single typos
over and over.
Driver updates:
- A bunch of new irqchip drivers (RDA8810PL, Madera, imx-irqsteer)
- Updates for the 8MQ, F1C100s platform drivers
- A number of SPDX cleanups
- A workaround for a very broken GICv3 implementation on msm8996
which sports a botched register set.
- A platform-msi fix to prevent memory leakage
- Various cleanups"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
genirq/affinity: Add is_managed to struct irq_affinity_desc
genirq/core: Introduce struct irq_affinity_desc
genirq/affinity: Remove excess indentation
irqchip/stm32: protect configuration registers with hwspinlock
dt-bindings: interrupt-controller: stm32: Document hwlock properties
irqchip: Add driver for imx-irqsteer controller
dt-bindings/irq: Add binding for Freescale IRQSTEER multiplexer
irqchip: Add driver for Cirrus Logic Madera codecs
genirq: Fix various typos in comments
irqchip/irq-imx-gpcv2: Add IRQCHIP_DECLARE for i.MX8MQ compatible
irqchip/irq-rda-intc: Fix return value check in rda8810_intc_init()
irqchip/irq-imx-gpcv2: Silence "fall through" warning
irqchip/gic-v3: Add quirk for msm8996 broken registers
irqchip/gic: Add support to device tree based quirks
dt-bindings/gic-v3: Add msm8996 compatible string
irqchip/sun4i: Add support for Allwinner ARMv5 F1C100s
irqchip/sun4i: Move IC specific register offsets to struct
irqchip/sun4i: Add a struct to hold global variables
dt-bindings: interrupt-controller: Add suniv interrupt-controller
irqchip: Add RDA8810PL interrupt driver
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add sysadmin documentation for cpuidle, extend the cpuidle
subsystem somewhat, improve the handling of performance states in the
generic power domains (genpd) and operating performance points (OPP)
frameworks, add a new cpufreq driver for Qualcomm SoCs, update some
other cpufreq drivers, switch over the runtime PM framework to using
high-res timers for device autosuspend, fix a problem with
suspend-to-idle on ACPI-based platforms, add system-wide suspend and
resume handling to the devfreq framework, do some janitorial cleanups
all over and update some utilities.
Specifics:
- Add sysadmin documentation for cpuidle (Rafael Wysocki).
- Make it possible to specify a cpuidle governor from kernel command
line, add new cpuidle state sysfs attributes for governor
evaluation, and improve the "polling" idle state handling (Rafael
Wysocki).
- Fix the handling of the "required-opps" DT property in the
operating performance points (OPP) framework, improve the
integration of it with the generic power domains (genpd) framework,
improve the handling of performance states in them and clean up the
idle states vs performance states separation in genpd (Viresh
Kumar, Ulf Hansson).
- Add a cpufreq driver called "qcom-hw" for Qualcomm SoCs using a
hardware engine to control CPU frequency transitions along with DT
bindings for it (Taniya Das).
- Fix an intel_pstate driver issue related to CPU offline and update
the documentation of it (Srinivas Pandruvada).
- Clean up the imx6q cpufreq driver (Anson Huang).
- Add SPDX license IDs to cpufreq schedutil governor files (Daniel
Lezcano).
- Switch over the runtime PM framework to using high-res timers for
device autosuspend to allow the control of it to be more precise
(Vincent Guittot).
- Disable non-wakeup ACPI GPEs during suspend-to-idle so that they
don't prevent the system from reaching the target low-power state
and simplify the suspend-to-idle handling on ACPI platforms without
full Low-Power S0 Idle (LPS0) support (Rafael Wysocki).
- Add system-wide suspend and resume support to the devfreq framework
(Lukasz Luba).
- Clean up the SmartReflex adaptive voltage scaling (AVS) driver and
add an SPDX license ID to it (Nishanth Menon, Uwe Kleine-König,
Thomas Meyer).
- Get rid of code duplication by using the DEFINE_SHOW_ATTRIBUTE
macro in some places, fix some DT node refcount leaks, and do some
other janitorial cleanups (Yangtao Li).
- Update the cpupower, intel_pstate_tracer and turbosat utilities
(Abhishek Goel, Doug Smythies, Len Brown)"
* tag 'pm-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (54 commits)
PM / Domains: remove define_genpd_open_function() and define_genpd_debugfs_fops()
PM-runtime: Switch autosuspend over to using hrtimers
cpufreq: qcom-hw: Add support for QCOM cpufreq HW driver
dt-bindings: cpufreq: Introduce QCOM cpufreq firmware bindings
ACPI: PM: Loop in full LPS0 mode only
ACPI: EC / PM: Disable non-wakeup GPEs for suspend-to-idle
tools/power/x86/intel_pstate_tracer: Fix non root execution for post processing a trace file
tools/power turbostat: consolidate duplicate model numbers
tools/power turbostat: fix goldmont C-state limit decoding
PM / Domains: Propagate performance state updates
PM / Domains: Factorize dev_pm_genpd_set_performance_state()
PM / Domains: Save OPP table pointer in genpd
OPP: Don't return 0 on error from of_get_required_opp_performance_state()
OPP: Add dev_pm_opp_xlate_performance_state() helper
OPP: Improve _find_table_of_opp_np()
PM / Domains: Make genpd performance states orthogonal to the idlestates
PM / sleep: convert to DEFINE_SHOW_ATTRIBUTE
cpuidle: Add 'above' and 'below' idle state metrics
PM / AVS: SmartReflex: Switch to SPDX Licence ID
PM / AVS: SmartReflex: NULL check before some freeing functions is not needed
...
|
|
Commit 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting") will
result in fork failing if allocating a kernel stack for a task in
dup_task_struct exceeds the kernel memory allowance for that cgroup.
Unfortunately, it also results in a crash.
This is due to the code jumping to free_stack and calling
free_thread_stack when the memcg kernel stack charge fails, but without
tsk->stack pointing at the freshly allocated stack.
This in turn results in the vfree_atomic in free_thread_stack oopsing
with a backtrace like this:
#5 [ffffc900244efc88] die at ffffffff8101f0ab
#6 [ffffc900244efcb8] do_general_protection at ffffffff8101cb86
#7 [ffffc900244efce0] general_protection at ffffffff818ff082
[exception RIP: llist_add_batch+7]
RIP: ffffffff8150d487 RSP: ffffc900244efd98 RFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88085ef55980 RCX: 0000000000000000
RDX: ffff88085ef55980 RSI: 343834343531203a RDI: 343834343531203a
RBP: ffffc900244efd98 R8: 0000000000000001 R9: ffff8808578c3600
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88029f6c21c0
R13: 0000000000000286 R14: ffff880147759b00 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffffc900244efda0] vfree_atomic at ffffffff811df2c7
#9 [ffffc900244efdb8] copy_process at ffffffff81086e37
#10 [ffffc900244efe98] _do_fork at ffffffff810884e0
#11 [ffffc900244eff10] sys_vfork at ffffffff810887ff
#12 [ffffc900244eff20] do_syscall_64 at ffffffff81002a43
RIP: 000000000049b948 RSP: 00007ffcdb307830 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000896030 RCX: 000000000049b948
RDX: 0000000000000000 RSI: 00007ffcdb307790 RDI: 00000000005d7421
RBP: 000000000067370f R8: 00007ffcdb3077b0 R9: 000000000001ed00
R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000040
R13: 000000000000000f R14: 0000000000000000 R15: 000000000088d018
ORIG_RAX: 000000000000003a CS: 0033 SS: 002b
The simplest fix is to assign tsk->stack right where it is allocated.
Link: http://lkml.kernel.org/r/20181214231726.7ee4843c@imladris.surriel.com
Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"Fix a division by zero crash in the posix-timers code"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-timers: Fix division by zero bug
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull futex fix from Ingo Molnar:
"A single fix for a robust futexes race between sys_exit() and
sys_futex_lock_pi()"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Cure exit race
|
|
* pm-core:
PM-runtime: Switch autosuspend over to using hrtimers
* pm-qos:
PM / QoS: Change to use DEFINE_SHOW_ATTRIBUTE macro
* pm-domains:
PM / Domains: remove define_genpd_open_function() and define_genpd_debugfs_fops()
* pm-sleep:
PM / sleep: convert to DEFINE_SHOW_ATTRIBUTE
|
|
* pm-cpuidle:
cpuidle: Add 'above' and 'below' idle state metrics
cpuidle: big.LITTLE: fix refcount leak
cpuidle: Add cpuidle.governor= command line parameter
cpuidle: poll_state: Disregard disable idle states
Documentation: admin-guide: PM: Add cpuidle document
* pm-cpufreq:
cpufreq: qcom-hw: Add support for QCOM cpufreq HW driver
dt-bindings: cpufreq: Introduce QCOM cpufreq firmware bindings
cpufreq: nforce2: Remove meaningless return
cpufreq: ia64: Remove unused header files
cpufreq: imx6q: save one condition block for normal case of nvmem read
cpufreq: imx6q: remove unused code
cpufreq: pmac64: add of_node_put()
cpufreq: powernv: add of_node_put()
Documentation: intel_pstate: Clarify coordination of P-State limits
cpufreq: intel_pstate: Force HWP min perf before offline
cpufreq: s3c24xx: Change to use DEFINE_SHOW_ATTRIBUTE macro
* pm-cpufreq-sched:
sched/cpufreq: Add the SPDX tags
|
|
Pull networking fixes from David Miller:
1) Off by one in netlink parsing of mac802154_hwsim, from Alexander
Aring.
2) nf_tables RCU usage fix from Taehee Yoo.
3) Flow dissector needs nhoff and thoff clamping, from Stanislav
Fomichev.
4) Missing sin6_flowinfo initialization in SCTP, from Xin Long.
5) Spectrev1 in ipmr and ip6mr, from Gustavo A. R. Silva.
6) Fix r8169 crash when DEBUG_SHIRQ is enabled, from Heiner Kallweit.
7) Fix SKB leak in rtlwifi, from Larry Finger.
8) Fix state pruning in bpf verifier, from Jakub Kicinski.
9) Don't handle completely duplicate fragments as overlapping, from
Michal Kubecek.
10) Fix memory corruption with macb and 64-bit DMA, from Anssi Hannula.
11) Fix TCP fallback socket release in smc, from Myungho Jung.
12) gro_cells_destroy needs to napi_disable, from Lorenzo Bianconi.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (130 commits)
rds: Fix warning.
neighbor: NTF_PROXY is a valid ndm_flag for a dump request
net: mvpp2: fix the phylink mode validation
net/sched: cls_flower: Remove old entries from rhashtable
net/tls: allocate tls context using GFP_ATOMIC
iptunnel: make TUNNEL_FLAGS available in uapi
gro_cell: add napi_disable in gro_cells_destroy
lan743x: Remove MAC Reset from initialization
net/mlx5e: Remove the false indication of software timestamping support
net/mlx5: Typo fix in del_sw_hw_rule
net/mlx5e: RX, Fix wrong early return in receive queue poll
ipv6: explicitly initialize udp6_addr in udp_sock_create6()
bnxt_en: Fix ethtool self-test loopback.
net/rds: remove user triggered WARN_ON in rds_sendmsg
net/rds: fix warn in rds_message_alloc_sgs
ath10k: skip sending quiet mode cmd for WCN3990
mac80211: free skb fraglist before freeing the skb
nl80211: fix memory leak if validate_pae_over_nl80211() fails
net/smc: fix TCP fallback socket release
vxge: ensure data0 is initialized in when fetching firmware version information
...
|
|
Devices which use managed interrupts usually have two classes of
interrupts:
- Interrupts for multiple device queues
- Interrupts for general device management
Currently both classes are treated the same way, i.e. as managed
interrupts. The general interrupts get the default affinity mask assigned
while the device queue interrupts are spread out over the possible CPUs.
Treating the general interrupts as managed is both a limitation and under
certain circumstances a bug. Assume the following situation:
default_irq_affinity = 4..7
So if CPUs 4-7 are offlined, then the core code will shut down the device
management interrupts because the last CPU in their affinity mask went
offline.
It's also a limitation because it's desired to allow manual placement of
the general device interrupts for various reasons. If they are marked
managed then the interrupt affinity setting from both user and kernel space
is disabled. That limitation was reported by Kashyap and Sumit.
Expand struct irq_affinity_desc with a new bit 'is_managed' which is set
for truly managed interrupts (queue interrupts) and cleared for the general
device interrupts.
[ tglx: Simplify code and massage changelog ]
Reported-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reported-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Dou Liyang <douliyangs@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-pci@vger.kernel.org
Cc: shivasharan.srikanteshwara@broadcom.com
Cc: ming.lei@redhat.com
Cc: hch@lst.de
Cc: bhelgaas@google.com
Cc: douliyang1@huawei.com
Link: https://lkml.kernel.org/r/20181204155122.6327-3-douliyangs@gmail.com
|
|
The interrupt affinity management uses straight cpumask pointers to convey
the automatically assigned affinity masks for managed interrupts. The core
interrupt descriptor allocation also decides based on the pointer being non
NULL whether an interrupt is managed or not.
Devices which use managed interrupts usually have two classes of
interrupts:
- Interrupts for multiple device queues
- Interrupts for general device management
Currently both classes are treated the same way, i.e. as managed
interrupts. The general interrupts get the default affinity mask assigned
while the device queue interrupts are spread out over the possible CPUs.
Treating the general interrupts as managed is both a limitation and under
certain circumstances a bug. Assume the following situation:
default_irq_affinity = 4..7
So if CPUs 4-7 are offlined, then the core code will shut down the device
management interrupts because the last CPU in their affinity mask went
offline.
It's also a limitation because it's desired to allow manual placement of
the general device interrupts for various reasons. If they are marked
managed then the interrupt affinity setting from both user and kernel space
is disabled.
To remedy that situation it's required to convey more information than the
cpumasks through various interfaces related to interrupt descriptor
allocation.
Instead of adding yet another argument, create a new data structure
'irq_affinity_desc' which for now just contains the cpumask. This struct
can be expanded to convey auxilliary information in the next step.
No functional change, just preparatory work.
[ tglx: Simplified logic and clarified changelog ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Dou Liyang <douliyangs@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-pci@vger.kernel.org
Cc: kashyap.desai@broadcom.com
Cc: shivasharan.srikanteshwara@broadcom.com
Cc: sumit.saxena@broadcom.com
Cc: ming.lei@redhat.com
Cc: hch@lst.de
Cc: douliyang1@huawei.com
Link: https://lkml.kernel.org/r/20181204155122.6327-2-douliyangs@gmail.com
|
|
Plus other coding style issues which stood out while staring at that code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Stefan reported, that the glibc tst-robustpi4 test case fails
occasionally. That case creates the following race between
sys_exit() and sys_futex_lock_pi():
CPU0 CPU1
sys_exit() sys_futex()
do_exit() futex_lock_pi()
exit_signals(tsk) No waiters:
tsk->flags |= PF_EXITING; *uaddr == 0x00000PID
mm_release(tsk) Set waiter bit
exit_robust_list(tsk) { *uaddr = 0x80000PID;
Set owner died attach_to_pi_owner() {
*uaddr = 0xC0000000; tsk = get_task(PID);
} if (!tsk->flags & PF_EXITING) {
... attach();
tsk->flags |= PF_EXITPIDONE; } else {
if (!(tsk->flags & PF_EXITPIDONE))
return -EAGAIN;
return -ESRCH; <--- FAIL
}
ESRCH is returned all the way to user space, which triggers the glibc test
case assert. Returning ESRCH unconditionally is wrong here because the user
space value has been changed by the exiting task to 0xC0000000, i.e. the
FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This
is a valid state and the kernel has to handle it, i.e. taking the futex.
Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE
is set in the task which 'owns' the futex. If the value has changed, let
the kernel retry the operation, which includes all regular sanity checks
and correctly handles the FUTEX_OWNER_DIED case.
If it hasn't changed, then return ESRCH as there is no way to distinguish
this case from malfunctioning user space. This happens when the exiting
task did not have a robust list, the robust list was corrupted or the user
space value in the futex was simply bogus.
Reported-by: Stefan Liebler <stli@linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467
Link: https://lkml.kernel.org/r/20181210152311.986181245@linutronix.de
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
- A bunch of new irqchip drivers (RDA8810PL, Madera, imx-irqsteer)
- Updates for new (and old) platforms (i.MX8MQ, F1C100s)
- A number of SPDX cleanups
- A workaround for a very broken GICv3 implementation
- A platform-msi fix
- Various cleanups
|
|
Go over the IRQ subsystem source code (including irqchip drivers) and
fix common typos in comments.
No change in functionality intended.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
|
|
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <john.stultz@linaro.org>
Cc: <sboyd@kernel.org>
Link: https://lkml.kernel.org/r/20181209062225.4344-1-yuehaibing@huawei.com
|
|
The dma_direct_supported() function intends to check the DMA mask against
specific values. However, the phys_to_dma() function includes the SME
encryption mask, which defeats the intended purpose of the check. This
results in drivers that support less than 48-bit DMA (SME encryption mask
is bit 47) from being able to set the DMA mask successfully when SME is
active, which results in the driver failing to initialize.
Change the function used to check the mask from phys_to_dma() to
__phys_to_dma() so that the SME encryption mask is not part of the check.
Fixes: c1d0af1a1d5d ("kernel/dma/direct: take DMA offset into account in dma_direct_supported")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The signal delivery path of posix-timers can try to rearm the timer even if
the interval is zero. That's handled for the common case (hrtimer) but not
for alarm timers. In that case the forwarding function raises a division by
zero exception.
The handling for hrtimer based posix timers is wrong because it marks the
timer as active despite the fact that it is stopped.
Move the check from common_hrtimer_rearm() to posixtimer_rearm() to cure
both issues.
Reported-by: syzbot+9d38bedac9cc77b8ad5e@syzkaller.appspotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: sboyd@kernel.org
Cc: stable@vger.kernel.org
Cc: syzkaller-bugs@googlegroups.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1812171328050.1880@nanos.tec.linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2018-12-15
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) fix liveness propagation of callee saved registers, from Jakub.
2) fix overflow in bpf_jit_limit knob, from Daniel.
3) bpf_flow_dissector api fix, from Stanislav.
4) bpf_perf_event api fix on powerpc, from Sandipan.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently for liveness and state pruning the register parentage
chains don't include states of the callee. This makes some sense
as the callee can't access those registers. However, this means
that READs done after the callee returns will not propagate into
the states of the callee. Callee will then perform pruning
disregarding differences in caller state.
Example:
0: (85) call bpf_user_rnd_u32
1: (b7) r8 = 0
2: (55) if r0 != 0x0 goto pc+1
3: (b7) r8 = 1
4: (bf) r1 = r8
5: (85) call pc+4
6: (15) if r8 == 0x1 goto pc+1
7: (05) *(u64 *)(r9 - 8) = r3
8: (b7) r0 = 0
9: (95) exit
10: (15) if r1 == 0x0 goto pc+0
11: (95) exit
Here we acquire unknown state with call to get_random() [1]. Then
we store this random state in r8 (either 0 or 1) [1 - 3], and make
a call on line 5. Callee does nothing but a trivial conditional
jump (to create a pruning point). Upon return caller checks the
state of r8 and either performs an unsafe read or not.
Verifier will first explore the path with r8 == 1, creating a pruning
point at [11]. The parentage chain for r8 will include only callers
states so once verifier reaches [6] it will mark liveness only on states
in the caller, and not [11]. Now when verifier walks the paths with
r8 == 0 it will reach [11] and since REG_LIVE_READ on r8 was not
propagated there it will prune the walk entirely (stop walking
the entire program, not just the callee). Since [6] was never walked
with r8 == 0, [7] will be considered dead and replaced with "goto -1"
causing hang at runtime.
This patch weaves the callee's explored states onto the callers
parentage chain. Rough parentage for r8 would have looked like this
before:
[0] [1] [2] [3] [4] [5] [10] [11] [6] [7]
| | ,---|----. | | |
sl0: sl0: / sl0: \ sl0: sl0: sl0:
fr0: r8 <-- fr0: r8<+--fr0: r8 `fr0: r8 ,fr0: r8<-fr0: r8
\ fr1: r8 <- fr1: r8 /
\__________________/
after:
[0] [1] [2] [3] [4] [5] [10] [11] [6] [7]
| | | | | |
sl0: sl0: sl0: sl0: sl0: sl0:
fr0: r8 <-- fr0: r8 <- fr0: r8 <- fr0: r8 <-fr0: r8<-fr0: r8
fr1: r8 <- fr1: r8
Now the mark from instruction 6 will travel through callees states.
Note that we don't have to connect r0 because its overwritten by
callees state on return and r1 - r5 because those are not alive
any more once a call is made.
v2:
- don't connect the callees registers twice (Alexei: suggestion & code)
- add more details to the comment (Ed & Alexei)
v1: don't unnecessarily link caller saved regs (Jiong)
Fixes: f4d7e40a5b71 ("bpf: introduce function calls (verification)")
Reported-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add an arm64-specific prctl to allow a thread to reinitialize its
pointer authentication keys to random values. This can be useful when
exec() is not used for starting new processes, to ensure that different
processes still have different keys.
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Two threads can try to fire the irq_sim with different offsets and will
end up fighting for the irq_work asignment. Thomas Gleixner suggested a
solution based on a bitfield where we set a bit for every offset
associated with an interrupt that should be fired and then iterate over
all set bits in the interrupt handler.
This is a slightly modified solution using a bitmap so that we don't
impose a limit on the number of interrupts one can allocate with
irq_sim.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"While running various ftrace tests on new development code, the
kmemleak detector found some allocations that were not freed
correctly.
This fixes a couple of leaks in the event trigger code as well as in
adding function trace filters in trace instances"
* tag 'trace-v4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix memory leak of instance function hash filters
tracing: Fix memory leak in set_trigger_filter()
tracing: Fix memory leak in create_filter()
|
|
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Michael and Sandipan report:
Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
JIT allocations. At compile time it defaults to PAGE_SIZE * 40000,
and is adjusted again at init time if MODULES_VADDR is defined.
For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
the compile-time default at boot-time, which is 0x9c400000 when
using 64K page size. This overflows the signed 32-bit bpf_jit_limit
value:
root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
-1673527296
and can cause various unexpected failures throughout the network
stack. In one case `strace dhclient eth0` reported:
setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8},
16) = -1 ENOTSUPP (Unknown error 524)
and similar failures can be seen with tools like tcpdump. This doesn't
always reproduce however, and I'm not sure why. The more consistent
failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9
host would time out on systemd/netplan configuring a virtio-net NIC
with no noticeable errors in the logs.
Given this and also given that in near future some architectures like
arm64 will have a custom area for BPF JIT image allocations we should
get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default entirely. For
4.21, we have an overridable bpf_jit_alloc_exec(), bpf_jit_free_exec()
so therefore add another overridable bpf_jit_alloc_exec_limit() helper
function which returns the possible size of the memory area for deriving
the default heuristic in bpf_jit_charge_init().
Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new
bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default
JIT memory provider, and therefore in case archs implement their custom
module_alloc() we use MODULES_{END,_VADDR} for limits and otherwise for
vmalloc_exec() cases like on ppc64 we use VMALLOC_{END,_START}.
Additionally, for archs supporting large page sizes, we should change
the sysctl to be handled as long to not run into sysctl restrictions
in future.
Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations")
Reported-by: Sandipan Das <sandipan@linux.ibm.com>
Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john.stultz@linaro.org
Cc: sboyd@kernel.org
Link: https://lkml.kernel.org/r/20181211163744.22133-1-tiny.windzz@gmail.com
|
|
The following commands will cause a memory leak:
# cd /sys/kernel/tracing
# mkdir instances/foo
# echo schedule > instance/foo/set_ftrace_filter
# rmdir instances/foo
The reason is that the hashes that hold the filters to set_ftrace_filter and
set_ftrace_notrace are not freed if they contain any data on the instance
and the instance is removed.
Found by kmemleak detector.
Cc: stable@vger.kernel.org
Fixes: 591dffdade9f ("ftrace: Allow for function tracing instance to filter functions")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
When create_event_filter() fails in set_trigger_filter(), the filter may
still be allocated and needs to be freed. The caller expects the
data->filter to be updated with the new filter, even if the new filter
failed (we could add an error message by setting set_str parameter of
create_event_filter(), but that's another update).
But because the error would just exit, filter was left hanging and
nothing could free it.
Found by kmemleak detector.
Cc: stable@vger.kernel.org
Fixes: bac5fb97a173a ("tracing: Add and use generic set_trigger_filter() implementation")
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
The create_filter() calls create_filter_start() which allocates a
"parse_error" descriptor, but fails to call create_filter_finish() that
frees it.
The op_stack and inverts in predicate_parse() were also not freed.
Found by kmemleak detector.
Cc: stable@vger.kernel.org
Fixes: 80765597bc587 ("tracing: Rewrite filter logic to be simpler and faster")
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
The SPDX tags are not present in cpufreq.c and cpufreq_schedutil.c.
Add them and remove the license descriptions
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Pull networking fixes from David Miller:
"A decent batch of fixes here. I'd say about half are for problems that
have existed for a while, and half are for new regressions added in
the 4.20 merge window.
1) Fix 10G SFP phy module detection in mvpp2, from Baruch Siach.
2) Revert bogus emac driver change, from Benjamin Herrenschmidt.
3) Handle BPF exported data structure with pointers when building
32-bit userland, from Daniel Borkmann.
4) Memory leak fix in act_police, from Davide Caratti.
5) Check RX checksum offload in RX descriptors properly in aquantia
driver, from Dmitry Bogdanov.
6) SKB unlink fix in various spots, from Edward Cree.
7) ndo_dflt_fdb_dump() only works with ethernet, enforce this, from
Eric Dumazet.
8) Fix FID leak in mlxsw driver, from Ido Schimmel.
9) IOTLB locking fix in vhost, from Jean-Philippe Brucker.
10) Fix SKB truesize accounting in ipv4/ipv6/netfilter frag memory
limits otherwise namespace exit can hang. From Jiri Wiesner.
11) Address block parsing length fixes in x25 from Martin Schiller.
12) IRQ and ring accounting fixes in bnxt_en, from Michael Chan.
13) For tun interfaces, only iface delete works with rtnl ops, enforce
this by disallowing add. From Nicolas Dichtel.
14) Use after free in liquidio, from Pan Bian.
15) Fix SKB use after passing to netif_receive_skb(), from Prashant
Bhole.
16) Static key accounting and other fixes in XPS from Sabrina Dubroca.
17) Partially initialized flow key passed to ip6_route_output(), from
Shmulik Ladkani.
18) Fix RTNL deadlock during reset in ibmvnic driver, from Thomas
Falcon.
19) Several small TCP fixes (off-by-one on window probe abort, NULL
deref in tail loss probe, SNMP mis-estimations) from Yuchung
Cheng"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits)
net/sched: cls_flower: Reject duplicated rules also under skip_sw
bnxt_en: Fix _bnxt_get_max_rings() for 57500 chips.
bnxt_en: Fix NQ/CP rings accounting on the new 57500 chips.
bnxt_en: Keep track of reserved IRQs.
bnxt_en: Fix CNP CoS queue regression.
net/mlx4_core: Correctly set PFC param if global pause is turned off.
Revert "net/ibm/emac: wrong bit is used for STA control"
neighbour: Avoid writing before skb->head in neigh_hh_output()
ipv6: Check available headroom in ip6_xmit() even without options
tcp: lack of available data can also cause TSO defer
ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output
mlxsw: spectrum_switchdev: Fix VLAN device deletion via ioctl
mlxsw: spectrum_router: Relax GRE decap matching check
mlxsw: spectrum_switchdev: Avoid leaking FID's reference count
mlxsw: spectrum_nve: Remove easily triggerable warnings
ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes
sctp: frag_point sanity check
tcp: fix NULL ref in tail loss probe
tcp: Do not underestimate rwnd_limited
net: use skb_list_del_init() to remove from RX sublists
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull gcc stackleak plugin fixes from Kees Cook:
- Remove tracing for inserted stack depth marking function (Anders
Roxell)
- Move gcc-plugin pass location to avoid objtool warnings (Alexander
Popov)
* tag 'gcc-plugins-v4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
stackleak: Register the 'stackleak_cleanup' pass before the '*free_cfg' pass
stackleak: Mark stackleak_track_stack() as notrace
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"This is a single commit that fixes a bug in uprobes SDT code due to a
missing mutex protection"
* tag 'trace-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
Uprobes: Fix kernel oops with delayed_uprobe_remove()
|
|
In kdump case, there exists only one dedicated memblock region as usable
memory (crashk_res). With this patch, kexec_walk_memblock() runs a given
callback function on this region.
Cosmetic change: 0 to MEMBLOCK_NONE at for_each_free_mem_range*()
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Memblock list is another source for usable system memory layout.
So move powerpc's arch_kexec_walk_mem() to common code so that other
memblock-based architectures, particularly arm64, can also utilise it.
A moved function is now renamed to kexec_walk_memblock() and integrated
into kexec_locate_mem_hole(), which will now be usable for all
architectures with no need for overriding arch_kexec_walk_mem().
With this change, arch_kexec_walk_mem() need no longer be a weak function,
and was now renamed to kexec_walk_resources().
Since powerpc doesn't support kdump in its kexec_file_load(), the current
kexec_walk_memblock() won't work for kdump either in this form, this will
be fixed in the next patch.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Acked-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Since s390 already knows where to locate buffers, calling
arch_kexec_mem_walk() has no sense. So we can just drop it as kbuf->mem
indicates this while all other architectures sets it to 0 initially.
This change is a preparatory work for the next patch, where all the
variant memory walks, either on system resource or memblock, will be
put in one common place so that it will satisfy all the architectures'
need.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Change this function from static to global so that arm64 can implement
its own arch_kimage_file_post_load_cleanup() later using
kexec_image_post_load_cleanup_default().
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
There could be a race between task exit and probe unregister:
exit_mm()
mmput()
__mmput() uprobe_unregister()
uprobe_clear_state() put_uprobe()
delayed_uprobe_remove() delayed_uprobe_remove()
put_uprobe() is calling delayed_uprobe_remove() without taking
delayed_uprobe_lock and thus the race sometimes results in a
kernel crash. Fix this by taking delayed_uprobe_lock before
calling delayed_uprobe_remove() from put_uprobe().
Detailed crash log can be found at:
Link: http://lkml.kernel.org/r/000000000000140c370577db5ece@google.com
Link: http://lkml.kernel.org/r/20181205033423.26242-1-ravi.bangoria@linux.ibm.com
Acked-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reported-by: syzbot+cb1fb754b771caca0a88@syzkaller.appspotmail.com
Fixes: 1cc33161a83d ("uprobes: Support SDT markers having reference count (semaphore)")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Function graph tracing recurses into itself when stackleak is enabled,
causing the ftrace graph selftest to run for up to 90 seconds and
trigger the softlockup watchdog.
Breakpoint 2, ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200
200 mcount_get_lr_addr x0 // pointer to function's saved lr
(gdb) bt
\#0 ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200
\#1 0xffffff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153
\#2 0xffffff8008555484 in stackleak_track_stack () at ../kernel/stackleak.c:106
\#3 0xffffff8008421ff8 in ftrace_ops_test (ops=0xffffff8009eaa840 <graph_ops>, ip=18446743524091297036, regs=<optimized out>) at ../kernel/trace/ftrace.c:1507
\#4 0xffffff8008428770 in __ftrace_ops_list_func (regs=<optimized out>, ignored=<optimized out>, parent_ip=<optimized out>, ip=<optimized out>) at ../kernel/trace/ftrace.c:6286
\#5 ftrace_ops_no_ops (ip=18446743524091297036, parent_ip=18446743524091242824) at ../kernel/trace/ftrace.c:6321
\#6 0xffffff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153
\#7 0xffffff800832fd10 in irq_find_mapping (domain=0xffffffc03fc4bc80, hwirq=27) at ../kernel/irq/irqdomain.c:876
\#8 0xffffff800832294c in __handle_domain_irq (domain=0xffffffc03fc4bc80, hwirq=27, lookup=true, regs=0xffffff800814b840) at ../kernel/irq/irqdesc.c:650
\#9 0xffffff80081d52b4 in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:205
Rework so we mark stackleak_track_stack as notrace
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2018-12-05
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) fix bpf uapi pointers for 32-bit architectures, from Daniel.
2) improve verifer ability to handle progs with a lot of branches, from Alexei.
3) strict btf checks, from Yonghong.
4) bpf_sk_lookup api cleanup, from Joe.
5) other misc fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tk_core.seq is initialized open coded, but that misses to initialize the
lockdep map when lockdep is enabled. Lockdep splats involving tk_core seq
consequently lack a name and are hard to read.
Use the proper initializer which takes care of the lockdep map
initialization.
[ tglx: Massaged changelog ]
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: tj@kernel.org
Cc: johannes.berg@intel.com
Link: https://lkml.kernel.org/r/20181128234325.110011-12-bvanassche@acm.org
|
|
malicious bpf program may try to force the verifier to remember
a lot of distinct verifier states.
Put a limit to number of per-insn 'struct bpf_verifier_state'.
Note that hitting the limit doesn't reject the program.
It potentially makes the verifier do more steps to analyze the program.
It means that malicious programs will hit BPF_COMPLEXITY_LIMIT_INSNS sooner
instead of spending cpu time walking long link list.
The limit of BPF_COMPLEXITY_LIMIT_STATES==64 affects cilium progs
with slight increase in number of "steps" it takes to successfully verify
the programs:
before after
bpf_lb-DLB_L3.o 1940 1940
bpf_lb-DLB_L4.o 3089 3089
bpf_lb-DUNKNOWN.o 1065 1065
bpf_lxc-DDROP_ALL.o 28052 | 28162
bpf_lxc-DUNKNOWN.o 35487 | 35541
bpf_netdev.o 10864 10864
bpf_overlay.o 6643 6643
bpf_lcx_jit.o 38437 38437
But it also makes malicious program to be rejected in 0.4 seconds vs 6.5
Hence apply this limit to unprivileged programs only.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
pathological bpf programs may try to force verifier to explode in
the number of branch states:
20: (d5) if r1 s<= 0x24000028 goto pc+0
21: (b5) if r0 <= 0xe1fa20 goto pc+2
22: (d5) if r1 s<= 0x7e goto pc+0
23: (b5) if r0 <= 0xe880e000 goto pc+0
24: (c5) if r0 s< 0x2100ecf4 goto pc+0
25: (d5) if r1 s<= 0xe880e000 goto pc+1
26: (c5) if r0 s< 0xf4041810 goto pc+0
27: (d5) if r1 s<= 0x1e007e goto pc+0
28: (b5) if r0 <= 0xe86be000 goto pc+0
29: (07) r0 += 16614
30: (c5) if r0 s< 0x6d0020da goto pc+0
31: (35) if r0 >= 0x2100ecf4 goto pc+0
Teach verifier to recognize always taken and always not taken branches.
This analysis is already done for == and != comparison.
Expand it to all other branches.
It also helps real bpf programs to be verified faster:
before after
bpf_lb-DLB_L3.o 2003 1940
bpf_lb-DLB_L4.o 3173 3089
bpf_lb-DUNKNOWN.o 1080 1065
bpf_lxc-DDROP_ALL.o 29584 28052
bpf_lxc-DUNKNOWN.o 36916 35487
bpf_netdev.o 11188 10864
bpf_overlay.o 6679 6643
bpf_lcx_jit.o 39555 38437
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Malicious user space may try to force the verifier to use as much cpu
time and memory as possible. Hence check for pending signals
while verifying the program.
Note that suspend of sys_bpf(PROG_LOAD) syscall will lead to EAGAIN,
since the kernel has to release the resources used for program verification.
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull STIBP fallout fixes from Thomas Gleixner:
"The performance destruction department finally got it's act together
and came up with a cure for the STIPB regression:
- Provide a command line option to control the spectre v2 user space
mitigations. Default is either seccomp or prctl (if seccomp is
disabled in Kconfig). prctl allows mitigation opt-in, seccomp
enables the migitation for sandboxed processes.
- Rework the code to handle the conditional STIBP/IBPB control and
remove the now unused ptrace_may_access_sched() optimization
attempt
- Disable STIBP automatically when SMT is disabled
- Optimize the switch_to() logic to avoid MSR writes and invocations
of __switch_to_xtra().
- Make the asynchronous speculation TIF updates synchronous to
prevent stale mitigation state.
As a general cleanup this also makes retpoline directly depend on
compiler support and removes the 'minimal retpoline' option which just
pretended to provide some form of security while providing none"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
x86/speculation: Provide IBPB always command line options
x86/speculation: Add seccomp Spectre v2 user space protection mode
x86/speculation: Enable prctl mode for spectre_v2_user
x86/speculation: Add prctl() control for indirect branch speculation
x86/speculation: Prepare arch_smt_update() for PRCTL mode
x86/speculation: Prevent stale SPEC_CTRL msr content
x86/speculation: Split out TIF update
ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS
x86/speculation: Prepare for conditional IBPB in switch_mm()
x86/speculation: Avoid __switch_to_xtra() calls
x86/process: Consolidate and simplify switch_to_xtra() code
x86/speculation: Prepare for per task indirect branch speculation control
x86/speculation: Add command line control for indirect branch speculation
x86/speculation: Unify conditional spectre v2 print functions
x86/speculataion: Mark command line parser data __initdata
x86/speculation: Mark string arrays const correctly
x86/speculation: Reorder the spec_v2 code
x86/l1tf: Show actual SMT state
x86/speculation: Rework SMT state change
sched/smt: Expose sched_smt_present static key
...
|
|
Merge misc fixes from Andrew Morton:
"31 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits)
ocfs2: fix potential use after free
mm/khugepaged: fix the xas_create_range() error path
mm/khugepaged: collapse_shmem() do not crash on Compound
mm/khugepaged: collapse_shmem() without freezing new_page
mm/khugepaged: minor reorderings in collapse_shmem()
mm/khugepaged: collapse_shmem() remember to clear holes
mm/khugepaged: fix crashes due to misaccounted holes
mm/khugepaged: collapse_shmem() stop if punched or truncated
mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
mm/huge_memory: splitting set mapping+index before unfreeze
mm/huge_memory: rename freeze_page() to unmap_page()
initramfs: clean old path before creating a hardlink
kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
psi: make disabling/enabling easier for vendor kernels
proc: fixup map_files test on arm
debugobjects: avoid recursive calls with kmemleak
userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
userfaultfd: shmem: add i_size checks
userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull stackleak plugin fix from Kees Cook:
"Fix crash by not allowing kprobing of stackleak_erase() (Alexander
Popov)"
* tag 'gcc-plugins-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
stackleak: Disable function tracing and kprobes for stackleak_erase()
|
|
Since __sanitizer_cov_trace_pc() is marked as notrace, function calls in
__sanitizer_cov_trace_pc() shouldn't be traced either.
ftrace_graph_caller() gets called for each function that isn't marked
'notrace', like canonicalize_ip(). This is the call trace from a run:
[ 139.644550] ftrace_graph_caller+0x1c/0x24
[ 139.648352] canonicalize_ip+0x18/0x28
[ 139.652313] __sanitizer_cov_trace_pc+0x14/0x58
[ 139.656184] sched_clock+0x34/0x1e8
[ 139.659759] trace_clock_local+0x40/0x88
[ 139.663722] ftrace_push_return_trace+0x8c/0x1f0
[ 139.667767] prepare_ftrace_return+0xa8/0x100
[ 139.671709] ftrace_graph_caller+0x1c/0x24
Rework so that check_kcov_mode() and canonicalize_ip() that are called
from __sanitizer_cov_trace_pc() are also marked as notrace.
Link: http://lkml.kernel.org/r/20181128081239.18317-1-anders.roxell@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signen-off-by: Anders Roxell <anders.roxell@linaro.org>
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|