summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2020-10-01Merge tag 'trace-v5.9-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Two tracing fixes: - Fix temp buffer accounting that caused a WARNING for ftrace_dump_on_opps() - Move the recursion check in one of the function callback helpers to the beginning of the function, as if the rcu_is_watching() gets traced, it will cause a recursive loop that will crash the kernel" * tag 'trace-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Move RCU is watching check after recursion check tracing: Fix trace_find_next_entry() accounting of temp buffer size
2020-09-29ftrace: Move RCU is watching check after recursion checkSteven Rostedt (VMware)
The first thing that the ftrace function callback helper functions should do is to check for recursion. Peter Zijlstra found that when "rcu_is_watching()" had its notrace removed, it caused perf function tracing to crash. This is because the call of rcu_is_watching() is tested before function recursion is checked and and if it is traced, it will cause an infinite recursion loop. rcu_is_watching() should still stay notrace, but to prevent this should never had crashed in the first place. The recursion prevention must be the first thing done in callback functions. Link: https://lore.kernel.org/r/20200929112541.GM2628@hirez.programming.kicks-ass.net Cc: stable@vger.kernel.org Cc: Paul McKenney <paulmck@kernel.org> Fixes: c68c0fa293417 ("ftrace: Have ftrace_ops_get_func() handle RCU and PER_CPU flags too") Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-29tracing: Fix trace_find_next_entry() accounting of temp buffer sizeSteven Rostedt (VMware)
The temp buffer size variable for trace_find_next_entry() was incorrectly being updated when the size did not change. The temp buffer size should only be updated when it is reallocated. This is mostly an issue when used with ftrace_dump(). That's because ftrace_dump() can not allocate a new buffer, and instead uses a temporary buffer with a fix size. But the variable that keeps track of that size is incorrectly updated with each call, and it could fall into the path that would try to reallocate the buffer and produce a warning. ------------[ cut here ]------------ WARNING: CPU: 1 PID: 1601 at kernel/trace/trace.c:3548 trace_find_next_entry+0xd0/0xe0 Modules linked in [..] CPU: 1 PID: 1601 Comm: bash Not tainted 5.9.0-rc5-test+ #521 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 RIP: 0010:trace_find_next_entry+0xd0/0xe0 Code: 40 21 00 00 4c 89 e1 31 d2 4c 89 ee 48 89 df e8 c6 9e ff ff 89 ab 54 21 00 00 5b 5d 41 5c 41 5d c3 48 63 d5 eb bf 31 c0 eb f0 <0f> 0b 48 63 d5 eb b4 66 0f 1f 84 00 00 00 00 00 53 48 8d 8f 60 21 RSP: 0018:ffff95a4f2e8bd70 EFLAGS: 00010046 RAX: ffffffff96679fc0 RBX: ffffffff97910de0 RCX: ffffffff96679fc0 RDX: ffff95a4f2e8bd98 RSI: ffff95a4ee321098 RDI: ffffffff97913000 RBP: 0000000000000018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000046 R12: ffff95a4f2e8bd98 R13: 0000000000000000 R14: ffff95a4ee321098 R15: 00000000009aa301 FS: 00007f8565484740(0000) GS:ffff95a55aa40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055876bd43d90 CR3: 00000000b76e6003 CR4: 00000000001706e0 Call Trace: trace_print_lat_context+0x58/0x2d0 ? cpumask_next+0x16/0x20 print_trace_line+0x1a4/0x4f0 ftrace_dump.cold+0xad/0x12c __handle_sysrq.cold+0x51/0x126 write_sysrq_trigger+0x3f/0x4a proc_reg_write+0x53/0x80 vfs_write+0xca/0x210 ksys_write+0x70/0xf0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f8565579487 Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:00007ffd40707948 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f8565579487 RDX: 0000000000000002 RSI: 000055876bd74de0 RDI: 0000000000000001 RBP: 000055876bd74de0 R08: 000000000000000a R09: 0000000000000001 R10: 000055876bdec280 R11: 0000000000000246 R12: 0000000000000002 R13: 00007f856564a500 R14: 0000000000000002 R15: 00007f856564a700 irq event stamp: 109958 ---[ end trace 7aab5b7e51484b00 ]--- Not only fix the updating of the temp buffer, but also do not free the temp buffer before a new buffer is allocated (there's no reason to not continue to use the current temp buffer if an allocation fails). Cc: stable@vger.kernel.org Fixes: 8e99cf91b99bb ("tracing: Do not allocate buffer in trace_find_next_entry() in atomic") Reported-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-27mm/fork: Pass new vma pointer into copy_page_range()Peter Xu
This prepares for the future work to trigger early cow on pinned pages during fork(). No functional change intended. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-27mm: Introduce mm_struct.has_pinnedPeter Xu
(Commit message majorly collected from Jason Gunthorpe) Reduce the chance of false positive from page_maybe_dma_pinned() by keeping track if the mm_struct has ever been used with pin_user_pages(). This allows cases that might drive up the page ref_count to avoid any penalty from handling dma_pinned pages. Future work is planned, to provide a more sophisticated solution, likely to turn it into a real counter. For now, make it atomic_t but use it as a boolean for simplicity. Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-25Merge tag 'pm-5.9-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix more fallout of recent RCU-lockdep changes in CPU idle code and two devfreq issues. Specifics: - Export rcu_idle_{enter,exit} to modules to fix build issues introduced by recent RCU-lockdep fixes (Borislav Petkov) - Add missing return statement to a stub function in the ACPI processor driver to fix a build issue introduced by recent RCU-lockdep fixes (Rafael Wysocki) - Fix recently introduced suspicious RCU usage warnings in the PSCI cpuidle driver and drop stale comments regarding RCU_NONIDLE() usage from enter_s2idle_proper() (Ulf Hansson) - Fix error code path in the tegra30 devfreq driver (Dan Carpenter) - Add missing information to devfreq_summary debugfs (Chanwoo Choi)" * tag 'pm-5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: processor: Fix build for ARCH_APICTIMER_STOPS_ON_C3 unset PM / devfreq: tegra30: Disable clock on error in probe PM / devfreq: Add timer type to devfreq_summary debugfs cpuidle: Drop misleading comments about RCU usage cpuidle: psci: Fix suspicious RCU usage rcu/tree: Export rcu_idle_{enter,exit} to modules
2020-09-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from Jakub Kicinski: - fix failure to add bond interfaces to a bridge, the offload-handling code was too defensive there and recent refactoring unearthed that. Users complained (Ido) - fix unnecessarily reflecting ECN bits within TOS values / QoS marking in TCP ACK and reset packets (Wei) - fix a deadlock with bpf iterator. Hopefully we're in the clear on this front now... (Yonghong) - BPF fix for clobbering r2 in bpf_gen_ld_abs (Daniel) - fix AQL on mt76 devices with FW rate control and add a couple of AQL issues in mac80211 code (Felix) - fix authentication issue with mwifiex (Maximilian) - WiFi connectivity fix: revert IGTK support in ti/wlcore (Mauro) - fix exception handling for multipath routes via same device (David Ahern) - revert back to a BH spin lock flavor for nsid_lock: there are paths which do require the BH context protection (Taehee) - fix interrupt / queue / NAPI handling in the lantiq driver (Hauke) - fix ife module load deadlock (Cong) - make an adjustment to netlink reply message type for code added in this release (the sole change touching uAPI here) (Michal) - a number of fixes for small NXP and Microchip switches (Vladimir) [ Pull request acked by David: "you can expect more of this in the future as I try to delegate more things to Jakub" ] * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (167 commits) net: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries net: dsa: seville: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries net: dsa: felix: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries inet_diag: validate INET_DIAG_REQ_PROTOCOL attribute net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU net: Update MAINTAINERS for MediaTek switch driver net/mlx5e: mlx5e_fec_in_caps() returns a boolean net/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock net/mlx5e: kTLS, Fix leak on resync error flow net/mlx5e: kTLS, Add missing dma_unmap in RX resync net/mlx5e: kTLS, Fix napi sync and possible use-after-free net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats() net/mlx5e: Fix multicast counter not up-to-date in "ip -s" net/mlx5e: Fix endianness when calculating pedit mask first bit net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported net/mlx5e: CT: Fix freeing ct_label mapping net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready net/mlx5e: Use synchronize_rcu to sync with NAPI net/mlx5e: Use RCU to protect rq->xdp_prog ...
2020-09-22Merge tag 'trace-v5.9-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Check kprobe is enabled before unregistering from ftrace as it isn't registered when disabled. - Remove kprobes enabled via command-line that is on init text when freed. - Add missing RCU synchronization for ftrace trampoline symbols removed from kallsyms. - Free trampoline on error path if ftrace_startup() fails. - Give more space for the longer PID numbers in trace output. - Fix a possible double free in the histogram code. - A couple of fixes that were discovered by sparse. * tag 'trace-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: bootconfig: init: make xbc_namebuf static kprobes: tracing/kprobes: Fix to kill kprobes on initmem after boot tracing: fix double free ftrace: Let ftrace_enable_sysctl take a kernel pointer buffer tracing: Make the space reserved for the pid wider ftrace: Fix missing synchronize_rcu() removing trampoline from kallsyms ftrace: Free the trampoline when ftrace_startup() fails kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()
2020-09-21Merge branch 'rcu/urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU fix from Paul McKenney: "This contains a single commit that fixes a bug that was introduced in the last merge window. This bug causes a compiler warning complaining about show_rcu_tasks_classic_gp_kthread() being an unused static function in !SMP kernels. The fix is straightforward, just adding an 'inline' to make this a static inline function, thus avoiding the warning. This bug was reported by Laurent Pinchart, who would like it fixed sooner rather than later" * 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: rcu-tasks: Prevent complaints of unused show_rcu_tasks_classic_gp_kthread()
2020-09-21rcu/tree: Export rcu_idle_{enter,exit} to modulesBorislav Petkov
Fix this link error: ERROR: modpost: "rcu_idle_enter" [drivers/acpi/processor.ko] undefined! ERROR: modpost: "rcu_idle_exit" [drivers/acpi/processor.ko] undefined! when CONFIG_ACPI_PROCESSOR is built as module. PeterZ says that in light of ARM needing those soon too, they should simply be exported. Fixes: 1fecfdbb7acc ("ACPI: processor: Take over RCU-idle for C3-BM idle") Reported-by: Sven Joachim <svenjoac@gmx.de> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Paul E. McKenney <paulmckrcu@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-20Merge tag 'core_urgent_for_v5.9_rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull syscall tracing fix from Borislav Petkov: "Fix the seccomp syscall rewriting so that trace and audit see the rewritten syscall number, from Kees Cook" * tag 'core_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: core/entry: Report syscall correctly for trace and audit
2020-09-20Merge tag 'locking_urgent_for_v5.9_rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Borislav Petkov: "Two fixes from the locking/urgent pile: - Fix lockdep's detection of "USED" <- "IN-NMI" inversions (Peter Zijlstra) - Make percpu-rwsem operations on the semaphore's ->read_count IRQ-safe because it can be used in an IRQ context (Hou Tao)" * tag 'locking_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/percpu-rwsem: Use this_cpu_{inc,dec}() for read_count locking/lockdep: Fix "USED" <- "IN-NMI" inversions
2020-09-19Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge fixes from Andrew Morton: "15 patches. Subsystems affected by this patch series: mailmap, mm/hotfixes, mm/thp, mm/memory-hotplug, misc, kcsan" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kcsan: kconfig: move to menu 'Generic Kernel Debugging Instruments' fs/fs-writeback.c: adjust dirtytime_interval_handler definition to match prototype stackleak: let stack_erasing_sysctl take a kernel pointer buffer ftrace: let ftrace_enable_sysctl take a kernel pointer buffer mm/memory_hotplug: drain per-cpu pages again during memory offline selftests/vm: fix display of page size in map_hugetlb mm/thp: fix __split_huge_pmd_locked() for migration PMD kprobes: fix kill kprobe which has been marked as gone tmpfs: restore functionality of nr_inodes=0 mlock: fix unevictable_pgs event counts on THP mm: fix check_move_unevictable_pages() on THP mm: migration of hugetlbfs page skip memcg ksm: reinstate memcg charge on copied pages mailmap: add older email addresses for Kees Cook
2020-09-19stackleak: let stack_erasing_sysctl take a kernel pointer bufferTobias Klauser
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") changed ctl_table.proc_handler to take a kernel pointer. Adjust the signature of stack_erasing_sysctl to match ctl_table.proc_handler which fixes the following sparse warning: kernel/stackleak.c:31:50: warning: incorrect type in argument 3 (different address spaces) kernel/stackleak.c:31:50: expected void * kernel/stackleak.c:31:50: got void [noderef] __user *buffer Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: https://lkml.kernel.org/r/20200907093253.13656-1-tklauser@distanz.ch Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19ftrace: let ftrace_enable_sysctl take a kernel pointer bufferTobias Klauser
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") changed ctl_table.proc_handler to take a kernel pointer. Adjust the signature of ftrace_enable_sysctl to match ctl_table.proc_handler which fixes the following sparse warning: kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces) kernel/trace/ftrace.c:7544:43: expected void * kernel/trace/ftrace.c:7544:43: got void [noderef] __user *buffer Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19kprobes: fix kill kprobe which has been marked as goneMuchun Song
If a kprobe is marked as gone, we should not kill it again. Otherwise, we can disarm the kprobe more than once. In that case, the statistics of kprobe_ftrace_enabled can unbalance which can lead to that kprobe do not work. Fixes: e8386a0cb22f ("kprobes: support probing module __exit function") Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Song Liu <songliubraving@fb.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200822030055.32383-1-songmuchun@bytedance.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-18Merge tag 's390-5.9-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix order in trace_hardirqs_off_caller() to make locking state consistent even if the IRQ tracer calls into lockdep again. Touches common code. Acked-by Peter Zijlstra. - Correctly handle secure storage violation exception to avoid kernel panic triggered by user space misbehaviour. - Switch the idle->seqcount over to using raw_write_*() to avoid "suspicious RCU usage". - Fix memory leaks on hard unplug in pci code. - Use kvmalloc instead of kmalloc for larger allocations in zcrypt. - Add few missing __init annotations to static functions to avoid section mismatch complains when functions are not inlined. * tag 's390-5.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: add 3f program exception handler lockdep: fix order in trace_hardirqs_off_caller() s390/pci: fix leak of DMA tables on hard unplug s390/init: add missing __init annotations s390/zcrypt: fix kmalloc 256k failure s390/idle: fix suspicious RCU usage
2020-09-18kprobes: tracing/kprobes: Fix to kill kprobes on initmem after bootMasami Hiramatsu
Since kprobe_event= cmdline option allows user to put kprobes on the functions in initmem, kprobe has to make such probes gone after boot. Currently the probes on the init functions in modules will be handled by module callback, but the kernel init text isn't handled. Without this, kprobes may access non-exist text area to disable or remove it. Link: https://lkml.kernel.org/r/159972810544.428528.1839307531600646955.stgit@devnote2 Fixes: 970988e19eb0 ("tracing/kprobe: Add kprobe_event= boot parameter") Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-18tracing: fix double freeTom Rix
clang static analyzer reports this problem trace_events_hist.c:3824:3: warning: Attempt to free released memory kfree(hist_data->attrs->var_defs.name[i]); In parse_var_defs() if there is a problem allocating var_defs.expr, the earlier var_defs.name is freed. This free is duplicated by free_var_defs() which frees the rest of the list. Because free_var_defs() has to run anyway, remove the second free fom parse_var_defs(). Link: https://lkml.kernel.org/r/20200907135845.15804-1-trix@redhat.com Cc: stable@vger.kernel.org Fixes: 30350d65ac56 ("tracing: Add variable support to hist triggers") Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-18ftrace: Let ftrace_enable_sysctl take a kernel pointer bufferTobias Klauser
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") changed ctl_table.proc_handler to take a kernel pointer. Adjust the signature of ftrace_enable_sysctl to match ctl_table.proc_handler which fixes the following sparse warning: kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces) kernel/trace/ftrace.c:7544:43: expected void * kernel/trace/ftrace.c:7544:43: got void [noderef] __user *buffer Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-18tracing: Make the space reserved for the pid widerSebastian Andrzej Siewior
For 64bit CONFIG_BASE_SMALL=0 systems PID_MAX_LIMIT is set by default to 4194304. During boot the kernel sets a new value based on number of CPUs but no lower than 32768. It is 1024 per CPU so with 128 CPUs the default becomes 131072 which needs six digits. This value can be increased during run time but must not exceed the initial upper limit. Systemd sometime after v241 sets it to the upper limit during boot. The result is that when the pid exceeds five digits, the trace output is a little hard to read because it is no longer properly padded (same like on big iron with 98+ CPUs). Increase the pid padding to seven digits. Link: https://lkml.kernel.org/r/20200904082331.dcdkrr3bkn3e4qlg@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-18ftrace: Fix missing synchronize_rcu() removing trampoline from kallsymsAdrian Hunter
Add synchronize_rcu() after list_del_rcu() in ftrace_remove_trampoline_from_kallsyms() to protect readers of ftrace_ops_trampoline_list (in ftrace_get_trampoline_kallsym) which is used when kallsyms is read. Link: https://lkml.kernel.org/r/20200901091617.31837-1-adrian.hunter@intel.com Fixes: fc0ea795f53c8d ("ftrace: Add symbols for ftrace trampolines") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-18ftrace: Free the trampoline when ftrace_startup() failsMiroslav Benes
Commit fc0ea795f53c ("ftrace: Add symbols for ftrace trampolines") missed to remove ops from new ftrace_ops_trampoline_list in ftrace_startup() if ftrace_hash_ipmodify_enable() fails there. It may lead to BUG if such ops come from a module which may be removed. Moreover, the trampoline itself is not freed in this case. Fix it by calling ftrace_trampoline_free() during the rollback. Link: https://lkml.kernel.org/r/20200831122631.28057-1-mbenes@suse.cz Fixes: fc0ea795f53c ("ftrace: Add symbols for ftrace trampolines") Fixes: f8b8be8a310a ("ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict") Signed-off-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-18kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()Masami Hiramatsu
Commit 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler") fixed one bug but not completely fixed yet. If we run a kprobe_module.tc of ftracetest, kernel showed a warning as below. # ./ftracetest test.d/kprobe/kprobe_module.tc === Ftrace unit tests === [1] Kprobe dynamic event - probing module ... [ 22.400215] ------------[ cut here ]------------ [ 22.400962] Failed to disarm kprobe-ftrace at trace_printk_irq_work+0x0/0x7e [trace_printk] (-2) [ 22.402139] WARNING: CPU: 7 PID: 200 at kernel/kprobes.c:1091 __disarm_kprobe_ftrace.isra.0+0x7e/0xa0 [ 22.403358] Modules linked in: trace_printk(-) [ 22.404028] CPU: 7 PID: 200 Comm: rmmod Not tainted 5.9.0-rc2+ #66 [ 22.404870] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 [ 22.406139] RIP: 0010:__disarm_kprobe_ftrace.isra.0+0x7e/0xa0 [ 22.406947] Code: 30 8b 03 eb c9 80 3d e5 09 1f 01 00 75 dc 49 8b 34 24 89 c2 48 c7 c7 a0 c2 05 82 89 45 e4 c6 05 cc 09 1f 01 01 e8 a9 c7 f0 ff <0f> 0b 8b 45 e4 eb b9 89 c6 48 c7 c7 70 c2 05 82 89 45 e4 e8 91 c7 [ 22.409544] RSP: 0018:ffffc90000237df0 EFLAGS: 00010286 [ 22.410385] RAX: 0000000000000000 RBX: ffffffff83066024 RCX: 0000000000000000 [ 22.411434] RDX: 0000000000000001 RSI: ffffffff810de8d3 RDI: ffffffff810de8d3 [ 22.412687] RBP: ffffc90000237e10 R08: 0000000000000001 R09: 0000000000000001 [ 22.413762] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88807c478640 [ 22.414852] R13: ffffffff8235ebc0 R14: ffffffffa00060c0 R15: 0000000000000000 [ 22.415941] FS: 00000000019d48c0(0000) GS:ffff88807d7c0000(0000) knlGS:0000000000000000 [ 22.417264] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 22.418176] CR2: 00000000005bb7e3 CR3: 0000000078f7a000 CR4: 00000000000006a0 [ 22.419309] Call Trace: [ 22.419990] kill_kprobe+0x94/0x160 [ 22.420652] kprobes_module_callback+0x64/0x230 [ 22.421470] notifier_call_chain+0x4f/0x70 [ 22.422184] blocking_notifier_call_chain+0x49/0x70 [ 22.422979] __x64_sys_delete_module+0x1ac/0x240 [ 22.423733] do_syscall_64+0x38/0x50 [ 22.424366] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 22.425176] RIP: 0033:0x4bb81d [ 22.425741] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e0 ff ff ff f7 d8 64 89 01 48 [ 22.428726] RSP: 002b:00007ffc70fef008 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0 [ 22.430169] RAX: ffffffffffffffda RBX: 00000000019d48a0 RCX: 00000000004bb81d [ 22.431375] RDX: 0000000000000000 RSI: 0000000000000880 RDI: 00007ffc70fef028 [ 22.432543] RBP: 0000000000000880 R08: 00000000ffffffff R09: 00007ffc70fef320 [ 22.433692] R10: 0000000000656300 R11: 0000000000000246 R12: 00007ffc70fef028 [ 22.434635] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000 [ 22.435682] irq event stamp: 1169 [ 22.436240] hardirqs last enabled at (1179): [<ffffffff810df542>] console_unlock+0x422/0x580 [ 22.437466] hardirqs last disabled at (1188): [<ffffffff810df19b>] console_unlock+0x7b/0x580 [ 22.438608] softirqs last enabled at (866): [<ffffffff81c0038e>] __do_softirq+0x38e/0x490 [ 22.439637] softirqs last disabled at (859): [<ffffffff81a00f42>] asm_call_on_stack+0x12/0x20 [ 22.440690] ---[ end trace 1e7ce7e1e4567276 ]--- [ 22.472832] trace_kprobe: This probe might be able to register after target module is loaded. Continue. This is because the kill_kprobe() calls disarm_kprobe_ftrace() even if the given probe is not enabled. In that case, ftrace_set_filter_ip() fails because the given probe point is not registered to ftrace. Fix to check the given (going) probe is enabled before invoking disarm_kprobe_ftrace(). Link: https://lkml.kernel.org/r/159888672694.1411785.5987998076694782591.stgit@devnote2 Fixes: 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler") Cc: Ingo Molnar <mingo@kernel.org> Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David Miller <davem@davemloft.net> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Chengming Zhou <zhouchengming@bytedance.com> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-17mm: allow a controlled amount of unfairness in the page lockLinus Torvalds
Commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic") made the page locking entirely fair, in that if a waiter came in while the lock was held, the lock would be transferred to the lockers strictly in order. That was intended to finally get rid of the long-reported watchdog failures that involved the page lock under extreme load, where a process could end up waiting essentially forever, as other page lockers stole the lock from under it. It also improved some benchmarks, but it ended up causing huge performance regressions on others, simply because fair lock behavior doesn't end up giving out the lock as aggressively, causing better worst-case latency, but potentially much worse average latencies and throughput. Instead of reverting that change entirely, this introduces a controlled amount of unfairness, with a sysctl knob to tune it if somebody needs to. But the default value should hopefully be good for any normal load, allowing a few rounds of lock stealing, but enforcing the strict ordering before the lock has been stolen too many times. There is also a hint from Matthieu Baerts that the fair page coloring may end up exposing an ABBA deadlock that is hidden by the usual optimistic lock stealing, and while the unfairness doesn't fix the fundamental issue (and I'm still looking at that), it avoids it in practice. The amount of unfairness can be modified by writing a new value to the 'sysctl_page_lock_unfairness' variable (default value of 5, exposed through /proc/sys/vm/page_lock_unfairness), but that is hopefully something we'd use mainly for debugging rather than being necessary for any deep system tuning. This whole issue has exposed just how critical the page lock can be, and how contended it gets under certain locks. And the main contention doesn't really seem to be anything related to IO (which was the origin of this lock), but for things like just verifying that the page file mapping is stable while faulting in the page into a page table. Link: https://lore.kernel.org/linux-fsdevel/ed8442fd-6f54-dd84-cd4a-941e8b7ee603@MichaelLarabel.com/ Link: https://www.phoronix.com/scan.php?page=article&item=linux-50-59&num=1 Link: https://lore.kernel.org/linux-fsdevel/c560a38d-8313-51fb-b1ec-e904bd8836bc@tessares.net/ Reported-and-tested-by: Michael Larabel <Michael@michaellarabel.com> Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Cc: Dave Chinner <david@fromorbit.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: Jan Kara <jack@suse.cz> Cc: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-16rcu-tasks: Prevent complaints of unused show_rcu_tasks_classic_gp_kthread()Paul E. McKenney
Commit 8344496e8b49 ("rcu-tasks: Conditionally compile show_rcu_tasks_gp_kthreads()") introduced conditional compilation of several functions, but forgot one occurrence of show_rcu_tasks_classic_gp_kthread() that causes the compiler to warn of an unused static function. This commit uses "static inline" to avoid these complaints and possibly also to avoid emitting an actual definition of this function. Fixes: 8344496e8b49 ("rcu-tasks: Conditionally compile show_rcu_tasks_gp_kthreads()") Cc: <stable@vger.kernel.org> # 5.8.x Reported-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-09-16locking/percpu-rwsem: Use this_cpu_{inc,dec}() for read_countHou Tao
The __this_cpu*() accessors are (in general) IRQ-unsafe which, given that percpu-rwsem is a blocking primitive, should be just fine. However, file_end_write() is used from IRQ context and will cause load-store issues on architectures where the per-cpu accessors are not natively irq-safe. Fix it by using the IRQ-safe this_cpu_*() for operations on read_count. This will generate more expensive code on a number of platforms, which might cause a performance regression for some of the other percpu-rwsem users. If any such is reported, we can consider alternative solutions. Fixes: 70fe2f48152e ("aio: fix freeze protection of aio writes") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20200915140750.137881-1-houtao1@huawei.com
2020-09-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2020-09-15 The following pull-request contains BPF updates for your *net* tree. We've added 12 non-merge commits during the last 19 day(s) which contain a total of 10 files changed, 47 insertions(+), 38 deletions(-). The main changes are: 1) docs/bpf fixes, from Andrii. 2) ld_abs fix, from Daniel. 3) socket casting helpers fix, from Martin. 4) hash iterator fixes, from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15bpf: Fix a rcu warning for bpffs map pretty-printYonghong Song
Running selftest ./btf_btf -p the kernel had the following warning: [ 51.528185] WARNING: CPU: 3 PID: 1756 at kernel/bpf/hashtab.c:717 htab_map_get_next_key+0x2eb/0x300 [ 51.529217] Modules linked in: [ 51.529583] CPU: 3 PID: 1756 Comm: test_btf Not tainted 5.9.0-rc1+ #878 [ 51.530346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014 [ 51.531410] RIP: 0010:htab_map_get_next_key+0x2eb/0x300 ... [ 51.542826] Call Trace: [ 51.543119] map_seq_next+0x53/0x80 [ 51.543528] seq_read+0x263/0x400 [ 51.543932] vfs_read+0xad/0x1c0 [ 51.544311] ksys_read+0x5f/0xe0 [ 51.544689] do_syscall_64+0x33/0x40 [ 51.545116] entry_SYSCALL_64_after_hwframe+0x44/0xa9 The related source code in kernel/bpf/hashtab.c: 709 static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key) 710 { 711 struct bpf_htab *htab = container_of(map, struct bpf_htab, map); 712 struct hlist_nulls_head *head; 713 struct htab_elem *l, *next_l; 714 u32 hash, key_size; 715 int i = 0; 716 717 WARN_ON_ONCE(!rcu_read_lock_held()); In kernel/bpf/inode.c, bpffs map pretty print calls map->ops->map_get_next_key() without holding a rcu_read_lock(), hence causing the above warning. To fix the issue, just surrounding map->ops->map_get_next_key() with rcu read lock. Fixes: a26ca7c982cb ("bpf: btf: Add pretty print support to the basic arraymap") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Cc: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200916004401.146277-1-yhs@fb.com
2020-09-14core/entry: Report syscall correctly for trace and auditKees Cook
On v5.8 when doing seccomp syscall rewrites (e.g. getpid into getppid as seen in the seccomp selftests), trace (and audit) correctly see the rewritten syscall on entry and exit: seccomp_bpf-1307 [000] .... 22974.874393: sys_enter: NR 110 (... seccomp_bpf-1307 [000] .N.. 22974.874401: sys_exit: NR 110 = 1304 With mainline we see a mismatched enter and exit (the original syscall is incorrectly visible on entry): seccomp_bpf-1030 [000] .... 21.806766: sys_enter: NR 39 (... seccomp_bpf-1030 [000] .... 21.806767: sys_exit: NR 110 = 1027 When ptrace or seccomp change the syscall, this needs to be visible to trace and audit at that time as well. Update the syscall earlier so they see the correct value. Fixes: d88d59b64ca3 ("core/entry: Respect syscall number rewrites") Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200912005826.586171-1-keescook@chromium.org
2020-09-14lockdep: fix order in trace_hardirqs_off_caller()Sven Schnelle
Switch order so that locking state is consistent even if the IRQ tracer calls into lockdep again. Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-12Merge tag 'seccomp-v5.9-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp fixes from Kees Cook: "This fixes a rare race condition in seccomp when using TSYNC and USER_NOTIF together where a memory allocation would not get freed (found by syzkaller, fixed by Tycho). Additionally updates Tycho's MAINTAINERS and .mailmap entries for his new address" * tag 'seccomp-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: seccomp: don't leave dangling ->notif if file allocation fails mailmap, MAINTAINERS: move to tycho.pizza seccomp: don't leak memory when filter install races
2020-09-11gcov: add support for GCC 10.1Peter Oberparleiter
Using gcov to collect coverage data for kernels compiled with GCC 10.1 causes random malfunctions and kernel crashes. This is the result of a changed GCOV_COUNTERS value in GCC 10.1 that causes a mismatch between the layout of the gcov_info structure created by GCC profiling code and the related structure used by the kernel. Fix this by updating the in-kernel GCOV_COUNTERS value. Also re-enable config GCOV_KERNEL for use with GCC 10. Reported-by: Colin Ian King <colin.king@canonical.com> Reported-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Tested-and-Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-09Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This fixes a regression in padata" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: padata: fix possible padata_works_lock deadlock
2020-09-08seccomp: don't leave dangling ->notif if file allocation failsTycho Andersen
Christian and Kees both pointed out that this is a bit sloppy to open-code both places, and Christian points out that we leave a dangling pointer to ->notif if file allocation fails. Since we check ->notif for null in order to determine if it's ok to install a filter, this means people won't be able to install a filter if the file allocation fails for some reason, even if they subsequently should be able to. To fix this, let's hoist this free+null into its own little helper and use it. Reported-by: Kees Cook <keescook@chromium.org> Reported-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Tycho Andersen <tycho@tycho.pizza> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20200902140953.1201956-1-tycho@tycho.pizza Signed-off-by: Kees Cook <keescook@chromium.org>
2020-09-08seccomp: don't leak memory when filter install racesTycho Andersen
In seccomp_set_mode_filter() with TSYNC | NEW_LISTENER, we first initialize the listener fd, then check to see if we can actually use it later in seccomp_may_assign_mode(), which can fail if anyone else in our thread group has installed a filter and caused some divergence. If we can't, we partially clean up the newly allocated file: we put the fd, put the file, but don't actually clean up the *memory* that was allocated at filter->notif. Let's clean that up too. To accomplish this, let's hoist the actual "detach a notifier from a filter" code to its own helper out of seccomp_notify_release(), so that in case anyone adds stuff to init_listener(), they only have to add the cleanup code in one spot. This does a bit of extra locking and such on the failure path when the filter is not attached, but it's a slow failure path anyway. Fixes: 51891498f2da ("seccomp: allow TSYNC and USER_NOTIF together") Reported-by: syzbot+3ad9614a12f80994c32e@syzkaller.appspotmail.com Signed-off-by: Tycho Andersen <tycho@tycho.pizza> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20200902014017.934315-1-tycho@tycho.pizza Signed-off-by: Kees Cook <keescook@chromium.org>
2020-09-06Merge tag 'x86-urgent-2020-09-06' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - more generic entry code ABI fallout - debug register handling bugfixes - fix vmalloc mappings on 32-bit kernels - kprobes instrumentation output fix on 32-bit kernels - fix over-eager WARN_ON_ONCE() on !SMAP hardware - NUMA debugging fix - fix Clang related crash on !RETPOLINE kernels * tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry: Unbreak 32bit fast syscall x86/debug: Allow a single level of #DB recursion x86/entry: Fix AC assertion tracing/kprobes, x86/ptrace: Fix regs argument order for i386 x86, fakenuma: Fix invalid starting node ID x86/mm/32: Bring back vmalloc faulting on x86_32 x86/cmdline: Disable jump tables for cmdline.c
2020-09-05Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "19 patches. Subsystems affected by this patch series: MAINTAINERS, ipc, fork, checkpatch, lib, and mm (memcg, slub, pagemap, madvise, migration, hugetlb)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: include/linux/log2.h: add missing () around n in roundup_pow_of_two() mm/khugepaged.c: fix khugepaged's request size in collapse_file mm/hugetlb: fix a race between hugetlb sysctl handlers mm/hugetlb: try preferred node first when alloc gigantic page from cma mm/migrate: preserve soft dirty in remove_migration_pte() mm/migrate: remove unnecessary is_zone_device_page() check mm/rmap: fixup copying of soft dirty and uffd ptes mm/migrate: fixup setting UFFD_WP flag mm: madvise: fix vma user-after-free checkpatch: fix the usage of capture group ( ... ) fork: adjust sysctl_max_threads definition to match prototype ipc: adjust proc_ipc_sem_dointvec definition to match prototype mm: track page table modifications in __apply_to_page_range() MAINTAINERS: IA64: mark Status as Odd Fixes only MAINTAINERS: add LLVM maintainers MAINTAINERS: update Cavium/Marvell entries mm: slub: fix conversion of freelist_corrupted() mm: memcg: fix memcg reclaim soft lockup memcg: fix use-after-free in uncharge_batch
2020-09-05fork: adjust sysctl_max_threads definition to match prototypeTobias Klauser
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") changed ctl_table.proc_handler to take a kernel pointer. Adjust the definition of sysctl_max_threads to match its prototype in linux/sysctl.h which fixes the following sparse error/warning: kernel/fork.c:3050:47: warning: incorrect type in argument 3 (different address spaces) kernel/fork.c:3050:47: expected void * kernel/fork.c:3050:47: got void [noderef] __user *buffer kernel/fork.c:3036:5: error: symbol 'sysctl_max_threads' redeclared with different type (incompatible argument 3 (different address spaces)): kernel/fork.c:3036:5: int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... ) kernel/fork.c: note: in included file (through include/linux/key.h, include/linux/cred.h, include/linux/sched/signal.h, include/linux/sched/cputime.h): include/linux/sysctl.h:242:5: note: previously declared as: include/linux/sysctl.h:242:5: int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... ) Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: https://lkml.kernel.org/r/20200825093647.24263-1-tklauser@distanz.ch Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-04gcov: Disable gcov build with GCC 10Leon Romanovsky
GCOV built with GCC 10 doesn't initialize n_function variable. This produces different kernel panics as was seen by Colin in Ubuntu and me in FC 32. As a workaround, let's disable GCOV build for broken GCC 10 version. Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1891288 Link: https://lore.kernel.org/lkml/20200827133932.3338519-1-leon@kernel.org Link: https://lore.kernel.org/lkml/CAHk-=whbijeSdSvx-Xcr0DPMj0BiwhJ+uiNnDSVZcr_h_kg7UA@mail.gmail.com/ Cc: Colin Ian King <colin.king@canonical.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-04x86/entry: Unbreak 32bit fast syscallThomas Gleixner
Andy reported that the syscall treacing for 32bit fast syscall fails: # ./tools/testing/selftests/x86/ptrace_syscall_32 ... [RUN] SYSEMU [FAIL] Initial args are wrong (nr=224, args=10 11 12 13 14 4289172732) ... [RUN] SYSCALL [FAIL] Initial args are wrong (nr=29, args=0 0 0 0 0 4289172732) The eason is that the conversion to generic entry code moved the retrieval of the sixth argument (EBP) after the point where the syscall entry work runs, i.e. ptrace, seccomp, audit... Unbreak it by providing a split up version of syscall_enter_from_user_mode(). - syscall_enter_from_user_mode_prepare() establishes state and enables interrupts - syscall_enter_from_user_mode_work() runs the entry work Replace the call to syscall_enter_from_user_mode() in the 32bit fast syscall C-entry with the split functions and stick the EBP retrieval between them. Fixes: 27d6b4d14f5c ("x86/entry: Use generic syscall entry function") Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/87k0xdjbtt.fsf@nanos.tec.linutronix.de
2020-09-04padata: fix possible padata_works_lock deadlockDaniel Jordan
syzbot reports, WARNING: inconsistent lock state 5.9.0-rc2-syzkaller #0 Not tainted -------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. syz-executor.0/26715 takes: (padata_works_lock){+.?.}-{2:2}, at: padata_do_parallel kernel/padata.c:220 {IN-SOFTIRQ-W} state was registered at: spin_lock include/linux/spinlock.h:354 [inline] padata_do_parallel kernel/padata.c:220 ... __do_softirq kernel/softirq.c:298 ... sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1091 asm_sysvec_apic_timer_interrupt arch/x86/include/asm/idtentry.h:581 Possible unsafe locking scenario: CPU0 ---- lock(padata_works_lock); <Interrupt> lock(padata_works_lock); padata_do_parallel() takes padata_works_lock with softirqs enabled, so a deadlock is possible if, on the same CPU, the lock is acquired in process context and then softirq handling done in an interrupt leads to the same path. Fix by leaving softirqs disabled while do_parallel holds padata_works_lock. Reported-by: syzbot+f4b9f49e38e25eb4ef52@syzkaller.appspotmail.com Fixes: 4611ce2246889 ("padata: allocate work structures for parallel jobs from a pool") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi Kivilinna. 2) Fix loss of RTT samples in rxrpc, from David Howells. 3) Memory leak in hns_nic_dev_probe(), from Dignhao Liu. 4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka. 5) We disable BH for too lokng in sctp_get_port_local(), add a cond_resched() here as well, from Xin Long. 6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu. 7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from Yonghong Song. 8) Missing of_node_put() in mt7530 DSA driver, from Sumera Priyadarsini. 9) Fix crash in bnxt_fw_reset_task(), from Michael Chan. 10) Fix geneve tunnel checksumming bug in hns3, from Yi Li. 11) Memory leak in rxkad_verify_response, from Dinghao Liu. 12) In tipc, don't use smp_processor_id() in preemptible context. From Tuong Lien. 13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu. 14) Missing clk_disable_prepare() in gemini driver, from Dan Carpenter. 15) Fix ABI mismatch between driver and firmware in nfp, from Louis Peens. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits) net/smc: fix sock refcounting in case of termination net/smc: reset sndbuf_desc if freed net/smc: set rx_off for SMCR explicitly net/smc: fix toleration of fake add_link messages tg3: Fix soft lockup when tg3_reset_task() fails. doc: net: dsa: Fix typo in config code sample net: dp83867: Fix WoL SecureOn password nfp: flower: fix ABI mismatch between driver and firmware tipc: fix shutdown() of connectionless socket ipv6: Fix sysctl max for fib_multipath_hash_policy drivers/net/wan/hdlc: Change the default of hard_header_len to 0 net: gemini: Fix another missing clk_disable_unprepare() in probe net: bcmgenet: fix mask check in bcmgenet_validate_flow() amd-xgbe: Add support for new port mode net: usb: dm9601: Add USB ID of Keenetic Plus DSL vhost: fix typo in error message net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init() pktgen: fix error message with wrong function name net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode cxgb4: fix thermal zone device registration ...
2020-09-03bpf: Do not use bucket_lock for hashmap iteratorYonghong Song
Currently, for hashmap, the bpf iterator will grab a bucket lock, a spinlock, before traversing the elements in the bucket. This can ensure all bpf visted elements are valid. But this mechanism may cause deadlock if update/deletion happens to the same bucket of the visited map in the program. For example, if we added bpf_map_update_elem() call to the same visited element in selftests bpf_iter_bpf_hash_map.c, we will have the following deadlock: ============================================ WARNING: possible recursive locking detected 5.9.0-rc1+ #841 Not tainted -------------------------------------------- test_progs/1750 is trying to acquire lock: ffff9a5bb73c5e70 (&htab->buckets[i].raw_lock){....}-{2:2}, at: htab_map_update_elem+0x1cf/0x410 but task is already holding lock: ffff9a5bb73c5e20 (&htab->buckets[i].raw_lock){....}-{2:2}, at: bpf_hash_map_seq_find_next+0x94/0x120 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&htab->buckets[i].raw_lock); lock(&htab->buckets[i].raw_lock); *** DEADLOCK *** ... Call Trace: dump_stack+0x78/0xa0 __lock_acquire.cold.74+0x209/0x2e3 lock_acquire+0xba/0x380 ? htab_map_update_elem+0x1cf/0x410 ? __lock_acquire+0x639/0x20c0 _raw_spin_lock_irqsave+0x3b/0x80 ? htab_map_update_elem+0x1cf/0x410 htab_map_update_elem+0x1cf/0x410 ? lock_acquire+0xba/0x380 bpf_prog_ad6dab10433b135d_dump_bpf_hash_map+0x88/0xa9c ? find_held_lock+0x34/0xa0 bpf_iter_run_prog+0x81/0x16e __bpf_hash_map_seq_show+0x145/0x180 bpf_seq_read+0xff/0x3d0 vfs_read+0xad/0x1c0 ksys_read+0x5f/0xe0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ... The bucket_lock first grabbed in seq_ops->next() called by bpf_seq_read(), and then grabbed again in htab_map_update_elem() in the bpf program, causing deadlocks. Actually, we do not need bucket_lock here, we can just use rcu_read_lock() similar to netlink iterator where the rcu_read_{lock,unlock} likes below: seq_ops->start(): rcu_read_lock(); seq_ops->next(): rcu_read_unlock(); /* next element */ rcu_read_lock(); seq_ops->stop(); rcu_read_unlock(); Compared to old bucket_lock mechanism, if concurrent updata/delete happens, we may visit stale elements, miss some elements, or repeat some elements. I think this is a reasonable compromise. For users wanting to avoid stale, missing/repeated accesses, bpf_map batch access syscall interface can be used. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200902235340.2001375-1-yhs@fb.com
2020-09-03locking/lockdep: Fix "USED" <- "IN-NMI" inversionspeterz@infradead.org
During the LPC RCU BoF Paul asked how come the "USED" <- "IN-NMI" detector doesn't trip over rcu_read_lock()'s lockdep annotation. Looking into this I found a very embarrasing typo in verify_lock_unused(): - if (!(class->usage_mask & LOCK_USED)) + if (!(class->usage_mask & LOCKF_USED)) fixing that will indeed cause rcu_read_lock() to insta-splat :/ The above typo means that instead of testing for: 0x100 (1 << LOCK_USED), we test for 8 (LOCK_USED), which corresponds to (1 << LOCK_ENABLED_HARDIRQ). So instead of testing for _any_ used lock, it will only match any lock used with interrupts enabled. The rcu_read_lock() annotation uses .check=0, which means it will not set any of the interrupt bits and will thus never match. In order to properly fix the situation and allow rcu_read_lock() to correctly work, split LOCK_USED into LOCK_USED and LOCK_USED_READ and by having .read users set USED_READ and test USED, pure read-recursive locks are permitted. Fixes: f6f48e180404 ("lockdep: Teach lockdep about "USED" <- "IN-NMI" inversions") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20200902160323.GK1362448@hirez.programming.kicks-ass.net
2020-08-30Merge tag 'x86-urgent-2020-08-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Three interrupt related fixes for X86: - Move disabling of the local APIC after invoking fixup_irqs() to ensure that interrupts which are incoming are noted in the IRR and not ignored. - Unbreak affinity setting. The rework of the entry code reused the regular exception entry code for device interrupts. The vector number is pushed into the errorcode slot on the stack which is then lifted into an argument and set to -1 because that's regs->orig_ax which is used in quite some places to check whether the entry came from a syscall. But it was overlooked that orig_ax is used in the affinity cleanup code to validate whether the interrupt has arrived on the new target. It turned out that this vector check is pointless because interrupts are never moved from one vector to another on the same CPU. That check is a historical leftover from the time where x86 supported multi-CPU affinities, but not longer needed with the now strict single CPU affinity. Famous last words ... - Add a missing check for an empty cpumask into the matrix allocator. The affinity change added a warning to catch the case where an interrupt is moved on the same CPU to a different vector. This triggers because a condition with an empty cpumask returns an assignment from the allocator as the allocator uses for_each_cpu() without checking the cpumask for being empty. The historical inconsistent for_each_cpu() behaviour of ignoring the cpumask and unconditionally claiming that CPU0 is in the mask struck again. Sigh. plus a new entry into the MAINTAINER file for the HPE/UV platform" * tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/matrix: Deal with the sillyness of for_each_cpu() on UP x86/irq: Unbreak interrupt affinity setting x86/hotplug: Silence APIC only after all interrupts are migrated MAINTAINERS: Add entry for HPE Superdome Flex (UV) maintainers
2020-08-30Merge tag 'locking-urgent-2020-08-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "A set of fixes for lockdep, tracing and RCU: - Prevent recursion by using raw_cpu_* operations - Fixup the interrupt state in the cpu idle code to be consistent - Push rcu_idle_enter/exit() invocations deeper into the idle path so that the lock operations are inside the RCU watching sections - Move trace_cpu_idle() into generic code so it's called before RCU goes idle. - Handle raw_local_irq* vs. local_irq* operations correctly - Move the tracepoints out from under the lockdep recursion handling which turned out to be fragile and inconsistent" * tag 'locking-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep,trace: Expose tracepoints lockdep: Only trace IRQ edges mips: Implement arch_irqs_disabled() arm64: Implement arch_irqs_disabled() nds32: Implement arch_irqs_disabled() locking/lockdep: Cleanup x86/entry: Remove unused THUNKs cpuidle: Move trace_cpu_idle() into generic code cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic sched,idle,rcu: Push rcu_idle deeper into the idle path cpuidle: Fixup IRQ state lockdep: Use raw_cpu_*() for per-cpu variables
2020-08-30genirq/matrix: Deal with the sillyness of for_each_cpu() on UPThomas Gleixner
Most of the CPU mask operations behave the same way, but for_each_cpu() and it's variants ignore the cpumask argument and claim that CPU0 is always in the mask. This is historical, inconsistent and annoying behaviour. The matrix allocator uses for_each_cpu() and can be called on UP with an empty cpumask. The calling code does not expect that this succeeds but until commit e027fffff799 ("x86/irq: Unbreak interrupt affinity setting") this went unnoticed. That commit added a WARN_ON() to catch cases which move an interrupt from one vector to another on the same CPU. The warning triggers on UP. Add a check for the cpumask being empty to prevent this. Fixes: 2f75d9e1c905 ("genirq: Implement bitmap matrix allocator") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
2020-08-27dma-pool: Fix an uninitialized variable bug in atomic_pool_expand()Dan Carpenter
The "page" pointer can be used with out being initialized. Fixes: d7e673ec2c8e ("dma-pool: Only allocate from CMA when in same memory zone") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-08-26lockdep,trace: Expose tracepointsPeter Zijlstra
The lockdep tracepoints are under the lockdep recursion counter, this has a bunch of nasty side effects: - TRACE_IRQFLAGS doesn't work across the entire tracepoint - RCU-lockdep doesn't see the tracepoints either, hiding numerous "suspicious RCU usage" warnings. Pull the trace_lock_*() tracepoints completely out from under the lockdep recursion handling and completely rely on the trace level recusion handling -- also, tracing *SHOULD* not be taking locks in any case. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.782688941@infradead.org