path: root/kernel
Age | Commit message | Author
2020-04-27 | rcutorture: Add torture tests for RCU Tasks Rude | Paul E. McKenney
This commit adds the definitions required to torture the rude flavor of RCU tasks. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu-tasks: Add an RCU-tasks rude variant | Paul E. McKenney
This commit adds a "rude" variant of RCU-tasks that has as quiescent states schedule(), cond_resched_tasks_rcu_qs(), userspace execution, and (in theory, anyway) cond_resched(). In other words, RCU-tasks rude readers are regions of code with preemption disabled, but excluding code early in the CPU-online sequence and late in the CPU-offline sequence. Updates make use of IPIs and force an IPI and a context switch on each online CPU. This variant is useful in some situations in tracing. Suggested-by: Steven Rostedt <rostedt@goodmis.org> [ paulmck: Apply EXPORT_SYMBOL_GPL() feedback from Qiujun Huang. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [ paulmck: Apply review feedback from Steve Rostedt. ]
2020-04-27 | rcu-tasks: Refactor RCU-tasks to allow variants to be added | Paul E. McKenney
This commit splits out generic processing from RCU-tasks-specific processing in order to allow additional flavors to be added. It also adds a def_bool TASKS_RCU_GENERIC to enable the common RCU-tasks infrastructure code. This is primarily, but not entirely, a code-movement commit. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcutorture: Add a test for synchronize_rcu_mult() | Paul E. McKenney
This commit adds a crude test for synchronize_rcu_mult(). This is currently a smoke test rather than a high-quality stress test. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu-tasks: Create struct to hold state information | Paul E. McKenney
This commit creates an rcu_tasks struct to hold state information for RCU Tasks. This is a preparation commit for adding additional flavors of Tasks RCU, each of which would have its own rcu_tasks struct. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu-tasks: Move Tasks RCU to its own file | Paul E. McKenney
This code-movement-only commit is in preparation for adding an additional flavor of Tasks RCU, which relies on workqueues to detect grace periods. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Add per-task state to RCU CPU stall warnings | Paul E. McKenney
Currently, an RCU-preempt CPU stall warning simply lists the PIDs of those tasks holding up the current grace period. This can be helpful, but more can be even more helpful. To this end, this commit adds the nesting level, whether the task thinks it was preempted in its current RCU read-side critical section, whether RCU core has asked this task for a quiescent state, whether the expedited-grace-period hint is set, and whether the task believes that it is on the blocked-tasks list (it must be, or it would not be printed, but if things are broken, best not to take too much for granted). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | sched/core: Add function to sample state of locked-down task | Paul E. McKenney
A running task's state can be sampled in a consistent manner (for example, for diagnostic purposes) simply by invoking smp_call_function_single() on its CPU, which may be obtained using task_cpu(), then having the IPI handler verify that the desired task is in fact still running. However, if the task is not running, this sampling can in theory be done immediately and directly. In practice, the task might start running at any time, including during the sampling period. Gaining a consistent sample of a not-running task therefore requires that something be done to lock down the target task's state. This commit therefore adds a try_invoke_on_locked_down_task() function that invokes a specified function if the specified task can be locked down, returning true if successful and if the specified function returns true. Otherwise this function simply returns false. Given that the function passed to try_invoke_on_locked_down_task() might be invoked with a runqueue lock held, that function had better be quite lightweight. The function is passed the target task's task_struct pointer and the argument passed to try_invoke_on_locked_down_task(), allowing easy access to task state and to a location for further variables to be passed in and out. Note that the specified function will be called even if the specified task is currently running. The function can use ->on_rq and task_curr() to quickly and easily determine the task's state, and can return false if this state is not to the function's liking. The caller of try_invoke_on_locked_down_task() would then see the false return value, and could take appropriate action, for example, trying again later or sending an IPI if matters are more urgent. It is expected that use cases such as the RCU CPU stall warning code will simply return false if the task is currently running.
However, there are use cases involving nohz_full CPUs where the specified function might instead fall back to an alternative sampling scheme that relies on heavier synchronization (such as memory barriers) in the target task. Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> [ paulmck: Apply feedback from Peter Zijlstra and Steven Rostedt. ] [ paulmck: Invoke if running to handle feedback from Mathieu Desnoyers. ] Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
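The calling convention described in this commit message can be sketched in userspace as follows. This is a mock of the contract only, not the kernel implementation: `struct sketch_task`, `sketch_try_invoke()`, and `sketch_sample_nesting()` are hypothetical names, and the mock assumes the lock-down step always succeeds, whereas the real function takes scheduler locks before invoking the callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few pieces of task state this sketch uses. */
struct sketch_task {
	bool running;	/* stand-in for what task_curr() would report */
	int nesting;	/* some per-task state worth sampling */
};

typedef bool (*sketch_fn_t)(struct sketch_task *t, void *arg);

/*
 * Mock of the contract only: invoke fn on the task and pass its verdict
 * back to the caller. Assumption: locking down the task always succeeds
 * here; the real kernel function would acquire locks first.
 */
static bool sketch_try_invoke(struct sketch_task *t, sketch_fn_t fn, void *arg)
{
	assert(t && fn);
	return fn(t, arg);
}

/*
 * A lightweight callback in the style the message expects of stall-warning
 * code: decline if the task is running, otherwise copy out one field.
 */
static bool sketch_sample_nesting(struct sketch_task *t, void *arg)
{
	int *out = arg;

	if (t->running)
		return false;	/* caller may retry later or fall back to an IPI */
	*out = t->nesting;
	return true;
}
```

A caller would pass a pointer to a local `int` as `arg` and treat a false return as "no consistent sample was taken", leaving the output untouched.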
2020-04-27 | rcu-tasks: Use context-switch hook for PREEMPT=y kernels | Paul E. McKenney
Currently, the PREEMPT=y version of rcu_note_context_switch() does not invoke rcu_tasks_qs(), and we need it to in order to keep RCU Tasks Trace's IPIs down to a dull roar. This commit therefore enables this hook. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Add comments marking transitions between RCU watching and not | Paul E. McKenney
It is not as clear as it might be just where in RCU's idle entry/exit code RCU stops and starts watching the current CPU. This commit therefore adds comments calling out the transitions. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcutorture: Add test of holding scheduler locks across rcu_read_unlock() | Paul E. McKenney
Now that it should be safe to hold scheduler locks across rcu_read_unlock(), even in cases where the corresponding RCU read-side critical section might have been preempted and boosted, this commit adds a test of this capability to rcutorture. This has been tested on current mainline (which can deadlock in this situation), and lockdep duly reported the expected deadlock. On -rcu, lockdep is silent, thus far, anyway. Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Don't use negative nesting depth in __rcu_read_unlock() | Lai Jiangshan
Now that RCU flavors have been consolidated, an RCU-preempt rcu_read_unlock() in an interrupt or softirq handler cannot possibly end the RCU read-side critical section. Consider the old vulnerability involving rcu_read_unlock() being invoked within such a handler that interrupted an __rcu_read_unlock_special(), in which a wakeup might be invoked with a scheduler lock held. Because rcu_read_unlock_special() no longer does wakeups in such situations, it is no longer necessary for __rcu_read_unlock() to set the nesting level negative. This commit therefore removes this recursion-protection code from __rcu_read_unlock(). [ paulmck: Let rcu_exp_handler() continue to call rcu_report_exp_rdp(). ] [ paulmck: Adjust other checks given no more negative nesting. ] Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Remove unused ->rcu_read_unlock_special.b.deferred_qs field | Lai Jiangshan
The ->rcu_read_unlock_special.b.deferred_qs field is set to true in rcu_read_unlock_special() but never set to false. This is not particularly useful, so this commit removes this field. The only possible justification for this field is to ease debugging of RCU deferred quiescent states, but the combination of the other ->rcu_read_unlock_special fields plus ->rcu_blocked_node and of course ->rcu_read_lock_nesting should cover debugging needs. And if this last proves incorrect, this patch can always be reverted, along with the required setting of ->rcu_read_unlock_special.b.deferred_qs to false in rcu_preempt_deferred_qs_irqrestore(). Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Don't set nesting depth negative in rcu_preempt_deferred_qs() | Lai Jiangshan
Now that RCU flavors have been consolidated, an RCU-preempt rcu_read_unlock() in an interrupt or softirq handler cannot possibly end the RCU read-side critical section. Consider the old vulnerability involving rcu_preempt_deferred_qs() being invoked within such a handler that interrupted an extended RCU read-side critical section, in which a wakeup might be invoked with a scheduler lock held. Because rcu_read_unlock_special() no longer does wakeups in such situations, it is no longer necessary for rcu_preempt_deferred_qs() to set the nesting level negative. This commit therefore removes this recursion-protection code from rcu_preempt_deferred_qs(). [ paulmck: Fix typo in commit log per Steve Rostedt. ] Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Make rcu_read_unlock_special() safe for rq/pi locks | Paul E. McKenney
The scheduler is currently required to hold rq/pi locks across the entire RCU read-side critical section or not at all. This is inconvenient and leaves traps for the unwary, including the author of this commit. But now that excessively long grace periods enable scheduling-clock interrupts for holdout nohz_full CPUs, the nohz_full rescue logic in rcu_read_unlock_special() can be dispensed with. In other words, the rcu_read_unlock_special() function can refrain from doing wakeups unless such wakeups are guaranteed safe. This commit therefore avoids unsafe wakeups, freeing the scheduler to hold rq/pi locks across rcu_read_unlock() even if the corresponding RCU read-side critical section might have been preempted. This commit also updates RCU's requirements documentation. This commit is inspired by a patch from Lai Jiangshan: https://lore.kernel.org/lkml/20191102124559.1135-2-laijs@linux.alibaba.com This commit is further intended to be a step towards his goal of permitting the inlining of RCU-preempt's rcu_read_lock() and rcu_read_unlock(). Cc: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Add KCSAN stubs to update.c | Paul E. McKenney
This commit adds stubs for KCSAN's data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS() macros to allow code using these macros to move ahead. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Add rcu_gp_might_be_stalled() | Paul E. McKenney
This commit adds rcu_gp_might_be_stalled(), which returns true if there is some reason to believe that the RCU grace period is stalled. The use case is where an RCU free-memory path needs to allocate memory in order to free it, a situation that should be avoided where possible. But where it is necessary, there is always the alternative of using synchronize_rcu() to wait for a grace period in order to avoid the allocation. And if the grace period is stalled, allocating memory to asynchronously wait for it is a bad idea of epic proportions: Far better to let others use the memory, because these others might actually be able to free that memory before the grace period ends. Thus, rcu_gp_might_be_stalled() can be used to help decide whether allocating memory on an RCU free path is a semi-reasonable course of action. Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu/tree: Count number of batched kfree_rcu() locklessly | Joel Fernandes (Google)
We can relax the accuracy of the count of queued objects in favor of not hurting performance, by locklessly sampling per-CPU counters. This should be OK since, under high memory pressure, it should not matter if we are off by a few objects while counting. The shrinker will still do the reclaim. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> [ paulmck: Remove unused "flags" variable. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
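The idea of summing per-CPU counters without taking any lock can be illustrated with a small userspace sketch. It uses C11 relaxed atomic loads as a stand-in for the kernel's lockless reads, which is an assumption made for portability; `struct sketch_cpu_data` and `sketch_count_queued()` are hypothetical names rather than the kernel's.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-CPU record: only the counter matters for this sketch. */
struct sketch_cpu_data {
	atomic_ulong count;	/* objects currently queued on this CPU */
};

/*
 * Sum the per-CPU counters without any lock. Each load is individually
 * untorn, but the total is only approximate: a counter already visited
 * may change before the loop finishes. That imprecision is the trade-off
 * the commit message accepts.
 */
static unsigned long sketch_count_queued(struct sketch_cpu_data *cpus, int ncpus)
{
	unsigned long total = 0;

	assert(cpus && ncpus >= 0);
	for (int i = 0; i < ncpus; i++)
		total += atomic_load_explicit(&cpus[i].count,
					      memory_order_relaxed);
	return total;
}
```

Because the result only feeds a reclaim heuristic, being off by a few objects changes how much work is requested, not whether the reclaim itself is correct.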
2020-04-27 | rcu/tree: Add a shrinker to prevent OOM due to kfree_rcu() batching | Joel Fernandes (Google)
To reduce grace periods and improve kfree() performance, we recently added batching, which dramatically brings down the number of grace periods while giving us the ability to use kfree_bulk() for efficient kfree'ing. However, this has increased the likelihood of an OOM condition under a heavy kfree_rcu() flood on small-memory systems. This patch introduces a shrinker which starts grace periods right away if the system is under memory pressure due to the existence of objects that have still not started a grace period. With this patch, I do not observe an OOM anymore on a system with 512MB RAM and 8 CPUs, with the following rcuperf options: rcuperf.kfree_loops=20000 rcuperf.kfree_alloc_num=8000 rcuperf.kfree_rcu_test=1 rcuperf.kfree_mult=2 Without the patch, the system easily OOMs with the above parameters. NOTE: 1. On systems with no memory pressure, the patch has no effect, as intended. 2. In the future, we can use this same mechanism to prevent grace periods from happening even more, by relying on shrinkers carefully. Cc: urezki@gmail.com Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcuperf: Add ability to increase object allocation size | Joel Fernandes (Google)
This allows us to increase memory pressure dynamically using a new rcuperf boot command line parameter called 'rcumult'. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Convert rcu_nohz_full_cpu() ULONG_CMP_LT() to time_before() | Paul E. McKenney
This commit converts the ULONG_CMP_LT() in rcu_nohz_full_cpu() to time_before() to reflect the fact that it is comparing a timestamp to the jiffies counter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Convert rcu_initiate_boost() ULONG_CMP_GE() to time_after() | Paul E. McKenney
This commit converts the ULONG_CMP_GE() in rcu_initiate_boost() to time_after() to reflect the fact that it is comparing a timestamp to the jiffies counter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Convert ULONG_CMP_GE() to time_after() for jiffy comparison | Paul E. McKenney
This commit converts the ULONG_CMP_GE() in rcu_gp_fqs_loop() to time_after() to reflect the fact that it is comparing a timestamp to the jiffies counter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
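The three conversions above all concern comparing a timestamp against a free-running counter that eventually wraps. A minimal userspace sketch of that style of comparison follows; the helpers are modeled on the well-known shape of the kernel's time_after() and time_before() but are stand-ins written for illustration, and the `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* The sketch relies on unsigned-to-signed conversion wrapping around. */
static_assert((long)ULONG_MAX == -1L,
	      "sketch assumes two's-complement conversion to long");

/*
 * True if timestamp a is later than timestamp b, even when the counter
 * has wrapped between the two, provided they are less than half the
 * counter range apart.
 */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* True if timestamp a is earlier than timestamp b, with the same caveat. */
static bool sketch_time_before(unsigned long a, unsigned long b)
{
	return sketch_time_after(b, a);
}
```

For example, a counter value of 5 taken just after wraparound counts as later than `ULONG_MAX - 5`, which a plain `>` comparison would get wrong.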
2020-04-27 | rcu: Replace 1 by true | Jules Irenge
Coccinelle reports a warning at the use_softirq declaration: "WARNING: Assignment of 0/1 to bool variable". The root cause is that use_softirq, a variable of bool type, is initialised with the integer 1. Replacing 1 with the value true solves the issue. Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Replace assigned pointer ret value by corresponding boolean value | Jules Irenge
Coccinelle reports warnings at rcu_read_lock_held_common(): "WARNING: Assignment of 0/1 to bool variable". Given that ret is a pointer to bool, this commit fixes the warnings by replacing the 0/1 values assigned through ret with the corresponding boolean values. Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Mark rcu_state.gp_seq to detect more concurrent writes | Paul E. McKenney
The rcu_state structure's gp_seq field is only to be modified by the RCU grace-period kthread, which is single-threaded. This commit therefore enlists KCSAN's help in enforcing this restriction. This commit applies KCSAN-specific primitives, so cannot go upstream until KCSAN does. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Get rid of some doc warnings in update.c | Mauro Carvalho Chehab
This commit escapes *ret, because otherwise the documentation system thinks that this is an incomplete emphasis block: ./kernel/rcu/update.c:65: WARNING: Inline emphasis start-string without end-string. ./kernel/rcu/update.c:65: WARNING: Inline emphasis start-string without end-string. ./kernel/rcu/update.c:70: WARNING: Inline emphasis start-string without end-string. ./kernel/rcu/update.c:82: WARNING: Inline emphasis start-string without end-string. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Fix the (t=0 jiffies) false positive | Zhaolong Zhang
It is possible that an over-long grace period will end while the RCU CPU stall warning message is printing. In this case, the estimate of the offending grace period's duration can be erroneous due to refetching of rcu_state.gp_start, which will now be the time of the newly started grace period. Computation of this duration clearly needs to use the start time for the old over-long grace period, not the fresh new one. This commit avoids such errors by causing both print_other_cpu_stall() and print_cpu_stall() to reuse the value previously fetched by their caller. Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
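The difference between refetching the grace-period start time and reusing the caller's earlier snapshot can be shown with a small sketch. The structure and function names below are hypothetical, and the single `gp_start` field is a simplified stand-in for rcu_state.gp_start.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the relevant rcu_state field. */
struct sketch_rcu_state {
	unsigned long gp_start;	/* start time of the current grace period */
};

/*
 * Problematic pattern: refetch the start time while printing. If a new
 * grace period began in the meantime, the reported duration collapses
 * toward zero.
 */
static unsigned long sketch_duration_refetch(const struct sketch_rcu_state *s,
					     unsigned long now)
{
	assert(s);
	return now - s->gp_start;
}

/*
 * Pattern described by the commit: the caller fetched the start time once
 * and passes that snapshot down, so the duration still refers to the old,
 * over-long grace period.
 */
static unsigned long sketch_duration_snapshot(unsigned long gps,
					      unsigned long now)
{
	return now - gps;
}
```

If the state's start time is overwritten with the current time between the snapshot and the print, the refetching variant reports 0 while the snapshot variant still reports the full elapsed time.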
2020-04-27 | rcu: Expedite first two FQS scans under callback-overload conditions | Paul E. McKenney
Even if some CPUs have excessive numbers of callbacks, RCU's grace-period kthread will still wait normally between successive force-quiescent-state scans. The first two are the most important, as they are the ones that enlist aid from the scheduler when overloaded. This commit therefore omits the wait before the first and the second force-quiescent-state scan under callback-overload conditions. This approach was inspired by a discussion with Jeff Roberson. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
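The policy described here, skipping the wait before the first two scans only when callbacks are overloaded, can be written down as a tiny decision function. This is a sketch of the stated policy under assumed inputs, not the kernel's grace-period kthread logic; `sketch_fqs_wait()` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return how long to wait before the next force-quiescent-state scan.
 * scans_done is the number of scans already performed in this grace
 * period; normal_wait is whatever delay would otherwise apply.
 */
static unsigned long sketch_fqs_wait(int scans_done, bool overloaded,
				     unsigned long normal_wait)
{
	assert(scans_done >= 0);
	if (overloaded && scans_done < 2)
		return 0;	/* first and second scans run immediately */
	return normal_wait;	/* later scans, or no overload: wait normally */
}
```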
2020-04-27 | rcu: Use data_race() for RCU CPU stall-warning prints | Paul E. McKenney
Although the accesses used to determine whether or not a stall should be printed are an integral part of the concurrency algorithm governing use of the corresponding variables, the values that are simply printed are ancillary. As such, it is best to use data_race() for these accesses in order to provide the greatest latitude in the use of KCSAN for the other accesses that are an integral part of the algorithm. This commit therefore changes the relevant uses of READ_ONCE() to data_race(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Add WRITE_ONCE() to rcu_node ->boost_tasks | Paul E. McKenney
The rcu_node structure's ->boost_tasks field is read locklessly, so this commit adds the WRITE_ONCE() to an update in order to provide proper documentation and READ_ONCE()/WRITE_ONCE() pairing. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | srcu: Add data_race() to ->srcu_lock_count and ->srcu_unlock_count arrays | Paul E. McKenney
The srcu_data structure's ->srcu_lock_count and ->srcu_unlock_count arrays are read and written locklessly, so this commit adds the data_race() to the diagnostic-print loads from these arrays in order to mark them as known and approved data-racy accesses. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely and due to this being used only by rcutorture. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Add READ_ONCE and data_race() to rcu_node ->boost_tasks | Paul E. McKenney
The rcu_node structure's ->boost_tasks field is read locklessly, so this commit adds the READ_ONCE() to one load in order to avoid destructive compiler optimizations. The other load is from a diagnostic print, so data_race() suffices. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Add *_ONCE() and data_race() to rcu_node ->exp_tasks plus locking | Paul E. McKenney
There are lockless loads from the rcu_node structure's ->exp_tasks field, so this commit causes all stores to use WRITE_ONCE() and all lockless loads to use READ_ONCE() or data_race(), with the latter for debug prints. This code also did an unprotected traversal of the linked list pointed into by ->exp_tasks, so this commit also acquires the rcu_node structure's ->lock to properly protect this traversal. This list was traversed unprotected only when printing an RCU CPU stall warning for an expedited grace period, so the odds of seeing this in production are not all that high. This data race was reported by KCSAN. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
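The store/load marking that this and the preceding ->boost_tasks commits describe can be approximated in userspace with volatile accesses. The macros below are simplified stand-ins that assume the GCC/Clang `__typeof__` extension; they are not the kernel's READ_ONCE() and WRITE_ONCE() definitions, and the structure and function names are hypothetical. The locking added for the list traversal is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins; assumption: GCC/Clang __typeof__. */
#define SKETCH_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define SKETCH_WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

struct sketch_list_head {
	struct sketch_list_head *next;
};

/* Hypothetical node holding a pointer into a list of blocked tasks. */
struct sketch_node {
	struct sketch_list_head *exp_tasks;
};

/* Every store goes through the marked-write macro. */
static void sketch_set_exp_tasks(struct sketch_node *np,
				 struct sketch_list_head *entry)
{
	assert(np);
	SKETCH_WRITE_ONCE(np->exp_tasks, entry);
}

/* Lockless check: one marked load, compared against NULL. */
static int sketch_has_exp_tasks(struct sketch_node *np)
{
	assert(np);
	return SKETCH_READ_ONCE(np->exp_tasks) != NULL;
}
```

Pairing each marked load with marked stores documents that the access is intentionally lockless and keeps the compiler from splitting or repeating it.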
2020-04-27 | rcu: Mark rcu_state.ncpus to detect concurrent writes | Paul E. McKenney
The rcu_state structure's ncpus field is only to be modified by the CPU-hotplug CPU-online code path, which is single-threaded. This commit therefore enlists KCSAN's help in enforcing this restriction. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | srcu: Add KCSAN stubs | Paul E. McKenney
This commit adds stubs for KCSAN's data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS() macros to allow code using these macros to move ahead. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | rcu: Add KCSAN stubs | Paul E. McKenney
This commit adds stubs for KCSAN's data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS() macros to allow code using these macros to move ahead. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 | PM: hibernate: Freeze kernel threads in software_resume() | Dexuan Cui
Currently the kernel threads are not frozen in software_resume(), so between dpm_suspend_start(PMSG_QUIESCE) and resume_target_kernel(), system_freezable_power_efficient_wq can still try to submit SCSI commands, and this can cause a panic since the low-level SCSI driver (e.g. hv_storvsc) has quiesced the SCSI adapter and cannot accept any SCSI commands: https://lkml.org/lkml/2020/4/10/47 At first I posted a fix (https://lkml.org/lkml/2020/4/21/1318) trying to resolve the issue from hv_storvsc, but with the help of Bart Van Assche, I realized it's better to fix software_resume(), since this looks like a generic issue, not only pertaining to SCSI. Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-04-25 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace | Linus Torvalds
Pull pid leak fix from Eric Biederman:
"Oleg noticed that put_pid(thread_pid) was not getting called when proc was not compiled in. Let's get that fixed before 5.7 is released and causes problems for anyone"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc: Put thread_pid in release_task not proc_flush_pid
2020-04-25 | Merge tag 'sched-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull scheduler fixes from Ingo Molnar:
"Misc fixes:
 - an uclamp accounting fix
 - three frequency invariance fixes and a readability improvement"
* tag 'sched-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Fix reset-on-fork from RT with uclamp
  x86, sched: Move check for CPU type to caller function
  x86, sched: Don't enable static key when starting secondary CPUs
  x86, sched: Account for CPUs with less than 4 cores in freq. invariance
  x86, sched: Bail out of frequency invariance if base frequency is unknown
2020-04-25 | Merge tag 'perf-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull perf fixes from Ingo Molnar:
"Two changes:
 - fix exit event records
 - extend x86 PMU driver enumeration to add Intel Jasper Lake CPU support"
* tag 'perf-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: fix parent pid/tid in task exit events
  perf/x86/cstate: Add Jasper Lake CPU support
2020-04-24 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Linus Torvalds
Pull networking fixes from David Miller:
 1) Fix memory leak in netfilter flowtable, from Roi Dayan.
 2) Ref-count leaks in netrom and tipc, from Xiyu Yang.
 3) Fix warning when mptcp socket is never accepted before close, from Florian Westphal.
 4) Missed locking in ovs_ct_exit(), from Tonghao Zhang.
 5) Fix large delays during PTP synchronization in cxgb4, from Rahul Lakkireddy.
 6) team_mode_get() can hang, from Taehee Yoo.
 7) Need to use kvzalloc() when allocating fw tracer in mlx5 driver, from Niklas Schnelle.
 8) Fix handling of bpf XADD on BTF memory, from Jann Horn.
 9) Fix BPF_STX/BPF_B encoding in x86 bpf jit, from Luke Nelson.
 10) Missing queue memory release in iwlwifi pcie code, from Johannes Berg.
 11) Fix NULL deref in macvlan device event, from Taehee Yoo.
 12) Initialize lan87xx phy correctly, from Yuiko Oshino.
 13) Fix looping between VRF and XFRM lookups, from David Ahern.
 14) etf packet scheduler assumes all sockets are full sockets, which is not necessarily true. From Eric Dumazet.
 15) Fix mptcp data_fin handling in RX path, from Paolo Abeni.
 16) fib_select_default() needs to handle nexthop objects, from David Ahern.
 17) Use GFP_ATOMIC under spinlock in mac80211_hwsim, from Wei Yongjun.
 18) vxlan and geneve use wrong nlattr array, from Sabrina Dubroca.
 19) Correct rx/tx stats in bcmgenet driver, from Doug Berger.
 20) BPF_LDX zero-extension is encoded improperly in x86_32 bpf jit, fix from Luke Nelson.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (100 commits)
  selftests/bpf: Fix a couple of broken test_btf cases
  tools/runqslower: Ensure own vmlinux.h is picked up first
  bpf: Make bpf_link_fops static
  bpftool: Respect the -d option in struct_ops cmd
  selftests/bpf: Add test for freplace program with expected_attach_type
  bpf: Propagate expected_attach_type when verifying freplace programs
  bpf: Fix leak in LINK_UPDATE and enforce empty old_prog_fd
  bpf, x86_32: Fix logic error in BPF_LDX zero-extension
  bpf, x86_32: Fix clobbering of dst for BPF_JSET
  bpf, x86_32: Fix incorrect encoding in BPF_LDX zero-extension
  bpf: Fix reStructuredText markup
  net: systemport: suppress warnings on failed Rx SKB allocations
  net: bcmgenet: suppress warnings on failed Rx SKB allocations
  macsec: avoid to set wrong mtu
  mac80211: sta_info: Add lockdep condition for RCU list usage
  mac80211: populate debugfs only after cfg80211 init
  net: bcmgenet: correct per TX/RX ring statistics
  net: meth: remove spurious copyright text
  net: phy: bcm84881: clear settings on link down
  chcr: Fix CPU hard lockup
  ...
2020-04-24 | bpf: Make bpf_link_fops static | Zou Wei
Fix the following sparse warning: kernel/bpf/syscall.c:2289:30: warning: symbol 'bpf_link_fops' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/1587609160-117806-1-git-send-email-zou_wei@huawei.com
2020-04-24 | bpf: Propagate expected_attach_type when verifying freplace programs | Toke Høiland-Jørgensen
For some program types, the verifier relies on the expected_attach_type of the program being verified in the verification process. However, for freplace programs, the attach type was not propagated along with the verifier ops, so the expected_attach_type would always be zero for freplace programs. This in turn caused the verifier to sometimes make the wrong call for freplace programs. For all existing uses of expected_attach_type for this purpose, the result of this was only false negatives (i.e., freplace functions would be rejected by the verifier even though they were valid programs for the target they were replacing). However, should a false positive be introduced, this can lead to out-of-bounds accesses and/or crashes. The fix introduced in this patch is to propagate the expected_attach_type to the freplace program during verification, and reset it after that is done. Fixes: be8704ff07d2 ("bpf: Introduce dynamic program extensions") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158773526726.293902.13257293296560360508.stgit@toke.dk
2020-04-24 | bpf: Fix leak in LINK_UPDATE and enforce empty old_prog_fd | Andrii Nakryiko
Fix bug of not putting bpf_link in LINK_UPDATE command. Also enforce zeroed old_prog_fd if no BPF_F_REPLACE flag is specified. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200424052045.4002963-1-andriin@fb.com
2020-04-24 | proc: Put thread_pid in release_task not proc_flush_pid | Eric W. Biederman
Oleg pointed out that, in the unlikely event that the kernel is compiled with CONFIG_PROC_FS unset, release_task will now leak the pid. Move the put_pid out of proc_flush_pid into release_task to fix this and to guarantee I don't make that mistake again. When possible it makes sense to keep get and put in the same function so it can easily be seen how they pair up. Fixes: 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc") Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
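The point about keeping get and put in the same function can be illustrated with a toy reference count. Everything below is hypothetical: `struct sketch_pid` and the `sketch_` helpers are not the kernel's pid API, and the `proc_enabled` flag merely imitates a feature that may be compiled out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reference-counted object standing in for a struct pid. */
struct sketch_pid {
	int refcount;
};

static struct sketch_pid *sketch_get_pid(struct sketch_pid *pid)
{
	assert(pid);
	pid->refcount++;
	return pid;
}

static void sketch_put_pid(struct sketch_pid *pid)
{
	assert(pid && pid->refcount > 0);
	pid->refcount--;
}

/* Optional step that uses the pid but no longer drops the reference. */
static void sketch_proc_flush(struct sketch_pid *pid)
{
	(void)pid;
}

/*
 * Get and put sit in the same function, so the reference is dropped
 * whether or not the optional flush step is present.
 */
static void sketch_release_task(struct sketch_pid *task_pid, bool proc_enabled)
{
	struct sketch_pid *thread_pid = sketch_get_pid(task_pid);

	if (proc_enabled)
		sketch_proc_flush(thread_pid);
	sketch_put_pid(thread_pid);
}
```

Had the put lived inside the optional flush step, the path with `proc_enabled` false would have returned with the count one higher than it started, which is the leak the commit describes.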
2020-04-24 | Merge tag 'trace-v5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace | Linus Torvalds
Pull tracing fixes from Steven Rostedt:
"A few tracing fixes:
 - Two fixes for memory leaks detected by kmemleak
 - Removal of some dead code
 - A few local functions turned static"
* tag 'trace-v5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Convert local functions in tracing_map.c to static
  tracing: Remove DECLARE_TRACE_NOARGS
  ftrace: Fix memory leak caused by not freeing entry in unregister_ftrace_direct()
  tracing: Fix memory leaks in trace_events_hist.c
2020-04-23 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace | Linus Torvalds
Pull SIGCHLD fix from Eric Biederman:
"Christof Meerwald reported that do_notify_parent has not been successfully populating si_pid and si_uid for multi-threaded processes. This is the one-liner fix. Strictly speaking a one-liner plus comment"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signal: Avoid corrupting si_pid and si_uid in do_notify_parent
2020-04-22 | tracing: Convert local functions in tracing_map.c to static | Jason Yan
Fix the following sparse warning: kernel/trace/tracing_map.c:286:6: warning: symbol 'tracing_map_array_clear' was not declared. Should it be static? kernel/trace/tracing_map.c:297:6: warning: symbol 'tracing_map_array_free' was not declared. Should it be static? kernel/trace/tracing_map.c:319:26: warning: symbol 'tracing_map_array_alloc' was not declared. Should it be static? Link: http://lkml.kernel.org/r/20200410073312.38855-1-yanaijie@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-04-22 | ftrace: Fix memory leak caused by not freeing entry in unregister_ftrace_direct() | Steven Rostedt (VMware)
kmemleak reported the following:
unreferenced object 0xffff90d47127a920 (size 32):
  comm "modprobe", pid 1766, jiffies 4294792031 (age 162.568s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 22 01 00 00 00 00 ad de  ........".......
    00 78 12 a7 ff ff ff ff 00 00 b6 c0 ff ff ff ff  .x..............
  backtrace:
    [<00000000bb79e72e>] register_ftrace_direct+0xcb/0x3a0
    [<00000000295e4f79>] do_one_initcall+0x72/0x340
    [<00000000873ead18>] do_init_module+0x5a/0x220
    [<00000000974d9de5>] load_module+0x2235/0x2550
    [<0000000059c3d6ce>] __do_sys_finit_module+0xc0/0x120
    [<000000005a8611b4>] do_syscall_64+0x60/0x230
    [<00000000a0cdc49e>] entry_SYSCALL_64_after_hwframe+0x49/0xb3
The entry used to save the direct descriptor needs to be freed when unregistering.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>