Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU changes from Ingo Molnar:
- Debugging for smp_call_function()
- RT raw/non-raw lock ordering fixes
- Strict grace periods for KASAN
- New smp_call_function() torture test
- Torture-test updates
- Documentation updates
- Miscellaneous fixes
[ This doesn't actually pull the tag - I've dropped the last merge from
the RCU branch due to questions about the series. - Linus ]
* tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
smp: Make symbol 'csd_bug_count' static
kernel/smp: Provide CSD lock timeout diagnostics
smp: Add source and destination CPUs to __call_single_data
rcu: Shrink each possible cpu krcp
rcu/segcblist: Prevent useless GP start if no CBs to accelerate
torture: Add gdb support
rcutorture: Allow pointer leaks to test diagnostic code
rcutorture: Hoist OOM registry up one level
refperf: Avoid null pointer dereference when buf fails to allocate
rcutorture: Properly synchronize with OOM notifier
rcutorture: Properly set rcu_fwds for OOM handling
torture: Add kvm.sh --help and update help message
rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05
torture: Update initrd documentation
rcutorture: Replace HTTP links with HTTPS ones
locktorture: Make function torture_percpu_rwsem_init() static
torture: document --allcpus argument added to the kvm.sh script
rcutorture: Output number of elapsed grace periods
rcutorture: Remove KCSAN stubs
rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull documentation updates from Mauro Carvalho Chehab:
"A series of patches addressing warnings produced by make htmldocs.
This includes:
- kernel-doc markup fixes
- ReST fixes
- Updates at the build system in order to support newer versions of
the docs build toolchain (Sphinx)
After this series, the number of html build warnings should reduce
significantly, and building with Sphinx 3.1 or later should now be
supported (although it is still recommended to use Sphinx 2.4.4).
As agreed with Jon, I should be sending you a late pull request by the
end of the merge window addressing remaining issues with docs build,
as there are a number of warning fixes that depends on pull requests
that should be happening along the merge window.
The end goal is to have a clean htmldocs build on Kernel 5.10.
PS. It should be noticed that Sphinx 3.0 is not currently supported,
as it lacks support for C domain namespaces. Such feature, needed in
order to document uAPI system calls with Sphinx 3.x, was added only on
Sphinx 3.1"
* tag 'docs/v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (75 commits)
PM / devfreq: remove a duplicated kernel-doc markup
mm/doc: fix a literal block markup
workqueue: fix a kernel-doc warning
docs: virt: user_mode_linux_howto_v2.rst: fix a literal block markup
Input: sparse-keymap: add a description for @sw
rcu/tree: docs: document bkvcache new members at struct kfree_rcu_cpu
nl80211: docs: add a description for s1g_cap parameter
usb: docs: document altmode register/unregister functions
kunit: test.h: fix a bad kernel-doc markup
drivers: core: fix kernel-doc markup for dev_err_probe()
docs: bio: fix a kerneldoc markup
kunit: test.h: solve kernel-doc warnings
block: bio: fix a warning at the kernel-doc markups
docs: powerpc: syscall64-abi.rst: fix a malformed table
drivers: net: hamradio: fix document location
net: appletalk: Kconfig: Fix docs location
dt-bindings: fix references to files converted to yaml
memblock: get rid of a :c:type leftover
math64.h: kernel-docs: Convert some markups into normal comments
media: uAPI: buffer.rst: remove a left-over documentation
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
- Add redirect_neigh() BPF packet redirect helper, allowing to limit
stack traversal in common container configs and improving TCP
back-pressure.
Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
- Expand netlink policy support and improve policy export to user
space. (Ge)netlink core performs request validation according to
declared policies. Expand the expressiveness of those policies
(min/max length and bitmasks). Allow dumping policies for particular
commands. This is used for feature discovery by user space (instead
of kernel version parsing or trial and error).
- Support IGMPv3/MLDv2 multicast listener discovery protocols in
bridge.
- Allow more than 255 IPv4 multicast interfaces.
- Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.
- In Multi-patch TCP (MPTCP) support concurrent transmission of data on
multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.
- Support SMC-Dv2 version of SMC, which enables multi-subnet
deployments.
- Allow more calls to same peer in RxRPC.
- Support two new Controller Area Network (CAN) protocols - CAN-FD and
ISO 15765-2:2016.
- Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.
- Add TC actions for implementing MPLS L2 VPNs.
- Improve nexthop code - e.g. handle various corner cases when nexthop
objects are removed from groups better, skip unnecessary
notifications and make it easier to offload nexthops into HW by
converting to a blocking notifier.
- Support adding and consuming TCP header options by BPF programs,
opening the doors for easy experimental and deployment-specific TCP
option use.
- Reorganize TCP congestion control (CC) initialization to simplify
life of TCP CC implemented in BPF.
- Add support for shipping BPF programs with the kernel and loading
them early on boot via the User Mode Driver mechanism, hence reusing
all the user space infra we have.
- Support sleepable BPF programs, initially targeting LSM and tracing.
- Add bpf_d_path() helper for returning full path for given 'struct
path'.
- Make bpf_tail_call compatible with bpf-to-bpf calls.
- Allow BPF programs to call map_update_elem on sockmaps.
- Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).
- Support listing and getting information about bpf_links via the bpf
syscall.
- Enhance kernel interfaces around NIC firmware update. Allow
specifying overwrite mask to control if settings etc. are reset
during update; report expected max time operation may take to users;
support firmware activation without machine reboot incl. limits of
how much impact reset may have (e.g. dropping link or not).
- Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.
- Adopt or extend devlink use for debug, monitoring, fw update in many
drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
dpaa2-eth).
- In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.
- Add XDP support for Intel's igb driver.
- Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.
- Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.
- Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.
- Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.
- Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.
- Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.
- Improve performance for packets which don't require much offloads on
recent Mellanox NICs by 20% by making multiple packets share a
descriptor entry.
- Move chelsio inline crypto drivers (for TLS and IPsec) from the
crypto subtree to drivers/net. Move MDIO drivers out of the phy
directory.
- Clean up a lot of W=1 warnings, reportedly the actively developed
subsections of networking drivers should now build W=1 warning free.
- Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use new tasklet_setup API (sadly this
conversion is not yet complete).
* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
net, sockmap: Don't call bpf_prog_put() on NULL pointer
bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
bpf, sockmap: Add locking annotations to iterator
netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
net: fix pos incrementment in ipv6_route_seq_next
net/smc: fix invalid return code in smcd_new_buf_create()
net/smc: fix valid DMBE buffer sizes
net/smc: fix use-after-free of delayed events
bpfilter: Fix build error with CONFIG_BPFILTER_UMH
cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
bpf: Fix register equivalence tracking.
rxrpc: Fix loss of final ack on shutdown
rxrpc: Fix bundle counting for exclusive connections
netfilter: restore NF_INET_NUMHOOKS
ibmveth: Identify ingress large send packets.
ibmveth: Switch order of ibmveth_helper calls.
cxgb4: handle 4-tuple PEDIT to NAT mode translation
selftests: Add VRF route leaking tests
...
|
|
Changeset 53c72b590b3a ("rcu/tree: cache specified number of objects")
added new members for struct kfree_rcu_cpu, but didn't add the
corresponding at the kernel-doc markup, as repoted when doing
"make htmldocs":
./kernel/rcu/tree.c:3113: warning: Function parameter or member 'bkvcache' not described in 'kfree_rcu_cpu'
./kernel/rcu/tree.c:3113: warning: Function parameter or member 'nr_bkv_objs' not described in 'kfree_rcu_cpu'
So, move the description for bkvcache to kernel-doc, and add a
description for nr_bkv_objs.
Fixes: 53c72b590b3a ("rcu/tree: cache specified number of objects")
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects updates from Thomas Gleixner:
"A small set of updates for debug objects:
- Make all debug object descriptors constant. There is no reason to
have them writeable.
- Free the per CPU object pool after CPU unplug to avoid memory
waste"
* tag 'core-debugobjects-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Free per CPU pool after CPU unplug
treewide: Make all debug_obj_descriptors const
debugobjects: Allow debug_obj_descr to be const
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull v5.10 RCU changes from Paul E. McKenney:
- Debugging for smp_call_function().
- Strict grace periods for KASAN. The point of this series is to find
RCU-usage bugs, so the corresponding new RCU_STRICT_GRACE_PERIOD
Kconfig option depends on both DEBUG_KERNEL and RCU_EXPERT, and is
further disabled by dfefault. Finally, the help text includes
a goodly list of scary caveats.
- New smp_call_function() torture test.
- Torture-test updates.
- Documentation updates.
- Miscellaneous fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Rejecting non-native endian BTF overlapped with the addition
of support for it.
The rest were more simple overlapping changes, except the
renesas ravb binding update, which had to follow a file
move as well as a YAML conversion.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix more fallout of recent RCU-lockdep changes in CPU idle code
and two devfreq issues.
Specifics:
- Export rcu_idle_{enter,exit} to modules to fix build issues
introduced by recent RCU-lockdep fixes (Borislav Petkov)
- Add missing return statement to a stub function in the ACPI
processor driver to fix a build issue introduced by recent
RCU-lockdep fixes (Rafael Wysocki)
- Fix recently introduced suspicious RCU usage warnings in the PSCI
cpuidle driver and drop stale comments regarding RCU_NONIDLE()
usage from enter_s2idle_proper() (Ulf Hansson)
- Fix error code path in the tegra30 devfreq driver (Dan Carpenter)
- Add missing information to devfreq_summary debugfs (Chanwoo Choi)"
* tag 'pm-5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: processor: Fix build for ARCH_APICTIMER_STOPS_ON_C3 unset
PM / devfreq: tegra30: Disable clock on error in probe
PM / devfreq: Add timer type to devfreq_summary debugfs
cpuidle: Drop misleading comments about RCU usage
cpuidle: psci: Fix suspicious RCU usage
rcu/tree: Export rcu_idle_{enter,exit} to modules
|
|
This should make it harder for the kernel to corrupt the debug object
descriptor, used to call functions to fixup state and track debug objects,
by moving the structure to read-only memory.
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200815004027.2046113-3-swboyd@chromium.org
|
|
Fix this link error:
ERROR: modpost: "rcu_idle_enter" [drivers/acpi/processor.ko] undefined!
ERROR: modpost: "rcu_idle_exit" [drivers/acpi/processor.ko] undefined!
when CONFIG_ACPI_PROCESSOR is built as module. PeterZ says that in light
of ARM needing those soon too, they should simply be exported.
Fixes: 1fecfdbb7acc ("ACPI: processor: Take over RCU-idle for C3-BM idle")
Reported-by: Sven Joachim <svenjoac@gmx.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Paul E. McKenney <paulmckrcu@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The rcu_tasks_trace_postgp() function uses for_each_process_thread()
to scan the task list without the benefit of RCU read-side protection,
which can result in use-after-free errors on task_struct structures.
This error was missed because the TRACE01 rcutorture scenario enables
lockdep, but also builds with CONFIG_PREEMPT_NONE=y. In this situation,
preemption is disabled everywhere, so lockdep thinks everywhere can
be a legitimate RCU reader. This commit therefore adds the needed
rcu_read_lock() and rcu_read_unlock().
Note that this bug can occur only after an RCU Tasks Trace CPU stall
warning, which by default only happens after a grace period has extended
for ten minutes (yes, not a typo, minutes).
Fixes: 4593e772b502 ("rcu-tasks: Add stall warnings for RCU Tasks Trace")
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Cc: <stable@vger.kernel.org> # 5.7.x
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
When rcu_tasks_trace_postgp() function detects an RCU Tasks Trace
CPU stall, it adds all tasks blocking the current grace period to
a list, invoking get_task_struct() on each to prevent them from
being freed while on the list. It then traverses that list,
printing stall-warning messages for each one that is still blocking
the current grace period and removing it from the list. The list
removal invokes the matching put_task_struct().
This of course means that in the admittedly unlikely event that some
task executes its outermost rcu_read_unlock_trace() in the meantime, it
won't be removed from the list and put_task_struct() won't be executing,
resulting in a task_struct leak. This commit therefore makes the list
removal and put_task_struct() unconditional, stopping the leak.
Note further that this bug can occur only after an RCU Tasks Trace CPU
stall warning, which by default only happens after a grace period has
extended for ten minutes (yes, not a typo, minutes).
Fixes: 4593e772b502 ("rcu-tasks: Add stall warnings for RCU Tasks Trace")
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Cc: <stable@vger.kernel.org> # 5.7.x
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The more intense grace-period processing resulting from the 50x RCU
Tasks Trace grace-period speedups exposed the following race condition:
o Task A running on CPU 0 executes rcu_read_lock_trace(),
entering a read-side critical section.
o When Task A eventually invokes rcu_read_unlock_trace()
to exit its read-side critical section, this function
notes that the ->trc_reader_special.s flag is zero and
and therefore invoke wil set ->trc_reader_nesting to zero
using WRITE_ONCE(). But before that happens...
o The RCU Tasks Trace grace-period kthread running on some other
CPU interrogates Task A, but this fails because this task is
currently running. This kthread therefore sends an IPI to CPU 0.
o CPU 0 receives the IPI, and thus invokes trc_read_check_handler().
Because Task A has not yet cleared its ->trc_reader_nesting
counter, this function sees that Task A is still within its
read-side critical section. This function therefore sets the
->trc_reader_nesting.b.need_qs flag, AKA the .need_qs flag.
Except that Task A has already checked the .need_qs flag, which
is part of the ->trc_reader_special.s flag. The .need_qs flag
therefore remains set until Task A's next rcu_read_unlock_trace().
o Task A now invokes synchronize_rcu_tasks_trace(), which cannot
start a new grace period until the current grace period completes.
And thus cannot return until after that time.
But Task A's .need_qs flag is still set, which prevents the current
grace period from completing. And because Task A is blocked, it
will never execute rcu_read_unlock_trace() until its call to
synchronize_rcu_tasks_trace() returns.
We are therefore deadlocked.
This race is improbable, but 80 hours of rcutorture made it happen twice.
The race was possible before the grace-period speedup, but roughly 50x
less probable. Several thousand hours of rcutorture would have been
necessary to have a reasonable chance of making this happen before this
50x speedup.
This commit therefore eliminates this deadlock by setting
->trc_reader_nesting to a large negative number before checking the
.need_qs and zeroing (or decrementing with respect to its initial
value) ->trc_reader_nesting. For its part, the IPI handler's
trc_read_check_handler() function adds a check for negative values,
deferring evaluation of the task in this case. Taken together, these
changes avoid this deadlock scenario.
Fixes: 276c410448db ("rcu-tasks: Split ->trc_reader_need_end")
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Cc: <stable@vger.kernel.org> # 5.7.x
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The various RCU tasks flavors currently wait 100 milliseconds between each
grace period in order to prevent CPU-bound loops and to favor efficiency
over latency. However, RCU Tasks Trace needs to have a grace-period
latency of roughly 25 milliseconds, which is completely infeasible given
the 100-millisecond per-grace-period sleep. This commit therefore reduces
this sleep duration to 5 milliseconds (or one jiffy, whichever is longer)
in kernels built with CONFIG_TASKS_TRACE_RCU_READ_MB=y.
Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Many workloads are quite sensitive to IPIs, and such workloads should
build kernels with CONFIG_TASKS_TRACE_RCU_READ_MB=y to prevent RCU
Tasks Trace from using them under normal conditions. However, other
workloads are quite happy to permit more IPIs if doing so makes BPF
program updates go faster. This commit therefore sets the default
value for the rcupdate.rcu_task_ipi_delay kernel parameter to zero for
kernels that have been built with CONFIG_TASKS_TRACE_RCU_READ_MB=n,
while retaining the old default of (HZ / 10) for kernels that have
indicated an aversion to IPIs via CONFIG_TASKS_TRACE_RCU_READ_MB=y.
Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Commit 8344496e8b49 ("rcu-tasks: Conditionally compile
show_rcu_tasks_gp_kthreads()") introduced conditional
compilation of several functions, but forgot one occurrence of
show_rcu_tasks_classic_gp_kthread() that causes the compiler to warn of
an unused static function. This commit uses "static inline" to avoid
these complaints and possibly also to avoid emitting an actual definition
of this function.
Fixes: 8344496e8b49 ("rcu-tasks: Conditionally compile show_rcu_tasks_gp_kthreads()")
Cc: <stable@vger.kernel.org> # 5.8.x
Reported-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The RCU Tasks Trace grace periods are too slow, as in 40x slower than
those of RCU Tasks. This is due to my having assumed a one-second grace
period was OK, and thus not having optimized any further. This commit
provides the first step in this optimization process, namely by allowing
the task_list scan backoff interval to be specified on a per-flavor basis,
and then speeding up the scans for RCU Tasks Trace. However, kernels
built with CONFIG_TASKS_TRACE_RCU_READ_MB=y continue to use the old slower
backoff, consistent with that Kconfig option's goal of reducing IPIs.
Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The n_heavy_reader_attempts, n_heavy_reader_updates, and
n_heavy_reader_ofl_updates variables are not used outside of their
translation unit, so this commit marks them static.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
strictgp.2020.08.24a: Strict grace periods for KASAN testing.
|
|
scftorture.2020.08.24a: Torture tests for smp_call_function() and friends.
|
|
'torture.2020.08.24a' into HEAD
doc.2020.08.24a: Documentation updates.
fixes.2020.09.03b: Miscellaneous fixes.
torture.2020.08.24a: Torture-test updates.
|
|
CPUs can go offline shortly after kfree_call_rcu() has been invoked,
which can leave memory stranded until those CPUs come back online.
This commit therefore drains the kcrp of each CPU, not just the
ones that happen to be online.
Acked-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The rcu_segcblist_accelerate() function returns true iff it is necessary
to request another grace period. A tracing session showed that this
function unnecessarily requests grace periods.
For example, consider the following sequence of events:
1. Callbacks are queued only on the NEXT segment of CPU A's callback list.
2. CPU A runs RCU_SOFTIRQ, accelerating these callbacks from NEXT to WAIT.
3. Thus rcu_segcblist_accelerate() returns true, requesting grace period N.
4. RCU's grace-period kthread wakes up on CPU B and starts grace period N.
4. CPU A notices the new grace period and invokes RCU_SOFTIRQ.
5. CPU A's RCU_SOFTIRQ again invokes rcu_segcblist_accelerate(), but
there are no new callbacks. However, rcu_segcblist_accelerate()
nevertheless (uselessly) requests a new grace period N+1.
This extra grace period results in additional lock contention and also
additional wakeups, all for no good reason.
This commit therefore adds a check to rcu_segcblist_accelerate() that
prevents the return of true when there are no new callbacks.
This change reduces the number of grace periods (GPs) and wakeups in each
of eleven five-second rcutorture runs as follows:
+----+-------------------+-------------------+
| # | Number of GPs | Number of Wakeups |
+====+=========+=========+=========+=========+
| 1 | With | Without | With | Without |
+----+---------+---------+---------+---------+
| 2 | 75 | 89 | 113 | 119 |
+----+---------+---------+---------+---------+
| 3 | 62 | 91 | 105 | 123 |
+----+---------+---------+---------+---------+
| 4 | 60 | 79 | 98 | 110 |
+----+---------+---------+---------+---------+
| 5 | 63 | 79 | 99 | 112 |
+----+---------+---------+---------+---------+
| 6 | 57 | 89 | 96 | 123 |
+----+---------+---------+---------+---------+
| 7 | 64 | 85 | 97 | 118 |
+----+---------+---------+---------+---------+
| 8 | 58 | 83 | 98 | 113 |
+----+---------+---------+---------+---------+
| 9 | 57 | 77 | 89 | 104 |
+----+---------+---------+---------+---------+
| 10 | 66 | 82 | 98 | 119 |
+----+---------+---------+---------+---------+
| 11 | 52 | 82 | 83 | 117 |
+----+---------+---------+---------+---------+
The reduction in the number of wakeups ranges from 5% to 40%.
Cc: urezki@gmail.com
[ paulmck: Rework commit log and comment. ]
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
This commit adds an rcutorture.leakpointer module parameter that
intentionally leaks an RCU-protected pointer out of the RCU read-side
critical section and checks to see if the corresponding grace period
has elapsed, emitting a WARN_ON_ONCE() if so. This module parameter can
be used to test facilities like CONFIG_RCU_STRICT_GRACE_PERIOD that end
grace periods quickly.
While in the area, also document rcutorture.irqreader, which was
previously left out.
Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Currently, registering and unregistering the OOM notifier is done
right before and after the test, respectively. This will not work
well for multi-threaded tests, so this commit hoists this registering
and unregistering up into the rcu_torture_fwd_prog_init() and
rcu_torture_fwd_prog_cleanup() functions.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Currently in the unlikely event that buf fails to be allocated it
is dereferenced a few times. Use the errexit flag to determine if
buf should be written to to avoid the null pointer dereferences.
Addresses-Coverity: ("Dereference after null check")
Fixes: f518f154ecef ("refperf: Dynamically allocate experiment-summary output buffer")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The current rcutorture forward-progress code assumes that it is the
only cause of out-of-memory (OOM) events. For script-based rcutorture
testing, this assumption is in fact correct. However, testing based
on modprobe/rmmod might well encounter external OOM events, which could
happen at any time.
This commit therefore properly synchronizes the interaction between
rcutorture's forward-progress testing and its OOM notifier by adding a
global mutex.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The conversion of rcu_fwds to dynamic allocation failed to actually
allocate the required structure. This commit therefore allocates it,
frees it, and updates rcu_fwds accordingly. While in the area, it
abstracts the cleanup actions into rcu_torture_fwd_prog_cleanup().
Fixes: 5155be9994e5 ("rcutorture: Dynamically allocate rcu_fwds structure")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
This commit adds code to print the grace-period number at the start
of the test along with both the grace-period number and the number of
elapsed grace periods at the end of the test. Note that variants of
RCU)without the notion of a grace-period number (for example, Tiny RCU)
just print zeroes.
[ paulmck: Adjust commit log. ]
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
KCSAN is now in mainline, so this commit removes the stubs for the
data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS()
macros.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The "cpu" parameter to rcu_report_qs_rdp() is not used, with rdp->cpu
being used instead. Furtheremore, every call to rcu_report_qs_rdp()
invokes it on rdp->cpu. This commit therefore removes this unused "cpu"
parameter and converts a check of rdp->cpu against smp_processor_id()
to a WARN_ON_ONCE().
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The CONFIG_PREEMPT=n instance of rcu_read_unlock is even more
aggressively than that of CONFIG_PREEMPT=y in deferring reporting
quiescent states to the RCU core. This is just what is wanted in normal
use because it reduces overhead, but the resulting delay is not what
is wanted for kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
This commit therefore adds an rcu_read_unlock_strict() function that
checks for exceptional conditions, and reports the newly started
quiescent state if it is safe to do so, also doing a spin-delay if
requested via rcutree.rcu_unlock_delay. This commit also adds a call
to rcu_read_unlock_strict() from the CONFIG_PREEMPT=n instance of
__rcu_read_unlock().
[ paulmck: Fixed bug located by kernel test robot <lkp@intel.com> ]
Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
A kernel built with CONFIG_RCU_STRICT_GRACE_PERIOD=y needs a quiescent
state to appear very shortly after a CPU has noticed a new grace period.
Placing an RCU reader immediately after this point is ineffective because
this normally happens in softirq context, which acts as a big RCU reader.
This commit therefore introduces a new per-CPU work_struct, which is
used at the end of rcu_core() processing to schedule an RCU read-side
critical section from within a clean environment.
Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The goal of this series is to increase the probability of tools like
KASAN detecting that an RCU-protected pointer was used outside of its
RCU read-side critical section. Thus far, the approach has been to make
grace periods and callback processing happen faster. Another approach
is to delay the pointer leaker. This commit therefore allows a delay
to be applied to exit from RCU read-side critical sections.
This slowdown is specified by a new rcutree.rcu_unlock_delay kernel boot
parameter that specifies this delay in microseconds, defaulting to zero.
Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Currently, each CPU discovers the end of a given grace period on its
own time, which is again good for efficiency but bad for fast grace
periods, given that it is things like kfree() within the RCU callbacks
that will cause trouble for pointers leaked from RCU read-side critical
sections. This commit therefore uses on_each_cpu() to IPI each CPU
after grace-period cleanup in order to inform each CPU of the end of
the old grace period in a timely manner, but only in kernels build with
CONFIG_RCU_STRICT_GRACE_PERIOD=y.
Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Currently, each CPU discovers the beginning of a given grace period
on its own time, which is again good for efficiency but bad for fast
grace periods. This commit therefore uses on_each_cpu() to IPI each
CPU after grace-period initialization in order to inform each CPU of
the new grace period in a timely manner, but only in kernels build with
CONFIG_RCU_STRICT_GRACE_PERIOD=y.
Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
A given CPU normally notes a new grace period during one RCU_SOFTIRQ,
but avoids reporting the corresponding quiescent state until some later
RCU_SOFTIRQ. This leisurly approach improves efficiency by increasing
the number of update requests served by each grace period, but is not
what is needed for kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
This commit therefore adds a new rcu_strict_gp_check_qs() function
which, in CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels, simply enters and
immediately exist an RCU read-side critical section. If the CPU is
in a quiescent state, the rcu_read_unlock() will attempt to report an
immediate quiescent state. This rcu_strict_gp_check_qs() function is
invoked from note_gp_changes(), so that a CPU just noticing a new grace
period might immediately report a quiescent state for that grace period.
Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The rcu_preempt_deferred_qs_irqrestore() function is invoked at
the end of an RCU read-side critical section (for example, directly
from rcu_read_unlock()) and, if .need_qs is set, invokes rcu_qs() to
report the new quiescent state. This works, except that rcu_qs() only
updates per-CPU state, leaving reporting of the actual quiescent state
to a later call to rcu_report_qs_rdp(), for example from within a later
RCU_SOFTIRQ instance. Although this approach is exactly what you want if
you are more concerned about efficiency than about short grace periods,
in CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels, short grace periods are
the name of the game.
This commit therefore makes rcu_preempt_deferred_qs_irqrestore() directly
invoke rcu_report_qs_rdp() in CONFIG_RCU_STRICT_GRACE_PERIOD=y, thus
shortening grace periods.
Historical note: To the best of my knowledge, causing rcu_read_unlock()
to directly report a quiescent state first appeared in Jim Houston's
and Joe Korty's JRCU. This is the second instance of a Linux-kernel RCU
feature being inspired by JRCU, the first being RCU callback offloading
(as in the RCU_NOCB_CPU Kconfig option).
Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The ->rcu_read_unlock_special.b.need_qs field in the task_struct
structure indicates that the RCU core needs a quiscent state from the
corresponding task. The __rcu_read_unlock() function checks this (via
an eventual call to rcu_preempt_deferred_qs_irqrestore()), and if set
reports a quiscent state immediately upon exit from the outermost RCU
read-side critical section.
Currently, this flag is only set when the scheduling-clock interrupt
decides that the current RCU grace period is too old, as in about
one full second too old. But if the kernel has been built with
CONFIG_RCU_STRICT_GRACE_PERIOD=y, we clearly do not want to wait that
long. This commit therefore sets the .need_qs field immediately at the
start of the RCU read-side critical section from within __rcu_read_lock()
in order to unconditionally enlist help from __rcu_read_unlock().
But note the additional check for rcu_state.gp_kthread, which prevents
attempts to awaken RCU's grace-period kthread during early boot before
there is a scheduler. Leaving off this check results in early boot hangs.
So early that there is no console output. Thus, this additional check
fails until such time as RCU's grace-period kthread has been created,
avoiding these empty-console hangs.
Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The value of DEFAULT_RCU_BLIMIT is normally set to 10, the idea being to
avoid needless response-time degradation due to RCU callback invocation.
However, when CONFIG_RCU_STRICT_GRACE_PERIOD=y it is better to avoid
throttling callback execution in order to better detect pointer
leaks from RCU read-side critical sections. This commit therefore
sets the value of DEFAULT_RCU_BLIMIT to 1000 in kernels built with
CONFIG_RCU_STRICT_GRACE_PERIOD=y.
Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
If there are idle CPUs, RCU's grace-period kthread will wait several
jiffies before even thinking about polling them. This promotes
efficiency, which is normally a good thing, but when the kernel
has been built with CONFIG_RCU_STRICT_GRACE_PERIOD=y, we care more
about short grace periods. This commit therefore restricts the
default jiffies_till_first_fqs value to zero in kernels built with
CONFIG_RCU_STRICT_GRACE_PERIOD=y, which causes RCU's grace-period kthread
to poll for idle CPUs immediately after starting a grace period.
Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Because strict RCU grace periods will complete more quickly, they will
experience greater lock contention on each leaf rcu_node structure's
->lock. This commit therefore reduces the leaf fanout in order to reduce
this lock contention.
Note that this also has the effect of reducing the number of CPUs
supported to 16 in the case of CONFIG_RCU_FANOUT_LEAF=2 or 81 in the
case of CONFIG_RCU_FANOUT_LEAF=3. However, greater numbers of CPUs are
probably a bad idea when using CONFIG_RCU_STRICT_GRACE_PERIOD=y. Those
wishing to live dangerously are free to edit their kernel/rcu/Kconfig
files accordingly.
Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
People running automated tests have asked for a way to make RCU minimize
grace-period duration in order to increase the probability of KASAN
detecting a pointer being improperly leaked from an RCU read-side critical
section, for example, like this:
rcu_read_lock();
p = rcu_dereference(gp);
do_something_with(p); // OK
rcu_read_unlock();
do_something_else_with(p); // BUG!!!
The rcupdate.rcu_expedited boot parameter is a start in this direction,
given that it makes calls to synchronize_rcu() instead invoke the faster
(and more wasteful) synchronize_rcu_expedited(). However, this does
nothing to shorten RCU grace periods that are instead initiated by
call_rcu(), and RCU pointer-leak bugs can involve call_rcu() just as
surely as they can synchronize_rcu().
This commit therefore adds a RCU_STRICT_GRACE_PERIOD Kconfig option
that will be used to shorten normal (non-expedited) RCU grace periods.
This commit also dumps out a message when this option is in effect.
Later commits will actually shorten grace periods.
Reported-by Jann Horn <jannh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
This commit further avoids conflation of rcuperf with the kernel's perf
feature by renaming kernel/rcu/rcuperf.c to kernel/rcu/rcuscale.c, and
also by similarly renaming the functions and variables inside this file.
This has the side effect of changing the names of the kernel boot
parameters, so kernel-parameters.txt and ver_functions.sh are also
updated. The rcutorture --torture type was also updated from rcuperf
to rcuscale.
[ paulmck: Fix bugs located by Stephen Rothwell. ]
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The x86/entry work removed all uses of __rcu_is_watching(), therefore
this commit removes it entirely.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The RCU grace-period kthread's force-quiescent state (FQS) loop should
never see an offline CPU that has not yet reported a quiescent state.
After all, the offline CPU should have reported a quiescent state
during the CPU-offline process, or, failing that, by rcu_gp_init()
if it ran concurrently with either the CPU going offline or the last
task on a leaf rcu_node structure exiting its RCU read-side critical
section while all CPUs corresponding to that structure are offline.
The FQS loop should therefore complain if it does see an offline CPU
that has not yet reported a quiescent state.
And it does, but only once the grace period has been in force for a
full second. This commit therefore makes this warning more aggressive,
so that it will trigger as soon as the condition makes its appearance.
Light testing with TREE03 and hotplug shows no warnings. This commit
also converts the warning to WARN_ON_ONCE() in order to stave off possible
log spam.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Since at least v4.19, the FQS loop no longer reports quiescent states
for offline CPUs except in emergency situations.
This commit therefore fixes the comment in rcu_gp_init() to match the
current code.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
This commit increases RCU's ability to defend itself by emitting a warning
if one of the nocb CB kthreads invokes the GP kthread's wait function.
This warning augments a similar check that is carried out at the end
of rcutorture testing and when RCU CPU stall warnings are emitted.
The problem with those checks is that the miscreants have long since
departed and disposed of any and all evidence.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
When the rcu_cpu_started per-CPU variable was added by commit
f64c6013a202 ("rcu/x86: Provide early rcu_cpu_starting() callback"),
there were multiple sets of per-CPU rcu_data structures. Therefore, the
rcu_cpu_started flag was added as a separate per-CPU variable. But now
there is only one set of per-CPU rcu_data structures, so this commit
moves rcu_cpu_started to a new ->cpu_started field in that structure.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Given that sysfs can change the value of rcu_cpu_stall_ftrace_dump at any
time, this commit adds a READ_ONCE() to the accesses to that variable.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|