summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-06-08rcu: Move docbook comments out of rcupdate.hPaul E. McKenney
The include/linux/rcupdate.h file is included by more than 200 files, so shrinking it should provide some build-time benefits. This commit therefore moves several docbook comments from rcupdate.h to kernel/rcu/update.c, kernel/rcu/tree.c, and kernel/rcu/tree_plugin.h, thus reducing the number of times that the compiler has to scan these comments. This likely provides only a small benefit, but every little bit helps. This commit also fixes a malformed bulleted list noted by the 0day Test Robot. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Flag need for rcu_node_tree.h and rcu_segcblist.h visibilityPaul E. McKenney
The rcu_node_tree.h and rcu_segcblist.h header files in the include/linux directory might appear at first sight to be internal to the RCU implementation. However, the definitions in these files are needed to determine the size of TREE SRCU's srcu_struct structure, so they must be externally visible, which is why they live in include/linux. This commit adds comments to this effect to those files. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Add memory barriers for NOCB leader wakeupPaul E. McKenney
Wait/wakeup operations do not guarantee ordering on their own. Instead, either locking or memory barriers are required. This commit therefore adds memory barriers to wake_nocb_leader() and nocb_leader_wait(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Krister Johansen <kjlx@templeofstupid.com> Cc: <stable@vger.kernel.org> # 4.6.x
2017-06-08rcu: Use RCU_NOCB_WAKE rather than RCU_NOGP_WAKEPaul E. McKenney
The RCU_NOGP_WAKE_NOT, RCU_NOGP_WAKE, and RCU_NOGP_WAKE_FORCE flags are used to mediate wakeups for the no-CBs CPU kthreads. The "NOGP" really doesn't make any sense, so this commit does s/NOGP/NOCB/. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08sched: Rely on synchronize_rcu_mult() de-duplicationPaul E. McKenney
The synchronize_rcu_mult() function now detects duplicate requests for the same grace-period flavor and waits only once for each flavor. This commit therefore removes the ugly #ifdef from sched_cpu_deactivate() because synchronize_rcu_mult(call_rcu, call_rcu_sched) now does what the #ifdef used to be needed for. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2017-06-08rcu: Make synchronize_rcu_mult() check for duplicatesPaul E. McKenney
Currently, doing synchronize_rcu_mult(call_rcu, call_rcu) might (or might not) wait for two RCU grace periods. One approach is of course "don't do that!", but in CONFIG_PREEMPT=n kernels, synchronize_rcu_mult(call_rcu, call_rcu_sched) does exactly that. This results in an ugly #ifdef in sched_cpu_deactivate(). This commit therefore makes __wait_rcu_gp() check for duplicates, which in turn allows duplicates to be passed to synchronize_rcu_mult() without risk of waiting twice on the same type of grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Add DEBUG_OBJECTS_RCU_HEAD functionalityPaul E. McKenney
This commit adds DEBUG_OBJECTS_RCU_HEAD checking to detect call_srcu() counterparts to double-free bugs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Shrink Tiny SRCU a bitPaul E. McKenney
In Tiny SRCU, __srcu_read_lock() is a trivial function, outweighed by its EXPORT_SYMBOL_GPL(), and on many architectures, its call sequence. This commit therefore moves it to srcutiny.h so that it can be inlined. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Make SRCU be once again optionalPaul E. McKenney
Commit d160a727c40e ("srcu: Make SRCU be built by default") in response to build errors, which were caused by code that included srcu.h despite !SRCU. However, srcutiny.o is almost 2K of code, which is not insignificant for those attempting to run the Linux kernel on IoT devices. This commit therefore makes SRCU be once again optional, and adjusts srcu.h to allow error-free inclusion in !SRCU kernel builds. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Nicolas Pitre <nico@linaro.org>
2017-06-08rcu: Add lockdep_assert_held() teeth to tree_plugin.hPaul E. McKenney
Comments can be helpful, but assertions carry more force. This commit therefore adds lockdep_assert_held() and RCU_LOCKDEP_WARN() calls to enforce lock-held and interrupt-disabled preconditions. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Add lockdep_assert_held() teeth to tree.cPaul E. McKenney
Comments can be helpful, but assertions carry more force. This commit therefore adds lockdep_assert_held() and RCU_LOCKDEP_WARN() calls to enforce lock-held and interrupt-disabled preconditions. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Print non-default exp_holdoff values at boot timePaul E. McKenney
This commit makes srcu_bootup_announce() check for non-default values of the auto-expedite holdoff time exp_holdoff and print a message if so. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Make exp_holdoff module parameter be staticPaul E. McKenney
Because exp_holdoff is not used outside of srcutree.c, it can be static. This commit therefore makes this change. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Update rcu_bootup_announce_oddness()Paul E. McKenney
This commit updates rcu_bootup_announce_oddness() to check additional Kconfig options and module/boot parameters. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Print out rcupdate.c non-default boot-time settingsPaul E. McKenney
This commit adds a rcupdate_announce_bootup_oddness() function to print out non-default values of significant kernel boot parameter settings to aid in debugging. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Add preemptibility checks in rcu_sched_qs() and rcu_bh_qs()Paul E. McKenney
This commit adds WARN_ON_ONCE() calls that trigger if either rcu_sched_qs() or rcu_bh_qs() are invoked with preemption enabled. In the immortal words of Peter Zijlstra: "these are much harder to ignore than comments". Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08doc: Take tail recursion into account in RCU requirementsPaul E. McKenney
This commit classifies tail recursion as an alternative way to write a loop, with similar limitations. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Document auto-expediting requirementPaul E. McKenney
This commit documents the auto-expediting requirement satisfied by commits 2da4b2a7fd8d ("srcu: Expedite first synchronize_srcu() when idle") and 22607d66bbc3 ("srcu: Specify auto-expedite holdoff time"). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcutorture: Add "git diff" output to testid.txt filePaul E. McKenney
Currently, when running from a git archive, the testid.txt file contains only the branch name, the output of "git status", and the SHA-1 of the current HEAD. This is useful, but does not uniquely identify the source code that was built. This commit therefore adds the output of "git diff HEAD", which means that if two testid.txt files compare equal, they correspond to exactly the same source code. Give or take the possibility of SHA-1 collisions, that is. ;-) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcuperf: Add writer_holdoff boot parameterPaul E. McKenney
This commit adds a writer_holdoff boot parameter to rcuperf, which is intended to be used to test Tree SRCU's auto-expediting. This boot parameter is in microseconds, and defaults to zero (that is, disabled). Set it to a bit larger than srcutree.exp_holdoff, keeping the nanosecond/microsecond conversion, to force Tree SRCU to auto-expedite more aggressively. This commit also adds documentation for this parameter, and fixes some alphabetization while in the neighborhood. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu-cbmc: Use /usr/bin/awk instead of /bin/awkPriyalee Kushwaha
Most OS distribution have awk in /usr/bin not in /bin Without this patch, kernel-devsrc fails to build as runtime dependency for srcu-cbmc script /bin/awk is not found. Signed-off-by: Kushwaha, Priyalee <priyalee.kushwaha@intel.com> Acked-by: Lance Roy <ldr709@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcuperf: Set more user-friendly defaultsPaul E. McKenney
Common-case use of rcuperf must set rcuperf.nreaders=0 and if not built as a module, rcuperf.shutdown. This commit therefore sets the default for rcuperf.nreaders to zero and sets the default for rcuperf.shutdown to zero if rcuperf is built as a module and to one otherwise. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Shrink Tiny SRCU a bit morePaul E. McKenney
This commit rearranges Tiny SRCU's srcu_struct structure, substitutes u8 for bool, and shrinks counters down to short. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcutorture: Reduce CPUs dedicated to testing Classic SRCUPaul E. McKenney
Given that the plan is to retire Classic SRCU in the near future, this commit reduces the number of CPUs dedicated to testing Classic SRCU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Make Classic and Tree SRCU announce themselves at bootupPaul E. McKenney
Currently, the only way to tell whether a given kernel is running Classic, Tiny, or Tree SRCU is to look at the .config file, which can easily be lost or associated with the wrong kernel. This commit therefore has Classic and Tree SRCU identify themselves at boot time. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcuperf: Add the ability to test tiny RCU flavorsPaul E. McKenney
This commit adds a TINY rcuperf test scenario, which allows performance testing of Tiny RCU and Tiny SRCU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08docs: Fix typo in Documentation/memory-barriers.txtStan Drozd
This commit changes "architecure" to the correct spelling, "architecture". Signed-off-by: Stan Drozd <drozdziak1@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08atomics: Add header comment so spin_unlock_wait()Paul E. McKenney
There is material describing the ordering guarantees provided by spin_unlock_wait(), but it is not necessarily easy to find. This commit therefore adds a docbook header comment to this function informally describing its semantics. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org>
2017-06-08doc/atomic_ops: Clarify smp_mb__{before,after}_atomic()Paul E. McKenney
This commit explicitly states that surrounding a non-value-returning atomic read-modify atomic operations provides full ordering, just as is provided by value-returning atomic read-modify-write operations. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcuperf: Add test for dynamically initialized srcu_structPaul E. McKenney
This commit adds a perf_type of "srcud", which species that rcuperf test SRCU on a dynamically initialized srcu_struct. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08checkpatch: Remove checks for expedited grace periodsPaul E. McKenney
There was a time when the expedited grace-period primitives (synchronize_rcu_expedited(), synchronize_rcu_bh_expedited(), and synchronize_sched_expedited()) used rather antisocial kernel facilities like try_stop_cpus(). However, they have since been housebroken to use only single-CPU IPIs, and typically cause less disturbance than a scheduling-clock interrupt. Furthermore, this disturbance can be eliminated entirely using NO_HZ_FULL on the one hand or the rcupdate.rcu_normal boot parameter on the other. This commit therefore removes checkpatch's complaints about use of the expedited RCU primitives. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Make sync_rcu_preempt_exp_done() return boolPaul E. McKenney
The sync_rcu_preempt_exp_done() function returns a logical expression, but its return type is nevertheless int. This commit therefore changes the return type to bool. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcuperf: Add a Kconfig-fragment file for Classic SRCUPaul E. McKenney
This commit adds a Kconfig-fragment file for Classic SRCU to ease performance comparisons with Tree SRCU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcuperf: Add ability to performance-test call_rcu() and friendsPaul E. McKenney
This commit upgrades rcuperf so that it can do performance testing on asynchronous grace-period primitives such as call_srcu(). There is a new rcuperf.gp_async module parameter that specifies this new behavior, with the pre-existing rcuperf.gp_exp testing expedited grace periods such as synchronize_rcu_expedited, and with the default being to test synchronous non-expedited grace periods such as synchronize_rcu(). There is also a new rcuperf.gp_async_max module parameter that specifies the maximum number of outstanding callbacks per writer kthread, defaulting to 1,000. When this limit is exceeded, the writer thread invokes the appropriate flavor of rcu_barrier() to wait for callbacks to drain. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Removed the redundant initialization noted by Arnd Bergmann. ]
2017-06-08rcu: Remove obsolete reference to synchronize_kernel()Paul E. McKenney
The synchronize_kernel() primitive was removed in favor of synchronize_sched() more than a decade ago, and it seems likely that rather few kernel hackers are familiar with it. Its continued presence is therefore providing more confusion than enlightenment. This commit therefore removes the reference from the synchronize_sched() header comment, and adds the corresponding information to the synchronize_rcu(0 header comment. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcuperf: Remove conflicting Kconfig optionsPaul E. McKenney
The TREE and TREE54 rcuperf scenarios' Kconfig fragment files specified conflicting values for CONFIG_RCU_TRACE. This commit therefore removes the =n line in favor of the =y line. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcuperf: Defer expedited/normal check to end of testPaul E. McKenney
Current rcuperf startup checks to see if the user asked to measure only expedited grace periods, yet constrained all grace periods to be normal, or if the user asked to measure only normal grace periods, yet constrained all grace periods to be expedited. Useless tests of this sort are aborted. Unfortunately, making RCU work through the mid-boot dead zone [1] puts RCU into expedited-only mode during that zone. Which happens to also be the exact time that rcuperf carries out the aforementioned check. So if the user asks rcuperf to measure only normal grace periods (the default), rcuperf will now always complain and terminate the test. This commit therefore moves the checks to rcu_perf_cleanup(). This has the disadvantage of failing to abort useless tests, but avoids the need to create yet another kthread and the need to do fiddly checks involving the holdoff time. (Yes, another approach is to do the checks in a late-stage init function, but that would require some way to communicate badness to rcuperf's kthreads, and seems not worth the bother.) [1] https://lwn.net/Articles/716148/ Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Complain if blocking in preemptible RCU read-side critical sectionPaul E. McKenney
Although preemptible RCU allows its read-side critical sections to be preempted, general blocking is forbidden. The reason for this is that excessive preemption times can be handled by CONFIG_RCU_BOOST=y, but a voluntarily blocked task doesn't care how high you boost its priority. Because preemptible RCU is a global mechanism, one ill-behaved reader hurts everyone. Hence the prohibition against general blocking in RCU-preempt read-side critical sections. Preemption yes, blocking no. This commit enforces this prohibition. There is a special exception for the -rt patchset (which they kindly volunteered to implement): It is OK to block (as opposed to merely being preempted) within an RCU-preempt read-side critical section, but only if the blocking is subject to priority inheritance. This exception permits CONFIG_RCU_BOOST=y to get -rt RCU readers out of trouble. Why doesn't this exception also apply to mainline's rt_mutex? Because of the possibility that someone does general blocking while holding an rt_mutex. Yes, the priority boosting will affect the rt_mutex, but it won't help with the task doing general blocking while holding that rt_mutex. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Eliminate possibility of destructive counter overflowPaul E. McKenney
Earlier versions of Tree SRCU were subject to a counter overflow bug that could theoretically result in too-short grace periods. This commit eliminates this problem by adding an update-side memory barrier. The short explanation is that if the updater sums the unlock counts too late to see a given __srcu_read_unlock() increment, that CPU's next __srcu_read_lock() must see the new value of ->srcu_idx, thus incrementing the other bank of counters. This eliminates the possibility of destructive counter overflow as long as the srcu_read_lock() nesting level does not exceed floor(ULONG_MAX/NR_CPUS/2), which should be an eminently reasonable nesting limit, especially on 64-bit systems. Reported-by: Lance Roy <ldr709@gmail.com> Suggested-by: Lance Roy <ldr709@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcutorture: Update test scenarios based on new Kconfig dependenciesPaul E. McKenney
A number of the rcutorture test scenarios were not using the desired Kconfig options because dependencies were preventing the selections in the Kconfig-fragment files from being honored. This commit therefore updates the Kconfig-fragment files to account for these changes in dependencies. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcutorture: Correctly handle CONFIG_RCU_TORTURE_TEST_* optionsPaul E. McKenney
The rcutorture scripting handles the CONFIG_*_TORTURE_TEST Kconfig options specially, and therefore greps them out of the Kconfig-fragment files. Unfortunately, a poor choice of grep pattern means that the CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP, CONFIG_RCU_TORTURE_TEST_SLOW_INIT, and CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT Kconfig options are also grepped out, preventing rcutorture from using them. This commit therefore fixes the offending grep pattern to focus only on the CONFIG_*_TORTURE_TEST Kconfig options. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Prevent rcu_barrier() from starting needless grace periodsPaul E. McKenney
Currently rcu_barrier() uses call_rcu() to enqueue new callbacks on each CPU with a non-empty callback list. This works, but means that rcu_barrier() forces grace periods that are not otherwise needed. The key point is that rcu_barrier() never needs to wait for a grace period, but instead only for all pre-existing callbacks to be invoked. This means that rcu_barrier()'s new callbacks should be placed in the callback-list segment containing the last pre-existing callback. This commit makes this change using the new rcu_segcblist_entrain() function. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcutorture: Add a scenario for Classic SRCUPaul E. McKenney
A robust combination of paranoia and cowardice has resulted in retaining Classic SRCU (CONFIG_CLASSIC_SRCU) as a backup for the shiny new Tiny and Tree SRCU implementations. If it is to be a viable backup, it of course needs to be tested. This commit therefore adds an rcutorture scenario named SRCU-C for Classic SRCU. This commit also adds this scenario to the set that are run by default. Once sufficient good experience has accumulated for Tiny and Tree SRCU, this test will be removed, along with the Classic SRCU implementation itself. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcutorture: Add a scenario for Tiny SRCUPaul E. McKenney
This commit adds an SRCU-t rcutorture scenario for the new Tiny SRCU implementation, removing the need to pass the --bootargs parameter to kvm.sh to run Tiny SRCU tests. This commit also adds SRCU-t to the set of scenarios that are run by default. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcutorture: Fix bug in reporting Kconfig mis-settingsPaul E. McKenney
Kconfig "select" clauses can defeat Kconfig-fragment file attempts to clear a given Kconfig variable, and dependencies can defeat attempts to set a given Kconfig variable. Because "select" clauses and dependencies can be added at any time, there needs to be a way to verify that the Kconfig-fragment file's requests were honored. And there is, except that it is buggy. This commit therefore provides the needed fix. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcutorture: Add three-level tree test for Tree SRCUPaul E. McKenney
This commit adds a test for a three-level srcu_node tree for Tree SRCU in the existing SRCU-P scenario. This requires enabling CONFIG_RCU_EXPERT, so the CONFIG_RCU_EXPERT=n scenario is now SRCU-N. The reason for using SRCU-P for the tall tree is that preemption raises the possibility of locating more bugs than does the non-preemptive SRCU-N. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcutorture: Add lockdep to one of the SRCU scenariosPaul E. McKenney
Back when SRCU was simpler, there wasn't much need for lockdep. However, with Tree SRCU, it is needed. This commit therefore adds CONFIG_PROVE_LOCKING to the SRCU-P scenario. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Allow use of Classic SRCU from both process and interrupt contextPaolo Bonzini
Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting down a guest running iperf on a VFIO assigned device. This happens because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt context, while a worker thread does the same inside kvm_set_irq(). If the interrupt happens while the worker thread is executing __srcu_read_lock(), updates to the Classic SRCU ->lock_count[] field or the Tree SRCU ->srcu_lock_count[] field can be lost. The docs say you are not supposed to call srcu_read_lock() and srcu_read_unlock() from irq context, but KVM interrupt injection happens from (host) interrupt context and it would be nice if SRCU supported the use case. KVM is using SRCU here not really for the "sleepable" part, but rather due to its IPI-free fast detection of grace periods. It is therefore not desirable to switch back to RCU, which would effectively revert commit 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING", 2014-01-16). However, the docs are overly conservative. You can have an SRCU instance only has users in irq context, and you can mix process and irq context as long as process context users disable interrupts. In addition, __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and Classic SRCU. For those two implementations, only srcu_read_lock() is unsafe. When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(), in commit 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments. Therefore it kept __this_cpu_inc(), with preempt_disable/enable in the caller. Tree SRCU however only does one increment, so on most architectures it is more efficient for __srcu_read_lock() to use this_cpu_inc(), and any performance differences appear to be down in the noise. Cc: stable@vger.kernel.org Fixes: 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING") Reported-by: Linu Cherian <linuc.decode@gmail.com> Suggested-by: Linu Cherian <linuc.decode@gmail.com> Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Allow use of Tiny/Tree SRCU from both process and interrupt contextPaolo Bonzini
Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting down a guest running iperf on a VFIO assigned device. This happens because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt context, while a worker thread does the same inside kvm_set_irq(). If the interrupt happens while the worker thread is executing __srcu_read_lock(), updates to the Classic SRCU ->lock_count[] field or the Tree SRCU ->srcu_lock_count[] field can be lost. The docs say you are not supposed to call srcu_read_lock() and srcu_read_unlock() from irq context, but KVM interrupt injection happens from (host) interrupt context and it would be nice if SRCU supported the use case. KVM is using SRCU here not really for the "sleepable" part, but rather due to its IPI-free fast detection of grace periods. It is therefore not desirable to switch back to RCU, which would effectively revert commit 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING", 2014-01-16). However, the docs are overly conservative. You can have an SRCU instance only has users in irq context, and you can mix process and irq context as long as process context users disable interrupts. In addition, __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and Classic SRCU. For those two implementations, only srcu_read_lock() is unsafe. When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(), in commit 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments. Therefore it kept __this_cpu_inc(), with preempt_disable/enable in the caller. Tree SRCU however only does one increment, so on most architectures it is more efficient for __srcu_read_lock() to use this_cpu_inc(), and any performance differences appear to be down in the noise. Unlike Classic and Tree SRCU, Tiny SRCU does increments and decrements on a single variable. Therefore, as Peter Zijlstra pointed out, Tiny SRCU's implementation already supports mixed-context use of srcu_read_lock() and srcu_read_unlock(), at least as long as uses of srcu_read_lock() and srcu_read_unlock() in each handler are nested and paired properly. In other words, it is still illegal to (say) invoke srcu_read_lock() in an interrupt handler and to invoke the matching srcu_read_unlock() in a softirq handler. Therefore, the only change required for Tiny SRCU is to its comments. Fixes: 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING") Reported-by: Linu Cherian <linuc.decode@gmail.com> Suggested-by: Linu Cherian <linuc.decode@gmail.com> Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-04Linux 4.12-rc4Linus Torvalds