summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2012-01-06 08:02:40 -0800
committerLinus Torvalds <torvalds@linux-foundation.org>2012-01-06 08:02:40 -0800
commit423d091dfe58d3109d84c408810a7cfa82f6f184 (patch)
tree43c4385d1dc7219582f924d42db1f3e203a577bd /Documentation
parent1483b3823542c9721eddf09a077af1e02ac96b50 (diff)
parent919b83452b2e7c1dbced0456015508b4b9585db3 (diff)
Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) cpu: Export cpu_up() rcu: Apply ACCESS_ONCE() to rcu_boost() return value Revert "rcu: Permit rt_mutex_unlock() with irqs disabled" docs: Additional LWN links to RCU API rcu: Augment rcu_batch_end tracing for idle and callback state rcu: Add rcutorture tests for srcu_read_lock_raw() rcu: Make rcutorture test for hotpluggability before offlining CPUs driver-core/cpu: Expose hotpluggability to the rest of the kernel rcu: Remove redundant rcu_cpu_stall_suppress declaration rcu: Adaptive dyntick-idle preparation rcu: Keep invoking callbacks if CPU otherwise idle rcu: Irq nesting is always 0 on rcu_enter_idle_common rcu: Don't check irq nesting from rcu idle entry/exit rcu: Permit dyntick-idle with callbacks pending rcu: Document same-context read-side constraints rcu: Identify dyntick-idle CPUs on first force_quiescent_state() pass rcu: Remove dynticks false positives and RCU failures rcu: Reduce latency of rcu_prepare_for_idle() rcu: Eliminate RCU_FAST_NO_HZ grace-period hang rcu: Avoid needlessly IPIing CPUs at GP end ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/RCU/checklist.txt6
-rw-r--r--Documentation/RCU/rcu.txt10
-rw-r--r--Documentation/RCU/stallwarn.txt16
-rw-r--r--Documentation/RCU/torture.txt13
-rw-r--r--Documentation/RCU/trace.txt4
-rw-r--r--Documentation/RCU/whatisRCU.txt19
-rw-r--r--Documentation/atomic_ops.txt87
-rw-r--r--Documentation/lockdep-design.txt63
8 files changed, 198 insertions, 20 deletions
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 0c134f8afc6f..bff2d8be1e18 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -328,6 +328,12 @@ over a rather long period of time, but improvements are always welcome!
RCU rather than SRCU, because RCU is almost always faster and
easier to use than is SRCU.
+ If you need to enter your read-side critical section in a
+ hardirq or exception handler, and then exit that same read-side
+ critical section in the task that was interrupted, then you need
+ to srcu_read_lock_raw() and srcu_read_unlock_raw(), which avoid
+ the lockdep checking that would otherwise this practice illegal.
+
Also unlike other forms of RCU, explicit initialization
and cleanup is required via init_srcu_struct() and
cleanup_srcu_struct(). These are passed a "struct srcu_struct"
diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt
index 31852705b586..bf778332a28f 100644
--- a/Documentation/RCU/rcu.txt
+++ b/Documentation/RCU/rcu.txt
@@ -38,11 +38,11 @@ o How can the updater tell when a grace period has completed
Preemptible variants of RCU (CONFIG_TREE_PREEMPT_RCU) get the
same effect, but require that the readers manipulate CPU-local
- counters. These counters allow limited types of blocking
- within RCU read-side critical sections. SRCU also uses
- CPU-local counters, and permits general blocking within
- RCU read-side critical sections. These two variants of
- RCU detect grace periods by sampling these counters.
+ counters. These counters allow limited types of blocking within
+ RCU read-side critical sections. SRCU also uses CPU-local
+ counters, and permits general blocking within RCU read-side
+ critical sections. These variants of RCU detect grace periods
+ by sampling these counters.
o If I am running on a uniprocessor kernel, which can only do one
thing at a time, why should I wait for a grace period?
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 4e959208f736..083d88cbc089 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -101,6 +101,11 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning
messages.
+o A hardware or software issue shuts off the scheduler-clock
+ interrupt on a CPU that is not in dyntick-idle mode. This
+ problem really has happened, and seems to be most likely to
+ result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels.
+
o A bug in the RCU implementation.
o A hardware failure. This is quite unlikely, but has occurred
@@ -109,12 +114,11 @@ o A hardware failure. This is quite unlikely, but has occurred
This resulted in a series of RCU CPU stall warnings, eventually
leading the realization that the CPU had failed.
-The RCU, RCU-sched, and RCU-bh implementations have CPU stall
-warning. SRCU does not have its own CPU stall warnings, but its
-calls to synchronize_sched() will result in RCU-sched detecting
-RCU-sched-related CPU stalls. Please note that RCU only detects
-CPU stalls when there is a grace period in progress. No grace period,
-no CPU stall warnings.
+The RCU, RCU-sched, and RCU-bh implementations have CPU stall warning.
+SRCU does not have its own CPU stall warnings, but its calls to
+synchronize_sched() will result in RCU-sched detecting RCU-sched-related
+CPU stalls. Please note that RCU only detects CPU stalls when there is
+a grace period in progress. No grace period, no CPU stall warnings.
To diagnose the cause of the stall, inspect the stack traces.
The offending function will usually be near the top of the stack.
diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt
index 783d6c134d3f..d67068d0d2b9 100644
--- a/Documentation/RCU/torture.txt
+++ b/Documentation/RCU/torture.txt
@@ -61,11 +61,24 @@ nreaders This is the number of RCU reading threads supported.
To properly exercise RCU implementations with preemptible
read-side critical sections.
+onoff_interval
+ The number of seconds between each attempt to execute a
+ randomly selected CPU-hotplug operation. Defaults to
+ zero, which disables CPU hotplugging. In HOTPLUG_CPU=n
+ kernels, rcutorture will silently refuse to do any
+ CPU-hotplug operations regardless of what value is
+ specified for onoff_interval.
+
shuffle_interval
The number of seconds to keep the test threads affinitied
to a particular subset of the CPUs, defaults to 3 seconds.
Used in conjunction with test_no_idle_hz.
+shutdown_secs The number of seconds to run the test before terminating
+ the test and powering off the system. The default is
+ zero, which disables test termination and system shutdown.
+ This capability is useful for automated testing.
+
stat_interval The number of seconds between output of torture
statistics (via printk()). Regardless of the interval,
statistics are printed when the module is unloaded.
diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index aaf65f6c6cd7..49587abfc2f7 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -105,14 +105,10 @@ o "dt" is the current value of the dyntick counter that is incremented
or one greater than the interrupt-nesting depth otherwise.
The number after the second "/" is the NMI nesting depth.
- This field is displayed only for CONFIG_NO_HZ kernels.
-
o "df" is the number of times that some other CPU has forced a
quiescent state on behalf of this CPU due to this CPU being in
dynticks-idle state.
- This field is displayed only for CONFIG_NO_HZ kernels.
-
o "of" is the number of times that some other CPU has forced a
quiescent state on behalf of this CPU due to this CPU being
offline. In a perfect world, this might never happen, but it
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 6ef692667e2f..6bbe8dcdc3da 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -4,6 +4,7 @@ to start learning about RCU:
1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/
2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
+4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/
What is RCU?
@@ -834,6 +835,8 @@ SRCU: Critical sections Grace period Barrier
srcu_read_lock synchronize_srcu N/A
srcu_read_unlock synchronize_srcu_expedited
+ srcu_read_lock_raw
+ srcu_read_unlock_raw
srcu_dereference
SRCU: Initialization/cleanup
@@ -855,27 +858,33 @@ list can be helpful:
a. Will readers need to block? If so, you need SRCU.
-b. What about the -rt patchset? If readers would need to block
+b. Is it necessary to start a read-side critical section in a
+ hardirq handler or exception handler, and then to complete
+ this read-side critical section in the task that was
+ interrupted? If so, you need SRCU's srcu_read_lock_raw() and
+ srcu_read_unlock_raw() primitives.
+
+c. What about the -rt patchset? If readers would need to block
in an non-rt kernel, you need SRCU. If readers would block
in a -rt kernel, but not in a non-rt kernel, SRCU is not
necessary.
-c. Do you need to treat NMI handlers, hardirq handlers,
+d. Do you need to treat NMI handlers, hardirq handlers,
and code segments with preemption disabled (whether
via preempt_disable(), local_irq_save(), local_bh_disable(),
or some other mechanism) as if they were explicit RCU readers?
If so, you need RCU-sched.
-d. Do you need RCU grace periods to complete even in the face
+e. Do you need RCU grace periods to complete even in the face
of softirq monopolization of one or more of the CPUs? For
example, is your code subject to network-based denial-of-service
attacks? If so, you need RCU-bh.
-e. Is your workload too update-intensive for normal use of
+f. Is your workload too update-intensive for normal use of
RCU, but inappropriate for other synchronization mechanisms?
If so, consider SLAB_DESTROY_BY_RCU. But please be careful!
-f. Otherwise, use RCU.
+g. Otherwise, use RCU.
Of course, this all assumes that you have determined that RCU is in fact
the right tool for your job.
diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt
index 3bd585b44927..27f2b21a9d5c 100644
--- a/Documentation/atomic_ops.txt
+++ b/Documentation/atomic_ops.txt
@@ -84,6 +84,93 @@ compiler optimizes the section accessing atomic_t variables.
*** YOU HAVE BEEN WARNED! ***
+Properly aligned pointers, longs, ints, and chars (and unsigned
+equivalents) may be atomically loaded from and stored to in the same
+sense as described for atomic_read() and atomic_set(). The ACCESS_ONCE()
+macro should be used to prevent the compiler from using optimizations
+that might otherwise optimize accesses out of existence on the one hand,
+or that might create unsolicited accesses on the other.
+
+For example consider the following code:
+
+ while (a > 0)
+ do_something();
+
+If the compiler can prove that do_something() does not store to the
+variable a, then the compiler is within its rights transforming this to
+the following:
+
+ tmp = a;
+ if (a > 0)
+ for (;;)
+ do_something();
+
+If you don't want the compiler to do this (and you probably don't), then
+you should use something like the following:
+
+ while (ACCESS_ONCE(a) < 0)
+ do_something();
+
+Alternatively, you could place a barrier() call in the loop.
+
+For another example, consider the following code:
+
+ tmp_a = a;
+ do_something_with(tmp_a);
+ do_something_else_with(tmp_a);
+
+If the compiler can prove that do_something_with() does not store to the
+variable a, then the compiler is within its rights to manufacture an
+additional load as follows:
+
+ tmp_a = a;
+ do_something_with(tmp_a);
+ tmp_a = a;
+ do_something_else_with(tmp_a);
+
+This could fatally confuse your code if it expected the same value
+to be passed to do_something_with() and do_something_else_with().
+
+The compiler would be likely to manufacture this additional load if
+do_something_with() was an inline function that made very heavy use
+of registers: reloading from variable a could save a flush to the
+stack and later reload. To prevent the compiler from attacking your
+code in this manner, write the following:
+
+ tmp_a = ACCESS_ONCE(a);
+ do_something_with(tmp_a);
+ do_something_else_with(tmp_a);
+
+For a final example, consider the following code, assuming that the
+variable a is set at boot time before the second CPU is brought online
+and never changed later, so that memory barriers are not needed:
+
+ if (a)
+ b = 9;
+ else
+ b = 42;
+
+The compiler is within its rights to manufacture an additional store
+by transforming the above code into the following:
+
+ b = 42;
+ if (a)
+ b = 9;
+
+This could come as a fatal surprise to other code running concurrently
+that expected b to never have the value 42 if a was zero. To prevent
+the compiler from doing this, write something like:
+
+ if (a)
+ ACCESS_ONCE(b) = 9;
+ else
+ ACCESS_ONCE(b) = 42;
+
+Don't even -think- about doing this without proper use of memory barriers,
+locks, or atomic operations if variable a can change at runtime!
+
+*** WARNING: ACCESS_ONCE() DOES NOT IMPLY A BARRIER! ***
+
Now, we move onto the atomic operation interfaces typically implemented with
the help of assembly code.
diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt
index abf768c681e2..5dbc99c04f6e 100644
--- a/Documentation/lockdep-design.txt
+++ b/Documentation/lockdep-design.txt
@@ -221,3 +221,66 @@ when the chain is validated for the first time, is then put into a hash
table, which hash-table can be checked in a lockfree manner. If the
locking chain occurs again later on, the hash table tells us that we
dont have to validate the chain again.
+
+Troubleshooting:
+----------------
+
+The validator tracks a maximum of MAX_LOCKDEP_KEYS number of lock classes.
+Exceeding this number will trigger the following lockdep warning:
+
+ (DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
+
+By default, MAX_LOCKDEP_KEYS is currently set to 8191, and typical
+desktop systems have less than 1,000 lock classes, so this warning
+normally results from lock-class leakage or failure to properly
+initialize locks. These two problems are illustrated below:
+
+1. Repeated module loading and unloading while running the validator
+ will result in lock-class leakage. The issue here is that each
+ load of the module will create a new set of lock classes for
+ that module's locks, but module unloading does not remove old
+ classes (see below discussion of reuse of lock classes for why).
+ Therefore, if that module is loaded and unloaded repeatedly,
+ the number of lock classes will eventually reach the maximum.
+
+2. Using structures such as arrays that have large numbers of
+ locks that are not explicitly initialized. For example,
+ a hash table with 8192 buckets where each bucket has its own
+ spinlock_t will consume 8192 lock classes -unless- each spinlock
+ is explicitly initialized at runtime, for example, using the
+ run-time spin_lock_init() as opposed to compile-time initializers
+ such as __SPIN_LOCK_UNLOCKED(). Failure to properly initialize
+ the per-bucket spinlocks would guarantee lock-class overflow.
+ In contrast, a loop that called spin_lock_init() on each lock
+ would place all 8192 locks into a single lock class.
+
+ The moral of this story is that you should always explicitly
+ initialize your locks.
+
+One might argue that the validator should be modified to allow
+lock classes to be reused. However, if you are tempted to make this
+argument, first review the code and think through the changes that would
+be required, keeping in mind that the lock classes to be removed are
+likely to be linked into the lock-dependency graph. This turns out to
+be harder to do than to say.
+
+Of course, if you do run out of lock classes, the next thing to do is
+to find the offending lock classes. First, the following command gives
+you the number of lock classes currently in use along with the maximum:
+
+ grep "lock-classes" /proc/lockdep_stats
+
+This command produces the following output on a modest system:
+
+ lock-classes: 748 [max: 8191]
+
+If the number allocated (748 above) increases continually over time,
+then there is likely a leak. The following command can be used to
+identify the leaking lock classes:
+
+ grep "BD" /proc/lockdep
+
+Run the command and save the output, then compare against the output from
+a later run of this command to identify the leakers. This same output
+can also help you find situations where runtime lock initialization has
+been omitted.