From 77095901b895a64b6d775879b54c73472ba21e68 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney"
Date: Mon, 2 Jul 2018 08:25:57 -0700
Subject: doc: Update removal of RCU-bh/sched update machinery

The RCU-bh update API is now defined in terms of those of RCU-preempt
and RCU-sched, so this commit updates the documentation accordingly.
In addition, although RCU-sched persists in !PREEMPT kernels, in the
PREEMPT case its update API is now defined in terms of that of
RCU-preempt, so this commit also updates the documentation accordingly.

While in the area, this commit removes the documentation for the
now-obsolete synchronize_rcu_mult() and clarifies the Tasks RCU
documentation.

Signed-off-by: Paul E. McKenney
---
 .../Design/Data-Structures/Data-Structures.html    |  23 +---
 .../Expedited-Grace-Periods.html                   |   7 +-
 .../RCU/Design/Requirements/Requirements.html      | 149 +++++----------------
 3 files changed, 45 insertions(+), 134 deletions(-)

diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.html b/Documentation/RCU/Design/Data-Structures/Data-Structures.html
index 50be87e59937..1d2051c0c3fc 100644
--- a/Documentation/RCU/Design/Data-Structures/Data-Structures.html
+++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.html
@@ -1374,8 +1374,7 @@ that is, if the CPU is currently idle.
 Accessor Functions

 The following listing shows the
-rcu_get_root(), rcu_for_each_node_breadth_first,
-rcu_for_each_nonleaf_node_breadth_first(), and
+rcu_get_root(), rcu_for_each_node_breadth_first and
 rcu_for_each_leaf_node() function and macros:

@@ -1388,13 +1387,9 @@ Accessor Functions
   7   for ((rnp) = &(rsp)->node[0]; \
   8        (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++)
   9
- 10 #define rcu_for_each_nonleaf_node_breadth_first(rsp, rnp) \
- 11   for ((rnp) = &(rsp)->node[0]; \
- 12        (rnp) < (rsp)->level[NUM_RCU_LVLS - 1]; (rnp)++)
- 13
- 14 #define rcu_for_each_leaf_node(rsp, rnp) \
- 15   for ((rnp) = (rsp)->level[NUM_RCU_LVLS - 1]; \
- 16        (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++)
+ 10 #define rcu_for_each_leaf_node(rsp, rnp) \
+ 11   for ((rnp) = (rsp)->level[NUM_RCU_LVLS - 1]; \
+ 12        (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++)
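
The same array-layout trick can be sketched outside the kernel. The
following user-space C toy is purely illustrative: the toy_* names and
constants are stand-ins for the kernel's rcu_state and rcu_node
definitions, but the point is the same, namely that a breadth-first
layout lets the leaf-only iterator start at the first node of the last
level and simply walk to the end of the array.

  #include <stdio.h>

  /* Toy analogue of the rcu_node layout: one root followed by its
   * leaves, stored breadth-first in a single array. */
  #define TOY_NUM_LVLS   2
  #define TOY_NUM_LEAVES 4
  #define TOY_NUM_NODES  (1 + TOY_NUM_LEAVES)

  struct toy_node {
          int id;
  };

  struct toy_state {
          struct toy_node node[TOY_NUM_NODES];  /* breadth-first layout */
          struct toy_node *level[TOY_NUM_LVLS]; /* first node of each level */
  };

  /* Walk every node: just walk the array in order. */
  #define toy_for_each_node_breadth_first(sp, np) \
          for ((np) = &(sp)->node[0]; \
               (np) < &(sp)->node[TOY_NUM_NODES]; (np)++)

  /* Walk only the leaves: start at the first node of the last level. */
  #define toy_for_each_leaf_node(sp, np) \
          for ((np) = (sp)->level[TOY_NUM_LVLS - 1]; \
               (np) < &(sp)->node[TOY_NUM_NODES]; (np)++)

  int main(void)
  {
          struct toy_state s;
          struct toy_node *np;
          int i;

          for (i = 0; i < TOY_NUM_NODES; i++)
                  s.node[i].id = i;
          s.level[0] = &s.node[0];      /* root */
          s.level[1] = &s.node[1];      /* first leaf */

          toy_for_each_leaf_node(&s, np)
                  printf("leaf node %d\n", np->id);  /* prints 1 through 4 */
          return 0;
  }

Because both iterators are simple pointer ranges over a single array, no
per-node child or sibling pointers are needed for the traversal.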
 

 The rcu_get_root() simply returns a pointer to the
@@ -1407,10 +1402,7 @@ macro takes advantage of the layout of the rcu_node
 structures in the rcu_state structure's ->node[] array,
 performing a breadth-first traversal by simply traversing the array in order.
-The rcu_for_each_nonleaf_node_breadth_first() macro operates
-similarly, but traverses only the first part of the array, thus excluding
-the leaf rcu_node structures.
-Finally, the rcu_for_each_leaf_node() macro traverses only
+Similarly, the rcu_for_each_leaf_node() macro traverses only
 the last part of the array, thus traversing only the leaf
 rcu_node structures.
@@ -1418,15 +1410,14 @@ the last part of the array, thus traversing only the leaf
 
 Quick Quiz:
-	What do rcu_for_each_nonleaf_node_breadth_first() and
+	What does
 	rcu_for_each_leaf_node() do if the rcu_node tree
 	contains only a single node?
 Answer:
 	In the single-node case,
-	rcu_for_each_nonleaf_node_breadth_first() is a no-op
-	and rcu_for_each_leaf_node() traverses the single node.
+	rcu_for_each_leaf_node() traverses the single node.
 
diff --git a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html
index 7394f034be65..ffd612bfa436 100644
--- a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html
+++ b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html
@@ -12,10 +12,9 @@ high efficiency and minimal disturbance, expedited grace periods accept
 lower efficiency and significant disturbance to attain shorter latencies.

-There are three flavors of RCU (RCU-bh, RCU-preempt, and RCU-sched),
-but only two flavors of expedited grace periods because the RCU-bh
-expedited grace period maps onto the RCU-sched expedited grace period.
-Each of the remaining two implementations is covered in its own section.
+There are two flavors of RCU (RCU-preempt and RCU-sched), with an earlier
+third RCU-bh flavor having been implemented in terms of the other two.
+Each of the two implementations is covered in its own section.

   1.
diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html
index 51f39f65002d..701b5c53607f 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.html
+++ b/Documentation/RCU/Design/Requirements/Requirements.html
@@ -1306,8 +1306,6 @@ doing so would degrade real-time response.

 This non-requirement appeared with preemptible RCU.
-If you need a grace period that waits on non-preemptible code regions, use
-RCU-sched.

    Parallelism Facts of Life

@@ -2165,14 +2163,9 @@ however, this is not a panacea because there would be severe
 restrictions on what operations those callbacks could invoke.

-Perhaps surprisingly, synchronize_rcu(),
-synchronize_rcu_bh()
-(discussed below),
-synchronize_sched(),
+Perhaps surprisingly, synchronize_rcu() and
 synchronize_rcu_expedited(),
-synchronize_rcu_bh_expedited(), and
-synchronize_sched_expedited()
-will all operate normally
+will operate normally
 during very early boot, the reason being that there is only one CPU
 and preemption is disabled.
 This means that the call synchronize_rcu() (or friends)
@@ -2861,15 +2854,22 @@ The other four flavors are listed below, with requirements for each
 described in a separate section.

      -
-	Bottom-Half Flavor
-	Sched Flavor
+	Bottom-Half Flavor (Historical)
+	Sched Flavor (Historical)
 	Sleepable RCU
 	Tasks RCU
-	Waiting for Multiple Grace Periods

-Bottom-Half Flavor
+Bottom-Half Flavor (Historical)
+
+The RCU-bh flavor of RCU has since been expressed in terms of
+the other RCU flavors as part of a consolidation of the three
+flavors into a single flavor.
+The read-side API remains, and continues to disable softirq and to
+be accounted for by lockdep.
+Much of the material in this section is therefore strictly historical
+in nature.
 
 The softirq-disable (AKA “bottom-half”,
@@ -2929,8 +2929,20 @@ includes call_rcu_bh(), rcu_barrier_bh(), and
 rcu_read_lock_bh_held().
+However, the update-side APIs are now simple wrappers for other RCU
+flavors, namely RCU-sched in CONFIG_PREEMPT=n kernels and RCU-preempt
+otherwise.
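
As a rough usage sketch of the consolidated arrangement (struct foo,
foo_gp, and the function names below are invented for illustration, not
taken from the kernel tree): the read side keeps using the RCU-bh API
unchanged, while the update side goes through what is now just a
wrapper around the other flavors.

  #include <linux/rcupdate.h>
  #include <linux/slab.h>

  struct foo {
          int val;
  };

  static struct foo __rcu *foo_gp;  /* hypothetical RCU-bh-protected pointer */

  /* Read side: unchanged; rcu_read_lock_bh() still disables softirq
   * and is still tracked by lockdep. */
  static int foo_read(void)
  {
          struct foo *p;
          int val = -1;

          rcu_read_lock_bh();
          p = rcu_dereference_bh(foo_gp);
          if (p)
                  val = p->val;
          rcu_read_unlock_bh();
          return val;
  }

  /* Update side: synchronize_rcu_bh() is now a thin wrapper, mapping
   * to RCU-sched in CONFIG_PREEMPT=n kernels and to RCU-preempt
   * otherwise. */
  static void foo_update(int val)
  {
          struct foo *newp = kmalloc(sizeof(*newp), GFP_KERNEL);
          struct foo *oldp = rcu_dereference_protected(foo_gp, 1); /* sole updater */

          if (!newp)
                  return;
          newp->val = val;
          rcu_assign_pointer(foo_gp, newp);
          synchronize_rcu_bh();  /* wait for all rcu_read_lock_bh() readers */
          kfree(oldp);
  }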

-Sched Flavor
+Sched Flavor (Historical)
+
+The RCU-sched flavor of RCU has since been expressed in terms of
+the other RCU flavors as part of a consolidation of the three
+flavors into a single flavor.
+The read-side API remains, and continues to disable preemption and to
+be accounted for by lockdep.
+Much of the material in this section is therefore strictly historical
+in nature.
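
A corresponding sketch for RCU-sched, again with invented names
(cur_cfg, read_cfg(), install_cfg()): the read side still disables
preemption exactly as before, while in CONFIG_PREEMPT=y kernels the
synchronize_sched() call below is now implemented in terms of the
RCU-preempt machinery rather than as a separate flavor.

  #include <linux/rcupdate.h>
  #include <linux/slab.h>

  static int __rcu *cur_cfg;  /* hypothetical pointer protected by RCU-sched */

  /* Read side: unchanged by the consolidation; it still disables
   * preemption and is still seen by lockdep. */
  static int read_cfg(void)
  {
          int *p, val = 0;

          rcu_read_lock_sched();
          p = rcu_dereference_sched(cur_cfg);
          if (p)
                  val = *p;
          rcu_read_unlock_sched();
          return val;
  }

  /* Update side: still waits for the preemption-disabled readers above. */
  static void install_cfg(int *new_cfg)
  {
          int *old_cfg = rcu_dereference_protected(cur_cfg, 1); /* sole updater */

          rcu_assign_pointer(cur_cfg, new_cfg);
          synchronize_sched();
          kfree(old_cfg);
  }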

 Before preemptible RCU, waiting for an RCU grace period had the
@@ -3150,94 +3162,14 @@ The tasks-RCU API is quite compact, consisting only of
 call_rcu_tasks(),
 synchronize_rcu_tasks(), and
 rcu_barrier_tasks().
-
-

    -Waiting for Multiple Grace Periods

    - -

-Perhaps you have an RCU protected data structure that is accessed from
-RCU read-side critical sections, from softirq handlers, and from
-hardware interrupt handlers.
-That is three flavors of RCU, the normal flavor, the bottom-half flavor,
-and the sched flavor.
-How to wait for a compound grace period?
-

-The best approach is usually to “just say no!” and
-insert rcu_read_lock() and rcu_read_unlock()
-around each RCU read-side critical section, regardless of what
-environment it happens to be in.
-But suppose that some of the RCU read-side critical sections are
-on extremely hot code paths, and that use of CONFIG_PREEMPT=n
-is not a viable option, so that rcu_read_lock() and
-rcu_read_unlock() are not free.
-What then?
-

-You could wait on all three grace periods in succession, as follows:
-

    -
    - 1 synchronize_rcu();
    - 2 synchronize_rcu_bh();
    - 3 synchronize_sched();
    -
    -
    - -

-This works, but triples the update-side latency penalty.
-In cases where this is not acceptable, synchronize_rcu_mult()
-may be used to wait on all three flavors of grace period concurrently:
-

    -
    - 1 synchronize_rcu_mult(call_rcu, call_rcu_bh, call_rcu_sched);
    -
    -
    - -

-But what if it is necessary to also wait on SRCU?
-This can be done as follows:
-

    -
    - 1 static void call_my_srcu(struct rcu_head *head,
    - 2        void (*func)(struct rcu_head *head))
    - 3 {
    - 4   call_srcu(&my_srcu, head, func);
    - 5 }
    - 6
    - 7 synchronize_rcu_mult(call_rcu, call_rcu_bh, call_rcu_sched, call_my_srcu);
    -
    -
    - -

-If you needed to wait on multiple different flavors of SRCU
-(but why???), you would need to create a wrapper function resembling
-call_my_srcu() for each SRCU flavor.
-
     
-Quick Quiz:
-	But what if I need to wait for multiple RCU flavors, but I also need
-	the grace periods to be expedited?
-Answer:
-	If you are using expedited grace periods, there should be less penalty
-	for waiting on them in succession.
-	But if that is nevertheless a problem, you can use workqueues
-	or multiple kthreads to wait on the various expedited grace
-	periods concurrently.
-

-Again, it is usually better to adjust the RCU read-side critical sections
-to use a single flavor of RCU, but when this is not feasible, you can use
-synchronize_rcu_mult().
+In CONFIG_PREEMPT=n kernels, trampolines cannot be preempted,
+so these APIs map to
+call_rcu(),
+synchronize_rcu(), and
+rcu_barrier(), respectively.
+In CONFIG_PREEMPT=y kernels, trampolines can be preempted,
+and these three APIs are therefore implemented by separate functions
+that check for voluntary context switches.
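
As an illustrative sketch of the trampoline use case (the struct tramp
and tramp_*() names are invented for this example), an updater might
retire a trampoline as follows, relying on Tasks RCU to guarantee that
no task is still executing in the trampoline when it is freed:

  #include <linux/kernel.h>
  #include <linux/rcupdate.h>
  #include <linux/slab.h>

  /* Hypothetical descriptor for a dynamically generated trampoline. */
  struct tramp {
          struct rcu_head rh;
          void *text;             /* executable code, managed elsewhere */
  };

  static void tramp_free_cb(struct rcu_head *rhp)
  {
          struct tramp *t = container_of(rhp, struct tramp, rh);

          /* Every task has done a voluntary context switch (or run in
           * userspace) since the call_rcu_tasks() below, so no task can
           * still be executing in t->text. */
          kfree(t);
  }

  static void tramp_retire(struct tramp *t)
  {
          /* The caller has already unhooked the trampoline so that no
           * new tasks can enter it. */
          call_rcu_tasks(&t->rh, tramp_free_cb);

          /* Synchronous alternative:
           *      synchronize_rcu_tasks();
           *      kfree(t);
           */
  }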

    Possible Future Changes

@@ -3248,12 +3180,6 @@ If this becomes a serious problem, it will be necessary to rework the
 grace-period state machine so as to avoid the need for the additional
 latency.
-

-Expedited grace periods scan the CPUs, so their latency and overhead
-increases with increasing numbers of CPUs.
-If this becomes a serious problem on large systems, it will be necessary
-to do some redesign to avoid this scalability problem.
-

 RCU disables CPU hotplug in a few places, perhaps most notably in
 the rcu_barrier() operations.
@@ -3298,11 +3224,6 @@ Please note that arrangements that require RCU to remap CPU numbers
 will require extremely good demonstration of need and full exploration
 of alternatives.
-

-There is an embarrassingly large number of flavors of RCU, and this
-number has been increasing over time.
-Perhaps it will be possible to combine some at some future date.
-

 RCU's various kthreads are reasonably recent additions.
 It is quite likely that adjustments will be required to more gracefully