summaryrefslogtreecommitdiff
path: root/Documentation/RCU
AgeCommit message (Collapse)Author
2014-02-17documentation: Fix some inconsistencies in RTFP.txtPaul E. McKenney
Some of the early history leaves out some citations and vice versa. This commit fixes these up. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-17documentation: Document call_rcu() safety mechanisms and limitationsPaul E. McKenney
The call_rcu() family of primitives will take action to accelerate grace periods when the number of callbacks pending on a given CPU becomes excessive. Although this safety mechanism can be useful, it is no substitute for users of call_rcu() having rate-limit controls in place. This commit adds this nuance to the documentation. Reported-by: "Michael S. Tsirkin" <mst@redhat.com> Reported-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-10Documentation/: update 00-INDEX filesHenrik Austad
Some of the 00-INDEX files are somewhat outdated and some folders does not contain 00-INDEX at all. Only outdated (with the notably exception of spi) indexes are touched here, the 169 folders without 00-INDEX has not been touched. New 00-INDEX - spi/* was added in a series of commits dating back to 2006 Added files (missing in (*/)00-INDEX) - dmatest.txt was added by commit 851b7e16a07d ("dmatest: run test via debugfs") - this_cpu_ops.txt was added by commit a1b2a555d637 ("percpu: add documentation on this_cpu operations") - ww-mutex-design.txt was added by commit 040a0a371005 ("mutex: Add support for wound/wait style locks") - bcache.txt was added by commit cafe56359144 ("bcache: A block layer cache") - kernel-per-CPU-kthreads.txt was added by commit 49717cb40410 ("kthread: Document ways of reducing OS jitter due to per-CPU kthreads") - phy.txt was added by commit ff764963479a ("drivers: phy: add generic PHY framework") - block/null_blk was added by commit 12f8f4fc0314 ("null_blk: documentation") - module-signing.txt was added by commit 3cafea307642 ("Add Documentation/module-signing.txt file") - assoc_array.txt was added by commit 3cb989501c26 ("Add a generic associative array implementation.") - arm/IXP4xx was part of the initial repo - arm/cluster-pm-race-avoidance.txt was added by commit 7fe31d28e839 ("ARM: mcpm: introduce helpers for platform coherency exit/setup") - arm/firmware.txt was added by commit 7366b92a77fc ("ARM: Add interface for registering and calling firmware-specific operations") - arm/kernel_mode_neon.txt was added by commit 2afd0a05241d ("ARM: 7825/1: document the use of NEON in kernel mode") - arm/tcm.txt was added by commit bc581770cfdd ("ARM: 5580/2: ARM TCM (Tightly-Coupled Memory) support v3") - arm/vlocks.txt was added by commit 9762f12d3e05 ("ARM: mcpm: Add baremetal voting mutexes") - blackfin/gptimers-example.c, Makefile was added by commit 4b60779d5ea7 ("Blackfin: add an example showing how to use the gptimers API") - devicetree/usage-model.txt was added by commit 31134efc681a ("dt: Linux DT usage model documentation") - fb/api.txt was added by commit fb21c2f42879 ("fbdev: Add FOURCC-based format configuration API") - fb/sm501.txt was added by commit e6a049807105 ("video, sm501: add edid and commandline support") - fb/udlfb.txt was added by commit 96f8d864afd6 ("fbdev: move udlfb out of staging.") - filesystems/Makefile was added by commit 1e0051ae48a2 ("Documentation/fs/: split txt and source files") - filesystems/nfs/nfsd-admin-interfaces.txt was added by commit 8a4c6e19cfed ("nfsd: document kernel interfaces for nfsd configuration") - ide/warm-plug-howto.txt was added by commit f74c91413ec6 ("ide: add warm-plug support for IDE devices (take 2)") - laptops/Makefile was added by commit d49129accc21 ("Documentation/laptop/: split txt and source files") - leds/leds-blinkm.txt was added by commit b54cf35a7f65 ("LEDS: add BlinkM RGB LED driver, documentation and update MAINTAINERS") - leds/ledtrig-oneshot.txt was added by commit 5e417281cde2 ("leds: add oneshot trigger") - leds/ledtrig-transient.txt was added by commit 44e1e9f8e705 ("leds: add new transient trigger for one shot timer activation") - m68k/README.buddha was part of the initial repo - networking/LICENSE.(qla3xxx|qlcnic|qlge) was added by commits 40839129f779, c4e84bde1d59, 5a4faa873782 - networking/Makefile was added by commit 3794f3e812ef ("docsrc: build Documentation/ sources") - networking/i40evf.txt was added by commit 105bf2fe6b32 ("i40evf: add driver to kernel build system") - networking/ipsec.txt was added by commit b3c6efbc36e2 ("xfrm: Add file to document IPsec corner case") - networking/mac80211-auth-assoc-deauth.txt was added by commit 3cd7920a2be8 ("mac80211: add auth/assoc/deauth flow diagram") - networking/netlink_mmap.txt was added by commit 5683264c3981 ("netlink: add documentation for memory mapped I/O") - networking/nf_conntrack-sysctl.txt was added by commit c9f9e0e1597f ("netfilter: doc: add nf_conntrack sysctl api documentation") lan) - networking/team.txt was added by commit 3d249d4ca7d0 ("net: introduce ethernet teaming device") - networking/vxlan.txt was added by commit d342894c5d2f ("vxlan: virtual extensible lan") - power/runtime_pm.txt was added by commit 5e928f77a09a ("PM: Introduce core framework for run-time PM of I/O devices (rev. 17)") - power/charger-manager.txt was added by commit 3bb3dbbd56ea ("power_supply: Add initial Charger-Manager driver") - RCU/lockdep-splat.txt was added by commit d7bd2d68aa2e ("rcu: Document interpretation of RCU-lockdep splats") - s390/kvm.txt was added by 5ecee4b (KVM: s390: API documentation) - s390/qeth.txt was added by commit b4d72c08b358 ("qeth: bridgeport support - basic control") - scheduler/sched-bwc.txt was added by commit 88ebc08ea9f7 ("sched: Add documentation for bandwidth control") - scsi/advansys.txt was added by commit 4bd6d7f35661 ("[SCSI] advansys: Move documentation to Documentation/scsi") - scsi/bfa.txt was added by commit 1ec90174bdb4 ("[SCSI] bfa: add readme file") - scsi/bnx2fc.txt was added by commit 12b8fc10eaf4 ("[SCSI] bnx2fc: Add driver documentation") - scsi/cxgb3i.txt was added by commit c3673464ebc0 ("[SCSI] cxgb3i: Add cxgb3i iSCSI driver.") - scsi/hpsa.txt was added by commit 992ebcf14f3c ("[SCSI] hpsa: Add hpsa.txt to Documentation/scsi") - scsi/link_power_management_policy.txt was added by commit ca77329fb713 ("[libata] Link power management infrastructure") - scsi/osd.txt was added by commit 78e0c621deca ("[SCSI] osd: Documentation for OSD library") - scsi/scsi-parameter.txt was created/moved by commit 163475fb111c ("Documentation: move SCSI parameters to their own text file") - serial/driver was part of the initial repo - serial/n_gsm.txt was added by commit 323e84122ec6 ("n_gsm: add a documentation") - timers/Makefile was added by commit 3794f3e812ef ("docsrc: build Documentation/ sources") - virt/kvm/s390.txt was added by commit d9101fca3d57 ("KVM: s390: diagnose call documentation") - vm/split_page_table_lock was added by commit 49076ec2ccaf ("mm: dynamically allocate page->ptl if it cannot be embedded to struct page") - w1/slaves/w1_ds28e04 was added by commit fbf7f7b4e2ae ("w1: Add 1-wire slave device driver for DS28E04-100") - w1/masters/omap-hdq was added by commit e0a29382c6f5 ("hdq: documentation for OMAP HDQ") - x86/early-microcode.txt was added by commit 0d91ea86a895 ("x86, doc: Documentation for early microcode loading") - x86/earlyprintk.txt was added by commit a1aade478862 ("x86/doc: mini-howto for using earlyprintk=dbgp") - x86/entry_64.txt was added by commit 8b4777a4b50c ("x86-64: Document some of entry_64.S") - x86/pat.txt was added by commit d27554d874c7 ("x86: PAT documentation") Moved files - arm/kernel_user_helpers.txt was moved out of arch/arm/kernel by commit 37b8304642c7 ("ARM: kuser: move interface documentation out of the source code") - efi-stub.txt was moved out of x86/ and down into Documentation/ in commit 4172fe2f8a47 ("EFI stub documentation updates") - laptops/hpfall.c was moved out of hwmon/ and into laptops/ in commit efcfed9bad88 ("Move hp_accel to drivers/platform/x86") - commit 5616c23ad9cd ("x86: doc: move x86-generic documentation from Doc/x86/i386"): * x86/usb-legacy-support.txt * x86/boot.txt * x86/zero_page.txt - power/video_extension.txt was moved to acpi in commit 70e66e4df191 ("ACPI / video: move video_extension.txt to Documentation/acpi") Removed files (left in 00-INDEX) - memory.txt was removed by commit 00ea8990aadf ("memory.txt: remove stray information") - gpio.txt was moved to gpio/ in commit fd8e198cfcaa ("Documentation: gpiolib: document new interface") - networking/DLINK.txt was removed by commit 168e06ae26dd ("drivers/net: delete old parallel port de600/de620 drivers") - serial/hayes-esp.txt was removed by commit f53a2ade0bb9 ("tty: esp: remove broken driver") - s390/TAPE was removed by commit 9e280f669308 ("[S390] remove tape block docu") - vm/locking was removed by commit 57ea8171d2bc ("mm: documentation: remove hopelessly out-of-date locking doc") - laptops/acer-wmi.txt was remvoed by commit 020036678e81 ("acer-wmi: Delete out-of-date documentation") Typos/misc issues - rpc-server-gss.txt was added as knfsd-rpcgss.txt in commit 030d794bf498 ("SUNRPC: Use gssproxy upcall for server RPCGSS authentication.") - commit b88cf73d9278 ("net: add missing entries to Documentation/networking/00-INDEX") * generic-hdlc.txt was added as generic_hdlc.txt * spider_net.txt was added as spider-net.txt - w1/master/mxc-w1 was added as mxc_w1 by commit a5fd9139f74c ("w1: add 1-wire master driver for i.MX27 / i.MX31") - s390/zfcpdump.txt was added as zfcpdump by commit 6920c12a407e ("[S390] Add Documentation/s390/00-INDEX.") Signed-off-by: Henrik Austad <henrik@austad.us> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [rcu bits] Acked-by: Rob Landley <rob@landley.net> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rob Herring <robh+dt@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Mark Brown <broonie@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Gleb Natapov <gleb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Len Brown <len.brown@intel.com> Cc: James Bottomley <JBottomley@parallels.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12Merge branches 'doc.2013.12.03a', 'fixes.2013.12.12a', ↵Paul E. McKenney
'rcutorture.2013.12.03a' and 'sparse.2013.12.12a' into HEAD doc.2013.12.03a: Topic branch for documentation changes. fixes.2013.12.12a: Topic branch for miscellaneous fixes. rcutorture.2013.12.03a: Topic branch for new rcutorture/KVM scripting. sparse.2013.12.12a: Topic branch for sparse-RCU changes.
2013-12-03rcu: Break call_rcu() deadlock involving scheduler and perfPaul E. McKenney
Dave Jones got the following lockdep splat: > ====================================================== > [ INFO: possible circular locking dependency detected ] > 3.12.0-rc3+ #92 Not tainted > ------------------------------------------------------- > trinity-child2/15191 is trying to acquire lock: > (&rdp->nocb_wq){......}, at: [<ffffffff8108ff43>] __wake_up+0x23/0x50 > > but task is already holding lock: > (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230 > > which lock already depends on the new lock. > > > the existing dependency chain (in reverse order) is: > > -> #3 (&ctx->lock){-.-...}: > [<ffffffff810cc243>] lock_acquire+0x93/0x200 > [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80 > [<ffffffff811500ff>] __perf_event_task_sched_out+0x2df/0x5e0 > [<ffffffff81091b83>] perf_event_task_sched_out+0x93/0xa0 > [<ffffffff81732052>] __schedule+0x1d2/0xa20 > [<ffffffff81732f30>] preempt_schedule_irq+0x50/0xb0 > [<ffffffff817352b6>] retint_kernel+0x26/0x30 > [<ffffffff813eed04>] tty_flip_buffer_push+0x34/0x50 > [<ffffffff813f0504>] pty_write+0x54/0x60 > [<ffffffff813e900d>] n_tty_write+0x32d/0x4e0 > [<ffffffff813e5838>] tty_write+0x158/0x2d0 > [<ffffffff811c4850>] vfs_write+0xc0/0x1f0 > [<ffffffff811c52cc>] SyS_write+0x4c/0xa0 > [<ffffffff8173d4e4>] tracesys+0xdd/0xe2 > > -> #2 (&rq->lock){-.-.-.}: > [<ffffffff810cc243>] lock_acquire+0x93/0x200 > [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80 > [<ffffffff810980b2>] wake_up_new_task+0xc2/0x2e0 > [<ffffffff81054336>] do_fork+0x126/0x460 > [<ffffffff81054696>] kernel_thread+0x26/0x30 > [<ffffffff8171ff93>] rest_init+0x23/0x140 > [<ffffffff81ee1e4b>] start_kernel+0x3f6/0x403 > [<ffffffff81ee1571>] x86_64_start_reservations+0x2a/0x2c > [<ffffffff81ee1664>] x86_64_start_kernel+0xf1/0xf4 > > -> #1 (&p->pi_lock){-.-.-.}: > [<ffffffff810cc243>] lock_acquire+0x93/0x200 > [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90 > [<ffffffff810979d1>] try_to_wake_up+0x31/0x350 > [<ffffffff81097d62>] default_wake_function+0x12/0x20 > [<ffffffff81084af8>] autoremove_wake_function+0x18/0x40 > [<ffffffff8108ea38>] __wake_up_common+0x58/0x90 > [<ffffffff8108ff59>] __wake_up+0x39/0x50 > [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0 > [<ffffffff81111450>] __call_rcu+0x140/0x820 > [<ffffffff81111b8d>] call_rcu+0x1d/0x20 > [<ffffffff81093697>] cpu_attach_domain+0x287/0x360 > [<ffffffff81099d7e>] build_sched_domains+0xe5e/0x10a0 > [<ffffffff81efa7fc>] sched_init_smp+0x3b7/0x47a > [<ffffffff81ee1f4e>] kernel_init_freeable+0xf6/0x202 > [<ffffffff817200be>] kernel_init+0xe/0x190 > [<ffffffff8173d22c>] ret_from_fork+0x7c/0xb0 > > -> #0 (&rdp->nocb_wq){......}: > [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0 > [<ffffffff810cc243>] lock_acquire+0x93/0x200 > [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90 > [<ffffffff8108ff43>] __wake_up+0x23/0x50 > [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0 > [<ffffffff81111450>] __call_rcu+0x140/0x820 > [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30 > [<ffffffff81149abf>] put_ctx+0x4f/0x70 > [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230 > [<ffffffff81056b8d>] do_exit+0x30d/0xcc0 > [<ffffffff8105893c>] do_group_exit+0x4c/0xc0 > [<ffffffff810589c4>] SyS_exit_group+0x14/0x20 > [<ffffffff8173d4e4>] tracesys+0xdd/0xe2 > > other info that might help us debug this: > > Chain exists of: > &rdp->nocb_wq --> &rq->lock --> &ctx->lock > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(&ctx->lock); > lock(&rq->lock); > lock(&ctx->lock); > lock(&rdp->nocb_wq); > > *** DEADLOCK *** > > 1 lock held by trinity-child2/15191: > #0: (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230 > > stack backtrace: > CPU: 2 PID: 15191 Comm: trinity-child2 Not tainted 3.12.0-rc3+ #92 > ffffffff82565b70 ffff880070c2dbf8 ffffffff8172a363 ffffffff824edf40 > ffff880070c2dc38 ffffffff81726741 ffff880070c2dc90 ffff88022383b1c0 > ffff88022383aac0 0000000000000000 ffff88022383b188 ffff88022383b1c0 > Call Trace: > [<ffffffff8172a363>] dump_stack+0x4e/0x82 > [<ffffffff81726741>] print_circular_bug+0x200/0x20f > [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0 > [<ffffffff810c6439>] ? get_lock_stats+0x19/0x60 > [<ffffffff8100b2f4>] ? native_sched_clock+0x24/0x80 > [<ffffffff810cc243>] lock_acquire+0x93/0x200 > [<ffffffff8108ff43>] ? __wake_up+0x23/0x50 > [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90 > [<ffffffff8108ff43>] ? __wake_up+0x23/0x50 > [<ffffffff8108ff43>] __wake_up+0x23/0x50 > [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0 > [<ffffffff81111450>] __call_rcu+0x140/0x820 > [<ffffffff8109bc8f>] ? local_clock+0x3f/0x50 > [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30 > [<ffffffff81149abf>] put_ctx+0x4f/0x70 > [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230 > [<ffffffff81056b8d>] do_exit+0x30d/0xcc0 > [<ffffffff810c9af5>] ? trace_hardirqs_on_caller+0x115/0x1e0 > [<ffffffff810c9bcd>] ? trace_hardirqs_on+0xd/0x10 > [<ffffffff8105893c>] do_group_exit+0x4c/0xc0 > [<ffffffff810589c4>] SyS_exit_group+0x14/0x20 > [<ffffffff8173d4e4>] tracesys+0xdd/0xe2 The underlying problem is that perf is invoking call_rcu() with the scheduler locks held, but in NOCB mode, call_rcu() will with high probability invoke the scheduler -- which just might want to use its locks. The reason that call_rcu() needs to invoke the scheduler is to wake up the corresponding rcuo callback-offload kthread, which does the job of starting up a grace period and invoking the callbacks afterwards. One solution (championed on a related problem by Lai Jiangshan) is to simply defer the wakeup to some point where scheduler locks are no longer held. Since we don't want to unnecessarily incur the cost of such deferral, the task before us is threefold: 1. Determine when it is likely that a relevant scheduler lock is held. 2. Defer the wakeup in such cases. 3. Ensure that all deferred wakeups eventually happen, preferably sooner rather than later. We use irqs_disabled_flags() as a proxy for relevant scheduler locks being held. This works because the relevant locks are always acquired with interrupts disabled. We may defer more often than needed, but that is at least safe. The wakeup deferral is tracked via a new field in the per-CPU and per-RCU-flavor rcu_data structure, namely ->nocb_defer_wakeup. This flag is checked by the RCU core processing. The __rcu_pending() function now checks this flag, which causes rcu_check_callbacks() to initiate RCU core processing at each scheduling-clock interrupt where this flag is set. Of course this is not sufficient because scheduling-clock interrupts are often turned off (the things we used to be able to count on!). So the flags are also checked on entry to any state that RCU considers to be idle, which includes both NO_HZ_IDLE idle state and NO_HZ_FULL user-mode-execution state. This approach should allow call_rcu() to be invoked regardless of what locks you might be holding, the key word being "should". Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org>
2013-12-03rcu: Fix typo in Documentation/RCU/trace.txtPaul E. McKenney
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-09-25rcu: Fix occurrence of "the the" in checklist.txtMichael Opdenacker
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Add "then" as suggested by Josh Triplett. ] Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-09-25rcu: Update stall-warning documentationPaul E. McKenney
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-08-31Merge branches 'doc.2013.08.19a', 'fixes.2013.08.20a', 'sysidle.2013.08.31a' ↵Paul E. McKenney
and 'torture.2013.08.20a' into HEAD doc.2013.08.19a: Documentation updates fixes.2013.08.20a: Miscellaneous fixes sysidle.2013.08.31a: Detect system-wide idle state. torture.2013.08.20a: rcutorture updates.
2013-08-20rcu: Increase rcutorture test coveragePaul E. McKenney
Currently, rcutorture has separate torture_types to test synchronous, asynchronous, and expedited grace-period primitives. This has two disadvantages: (1) Three times the number of runs to cover the combinations and (2) Little testing of concurrent combinations of the three options. This commit therefore adds a pair of module parameters that control normal and expedited state, with the default being both types, randomly selected, by the fakewriter processes, thus reducing source-code size and increasing test coverage. In addtion, the writer task switches between asynchronous-normal and expedited grace-period primitives driven by the same pair of module parameters. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-08-19rcu: Update RTFP documentationPaul E. McKenney
Note that this commit also updates the formatting of serveral of the bibtex entries to conform to that of my .bib files. I started accumulating entries back in the 1980s, back when bibtex insisted that comma (",") was a separator, not a terminator. This rule forced commas to the fronts of lines. 25 years later, bibtex allows commas to be terminators, but I am too lazy to rework all my .bib files. Keeping the same format as my .bib files allows my to simply incorporate my RCU.bib file into Documentation/RCU/RTFP.txt, which is much easier than my earlier practice of keeping track of what had changed and adding individual entries. (I sometimes find relevant papers that were published some years back, for example.) In addition, this change adds entries for papers published in the last year or so. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-08-18rcu: Fix rcu_barrier() documentationPaul E. McKenney
There was a time when rcu_barrier() was guaranteed to wait for at least a grace period, but that time ended due to energy-efficiency concerns. So now rcu_barrier() is a no-op if there are no RCU callbacks queued in the system. This commit updates the documentation to reflect this change. Now, rcu_barrier() often does wait for a grace period, so, one could imagine some modification to rcu_barrier() to more efficiently handle cases where both rcu_barrier() and a grace period are needed. But this must wait until someone shows a real-world need for a change. Reported-by: Bob Copeland <bob@cozybit.com> Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10Merge branches 'cbnum.2013.06.10a', 'doc.2013.06.10a', 'fixes.2013.06.10a', ↵Paul E. McKenney
'srcu.2013.06.10a' and 'tiny.2013.06.10a' into HEAD cbnum.2013.06.10a: Apply simplifications stemming from the new callback numbering. doc.2013.06.10a: Documentation updates. fixes.2013.06.10a: Miscellaneous fixes. srcu.2013.06.10a: Updates to SRCU. tiny.2013.06.10a: Eliminate TINY_PREEMPT_RCU.
2013-06-10rcu: Remove TINY_PREEMPT_RCU tracing documentationPaul E. McKenney
Because TINY_PREEMPT_RCU is no more, this commit removes its tracing formats from the documentation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Remove srcu_read_lock_raw() and srcu_read_unlock_raw().Paul E. McKenney
These interfaces never did get used, so this commit removes them, their rcutorture tests, and documentation referencing them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-05-02Merge commit '8700c95adb03' into timers/nohzFrederic Weisbecker
The full dynticks tree needs the latest RCU and sched upstream updates in order to fix some dependencies. Merge a common upstream merge point that has these updates. Conflicts: include/linux/perf_event.h kernel/rcutree.h kernel/rcutree_plugin.h Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2013-04-03nohz: Rename CONFIG_NO_HZ to CONFIG_NO_HZ_COMMONFrederic Weisbecker
We are planning to convert the dynticks Kconfig options layout into a choice menu. The user must be able to easily pick any of the following implementations: constant periodic tick, idle dynticks, full dynticks. As this implies a mutual exclusion, the two dynticks implementions need to converge on the selection of a common Kconfig option in order to ease the sharing of a common infrastructure. It would thus seem pretty natural to reuse CONFIG_NO_HZ to that end. It already implements all the idle dynticks code and the full dynticks depends on all that code for now. So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ. On the other hand we want to stay backward compatible: if CONFIG_NO_HZ is set in an older config file, we want to enable CONFIG_NO_HZ_IDLE by default. But we can't afford both at the same time or we run into a circular dependency: 1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select CONFIG_NO_HZ 2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE We might be able to support that from Kconfig/Kbuild but it may not be wise to introduce such a confusing behaviour. So to solve this, create a new CONFIG_NO_HZ_COMMON option which gathers the common code between idle and full dynticks (that common code for now is simply the idle dynticks code) and select it from their referring Kconfig. Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ to it for backward compatibility. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-03-26Merge branches 'doc.2013.03.12a', 'fixes.2013.03.13a' and ↵Paul E. McKenney
'idlenocb.2013.03.26b' into HEAD doc.2013.03.12a: Documentation changes. fixes.2013.03.13a: Miscellaneous fixes. idlenocb.2013.03.26b: Remove restrictions on no-CBs CPUs, make RCU_FAST_NO_HZ take advantage of numbered callbacks, add callback acceleration based on numbered callbacks.
2013-03-13rcu: Add softirq-stall indications to stall-warning messagesPaul E. McKenney
If RCU's softirq handler is prevented from executing, an RCU CPU stall warning can result. Ways to prevent RCU's softirq handler from executing include: (1) CPU spinning with interrupts disabled, (2) infinite loop in some softirq handler, and (3) in -rt kernels, an infinite loop in a set of real-time threads running at priorities higher than that of RCU's softirq handler. Because this situation can be difficult to track down, this commit causes the count of RCU softirq handler invocations to be printed with RCU CPU stall warnings. This information does require some interpretation, as now documented in Documentation/RCU/stallwarn.txt. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-03-12rcu: Documentation updatePaul E. McKenney
This commit applies a few updates based on a quick review of the RCU documentations. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-03-12rcu: Make bugginess of code sample more evidentPaul E. McKenney
One of the code samples in whatisRCU.txt shows a bug, but someone scanning the document quickly might mistake it for a valid use of RCU. Add some screaming comments to help keep speed-readers on track. Reported-by: Nathan Zimmer <nzimmer@sgi.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-16Merge branches 'urgent.2012.10.27a', 'doc.2012.11.16a', 'fixes.2012.11.13a', ↵Paul E. McKenney
'srcu.2012.10.27a', 'stall.2012.11.13a', 'tracing.2012.11.08a' and 'idle.2012.10.24a' into HEAD urgent.2012.10.27a: Fix for RCU user-mode transition (already in -tip). doc.2012.11.08a: Documentation updates, most notably codifying the memory-barrier guarantees inherent to grace periods. fixes.2012.11.13a: Miscellaneous fixes. srcu.2012.10.27a: Allow statically allocated and initialized srcu_struct structures (courtesy of Lai Jiangshan). stall.2012.11.13a: Add more diagnostic information to RCU CPU stall warnings, also decrease from 60 seconds to 21 seconds. hotplug.2012.11.08a: Minor updates to CPU hotplug handling. tracing.2012.11.08a: Improved debugfs tracing, courtesy of Michael Wang. idle.2012.10.24a: Updates to RCU idle/adaptive-idle handling, including a boot parameter that maps normal grace periods to expedited. Resolved conflict in kernel/rcutree.c due to side-by-side change.
2012-11-16rcu: Add documentation for the new rcuexp debugfs trace filePaul E. McKenney
This commit adds the documentation of the rcuexp debugfs trace file that records statistics for expedited grace periods. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-11-16rcu: Update documentation for TREE_RCU debugfs tracingPaul E. McKenney
This commit updates the tracing documentation to reflect the new format that has per-RCU-flavor directories. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-11-13rcu: Remove list_for_each_continue_rcu()Paul E. McKenney
The list_for_each_continue_rcu() macro is no longer used, so this commit removes it. The list_for_each_entry_continue_rcu() macro should be used instead. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-11-08rcu: Document alternative RCU/reference-count algorithmsPaul E. McKenney
The approach for mixing RCU and reference counting listed in the RCU documentation only describes one possible approach. This approach can result in failure on the read side, which is nice if you want fresh data, but not so good if you want simple code. This commit therefore adds two additional approaches that feature unconditional reference-count acquisition by RCU readers. These approaches are very similar to that used in the security code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-11-08rcu: Update docs to include kfree_rcu()Kees Cook
Mention kfree_rcu() in the call_rcu() section. Additionally fix the example code for list replacement that used the wrong structure element. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-10-23rcu: Correct the name of a reference in list of RCU papersDhaval Giani
Trying to go through the history of RCU (not for the weak minded) led me to search for a non-existent paper. Correct it to the actual reference Signed-off-by: Dhaval Giani <dhaval.giani@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-09-24Merge branches 'bigrt.2012.09.23a', 'doctorture.2012.09.23a', ↵Paul E. McKenney
'fixes.2012.09.23a', 'hotplug.2012.09.23a' and 'idlechop.2012.09.23a' into HEAD bigrt.2012.09.23a contains additional commits to reduce scheduling latency from RCU on huge systems (many hundrends or thousands of CPUs). doctorture.2012.09.23a contains documentation changes and rcutorture fixes. fixes.2012.09.23a contains miscellaneous fixes. hotplug.2012.09.23a contains CPU-hotplug-related changes. idle.2012.09.23a fixes architectures for which RCU no longer considered the idle loop to be a quiescent state due to earlier adaptive-dynticks changes. Affected architectures are alpha, cris, frv, h8300, m32r, m68k, mn10300, parisc, score, xtensa, and ia64.
2012-09-23rcu: Fix CONFIG_RCU_FAST_NO_HZ stall warning messagePaul E. McKenney
The print_cpu_stall_fast_no_hz() function attempts to print -1 when the ->idle_gp_timer is not pending, but unsigned arithmetic causes it to instead print ULONG_MAX, which is 4294967295 on 32-bit systems and 18446744073709551615 on 64-bit systems. Neither of these are the most reader-friendly values, so this commit instead causes "timer not pending" to be printed when ->idle_gp_timer is not pending. Reported-by: Paul Walmsley <paul@pwsan.com> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-09-23rcu: Document SRCU dead-CPU capabilities, emphasize read-side limitsPaul E. McKenney
The current documentation did not help someone grepping for SRCU to learn that disabling preemption is not a replacement for srcu_read_lock(), so upgrade the documentation to bring this out, not just for SRCU, but also for RCU-bh. Also document the fact that SRCU readers are respected on CPUs executing in user mode, idle CPUs, and even on offline CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2012-09-23rcu: Adjust debugfs tracing for kthread-based quiescent-state forcingPaul E. McKenney
Moving quiescent-state forcing into a kthread dispenses with the need for the ->n_rp_need_fqs field, so this commit removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-07-02rcu: Update documentation to cover call_srcu() and srcu_barrier().Paul E. McKenney
The advent of call_srcu() and srcu_barrier() obsoleted some of the documentation, so this commit brings that up to date. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-30rcu: Introduce rcutorture testing for rcu_barrier()Paul E. McKenney
Although rcutorture does invoke rcu_barrier() and friends, it cannot really be called a torture test given that it invokes them only once at the end of the test. This commit therefore introduces heavy-duty rcutorture testing for rcu_barrier(), which may be carried out concurrently with normal rcutorture testing. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Call out dangers of expedited RCU primitivesPaul E. McKenney
The expedited RCU primitives can be quite useful, but they have some high costs as well. This commit updates and creates docbook comments calling out the costs, and updates the RCU documentation as well. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Rework detection of use of RCU by offline CPUsPaul E. McKenney
Because newly offlined CPUs continue executing after completing the CPU_DYING notifiers, they legitimately enter the scheduler and use RCU while appearing to be offline. This calls for a more sophisticated approach as follows: 1. RCU marks the CPU online during the CPU_UP_PREPARE phase. 2. RCU marks the CPU offline during the CPU_DEAD phase. 3. Diagnostics regarding use of read-side RCU by offline CPUs use RCU's accounting rather than the cpu_online_map. (Note that __call_rcu() still uses cpu_online_map to detect illegal invocations within CPU_DYING notifiers.) 4. Offline CPUs are prevented from hanging the system by force_quiescent_state(), which pays attention to cpu_online_map. Some additional work (in a later commit) will be needed to guarantee that force_quiescent_state() waits a full jiffy before assuming that a CPU is offline, for example, when called from idle entry. (This commit also makes the one-jiffy wait explicit, since the old-style implicit wait can now be defeated by RCU_FAST_NO_HZ and by rcutorture.) This approach avoids the false positives encountered when attempting to use more exact classification of CPU online/offline state. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Update stall-warning documentationPaul E. McKenney
Add documentation of CONFIG_RCU_CPU_STALL_VERBOSE, CONFIG_RCU_CPU_STALL_INFO, and RCU_STALL_DELAY_DELTA. Describe multiple stall-warning messages from a single stall, and the timing of the subsequent messages. Add headings. Remove RCU_SECONDS_TILL_STALL_RECHECK because this value is now computed at runtime from RCU_CPU_STALL_TIMEOUT, so that sysfs changes to the timeout value now directly affect the RCU_SECONDS_TILL_STALL_RECHECK value. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Add CPU-stall capability to rcutorturePaul E. McKenney
Add module parameters to rcutorture that induce a CPU stall. The stall_cpu parameter specifies how long to stall in seconds, defaulting to zero, which indicates no stalling is to be undertaken. The stall_cpu_holdoff parameter specifies how many seconds after insmod (or boot, if rcutorture is built into the kernel) that this stall is to start. The default value for stall_cpu_holdoff is ten seconds. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Make documentation give more realistic rcutorture durationPaul E. McKenney
The torture.txt documentation gives an example rcutorture run with a 100-second duration. This is ridiculously short, unless maybe testing a fix for a egregious bug. Use a more-realistic one-hour duration for the example. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcutorture: Permit holding off CPU-hotplug operations during bootPaul E. McKenney
When rcutorture is started automatically at boot time, it might well also start CPU-hotplug operations at that time, which might not be desirable. This commit therefore adds an rcutorture parameter that allows CPU-hotplug operations to be held off for the specified number of seconds after the start of boot. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Bring RTFP.txt up to date.Paul E. McKenney
Add publications from 2010 and 2011 to RTFP.txt. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11docs: Additional LWN links to RCU APIKees Cook
Tyler Hicks pointed me at an additional article on RCU and I figured it should probably be mentioned with the others. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11rcu: Add rcutorture CPU-hotplug capabilityPaul E. McKenney
Running CPU-hotplug operations concurrently with rcutorture has historically been a good way to find bugs in both RCU and CPU hotplug. This commit therefore adds an rcutorture module parameter called "onoff_interval" that causes a randomly selected CPU-hotplug operation to be executed at the specified interval, in seconds. The default value of "onoff_interval" is zero, which disables rcutorture-instigated CPU-hotplug operations. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11rcu: Add rcutorture system-shutdown capabilityPaul E. McKenney
Although it is easy to run rcutorture tests under KVM, there is currently no nice way to run such a test for a fixed time period, collect all of the rcutorture data, and then shut the system down cleanly. This commit therefore adds an rcutorture module parameter named "shutdown_secs" that specified the run duration in seconds, after which rcutorture terminates the test and powers the system down. The default value for "shutdown_secs" is zero, which disables shutdown. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11rcu: Add documentation for raw SRCU read-side primitivesPaul E. McKenney
Update various files in Documentation/RCU to reflect srcu_read_lock_raw() and srcu_read_unlock_raw(). Credit to Peter Zijlstra for suggesting use of the existing _raw suffix instead of the earlier bulkref names. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11rcu: Document failing tick as cause of RCU CPU stall warningPaul E. McKenney
One of lclaudio's systems was seeing RCU CPU stall warnings from idle. These turned out to be caused by a bug that stopped scheduling-clock tick interrupts from being sent to a given CPU for several hundred seconds. This commit therefore updates the documentation to call this out as a possible cause for RCU CPU stall warnings. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11rcu: Track idleness independent of idle tasksPaul E. McKenney
Earlier versions of RCU used the scheduling-clock tick to detect idleness by checking for the idle task, but handled idleness differently for CONFIG_NO_HZ=y. But there are now a number of uses of RCU read-side critical sections in the idle task, for example, for tracing. A more fine-grained detection of idleness is therefore required. This commit presses the old dyntick-idle code into full-time service, so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is always invoked at the beginning of an idle loop iteration. Similarly, rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked at the end of an idle-loop iteration. This allows the idle task to use RCU everywhere except between consecutive rcu_idle_enter() and rcu_idle_exit() calls, in turn allowing architecture maintainers to specify exactly where in the idle loop that RCU may be used. Because some of the userspace upcall uses can result in what looks to RCU like half of an interrupt, it is not possible to expect that the irq_enter() and irq_exit() hooks will give exact counts. This patch therefore expands the ->dynticks_nesting counter to 64 bits and uses two separate bitfields to count process/idle transitions and interrupt entry/exit transitions. It is presumed that userspace upcalls do not happen in the idle loop or from usermode execution (though usermode might do a system call that results in an upcall). The counter is hard-reset on each process/idle transition, which avoids the interrupt entry/exit error from accumulating. Overflow is avoided by the 64-bitness of the ->dyntick_nesting counter. This commit also adds warnings if a non-idle task asks RCU to enter idle state (and these checks will need some adjustment before applying Frederic's OS-jitter patches (http://lkml.org/lkml/2011/10/7/246). In addition, validation of ->dynticks and ->dynticks_nesting is added. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-09-28rcu: Document interpretation of RCU-lockdep splatsPaul E. McKenney
There has been quite a bit of confusion about what RCU-lockdep splats mean, so this commit adds some documentation describing how to interpret them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-28rcu: Update documentation for additional RCU lockdep functionsPaul E. McKenney
Add documentation for rcu_dereference_bh_check(), rcu_dereference_sched_check(), srcu_dereference_check(), and rcu_dereference_index_check(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-28rcu: Not necessary to pass rcu_read_lock_held() to rcu_dereference_protected()Michal Hocko
Since ca5ecddf (rcu: define __rcu address space modifier for sparse) rcu_dereference_check() use rcu_read_lock_held() as a part of condition automatically. Therefore, callers of rcu_dereference_check() no longer need to pass rcu_read_lock_held() to rcu_dereference_check(). Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>