summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/DocBook/genericirq.tmpl84
-rw-r--r--Documentation/DocBook/kernel-locking.tmpl14
-rw-r--r--Documentation/RCU/checklist.txt46
-rw-r--r--Documentation/RCU/stallwarn.txt18
-rw-r--r--Documentation/RCU/trace.txt13
-rw-r--r--Documentation/cputopology.txt23
-rw-r--r--Documentation/feature-removal-schedule.txt28
-rw-r--r--Documentation/kernel-parameters.txt11
-rw-r--r--Documentation/kprobes.txt8
-rw-r--r--Documentation/pcmcia/driver-changes.txt25
10 files changed, 185 insertions, 85 deletions
diff --git a/Documentation/DocBook/genericirq.tmpl b/Documentation/DocBook/genericirq.tmpl
index 1448b33fd222..fb10fd08c05c 100644
--- a/Documentation/DocBook/genericirq.tmpl
+++ b/Documentation/DocBook/genericirq.tmpl
@@ -28,7 +28,7 @@
</authorgroup>
<copyright>
- <year>2005-2006</year>
+ <year>2005-2010</year>
<holder>Thomas Gleixner</holder>
</copyright>
<copyright>
@@ -100,6 +100,10 @@
<listitem><para>Edge type</para></listitem>
<listitem><para>Simple type</para></listitem>
</itemizedlist>
+ During the implementation we identified another type:
+ <itemizedlist>
+ <listitem><para>Fast EOI type</para></listitem>
+ </itemizedlist>
In the SMP world of the __do_IRQ() super-handler another type
was identified:
<itemizedlist>
@@ -153,6 +157,7 @@
is still available. This leads to a kind of duality for the time
being. Over time the new model should be used in more and more
architectures, as it enables smaller and cleaner IRQ subsystems.
+ It's deprecated for three years now and about to be removed.
</para>
</chapter>
<chapter id="bugs">
@@ -217,6 +222,7 @@
<itemizedlist>
<listitem><para>handle_level_irq</para></listitem>
<listitem><para>handle_edge_irq</para></listitem>
+ <listitem><para>handle_fasteoi_irq</para></listitem>
<listitem><para>handle_simple_irq</para></listitem>
<listitem><para>handle_percpu_irq</para></listitem>
</itemizedlist>
@@ -233,33 +239,33 @@
are used by the default flow implementations.
The following helper functions are implemented (simplified excerpt):
<programlisting>
-default_enable(irq)
+default_enable(struct irq_data *data)
{
- desc->chip->unmask(irq);
+ desc->chip->irq_unmask(data);
}
-default_disable(irq)
+default_disable(struct irq_data *data)
{
- if (!delay_disable(irq))
- desc->chip->mask(irq);
+ if (!delay_disable(data))
+ desc->chip->irq_mask(data);
}
-default_ack(irq)
+default_ack(struct irq_data *data)
{
- chip->ack(irq);
+ chip->irq_ack(data);
}
-default_mask_ack(irq)
+default_mask_ack(struct irq_data *data)
{
- if (chip->mask_ack) {
- chip->mask_ack(irq);
+ if (chip->irq_mask_ack) {
+ chip->irq_mask_ack(data);
} else {
- chip->mask(irq);
- chip->ack(irq);
+ chip->irq_mask(data);
+ chip->irq_ack(data);
}
}
-noop(irq)
+noop(struct irq_data *data))
{
}
@@ -278,12 +284,27 @@ noop(irq)
<para>
The following control flow is implemented (simplified excerpt):
<programlisting>
-desc->chip->start();
+desc->chip->irq_mask();
handle_IRQ_event(desc->action);
-desc->chip->end();
+desc->chip->irq_unmask();
</programlisting>
</para>
- </sect3>
+ </sect3>
+ <sect3 id="Default_FASTEOI_IRQ_flow_handler">
+ <title>Default Fast EOI IRQ flow handler</title>
+ <para>
+ handle_fasteoi_irq provides a generic implementation
+ for interrupts, which only need an EOI at the end of
+ the handler
+ </para>
+ <para>
+ The following control flow is implemented (simplified excerpt):
+ <programlisting>
+handle_IRQ_event(desc->action);
+desc->chip->irq_eoi();
+ </programlisting>
+ </para>
+ </sect3>
<sect3 id="Default_Edge_IRQ_flow_handler">
<title>Default Edge IRQ flow handler</title>
<para>
@@ -294,20 +315,19 @@ desc->chip->end();
The following control flow is implemented (simplified excerpt):
<programlisting>
if (desc->status &amp; running) {
- desc->chip->hold();
+ desc->chip->irq_mask();
desc->status |= pending | masked;
return;
}
-desc->chip->start();
+desc->chip->irq_ack();
desc->status |= running;
do {
if (desc->status &amp; masked)
- desc->chip->enable();
+ desc->chip->irq_unmask();
desc->status &amp;= ~pending;
handle_IRQ_event(desc->action);
} while (status &amp; pending);
desc->status &amp;= ~running;
-desc->chip->end();
</programlisting>
</para>
</sect3>
@@ -342,9 +362,9 @@ handle_IRQ_event(desc->action);
<para>
The following control flow is implemented (simplified excerpt):
<programlisting>
-desc->chip->start();
handle_IRQ_event(desc->action);
-desc->chip->end();
+if (desc->chip->irq_eoi)
+ desc->chip->irq_eoi();
</programlisting>
</para>
</sect3>
@@ -375,8 +395,7 @@ desc->chip->end();
mechanism. (It's necessary to enable CONFIG_HARDIRQS_SW_RESEND when
you want to use the delayed interrupt disable feature and your
hardware is not capable of retriggering an interrupt.)
- The delayed interrupt disable can be runtime enabled, per interrupt,
- by setting the IRQ_DELAYED_DISABLE flag in the irq_desc status field.
+ The delayed interrupt disable is not configurable.
</para>
</sect2>
</sect1>
@@ -387,13 +406,13 @@ desc->chip->end();
contains all the direct chip relevant functions, which
can be utilized by the irq flow implementations.
<itemizedlist>
- <listitem><para>ack()</para></listitem>
- <listitem><para>mask_ack() - Optional, recommended for performance</para></listitem>
- <listitem><para>mask()</para></listitem>
- <listitem><para>unmask()</para></listitem>
- <listitem><para>retrigger() - Optional</para></listitem>
- <listitem><para>set_type() - Optional</para></listitem>
- <listitem><para>set_wake() - Optional</para></listitem>
+ <listitem><para>irq_ack()</para></listitem>
+ <listitem><para>irq_mask_ack() - Optional, recommended for performance</para></listitem>
+ <listitem><para>irq_mask()</para></listitem>
+ <listitem><para>irq_unmask()</para></listitem>
+ <listitem><para>irq_retrigger() - Optional</para></listitem>
+ <listitem><para>irq_set_type() - Optional</para></listitem>
+ <listitem><para>irq_set_wake() - Optional</para></listitem>
</itemizedlist>
These primitives are strictly intended to mean what they say: ack means
ACK, masking means masking of an IRQ line, etc. It is up to the flow
@@ -458,6 +477,7 @@ desc->chip->end();
<para>
This chapter contains the autogenerated documentation of the internal functions.
</para>
+!Ikernel/irq/irqdesc.c
!Ikernel/irq/handle.c
!Ikernel/irq/chip.c
</chapter>
diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl
index a0d479d1e1dd..f66f4df18690 100644
--- a/Documentation/DocBook/kernel-locking.tmpl
+++ b/Documentation/DocBook/kernel-locking.tmpl
@@ -1645,7 +1645,9 @@ the amount of locking which needs to be done.
all the readers who were traversing the list when we deleted the
element are finished. We use <function>call_rcu()</function> to
register a callback which will actually destroy the object once
- the readers are finished.
+ all pre-existing readers are finished. Alternatively,
+ <function>synchronize_rcu()</function> may be used to block until
+ all pre-existing are finished.
</para>
<para>
But how does Read Copy Update know when the readers are
@@ -1714,7 +1716,7 @@ the amount of locking which needs to be done.
- object_put(obj);
+ list_del_rcu(&amp;obj-&gt;list);
cache_num--;
-+ call_rcu(&amp;obj-&gt;rcu, cache_delete_rcu, obj);
++ call_rcu(&amp;obj-&gt;rcu, cache_delete_rcu);
}
/* Must be holding cache_lock */
@@ -1725,14 +1727,6 @@ the amount of locking which needs to be done.
if (++cache_num > MAX_CACHE_SIZE) {
struct object *i, *outcast = NULL;
list_for_each_entry(i, &amp;cache, list) {
-@@ -85,6 +94,7 @@
- obj-&gt;popularity = 0;
- atomic_set(&amp;obj-&gt;refcnt, 1); /* The cache holds a reference */
- spin_lock_init(&amp;obj-&gt;lock);
-+ INIT_RCU_HEAD(&amp;obj-&gt;rcu);
-
- spin_lock_irqsave(&amp;cache_lock, flags);
- __cache_add(obj);
@@ -104,12 +114,11 @@
struct object *cache_find(int id)
{
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 790d1a812376..0c134f8afc6f 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -218,13 +218,22 @@ over a rather long period of time, but improvements are always welcome!
include:
a. Keeping a count of the number of data-structure elements
- used by the RCU-protected data structure, including those
- waiting for a grace period to elapse. Enforce a limit
- on this number, stalling updates as needed to allow
- previously deferred frees to complete.
-
- Alternatively, limit only the number awaiting deferred
- free rather than the total number of elements.
+ used by the RCU-protected data structure, including
+ those waiting for a grace period to elapse. Enforce a
+ limit on this number, stalling updates as needed to allow
+ previously deferred frees to complete. Alternatively,
+ limit only the number awaiting deferred free rather than
+ the total number of elements.
+
+ One way to stall the updates is to acquire the update-side
+ mutex. (Don't try this with a spinlock -- other CPUs
+ spinning on the lock could prevent the grace period
+ from ever ending.) Another way to stall the updates
+ is for the updates to use a wrapper function around
+ the memory allocator, so that this wrapper function
+ simulates OOM when there is too much memory awaiting an
+ RCU grace period. There are of course many other
+ variations on this theme.
b. Limiting update rate. For example, if updates occur only
once per hour, then no explicit rate limiting is required,
@@ -365,3 +374,26 @@ over a rather long period of time, but improvements are always welcome!
and the compiler to freely reorder code into and out of RCU
read-side critical sections. It is the responsibility of the
RCU update-side primitives to deal with this.
+
+17. Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and
+ the __rcu sparse checks to validate your RCU code. These
+ can help find problems as follows:
+
+ CONFIG_PROVE_RCU: check that accesses to RCU-protected data
+ structures are carried out under the proper RCU
+ read-side critical section, while holding the right
+ combination of locks, or whatever other conditions
+ are appropriate.
+
+ CONFIG_DEBUG_OBJECTS_RCU_HEAD: check that you don't pass the
+ same object to call_rcu() (or friends) before an RCU
+ grace period has elapsed since the last time that you
+ passed that same object to call_rcu() (or friends).
+
+ __rcu sparse checks: tag the pointer to the RCU-protected data
+ structure with __rcu, and sparse will warn you if you
+ access that pointer without the services of one of the
+ variants of rcu_dereference().
+
+ These debugging aids can help you find problems that are
+ otherwise extremely difficult to spot.
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 44c6dcc93d6d..862c08ef1fde 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -80,6 +80,24 @@ o A CPU looping with bottom halves disabled. This condition can
o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
without invoking schedule().
+o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
+ happen to preempt a low-priority task in the middle of an RCU
+ read-side critical section. This is especially damaging if
+ that low-priority task is not permitted to run on any other CPU,
+ in which case the next RCU grace period can never complete, which
+ will eventually cause the system to run out of memory and hang.
+ While the system is in the process of running itself out of
+ memory, you might see stall-warning messages.
+
+o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
+ is running at a higher priority than the RCU softirq threads.
+ This will prevent RCU callbacks from ever being invoked,
+ and in a CONFIG_TREE_PREEMPT_RCU kernel will further prevent
+ RCU grace periods from ever completing. Either way, the
+ system will eventually run out of memory and hang. In the
+ CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning
+ messages.
+
o A bug in the RCU implementation.
o A hardware failure. This is quite unlikely, but has occurred
diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index efd8cc95c06b..a851118775d8 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -125,6 +125,17 @@ o "b" is the batch limit for this CPU. If more than this number
of RCU callbacks is ready to invoke, then the remainder will
be deferred.
+o "ci" is the number of RCU callbacks that have been invoked for
+ this CPU. Note that ci+ql is the number of callbacks that have
+ been registered in absence of CPU-hotplug activity.
+
+o "co" is the number of RCU callbacks that have been orphaned due to
+ this CPU going offline.
+
+o "ca" is the number of RCU callbacks that have been adopted due to
+ other CPUs going offline. Note that ci+co-ca+ql is the number of
+ RCU callbacks registered on this CPU.
+
There is also an rcu/rcudata.csv file with the same information in
comma-separated-variable spreadsheet format.
@@ -180,7 +191,7 @@ o "s" is the "signaled" state that drives force_quiescent_state()'s
o "jfq" is the number of jiffies remaining for this grace period
before force_quiescent_state() is invoked to help push things
- along. Note that CPUs in dyntick-idle mode thoughout the grace
+ along. Note that CPUs in dyntick-idle mode throughout the grace
period will not report on their own, but rather must be check by
some other CPU via force_quiescent_state().
diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index f1c5c4bccd3e..902d3151f527 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -14,25 +14,39 @@ to /proc/cpuinfo.
identifier (rather than the kernel's). The actual value is
architecture and platform dependent.
-3) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
+3) /sys/devices/system/cpu/cpuX/topology/book_id:
+
+ the book ID of cpuX. Typically it is the hardware platform's
+ identifier (rather than the kernel's). The actual value is
+ architecture and platform dependent.
+
+4) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
internel kernel map of cpuX's hardware threads within the same
core as cpuX
-4) /sys/devices/system/cpu/cpuX/topology/core_siblings:
+5) /sys/devices/system/cpu/cpuX/topology/core_siblings:
internal kernel map of cpuX's hardware threads within the same
physical_package_id.
+6) /sys/devices/system/cpu/cpuX/topology/book_siblings:
+
+ internal kernel map of cpuX's hardware threads within the same
+ book_id.
+
To implement it in an architecture-neutral way, a new source file,
-drivers/base/topology.c, is to export the 4 attributes.
+drivers/base/topology.c, is to export the 4 or 6 attributes. The two book
+related sysfs files will only be created if CONFIG_SCHED_BOOK is selected.
For an architecture to support this feature, it must define some of
these macros in include/asm-XXX/topology.h:
#define topology_physical_package_id(cpu)
#define topology_core_id(cpu)
+#define topology_book_id(cpu)
#define topology_thread_cpumask(cpu)
#define topology_core_cpumask(cpu)
+#define topology_book_cpumask(cpu)
The type of **_id is int.
The type of siblings is (const) struct cpumask *.
@@ -45,6 +59,9 @@ not defined by include/asm-XXX/topology.h:
3) thread_siblings: just the given CPU
4) core_siblings: just the given CPU
+For architectures that don't support books (CONFIG_SCHED_BOOK) there are no
+default definitions for topology_book_id() and topology_book_cpumask().
+
Additionally, CPU topology information is provided under
/sys/devices/system/cpu and includes these files. The internal
source for the output is in brackets ("[]").
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 842aa9de84a6..5e2bc4ab897a 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -386,34 +386,6 @@ Who: Tejun Heo <tj@kernel.org>
----------------------------
-What: Support for VMware's guest paravirtuliazation technique [VMI] will be
- dropped.
-When: 2.6.37 or earlier.
-Why: With the recent innovations in CPU hardware acceleration technologies
- from Intel and AMD, VMware ran a few experiments to compare these
- techniques to guest paravirtualization technique on VMware's platform.
- These hardware assisted virtualization techniques have outperformed the
- performance benefits provided by VMI in most of the workloads. VMware
- expects that these hardware features will be ubiquitous in a couple of
- years, as a result, VMware has started a phased retirement of this
- feature from the hypervisor. We will be removing this feature from the
- Kernel too. Right now we are targeting 2.6.37 but can retire earlier if
- technical reasons (read opportunity to remove major chunk of pvops)
- arise.
-
- Please note that VMI has always been an optimization and non-VMI kernels
- still work fine on VMware's platform.
- Latest versions of VMware's product which support VMI are,
- Workstation 7.0 and VSphere 4.0 on ESX side, future maintainence
- releases for these products will continue supporting VMI.
-
- For more details about VMI retirement take a look at this,
- http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html
-
-Who: Alok N Kataria <akataria@vmware.com>
-
-----------------------------
-
What: Support for lcd_switch and display_get in asus-laptop driver
When: March 2010
Why: These two features use non-standard interfaces. There are the
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 8dd7248508a9..3a0009e03d14 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -455,7 +455,7 @@ and is between 256 and 4096 characters. It is defined in the file
[ARM] imx_timer1,OSTS,netx_timer,mpu_timer2,
pxa_timer,timer3,32k_counter,timer0_1
[AVR32] avr32
- [X86-32] pit,hpet,tsc,vmi-timer;
+ [X86-32] pit,hpet,tsc;
scx200_hrt on Geode; cyclone on IBM x440
[MIPS] MIPS
[PARISC] cr16
@@ -2153,6 +2153,11 @@ and is between 256 and 4096 characters. It is defined in the file
Reserves a hole at the top of the kernel virtual
address space.
+ reservelow= [X86]
+ Format: nn[K]
+ Set the amount of memory to reserve for BIOS at
+ the bottom of the address space.
+
reset_devices [KNL] Force drivers to reset the underlying device
during initialization.
@@ -2435,6 +2440,10 @@ and is between 256 and 4096 characters. It is defined in the file
disables clocksource verification at runtime.
Used to enable high-resolution timer mode on older
hardware, and in virtualized environment.
+ [x86] noirqtime: Do not use TSC to do irq accounting.
+ Used to run time disable IRQ_TIME_ACCOUNTING on any
+ platforms where RDTSC is slow and this accounting
+ can add overhead.
turbografx.map[2|3]= [HW,JOY]
TurboGraFX parallel port interface
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
index 1762b81fcdf2..741fe66d6eca 100644
--- a/Documentation/kprobes.txt
+++ b/Documentation/kprobes.txt
@@ -542,9 +542,11 @@ Kprobes does not use mutexes or allocate memory except during
registration and unregistration.
Probe handlers are run with preemption disabled. Depending on the
-architecture, handlers may also run with interrupts disabled. In any
-case, your handler should not yield the CPU (e.g., by attempting to
-acquire a semaphore).
+architecture and optimization state, handlers may also run with
+interrupts disabled (e.g., kretprobe handlers and optimized kprobe
+handlers run without interrupt disabled on x86/x86-64). In any case,
+your handler should not yield the CPU (e.g., by attempting to acquire
+a semaphore).
Since a return probe is implemented by replacing the return
address with the trampoline's address, stack backtraces and calls
diff --git a/Documentation/pcmcia/driver-changes.txt b/Documentation/pcmcia/driver-changes.txt
index 26c0f9c00545..dd04361dd361 100644
--- a/Documentation/pcmcia/driver-changes.txt
+++ b/Documentation/pcmcia/driver-changes.txt
@@ -1,4 +1,29 @@
This file details changes in 2.6 which affect PCMCIA card driver authors:
+* pcmcia_loop_config() and autoconfiguration (as of 2.6.36)
+ If struct pcmcia_device *p_dev->config_flags is set accordingly,
+ pcmcia_loop_config() now sets up certain configuration values
+ automatically, though the driver may still override the settings
+ in the callback function. The following autoconfiguration options
+ are provided at the moment:
+ CONF_AUTO_CHECK_VCC : check for matching Vcc
+ CONF_AUTO_SET_VPP : set Vpp
+ CONF_AUTO_AUDIO : auto-enable audio line, if required
+ CONF_AUTO_SET_IO : set ioport resources (->resource[0,1])
+ CONF_AUTO_SET_IOMEM : set first iomem resource (->resource[2])
+
+* pcmcia_request_configuration -> pcmcia_enable_device (as of 2.6.36)
+ pcmcia_request_configuration() got renamed to pcmcia_enable_device(),
+ as it mirrors pcmcia_disable_device(). Configuration settings are now
+ stored in struct pcmcia_device, e.g. in the fields config_flags,
+ config_index, config_base, vpp.
+
+* pcmcia_request_window changes (as of 2.6.36)
+ Instead of win_req_t, drivers are now requested to fill out
+ struct pcmcia_device *p_dev->resource[2,3,4,5] for up to four ioport
+ ranges. After a call to pcmcia_request_window(), the regions found there
+ are reserved and may be used immediately -- until pcmcia_release_window()
+ is called.
+
* pcmcia_request_io changes (as of 2.6.36)
Instead of io_req_t, drivers are now requested to fill out
struct pcmcia_device *p_dev->resource[0,1] for up to two ioport