Age | Commit message (Collapse) | Author |
|
Squash aerdrv_core.c into aerdrv.c. No functional change intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
|
|
The AER driver only needed the pcie_device to get to the port pci_dev.
Save the pci_dev pointer directly in struct aer_rpc and remove the
unnecessary indirection.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Remove unused "struct pcie_device *" parameters to handle_error_source()
and aer_process_err_devices(). No functional change intended.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Just move the actual function up so that it is visible to its user
aer_recover_queue().
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Pass the service type to pcie_do_fatal_recovery() instead of assuming AER.
We will make DPC also use pcie_do_fatal_recovery(), and it needs to do
things a little differently for AER and DPC.
Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Add generic pcie_port_find_service() routine.
Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
|
|
Move the error reporting callbacks from aerdrv_core.c to err.c, where they
can be used by DPC in addition to AER.
As part of aerdrv_core.c, these callbacks were built under CONFIG_PCIEAER.
Moving them to the new err.c means they will now be built under
CONFIG_PCIEPORTBUS, so adjust the definition of pci_uevent_ers() to match.
Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: in reset_link(), initialize "driver" even if CONFIG_PCIEAER is
unset, update pci_uevent_ers() #ifdef wrapper]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Rename error recovery interfaces with "pcie_" prefix so they can be made
non-static.
Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: move declaration to later patch, leave functions static]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
|
|
PCIe ERR_FATAL errors mean the Link is unreliable. Components on the Link
may need to be reset to return to reliable operation (PCIe r4.0, sec
6.2.2). We previously handled these errors much differently depending on
whether the platform supports Downstream Port Containment (DPC) (PCIe r4.0,
sec 6.2.10) or not.
The AER driver has historically logged the error details, called
driver-supplied pci_error_handlers callbacks, and reset the Link. This
reset downstream devices, but did not remove them from the PCI subsystem,
re-enumerate them, or call their driver .remove() or .probe() methods.
DPC is different because the hardware automatically disables the Link when
it detects ERR_FATAL, which resets downstream devices. There's no
opportunity for pci_error_handlers callbacks before resetting the Link.
The DPC driver removes affected devices (which calls their driver .remove()
methods), brings the Link back up, and re-enumerates (which calls driver
.probe() methods).
Align AER ERR_FATAL handling with DPC by resetting the Link in software,
skipping the driver pci_error_handlers callbacks, removing the devices from
the PCI subsystem, and re-enumerating. The idea is that drivers and
devices should see the same behavior for ERR_FATAL events, regardless of
whether they're handled by AER or DPC.
Here are the basic ERR_FATAL recovery steps, showing the previous AER
behavior, the AER behavior after this patch, and the DPC behavior:
AER AER DPC
previous new behavior
-------- --- --------
Log error yes yes yes (minimal)
drv.error_detected() yes no no
Reset Link yes yes yes
drv.mmio_enabled() yes no no
drv.slot_reset() yes no no
drv.resume() yes no no
Remove PCI devices no yes yes
(calls drv.remove())
Re-enumerate no yes yes
(calls drv.probe())
N.B. With DPC, the Link reset happens before the driver .remove() calls,
while with AER, the reset happens *after* the .remove() calls. The goal is
to eventually do the reset before .remove() for AER as well.
Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: changelog, squash doc patch into this, remove unused
"result_data"]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
|
|
Remove pointless comments that tell us the file name, remove blank line
comments, follow multi-line comment conventions. No functional change
intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- skip AER driver error recovery callbacks for correctable errors
reported via ACPI APEI, as we already do for errors reported via the
native path (Tyler Baicar)
- fix DPC shared interrupt handling (Alex Williamson)
- print full DPC interrupt number (Keith Busch)
- enable DPC only if AER is available (Keith Busch)
- simplify DPC code (Bjorn Helgaas)
- calculate ASPM L1 substate parameter instead of hardcoding it (Bjorn
Helgaas)
- enable Latency Tolerance Reporting for ASPM L1 substates (Bjorn
Helgaas)
- move ASPM internal interfaces out of public header (Bjorn Helgaas)
- allow hot-removal of VGA devices (Mika Westerberg)
- speed up unplug and shutdown by assuming Thunderbolt controllers
don't support Command Completed events (Lukas Wunner)
- add AtomicOps support for GPU and Infiniband drivers (Felix Kuehling,
Jay Cornwall)
- expose "ari_enabled" in sysfs to help NIC naming (Stuart Hayes)
- clean up PCI DMA interface usage (Christoph Hellwig)
- remove PCI pool API (replaced with DMA pool) (Romain Perier)
- deprecate pci_get_bus_and_slot(), which assumed PCI domain 0 (Sinan
Kaya)
- move DT PCI code from drivers/of/ to drivers/pci/ (Rob Herring)
- add PCI-specific wrappers for dev_info(), etc (Frederick Lawler)
- remove warnings on sysfs mmap failure (Bjorn Helgaas)
- quiet ROM validation messages (Alex Deucher)
- remove redundant memory alloc failure messages (Markus Elfring)
- fill in types for compile-time VGA and other I/O port resources
(Bjorn Helgaas)
- make "pci=pcie_scan_all" work for Root Ports as well as Downstream
Ports to help AmigaOne X1000 (Bjorn Helgaas)
- add SPDX tags to all PCI files (Bjorn Helgaas)
- quirk Marvell 9128 DMA aliases (Alex Williamson)
- quirk broken INTx disable on Ceton InfiniTV4 (Bjorn Helgaas)
- fix CONFIG_PCI=n build by adding dummy pci_irqd_intx_xlate() (Niklas
Cassel)
- use DMA API to get MSI address for DesignWare IP (Niklas Cassel)
- fix endpoint-mode DMA mask configuration (Kishon Vijay Abraham I)
- fix ARTPEC-6 incorrect IS_ERR() usage (Wei Yongjun)
- add support for ARTPEC-7 SoC (Niklas Cassel)
- add endpoint-mode support for ARTPEC (Niklas Cassel)
- add Cadence PCIe host and endpoint controller driver (Cyrille
Pitchen)
- handle multiple INTx status bits being set in dra7xx (Vignesh R)
- translate dra7xx hwirq range to fix INTD handling (Vignesh R)
- remove deprecated Exynos PHY initialization code (Jaehoon Chung)
- fix MSI erratum workaround for HiSilicon Hip06/Hip07 (Dongdong Liu)
- fix NULL pointer dereference in iProc BCMA driver (Ray Jui)
- fix Keystone interrupt-controller-node lookup (Johan Hovold)
- constify qcom driver structures (Julia Lawall)
- rework Tegra config space mapping to increase space available for
endpoints (Vidya Sagar)
- simplify Tegra driver by using bus->sysdata (Manikanta Maddireddy)
- remove PCI_REASSIGN_ALL_BUS usage on Tegra (Manikanta Maddireddy)
- add support for Global Fabric Manager Server (GFMS) event to
Microsemi Switchtec switch driver (Logan Gunthorpe)
- add IDs for Switchtec PSX 24xG3 and PSX 48xG3 (Kelvin Cao)
* tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller
dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe endpoint controller
PCI: endpoint: Fix EPF device name to support multi-function devices
PCI: endpoint: Add the function number as argument to EPC ops
PCI: cadence: Add host driver for Cadence PCIe controller
dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe host controller
PCI: Add vendor ID for Cadence
PCI: Add generic function to probe PCI host controllers
PCI: generic: fix missing call of pci_free_resource_list()
PCI: OF: Add generic function to parse and allocate PCI resources
PCI: Regroup all PCI related entries into drivers/pci/Makefile
PCI/DPC: Reformat DPC register definitions
PCI/DPC: Add and use DPC Status register field definitions
PCI/DPC: Squash dpc_rp_pio_get_info() into dpc_process_rp_pio_error()
PCI/DPC: Remove unnecessary RP PIO register structs
PCI/DPC: Push dpc->rp_pio_status assignment into dpc_rp_pio_get_info()
PCI/DPC: Squash dpc_rp_pio_print_error() into dpc_rp_pio_get_info()
PCI/DPC: Make RP PIO log size check more generic
PCI/DPC: Rename local "status" to "dpc_status"
PCI/DPC: Squash dpc_rp_pio_print_tlp_header() into dpc_rp_pio_print_error()
...
|
|
* pci/spdx:
PCI: Add SPDX GPL-2.0+ to replace implicit GPL v2 or later statement
PCI: Add SPDX GPL-2.0+ to replace GPL v2 or later boilerplate
PCI: Add SPDX GPL-2.0 to replace COPYING boilerplate
PCI: Add SPDX GPL-2.0 to replace GPL v2 boilerplate
PCI: Add SPDX GPL-2.0 when no license was specified
|
|
* pci/misc:
PCI: Add dummy pci_irqd_intx_xlate() for CONFIG_PCI=n build
PCI: Add wrappers for dev_printk()
PCI: Remove unnecessary messages for memory allocation failures
PCI: Add #defines for Completion Timeout Disable feature
hinic: Replace PCI pool old API
net: e100: Replace PCI pool old API
block: DAC960: Replace PCI pool old API
MAINTAINERS: Include more PCI files
PCI: Remove unneeded kallsyms include
powerpc/pci: Unroll two pass loop when scanning bridges
powerpc/pci: Use for_each_pci_bridge() helper
|
|
Add SPDX GPL-2.0 to all PCI files that referred to the kernel default
"COPYING" file, which specifies GPL version 2.
Remove the boilerplate language referring to the GPL and "COPYING", relying
on the assertion in b24413180f56 ("License cleanup: add SPDX GPL-2.0
license identifier to files with no license") that the SPDX identifier may
be used instead of the full boilerplate text.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Devices can go offline when erors reported. This patch adds a change
to the kernel object and lets udev know of error. When device resumes,
a change is also set reporting device as online. Therefore, EEH and
AER events are better propagated to user space for PCI devices in all
arches.
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Add PCI-specific dev_printk() wrappers and use them to simplify the code
slightly. No functional change intended.
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
[bhelgaas: squash into one patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
get_device_error_info() reads error information from registers in the AER
capability. If we call it for a device that has no AER capability, it
should return an error, but previously it returned success.
Return 0 (error) if the device doesn't have an AER capability.
Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
|
|
PCIe correctable errors are corrected by hardware. Software may log them,
but no other software intervention is required.
There are two paths to enter the AER recovery code: (1) the native path
where Linux fields the AER interrupt and reads the AER registers directly,
and (2) the ACPI path where firmware reads the AER registers and hands them
off to Linux via the ACPI APEI path.
The AER do_recovery() function calls driver error reporting callbacks
(error_detected(), mmio_enabled(), resume(), etc), attempts recovery (for
fatal errors), and logs a "AER: Device recovery successful" message.
Since there's nothing to recover for correctable errors, the native path
already skips do_recovery(), so it doesn't call the driver callbacks and or
emit the message. Make the APEI path do the same.
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Previously, if an non-fatal error was reported by an endpoint, we
called report_error_detected() for the endpoint, every sibling on the
bus, and their descendents. If any of them did not implement the
.error_detected() method, do_recovery() failed, leaving all these
devices unrecovered.
For example, the system described in the bugzilla below has two devices:
0000:74:02.0 [19e5:a230] SAS controller, driver has .error_detected()
0000:74:03.0 [19e5:a235] SATA controller, driver lacks .error_detected()
When a device such as 74:02.0 reported a non-fatal error, do_recovery()
failed because 74:03.0 lacked an .error_detected() method. But per PCIe
r3.1, sec 6.2.2.2.2, such an error does not compromise the Link and
does not affect 74:03.0:
Non-fatal errors are uncorrectable errors which cause a particular
transaction to be unreliable but the Link is otherwise fully functional.
Isolating Non-fatal from Fatal errors provides Requester/Receiver logic
in a device or system management software the opportunity to recover from
the error without resetting the components on the Link and disturbing
other transactions in progress. Devices not associated with the
transaction in error are not impacted by the error.
Report non-fatal errors only to the endpoint that reported them. We really
want to check for AER_NONFATAL here, but the current code structure doesn't
allow that. Looking for pci_channel_io_normal is the best we can do now.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197055
Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com>
Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Fix various typos and whitespace errors:
s/Synopsis/Synopsys/
s/Designware/DesignWare/
s/Keystine/Keystone/
s/gpio/GPIO/
s/pcie/PCIe/
s/phy/PHY/
s/confgiruation/configuration/
No functional change intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Save the position of the error reporting capability so it doesn't need to
be rediscovered during error handling.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Lukas Wunner <lukas@wunner.de>
|
|
When handling AER events, we previously allocated a struct aer_err_info,
processed the error, and freed the struct. But aer_isr_one_error() is
serialized by rpc_mutex, so we never need more than one copy of the struct,
and the struct is only about 70 bytes, so we're not saving much by
allocating it dynamically.
Embed a struct aer_err_info directly in struct aer_rpc, which is allocated
at probe-time by aer_probe().
[bhelgaas: changelog]
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Per the PCI Firmware spec, r3.0, sec 4.5.1, on ACPI systems, the OS must
not use AER unless _OSC is present and _OSC grants AER control to the OS.
The aerdriver.forceload kernel parameter was a way to enable Linux AER
support on ACPI systems that lack _OSC or fail to grant control the the OS.
Enabling Linux AER support when the firmware doesn't want us to is a recipe
for problems, e.g., the firmware might be handling AER itself.
Remove the aerdriver.forceload kernel parameter and related supporting
code.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The aerdriver.nosourceid kernel parameter was intended for working around
broken chipsets don't supply the source ID for AER events. We recently
added PCI_BUS_FLAGS_NO_AERSID, which can be set by quirks for the same
purpose.
Remove the aerdriver.nosourceid kernel parameter. For anything other than
debugging, asking users to find and use kernel parameters is a poor user
experience. Instead, we should add PCI_BUS_FLAGS_NO_AERSID quirks for any
hardware that needs it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Allow root port buses to choose to skip source id matching when finding the
faulting device. Certain root port devices may return an incorrect source
ID and recommend to scan child device registers for AER notifications.
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
A Root Port's AER structure (rpc) contains a queue of events. aer_irq()
enqueues AER status information and schedules aer_isr() to dequeue and
process it. When we remove a device, aer_remove() waits for the queue to
be empty, then frees the rpc struct.
But aer_isr() references the rpc struct after dequeueing and possibly
emptying the queue, which can cause a use-after-free error as in the
following scenario with two threads, aer_isr() on the left and a
concurrent aer_remove() on the right:
Thread A Thread B
-------- --------
aer_irq():
rpc->prod_idx++
aer_remove():
wait_event(rpc->prod_idx == rpc->cons_idx)
# now blocked until queue becomes empty
aer_isr(): # ...
rpc->cons_idx++ # unblocked because queue is now empty
... kfree(rpc)
mutex_unlock(&rpc->rpc_mutex)
To prevent this problem, use flush_work() to wait until the last scheduled
instance of aer_isr() has completed before freeing the rpc struct in
aer_remove().
I reproduced this use-after-free by flashing a device FPGA and
re-enumerating the bus to find the new device. With SLUB debug, this
crashes with 0x6b bytes (POISON_FREE, the use-after-free magic number) in
GPR25:
pcieport 0000:00:00.0: AER: Multiple Corrected error received: id=0000
Unable to handle kernel paging request for data at address 0x27ef9e3e
Workqueue: events aer_isr
GPR24: dd6aa000 6b6b6b6b 605f8378 605f8360 d99b12c0 604fc674 606b1704 d99b12c0
NIP [602f5328] pci_walk_bus+0xd4/0x104
[bhelgaas: changelog, stable tag]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org
|
|
Bit 7 of the "Header Type" register indicates a multi-function device when
set. Bits 0-6 contain encoded values, where 0x1 indicates a PCI-PCI
bridge. It is incorrect to test this as though it were a mask.
For example, while the PCI 3.0 spec only defines values 0x0, 0x1, and 0x2,
it's conceivable that a future spec could define 0x3 to mean something
else; then tests for "(hdr_type & 0x7f) & PCI_HEADER_TYPE_BRIDGE" would
incorrectly succeed for this new 0x3 header type.
Test bits 0-6 of the Header Type for equality with PCI_HEADER_TYPE_BRIDGE.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
AER errors might be recorded when powering-on devices. These errors can be
ignored, so firmware usually clears them before the OS enumerates devices.
However, firmware is not involved when devices are added via hotplug, so
the OS may discover power-up errors that should be ignored. The same may
happen when powering up devices when resuming after suspend.
Clear the AER error status registers during enumeration and resume.
[bhelgaas: changelog, remove repetitive comments]
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Previously we assumed that PCIe Root Ports and Downstream Ports had Links
on their secondary side. That is true in most systems, but it is possible
to connect a switch with either an Upstream or a Downstream Port leading
downstream.
Instead of relying on the component type to identify devices that have
links leading downstream, use the "dev->has_secondary_link" field.
[bhelgaas: changelog]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Fix various whitespace errors.
No functional change.
[bhelgaas: fix other similar problems]
Signed-off-by: Ryan Desfosses <ryan@desfo.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Miscellaneous
- Remove duplicate disable from pcie_portdrv_remove() (Yinghai Lu)
- Fix whitespace, capitalization, and spelling errors (Bjorn Helgaas)"
* tag 'pci-v3.13-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Remove duplicate pci_disable_device() from pcie_portdrv_remove()
PCI: Fix whitespace, capitalization, and spelling errors
|
|
This patch enhances the type safety for the kfifo API. It is now safe
to put const data into a non const FIFO and the API will now generate a
compiler warning when reading from the fifo where the destination
address is pointing to a const variable.
As a side effect the kfifo_put() does now expect the value of an element
instead a pointer to the element. This was suggested Russell King. It
make the handling of the kfifo_put easier since there is no need to
create a helper variable for getting the address of a pointer or to pass
integers of different sizes.
IMHO the API break is okay, since there are currently only six users of
kfifo_put().
The code is also cleaner by kicking out the "if (0)" expressions.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix whitespace, capitalization, and spelling errors. No functional change.
I know "busses" is not an error, but "buses" was more common, so I used it
consistently.
Signed-off-by: Marta Rybczynska <rybczynska@gmail.com> (pci_reset_bridge_secondary_bus())
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
One PCI bus reset function to rule them all.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI changes from Bjorn Helgaas:
"PCI device hotplug
- Add pci_alloc_dev() interface (Gu Zheng)
- Add pci_bus_get()/put() for reference counting (Jiang Liu)
- Fix SR-IOV reference count issues (Jiang Liu)
- Remove unused acpi_pci_roots list (Jiang Liu)
MSI
- Conserve interrupt resources on x86 (Alexander Gordeev)
AER
- Force fatal severity when component has been reset (Betty Dall)
- Reset link below Root Port as well as Downstream Port (Betty Dall)
- Fix "Firmware first" flag setting (Bjorn Helgaas)
- Don't parse HEST for non-PCIe devices (Bjorn Helgaas)
ASPM
- Warn when we can't disable ASPM as driver requests (Bjorn Helgaas)
Miscellaneous
- Add CircuitCo PCI IDs (Darren Hart)
- Add AMD CZ SATA and SMBus PCI IDs (Shane Huang)
- Work around Ivytown NTB BAR size issue (Jon Mason)
- Detect invalid initial BAR values (Kevin Hao)
- Add pcibios_release_device() (Sebastian Ott)
- Fix powerpc & sparc PCI_UNKNOWN power state usage (Bjorn Helgaas)"
* tag 'pci-v3.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (51 commits)
MAINTAINERS: Add ACPI folks for ACPI-related things under drivers/pci
PCI: Add CircuitCo vendor ID and subsystem ID
PCI: Use pdev->pm_cap instead of pci_find_capability(..,PCI_CAP_ID_PM)
PCI: Return early on allocation failures to unindent mainline code
PCI: Simplify IOV implementation and fix reference count races
PCI: Drop redundant setting of bus->is_added in virtfn_add_bus()
unicore32/PCI: Remove redundant call of pci_bus_add_devices()
m68k/PCI: Remove redundant call of pci_bus_add_devices()
PCI / ACPI / PM: Use correct power state strings in messages
PCI: Fix comment typo for pcie_pme_remove()
PCI: Rename pci_release_bus_bridge_dev() to pci_release_host_bridge_dev()
PCI: Fix refcount issue in pci_create_root_bus() error recovery path
ia64/PCI: Clean up pci_scan_root_bus() usage
PCI/AER: Reset link for devices below Root Port or Downstream Port
ACPI / APEI: Force fatal AER severity when component has been reset
PCI/AER: Remove "extern" from function declarations
PCI/AER: Move AER severity defines to aer.h
PCI/AER: Set dev->__aer_firmware_first only for matching devices
PCI/AER: Factor out HEST device type matching
PCI/AER: Don't parse HEST table for non-PCIe devices
...
|
|
When a PCIe device reports a fatal error, we reset the link leading
to it. Previously we only did this for devices below Downstream Ports,
not for devices directly below Root Ports.
This patch changes that so we reset the link leading to devices below
Root Ports just like we do for those below Downstream Ports.
[bhelgaas: changelog, keep dev_printk(KERN_DEBUG)]
Signed-off-by: Betty Dall <betty.dall@hp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The following warning was seen on 3.9 when a corrected PCIe error was being
handled by the AER subsystem.
WARNING: at .../drivers/pci/search.c:214 pci_get_dev_by_id+0x8a/0x90()
This occurred because a call to pci_get_domain_bus_and_slot() was added to
cper_print_pcie() to setup for the call to cper_print_aer(). The warning
showed up because cper_print_pcie() is called in an interrupt context and
pci_get* functions are not supposed to be called in that context.
The solution is to move the cper_print_aer() call out of the interrupt
context and into aer_recover_work_func() to avoid any warnings when calling
pci_get* functions.
Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Change to remove local PCI_BUS() define and use the new PCI_BUS_NUM()
interface from PCI.
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Joerg Roedel <joro@8bytes.org>
|
|
The function aer_recover_queue() calls pci_get_domain_bus_and_slot(), which
requires that the caller decrement the reference count with pci_dev_put().
This patch adds the missing call to pci_dev_put().
Signed-off-by: Betty Dall <betty.dall@hp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Shuah Khan <shuah.khan@hp.com>
CC: stable@vger.kernel.org
|
|
* pci/misc:
PCI/AER: Report success only when every device has AER-aware driver
Conflicts:
drivers/pci/pcie/aer/aerdrv_core.c
|
|
When an error is detected on a PCIe device which does not have an
AER-aware driver, prevent AER infrastructure from reporting
successful error recovery.
This is because the report_error_detected() function that gets
called in the first phase of recovery process allows forward
progress even when the driver for the device does not have AER
capabilities. It seems that all callbacks (in pci_error_handlers
structure) registered by drivers that gets called during error
recovery are not mandatory. So the intention of the infrastructure
design seems to be to allow forward progress even when a specific
callback has not been registered by a driver. However, if error
handler structure itself has not been registered, it doesn't make
sense to allow forward progress.
As a result of the current design, in the case of a single device
having an AER-unaware driver or in the case of any function in a
multi-function card having an AER-unaware driver, a successful
recovery is reported.
Typical scenario this happens is when a PCI device is detached
from a KVM host and the pci-stub driver on the host claims the
device. The pci-stub driver does not have error handling capabilities
but the AER infrastructure still reports that the device recovered
successfully.
The changes proposed here leaves the device(s)in an unrecovered state
if the driver for the device or for any device in the subtree
does not have error handler structure registered. This reflects
the true state of the device and prevents any partial recovery (or no
recovery at all) reported as successful.
[bhelgaas: changelog]
Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Linas Vepstas <linasvepstas@gmail.com>
Reviewed-by: Myron Stowe <myron.stowe@redhat.com>
|
|
If a PCI device and its parents are put into D3cold, unbinding the
device will trigger deadlock as follow:
- driver_unbind
- device_release_driver
- device_lock(dev) <--- previous lock here
- __device_release_driver
- pm_runtime_get_sync
...
- rpm_resume(dev)
- rpm_resume(dev->parent)
...
- pci_pm_runtime_resume
...
- pci_set_power_state
- __pci_start_power_transition
- pci_wakeup_bus(dev->parent->subordinate)
- pci_walk_bus
- device_lock(dev) <--- deadlock here
If we do not do device_lock in pci_walk_bus, we can avoid deadlock.
Device_lock in pci_walk_bus is introduced in commit:
d71374dafbba7ec3f67371d3b7e9f6310a588808, corresponding email thread
is: https://lkml.org/lkml/2006/5/26/38. The patch author Zhang Yanmin
said device_lock is added to pci_walk_bus because:
Some error handling functions call pci_walk_bus. For example, PCIe
aer. Here we lock the device, so the driver wouldn't detach from the
device, as the cb might call driver's callback function.
So I fixed the deadlock as follows:
- remove device_lock from pci_walk_bus
- add device_lock into callback if callback will call driver's callback
I checked pci_walk_bus users one by one, and found only PCIe aer needs
device lock.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CC: stable@vger.kernel.org # v3.6+
CC: Zhang Yanmin <yanmin.zhang@intel.com>
|
|
* pci/trivial:
PCI: Drop duplicate const in DECLARE_PCI_FIXUP_SECTION
PCI: Drop bogus default from ARCH_SUPPORTS_MSI
PCI: cpqphp: Remove unreachable path
PCI: Remove bus number resource debug messages
PCI/AER: Print completion message at KERN_INFO to match starting message
PCI: Fix drivers/pci/pci.c kernel-doc warnings
|
|
* pci/stephen-const:
make drivers with pci error handlers const
scsi: make pci error handlers const
netdev: make pci_error_handlers const
PCI: Make pci_error_handlers const
|
|
Since pci_error_handlers is just a function table make it const.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Linas Vepstas <linasvepstas@gmail.com>
|
|
The completion message in do_recovery() is currently KERN_DEBUG,
while the starting message in aer_print_port_info() is KERN_INFO.
This changes the completion message to KERN_INFO to match the
starting message.
[bhelgaas: changelog, use dev_info() instead of dev_printk(KERN_INFO)]
Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Use PCI Express Capability access functions to simplify PCIe AER.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Introduce an inline function pci_pcie_type(dev) to extract PCIe
device type from pci_dev->pcie_flags_reg field, and prepare for
removing pci_dev->pcie_type.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.
It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
In addition to native PCIe AER, now APEI (ACPI Platform Error
Interface) GHES (Generic Hardware Error Source) can be used to report
PCIe AER errors too. To add support to APEI GHES PCIe AER recovery,
aer_recover_queue is added to export the recovery function in native
PCIe AER driver.
Recoverable PCIe AER errors are reported via NMI in APEI GHES. Then
APEI GHES uses irq_work to delay the error processing into an IRQ
handler. But PCIe AER recovery can be very time-consuming, so
aer_recover_queue, which can be used in IRQ handler, delays the real
recovery action into the process context, that is, work queue.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
|