Age | Commit message | Author
2019-11-20 | PCI/PM: Expand PM reset messages to mention D3hot (not just D3) | Bjorn Helgaas
pci_pm_reset() resets a device by putting it in D3hot and bringing it back to D0. Clarify related messages to mention "D3hot" explicitly instead of just "D3". Link: https://lore.kernel.org/r/20191101204558.210235-3-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-20 | PCI/PM: Apply D2 delay as milliseconds, not microseconds | Bjorn Helgaas
PCI_PM_D2_DELAY is defined as 200, which is milliseconds, but previously we used udelay(), which only waited for 200 microseconds. Use msleep() instead so we wait the correct amount of time. See PCIe r5.0, sec 5.9. Link: https://lore.kernel.org/r/20191101204558.210235-2-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
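The size of the unit mix-up described above can be checked with a little arithmetic. This is a minimal user-space sketch, not kernel code: the value 200 comes from the commit message, while the helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Value quoted in the commit message; per the message, the unit is milliseconds. */
#define PCI_PM_D2_DELAY 200

/* Hypothetical helper: duration in nanoseconds if `n` is read as microseconds. */
uint64_t delay_ns_if_microseconds(uint64_t n)
{
    assert(n <= UINT64_MAX / 1000ULL);      /* guard against overflow */
    return n * 1000ULL;
}

/* Hypothetical helper: duration in nanoseconds if `n` is read as milliseconds. */
uint64_t delay_ns_if_milliseconds(uint64_t n)
{
    assert(n <= UINT64_MAX / 1000000ULL);   /* guard against overflow */
    return n * 1000000ULL;
}

/* Factor by which a microsecond-based wait falls short of the intended one. */
uint64_t d2_delay_shortfall_factor(void)
{
    return delay_ns_if_milliseconds(PCI_PM_D2_DELAY) /
           delay_ns_if_microseconds(PCI_PM_D2_DELAY);
}
```

Under these assumptions the old wait was three orders of magnitude shorter than intended, which is why the fix switches to a millisecond-based sleep rather than adjusting the constant.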
2019-11-20 | PCI/PM: Use pci_WARN() to include device information | Bjorn Helgaas
Add and use pci_WARN() wrappers so warnings include device information. Link: https://lore.kernel.org/r/20191017212851.54237-3-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-20 | PCI/PM: Use PCI dev_printk() wrappers for consistency | Bjorn Helgaas
Use the PCI dev_printk() wrappers for consistency with the rest of the PCI core. No functional change intended. Link: https://lore.kernel.org/r/20191017212851.54237-2-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-20 | PCI/PM: Wrap long lines in documentation | Bjorn Helgaas
Documentation/power/pci.rst is wrapped to fit in 80 columns, but directory structure changes made a few lines longer. Wrap them so they all fit in 80 columns again. Link: https://lore.kernel.org/r/20191014230016.240912-7-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-20 | PCI/PM: Note that PME can be generated from D0 | Bjorn Helgaas
Per PCIe r5.0 sec 7.5.2.1, PME may be generated from D0, so update Documentation/power/pci.rst to reflect that. Link: https://lore.kernel.org/r/20191016194450.68959-1-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-20 | PCI/PM: Make power management op coding style consistent | Bjorn Helgaas
Some of the power management ops use this style:

  struct device_driver *drv = dev->driver;
  if (drv && drv->pm && drv->pm->prepare)
    drv->pm->prepare(dev);

while others use this:

  const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
  if (pm && pm->runtime_resume)
    pm->runtime_resume(dev);

Convert the first style to the second so they're all consistent. Remove local "error" variables when unnecessary.

No functional change intended.

Link: https://lore.kernel.org/r/20191014230016.240912-6-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-20 | PCI/PM: Run resume fixups before disabling wakeup events | Bjorn Helgaas
pci_pm_resume() and pci_pm_restore() call pci_pm_default_resume(), which runs resume fixups before disabling wakeup events:

  static void pci_pm_default_resume(struct pci_dev *pci_dev)
  {
    pci_fixup_device(pci_fixup_resume, pci_dev);
    pci_enable_wake(pci_dev, PCI_D0, false);
  }

pci_pm_runtime_resume() does both of these, but in the opposite order:

  pci_enable_wake(pci_dev, PCI_D0, false);
  pci_fixup_device(pci_fixup_resume, pci_dev);

We should always use the same ordering unless there's a reason to do otherwise. Change pci_pm_runtime_resume() to call pci_pm_default_resume() instead of open-coding this, so the fixups are always done before disabling wakeup events.

pci_pm_default_resume() is called from pci_pm_runtime_resume(), which is under #ifdef CONFIG_PM. If SUSPEND and HIBERNATION are disabled, PM_SLEEP is disabled also, so move pci_pm_default_resume() from #ifdef CONFIG_PM_SLEEP to #ifdef CONFIG_PM.

Link: https://lore.kernel.org/r/20191014230016.240912-5-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
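The effect of routing both resume paths through one helper can be modelled outside the kernel. The sketch below uses a mock device that records the order of the two steps; every name in it (mock_pci_dev, mock_default_resume, and so on) is hypothetical and only stands in for the functions named in the commit message.

```c
#include <assert.h>
#include <string.h>

/* Mock device that records the order in which resume steps ran. */
struct mock_pci_dev {
    char trace[32];
};

/* Append one step name to the trace, checking that it fits. */
void mock_record(struct mock_pci_dev *dev, const char *step)
{
    assert(strlen(dev->trace) + strlen(step) < sizeof(dev->trace));
    strcat(dev->trace, step);
}

/* Stand-in for running the resume fixups. */
void mock_fixup_resume(struct mock_pci_dev *dev) { mock_record(dev, "fixup;"); }

/* Stand-in for disabling wakeup events. */
void mock_disable_wake(struct mock_pci_dev *dev) { mock_record(dev, "wake-off;"); }

/* Shared helper: fixups first, then wakeup events are disabled. */
void mock_default_resume(struct mock_pci_dev *dev)
{
    mock_fixup_resume(dev);
    mock_disable_wake(dev);
}

/* Both resume paths call the shared helper instead of open-coding the steps. */
void mock_system_resume(struct mock_pci_dev *dev)  { mock_default_resume(dev); }
void mock_runtime_resume(struct mock_pci_dev *dev) { mock_default_resume(dev); }
```

Because the ordering lives in one place, the two paths cannot drift apart again without someone deliberately bypassing the helper.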
2019-11-20 | PCI/PM: Clear PCIe PME Status even for legacy power management | Bjorn Helgaas
Previously, pci_pm_resume_noirq() cleared the PME Status bit in the Root Status register only if the device had no driver or the driver did not implement legacy power management. It should clear PME Status regardless of what sort of power management the driver supports, so do this before checking for legacy power management. This affects Root Ports and Root Complex Event Collectors, for which the usual driver is the PCIe portdrv, which implements new power management, so this change is just on principle, not to fix any actual defects. Fixes: a39bd851dccf ("PCI/PM: Clear PCIe PME Status bit in core, not PCIe port driver") Link: https://lore.kernel.org/r/20191014230016.240912-4-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-20 | PCI/PM: Correct pci_pm_thaw_noirq() documentation | Bjorn Helgaas
According to the documentation, pci_pm_thaw_noirq() did not put the device into the full-power state and restore its standard configuration registers. This is incorrect, so update the documentation to match the code. Link: https://lore.kernel.org/r/20191014230016.240912-3-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-20 | PCI/PM: Always return devices to D0 when thawing | Dexuan Cui
pci_pm_thaw_noirq() is supposed to return the device to D0 and restore its configuration registers, but previously it only did that for devices whose drivers implemented the new power management ops.

Hibernation, e.g., via "echo disk > /sys/power/state", involves freezing devices, creating a hibernation image, thawing devices, writing the image, and powering off. The fact that thawing did not return devices with legacy power management to D0 caused errors, e.g., in this path:

  pci_pm_thaw_noirq
    if (pci_has_legacy_pm_support(pci_dev)) # true for Mellanox VF driver
      return pci_legacy_resume_early(dev)   # ... legacy PM skips the rest
    pci_set_power_state(pci_dev, PCI_D0)
    pci_restore_state(pci_dev)
  pci_pm_thaw
    if (pci_has_legacy_pm_support(pci_dev))
      pci_legacy_resume
        drv->resume
          mlx4_resume
            ...
              pci_enable_msix_range
                ...
                  if (dev->current_state != PCI_D0)  # <---
                    return -EINVAL;

which caused these warnings:

  mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode, aborting
  PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95
  PM: Device a6d1:00:02.0 failed to thaw: error -95

Return devices to D0 and restore config registers for all devices, not just those whose drivers support new power management.

[bhelgaas: also call pci_restore_state() before pci_legacy_resume_early(), update comment, add stable tag, commit log]

Link: https://lore.kernel.org/r/KU1P153MB016637CAEAD346F0AA8E3801BFAD0@KU1P153MB0166.APCP153.PROD.OUTLOOK.COM
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: stable@vger.kernel.org # v4.13+
2019-11-14 | PCI: Unify ACS quirk desired vs provided checking | Bjorn Helgaas
Most of the ACS quirks have a similar pattern of:

  acs_flags &= ~( <controls provided by this device> );
  return acs_flags ? 0 : 1;

Pull this out into a helper function to simplify the quirks slightly. The helper function is also a convenient place for comments about what the list of ACS controls means. No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
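The pattern quoted above reduces to one question: does the set of controls a device provides cover every control the caller asked for? A minimal stand-alone sketch follows. The bit values mirror the ACS Control register layout, but the helper name acs_controls_provided() and the ACS_KNOWN_BITS mask are hypothetical and are not claimed to match the helper the commit actually adds.

```c
#include <assert.h>
#include <stdint.h>

/* ACS control bits, in the order the spec defines them. */
#define PCI_ACS_SV 0x0001 /* Source Validation */
#define PCI_ACS_TB 0x0002 /* Translation Blocking */
#define PCI_ACS_RR 0x0004 /* P2P Request Redirect */
#define PCI_ACS_CR 0x0008 /* P2P Completion Redirect */
#define PCI_ACS_UF 0x0010 /* Upstream Forwarding */
#define PCI_ACS_EC 0x0020 /* P2P Egress Control */
#define PCI_ACS_DT 0x0040 /* Direct Translated P2P */

/* Hypothetical mask of the bits this sketch knows about. */
#define ACS_KNOWN_BITS 0x007f

/*
 * Hypothetical helper: return 1 if every requested control is among the
 * provided ones, 0 otherwise. Equivalent to the open-coded
 * "acs_flags &= ~provided; return acs_flags ? 0 : 1;".
 */
int acs_controls_provided(uint16_t requested, uint16_t provided)
{
    assert((requested & ~ACS_KNOWN_BITS) == 0);
    return (uint16_t)(requested & ~provided) ? 0 : 1;
}
```

A quirk written against such a helper only has to state which controls its hardware effectively provides, which keeps the per-device code down to a single call.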
2019-11-14 | PCI: Make ACS quirk implementations more uniform | Bjorn Helgaas
The ACS quirks differ in needless ways, which makes them look more different than they really are.

Reorder the ACS flags in order of definitions in the spec:

  PCI_ACS_SV   Source Validation
  PCI_ACS_TB   Translation Blocking
  PCI_ACS_RR   P2P Request Redirect
  PCI_ACS_CR   P2P Completion Redirect
  PCI_ACS_UF   Upstream Forwarding
  PCI_ACS_EC   P2P Egress Control
  PCI_ACS_DT   Direct Translated P2P

(PCIe r5.0, sec 7.7.8.2) and use similar code structure in all. No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2019-11-14 | PCI: Do not use bus number zero from EA capability | Subbaraya Sundeep
As per PCIe r5.0, sec 7.8.5.2, fixed bus numbers of a bridge must be zero when no function that uses EA is located behind it. Hence, if EA supplies bus numbers of zero, assign bus numbers normally. A secondary bus can never have a bus number of zero, so setting a bridge's Secondary Bus Number to zero makes downstream devices unreachable. [bhelgaas: retain bool return value so "zero is invalid" logic is local] Fixes: 2dbce5901179 ("PCI: Assign bus numbers present in EA capability for bridges") Link: https://lore.kernel.org/r/1572850664-9861-1-git-send-email-sundeep.lkml@gmail.com Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v5.2+
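The rule above amounts to a validity check that returns a bool, so the "zero is invalid" decision stays in one place. This is a user-space sketch under stated assumptions: struct ea_bus_numbers and ea_fixed_bus_numbers_valid() are hypothetical names, and the subordinate-versus-secondary sanity check is an extra assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fixed bus numbers read from an EA capability. */
struct ea_bus_numbers {
    uint8_t secondary;
    uint8_t subordinate;
};

/*
 * Return true only if EA supplies usable fixed bus numbers. When this
 * returns false, the caller is expected to assign bus numbers normally.
 */
bool ea_fixed_bus_numbers_valid(const struct ea_bus_numbers *ea)
{
    assert(ea != NULL);

    /* A secondary bus can never be bus 0, so zero means "no fixed numbers". */
    if (ea->secondary == 0)
        return false;

    /* Assumption of this sketch: reject a range that ends before it starts. */
    if (ea->subordinate < ea->secondary)
        return false;

    return true;
}
```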
2019-11-14 | PCI: Avoid double hpmemsize MMIO window assignment | Nicholas Johnson
Previously, the kernel sometimes assigned more MMIO or MMIO_PREF space than desired. For example, if the user requested 128M of space with "pci=realloc,hpmemsize=128M", we sometimes assigned 256M:

  pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0xa00fffff] = 256M
  pci 0000:06:04.0: BAR 14: assigned [mem 0xa0200000-0xb01fffff] = 256M

With this patch applied:

  pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0x980fffff] = 128M
  pci 0000:06:04.0: BAR 14: assigned [mem 0x98200000-0xa01fffff] = 128M

This happened when in the first pass, the MMIO_PREF succeeded but the MMIO failed. In the next pass, because MMIO_PREF was already assigned, the attempt to assign MMIO_PREF returned an error code instead of success (nothing more to do, already allocated). Hence, the size which was actually allocated, but thought to have failed, was placed in the MMIO window.

The bug resulted in the MMIO_PREF being added to the MMIO window, which meant doubling if MMIO_PREF size = MMIO size. With a large MMIO_PREF, the MMIO window would likely fail to be assigned altogether due to lack of 32-bit address space.

Change find_free_bus_resource() to do the following:

  - Return first unassigned resource of the correct type.
  - If there is none, return first assigned resource of the correct type.
  - If none of the above, return NULL.

Returning an assigned resource of the correct type allows the caller to distinguish between already assigned and no resource of the correct type.

Add checks in pbus_size_io() and pbus_size_mem() to return success if resource returned from find_free_bus_resource() is already allocated.

This avoids pbus_size_io() and pbus_size_mem() returning error code to __pci_bus_size_bridges() when a resource has been successfully assigned in a previous pass. This fixes the existing behaviour where space for a resource could be reserved multiple times in different parent bridge windows.

Link: https://lore.kernel.org/lkml/20190531171216.20532-2-logang@deltatee.com/T/#u Link: https://bugzilla.kernel.org/show_bug.cgi?id=203243 Link: https://lore.kernel.org/r/PS2P216MB075563AA6AD242AA666EDC6A80760@PS2P216MB0755.KORP216.PROD.OUTLOOK.COM Reported-by: Kit Chow <kchow@gigaio.com> Reported-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au> Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
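The three-step selection rule described in this commit, and the caller-side check that treats an already assigned window as success, can be sketched in plain C. The types and names here (bus_res, pick_bus_resource(), size_window_sketch()) are hypothetical simplifications and do not reproduce the kernel's resource structures.

```c
#include <assert.h>
#include <stddef.h>

enum res_type { RES_IO, RES_MMIO, RES_MMIO_PREF };

/* Hypothetical, heavily simplified bridge window. */
struct bus_res {
    enum res_type type;
    int assigned;   /* nonzero once the window already has an address range */
};

/*
 * Selection order from the commit message:
 *   1. first unassigned resource of the requested type,
 *   2. otherwise the first assigned resource of that type,
 *   3. otherwise NULL.
 */
struct bus_res *pick_bus_resource(struct bus_res *res, size_t n,
                                  enum res_type type)
{
    struct bus_res *first_assigned = NULL;

    assert(res != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (res[i].type != type)
            continue;
        if (!res[i].assigned)
            return &res[i];
        if (!first_assigned)
            first_assigned = &res[i];
    }
    return first_assigned;
}

/*
 * Caller side: a window assigned in an earlier pass counts as success,
 * so its size is not requested a second time. Returns 0 on success and
 * -1 when there is no window of this type at all.
 */
int size_window_sketch(struct bus_res *res, size_t n, enum res_type type)
{
    struct bus_res *r = pick_bus_resource(res, n, type);

    if (!r)
        return -1;
    if (r->assigned)
        return 0;   /* nothing more to do */
    /* The real sizing work would happen here. */
    return 0;
}
```

Returning the assigned resource rather than NULL is what lets the caller tell "already done" apart from "no such window", which is the distinction the original code lacked.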
2019-11-13 | ACPI / hotplug / PCI: Allocate resources directly under the non-hotplug bridge | Mika Westerberg
Valerio and others reported that commit 84c8b58ed3ad ("ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug") prevents some recent LG and HP laptops from booting, with an endless loop of:

  ACPI Error: No handler or method for GPE 08, disabling event (20190215/evgpe-835)
  ACPI Error: No handler or method for GPE 09, disabling event (20190215/evgpe-835)
  ACPI Error: No handler or method for GPE 0A, disabling event (20190215/evgpe-835)
  ...

What seems to happen is that during boot, after the initial PCI enumeration when EC is enabled, the platform triggers ACPI Notify() to one of the root ports. The root port itself looks like this:

  pci 0000:00:1b.0: PCI bridge to [bus 02-3a]
  pci 0000:00:1b.0: bridge window [mem 0xc4000000-0xda0fffff]
  pci 0000:00:1b.0: bridge window [mem 0x80000000-0xa1ffffff 64bit pref]

The BIOS has configured the root port so that it does not have an I/O bridge window.

Now when the ACPI Notify() is triggered, the ACPI hotplug handler calls acpiphp_native_scan_bridge() for each non-hotplug bridge (as this system is using native PCIe hotplug) and pci_assign_unassigned_bridge_resources() to allocate resources.

The device connected to the root port is a PCIe switch (Thunderbolt controller) with two hotplug downstream ports. Because of the hotplug ports, __pci_bus_size_bridges() tries to add "additional I/O" of 256 bytes to each (DEFAULT_HOTPLUG_IO_SIZE). This gets further aligned to 4k as that's the minimum I/O window size, so each hotplug port gets a 4k I/O window and the same happens for the root port (which is also a hotplug port). This means 3 * 4k = 12k I/O window.

Because of this, pci_assign_unassigned_bridge_resources() ends up opening an I/O bridge window for the root port at the first available I/O address, which seems to be in range 0x1000 - 0x3fff. Normally this range is used for ACPI stuff such as GPE bits (below is part of /proc/ioports):

  1800-1803 : ACPI PM1a_EVT_BLK
  1804-1805 : ACPI PM1a_CNT_BLK
  1808-180b : ACPI PM_TMR
  1810-1815 : ACPI CPU throttle
  1850-1850 : ACPI PM2_CNT_BLK
  1854-1857 : pnp 00:05
  1860-187f : ACPI GPE0_BLK

However, when the ACPI Notify() happened this range was not yet reserved for ACPI/PNP (that happens later) so PCI gets it. It then starts writing to this range and accidentally stomps over GPE bits among other things, causing the endless stream of messages about missing GPE handler.

This problem does not happen if "pci=hpiosize=0" is passed in the kernel command line. The reason is that then the kernel does not try to allocate the additional 256 bytes for each hotplug port.

Fix this by allocating resources directly below the non-hotplug bridges where a new device may appear as a result of ACPI Notify(). This avoids the hotplug bridges and prevents opening the additional I/O window.

Fixes: 84c8b58ed3ad ("ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203617
Link: https://lore.kernel.org/r/20191030150545.19885-1-mika.westerberg@linux.intel.com
Reported-by: Valerio Passini <passini.valerio@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: stable@vger.kernel.org
2019-11-12 | PCI: pciehp: Prevent deadlock on disconnect | Mika Westerberg
This addresses deadlocks in these common cases in hierarchies containing two switches:

  - All involved ports are runtime suspended and they are unplugged. This can happen easily if the drivers involved automatically enable runtime PM (xHCI for example does that).

  - System is suspended (e.g., closing the lid on a laptop) with a dock + something else connected, and the dock is unplugged while suspended.

These cases lead to the following deadlock:

  INFO: task irq/126-pciehp:198 blocked for more than 120 seconds.
  irq/126-pciehp  D    0   198      2 0x80000000
  Call Trace:
   schedule+0x2c/0x80
   schedule_timeout+0x246/0x350
   wait_for_completion+0xb7/0x140
   kthread_stop+0x49/0x110
   free_irq+0x32/0x70
   pcie_shutdown_notification+0x2f/0x50
   pciehp_remove+0x27/0x50
   pcie_port_remove_service+0x36/0x50
   device_release_driver+0x12/0x20
   bus_remove_device+0xec/0x160
   device_del+0x13b/0x350
   device_unregister+0x1a/0x60
   remove_iter+0x1e/0x30
   device_for_each_child+0x56/0x90
   pcie_port_device_remove+0x22/0x40
   pcie_portdrv_remove+0x20/0x60
   pci_device_remove+0x3e/0xc0
   device_release_driver_internal+0x18c/0x250
   device_release_driver+0x12/0x20
   pci_stop_bus_device+0x6f/0x90
   pci_stop_bus_device+0x31/0x90
   pci_stop_and_remove_bus_device+0x12/0x20
   pciehp_unconfigure_device+0x88/0x140
   pciehp_disable_slot+0x6a/0x110
   pciehp_handle_presence_or_link_change+0x263/0x400
   pciehp_ist+0x1c9/0x1d0
   irq_thread_fn+0x24/0x60
   irq_thread+0xeb/0x190
   kthread+0x120/0x140

  INFO: task irq/190-pciehp:2288 blocked for more than 120 seconds.
  irq/190-pciehp  D    0  2288      2 0x80000000
  Call Trace:
   __schedule+0x2a2/0x880
   schedule+0x2c/0x80
   schedule_preempt_disabled+0xe/0x10
   mutex_lock+0x2c/0x30
   pci_lock_rescan_remove+0x15/0x20
   pciehp_unconfigure_device+0x4d/0x140
   pciehp_disable_slot+0x6a/0x110
   pciehp_handle_presence_or_link_change+0x263/0x400
   pciehp_ist+0x1c9/0x1d0
   irq_thread_fn+0x24/0x60
   irq_thread+0xeb/0x190
   kthread+0x120/0x140

What happens here is that the whole hierarchy is runtime resumed and the parent PCIe downstream port, which got the hot-remove event, starts removing devices below it, taking pci_lock_rescan_remove() lock. When the child PCIe port is runtime resumed it calls pciehp_check_presence() which ends up calling pciehp_card_present() and pciehp_check_link_active(). Both of these use pcie_capability_read_word(), which notices that the underlying device is already gone and returns PCIBIOS_DEVICE_NOT_FOUND with the capability value set to 0. When pciehp gets this value it thinks that its child device is also hot-removed and schedules its IRQ thread to handle the event.

The deadlock happens when the child's IRQ thread runs and tries to acquire pci_lock_rescan_remove() which is already taken by the parent and the parent waits for the child's IRQ thread to finish.

Prevent this from happening by checking the return value of pcie_capability_read_word() and if it is PCIBIOS_DEVICE_NOT_FOUND stop performing any hot-removal activities.

[bhelgaas: add common scenarios to commit log]

Link: https://lore.kernel.org/r/20191029170022.57528-2-mika.westerberg@linux.intel.com
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
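The core of the fix is to tell "the slot is empty" apart from "the port itself is gone". A small model of that distinction follows, with mock types instead of real config-space accessors. All names are hypothetical, and the presence bit value is only a stand-in for the real Slot Status field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read outcome, standing in for the PCIBIOS_* return codes. */
enum read_status { READ_OK, READ_DEVICE_NOT_FOUND };

/* Hypothetical presence bit inside the mock slot status word. */
#define MOCK_PRESENCE_DETECT 0x0040

struct mock_port {
    bool gone;              /* port was already removed from the bus */
    uint16_t slot_status;   /* value a successful read would return */
};

/* Mock capability read: a vanished device yields an error and a value of 0. */
enum read_status mock_read_slot_status(const struct mock_port *p, uint16_t *val)
{
    assert(p != NULL && val != NULL);
    if (p->gone) {
        *val = 0;
        return READ_DEVICE_NOT_FOUND;
    }
    *val = p->slot_status;
    return READ_OK;
}

enum presence { CARD_PRESENT, CARD_ABSENT, PORT_UNREACHABLE };

/* Check the return value before trusting the (zeroed) register contents. */
enum presence mock_card_present(const struct mock_port *p)
{
    uint16_t val;

    if (mock_read_slot_status(p, &val) == READ_DEVICE_NOT_FOUND)
        return PORT_UNREACHABLE;
    return (val & MOCK_PRESENCE_DETECT) ? CARD_PRESENT : CARD_ABSENT;
}

/* Only a successful read showing an empty slot should start hot-removal. */
bool mock_should_handle_removal(const struct mock_port *p)
{
    return mock_card_present(p) == CARD_ABSENT;
}
```

In this model an unreachable port schedules no removal work of its own, so the child never competes with the parent for the rescan/remove lock.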
2019-11-12 | PCI: pciehp: Do not disable interrupt twice on suspend | Mika Westerberg
We try to keep PCIe hotplug ports runtime suspended when entering system suspend. Because the PCIe portdrv sets the DPM_FLAG_NEVER_SKIP flag, the PM core always calls system suspend/resume hooks even if the device is left runtime suspended. Since the PCIe hotplug driver re-used the same function for both runtime suspend and system suspend, it ended up disabling the hotplug interrupt twice, and the second time the following was printed:

  pciehp 0000:03:01.0:pcie204: pcie_do_write_cmd: no response from device

Prevent this from happening by checking whether the device is already runtime suspended when the system suspend hook is called.

Fixes: 9c62f0bfb832 ("PCI: pciehp: Implement runtime PM callbacks")
Link: https://lore.kernel.org/r/20191029170022.57528-1-mika.westerberg@linux.intel.com
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
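The guard described above can be illustrated with a mock port that counts how often its hotplug interrupt gets disabled. This is only a sketch; mock_hp_port and the mock_* functions are hypothetical and do not mirror the driver's real callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock hotplug port that counts interrupt-disable commands. */
struct mock_hp_port {
    bool runtime_suspended;
    int irq_disable_count;
};

/* Stand-in for the command that turns the hotplug interrupt off. */
void mock_disable_hp_irq(struct mock_hp_port *port)
{
    assert(port != NULL);
    port->irq_disable_count++;
}

/* Runtime suspend always disables the interrupt. */
void mock_runtime_suspend(struct mock_hp_port *port)
{
    mock_disable_hp_irq(port);
    port->runtime_suspended = true;
}

/* System suspend skips the command if runtime suspend already issued it. */
void mock_system_suspend(struct mock_hp_port *port)
{
    if (port->runtime_suspended)
        return;
    mock_disable_hp_irq(port);
}
```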
2019-11-11 | PCI: pciehp: Refactor infinite loop in pcie_poll_cmd() | Andy Shevchenko
Infinite timeout loops are hard to read. Refactor this one into a plain 'do {} while ()'.

Note that the supplied timeout can't be negative in current use, but if it is not divisible by 10 it may drop below 0, which is why the parameter type is int. Thus we can move the check into the loop condition.

No functional change intended.

Link: https://lore.kernel.org/r/20191108111855.85866-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
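The loop shape this commit describes can be sketched as follows. It is a user-space approximation, not the driver code: the completion check is passed in as a hypothetical callback, the 10 ms step is assumed from the commit message's mention of divisibility by 10, and the sleep is left as a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POLL_STEP_MS 10

/* Hypothetical callback: returns true once the command has completed. */
typedef bool (*cmd_done_fn)(void *ctx);

/*
 * Poll until `done` reports completion or the timeout runs out.
 * Returns 1 on completion, 0 on timeout. The timeout is a signed int
 * because a value not divisible by POLL_STEP_MS drops below zero on
 * the last iteration.
 */
int poll_cmd_sketch(cmd_done_fn done, void *ctx, int timeout_ms)
{
    assert(done != NULL);
    do {
        if (done(ctx))
            return 1;
        /* Real code would sleep for POLL_STEP_MS here. */
        timeout_ms -= POLL_STEP_MS;
    } while (timeout_ms >= 0);

    return 0;
}

/* Example callback: completes on the poll that brings the counter to zero. */
bool done_after_n_polls(void *ctx)
{
    int *polls_left = ctx;

    if (*polls_left > 0)
        (*polls_left)--;
    return *polls_left == 0;
}
```

Because the condition is tested at the bottom, the status is always checked at least once, even with a timeout of 0.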
2019-11-11 | PCI: Apply Cavium ACS quirk to ThunderX2 and ThunderX3 | George Cherian
Enhance the ACS quirk for Cavium Processors. Add the root port vendor IDs for ThunderX2 and ThunderX3 series of processors. [bhelgaas: add Fixes: and stable tag] Fixes: f2ddaf8dfd4a ("PCI: Apply Cavium ThunderX ACS quirk to more Root Ports") Link: https://lore.kernel.org/r/20191111024243.GA11408@dc5-eodlnx05.marvell.com Signed-off-by: George Cherian <george.cherian@marvell.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Robert Richter <rrichter@marvell.com> Cc: stable@vger.kernel.org # v4.12+
2019-10-25 | PCI/DPC: Add "pcie_ports=dpc-native" to allow DPC without AER control | Olof Johansson
Prior to eed85ff4c0da7 ("PCI/DPC: Enable DPC only if AER is available"), Linux handled DPC events regardless of whether firmware had granted it ownership of AER or DPC, e.g., via _OSC. PCIe r5.0, sec 6.2.10, recommends that the OS link control of DPC to control of AER, so after eed85ff4c0da7, Linux handles DPC events only if it has control of AER. On platforms that do not grant OS control of AER via _OSC, Linux DPC handling worked before eed85ff4c0da7 but not after. To make Linux DPC handling work on those platforms the same way they did before, add a "pcie_ports=dpc-native" kernel parameter that makes Linux handle DPC events regardless of whether it has control of AER. [bhelgaas: commit log, move pcie_ports_dpc_native to drivers/pci/] Link: https://lore.kernel.org/r/20191023192205.97024-1-olof@lixom.net Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
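The resulting policy can be summarised as a two-input decision. The sketch below is a deliberate simplification with hypothetical function names; the real kernel checks involve more conditions than are shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the value given to "pcie_ports=" selects dpc-native. */
bool pcie_ports_value_is_dpc_native(const char *value)
{
    assert(value != NULL);
    return strcmp(value, "dpc-native") == 0;
}

/*
 * Simplified policy: handle DPC events when the OS controls AER, or when
 * the user explicitly asked for native DPC handling regardless of AER.
 */
bool linux_handles_dpc(bool os_controls_aer, bool dpc_native)
{
    return os_controls_aer || dpc_native;
}
```

On a platform that withholds AER control, booting with the parameter makes the second input true, which restores the earlier behaviour for DPC only.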
2019-10-23 | PCI: Warn if no host bridge NUMA node info | Yunsheng Lin
In pci_call_probe(), we try to run driver probe functions on the node where the device is attached. If we don't know which node the device is attached to, the driver will likely run on the wrong node. This will still work, but performance will not be as good as it could be. On NUMA systems, warn if we don't know which node a PCI host bridge is attached to. This is likely an indication that ACPI didn't supply a _PXM method or the DT didn't supply a "numa-node-id" property. [bhelgaas: commit log, check bus node] Link: https://lore.kernel.org/r/1571467543-26125-1-git-send-email-linyunsheng@huawei.com Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-10-23 | PCI: Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters | Nicholas Johnson
The existing "pci=hpmemsize=nn[KMG]" kernel parameter overrides the default size of both the non-prefetchable and the prefetchable MMIO windows for hotplug bridges. Add "pci=hpmmiosize=nn[KMG]" to override the default size of only the non-prefetchable MMIO window. Add "pci=hpmmioprefsize=nn[KMG]" to override the default size of only the prefetchable MMIO window. Link: https://lore.kernel.org/r/SL2P216MB0187E4D0055791957B7E2660806B0@SL2P216MB0187.KORP216.PROD.OUTLOOK.COM Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
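How the three options relate can be shown with a small parser. This is a user-space sketch: hp_window_sizes, parse_size_sketch() and apply_hp_option() are hypothetical names, and the suffix handling is a loose stand-in for the kernel's own size parsing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the two hotplug MMIO window defaults, in bytes. */
struct hp_window_sizes {
    uint64_t mmio;        /* non-prefetchable window */
    uint64_t mmio_pref;   /* prefetchable window */
};

/* Parse "nn[KMG]" into bytes. */
uint64_t parse_size_sketch(const char *s)
{
    char *end;
    uint64_t v;

    assert(s != NULL);
    v = strtoull(s, &end, 0);
    switch (*end) {
    case 'K': case 'k': v <<= 10; break;
    case 'M': case 'm': v <<= 20; break;
    case 'G': case 'g': v <<= 30; break;
    default: break;
    }
    return v;
}

/* Apply one "name=value" option. Returns 1 if it was recognised, else 0. */
int apply_hp_option(struct hp_window_sizes *w, const char *opt)
{
    assert(w != NULL && opt != NULL);
    if (!strncmp(opt, "hpmmiosize=", 11)) {
        w->mmio = parse_size_sketch(opt + 11);          /* only non-prefetchable */
        return 1;
    }
    if (!strncmp(opt, "hpmmioprefsize=", 15)) {
        w->mmio_pref = parse_size_sketch(opt + 15);     /* only prefetchable */
        return 1;
    }
    if (!strncmp(opt, "hpmemsize=", 10)) {
        w->mmio = w->mmio_pref = parse_size_sketch(opt + 10);   /* both */
        return 1;
    }
    return 0;
}
```

With this split, a setup that needs a large prefetchable window can ask for it without also inflating the non-prefetchable window, which competes for scarce 32-bit address space.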
2019-10-20 | Linux 5.4-rc4 | Linus Torvalds
2019-10-20 | Merge tag 'kbuild-fixes-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild | Linus Torvalds
Pull more Kbuild fixes from Masahiro Yamada:

 - fix a bashism of setlocalversion

 - do not use the too new --sort option of tar

* tag 'kbuild-fixes-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kheaders: substituting --sort in archive creation
  scripts: setlocalversion: fix a bashism
  kbuild: update comment about KBUILD_ALLDIRS
2019-10-20 | Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull x86 fixes from Thomas Gleixner:
 "A small set of x86 fixes:

   - Prevent a NULL pointer dereference in the X2APIC code in case of a CPU hotplug failure.

   - Prevent boot failures on HP superdome machines by marking the level2 kernel pagetable entries outside of the kernel area as invalid so BIOS reserved space won't be touched unintentionally. Also ensure that memory holes are rounded up to the next PMD boundary correctly.

   - Enable X2APIC support on Hyper-V to prevent boot failures.

   - Set the paravirt name when running on Hyper-V for consistency

   - Move a function under the appropriate ifdef guard to prevent build warnings"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/acpi: Move get_cmdline_acpi_rsdp() under #ifdef guard
  x86/hyperv: Set pv_info.name to "Hyper-V"
  x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu
  x86/hyperv: Make vapic support x2apic mode
  x86/boot/64: Round memory hole size up to next PMD page
  x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area
2019-10-20 | Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull irq fixes from Thomas Gleixner:
 "A small set of irq chip driver fixes and updates:

   - Update the SIFIVE PLIC interrupt driver to use the fasteoi handler to address the shortcomings of the existing flow handling which was prone to lose interrupts

   - Use the proper limit for GIC interrupt line numbers

   - Add retrigger support for the recently merged Annapurna Labs Fabric interrupt controller to make it complete

   - Enable the ATMEL AIC5 interrupt controller driver on the new SAM9X60 SoC"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/sifive-plic: Switch to fasteoi flow
  irqchip/gic-v3: Fix GIC_LINE_NR accessor
  irqchip/atmel-aic5: Add support for sam9x60 irqchip
  irqchip/al-fic: Add support for irq retrigger
2019-10-20 | Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull hrtimer fixlet from Thomas Gleixner:
 "A single commit annotating the lockless access to timer->base with READ_ONCE() and adding the WRITE_ONCE() counterparts for completeness"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Annotate lockless access to timer->base
2019-10-20 | Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull stop-machine fix from Thomas Gleixner:
 "A single fix, amending stop machine with WRITE/READ_ONCE() to address the fallout of KCSAN"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  stop_machine: Avoid potential race behaviour
2019-10-19 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Linus Torvalds
Pull networking fixes from David Miller:
 "I was battling a cold after some recent trips, so quite a bit piled up meanwhile, sorry about that.

  Highlights:

   1) Fix fd leak in various bpf selftests, from Brian Vazquez.

   2) Fix crash in xsk when device doesn't support some methods, from Magnus Karlsson.

   3) Fix various leaks and use-after-free in rxrpc, from David Howells.

   4) Fix several SKB leaks due to confusion of who owns an SKB and who should release it in the llc code. From Eric Biggers.

   5) Kill a bunch of KCSAN warnings in TCP, from Eric Dumazet.

   6) Jumbo packets don't work after resume on r8169, as the BIOS resets the chip into non-jumbo mode during suspend. From Heiner Kallweit.

   7) Corrupt L2 header during MPLS push, from Davide Caratti.

   8) Prevent possible infinite loop in tc_ctl_action, from Eric Dumazet.

   9) Get register bits right in bcmgenet driver, based upon chip version. From Florian Fainelli.

  10) Fix mutex problems in microchip DSA driver, from Marek Vasut.

  11) Cure race between route lookup and invalidation in ipv4, from Wei Wang.

  12) Fix performance regression due to false sharing in 'net' structure, from Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (145 commits)
  net: reorder 'struct net' fields to avoid false sharing
  net: dsa: fix switch tree list
  net: ethernet: dwmac-sun8i: show message only when switching to promisc
  net: aquantia: add an error handling in aq_nic_set_multicast_list
  net: netem: correct the parent's backlog when corrupted packet was dropped
  net: netem: fix error path for corrupted GSO frames
  macb: propagate errors when getting optional clocks
  xen/netback: fix error path of xenvif_connect_data()
  net: hns3: fix mis-counting IRQ vector numbers issue
  net: usb: lan78xx: Connect PHY before registering MAC
  vsock/virtio: discard packets if credit is not respected
  vsock/virtio: send a credit update when buffer size is changed
  mlxsw: spectrum_trap: Push Ethernet header before reporting trap
  net: ensure correct skb->tstamp in various fragmenters
  net: bcmgenet: reset 40nm EPHY on energy detect
  net: bcmgenet: soft reset 40nm EPHYs before MAC init
  net: phy: bcm7xxx: define soft_reset for 40nm EPHY
  net: bcmgenet: don't set phydev->link from MAC
  net: Update address for MediaTek ethernet driver in MAINTAINERS
  ipv4: fix race condition between route lookup and invalidation
  ...
2019-10-19 | net: reorder 'struct net' fields to avoid false sharing | Eric Dumazet
Intel test robot reported a ~7% regression on TCP_CRR tests that they bisected to the cited commit.

Indeed, every time a new TCP socket is created or deleted, the atomic counter net->count is touched (via get_net(net) and put_net(net) calls), so CPUs might have to reload a contended cache line in net_hash_mix(net) calls.

We need to reorder 'struct net' fields to move @hash_mix into a read-mostly cache line.

We move fields that can be dirtied often into the first cache line.

We will probably have to address, in a followup patch, the __randomize_layout that was added in linux-4.13, since it might break our placement choices.

Fixes: 355b98553789 ("netns: provide pure entropy for net_hash_mix()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19 | net: dsa: fix switch tree list | Vivien Didelot
If there are multiple switch trees on the device, only the last one will be listed, because the arguments of list_add_tail are swapped. Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation") Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
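Why swapped arguments leave only the last entry reachable is easy to see with a minimal circular list. The sketch below reimplements the tail insertion in user space under hypothetical names (list_node, list_add_tail_sketch); it follows the usual (new entry, list head) argument order.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node. */
struct list_node {
    struct list_node *next, *prev;
};

/* An initialised node points at itself, so an empty head has no entries. */
void list_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

/* Insert `new_node` just before `head`, i.e. at the tail of head's list. */
void list_add_tail_sketch(struct list_node *new_node, struct list_node *head)
{
    struct list_node *prev;

    assert(new_node != NULL && head != NULL);
    prev = head->prev;
    new_node->next = head;
    new_node->prev = prev;
    prev->next = new_node;
    head->prev = new_node;
}

/* Count the entries reachable from `head`. */
size_t list_count(const struct list_node *head)
{
    size_t n = 0;

    for (const struct list_node *p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

With the arguments swapped, each call inserts the global head into the new entry's own one-element list, so the head ends up linked only to the most recently added entry and earlier entries become unreachable from it.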
2019-10-19 | net: ethernet: dwmac-sun8i: show message only when switching to promisc | Mans Rullgard
Printing the info message every time more than the max number of mac addresses are requested generates unnecessary log spam. Showing it only when the hw is not already in promiscuous mode is equally informative without being annoying.

Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19 | net: aquantia: add an error handling in aq_nic_set_multicast_list | Chenwandun
Add error handling in aq_nic_set_multicast_list(): the function may not work correctly when hw_multicast_list_set fails. This also removes a gcc -Wunused-but-set-variable warning.

Signed-off-by: Chenwandun <chenwandun@huawei.com>
Reviewed-by: Igor Russkikh <igor.russkikh@aquantia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19 | Merge branch 'netem-fix-further-issues-with-packet-corruption' | David S. Miller
Jakub Kicinski says:

====================
net: netem: fix further issues with packet corruption

This set fixes two more issues with the netem packet corruption.

The first patch (which was previously posted) avoids a NULL pointer dereference if the first frame gets freed due to allocation or checksum failure. v2 improves the clarity of the code a little, as requested by Cong.

The second patch ensures we don't return SUCCESS if the frame was in fact dropped. Thanks to this, the commit message for patch 1 no longer needs the "this will still break with a single-frame failure" disclaimer.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19net: netem: correct the parent's backlog when corrupted packet was droppedJakub Kicinski
If packet corruption failed we jump to finish_segs and return NET_XMIT_SUCCESS. Seeing success will make the parent qdisc increment its backlog, that's incorrect - we need to return NET_XMIT_DROP. Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
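Why the return code matters for accounting can be sketched with a toy model. This is not the netem or qdisc code: the classes `Child` and `ParentQdisc`, the `corruption_failed` flag, and the helper `run()` are hypothetical. Only the two return codes and the rule that the parent counts a packet when it sees success are taken from the commit message.

```python
NET_XMIT_SUCCESS = 0x00
NET_XMIT_DROP = 0x01


class Child:
    """Toy child qdisc that drops packets whose corruption step failed."""

    def __init__(self, report_drop):
        self.report_drop = report_drop  # False models the behaviour before the fix
        self.queue = []

    def enqueue(self, packet):
        if packet["corruption_failed"]:
            # The packet is dropped either way; only the reported code differs.
            return NET_XMIT_DROP if self.report_drop else NET_XMIT_SUCCESS
        self.queue.append(packet)
        return NET_XMIT_SUCCESS


class ParentQdisc:
    """Toy parent that counts a packet only when the child reports success."""

    def __init__(self, child):
        self.child = child
        self.qlen = 0

    def enqueue(self, packet):
        ret = self.child.enqueue(packet)
        if ret == NET_XMIT_SUCCESS:
            self.qlen += 1
        return ret


def run(report_drop):
    """Enqueue one failing packet; return (parent count, packets really queued)."""
    child = Child(report_drop)
    parent = ParentQdisc(child)
    parent.enqueue({"corruption_failed": True})
    return parent.qlen, len(child.queue)
```

In this model, `run(False)` leaves the parent counting one packet that the child never queued, while `run(True)` keeps both counts at zero.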
2019-10-19net: netem: fix error path for corrupted GSO framesJakub Kicinski
To corrupt a GSO frame we first perform segmentation. We then proceed using the first segment instead of the full GSO skb and requeue the rest of the segments as separate packets. If there are any issues with processing the first segment we still want to process the rest, therefore we jump to the finish_segs label. Commit 177b8007463c ("net: netem: fix backlog accounting for corrupted GSO frames") started using the pointer to the first segment in the "rest of segments processing", but as mentioned above the first segment may have already been freed at this point. Backlog corrections for parent qdiscs have to be adjusted. Fixes: 177b8007463c ("net: netem: fix backlog accounting for corrupted GSO frames") Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19macb: propagate errors when getting optional clocksMichael Tretter
The tx_clk, rx_clk, and tsu_clk are optional. Currently the macb driver marks a clock as not available if it receives an error when trying to get the clock. This is wrong, because a clock controller might return -EPROBE_DEFER if a clock is not available yet, but will eventually become available. In these cases, the driver would probe successfully but would never be able to adjust the clocks, because the clocks were not available during probe, but became available later. For example, the clock controller for the ZynqMP is implemented in the PMU firmware and the clocks are only available after the firmware driver has been probed. Use devm_clk_get_optional() instead of devm_clk_get() to get the optional clock and propagate all errors to the calling function. Signed-off-by: Michael Tretter <m.tretter@pengutronix.de> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Tested-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
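The difference between the two lookup policies can be modelled in a few lines. This is a Python sketch of the semantics only, assuming that a missing optional clock is reported as ENOENT and that every other error, including EPROBE_DEFER, must reach the caller; `ClkError`, `get_clock_old()`, `get_clock_optional()`, and the two lookup functions are hypothetical names, not driver code.

```python
import errno

EPROBE_DEFER = 517  # kernel-internal error code, not part of Python's errno module


class ClkError(Exception):
    """Stand-in for an error pointer returned by a clock lookup."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def get_clock_old(lookup, name):
    """Policy before the fix: any error means 'clock not available'."""
    try:
        return lookup(name)
    except ClkError:
        return None


def get_clock_optional(lookup, name):
    """Policy after the fix: only a missing clock is ignored, other errors propagate."""
    try:
        return lookup(name)
    except ClkError as err:
        if err.code == errno.ENOENT:
            return None
        raise


def lookup_deferred(name):
    """Hypothetical provider that is not ready yet."""
    raise ClkError(EPROBE_DEFER)


def lookup_missing(name):
    """Hypothetical provider that has no such clock."""
    raise ClkError(errno.ENOENT)
```

Under the old policy a deferred clock is silently treated as absent, so the probe succeeds without it. Under the new policy the deferral reaches the caller, which can retry the probe later.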
2019-10-19xen/netback: fix error path of xenvif_connect_data()Juergen Gross
xenvif_connect_data() calls module_put() in case of error. This is wrong as there is no related module_get(). Remove the superfluous module_put(). Fixes: 279f438e36c0a7 ("xen-netback: Don't destroy the netdev until the vif is shut down") Cc: <stable@vger.kernel.org> # 3.12 Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19net: hns3: fix mis-counting IRQ vector numbers issueYonglong Liu
Currently, num_msi_left is meant to hold the number of vectors for the NIC, but if the PF supports RoCE, it holds the number of vectors for both NIC and RoCE, which is not expected. This may cause interrupts to be lost in some cases, because the NIC module uses vector resources that belong to RoCE. This patch adds a new variable, num_nic_msi, to store the number of vectors for the NIC, and adjusts the default TQP numbers and rss_size according to the value of num_nic_msi. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "Rather a lot of fixes, almost all affecting mm/" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (26 commits) scripts/gdb: fix debugging modules on s390 kernel/events/uprobes.c: only do FOLL_SPLIT_PMD for uprobe register mm/thp: allow dropping THP from page cache mm/vmscan.c: support removing arbitrary sized pages from mapping mm/thp: fix node page state in split_huge_page_to_list() proc/meminfo: fix output alignment mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch mm/filemap.c: include <linux/ramfs.h> for generic_file_vm_ops definition mm: include <linux/huge_mm.h> for is_vma_temporary_stack zram: fix race between backing_dev_show and backing_dev_store mm/memcontrol: update lruvec counters in mem_cgroup_move_account ocfs2: fix panic due to ocfs2_wq is null hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic() mm: memblock: do not enforce current limit for memblock_phys* family mm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size mm/gup: fix a misnamed "write" argument, and a related bug mm/gup_benchmark: add a missing "w" to getopt string ocfs2: fix error handling in ocfs2_setattr() mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release mm/memunmap: don't access uninitialized memmap in memunmap_pages() ...
2019-10-19scripts/gdb: fix debugging modules on s390Ilya Leoshkevich
Currently lx-symbols assumes that module text is always located at module->core_layout->base, but s390 uses the following layout:

  +------+  <- module->core_layout->base
  | GOT  |
  +------+  <- module->core_layout->base + module->arch->plt_offset
  | PLT  |
  +------+  <- module->core_layout->base + module->arch->plt_offset +
  | TEXT |     module->arch->plt_size
  +------+

Therefore, when trying to debug modules on s390, all the symbol addresses are skewed by plt_offset + plt_size. Fix by adding plt_offset + plt_size to module_addr in load_module_symbols(). Link: http://lkml.kernel.org/r/20191017085917.81791-1-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
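The address arithmetic described above is small enough to write out. The function below is an illustrative sketch, not the code in scripts/gdb: the name `module_text_address()` and the numeric values in the usage note are assumptions, while the formula (base plus plt_offset plus plt_size) comes from the commit message.

```python
def module_text_address(core_base, plt_offset=0, plt_size=0):
    """Return where module text starts, given the layout base | GOT | PLT | TEXT.

    With both offsets left at 0 this reduces to the plain
    "text starts at the base" assumption.
    """
    return core_base + plt_offset + plt_size
```

For a hypothetical module with base 0x3ff80000000, plt_offset 0x100 and plt_size 0x200, `module_text_address(0x3ff80000000, 0x100, 0x200)` gives 0x3ff80000300, so symbols loaded at the bare base address would be off by 0x300 bytes.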
2019-10-19kernel/events/uprobes.c: only do FOLL_SPLIT_PMD for uprobe registerSong Liu
Attaching uprobe to text section in THP splits the PMD mapped page table into PTE mapped entries. On uprobe detach, we would like to regroup PMD mapped page table entry to regain performance benefit of THP. However, the regroup is broken for perf_event based trace_uprobe. This is because perf_event based trace_uprobe calls uprobe_unregister twice on close: first in TRACE_REG_PERF_CLOSE, then in TRACE_REG_PERF_UNREGISTER. The second call will split the PMD mapped page table entry, which is not the desired behavior. Fix this by only using FOLL_SPLIT_PMD for the uprobe register case. Add a WARN() to confirm uprobe unregister never works on huge pages, and abort the operation when this WARN() triggers. Link: http://lkml.kernel.org/r/20191017164223.2762148-6-songliubraving@fb.com Fixes: 5a52c9df62b4 ("uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT") Signed-off-by: Song Liu <songliubraving@fb.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-19mm/thp: allow dropping THP from page cacheKirill A. Shutemov
Once a THP is added to the page cache, it cannot be dropped via /proc/sys/vm/drop_caches. Fix this issue with proper handling in invalidate_mapping_pages(). Link: http://lkml.kernel.org/r/20191017164223.2762148-5-songliubraving@fb.com Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Song Liu <songliubraving@fb.com> Tested-by: Song Liu <songliubraving@fb.com> Acked-by: Yang Shi <yang.shi@linux.alibaba.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-19mm/vmscan.c: support removing arbitrary sized pages from mappingWilliam Kucharski
__remove_mapping() assumes that pages can only be either base pages or HPAGE_PMD_SIZE. Ask the page what size it is. Link: http://lkml.kernel.org/r/20191017164223.2762148-4-songliubraving@fb.com Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Song Liu <songliubraving@fb.com> Acked-by: Yang Shi <yang.shi@linux.alibaba.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-19mm/thp: fix node page state in split_huge_page_to_list()Kirill A. Shutemov
Make sure split_huge_page_to_list() handles the state of shmem THP and file THP properly. Link: http://lkml.kernel.org/r/20191017164223.2762148-3-songliubraving@fb.com Fixes: 60fbf0ab5da1 ("mm,thp: stats for file backed THP") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Song Liu <songliubraving@fb.com> Tested-by: Song Liu <songliubraving@fb.com> Acked-by: Yang Shi <yang.shi@linux.alibaba.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-19proc/meminfo: fix output alignmentKirill A. Shutemov
Patch series "Fixes for THP in page cache", v2. This patch (of 5): Add extra space for FileHugePages and FilePmdMapped, so the output is aligned with other rows. Link: http://lkml.kernel.org/r/20191017164223.2762148-2-songliubraving@fb.com Fixes: 60fbf0ab5da1 ("mm,thp: stats for file backed THP") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Song Liu <songliubraving@fb.com> Tested-by: Song Liu <songliubraving@fb.com> Acked-by: Yang Shi <yang.shi@linux.alibaba.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-19mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batchBen Dooks (Codethink)
mm_init.c needs to include <linux/mman.h> for the definition of vm_committed_as_batch. Fixes the following sparse warning: mm/mm_init.c:141:5: warning: symbol 'vm_committed_as_batch' was not declared. Should it be static? Link: http://lkml.kernel.org/r/20191016091509.26708-1-ben.dooks@codethink.co.uk Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-19mm/filemap.c: include <linux/ramfs.h> for generic_file_vm_ops definitionBen Dooks
The generic_file_vm_ops is defined in <linux/ramfs.h> so include it to fix the following warning: mm/filemap.c:2717:35: warning: symbol 'generic_file_vm_ops' was not declared. Should it be static? Link: http://lkml.kernel.org/r/20191008102311.25432-1-ben.dooks@codethink.co.uk Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-19mm: include <linux/huge_mm.h> for is_vma_temporary_stackBen Dooks
Include <linux/huge_mm.h> for the definition of is_vma_temporary_stack to fix the following sparse warning: mm/rmap.c:1673:6: warning: symbol 'is_vma_temporary_stack' was not declared. Should it be static? Link: http://lkml.kernel.org/r/20191009151155.27763-1-ben.dooks@codethink.co.uk Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Reviewed-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>