summaryrefslogtreecommitdiff
path: root/drivers/base
AgeCommit message (Collapse)Author
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-13mm: only display online cpus of the numa nodeZhen Lei
When I execute numactl -H (which reads /sys/devices/system/node/nodeX/cpumap and displays cpumask_of_node for each node), I get different result on X86 and arm64. For each numa node, the former only displayed online CPUs, and the latter displayed all possible CPUs. Unfortunately, both Linux documentation and numactl manual have not described it clear. I sent a mail to ask for help, and Michal Hocko replied that he preferred to print online cpus because it doesn't really make much sense to bind anything on offline nodes. Will said: "I suspect the vast majority (if not all) code that reads this file was developed for x86, so having the same behaviour for arm64 sounds like something we should do ASAP before people try to special case with things like #ifdef __aarch64__. I'd rather have this in 4.14 if possible." Link: http://lkml.kernel.org/r/1506678805-15392-2-git-send-email-thunder.leizhen@huawei.com Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tianhong Ding <dingtianhong@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Libin <huawei.libin@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-11ACPI: properties: Align return codes of __acpi_node_get_property_reference()Sakari Ailus
acpi_fwnode_get_reference_args(), the function implementing ACPI support for fwnode_property_get_reference_args(), returns directly error codes from __acpi_node_get_property_reference(). The latter uses different error codes than the OF implementation. In particular, the OF implementation uses -ENOENT to indicate that the property is not found, a reference entry is empty and there are no more references. Document and align the error codes for property for fwnode_property_get_reference_args() so that they match with of_parse_phandle_with_args(). Fixes: 3e3119d3088f (device property: Introduce fwnode_property_get_reference_args) Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-10-10device property: Track owner device of device propertyJarkko Nikula
Deletion of subdevice will remove device properties associated to parent when they share the same firmware node after commit 478573c93abd (driver core: Don't leak secondary fwnode on device removal). This was observed with a driver adding subdevice that driver wasn't able to read device properties after rmmod/modprobe cycle. Consider the lifecycle of it: parent device registration ACPI_COMPANION_SET() device_add_properties() pset_copy_set() set_secondary_fwnode(dev, &p->fwnode) device_add() parent probe read device properties ACPI_COMPANION_SET(subdevice, ACPI_COMPANION(parent)) device_add(subdevice) parent remove device_del(subdevice) device_remove_properties() set_secondary_fwnode(dev, NULL); pset_free() Parent device will have its primary firmware node pointing to an ACPI node and secondary firmware node point to device properties. ACPI_COMPANION_SET() call in parent probe will set the subdevice's firmware node to point to the same 'struct fwnode_handle' and the associated secondary firmware node, i.e. the device properties as the parent. When subdevice is deleted in parent remove that will remove those device properties and attempt to read device properties in next parent probe call will fail. Fix this by tracking the owner device of device properties and delete them only when owner device is being deleted. Fixes: 478573c93abd (driver core: Don't leak secondary fwnode on device removal) Cc: 4.9+ <stable@vger.kernel.org> # 4.9+ Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-10-03Merge tag 'driver-core-4.14-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are a few small fixes for 4.14-rc4. The removal of DRIVER_ATTR() was almost completed by 4.14-rc1, but one straggler made it in through some other tree (odds are, one of mine...) So there's a simple removal of the last user, and then finally the macro is removed from the tree. There's a fix for old crazy udev instances that insist on reloading a module when it is removed from the kernel due to the new uevents for bind/unbind. This fixes the reported regression, hopefully some year in the future we can drop the workaround, once users update to the latest version, but I'm not holding my breath. And then there's a build fix for a linker warning, and a buffer overflow fix to match the PCI fixes you took through the PCI tree in the same area. All of these have been in linux-next for a few weeks while I've been traveling, sorry for the delay" * tag 'driver-core-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: driver core: remove DRIVER_ATTR fpga: altera-cvp: remove DRIVER_ATTR() usage driver core: platform: Don't read past the end of "driver_override" buffer base: arch_topology: fix section mismatch build warnings driver core: suppress sending MODALIAS in UNBIND uevents
2017-09-26PM / OPP: Call notifier without holding opp_table->lockViresh Kumar
The notifier callbacks may want to call some OPP helper routines which may try to take the same opp_table->lock again and cause a deadlock. One such usecase was reported by Chanwoo Choi, where calling dev_pm_opp_disable() leads us to the devfreq's OPP notifier handler, which further calls dev_pm_opp_find_freq_floor() and it deadlocks. We don't really need the opp_table->lock to be held across the notifier call though, all we want to make sure is that the 'opp' doesn't get freed while being used from within the notifier chain. We can do it with help of dev_pm_opp_get/put() as well. Let's do it. Cc: 4.11+ <stable@vger.kernel.org> # 4.11+ Fixes: 5b650b388844 "PM / OPP: Take kref from _find_opp_table()" Reported-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-09-22Merge tag 'pm-4.14-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix a cpufreq regression introduced by recent changes related to the generic DT driver, an initialization time memory leak in cpuidle on ARM, a PM core bug that may cause system suspend/resume to fail on some systems, a request type validation issue in the PM QoS framework and two documentation-related issues. Specifics: - Fix a regression in cpufreq on systems using DT as the source of CPU configuration information where two different code paths attempt to create the cpufreq-dt device object (there can be only one) and fix up the "compatible" matching for some TI platforms on top of that (Viresh Kumar, Dave Gerlach). - Fix an initialization time memory leak in cpuidle on ARM which occurs if the cpuidle driver initialization fails (Stefan Wahren). - Fix a PM core function that checks whether or not there are any system suspend/resume callbacks for a device, but forgets to check legacy callbacks which then may be skipped incorrectly and the system may crash and/or the device may become unusable after a suspend-resume cycle (Rafael Wysocki). - Fix request type validation for latency tolerance PM QoS requests which may lead to unexpected behavior (Jan Schönherr). - Fix a broken link to PM documentation from a header file and a typo in a PM document (Geert Uytterhoeven, Rafael Wysocki)" * tag 'pm-4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: ti-cpufreq: Support additional am43xx platforms ARM: cpuidle: Avoid memleak if init fail cpufreq: dt-platdev: Add some missing platforms to the blacklist PM: core: Fix device_pm_check_callbacks() PM: docs: Drop an excess character from devices.rst PM / QoS: Use the correct variable to check the QoS request type driver core: Fix link to device power management documentation
2017-09-22Merge branches 'pm-core', 'pm-qos' and 'pm-docs'Rafael J. Wysocki
* pm-core: PM: core: Fix device_pm_check_callbacks() * pm-qos: PM / QoS: Use the correct variable to check the QoS request type * pm-docs: PM: docs: Drop an excess character from devices.rst driver core: Fix link to device power management documentation
2017-09-19PM: core: Fix device_pm_check_callbacks()Rafael J. Wysocki
The device_pm_check_callbacks() function doesn't check legacy ->suspend and ->resume callback pointers under the device's bus type, class and driver, so in some cases it may set the no_pm_callbacks flag for the device incorrectly and then the callbacks may be skipped during system suspend/resume, which shouldn't happen. Fixes: aa8e54b55947 (PM / sleep: Go direct_complete if driver has no callbacks) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: 4.5+ <stable@vger.kernel.org> # 4.5+
2017-09-18driver core: platform: Don't read past the end of "driver_override" bufferNicolai Stange
When printing the driver_override parameter when it is 4095 and 4094 bytes long, the printing code would access invalid memory because we need count+1 bytes for printing. Reject driver_override values of these lengths in driver_override_store(). This is in close analogy to commit 4efe874aace5 ("PCI: Don't read past the end of sysfs "driver_override" buffer") from Sasha Levin. Fixes: 3d713e0e382e ("driver core: platform: add device binding path 'driver_override'") Cc: stable@vger.kernel.org # v3.17+ Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-18base: arch_topology: fix section mismatch build warningsSudeep Holla
Commit 2ef7a2953c81 ("arm, arm64: factorize common cpu capacity default code") introduced init_cpu_capacity_callback and init_cpu_capacity_notifier which are referenced from initcall and are missing __init{,data} annotations resulting the below section mismatch build warnings. "WARNING: vmlinux.o(.text+0xbab790): Section mismatch in reference from the function init_cpu_capacity_callback() to the variable .init.text:$x The function init_cpu_capacity_callback() references the variable __init $x. This is often because init_cpu_capacity_callback lacks a __init annotation or the annotation of $x is wrong." This patch fixes the above build warnings by adding the required annotations. Fixes: 2ef7a2953c81 ("arm, arm64: factorize common cpu capacity default code") Cc: Juri Lelli <juri.lelli@arm.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-18PM / QoS: Use the correct variable to check the QoS request typeJan H. Schönherr
Use the actual function argument for the validation of the request type, instead of the type field in a fresh (supposedly zero-initialized) request structure. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-09-17dma-coherent: fix rmem_dma_device_init regressionArnd Bergmann
My recent bug fix introduced another bug, which caused rmem_dma_device_init to always fail, as rmem->priv is never set to anything. This restores the previous behavior, calling dma_init_coherent_memory() whenever ->priv is NULL. Fixes: d35b0996fef3 ("dma-coherent: fix dma_declare_coherent_memory() logic error") Reported-by: Roy Pledge <roy.pledge@nxp.com> Tested-by: Roy Pledge <roy.pledge@nxp.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-09-16firmware: Restore support for built-in firmwareMarkus Trippelsdorf
Commit 5620a0d1aac ("firmware: delete in-kernel firmware") removed the entire firmware directory. Unfortunately it thereby also removed the support for built-in firmware. This restores the ability to build firmware directly into the kernel by pruning the original Makefile to the necessary minimum. The default for EXTRA_FIRMWARE_DIR is now the standard directory /lib/firmware/. Fixes: 5620a0d1aac ("firmware: delete in-kernel firmware") Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> Acked-by: Greg K-H <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-12Merge tag 'dma-mapping-4.14' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping updates from Christoph Hellwig: - removal of the old dma_alloc_noncoherent interface - remove unused flags to dma_declare_coherent_memory - restrict OF DMA configuration to specific physical busses - use the iommu mailing list for dma-mapping questions and patches * tag 'dma-mapping-4.14' of git://git.infradead.org/users/hch/dma-mapping: dma-coherent: fix dma_declare_coherent_memory() logic error ARM: imx: mx31moboard: Remove unused 'dma' variable dma-coherent: remove an unused variable MAINTAINERS: use the iommu list for the dma-mapping subsystem dma-coherent: remove the DMA_MEMORY_MAP and DMA_MEMORY_IO flags dma-coherent: remove the DMA_MEMORY_INCLUDES_CHILDREN flag of: restrict DMA configuration dma-mapping: remove dma_alloc_noncoherent and dma_free_noncoherent i825xx: switch to switch to dma_alloc_attrs au1000_eth: switch to dma_alloc_attrs sgiseeq: switch to dma_alloc_attrs dma-mapping: reduce dma_mapping_error inline bloat
2017-09-10Revert "firmware: add sanity check on shutdown/suspend"Linus Torvalds
This reverts commit 81f95076281fdd3bc382e004ba1bce8e82fccbce. It causes random failures of firmware loading at resume time (well, random for me, it seems to be more reliable for others) because the firmware disabling is not actually synchronous with any particular resume event, and at least the btusb driver that uses a workqueue to load the firmware at resume seems to occasionally hit the "firmware loading is disabled" logic because the firmware loader hasn't gotten the resume event yet. Some kind of sanity check for not trying to load firmware when it's not possible might be a good thing, but this commit was not it. Greg seems to have silently suffered the same issue, and pointed to the likely culprit, and Gabriel C verified the revert fixed it for him too. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Pointed-at-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Tested-by: Gabriel C <nix.or.die@gmail.com> Cc: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08treewide: make "nr_cpu_ids" unsignedAlexey Dobriyan
First, number of CPUs can't be negative number. Second, different signnnedness leads to suboptimal code in the following cases: 1) kmalloc(nr_cpu_ids * sizeof(X)); "int" has to be sign extended to size_t. 2) while (loff_t *pos < nr_cpu_ids) MOVSXD is 1 byte longed than the same MOV. Other cases exist as well. Basically compiler is told that nr_cpu_ids can't be negative which can't be deduced if it is "int". Code savings on allyesconfig kernel: -3KB add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370) function old new delta coretemp_cpu_online 450 512 +62 rcu_init_one 1234 1272 +38 pci_device_probe 374 399 +25 ... pgdat_reclaimable_pages 628 556 -72 select_fallback_rq 446 369 -77 task_numa_find_cpu 1923 1807 -116 Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08mm: change the call sites of numa statistics itemsKemi Wang
Patch series "Separate NUMA statistics from zone statistics", v2. Each page allocation updates a set of per-zone statistics with a call to zone_statistics(). As discussed in 2017 MM summit, these are a substantial source of overhead in the page allocator and are very rarely consumed. This significant overhead in cache bouncing caused by zone counters (NUMA associated counters) update in parallel in multi-threaded page allocation (pointed out by Dave Hansen). A link to the MM summit slides: http://people.netfilter.org/hawk/presentations/MM-summit2017/MM-summit2017-JesperBrouer.pdf To mitigate this overhead, this patchset separates NUMA statistics from zone statistics framework, and update NUMA counter threshold to a fixed size of MAX_U16 - 2, as a small threshold greatly increases the update frequency of the global counter from local per cpu counter (suggested by Ying Huang). The rationality is that these statistics counters don't need to be read often, unlike other VM counters, so it's not a problem to use a large threshold and make readers more expensive. With this patchset, we see 31.3% drop of CPU cycles(537-->369, see below) for per single page allocation and reclaim on Jesper's page_bench03 benchmark. Meanwhile, this patchset keeps the same style of virtual memory statistics with little end-user-visible effects (only move the numa stats to show behind zone page stats, see the first patch for details). I did an experiment of single page allocation and reclaim concurrently using Jesper's page_bench03 benchmark on a 2-Socket Broadwell-based server (88 processors with 126G memory) with different size of threshold of pcp counter. Benchmark provided by Jesper D Brouer(increase loop times to 10000000): https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench Threshold CPU cycles Throughput(88 threads) 32 799 241760478 64 640 301628829 125 537 358906028 <==> system by default 256 468 412397590 512 428 450550704 4096 399 482520943 20000 394 489009617 30000 395 488017817 65533 369(-31.3%) 521661345(+45.3%) <==> with this patchset N/A 342(-36.3%) 562900157(+56.8%) <==> disable zone_statistics This patch (of 3): In this patch, NUMA statistics is separated from zone statistics framework, all the call sites of NUMA stats are changed to use numa-stats-specific functions, it does not have any functionality change except that the number of NUMA stats is shown behind zone page stats when users *read* the zone info. E.g. cat /proc/zoneinfo ***Base*** ***With this patch*** nr_free_pages 3976 nr_free_pages 3976 nr_zone_inactive_anon 0 nr_zone_inactive_anon 0 nr_zone_active_anon 0 nr_zone_active_anon 0 nr_zone_inactive_file 0 nr_zone_inactive_file 0 nr_zone_active_file 0 nr_zone_active_file 0 nr_zone_unevictable 0 nr_zone_unevictable 0 nr_zone_write_pending 0 nr_zone_write_pending 0 nr_mlock 0 nr_mlock 0 nr_page_table_pages 0 nr_page_table_pages 0 nr_kernel_stack 0 nr_kernel_stack 0 nr_bounce 0 nr_bounce 0 nr_zspages 0 nr_zspages 0 numa_hit 0 *nr_free_cma 0* numa_miss 0 numa_hit 0 numa_foreign 0 numa_miss 0 numa_interleave 0 numa_foreign 0 numa_local 0 numa_interleave 0 numa_other 0 numa_local 0 *nr_free_cma 0* numa_other 0 ... ... vm stats threshold: 10 vm stats threshold: 10 ... ... The next patch updates the numa stats counter size and threshold. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1503568801-21305-2-git-send-email-kemi.wang@intel.com Signed-off-by: Kemi Wang <kemi.wang@intel.com> Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Ying Huang <ying.huang@intel.com> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06mm, memory_hotplug: remove zone restrictionsMichal Hocko
Historically we have enforced that any kernel zone (e.g ZONE_NORMAL) has to precede the Movable zone in the physical memory range. The purpose of the movable zone is, however, not bound to any physical memory restriction. It merely defines a class of migrateable and reclaimable memory. There are users (e.g. CMA) who might want to reserve specific physical memory ranges for their own purpose. Moreover our pfn walkers have to be prepared for zones overlapping in the physical range already because we do support interleaving NUMA nodes and therefore zones can interleave as well. This means we can allow each memory block to be associated with a different zone. Loosen the current onlining semantic and allow explicit onlining type on any memblock. That means that online_{kernel,movable} will be allowed regardless of the physical address of the memblock as long as it is offline of course. This might result in moveble zone overlapping with other kernel zones. Default onlining then becomes a bit tricky but still sensible. echo online > memoryXY/state will online the given block to 1) the default zone if the given range is outside of any zone 2) the enclosing zone if such a zone doesn't interleave with any other zone 3) the default zone if more zones interleave for this range where default zone is movable zone only if movable_node is enabled otherwise it is a kernel zone. Here is an example of the semantic with (movable_node is not present but it work in an analogous way). We start with following memblocks, all of them offline: memory34/valid_zones:Normal Movable memory35/valid_zones:Normal Movable memory36/valid_zones:Normal Movable memory37/valid_zones:Normal Movable memory38/valid_zones:Normal Movable memory39/valid_zones:Normal Movable memory40/valid_zones:Normal Movable memory41/valid_zones:Normal Movable Now, we online block 34 in default mode and block 37 as movable root@test1:/sys/devices/system/node/node1# echo online > memory34/state root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state memory34/valid_zones:Normal memory35/valid_zones:Normal Movable memory36/valid_zones:Normal Movable memory37/valid_zones:Movable memory38/valid_zones:Normal Movable memory39/valid_zones:Normal Movable memory40/valid_zones:Normal Movable memory41/valid_zones:Normal Movable As we can see all other blocks can still be onlined both into Normal and Movable zones and the Normal is default because the Movable zone spans only block37 now. root@test1:/sys/devices/system/node/node1# echo online_movable > memory41/state memory34/valid_zones:Normal memory35/valid_zones:Normal Movable memory36/valid_zones:Normal Movable memory37/valid_zones:Movable memory38/valid_zones:Movable Normal memory39/valid_zones:Movable Normal memory40/valid_zones:Movable Normal memory41/valid_zones:Movable Now the default zone for blocks 37-41 has changed because movable zone spans that range. root@test1:/sys/devices/system/node/node1# echo online_kernel > memory39/state memory34/valid_zones:Normal memory35/valid_zones:Normal Movable memory36/valid_zones:Normal Movable memory37/valid_zones:Movable memory38/valid_zones:Normal Movable memory39/valid_zones:Normal memory40/valid_zones:Movable Normal memory41/valid_zones:Movable Note that the block 39 now belongs to the zone Normal and so block38 falls into Normal by default as well. For completness root@test1:/sys/devices/system/node/node1# for i in memory[34]? do echo online > $i/state 2>/dev/null done memory34/valid_zones:Normal memory35/valid_zones:Normal memory36/valid_zones:Normal memory37/valid_zones:Movable memory38/valid_zones:Normal memory39/valid_zones:Normal memory40/valid_zones:Movable memory41/valid_zones:Movable Implementation wise the change is quite straightforward. We can get rid of allow_online_pfn_range altogether. online_pages allows only offline nodes already. The original default_zone_for_pfn will become default_kernel_zone_for_pfn. New default_zone_for_pfn implements the above semantic. zone_for_pfn_range is slightly reorganized to implement kernel and movable online type explicitly and MMOP_ONLINE_KEEP becomes a catch all default behavior. Link: http://lkml.kernel.org/r/20170714121233.16861-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Kani Toshimitsu <toshi.kani@hpe.com> Cc: <slaoub@gmail.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: <linux-api@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06mm, memory_hotplug: display allowed zones in the preferred orderingMichal Hocko
Prior to commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") we used to allow to change the valid zone types of a memory block if it is adjacent to a different zone type. This fact was reflected in memoryNN/valid_zones by the ordering of printed zones. The first one was default (echo online > memoryNN/state) and the other one could be onlined explicitly by online_{movable,kernel}. This behavior was removed by the said patch and as such the ordering was not all that important. In most cases a kernel zone would be default anyway. The only exception is movable_node handled by "mm, memory_hotplug: support movable_node for hotpluggable nodes". Let's reintroduce this behavior again because later patch will remove the zone overlap restriction and so user will be allowed to online kernel resp. movable block regardless of its placement. Original behavior will then become significant again because it would be non-trivial for users to see what is the default zone to online into. Implementation is really simple. Pull out zone selection out of move_pfn_range into zone_for_pfn_range helper and use it in show_valid_zones to display the zone for default onlining and then both kernel and movable if they are allowed. Default online zone is not duplicated. Link: http://lkml.kernel.org/r/20170714121233.16861-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Kani Toshimitsu <toshi.kani@hpe.com> Cc: <slaoub@gmail.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-05Merge tag 'devprop-4.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull device properties framework updates from Rafael Wysocki: "These introduce fwnode operations for all of the separate types of 'firmware nodes' that can be handled by the device properties framework, make the framework use const fwnode arguments all over, add a helper for the consolidated handling of node references and switch over the framework to the new UUID API. Specifics: - Introduce fwnode operations for all of the separate types of 'firmware nodes' that can be handled by the device properties framework and drop the type field from struct fwnode_handle (Sakari Ailus, Arnd Bergmann). - Make the device properties framework use const fwnode arguments where possible (Sakari Ailus). - Add a helper for the consolidated handling of node references to the device properties framework (Sakari Ailus). - Switch over the ACPI part of the device properties framework to the new UUID API (Andy Shevchenko)" * tag 'devprop-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: device property: Switch to use new generic UUID API device property: export irqchip_fwnode_ops device property: Introduce fwnode_property_get_reference_args device property: Constify fwnode property API device property: Constify argument to pset fwnode backend ACPI: Constify internal fwnode arguments ACPI: Constify acpi_bus helper functions, switch to macros ACPI: Prepare for constifying acpi_get_next_subnode() fwnode argument device property: Get rid of struct fwnode_handle type field ACPI: Use IS_ERR_OR_NULL() instead of non-NULL check in is_acpi_data_node()
2017-09-05Merge tag 'pm-4.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "This time (again) cpufreq gets the majority of changes which mostly are driver updates (including a major consolidation of intel_pstate), some schedutil governor modifications and core cleanups. There also are some changes in the system suspend area, mostly related to diagnostics and debug messages plus some renames of things related to suspend-to-idle. One major change here is that suspend-to-idle is now going to be preferred over S3 on systems where the ACPI tables indicate to do so and provide requsite support (the Low Power Idle S0 _DSM in particular). The system sleep documentation and the tools related to it are updated too. The rest is a few cpuidle changes (nothing major), devfreq updates, generic power domains (genpd) framework updates and a few assorted modifications elsewhere. Specifics: - Drop the P-state selection algorithm based on a PID controller from intel_pstate and make it use the same P-state selection method (based on the CPU load) for all types of systems in the active mode (Rafael Wysocki, Srinivas Pandruvada). - Rework the cpufreq core and governors to make it possible to take cross-CPU utilization updates into account and modify the schedutil governor to actually do so (Viresh Kumar). - Clean up the handling of transition latency information in the cpufreq core and untangle it from the information on which drivers cannot do dynamic frequency switching (Viresh Kumar). - Add support for new SoCs (MT2701/MT7623 and MT7622) to the mediatek cpufreq driver and update its DT bindings (Sean Wang). - Modify the cpufreq dt-platdev driver to autimatically create cpufreq devices for the new (v2) Operating Performance Points (OPP) DT bindings and update its whitelist of supported systems (Viresh Kumar, Shubhrajyoti Datta, Marc Gonzalez, Khiem Nguyen, Finley Xiao). - Add support for Ux500 to the cpufreq-dt driver and drop the obsolete dbx500 cpufreq driver (Linus Walleij, Arnd Bergmann). - Add new SoC (R8A7795) support to the cpufreq rcar driver (Khiem Nguyen). - Fix and clean up assorted issues in the cpufreq drivers and core (Arvind Yadav, Christophe Jaillet, Colin Ian King, Gustavo Silva, Julia Lawall, Leonard Crestez, Rob Herring, Sudeep Holla). - Update the IO-wait boost handling in the schedutil governor to make it less aggressive (Joel Fernandes). - Rework system suspend diagnostics to make it print fewer messages to the kernel log by default, add a sysfs knob to allow more suspend-related messages to be printed and add Low Power S0 Idle constraints checks to the ACPI suspend-to-idle code (Rafael Wysocki, Srinivas Pandruvada). - Prefer suspend-to-idle over S3 on ACPI-based systems with the ACPI_FADT_LOW_POWER_S0 flag set and the Low Power Idle S0 _DSM interface present in the ACPI tables (Rafael Wysocki). - Update documentation related to system sleep and rename a number of items in the code to make it cleare that they are related to suspend-to-idle (Rafael Wysocki). - Export a variable allowing device drivers to check the target system sleep state from the core system suspend code (Florian Fainelli). - Clean up the cpuidle subsystem to handle the polling state on x86 in a more straightforward way and to use %pOF instead of full_name (Rafael Wysocki, Rob Herring). - Update the devfreq framework to fix and clean up a few minor issues (Chanwoo Choi, Rob Herring). - Extend diagnostics in the generic power domains (genpd) framework and clean it up slightly (Thara Gopinath, Rob Herring). - Fix and clean up a couple of issues in the operating performance points (OPP) framework (Viresh Kumar, Waldemar Rymarkiewicz). - Add support for RV1108 to the rockchip-io Adaptive Voltage Scaling (AVS) driver (David Wu). - Fix the usage of notifiers in CPU power management on some platforms (Alex Shi). - Update the pm-graph system suspend/hibernation and boot profiling utility (Todd Brandt). - Make it possible to run the cpupower utility without CPU0 (Prarit Bhargava)" * tag 'pm-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (87 commits) cpuidle: Make drivers initialize polling state cpuidle: Move polling state initialization code to separate file cpuidle: Eliminate the CPUIDLE_DRIVER_STATE_START symbol cpufreq: imx6q: Fix imx6sx low frequency support cpufreq: speedstep-lib: make several arrays static, makes code smaller PM: docs: Delete the obsolete states.txt document PM: docs: Describe high-level PM strategies and sleep states PM / devfreq: Fix memory leak when fail to register device PM / devfreq: Add dependency on PM_OPP PM / devfreq: Move private devfreq_update_stats() into devfreq PM / devfreq: Convert to using %pOF instead of full_name PM / AVS: rockchip-io: add io selectors and supplies for RV1108 cpufreq: ti: Fix 'of_node_put' being called twice in error handling path cpufreq: dt-platdev: Drop few entries from whitelist cpufreq: dt-platdev: Automatically create cpufreq device with OPP v2 ARM: ux500: don't select CPUFREQ_DT cpuidle: Convert to using %pOF instead of full_name cpufreq: Convert to using %pOF instead of full_name PM / Domains: Convert to using %pOF instead of full_name cpufreq: Cap the default transition delay value to 10 ms ...
2017-09-05dma-coherent: fix dma_declare_coherent_memory() logic errorArnd Bergmann
A recent change interprets the return code of dma_init_coherent_memory as an error value, but it is instead a boolean, where 'true' indicates success. This leads causes the caller to always do the wrong thing, and also triggers a compile-time warning about it: drivers/base/dma-coherent.c: In function 'dma_declare_coherent_memory': drivers/base/dma-coherent.c:99:15: error: 'mem' may be used uninitialized in this function [-Werror=maybe-uninitialized] I ended up changing the code a little more, to give use the usual error handling, as this seemed the best way to fix up the warning and make the code look reasonable at the same time. Fixes: 2436bdcda53f ("dma-coherent: remove the DMA_MEMORY_MAP and DMA_MEMORY_IO flags") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-09-04dma-coherent: remove an unused variableChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
2017-09-04Merge branch 'pm-sleep'Rafael J. Wysocki
* pm-sleep: ACPI / PM: Check low power idle constraints for debug only PM / s2idle: Rename platform operations structure PM / s2idle: Rename ->enter_freeze to ->enter_s2idle PM / s2idle: Rename freeze_state enum and related items PM / s2idle: Rename PM_SUSPEND_FREEZE to PM_SUSPEND_TO_IDLE ACPI / PM: Prefer suspend-to-idle over S3 on some systems platform/x86: intel-hid: Wake up Dell Latitude 7275 from suspend-to-idle PM / suspend: Define pr_fmt() in suspend.c PM / suspend: Use mem_sleep_labels[] strings in messages PM / sleep: Put pm_test under CONFIG_PM_SLEEP_DEBUG PM / sleep: Check pm_wakeup_pending() in __device_suspend_noirq() PM / core: Add error argument to dpm_show_time() PM / core: Split dpm_suspend_noirq() and dpm_resume_noirq() PM / s2idle: Rearrange the main suspend-to-idle loop PM / timekeeping: Print debug messages when requested PM / sleep: Mark suspend/hibernation start and finish PM / sleep: Do not print debug messages by default PM / suspend: Export pm_suspend_target_state
2017-09-04Merge branches 'pm-core', 'pm-opp', 'pm-domains', 'pm-cpu' and 'pm-avs'Rafael J. Wysocki
* pm-core: PM / wakeup: Set power.can_wakeup if wakeup_sysfs_add() fails * pm-opp: PM / OPP: Fix get sharing CPUs when hotplug is used PM / OPP: OF: Use pr_debug() instead of pr_err() while adding OPP table * pm-domains: PM / Domains: Convert to using %pOF instead of full_name PM / Domains: Extend generic power domain debugfs PM / Domains: Add time accounting to various genpd states * pm-cpu: PM / CPU: replace raw_notifier with atomic_notifier * pm-avs: PM / AVS: rockchip-io: add io selectors and supplies for RV1108
2017-09-01dma-coherent: remove the DMA_MEMORY_MAP and DMA_MEMORY_IO flagsChristoph Hellwig
DMA_MEMORY_IO was never used in the tree, so remove it. That means there is no need for the DMA_MEMORY_MAP flag either now, so remove it as well and change dma_declare_coherent_memory to return a normal errno value. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com>
2017-09-01dma-coherent: remove the DMA_MEMORY_INCLUDES_CHILDREN flagChristoph Hellwig
This flag was never implemented or used. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2017-08-31driver core: bus: Fix a potential double freeChristophe JAILLET
The .release function of driver_ktype is 'driver_release()'. This function frees the container_of this kobject. So, this memory must not be freed explicitly in the error handling path of 'bus_add_driver()'. Otherwise a double free will occur. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28Do not disable driver and bus shutdown hook when class shutdown hook is set.Michal Suchanek
As seen from the implementation of the single class shutdown hook this is not very sound design. Rename the class shutdown hook to shutdown_pre to make it clear it runs before the driver shutdown hook. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28base: topology: constify attribute_group structures.Arvind Yadav
attribute_group are not supposed to change at runtime. All functions working with attribute_group provided by <linux/sysfs.h> work with const attribute_group. So mark the non-const structs as const. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28base: Convert to using %pOF instead of full_nameRob Herring
Now that we have a custom printf format specifier, convert users of full_name to use %pOF instead. This is preparation to remove storing of the full path string for each node. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-25PM / Domains: Convert to using %pOF instead of full_nameRob Herring
Now that we have a custom printf format specifier, convert users of full_name to use %pOF instead. This is preparation to remove storing of the full path string for each node. Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-14Merge 4.13-rc5 into driver-core-nextGreg Kroah-Hartman
We want the fixes in here as well for testing. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-13Merge tag 'driver-core-4.13-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are three firmware core fixes for 4.13-rc5. All three of these fix reported issues and have been floating around for a few weeks. They have been in linux-next with no reported problems" * tag 'driver-core-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: firmware: avoid invalid fallback aborts by using killable wait firmware: fix batched requests - send wake up on failure on direct lookups firmware: fix batched requests - wake all waiters
2017-08-11PM / s2idle: Rename freeze_state enum and related itemsRafael J. Wysocki
Rename the freeze_state enum representing the suspend-to-idle state machine states to s2idle_states and rename the related variables and functions accordingly. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-10firmware: enable a debug print for batched requestsLuis R. Rodriguez
Otherwise there is no easy way this actually happened. Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-10firmware: define pr_fmtLuis R. Rodriguez
For some reason we have always forgotten this. Without this we don't get a nice prefix on our pr_debug() / pr_*() messages. Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-10firmware: send -EINTR on signal abort on fallback mechanismLuis R. Rodriguez
Right now we send -EAGAIN to a syfs write which got interrupted. Userspace can't tell what happened though, send -EINTR if we were killed due to a signal so userspace can tell things apart. This is only applicable to the fallback mechanism. Reported-by: Martin Fuzzey <mfuzzey@parkeon.com> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-10firmware: avoid invalid fallback aborts by using killable waitLuis R. Rodriguez
Commit 0cb64249ca500 ("firmware_loader: abort request if wait_for_completion is interrupted") added via 4.0 added support to abort the fallback mechanism when a signal was detected and wait_for_completion_interruptible() returned -ERESTARTSYS -- for instance when a user hits CTRL-C. The abort was overly *too* effective. When a child process terminates (successful or not) the signal SIGCHLD can be sent to the parent process which ran the child in the background and later triggered a sync request for firmware through a sysfs interface which relies on the fallback mechanism. This signal in turn can be recieved by the interruptible wait we constructed on firmware_class and detects it as an abort *before* userspace could get a chance to write the firmware. Upon failure -EAGAIN is returned, so userspace is also kept in the dark about exactly what happened. We can reproduce the issue with the fw_fallback.sh selftest: Before this patch: $ sudo tools/testing/selftests/firmware/fw_fallback.sh ... tools/testing/selftests/firmware/fw_fallback.sh: error - sync firmware request cancelled due to SIGCHLD After this patch: $ sudo tools/testing/selftests/firmware/fw_fallback.sh ... tools/testing/selftests/firmware/fw_fallback.sh: SIGCHLD on sync ignored as expected Fix this by making the wait killable -- only killable by SIGKILL (kill -9). We loose the ability to allow userspace to cancel a write with CTRL-C (SIGINT), however its been decided the compromise to require SIGKILL is worth the gains. Chances of this issue occuring are low due to the number of drivers upstream exclusively relying on the fallback mechanism for firmware (2 drivers), however this is observed in the field with custom drivers with sysfs triggers to load firmware. Only distributions relying on the fallback mechanism are impacted as well. An example reported issue was on Android, as follows: 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally unrelated to firmware loading, it could be sleep 2; for all we care ] 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which ends up calling request_firmware() kernel side 3) The firmware loading fallback mechanism is used, the request is sent to userspace and pid 1 waits in the kernel on wait_* 4) before firmware loading completes pid 42 dies (for any reason, even normal termination) 5) Kernel delivers SIGCHLD to pid=1 to tell it a child has died, which causes -ERESTARTSYS to be returned from wait_* 6) The kernel's wait aborts and return -EAGAIN for the request_firmware() caller. Cc: stable <stable@vger.kernel.org> # 4.0 Fixes: 0cb64249ca500 ("firmware_loader: abort request if wait_for_completion is interrupted") Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com> Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Tested-by: Martin Fuzzey <mfuzzey@parkeon.com> Reported-by: Martin Fuzzey <mfuzzey@parkeon.com> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-10firmware: fix batched requests - send wake up on failure on direct lookupsLuis R. Rodriguez
Fix batched requests from waiting forever on failure. The firmware API batched requests feature has been broken since the API call request_firmware_direct() was introduced on commit bba3a87e982ad ("firmware: Introduce request_firmware_direct()"), added on v3.14 *iff* the firmware being requested was not present in *certain kernel builds* [0]. When no firmware is found the worker which goes on to finish never informs waiters queued up of this, so any batched request will stall in what seems to be forever (MAX_SCHEDULE_TIMEOUT). Sadly, a reboot will also stall, as the reboot notifier was only designed to kill custom fallback workers. The issue seems to the user as a type of soft lockup, what *actually* happens underneath the hood is a wait call which never completes as we failed to issue a completion on error. For device drivers with optional firmware schemes (ie, Intel iwlwifi, or Netronome -- even though it uses request_firmware() and not request_firmware_direct()), this could mean that when you boot a system with multiple cards the firmware will seem to never load on the system, or that the card is just not responsive even the driver initialization. Due to differences in scheduling possible this should not always trigger -- one would need to to ensure that multiple requests are in place at the right time for this to work, also release_firmware() must not be called prior to any other incoming request. The complexity may not be worth supporting batched requests in the future given the wait mechanism is only used also for the fallback mechanism. We'll keep it for now and just fix it. Its reported that at least with the Intel WiFi cards on one system this issue was creeping up 50% of the boots [0]. Before this commit batched requests testing revealed: ============================================================================ CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n CONFIG_FW_LOADER_USER_HELPER=y Most common Linux distribution setup. API-type no-firmware-found firmware-found ---------------------------------------------------------------------- request_firmware() FAIL OK request_firmware_direct() FAIL OK request_firmware_nowait(uevent=true) FAIL OK request_firmware_nowait(uevent=false) FAIL OK ============================================================================ CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n CONFIG_FW_LOADER_USER_HELPER=n Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare. API-type no-firmware-found firmware-found ---------------------------------------------------------------------- request_firmware() FAIL OK request_firmware_direct() FAIL OK request_firmware_nowait(uevent=true) FAIL OK request_firmware_nowait(uevent=false) FAIL OK ============================================================================ CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y CONFIG_FW_LOADER_USER_HELPER=y Google Android setup. API-type no-firmware-found firmware-found ---------------------------------------------------------------------- request_firmware() OK OK request_firmware_direct() FAIL OK request_firmware_nowait(uevent=true) OK OK request_firmware_nowait(uevent=false) OK OK ============================================================================ Ater this commit batched testing results: ============================================================================ CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n CONFIG_FW_LOADER_USER_HELPER=y Most common Linux distribution setup. API-type no-firmware-found firmware-found ---------------------------------------------------------------------- request_firmware() OK OK request_firmware_direct() OK OK request_firmware_nowait(uevent=true) OK OK request_firmware_nowait(uevent=false) OK OK ============================================================================ CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n CONFIG_FW_LOADER_USER_HELPER=n Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare. API-type no-firmware-found firmware-found ---------------------------------------------------------------------- request_firmware() OK OK request_firmware_direct() OK OK request_firmware_nowait(uevent=true) OK OK request_firmware_nowait(uevent=false) OK OK ============================================================================ CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y CONFIG_FW_LOADER_USER_HELPER=y Google Android setup. API-type no-firmware-found firmware-found ---------------------------------------------------------------------- request_firmware() OK OK request_firmware_direct() OK OK request_firmware_nowait(uevent=true) OK OK request_firmware_nowait(uevent=false) OK OK ============================================================================ [0] https://bugzilla.kernel.org/show_bug.cgi?id=195477 Cc: stable <stable@vger.kernel.org> # v3.14 Fixes: bba3a87e982ad ("firmware: Introduce request_firmware_direct()" Reported-by: Nicolas <nbroeking@me.com> Reported-by: John Ewalt <jewalt@lgsinnovations.com> Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-10firmware: fix batched requests - wake all waitersLuis R. Rodriguez
The firmware cache mechanism serves two purposes, the secondary purpose is not well documented nor understood. This fixes a regression with the secondary purpose of the firmware cache mechanism: batched requests on successful lookups. Without this fix *any* time a batched request is triggered, secondary requests for which the batched request mechanism was designed for will seem to last forver and seem to never return. This issue is present for all kernel builds possible, and a hard reset is required. The firmware cache is used for: 1) Addressing races with file lookups during the suspend/resume cycle by keeping firmware in memory during the suspend/resume cycle 2) Batched requests for the same file rely only on work from the first file lookup, which keeps the firmware in memory until the last release_firmware() is called Batched requests *only* take effect if secondary requests come in prior to the first user calling release_firmware(). The devres name used for the internal firmware cache is used as a hint other pending requests are ongoing, the firmware buffer data is kept in memory until the last user of the buffer calls release_firmware(), therefore serializing requests and delaying the release until all requests are done. Batched requests wait for a wakup or signal so we can rely on the first file fetch to write to the pending secondary requests. Commit 5b029624948d ("firmware: do not use fw_lock for fw_state protection") ported the firmware API to use swait, and in doing so failed to convert complete_all() to swake_up_all() -- it used swake_up(), loosing the ability for *some* batched requests to take effect. We *could* fix this by just using swake_up_all() *but* swait is now known to be very special use case, so its best to just move away from it. So we just go back to using completions as before commit 5b029624948d ("firmware: do not use fw_lock for fw_state protection") given this was using complete_all(). Without this fix it has been reported plugging in two Intel 6260 Wifi cards on a system will end up enumerating the two devices only 50% of the time [0]. The ported swake_up() should have actually handled the case with two devices, however, *if more than two cards are used* the swake_up() would not have sufficed. This change is only part of the required fixes for batched requests. Another fix is provided in the next patch. This particular change should fix the cases where more than three requests with the same firmware name is used, otherwise batched requests will wait for MAX_SCHEDULE_TIMEOUT and just timeout eventually. Below is a summary of tests triggering batched requests on different kernel builds. Before this patch: ============================================================================ CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n CONFIG_FW_LOADER_USER_HELPER=y Most common Linux distribution setup. API-type no-firmware-found firmware-found ---------------------------------------------------------------------- request_firmware() FAIL FAIL request_firmware_direct() FAIL FAIL request_firmware_nowait(uevent=true) FAIL FAIL request_firmware_nowait(uevent=false) FAIL FAIL ============================================================================ CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n CONFIG_FW_LOADER_USER_HELPER=n Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare. API-type no-firmware-found firmware-found ---------------------------------------------------------------------- request_firmware() FAIL FAIL request_firmware_direct() FAIL FAIL request_firmware_nowait(uevent=true) FAIL FAIL request_firmware_nowait(uevent=false) FAIL FAIL ============================================================================ CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y CONFIG_FW_LOADER_USER_HELPER=y Google Android setup. API-type no-firmware-found firmware-found ---------------------------------------------------------------------- request_firmware() FAIL FAIL request_firmware_direct() FAIL FAIL request_firmware_nowait(uevent=true) FAIL FAIL request_firmware_nowait(uevent=false) FAIL FAIL ============================================================================ After this patch: ============================================================================ CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n CONFIG_FW_LOADER_USER_HELPER=y Most common Linux distribution setup. API-type no-firmware-found firmware-found ---------------------------------------------------------------------- request_firmware() FAIL OK request_firmware_direct() FAIL OK request_firmware_nowait(uevent=true) FAIL OK request_firmware_nowait(uevent=false) FAIL OK ============================================================================ CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n CONFIG_FW_LOADER_USER_HELPER=n Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare. API-type no-firmware-found firmware-found ---------------------------------------------------------------------- request_firmware() FAIL OK request_firmware_direct() FAIL OK request_firmware_nowait(uevent=true) FAIL OK request_firmware_nowait(uevent=false) FAIL OK ============================================================================ CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y CONFIG_FW_LOADER_USER_HELPER=y Google Android setup. API-type no-firmware-found firmware-found ---------------------------------------------------------------------- request_firmware() OK OK request_firmware_direct() FAIL OK request_firmware_nowait(uevent=true) OK OK request_firmware_nowait(uevent=false) OK OK ============================================================================ [0] https://bugzilla.kernel.org/show_bug.cgi?id=195477 CC: <stable@vger.kernel.org> [4.10+] Cc: Ming Lei <ming.lei@redhat.com> Fixes: 5b029624948d ("firmware: do not use fw_lock for fw_state protection") Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-08PM / wakeup: Set power.can_wakeup if wakeup_sysfs_add() failsRafael J. Wysocki
Currently, an error from wakeup_sysfs_add() in device_set_wakeup_capable() causes the device's power.can_wakeup flag to remain unset even though the device technically is capable of signaling wakeup. If wakeup_sysfs_add() fails user space may not be able to enable the device to wake up the system from sleep states, but at least for some devices that does not matter. For this reason, set or clear power.can_wakeup upfront and if wakeup_sysfs_add() returns an error, print a message to the log. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-03initcall_debug: add deferred probe timesTodd Poynor
initcall_debug attributes all deferred device probe retries for the late_initcall level to function deferred_probe_initcall. Add logs of the individual device probe routines called, to identify which drivers are executing for how long during the initcall path. Deferred probes that occur after initcall processing are not shown. Example log messages added: [ 0.505119] deferred probe my-sound-device @ 6 [ 0.517656] deferred probe my-sound-device returned after 1227 usecs Signed-off-by: Todd Poynor <toddpoynor@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-01PM / OPP: Fix get sharing CPUs when hotplug is usedWaldemar Rymarkiewicz
We fail dev_pm_opp_of_get_sharing_cpus() when possible CPU device does not exist. This can happen on platforms where not all possible CPUs are available at start up ie. hotplugged out. The CPU device is not registered in the system so we are not able to check struct device to set the sharing CPUs bitmask properly. Example (real use case): 2 physical MIPS cores, 4 VPE, cpu0/2 run Linux and cpu1/3 are not available for Linux at boot up. cpufreq-dt driver + OPP v2 fail to register opp_table due to the fact there is no struct device for cpu1 (remains offline at bootup). To solve the bug, stop using device struct to check device_node. Instead get CPU device_node directly from device tree with of_get_cpu_node(). Signed-off-by: Waldemar Rymarkiewicz <waldemarx.rymarkiewicz@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-25Merge tag 'dma-mapping-4.13-2' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma mapping fixes from Christoph Hellwig: "split the global dma coherent pool from the per-device pool. This fixes a regression in the earlier 4.13 pull requests where the global pool would override a per-device CMA pool (Vladimir Murzin)" * tag 'dma-mapping-4.13-2' of git://git.infradead.org/users/hch/dma-mapping: ARM: NOMMU: Wire-up default DMA interface dma-coherent: introduce interface for default DMA pool
2017-07-24PM / sleep: Check pm_wakeup_pending() in __device_suspend_noirq()Rafael J. Wysocki
Restore the pm_wakeup_pending() check in __device_suspend_noirq() removed by commit eed4d47efe95 (ACPI / sleep: Ignore spurious SCI wakeups from suspend-to-idle) as that allows the function to return earlier if there's a wakeup event pending already (so that it may spend less time on carrying out operations that will be reversed shortly anyway) and rework the main suspend-to-idle loop to take that optimization into account. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-24PM / core: Add error argument to dpm_show_time()Rafael J. Wysocki
Make the core device suspend/resume code also call dpm_show_time() on failures and add an error argument to this function so that the message printed by it can reflect the success or failure condition. This makes the debug messages in question look less confusing in the failing cases. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-24PM / core: Split dpm_suspend_noirq() and dpm_resume_noirq()Rafael J. Wysocki
Put the device interrupts disabling and enabling as well as cpuidle_pause() and cpuidle_resume() called during the "noirq" stages of system suspend into separate functions to allow the core suspend-to-idle code to be optimized (later). The only functional difference this makes is that debug facilities and diagnostic tools will not include the above operations into the "noirq" device suspend/resume duration measurements. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-24PM / Domains: Extend generic power domain debugfsThara Gopinath
This patch extends the existing generic power domain debugfs. Changes involve the following - Introduce a unique debugfs entry for each generic power domain with the following attributes - current_state - Displays current state of the domain. - devices - Displays the devices associated with this domain. - sub_domains - Displays the sub power domains. - active_time - Displays the time the domain was in active state in ms. - total_idle_time - Displays the time the domain was in any of the idle states in ms. - idle_states - Displays the various idle states and the time spent in each idle state in ms. Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>