summaryrefslogtreecommitdiff
path: root/Documentation
AgeCommit message (Collapse)Author
2017-09-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "The iwlwifi firmware compat fix is in here as well as some other stuff: 1) Fix request socket leak introduced by BPF deadlock fix, from Eric Dumazet. 2) Fix VLAN handling with TXQs in mac80211, from Johannes Berg. 3) Missing __qdisc_drop conversions in prio and qfq schedulers, from Gao Feng. 4) Use after free in netlink nlk groups handling, from Xin Long. 5) Handle MTU update properly in ipv6 gre tunnels, from Xin Long. 6) Fix leak of ipv6 fib tables on netns teardown, from Sabrina Dubroca with follow-on fix from Eric Dumazet. 7) Need RCU and preemption disabled during generic XDP data patch, from John Fastabend" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits) bpf: make error reporting in bpf_warn_invalid_xdp_action more clear Revert "mdio_bus: Remove unneeded gpiod NULL check" bpf: devmap, use cond_resched instead of cpu_relax bpf: add support for sockmap detach programs net: rcu lock and preempt disable missing around generic xdp bpf: don't select potentially stale ri->map from buggy xdp progs net: tulip: Constify tulip_tbl net: ethernet: ti: netcp_core: no need in netif_napi_del davicom: Display proper debug level up to 6 net: phy: sfp: rename dt properties to match the binding dt-binding: net: sfp binding documentation dt-bindings: add SFF vendor prefix dt-bindings: net: don't confuse with generic PHY property ip6_tunnel: fix setting hop_limit value for ipv6 tunnel ip_tunnel: fix setting ttl and tos value in collect_md mode ipv6: fix typo in fib6_net_exit() tcp: fix a request socket leak sctp: fix missing wake ups in some situations netfilter: xt_hashlimit: fix build error caused by 64bit division netfilter: xt_hashlimit: alloc hashtable with right size ...
2017-09-09Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: - most of the rest of MM - a small number of misc things - lib/ updates - checkpatch - autofs updates - ipc/ updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (126 commits) ipc: optimize semget/shmget/msgget for lots of keys ipc/sem: play nicer with large nsops allocations ipc/sem: drop sem_checkid helper ipc: convert kern_ipc_perm.refcount from atomic_t to refcount_t ipc: convert sem_undo_list.refcnt from atomic_t to refcount_t ipc: convert ipc_namespace.count from atomic_t to refcount_t kcov: support compat processes sh: defconfig: cleanup from old Kconfig options mn10300: defconfig: cleanup from old Kconfig options m32r: defconfig: cleanup from old Kconfig options drivers/pps: use surrounding "if PPS" to remove numerous dependency checks drivers/pps: aesthetic tweaks to PPS-related content cpumask: make cpumask_next() out-of-line kmod: move #ifdef CONFIG_MODULES wrapper to Makefile kmod: split off umh headers into its own file MAINTAINERS: clarify kmod is just a kernel module loader kmod: split out umh code into its own file test_kmod: flip INT checks to be consistent test_kmod: remove paranoid UINT_MAX check on uint range processing vfat: deduplicate hex2bin() ...
2017-09-09watchdog: Revert "iTCO_wdt: all versions count down twice"Wim Van Sebroeck
This reverts commit 1fccb73011ea8a5fa0c6d357c33fa29c695139ea. Reported as Bug 196509 - iTCO_wdt regression reboot before timeout expire Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2017-09-08dt-binding: net: sfp binding documentationBaruch Siach
Add device-tree binding documentation SFP transceivers. Support for SFP transceivers has been recently introduced (drivers/net/phy/sfp.c). Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08dt-bindings: add SFF vendor prefixBaruch Siach
Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08dt-bindings: net: don't confuse with generic PHY propertyBaruch Siach
This complements commit 9a94b3a4bd (dt-binding: phy: don't confuse with Ethernet phy properties). The generic PHY 'phys' property sometime appears in the same node with the Ethernet PHY 'phy' or 'phy-handle' properties. Add a warning in ethernet.txt to reduce confusion. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08drivers/pps: aesthetic tweaks to PPS-related contentRobert P. J. Day
Collection of aesthetic adjustments to various PPS-related files, directories and Documentation, some quite minor just for the sake of consistency, including: * Updated example of pps device tree node (courtesy Rodolfo G.) * "PPS-API" -> "PPS API" * "pps_source_info_s" -> "pps_source_info" * "ktimer driver" -> "pps-ktimer driver" * "ppstest /dev/pps0" -> "ppstest /dev/pps1" to match example * Add missing PPS-related entries to MAINTAINERS file * Other trivialities Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1708261048220.8106@localhost.localdomain Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Acked-by: Rodolfo Giometti <giometti@enneenne.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08rbtree: cache leftmost node internallyDavidlohr Bueso
Patch series "rbtree: Cache leftmost node internally", v4. A series to extending rbtrees to internally cache the leftmost node such that we can have fast overlap check optimization for all interval tree users[1]. The benefits of this series are that: (i) Unify users that do internal leftmost node caching. (ii) Optimize all interval tree users. (iii) Convert at least two new users (epoll and procfs) to the new interface. This patch (of 16): Red-black tree semantics imply that nodes with smaller or greater (or equal for duplicates) keys always be to the left and right, respectively. For the kernel this is extremely evident when considering our rb_first() semantics. Enabling lookups for the smallest node in the tree in O(1) can save a good chunk of cycles in not having to walk down the tree each time. To this end there are a few core users that explicitly do this, such as the scheduler and rtmutexes. There is also the desire for interval trees to have this optimization allowing faster overlap checking. This patch introduces a new 'struct rb_root_cached' which is just the root with a cached pointer to the leftmost node. The reason why the regular rb_root was not extended instead of adding a new structure was that this allows the user to have the choice between memory footprint and actual tree performance. The new wrappers on top of the regular rb_root calls are: - rb_first_cached(cached_root) -- which is a fast replacement for rb_first. - rb_insert_color_cached(node, cached_root, new) - rb_erase_cached(node, cached_root) In addition, augmented cached interfaces are also added for basic insertion and deletion operations; which becomes important for the interval tree changes. With the exception of the inserts, which adds a bool for updating the new leftmost, the interfaces are kept the same. To this end, porting rb users to the cached version becomes really trivial, and keeping current rbtree semantics for users that don't care about the optimization requires zero overhead. Link: http://lkml.kernel.org/r/20170719014603.19029-2-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08hmm: heterogeneous memory management documentationJérôme Glisse
Patch series "HMM (Heterogeneous Memory Management)", v25. Heterogeneous Memory Management (HMM) (description and justification) Today device driver expose dedicated memory allocation API through their device file, often relying on a combination of IOCTL and mmap calls. The device can only access and use memory allocated through this API. This effectively split the program address space into object allocated for the device and useable by the device and other regular memory (malloc, mmap of a file, share memory, â) only accessible by CPU (or in a very limited way by a device by pinning memory). Allowing different isolated component of a program to use a device thus require duplication of the input data structure using device memory allocator. This is reasonable for simple data structure (array, grid, image, â) but this get extremely complex with advance data structure (list, tree, graph, â) that rely on a web of memory pointers. This is becoming a serious limitation on the kind of work load that can be offloaded to device like GPU. New industry standard like C++, OpenCL or CUDA are pushing to remove this barrier. This require a shared address space between GPU device and CPU so that GPU can access any memory of a process (while still obeying memory protection like read only). This kind of feature is also appearing in various other operating systems. HMM is a set of helpers to facilitate several aspects of address space sharing and device memory management. Unlike existing sharing mechanism that rely on pining pages use by a device, HMM relies on mmu_notifier to propagate CPU page table update to device page table. Duplicating CPU page table is only one aspect necessary for efficiently using device like GPU. GPU local memory have bandwidth in the TeraBytes/ second range but they are connected to main memory through a system bus like PCIE that is limited to 32GigaBytes/second (PCIE 4.0 16x). Thus it is necessary to allow migration of process memory from main system memory to device memory. Issue is that on platform that only have PCIE the device memory is not accessible by the CPU with the same properties as main memory (cache coherency, atomic operations, ...). To allow migration from main memory to device memory HMM provides a set of helper to hotplug device memory as a new type of ZONE_DEVICE memory which is un-addressable by CPU but still has struct page representing it. This allow most of the core kernel logic that deals with a process memory to stay oblivious of the peculiarity of device memory. When page backing an address of a process is migrated to device memory the CPU page table entry is set to a new specific swap entry. CPU access to such address triggers a migration back to system memory, just like if the page was swap on disk. HMM also blocks any one from pinning a ZONE_DEVICE page so that it can always be migrated back to system memory if CPU access it. Conversely HMM does not migrate to device memory any page that is pin in system memory. To allow efficient migration between device memory and main memory a new migrate_vma() helpers is added with this patchset. It allows to leverage device DMA engine to perform the copy operation. This feature will be use by upstream driver like nouveau mlx5 and probably other in the future (amdgpu is next suspect in line). We are actively working on nouveau and mlx5 support. To test this patchset we also worked with NVidia close source driver team, they have more resources than us to test this kind of infrastructure and also a bigger and better userspace eco-system with various real industry workload they can be use to test and profile HMM. The expected workload is a program builds a data set on the CPU (from disk, from network, from sensors, â). Program uses GPU API (OpenCL, CUDA, ...) to give hint on memory placement for the input data and also for the output buffer. Program call GPU API to schedule a GPU job, this happens using device driver specific ioctl. All this is hidden from programmer point of view in case of C++ compiler that transparently offload some part of a program to GPU. Program can keep doing other stuff on the CPU while the GPU is crunching numbers. It is expected that CPU will not access the same data set as the GPU while GPU is working on it, but this is not mandatory. In fact we expect some small memory object to be actively access by both GPU and CPU concurrently as synchronization channel and/or for monitoring purposes. Such object will stay in system memory and should not be bottlenecked by system bus bandwidth (rare write and read access from both CPU and GPU). As we are relying on device driver API, HMM does not introduce any new syscall nor does it modify any existing ones. It does not change any POSIX semantics or behaviors. For instance the child after a fork of a process that is using HMM will not be impacted in anyway, nor is there any data hazard between child COW or parent COW of memory that was migrated to device prior to fork. HMM assume a numbers of hardware features. Device must allow device page table to be updated at any time (ie device job must be preemptable). Device page table must provides memory protection such as read only. Device must track write access (dirty bit). Device must have a minimum granularity that match PAGE_SIZE (ie 4k). Reviewer (just hint): Patch 1 HMM documentation Patch 2 introduce core infrastructure and definition of HMM, pretty small patch and easy to review Patch 3 introduce the mirror functionality of HMM, it relies on mmu_notifier and thus someone familiar with that part would be in better position to review Patch 4 is an helper to snapshot CPU page table while synchronizing with concurrent page table update. Understanding mmu_notifier makes review easier. Patch 5 is mostly a wrapper around handle_mm_fault() Patch 6 add new add_pages() helper to avoid modifying each arch memory hot plug function Patch 7 add a new memory type for ZONE_DEVICE and also add all the logic in various core mm to support this new type. Dan Williams and any core mm contributor are best people to review each half of this patchset Patch 8 special case HMM ZONE_DEVICE pages inside put_page() Kirill and Dan Williams are best person to review this Patch 9 allow to uncharge a page from memory group without using the lru list field of struct page (best reviewer: Johannes Weiner or Vladimir Davydov or Michal Hocko) Patch 10 Add support to uncharge ZONE_DEVICE page from a memory cgroup (best reviewer: Johannes Weiner or Vladimir Davydov or Michal Hocko) Patch 11 add helper to hotplug un-addressable device memory as new type of ZONE_DEVICE memory (new type introducted in patch 3 of this serie). This is boiler plate code around memory hotplug and it also pick a free range of physical address for the device memory. Note that the physical address do not point to anything (at least as far as the kernel knows). Patch 12 introduce a new hmm_device class as an helper for device driver that want to expose multiple device memory under a common fake device driver. This is usefull for multi-gpu configuration. Anyone familiar with device driver infrastructure can review this. Boiler plate code really. Patch 13 add a new migrate mode. Any one familiar with page migration is welcome to review. Patch 14 introduce a new migration helper (migrate_vma()) that allow to migrate a range of virtual address of a process using device DMA engine to perform the copy. It is not limited to do copy from and to device but can also do copy between any kind of source and destination memory. Again anyone familiar with migration code should be able to verify the logic. Patch 15 optimize the new migrate_vma() by unmapping pages while we are collecting them. This can be review by any mm folks. Patch 16 add unaddressable memory migration to helper introduced in patch 7, this can be review by anyone familiar with migration code Patch 17 add a feature that allow device to allocate non-present page on the GPU when migrating a range of address to device memory. This is an helper for device driver to avoid having to first allocate system memory before migration to device memory Patch 18 add a new kind of ZONE_DEVICE memory for cache coherent device memory (CDM) Patch 19 add an helper to hotplug CDM memory Previous patchset posting : v1 http://lwn.net/Articles/597289/ v2 https://lkml.org/lkml/2014/6/12/559 v3 https://lkml.org/lkml/2014/6/13/633 v4 https://lkml.org/lkml/2014/8/29/423 v5 https://lkml.org/lkml/2014/11/3/759 v6 http://lwn.net/Articles/619737/ v7 http://lwn.net/Articles/627316/ v8 https://lwn.net/Articles/645515/ v9 https://lwn.net/Articles/651553/ v10 https://lwn.net/Articles/654430/ v11 http://www.gossamer-threads.com/lists/linux/kernel/2286424 v12 http://www.kernelhub.org/?msg=972982&p=2 v13 https://lwn.net/Articles/706856/ v14 https://lkml.org/lkml/2016/12/8/344 v15 http://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg1304107.html v16 http://www.spinics.net/lists/linux-mm/msg119814.html v17 https://lkml.org/lkml/2017/1/27/847 v18 https://lkml.org/lkml/2017/3/16/596 v19 https://lkml.org/lkml/2017/4/5/831 v20 https://lwn.net/Articles/720715/ v21 https://lkml.org/lkml/2017/4/24/747 v22 http://lkml.iu.edu/hypermail/linux/kernel/1705.2/05176.html v23 https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1404788.html v24 https://lwn.net/Articles/726691/ This patch (of 19): This adds documentation for HMM (Heterogeneous Memory Management). It presents the motivation behind it, the features necessary for it to be useful and and gives an overview of how this is implemented. Link: http://lkml.kernel.org/r/20170817000548.32038-2-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Nellans <dnellans@nvidia.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Evgeny Baskakov <ebaskakov@nvidia.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mark Hairgrove <mhairgrove@nvidia.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Sherry Cheung <SCheung@nvidia.com> Cc: Subhash Gutti <sgutti@nvidia.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Bob Liu <liubo95@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08Merge tag 'platform-drivers-x86-v4.14-1' of ↵Linus Torvalds
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform driver updates from Darren Hart: "Several fixes from static analysis and message noise reduction. Correct WMI core and related drivers to evaluate instance number 0x0 in accordance with the documentation. Add intel-telemetry support for Gemini Lake. Various individual driver fixes noted below. dell-wmi: - Update dell_wmi_check_descriptor_buffer() to new model intel-vbtn: - reduce unnecessary messages for normal users - match power button on press rather than release intel-hid: - reduce unnecessary messages for normal users thinkpad_acpi: - Fix warning about deprecated hwmon_device_register wmi: - Fix check for method instance number ideapad-laptop: - Expose conservation mode switch intel_pmc_core: - Make the driver PCH family agnostic peaq-wmi: - Evaluate wmi method with instance number 0x0 - silence a static checker warning mxm-wmi: - Evaluate wmi method with instance number 0x0 asus-wmi: - Evaluate wmi method with instance number 0x0 intel_scu_ipc: - make intel_scu_ipc_pdata_t const intel_mid_powerbtn: - make mid_pb_ddata const - fix error return code in mid_pb_probe() hp-wmi: - Remove unused macro helper - Correctly determine method id in WMI calls dell-wmi: - Fix driver interface version query intel_telemetry: - remove redundant macro definition - Add GLK PSS Event Table alienware-wmi: - fix format string overflow warning ibm_rtl: - remove unnecessary static in ibm_rtl_write() msi-wmi: - remove unnecessary static in msi_wmi_notify()" * tag 'platform-drivers-x86-v4.14-1' of git://git.infradead.org/linux-platform-drivers-x86: (23 commits) platform/x86: dell-wmi: Update dell_wmi_check_descriptor_buffer() to new model platform/x86: intel-vbtn: reduce unnecessary messages for normal users platform/x86: intel-hid: reduce unnecessary messages for normal users platform/x86: thinkpad_acpi: Fix warning about deprecated hwmon_device_register platform/x86: wmi: Fix check for method instance number platform/x86: ideapad-laptop: Expose conservation mode switch platform/x86: intel_pmc_core: Make the driver PCH family agnostic platform/x86: peaq-wmi: Evaluate wmi method with instance number 0x0 platform/x86: mxm-wmi: Evaluate wmi method with instance number 0x0 platform/x86: asus-wmi: Evaluate wmi method with instance number 0x0 platform/x86: intel_scu_ipc: make intel_scu_ipc_pdata_t const platform/x86: intel_mid_powerbtn: make mid_pb_ddata const platform/x86: intel_mid_powerbtn: fix error return code in mid_pb_probe() platform/x86: hp-wmi: Remove unused macro helper platform/x86: hp-wmi: Correctly determine method id in WMI calls platform/x86: intel-vbtn: match power button on press rather than release platform/x86: dell-wmi: Fix driver interface version query platform/x86: intel_telemetry: remove redundant macro definition platform/x86: intel_telemetry: Add GLK PSS Event Table platform/x86: alienware-wmi: fix format string overflow warning ...
2017-09-08Merge tag 'arc-4.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC updates from Vineet Gupta: - Support for HSDK board hosting a Quad core HS38x4 based SoC running @1GHz (and some prerrquisite changes such as ability to scoot the kernel code/data from start of memory map etc) - Quite a few updates for EZChip (Mellanox) platform - Fixes to fault/exception printing * tag 'arc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (26 commits) ARC: Re-enable MMU upon Machine Check exception ARC: Show fault information passed to show_kernel_fault_diag() ARC: [plat-hsdk] initial port for HSDK board ARC: mm: Decouple RAM base address from kernel link address ARCv2: IOC: Tighten up the contraints (specifically base / size alignment) ARC: [plat-axs103] refactor the DT fudging code ARC: [plat-axs103] use clk driver #2: Add core pll node to DT to manage cpu clk ARC: [plat-axs103] use clk driver #1: Get rid of platform specific cpu clk setting ARCv2: SLC: provide a line based flush routine for debugging ARC: Hardcode ARCH_DMA_MINALIGN to max line length we may have ARC: [plat-eznps] handle extra aux regs #2: kernel/entry exit ARC: [plat-eznps] handle extra aux regs #1: save/restore on context switch ARC: [plat-eznps] avoid toggling of DPC register ARC: [plat-eznps] Update the init sequence of aux regs per cpu. ARC: [plat-eznps] new command line argument for HW scheduler at MTM ARC: set boot print log level to PR_INFO ARC: [plat-eznps] Handle user memory error same in simulation and silicon ARC: [plat-eznps] use schd.wft instruction instead of sleep at idle task ARC: create cpu specific version of arch_cpu_idle() ARC: [plat-eznps] spinlock aware for MTM ...
2017-09-08Merge tag 'pci-v4.14-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - add enhanced Downstream Port Containment support, which prints more details about Root Port Programmed I/O errors (Dongdong Liu) - add Layerscape ls1088a and ls2088a support (Hou Zhiqiang) - add MediaTek MT2712 and MT7622 support (Ryder Lee) - add MediaTek MT2712 and MT7622 MSI support (Honghui Zhang) - add Qualcom IPQ8074 support (Varadarajan Narayanan) - add R-Car r8a7743/5 device tree support (Biju Das) - add Rockchip per-lane PHY support for better power management (Shawn Lin) - fix IRQ mapping for hot-added devices by replacing the pci_fixup_irqs() boot-time design with a host bridge hook called at probe-time (Lorenzo Pieralisi, Matthew Minter) - fix race when enabling two devices that results in upstream bridge not being enabled correctly (Srinath Mannam) - fix pciehp power fault infinite loop (Keith Busch) - fix SHPC bridge MSI hotplug events by enabling bus mastering (Aleksandr Bezzubikov) - fix a VFIO issue by correcting PCIe capability sizes (Alex Williamson) - fix an INTD issue on Xilinx and possibly other drivers by unifying INTx IRQ domain support (Paul Burton) - avoid IOMMU stalls by marking AMD Stoney GPU ATS as broken (Joerg Roedel) - allow APM X-Gene device assignment to guests by adding an ACS quirk (Feng Kan) - fix driver crashes by disabling Extended Tags on Broadcom HT2100 (Extended Tags support is required for PCIe Receivers but not Requesters, and we now enable them by default when Requesters support them) (Sinan Kaya) - fix MSIs for devices that use phantom RIDs for DMA by assuming MSIs use the real Requester ID (not a phantom RID) (Robin Murphy) - prevent assignment of Intel VMD children to guests (which may be supported eventually, but isn't yet) by not associating an IOMMU with them (Jon Derrick) - fix Intel VMD suspend/resume by releasing IRQs on suspend (Scott Bauer) - fix a Function-Level Reset issue with Intel 750 NVMe by waiting longer (up to 60sec instead of 1sec) for device to become ready (Sinan Kaya) - fix a Function-Level Reset issue on iProc Stingray by working around hardware defects in the CRS implementation (Oza Pawandeep) - fix an issue with Intel NVMe P3700 after an iProc reset by adding a delay during shutdown (Oza Pawandeep) - fix a Microsoft Hyper-V lockdep issue by polling instead of blocking in compose_msi_msg() (Stephen Hemminger) - fix a wireless LAN driver timeout by clearing DesignWare MSI interrupt status after it is handled, not before (Faiz Abbas) - fix DesignWare ATU enable checking (Jisheng Zhang) - reduce Layerscape dependencies on the bootloader by doing more initialization in the driver (Hou Zhiqiang) - improve Intel VMD performance allowing allocation of more IRQ vectors than present CPUs (Keith Busch) - improve endpoint framework support for initial DMA mask, different BAR sizes, configurable page sizes, MSI, test driver, etc (Kishon Vijay Abraham I, Stan Drozd) - rework CRS support to add periodic messages while we poll during enumeration and after Function-Level Reset and prepare for possible other uses of CRS (Sinan Kaya) - clean up Root Port AER handling by removing unnecessary code and moving error handler methods to struct pcie_port_service_driver (Christoph Hellwig) - clean up error handling paths in various drivers (Bjorn Andersson, Fabio Estevam, Gustavo A. R. Silva, Harunobu Kurokawa, Jeffy Chen, Lorenzo Pieralisi, Sergei Shtylyov) - clean up SR-IOV resource handling by disabling VF decoding before updating the corresponding resource structs (Gavin Shan) - clean up DesignWare-based drivers by unifying quirks to update Class Code and Interrupt Pin and related handling of write-protected registers (Hou Zhiqiang) - clean up by adding empty generic pcibios_align_resource() and pcibios_fixup_bus() and removing empty arch-specific implementations (Palmer Dabbelt) - request exclusive reset control for several drivers to allow cleanup elsewhere (Philipp Zabel) - constify various structures (Arvind Yadav, Bhumika Goyal) - convert from full_name() to %pOF (Rob Herring) - remove unused variables from iProc, HiSi, Altera, Keystone (Shawn Lin) * tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (170 commits) PCI: xgene: Clean up whitespace PCI: xgene: Define XGENE_PCI_EXP_CAP and use generic PCI_EXP_RTCTL offset PCI: xgene: Fix platform_get_irq() error handling PCI: xilinx-nwl: Fix platform_get_irq() error handling PCI: rockchip: Fix platform_get_irq() error handling PCI: altera: Fix platform_get_irq() error handling PCI: spear13xx: Fix platform_get_irq() error handling PCI: artpec6: Fix platform_get_irq() error handling PCI: armada8k: Fix platform_get_irq() error handling PCI: dra7xx: Fix platform_get_irq() error handling PCI: exynos: Fix platform_get_irq() error handling PCI: iproc: Clean up whitespace PCI: iproc: Rename PCI_EXP_CAP to IPROC_PCI_EXP_CAP PCI: iproc: Add 500ms delay during device shutdown PCI: Fix typos and whitespace errors PCI: Remove unused "res" variable from pci_resource_io() PCI: Correct kernel-doc of pci_vpd_srdt_size(), pci_vpd_srdt_tag() PCI/AER: Reformat AER register definitions iommu/vt-d: Prevent VMD child devices from being remapping targets x86/PCI: Use is_vmd() rather than relying on the domain number ...
2017-09-08Merge tag 'kvm-4.14-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Radim Krčmář: "First batch of KVM changes for 4.14 Common: - improve heuristic for boosting preempted spinlocks by ignoring VCPUs in user mode ARM: - fix for decoding external abort types from guests - added support for migrating the active priority of interrupts when running a GICv2 guest on a GICv3 host - minor cleanup PPC: - expose storage keys to userspace - merge kvm-ppc-fixes with a fix that missed 4.13 because of vacations - fixes s390: - merge of kvm/master to avoid conflicts with additional sthyi fixes - wire up the no-dat enhancements in KVM - multiple epoch facility (z14 feature) - Configuration z/Architecture Mode - more sthyi fixes - gdb server range checking fix - small code cleanups x86: - emulate Hyper-V TSC frequency MSRs - add nested INVPCID - emulate EPTP switching VMFUNC - support Virtual GIF - support 5 level page tables - speedup nested VM exits by packing byte operations - speedup MMIO by using hardware provided physical address - a lot of fixes and cleanups, especially nested" * tag 'kvm-4.14-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits) KVM: arm/arm64: Support uaccess of GICC_APRn KVM: arm/arm64: Extract GICv3 max APRn index calculation KVM: arm/arm64: vITS: Drop its_ite->lpi field KVM: arm/arm64: vgic: constify seq_operations and file_operations KVM: arm/arm64: Fix guest external abort matching KVM: PPC: Book3S HV: Fix memory leak in kvm_vm_ioctl_get_htab_fd KVM: s390: vsie: cleanup mcck reinjection KVM: s390: use WARN_ON_ONCE only for checking KVM: s390: guestdbg: fix range check KVM: PPC: Book3S HV: Report storage key support to userspace KVM: PPC: Book3S HV: Fix case where HDEC is treated as 32-bit on POWER9 KVM: PPC: Book3S HV: Fix invalid use of register expression KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validation KVM: PPC: Book3S HV: Fix setting of storage key in H_ENTER KVM: PPC: e500mc: Fix a NULL dereference KVM: PPC: e500: Fix some NULL dereferences on error KVM: PPC: Book3S HV: Protect updates to spapr_tce_tables list KVM: s390: we are always in czam mode KVM: s390: expose no-DAT to guest and migration support KVM: s390: sthyi: remove invalid guest write access ...
2017-09-08kokr/memory-barriers.txt: Apply atomic_t.txt changeSeongJae Park
This commit applies memory-barriers.txt part of upstream change, commit 706eeb3e9c6f ("Documentation/locking/atomic: Add documents for new atomic_t APIs") to Korean translation. Signed-off-by: SeongJae Park <sj38.park@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-09-08kokr/doc: Update memory-barriers.txt for read-to-write dependenciesSeongJae Park
This commit applies upstream change, commit 66ce3a4dcb9f ("doc: Update memory-barriers.txt for read-to-write dependencies") to Korean translation. Signed-off-by: SeongJae Park <sj38.park@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-09-08docs-rst: don't require adjustbox anymoreMauro Carvalho Chehab
Only the media PDF book was requiring adjustbox, in order to scale big tables. That worked pretty good with Sphinx versions 1.4 and 1.5, but Spinx 1.6 changed the way tables are produced, by introducing some weird macros before tabulary. That causes adjustbox to fail. So, it can't be used anymore, and its usage was removed from the media book. So, let's remove it from conf.py and sphinx-pre-install. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-09-08docs-rst: conf.py: only setup notice box colors if Sphinx < 1.6Mauro Carvalho Chehab
Sphinx 1.5 added a new way to change backward colors for note boxes, but kept backward compatibility with 1.4. On Sphinx 1.6, the old way stopped working, in favor of a new less hackish way. Unfortunately, this is currently too buggy to be used, and the old way doesn't work anymore. So, we have no option but to stick with boring notice boxes. One example of such bug is the notice that it is inside struct v4l2_plane, at the "bytesused" field. At least, add a notice about how to use, as maybe some day the bug will vanish. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-09-08docs-rst: conf.py: remove lscape from LaTeX preambleMauro Carvalho Chehab
Only the media book used this extension in the past, but it is not required anymore. Cleanup patch only. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-09-08Merge branch 'kvm-ppc-fixes' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc This fix was intended for 4.13, but didn't get in because both maintainers were on vacation. Paul Mackerras: "It adds mutual exclusion between list_add_rcu and list_del_rcu calls on the kvm->arch.spapr_tce_tables list. Without this, userspace could potentially trigger corruption of the list and cause a host crash or worse."
2017-09-07Merge branch 'gperf-removal'Linus Torvalds
Remove our use of 'gperf' for generating perfect hashes from some of our build tools. This removal was prompted by Masahiro Yamada sending out a patch that removes all our pre-generated files, and when I tested it, I noticed that the gperf version I have (3.1) apparently generates code that no longer works with out code-base because the function interfaces generated by gperf have changed. We really don't care that much, and the gperf people changed their interfaces in ways that makes it annoying to work with them. Tools that make it hard to use them should not be used, and the kernel is not at all interested in some autoconf mess. So remove the gperf dependency entirely. It turns out that if you ignore the pre-generated files, the use of gperf apparently saved us a whopping fifteen lines of code. It obviously wasn't worth it, considering that the pre-generated files are about 500 lines. I sent this out as a patch about three weeks ago, and got absolutely zero responses. So let's see if anybody notices now that I merge it. Because there might be serious bugs here, but it WorksForMe(tm). * gperf-removal: Remove gperf usage from toolchain
2017-09-07Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas, megaraid_sas, zfcp and a host of minor updates. The major driver change here is the elimination of the block based cciss driver in favour of the SCSI based hpsa driver (which now drives all the legacy cases cciss used to be required for). Plus a reset handler clean up and the redo of the SAS SMP handler to use bsg lib" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits) scsi: scsi-mq: Always unprepare before requeuing a request scsi: Show .retries and .jiffies_at_alloc in debugfs scsi: Improve requeuing behavior scsi: Call scsi_initialize_rq() for filesystem requests scsi: qla2xxx: Reset the logo flag, after target re-login. scsi: qla2xxx: Fix slow mem alloc behind lock scsi: qla2xxx: Clear fc4f_nvme flag scsi: qla2xxx: add missing includes for qla_isr scsi: qla2xxx: Fix an integer overflow in sysfs code scsi: aacraid: report -ENOMEM to upper layer from aac_convert_sgraw2() scsi: aacraid: get rid of one level of indentation scsi: aacraid: fix indentation errors scsi: storvsc: fix memory leak on ring buffer busy scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough scsi: smartpqi: remove the smp_handler stub scsi: hpsa: remove the smp_handler stub scsi: bsg-lib: pass the release callback through bsg_setup_queue scsi: Rework handling of scsi_device.vpd_pg8[03] scsi: Rework the code for caching Vital Product Data (VPD) scsi: rcu: Introduce rcu_swap_protected() ...
2017-09-08Merge branches 'mediatek-mt2712', 'rockchip-rk3328' and 'uniphier-thermal' ↵Zhang Rui
into thermal-soc
2017-09-07Merge tag 'devicetree-for-4.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull DeviceTree updates from Rob Herring: "There's a few orphans in the conversion to %pOF printf specifiers included here that no one else picked up. Summary: - Convert more DT code to use of_property_read_* API. - Improve DT overlay support when adding multiple overlays - Convert printk's to %pOF format specifiers. Most went via subsystem trees, but picked up the remaining orphans - Correct unittests to use preferred "okay" for "status" property value - Add a KASLR seed property - Vendor prefixes for Mellanox, Theobroma System, Adaptrum, Moxa - Fix modalias buffer handling - Clean-up of include paths for building dtbs - Add bindings for amc6821, isl1208, tsl2x7x, srf02, and srf10 devices - Add nvmem bindings for MediaTek MT7623 and MT7622 SoC - Add compatible string for Allwinner H5 Mali-450 GPU - Fix links to old OpenFirmware docs with new mirror on devicetree.org - Remove status property from binding doc examples" * tag 'devicetree-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (45 commits) devicetree: Adjust status "ok" -> "okay" under drivers/of/ dt-bindings: Remove "status" from examples dt-bindings: pinctrl: sh-pfc: Use generic node name dt-bindings: Add vendor Mellanox dt-binding: net/phy: fix interrupts description virt: Convert to using %pOF instead of full_name macintosh: Convert to using %pOF instead of full_name ide: pmac: Convert to using %pOF instead of full_name microblaze: Convert to using %pOF instead of full_name dt-bindings: usb: musb: Grammar s/the/to/, s/is/are/ of: Use PLATFORM_DEVID_NONE definition of/device: Fix of_device_get_modalias() buffer handling of/device: Prevent buffer overflow in of_device_modalias() dt-bindings: add amc6821, isl1208 trivial bindings dt-bindings: add vendor prefix for Theobroma Systems of: search scripts/dtc/include-prefixes path for both CPP and DTC of: remove arch/$(SRCARCH)/boot/dts from include search path for CPP of: remove drivers/of/testcase-data from include search path for CPP of: return of_get_cpu_node from of_cpu_device_node_get if CPUs are not registered iio: srf08: add device tree binding for srf02 and srf10 ...
2017-09-07Merge tag 'leds_for_4.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds Pull LED updates from Jacek Anaszewski: "LED class drivers improvements: leds-pca955x: - add Device Tree support and bindings - use devm_led_classdev_register() - add GPIO support - prevent crippled LED class device name - check for I2C errors leds-gpio: - add optional retain-state-shutdown DT property - allow LED to retain state at shutdown leds-tlc591xx: - merge conditional tests - add missing of_node_put leds-powernv: - delete an error message for a failed memory allocation in powernv_led_create() leds-is31fl32xx.c - convert to using custom %pOF printf format specifier Constify attribute_group structures in: - leds-blinkm - leds-lm3533 Make several arrays static const in: - leds-aat1290 - leds-lp5521 - leds-lp5562 - leds-lp8501" * tag 'leds_for_4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: leds: pca955x: check for I2C errors leds: gpio: Allow LED to retain state at shutdown dt-bindings: leds: gpio: Add optional retain-state-shutdown property leds: powernv: Delete an error message for a failed memory allocation in powernv_led_create() leds: lp8501: make several arrays static const leds: lp5562: make several arrays static const leds: lp5521: make several arrays static const leds: aat1290: make array max_mm_current_percent static const leds: pca955x: Prevent crippled LED device name leds: lm3533: constify attribute_group structure dt-bindings: leds: add pca955x leds: pca955x: add GPIO support leds: pca955x: use devm_led_classdev_register leds: pca955x: add device tree support leds: Convert to using %pOF instead of full_name leds: blinkm: constify attribute_group structures. leds: tlc591xx: add missing of_node_put leds: tlc591xx: merge conditional tests
2017-09-07Merge tag 'dmaengine-4.14-rc1' of git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds
Pull dmaengine updates from Vinod Koul: "This one features the usual updates to the drivers and one good part of removing DA_SG from core as it has no users. Summary: - Remove DMA_SG support as we have no users for this feature - New driver for Altera / Intel mSGDMA IP core - Support for memset in dmatest and qcom_hidma driver - Update for non cyclic mode in k3dma, bunch of update in bam_dma, bcm sba-raid - Constify device ids across drivers" * tag 'dmaengine-4.14-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (52 commits) dmaengine: sun6i: support V3s SoC variant dmaengine: sun6i: make gate bit in sun8i's DMA engines a common quirk dmaengine: rcar-dmac: document R8A77970 bindings dmaengine: xilinx_dma: Fix error code format specifier dmaengine: altera: Use macros instead of structs to describe the registers dmaengine: ti-dma-crossbar: Fix dra7 reserve function dmaengine: pl330: constify amba_id dmaengine: pl08x: constify amba_id dmaengine: bcm-sba-raid: Remove redundant SBA_REQUEST_STATE_COMPLETED dmaengine: bcm-sba-raid: Explicitly ACK mailbox message after sending dmaengine: bcm-sba-raid: Add debugfs support dmaengine: bcm-sba-raid: Remove redundant SBA_REQUEST_STATE_RECEIVED dmaengine: bcm-sba-raid: Re-factor sba_process_deferred_requests() dmaengine: bcm-sba-raid: Pre-ack async tx descriptor dmaengine: bcm-sba-raid: Peek mbox when we have no free requests dmaengine: bcm-sba-raid: Alloc resources before registering DMA device dmaengine: bcm-sba-raid: Improve sba_issue_pending() run duration dmaengine: bcm-sba-raid: Increase number of free sba_request dmaengine: bcm-sba-raid: Allow arbitrary number free sba_request dmaengine: bcm-sba-raid: Remove reqs_free_count from sba_device ...
2017-09-07Merge tag 'mfd-next-4.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD updates from Lee Jones: "New Drivers - RK805 Power Management IC (PMIC) - ROHM BD9571MWV-M MFD Power Management IC (PMIC) - Texas Instruments TPS68470 Power Management IC (PMIC) & LEDs New Device Support: - Add support for HiSilicon Hi6421v530 to hi6421-pmic-core - Add support for X-Powers AXP806 to axp20x - Add support for X-Powers AXP813 to axp20x - Add support for Intel Sunrise Point LPSS to intel-lpss-pci New Functionality: - Amend API to provide register layout; atmel-smc Fix-ups: - DT re-work; omap, nokia - Header file location change {I2C => MFD}; dm355evm_msp, tps65010 - Fix chip ID formatting issue(s); rk808 - Optionally register touchscreen devices; da9052-core - Documentation improvements; twl-core - Constification; rtsx_pcr, ab8500-core, da9055-i2c, da9052-spi - Drop unnecessary static declaration; max8925-i2c - Kconfig changes (missing deps and remove module support) - Slim down oversized licence statement; hi6421-pmic-core - Use managed resources (devm_*); lp87565 - Supply proper error checking/handling; t7l66xb Bug Fixes: - Fix counter duplication issue; da9052-core - Fix potential NULL deference issue; max8998 - Leave SPI-NOR write-protection bit alone; lpc_ich - Ensure device is put into reset during suspend; intel-lpss - Correct register offset variable size; omap-usb-tll" * tag 'mfd-next-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (61 commits) mfd: intel_soc_pmic: Differentiate between Bay and Cherry Trail CRC variants mfd: intel_soc_pmic: Export separate mfd-cell configs for BYT and CHT dt-bindings: mfd: Add bindings for ZII RAVE devices mfd: omap-usb-tll: Fix register offsets mfd: da9052: Constify spi_device_id mfd: intel-lpss: Put I2C and SPI controllers into reset state on suspend mfd: da9055: Constify i2c_device_id mfd: intel-lpss: Add missing PCI ID for Intel Sunrise Point LPSS devices mfd: t7l66xb: Handle return value of clk_prepare_enable mfd: Add ROHM BD9571MWV-M PMIC DT bindings mfd: intel_soc_pmic_chtwc: Turn Kconfig option into a bool mfd: lp87565: Convert to use devm_mfd_add_devices() mfd: Add support for TPS68470 device mfd: lpc_ich: Do not touch SPI-NOR write protection bit on Haswell/Broadwell mfd: syscon: atmel-smc: Add helper to retrieve register layout mfd: axp20x: Use correct platform device ID for many PEK dt-bindings: mfd: axp20x: Introduce bindings for AXP813 mfd: axp20x: Add support for AXP813 PMIC dt-bindings: mfd: axp20x: Add AXP806 to supported list of chips mfd: Add ROHM BD9571MWV-M MFD PMIC driver ...
2017-09-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: - a new GPIO bit-banging driver implementing PS/2 protocol - a new power key driver for Rockchip RK805 PMIC - bunch of patches constifying various device ID structures - Elan I2C touchpad driver now supports devices with 2 buttons - other assorted fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (76 commits) Input: byd - make array seq static, reduces object code size Input: xilinx_ps2 - fix multiline comment style Input: pxa27x_keypad - handle return value of clk_prepare_enable Input: tegra-kbc - handle return value of clk_prepare_enable Input: PS/2 gpio bit banging driver for serio bus Input: xen-kbdfront - enable auto repeat for xen keyboard frontend driver Input: ambakmi - constify amba_id Input: atmel_mxt_ts - add support for reset line Input: atmel_mxt_ts - use more managed resources Input: wacom_w8001 - constify serio_device_id Input: tsc40 - constify serio_device_id Input: touchwin - constify serio_device_id Input: touchright - constify serio_device_id Input: touchit213 - constify serio_device_id Input: penmount - constify serio_device_id Input: mtouch - constify serio_device_id Input: inexio - constify serio_device_id Input: hampshire - constify serio_device_id Input: gunze - constify serio_device_id Input: fujitsu_ts - constify serio_device_id ...
2017-09-07Merge tag 'media/v4.14-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: "Brazil's Independence Day pull request :-) This is one of the biggest media pull requests, with 625 patches affecting almost all parts of media (RC, DVB, V4L2, CEC, docs). This contains: - A lot of new drivers: * DVB frontends: mxl5xx, stv0910, stv6111; * camera flash: as3645a led driver; * HDMI receiver: adv748X; * camera sensor: Omnivision 6650 5M driver (ov6650); * HDMI CEC: ao-cec meson driver; * V4L2: Qualcom camss driver; * Remote controller: gpio-ir-tx, pwm-ir-tx and zx-irdec drivers. - The DDbridge DVB driver got a massive update, with makes it in sync with modern hardware from that vendor; - There's an important milestone on this series: the DVB documentation was written in 2003, but only started to be updated in 2007. It also used to contain several gaps from the time it was kept out of tree, mentioning error codes and device nodes that never existed upstream. On this series, it received a massive update: all non-deprecated digital TV APIs are now in sync with the current implementation; - Some DVB APIs that aren't used by any upstream driver got removed; - Other parts of the media documentation algo got updated, fixing some bugs on its PDF output and making it compatible with Sphinx version 1.6. As the number of hacks required to build PDF output reduced, I hope we'll have less troubles as newer versions of our documentation toolchain are released (famous last words); - As usual, lots of driver cleanups and improvements" * tag 'media/v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (624 commits) media: leds: as3645a: add V4L2_FLASH_LED_CLASS dependency media: get rid of removed DMX_GET_CAPS and DMX_SET_SOURCE leftovers media: Revert "[media] v4l: async: make v4l2 coexist with devicetree nodes in a dt overlay" media: staging: atomisp: sh_css_calloc shall return a pointer to the allocated space media: Revert "[media] lirc_dev: remove superfluous get/put_device() calls" media: add qcom_camss.rst to v4l-drivers rst file media: dvb headers: make checkpatch happier media: dvb uapi: move frontend legacy API to another part of the book media: pixfmt-srggb12p.rst: better format the table for PDF output media: docs-rst: media: Don't use \small for V4L2_PIX_FMT_SRGGB10 documentation media: index.rst: don't write "Contents:" on PDF output media: pixfmt*.rst: replace a two dots by a comma media: vidioc-g-fmt.rst: adjust table format media: vivid.rst: add a blank line to correct ReST format media: v4l2 uapi book: get rid of driver programming's chapter media: format.rst: use the right markup for important notes media: docs-rst: cardlists: change their format to flat-tables media: em28xx-cardlist.rst: update to reflect last changes media: v4l2-event.rst: adjust table to fit on PDF output media: docs: don't show ToC for each part on PDF output ...
2017-09-07Merge tag 'sound-4.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "We have touched quite a lot of files but with fewer changes at this cycle; as you can see, most of changes are trivial fixes, especially constification patches. Among the massive attacks by constification gangs, we had a few core changes (mostly for ASoC core), as well the fixes and the updates by major vendors. Some highlights: ALSA core: - Fix possible races in control API user-TLV codes - Small cleanup of PCM core ASoC: - Continued work for componentization; still half-baked, but we're certainly progressing - Use of devres for jack detection GPIOs, rather as a cleanup - Jack detection support for Qualcomm MSM8916 - Support for Allwinner H3, Cirrus Logic CS43130, Intel Kabylake systems with RT5663, Realtek RT274, TI TLV320AIC32x6 and Wolfson WM8523" * tag 'sound-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (512 commits) ALSA: hda/ca0132 - Fix memory leak at error path ALSA: hda: Fix forget to free resource in error handling code path in hda_codec_driver_probe ASoC: cs43130: Fix unused compiler warnings for PM runtime ASoC: cs43130: Fix possible Oops with invalid dev_id ASoC: cs43130: fix spelling mistake: "irq_occurrance" -> "irq_occurrence" ALSA: atmel: Remove leftovers of AVR32 removal ALSA: atmel: convert AC97c driver to GPIO descriptor API ALSA: hda/realtek - Enable jack detection function for Intel ALC700 ALSA: hda: Fix regression of hdmi eld control created based on invalid pcm ASoC: Intel: Skylake: Add IPC to configure the copier secondary pins ASoC: add missing compile rule for max98371 ASoC: add missing compile rule for sirf-audio-codec ASoC: add missing compile rule for max98371 ASoC: cs43130: Add devicetree bindings for CS43130 ASoC: cs43130: Add support for CS43130 codec ASoC: make clock direction configurable in asoc-simple ALSA: ctxfi: Remove null check before kfree ASoC: max98927: Changed device property read function ASoC: max98927: Modified DAPM widget and map to enable/disable VI sense path ASoC: max98927: Added PM suspend and resume function ...
2017-09-07Merge tag 'mmc-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmcLinus Torvalds
Pull MMC updates from Ulf Hansson: "MMC core: - Continue to refactor the mmc block code to prepare for blkmq - Move mmc block debugfs into block module - Next step for eMMC CMDQ by adding a new mmc host interface for it - Move Kconfig option MMC_DEBUG from core to host - Some additional minor improvements MMC host: - Declare structs as const when applicable - Explicitly request exclusive reset control when applicable - Improve some error paths and other various cleanups - sdhci: Preparations to support SDHCI OMAP - sdhci: Improve some PM related code - sdhci: Re-factoring and modernizations - sdhci-xenon: Add runtime PM and system sleep support - sdhci-xenon: Add support for eMMC HS400 Enhanced Strobe - sdhci-cadence: Add system sleep support - sdhci-of-at91: Improve system sleep support - dw_mmc: Add support for Hisilicon hi3660 - sunxi: Add support for A83T eMMC - sunxi: Add support for DDR52 mode - meson-gx: Add support for UHS-I SD-cards - meson-gx: Cleanups and improvements - tmio: Fix CMD12 (STOP) handling - tmio: Cleanups and improvements - renesas_sdhi: Add r8a7743/5 support - renesas-sdhi: Add support for R-Car Gen3 SDHI DMAC - renesas_sdhi: Cleanups and improvements" * tag 'mmc-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (145 commits) mmc: renesas_sdhi: Add r8a7743/5 support mmc: meson-gx: fix __ffsdi2 undefined on arm32 mmc: sdhci-xenon: add runtime pm support and reimplement standby mmc: core: Move mmc_start_areq() declaration mmc: mmci: stop building qcom dml as module mmc: sunxi: Reset the device at probe time clk: sunxi-ng: Provide a default reset hook mmc: meson-gx: rework tuning function mmc: meson-gx: change default tx phase mmc: meson-gx: implement voltage switch callback mmc: meson-gx: use CCF to handle the clock phases mmc: meson-gx: implement card_busy callback mmc: meson-gx: simplify interrupt handler mmc: meson-gx: work around clk-stop issue mmc: meson-gx: fix dual data rate mode frequencies mmc: meson-gx: rework clock init function mmc: meson-gx: rework clk_set function mmc: meson-gx: rework set_ios function mmc: meson-gx: cfg init overwrite values mmc: meson-gx: initialize sane clk default before clock register ...
2017-09-07Merge branch 'pci/trivial' into nextBjorn Helgaas
* pci/trivial: PCI: Fix typos and whitespace errors PCI: Remove unused "res" variable from pci_resource_io() PCI: Correct kernel-doc of pci_vpd_srdt_size(), pci_vpd_srdt_tag()
2017-09-07Merge branch 'pci/host-rockchip' into nextBjorn Helgaas
* pci/host-rockchip: PCI: rockchip: Fix platform_get_irq() error handling PCI: rockchip: Umap IO space if probe fails PCI: rockchip: Remove IRQ domain if probe fails PCI: rockchip: Disable vpcie0v9 if resume_noirq fails PCI: rockchip: Clean up PHY if driver probe or resume fails PCI: rockchip: Factor out rockchip_pcie_deinit_phys() PCI: rockchip: Factor out rockchip_pcie_disable_clocks() PCI: rockchip: Factor out rockchip_pcie_enable_clocks() PCI: rockchip: Factor out rockchip_pcie_setup_irq() PCI: rockchip: Use gpiod_set_value_cansleep() to allow reset via expanders PCI: rockchip: Use PCI_NUM_INTX PCI: rockchip: Explicitly request exclusive reset control dt-bindings: phy-rockchip-pcie: Convert to per-lane PHY model dt-bindings: PCI: rockchip: Convert to per-lane PHY model arm64: dts: rockchip: convert PCIe to use per-lane PHYs for rk3339 PCI: rockchip: Idle inactive PHY(s) phy: rockchip-pcie: Reconstruct driver to support per-lane PHYs PCI: rockchip: Add per-lane PHY support PCI: rockchip: Factor out rockchip_pcie_get_phys() PCI: rockchip: Control optional 12v power supply dt-bindings: PCI: rockchip: Add vpcie12v-supply for Rockchip PCIe controller
2017-09-07Merge branch 'pci/host-rcar' into nextBjorn Helgaas
* pci/host-rcar: PCI: rcar: Add device tree support for r8a7743/5 PCI: rcar: Fix memory leak when no PCIe card is inserted PCI: rcar: Fix error exit path
2017-09-07Merge branch 'pci/host-qcom' into nextBjorn Helgaas
* pci/host-qcom: PCI: qcom: Add support for IPQ8074 PCIe controller dt-bindings: PCI: qcom: Add support for IPQ8074 PCI: qcom: Use block IP version for operations PCI: qcom: Explicitly request exclusive reset control PCI: qcom: Use gpiod_set_value_cansleep() to allow reset via expanders
2017-09-07Merge branch 'pci/host-mediatek' into nextBjorn Helgaas
* pci/host-mediatek: PCI: mediatek: Use PCI_NUM_INTX PCI: mediatek: Add MSI support for MT2712 and MT7622 PCI: mediatek: Use bus->sysdata to get host private data dt-bindings: PCI: Add support for MT2712 and MT7622 PCI: mediatek: Add controller support for MT2712 and MT7622 dt-bindings: PCI: Cleanup MediaTek binding text dt-bindings: PCI: Rename MediaTek binding PCI: mediatek: Switch to use platform_get_resource_byname() PCI: mediatek: Add a structure to abstract the controller generations PCI: mediatek: Rename port->index and mtk_pcie_parse_ports() PCI: mediatek: Use readl_poll_timeout() to wait for Gen2 training PCI: mediatek: Explicitly request exclusive reset control
2017-09-07Merge tag 'powerpc-4.14-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Nothing really major this release, despite quite a lot of activity. Just lots of things all over the place. Some things of note include: - Access via perf to a new type of PMU (IMC) on Power9, which can count both core events as well as nest unit events (Memory controller etc). - Optimisations to the radix MMU TLB flushing, mostly to avoid unnecessary Page Walk Cache (PWC) flushes when the structure of the tree is not changing. - Reworks/cleanups of do_page_fault() to modernise it and bring it closer to other architectures where possible. - Rework of our page table walking so that THP updates only need to send IPIs to CPUs where the affected mm has run, rather than all CPUs. - The size of our vmalloc area is increased to 56T on 64-bit hash MMU systems. This avoids problems with the percpu allocator on systems with very sparse NUMA layouts. - STRICT_KERNEL_RWX support on PPC32. - A new sched domain topology for Power9, to capture the fact that pairs of cores may share an L2 cache. - Power9 support for VAS, which is a new mechanism for accessing coprocessors, and initial support for using it with the NX compression accelerator. - Major work on the instruction emulation support, adding support for many new instructions, and reworking it so it can be used to implement the emulation needed to fixup alignment faults. - Support for guests under PowerVM to use the Power9 XIVE interrupt controller. And probably that many things again that are almost as interesting, but I had to keep the list short. Plus the usual fixes and cleanups as always. Thanks to: Alexey Kardashevskiy, Alistair Popple, Andreas Schwab, Aneesh Kumar K.V, Anju T Sudhakar, Arvind Yadav, Balbir Singh, Benjamin Herrenschmidt, Bhumika Goyal, Breno Leitao, Bryant G. Ly, Christophe Leroy, Cédric Le Goater, Dan Carpenter, Dou Liyang, Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff Levand, Hannes Reinecke, Haren Myneni, Ivan Mikhaylov, John Allen, Julia Lawall, LABBE Corentin, Laurentiu Tudor, Madhavan Srinivasan, Markus Elfring, Masahiro Yamada, Matt Brown, Michael Neuling, Murilo Opsfelder Araujo, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Rashmica Gupta, Rob Herring, Rui Teng, Sam Bobroff, Santosh Sivaraj, Scott Wood, Shilpasri G Bhat, Sukadev Bhattiprolu, Suraj Jitindar Singh, Tobin C. Harding, Victor Aoqui" * tag 'powerpc-4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (321 commits) powerpc/xive: Fix section __init warning powerpc: Fix kernel crash in emulation of vector loads and stores powerpc/xive: improve debugging macros powerpc/xive: add XIVE Exploitation Mode to CAS powerpc/xive: introduce H_INT_ESB hcall powerpc/xive: add the HW IRQ number under xive_irq_data powerpc/xive: introduce xive_esb_write() powerpc/xive: rename xive_poke_esb() in xive_esb_read() powerpc/xive: guest exploitation of the XIVE interrupt controller powerpc/xive: introduce a common routine xive_queue_page_alloc() powerpc/sstep: Avoid used uninitialized error axonram: Return directly after a failed kzalloc() in axon_ram_probe() axonram: Improve a size determination in axon_ram_probe() axonram: Delete an error message for a failed memory allocation in axon_ram_probe() powerpc/powernv/npu: Move tlb flush before launching ATSD powerpc/macintosh: constify wf_sensor_ops structures powerpc/iommu: Use permission-specific DEVICE_ATTR variants powerpc/eeh: Delete an error out of memory message at init time powerpc/mm: Use seq_putc() in two functions macintosh: Convert to using %pOF instead of full_name ...
2017-09-07Merge tag 'kvm-arm-for-v4.14' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM Changes for v4.14 Two minor cleanups and improvements, a fix for decoding external abort types from guests, and added support for migrating the active priority of interrupts when running a GICv2 guest on a GICv3 host.
2017-09-07Merge tag 'kvm-s390-next-4.14-2' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux KVM: s390: Fixes and features for 4.14 - merge of topic branch tlb-flushing from the s390 tree to get the no-dat base features - merge of kvm/master to avoid conflicts with additional sthyi fixes - wire up the no-dat enhancements in KVM - multiple epoch facility (z14 feature) - Configuration z/Architecture Mode - more sthyi fixes - gdb server range checking fix - small code cleanups
2017-09-06Merge branch 'for-4.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata updates from Tejun Heo: "Except for the ahci fix that fixes a boot issue, nothing major in this pull request. Some new platform controller support and device specific changes" * 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: libata: zpodd: make arrays cdb static, reduces object code size ahci: don't use MSI for devices with the silly Intel NVMe remapping scheme dt-bindings: ata: add DT bindings for MediaTek SATA controller ata: mediatek: add support for MediaTek SATA controller pata_octeon_cf: use of_property_read_{bool|u32}() cs5536: add support for IDE controller variant ata: sata_gemini: Introduce explicit IDE pin control ata: sata_gemini: Retire custom pin control ata: ahci_platform: Add shutdown handler ata: sata_gemini: explicitly request exclusive reset control ata: Drop unnecessary static ata: Convert to using %pOF instead of full_name
2017-09-06Merge branch 'for-4.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Several notable changes this cycle: - Thread mode was merged. This will be used for cgroup2 support for CPU and possibly other controllers. Unfortunately, CPU controller cgroup2 support didn't make this pull request but most contentions have been resolved and the support is likely to be merged before the next merge window. - cgroup.stat now shows the number of descendant cgroups. - cpuset now can enable the easier-to-configure v2 behavior on v1 hierarchy" * 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits) cpuset: Allow v2 behavior in v1 cgroup cgroup: Add mount flag to enable cpuset to use v2 behavior in v1 cgroup cgroup: remove unneeded checks cgroup: misc changes cgroup: short-circuit cset_cgroup_from_root() on the default hierarchy cgroup: re-use the parent pointer in cgroup_destroy_locked() cgroup: add cgroup.stat interface with basic hierarchy stats cgroup: implement hierarchy limits cgroup: keep track of number of descent cgroups cgroup: add comment to cgroup_enable_threaded() cgroup: remove unnecessary empty check when enabling threaded mode cgroup: update debug controller to print out thread mode information cgroup: implement cgroup v2 thread support cgroup: implement CSS_TASK_ITER_THREADED cgroup: introduce cgroup->dom_cgrp and threaded css_set handling cgroup: add @flags to css_task_iter_start() and implement CSS_TASK_ITER_PROCS cgroup: reorganize cgroup.procs / task write path cgroup: replace css_set walking populated test with testing cgrp->nr_populated_csets cgroup: distinguish local and children populated states cgroup: remove now unused list_head @pending in cgroup_apply_cftypes() ...
2017-09-06Merge branch 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
Pull workqueue updates from Tejun Heo: "Nothing major. I introduced a flag collsion bug during v4.13 cycle which is fixed in this pull request. Fortunately, the flag is for debugging / verification and the bug isn't critical" * 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Fix flag collision workqueue: Use TASK_IDLE workqueue: fix path to documentation workqueue: doc change for ST behavior on NUMA systems
2017-09-06dt-binding: phy: don't confuse with Ethernet phy propertiesBaruch Siach
The generic PHY 'phys' property sometime appears in the same node with the Ethernet PHY 'phy' or 'phy-handle' properties. Add a warning in phy-bindings.txt to reduce confusion. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - various misc bits - DAX updates - OCFS2 - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (119 commits) mm,fork: introduce MADV_WIPEONFORK x86,mpx: make mpx depend on x86-64 to free up VMA flag mm: add /proc/pid/smaps_rollup mm: hugetlb: clear target sub-page last when clearing huge page mm: oom: let oom_reap_task and exit_mmap run concurrently swap: choose swap device according to numa node mm: replace TIF_MEMDIE checks by tsk_is_oom_victim mm, oom: do not rely on TIF_MEMDIE for memory reserves access z3fold: use per-cpu unbuddied lists mm, swap: don't use VMA based swap readahead if HDD is used as swap mm, swap: add sysfs interface for VMA based swap readahead mm, swap: VMA based swap readahead mm, swap: fix swap readahead marking mm, swap: add swap readahead hit statistics mm/vmalloc.c: don't reinvent the wheel but use existing llist API mm/vmstat.c: fix wrong comment selftests/memfd: add memfd_create hugetlbfs selftest mm/shmem: add hugetlbfs support to memfd_create() mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas() ...
2017-09-06mm: add /proc/pid/smaps_rollupDaniel Colascione
/proc/pid/smaps_rollup is a new proc file that improves the performance of user programs that determine aggregate memory statistics (e.g., total PSS) of a process. Android regularly "samples" the memory usage of various processes in order to balance its memory pool sizes. This sampling process involves opening /proc/pid/smaps and summing certain fields. For very large processes, sampling memory use this way can take several hundred milliseconds, due mostly to the overhead of the seq_printf calls in task_mmu.c. smaps_rollup improves the situation. It contains most of the fields of /proc/pid/smaps, but instead of a set of fields for each VMA, smaps_rollup instead contains one synthetic smaps-format entry representing the whole process. In the single smaps_rollup synthetic entry, each field is the summation of the corresponding field in all of the real-smaps VMAs. Using a common format for smaps_rollup and smaps allows userspace parsers to repurpose parsers meant for use with non-rollup smaps for smaps_rollup, and it allows userspace to switch between smaps_rollup and smaps at runtime (say, based on the availability of smaps_rollup in a given kernel) with minimal fuss. By using smaps_rollup instead of smaps, a caller can avoid the significant overhead of formatting, reading, and parsing each of a large process's potentially very numerous memory mappings. For sampling system_server's PSS in Android, we measured a 12x speedup, representing a savings of several hundred milliseconds. One alternative to a new per-process proc file would have been including PSS information in /proc/pid/status. We considered this option but thought that PSS would be too expensive (by a few orders of magnitude) to collect relative to what's already emitted as part of /proc/pid/status, and slowing every user of /proc/pid/status for the sake of readers that happen to want PSS feels wrong. The code itself works by reusing the existing VMA-walking framework we use for regular smaps generation and keeping the mem_size_stats structure around between VMA walks instead of using a fresh one for each VMA. In this way, summation happens automatically. We let seq_file walk over the VMAs just as it does for regular smaps and just emit nothing to the seq_file until we hit the last VMA. Benchmarks: using smaps: iterations:1000 pid:1163 pss:220023808 0m29.46s real 0m08.28s user 0m20.98s system using smaps_rollup: iterations:1000 pid:1163 pss:220702720 0m04.39s real 0m00.03s user 0m04.31s system We're using the PSS samples we collect asynchronously for system-management tasks like fine-tuning oom_adj_score, memory use tracking for debugging, application-level memory-use attribution, and deciding whether we want to kill large processes during system idle maintenance windows. Android has been using PSS for these purposes for a long time; as the average process VMA count has increased and and devices become more efficiency-conscious, PSS-collection inefficiency has started to matter more. IMHO, it'd be a lot safer to optimize the existing PSS-collection model, which has been fine-tuned over the years, instead of changing the memory tracking approach entirely to work around smaps-generation inefficiency. Tim said: : There are two main reasons why Android gathers PSS information: : : 1. Android devices can show the user the amount of memory used per : application via the settings app. This is a less important use case. : : 2. We log PSS to help identify leaks in applications. We have found : an enormous number of bugs (in the Android platform, in Google's own : apps, and in third-party applications) using this data. : : To do this, system_server (the main process in Android userspace) will : sample the PSS of a process three seconds after it changes state (for : example, app is launched and becomes the foreground application) and about : every ten minutes after that. The net result is that PSS collection is : regularly running on at least one process in the system (usually a few : times a minute while the screen is on, less when screen is off due to : suspend). PSS of a process is an incredibly useful stat to track, and we : aren't going to get rid of it. We've looked at some very hacky approaches : using RSS ("take the RSS of the target process, subtract the RSS of the : zygote process that is the parent of all Android apps") to reduce the : accounting time, but it regularly overestimated the memory used by 20+ : percent. Accordingly, I don't think that there's a good alternative to : using PSS. : : We started looking into PSS collection performance after we noticed random : frequency spikes while a phone's screen was off; occasionally, one of the : CPU clusters would ramp to a high frequency because there was 200-300ms of : constant CPU work from a single thread in the main Android userspace : process. The work causing the spike (which is reasonable governor : behavior given the amount of CPU time needed) was always PSS collection. : As a result, Android is burning more power than we should be on PSS : collection. : : The other issue (and why I'm less sure about improving smaps as a : long-term solution) is that the number of VMAs per process has increased : significantly from release to release. After trying to figure out why we : were seeing these 200-300ms PSS collection times on Android O but had not : noticed it in previous versions, we found that the number of VMAs in the : main system process increased by 50% from Android N to Android O (from : ~1800 to ~2700) and varying increases in every userspace process. Android : M to N also had an increase in the number of VMAs, although not as much. : I'm not sure why this is increasing so much over time, but thinking about : ASLR and ways to make ASLR better, I expect that this will continue to : increase going forward. I would not be surprised if we hit 5000 VMAs on : the main Android process (system_server) by 2020. : : If we assume that the number of VMAs is going to increase over time, then : doing anything we can do to reduce the overhead of each VMA during PSS : collection seems like the right way to go, and that means outputting an : aggregate statistic (to avoid whatever overhead there is per line in : writing smaps and in reading each line from userspace). Link: http://lkml.kernel.org/r/20170812022148.178293-1-dancol@google.com Signed-off-by: Daniel Colascione <dancol@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Sonny Rao <sonnyrao@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06swap: choose swap device according to numa nodeAaron Lu
If the system has more than one swap device and swap device has the node information, we can make use of this information to decide which swap device to use in get_swap_pages() to get better performance. The current code uses a priority based list, swap_avail_list, to decide which swap device to use and if multiple swap devices share the same priority, they are used round robin. This patch changes the previous single global swap_avail_list into a per-numa-node list, i.e. for each numa node, it sees its own priority based list of available swap devices. Swap device's priority can be promoted on its matching node's swap_avail_list. The current swap device's priority is set as: user can set a >=0 value, or the system will pick one starting from -1 then downwards. The priority value in the swap_avail_list is the negated value of the swap device's due to plist being sorted from low to high. The new policy doesn't change the semantics for priority >=0 cases, the previous starting from -1 then downwards now becomes starting from -2 then downwards and -1 is reserved as the promoted value. Take 4-node EX machine as an example, suppose 4 swap devices are available, each sit on a different node: swapA on node 0 swapB on node 1 swapC on node 2 swapD on node 3 After they are all swapped on in the sequence of ABCD. Current behaviour: their priorities will be: swapA: -1 swapB: -2 swapC: -3 swapD: -4 And their position in the global swap_avail_list will be: swapA -> swapB -> swapC -> swapD prio:1 prio:2 prio:3 prio:4 New behaviour: their priorities will be(note that -1 is skipped): swapA: -2 swapB: -3 swapC: -4 swapD: -5 And their positions in the 4 swap_avail_lists[nid] will be: swap_avail_lists[0]: /* node 0's available swap device list */ swapA -> swapB -> swapC -> swapD prio:1 prio:3 prio:4 prio:5 swap_avali_lists[1]: /* node 1's available swap device list */ swapB -> swapA -> swapC -> swapD prio:1 prio:2 prio:4 prio:5 swap_avail_lists[2]: /* node 2's available swap device list */ swapC -> swapA -> swapB -> swapD prio:1 prio:2 prio:3 prio:5 swap_avail_lists[3]: /* node 3's available swap device list */ swapD -> swapA -> swapB -> swapC prio:1 prio:2 prio:3 prio:4 To see the effect of the patch, a test that starts N process, each mmap a region of anonymous memory and then continually write to it at random position to trigger both swap in and out is used. On a 2 node Skylake EP machine with 64GiB memory, two 170GB SSD drives are used as swap devices with each attached to a different node, the result is: runtime=30m/processes=32/total test size=128G/each process mmap region=4G kernel throughput vanilla 13306 auto-binding 15169 +14% runtime=30m/processes=64/total test size=128G/each process mmap region=2G kernel throughput vanilla 11885 auto-binding 14879 +25% [aaron.lu@intel.com: v2] Link: http://lkml.kernel.org/r/20170814053130.GD2369@aaronlu.sh.intel.com Link: http://lkml.kernel.org/r/20170816024439.GA10925@aaronlu.sh.intel.com [akpm@linux-foundation.org: use kmalloc_array()] Link: http://lkml.kernel.org/r/20170814053130.GD2369@aaronlu.sh.intel.com Link: http://lkml.kernel.org/r/20170816024439.GA10925@aaronlu.sh.intel.com Signed-off-by: Aaron Lu <aaron.lu@intel.com> Cc: "Chen, Tim C" <tim.c.chen@intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06mm, swap: add sysfs interface for VMA based swap readaheadHuang Ying
The sysfs interface to control the VMA based swap readahead is added as follow, /sys/kernel/mm/swap/vma_ra_enabled Enable the VMA based swap readahead algorithm, or use the original global swap readahead algorithm. /sys/kernel/mm/swap/vma_ra_max_order Set the max order of the readahead window size for the VMA based swap readahead algorithm. The corresponding ABI documentation is added too. Link: http://lkml.kernel.org/r/20170807054038.1843-5-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06fscache: remove unused ->now_uncached callbackJan Kara
Patch series "Ranged pagevec lookup", v2. In this series I make pagevec_lookup() update the index (to be consistent with pagevec_lookup_tag() and also as a preparation for ranged lookups), provide ranged variant of pagevec_lookup() and use it in places where it makes sense. This not only removes some common code but is also a measurable performance win for some use cases (see patch 4/10) where radix tree is sparse and searching & grabing of a page after the end of the range has measurable overhead. This patch (of 10): The callback doesn't ever get called. Remove it. Link: http://lkml.kernel.org/r/20170726114704.7626-2-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06mm, page_alloc: rip out ZONELIST_ORDER_ZONEMichal Hocko
Patch series "cleanup zonelists initialization", v1. This is aimed at cleaning up the zonelists initialization code we have but the primary motivation was bug report [2] which got resolved but the usage of stop_machine is just too ugly to live. Most patches are straightforward but 3 of them need a special consideration. Patch 1 removes zone ordered zonelists completely. I am CCing linux-api because this is a user visible change. As I argue in the patch description I do not think we have a strong usecase for it these days. I have kept sysctl in place and warn into the log if somebody tries to configure zone lists ordering. If somebody has a real usecase for it we can revert this patch but I do not expect anybody will actually notice runtime differences. This patch is not strictly needed for the rest but it made patch 6 easier to implement. Patch 7 removes stop_machine from build_all_zonelists without adding any special synchronization between iterators and updater which I _believe_ is acceptable as explained in the changelog. I hope I am not missing anything. Patch 8 then removes zonelists_mutex which is kind of ugly as well and not really needed AFAICS but a care should be taken when double checking my thinking. This patch (of 9): Supporting zone ordered zonelists costs us just a lot of code while the usefulness is arguable if existent at all. Mel has already made node ordering default on 64b systems. 32b systems are still using ZONELIST_ORDER_ZONE because it is considered better to fallback to a different NUMA node rather than consume precious lowmem zones. This argument is, however, weaken by the fact that the memory reclaim has been reworked to be node rather than zone oriented. This means that lowmem requests have to skip over all highmem pages on LRUs already and so zone ordering doesn't save the reclaim time much. So the only advantage of the zone ordering is under a light memory pressure when highmem requests do not ever hit into lowmem zones and the lowmem pressure doesn't need to reclaim. Considering that 32b NUMA systems are rather suboptimal already and it is generally advisable to use 64b kernel on such a HW I believe we should rather care about the code maintainability and just get rid of ZONELIST_ORDER_ZONE altogether. Keep systcl in place and warn if somebody tries to set zone ordering either from kernel command line or the sysctl. [mhocko@suse.com: reading vm.numa_zonelist_order will never terminate] Link: http://lkml.kernel.org/r/20170721143915.14161-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: <linux-api@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06zram: add config and doc file for writeback featureMinchan Kim
This patch adds document and kconfig for using of writeback feature. Link: http://lkml.kernel.org/r/1498459987-24562-10-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06dax: use common 4k zero page for dax mmap readsRoss Zwisler
When servicing mmap() reads from file holes the current DAX code allocates a page cache page of all zeroes and places the struct page pointer in the mapping->page_tree radix tree. This has three major drawbacks: 1) It consumes memory unnecessarily. For every 4k page that is read via a DAX mmap() over a hole, we allocate a new page cache page. This means that if you read 1GiB worth of pages, you end up using 1GiB of zeroed memory. This is easily visible by looking at the overall memory consumption of the system or by looking at /proc/[pid]/smaps: 7f62e72b3000-7f63272b3000 rw-s 00000000 103:00 12 /root/dax/data Size: 1048576 kB Rss: 1048576 kB Pss: 1048576 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 1048576 kB Private_Dirty: 0 kB Referenced: 1048576 kB Anonymous: 0 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Locked: 0 kB 2) It is slower than using a common zero page because each page fault has more work to do. Instead of just inserting a common zero page we have to allocate a page cache page, zero it, and then insert it. Here are the average latencies of dax_load_hole() as measured by ftrace on a random test box: Old method, using zeroed page cache pages: 3.4 us New method, using the common 4k zero page: 0.8 us This was the average latency over 1 GiB of sequential reads done by this simple fio script: [global] size=1G filename=/root/dax/data fallocate=none [io] rw=read ioengine=mmap 3) The fact that we had to check for both DAX exceptional entries and for page cache pages in the radix tree made the DAX code more complex. Solve these issues by following the lead of the DAX PMD code and using a common 4k zero page instead. As with the PMD code we will now insert a DAX exceptional entry into the radix tree instead of a struct page pointer which allows us to remove all the special casing in the DAX code. Note that we do still pretty aggressively check for regular pages in the DAX radix tree, especially where we take action based on the bits set in the page. If we ever find a regular page in our radix tree now that most likely means that someone besides DAX is inserting pages (which has happened lots of times in the past), and we want to find that out early and fail loudly. This solution also removes the extra memory consumption. Here is that same /proc/[pid]/smaps after 1GiB of reading from a hole with the new code: 7f2054a74000-7f2094a74000 rw-s 00000000 103:00 12 /root/dax/data Size: 1048576 kB Rss: 0 kB Pss: 0 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 0 kB Referenced: 0 kB Anonymous: 0 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Locked: 0 kB Overall system memory consumption is similarly improved. Another major change is that we remove dax_pfn_mkwrite() from our fault flow, and instead rely on the page fault itself to make the PTE dirty and writeable. The following description from the patch adding the vm_insert_mixed_mkwrite() call explains this a little more: "To be able to use the common 4k zero page in DAX we need to have our PTE fault path look more like our PMD fault path where a PTE entry can be marked as dirty and writeable as it is first inserted rather than waiting for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault() call. Right now we can rely on having a dax_pfn_mkwrite() call because we can distinguish between these two cases in do_wp_page(): case 1: 4k zero page => writable DAX storage case 2: read-only DAX storage => writeable DAX storage This distinction is made by via vm_normal_page(). vm_normal_page() returns false for the common 4k zero page, though, just as it does for DAX ptes. Instead of special casing the DAX + 4k zero page case we will simplify our DAX PTE page fault sequence so that it matches our DAX PMD sequence, and get rid of the dax_pfn_mkwrite() helper. We will instead use dax_iomap_fault() to handle write-protection faults. This means that insert_pfn() needs to follow the lead of insert_pfn_pmd() and allow us to pass in a 'mkwrite' flag. If 'mkwrite' is set insert_pfn() will do the work that was previously done by wp_page_reuse() as part of the dax_pfn_mkwrite() call path" Link: http://lkml.kernel.org/r/20170724170616.25810-4-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>