summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2018-04-11mm/hmm: do not differentiate between empty entry or missing directoryJérôme Glisse
There is no point in differentiating between a range for which there is not even a directory (and thus entries) and empty entry (pte_none() or pmd_none() returns true). Simply drop the distinction ie remove HMM_PFN_EMPTY flag and merge now duplicate hmm_vma_walk_hole() and hmm_vma_walk_clear() functions. Link: http://lkml.kernel.org/r/20180323005527.758-11-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Evgeny Baskakov <ebaskakov@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mark Hairgrove <mhairgrove@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11mm/hmm: use uint64_t for HMM pfn instead of defining hmm_pfn_t to ulongJérôme Glisse
All device driver we care about are using 64bits page table entry. In order to match this and to avoid useless define convert all HMM pfn to directly use uint64_t. It is a first step on the road to allow driver to directly use pfn value return by HMM (saving memory and CPU cycles use for conversion between the two). Link: http://lkml.kernel.org/r/20180323005527.758-9-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Evgeny Baskakov <ebaskakov@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mark Hairgrove <mhairgrove@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11mm/hmm: remove HMM_PFN_READ flag and ignore peculiar architectureJérôme Glisse
Only peculiar architecture allow write without read thus assume that any valid pfn do allow for read. Note we do not care for write only because it does make sense with thing like atomic compare and exchange or any other operations that allow you to get the memory value through them. Link: http://lkml.kernel.org/r/20180323005527.758-8-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Evgeny Baskakov <ebaskakov@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mark Hairgrove <mhairgrove@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11mm/hmm: use struct for hmm_vma_fault(), hmm_vma_get_pfns() parametersJérôme Glisse
Both hmm_vma_fault() and hmm_vma_get_pfns() were taking a hmm_range struct as parameter and were initializing that struct with others of their parameters. Have caller of those function do this as they are likely to already do and only pass this struct to both function this shorten function signature and make it easier in the future to add new parameters by simply adding them to the structure. Link: http://lkml.kernel.org/r/20180323005527.758-7-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Evgeny Baskakov <ebaskakov@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mark Hairgrove <mhairgrove@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11mm/hmm: HMM should have a callback before MM is destroyedRalph Campbell
hmm_mirror_register() registers a callback for when the CPU pagetable is modified. Normally, the device driver will call hmm_mirror_unregister() when the process using the device is finished. However, if the process exits uncleanly, the struct_mm can be destroyed with no warning to the device driver. Link: http://lkml.kernel.org/r/20180323005527.758-4-jglisse@redhat.com Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Evgeny Baskakov <ebaskakov@nvidia.com> Cc: Mark Hairgrove <mhairgrove@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11mm/hmm: fix header file if/else/endif mazeJérôme Glisse
The #if/#else/#endif for IS_ENABLED(CONFIG_HMM) were wrong. Because of this after multiple include there was multiple definition of both hmm_mm_init() and hmm_mm_destroy() leading to build failure if HMM was enabled (CONFIG_HMM set). Link: http://lkml.kernel.org/r/20180323005527.758-3-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Evgeny Baskakov <ebaskakov@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11mm, vmscan, tracing: use pointer to reclaim_stat struct in trace eventSteven Rostedt
The trace event trace_mm_vmscan_lru_shrink_inactive() currently has 12 parameters! Seven of them are from the reclaim_stat structure. This structure is currently local to mm/vmscan.c. By moving it to the global vmstat.h header, we can also reference it from the vmscan tracepoints. In moving it, it brings down the overhead of passing so many arguments to the trace event. In the future, we may limit the number of arguments that a trace event may pass (ideally just 6, but more realistically it may be 8). Before this patch, the code to call the trace event is this: 0f 83 aa fe ff ff jae ffffffff811e6261 <shrink_inactive_list+0x1e1> 48 8b 45 a0 mov -0x60(%rbp),%rax 45 8b 64 24 20 mov 0x20(%r12),%r12d 44 8b 6d d4 mov -0x2c(%rbp),%r13d 8b 4d d0 mov -0x30(%rbp),%ecx 44 8b 75 cc mov -0x34(%rbp),%r14d 44 8b 7d c8 mov -0x38(%rbp),%r15d 48 89 45 90 mov %rax,-0x70(%rbp) 8b 83 b8 fe ff ff mov -0x148(%rbx),%eax 8b 55 c0 mov -0x40(%rbp),%edx 8b 7d c4 mov -0x3c(%rbp),%edi 8b 75 b8 mov -0x48(%rbp),%esi 89 45 80 mov %eax,-0x80(%rbp) 65 ff 05 e4 f7 e2 7e incl %gs:0x7ee2f7e4(%rip) # 15bd0 <__preempt_count> 48 8b 05 75 5b 13 01 mov 0x1135b75(%rip),%rax # ffffffff8231bf68 <__tracepoint_mm_vmscan_lru_shrink_inactive+0x28> 48 85 c0 test %rax,%rax 74 72 je ffffffff811e646a <shrink_inactive_list+0x3ea> 48 89 c3 mov %rax,%rbx 4c 8b 10 mov (%rax),%r10 89 f8 mov %edi,%eax 48 89 85 68 ff ff ff mov %rax,-0x98(%rbp) 89 f0 mov %esi,%eax 48 89 85 60 ff ff ff mov %rax,-0xa0(%rbp) 89 c8 mov %ecx,%eax 48 89 85 78 ff ff ff mov %rax,-0x88(%rbp) 89 d0 mov %edx,%eax 48 89 85 70 ff ff ff mov %rax,-0x90(%rbp) 8b 45 8c mov -0x74(%rbp),%eax 48 8b 7b 08 mov 0x8(%rbx),%rdi 48 83 c3 18 add $0x18,%rbx 50 push %rax 41 54 push %r12 41 55 push %r13 ff b5 78 ff ff ff pushq -0x88(%rbp) 41 56 push %r14 41 57 push %r15 ff b5 70 ff ff ff pushq -0x90(%rbp) 4c 8b 8d 68 ff ff ff mov -0x98(%rbp),%r9 4c 8b 85 60 ff ff ff mov -0xa0(%rbp),%r8 48 8b 4d 98 mov -0x68(%rbp),%rcx 48 8b 55 90 mov -0x70(%rbp),%rdx 8b 75 80 mov -0x80(%rbp),%esi 41 ff d2 callq *%r10 After the patch: 0f 83 a8 fe ff ff jae ffffffff811e626d <shrink_inactive_list+0x1cd> 8b 9b b8 fe ff ff mov -0x148(%rbx),%ebx 45 8b 64 24 20 mov 0x20(%r12),%r12d 4c 8b 6d a0 mov -0x60(%rbp),%r13 65 ff 05 f5 f7 e2 7e incl %gs:0x7ee2f7f5(%rip) # 15bd0 <__preempt_count> 4c 8b 35 86 5b 13 01 mov 0x1135b86(%rip),%r14 # ffffffff8231bf68 <__tracepoint_mm_vmscan_lru_shrink_inactive+0x28> 4d 85 f6 test %r14,%r14 74 2a je ffffffff811e6411 <shrink_inactive_list+0x371> 49 8b 06 mov (%r14),%rax 8b 4d 8c mov -0x74(%rbp),%ecx 49 8b 7e 08 mov 0x8(%r14),%rdi 49 83 c6 18 add $0x18,%r14 4c 89 ea mov %r13,%rdx 45 89 e1 mov %r12d,%r9d 4c 8d 45 b8 lea -0x48(%rbp),%r8 89 de mov %ebx,%esi 51 push %rcx 48 8b 4d 98 mov -0x68(%rbp),%rcx ff d0 callq *%rax Link: http://lkml.kernel.org/r/2559d7cb-ec60-1200-2362-04fa34fd02bb@fb.com Link: http://lkml.kernel.org/r/20180322121003.4177af15@gandalf.local.home Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reported-by: Alexei Starovoitov <ast@fb.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexei Starovoitov <ast@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11mm/vmscan: don't mess with pgdat->flags in memcg reclaimAndrey Ryabinin
memcg reclaim may alter pgdat->flags based on the state of LRU lists in cgroup and its children. PGDAT_WRITEBACK may force kswapd to sleep congested_wait(), PGDAT_DIRTY may force kswapd to writeback filesystem pages. But the worst here is PGDAT_CONGESTED, since it may force all direct reclaims to stall in wait_iff_congested(). Note that only kswapd have powers to clear any of these bits. This might just never happen if cgroup limits configured that way. So all direct reclaims will stall as long as we have some congested bdi in the system. Leave all pgdat->flags manipulations to kswapd. kswapd scans the whole pgdat, only kswapd can clear pgdat->flags once node is balanced, thus it's reasonable to leave all decisions about node state to kswapd. Why only kswapd? Why not allow to global direct reclaim change these flags? It is because currently only kswapd can clear these flags. I'm less worried about the case when PGDAT_CONGESTED falsely not set, and more worried about the case when it falsely set. If direct reclaimer sets PGDAT_CONGESTED, do we have guarantee that after the congestion problem is sorted out, kswapd will be woken up and clear the flag? It seems like there is no such guarantee. E.g. direct reclaimers may eventually balance pgdat and kswapd simply won't wake up (see wakeup_kswapd()). Moving pgdat->flags manipulation to kswapd, means that cgroup2 recalim now loses its congestion throttling mechanism. Add per-cgroup congestion state and throttle cgroup2 reclaimers if memcg is in congestion state. Currently there is no need in per-cgroup PGDAT_WRITEBACK and PGDAT_DIRTY bits since they alter only kswapd behavior. The problem could be easily demonstrated by creating heavy congestion in one cgroup: echo "+memory" > /sys/fs/cgroup/cgroup.subtree_control mkdir -p /sys/fs/cgroup/congester echo 512M > /sys/fs/cgroup/congester/memory.max echo $$ > /sys/fs/cgroup/congester/cgroup.procs /* generate a lot of diry data on slow HDD */ while true; do dd if=/dev/zero of=/mnt/sdb/zeroes bs=1M count=1024; done & .... while true; do dd if=/dev/zero of=/mnt/sdb/zeroes bs=1M count=1024; done & and some job in another cgroup: mkdir /sys/fs/cgroup/victim echo 128M > /sys/fs/cgroup/victim/memory.max # time cat /dev/sda > /dev/null real 10m15.054s user 0m0.487s sys 1m8.505s According to the tracepoint in wait_iff_congested(), the 'cat' spent 50% of the time sleeping there. With the patch, cat don't waste time anymore: # time cat /dev/sda > /dev/null real 5m32.911s user 0m0.411s sys 0m56.664s [aryabinin@virtuozzo.com: congestion state should be per-node] Link: http://lkml.kernel.org/r/20180406135215.10057-1-aryabinin@virtuozzo.com [ayabinin@virtuozzo.com: make congestion state per-cgroup-per-node instead of just per-cgroup[ Link: http://lkml.kernel.org/r/20180406180254.8970-2-aryabinin@virtuozzo.com Link: http://lkml.kernel.org/r/20180323152029.11084-5-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tejun Heo <tj@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11mm: introduce NR_INDIRECTLY_RECLAIMABLE_BYTESRoman Gushchin
Patch series "indirectly reclaimable memory", v2. This patchset introduces the concept of indirectly reclaimable memory and applies it to fix the issue of when a big number of dentries with external names can significantly affect the MemAvailable value. This patch (of 3): Introduce a concept of indirectly reclaimable memory and adds the corresponding memory counter and /proc/vmstat item. Indirectly reclaimable memory is any sort of memory, used by the kernel (except of reclaimable slabs), which is actually reclaimable, i.e. will be released under memory pressure. The counter is in bytes, as it's not always possible to count such objects in pages. The name contains BYTES by analogy to NR_KERNEL_STACK_KB. Link: http://lkml.kernel.org/r/20180305133743.12746-2-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11Merge branches 'pm-cpuidle' and 'pm-qos'Rafael J. Wysocki
* pm-cpuidle: tick-sched: avoid a maybe-uninitialized warning cpuidle: Add definition of residency to sysfs documentation time: hrtimer: Use timerqueue_iterate_next() to get to the next timer nohz: Avoid duplication of code related to got_idle_tick nohz: Gather tick_sched booleans under a common flag field cpuidle: menu: Avoid selecting shallow states with stopped tick cpuidle: menu: Refine idle state selection for running tick sched: idle: Select idle state before stopping the tick time: hrtimer: Introduce hrtimer_next_event_without() time: tick-sched: Split tick_nohz_stop_sched_tick() cpuidle: Return nohz hint from cpuidle_select() jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC sched: idle: Do not stop the tick before cpuidle_idle_call() sched: idle: Do not stop the tick upfront in the idle loop time: tick-sched: Reorganize idle tick management code * pm-qos: PM / QoS: mark expected switch fall-throughs
2018-04-10Merge remote-tracking branch ↵Benson Leung
'origin/ib-chrome-platform-cros-ec-sysfs-debugfs-for-v4.17' into working-branch-for-4.17 Merging Enric's cros-ec sysfs and debugfs fixes from immutable branch. Signed-off-by: Benson Leung <bleung@chromium.org>
2018-04-10platform/chrome: mfd/cros_ec_dev: Add sysfs entry to set keyboard wake lid angleGwendal Grignou
This adds a sysfs attribute (/sys/class/chromeos/cros_ec/kb_wake_angle) used to set and get the keyboard wake lid angle. This attribute is present only if 2 accelerometers are controlled by the EC. This patch also moves the cros_ec features check before the device is added so the features map obtained from the EC is ready on time. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Benson Leung <bleung@chromium.org>
2018-04-10platform/chrome: cros_ec_debugfs: Add PD port info to debugfsShawn Nematbakhsh
Add info useful for debugging USB-PD port state. Signed-off-by: Shawn Nematbakhsh <shawnn@chromium.org> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Benson Leung <bleung@chromium.org>
2018-04-10NFSv4; Clean up XDR encoding of type bitmap4Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10SUNRPC: Add a helper for encoding opaque data inlineTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10SUNRPC: Add helpers for decoding opaque and string typesTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10NFS: More fine grained attribute trackingTrond Myklebust
Currently, if the NFS_INO_INVALID_ATTR flag is set, for instance by a call to nfs_post_op_update_inode_locked(), then it will not be cleared until all the attributes have been revalidated. This means, for instance, that NFSv4 writes will always force a full attribute revalidation. Track the ctime, mtime, size and change attribute separately from the other attributes so that we can have nfs_post_op_update_inode_locked() set them correctly, and later have the cache consistency bitmask be able to clear them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10NFS: Convert NFS_INO_INVALID flags to unsigned longTrond Myklebust
The cache validity attribute is unsigned long, so make sure that the flags are too. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10NFS: Remove the unused return_delegation() callbackTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10NFS: Add a delegation return into nfs4_proc_unlink_setup()Trond Myklebust
Ensure that when we do finally delete the file, then we return the delegation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10NFS: Move delegation recall into the NFSv4 callback for rename_setup()Trond Myklebust
Move the delegation recall out of the generic code, and into the NFSv4 specific callback. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10NFS: Move the delegation return down into nfs4_proc_remove()Trond Myklebust
Move the delegation return out of generic code and down into the NFSv4 specific unlink code. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10SUNRPC: Make num_reqs a non-atomic integerChuck Lever
If recording xprt->stat.max_slots is moved into xprt_alloc_slot, then xprt->num_reqs is never manipulated outside xprt->reserve_lock. There's no longer a need for xprt->num_reqs to be atomic. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10SUNRPC: Move xprt_update_rtt callsiteChuck Lever
Since commit 33849792cbcd ("xprtrdma: Detect unreachable NFS/RDMA servers more reliably"), the xprtrdma transport now has a ->timer callout. But xprtrdma does not need to compute RTT data, only UDP needs that. Move the xprt_update_rtt call into the UDP transport implementation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10xprtrdma: "Support" call-only RPCsChuck Lever
RPC-over-RDMA version 1 credit accounting relies on there being a response message for every RPC Call. This means that RPC procedures that have no reply will disrupt credit accounting, just in the same way as a retransmit would (since it is sent because no reply has arrived). Deal with the "no reply" case the same way. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10Merge tag 'ceph-for-4.17-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "The big ticket items are: - support for rbd "fancy" striping (myself). The striping feature bit is now fully implemented, allowing mapping v2 images with non-default striping patterns. This completes support for --image-format 2. - CephFS quota support (Luis Henriques and Zheng Yan). This set is based on the new SnapRealm code in the upcoming v13.y.z ("Mimic") release. Quota handling will be rejected on older filesystems. - memory usage improvements in CephFS (Chengguang Xu). Directory specific bits have been split out of ceph_file_info and some effort went into improving cap reservation code to avoid OOM crashes. Also included a bunch of assorted fixes all over the place from Chengguang and others" * tag 'ceph-for-4.17-rc1' of git://github.com/ceph/ceph-client: (67 commits) ceph: quota: report root dir quota usage in statfs ceph: quota: add counter for snaprealms with quota ceph: quota: cache inode pointer in ceph_snap_realm ceph: fix root quota realm check ceph: don't check quota for snap inode ceph: quota: update MDS when max_bytes is approaching ceph: quota: support for ceph.quota.max_bytes ceph: quota: don't allow cross-quota renames ceph: quota: support for ceph.quota.max_files ceph: quota: add initial infrastructure to support cephfs quotas rbd: remove VLA usage rbd: fix spelling mistake: "reregisteration" -> "reregistration" ceph: rename function drop_leases() to a more descriptive name ceph: fix invalid point dereference for error case in mdsc destroy ceph: return proper bool type to caller instead of pointer ceph: optimize memory usage ceph: optimize mds session register libceph, ceph: add __init attribution to init funcitons ceph: filter out used flags when printing unused open flags ceph: don't wait on writeback when there is no more dirty pages ...
2018-04-10Merge tag 'platform-drivers-x86-v4.17-1' of ↵Linus Torvalds
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform driver updates from Andy Shevchenko: - Dell SMBIOS driver fixed against memory leaks. - The fujitsu-laptop driver is cleaned up and now supports hotkeys for Lifebook U7x7 models. Besides that the typo introduced by one of previous clean up series has been fixed. - Specific to x86-based laptops HID device now supports KEY_ROTATE_LOCK_TOGGLE event which is emitted, for example, by Wacom MobileStudio Pro 13. - Turbo MAX 3 technology is enabled for the rest of platforms that support Hardware-P-States feature which have core priority described by ACPI CPPC table. - Mellanox on x86 gets better support of I2C bus in use including support of hotpluggable ones. - Silead touchscreen is enabled on two tablet models, i.e Yours Y8W81 and I.T.Works TW701. - From now on the second fan on Thinkpad P50 is supported. - The topstar-laptop driver is reworked to support new models, in particular Topstar U931. * tag 'platform-drivers-x86-v4.17-1' of git://git.infradead.org/linux-platform-drivers-x86: (41 commits) platform/x86: thinkpad_acpi: Add 2nd Fan Support for Thinkpad P50 platform/x86: dell-smbios: Fix memory leaks in build_tokens_sysfs() intel-hid: support KEY_ROTATE_LOCK_TOGGLE intel-hid: clean up and sort header files platform/x86: silead_dmi: Add entry for the Yours Y8W81 tablet platform/x86: fujitsu-laptop: Support Lifebook U7x7 hotkeys platform/x86: mlx-platform: Add physical bus number auto detection platform/mellanox: mlxreg-hotplug: Change input for device create routine platform/x86: mlx-platform: Add deffered bus functionality platform/x86: mlx-platform: Use define for the channel numbers platform/x86: fujitsu-laptop: Revert UNSUPPORTED_CMD back to an int platform/x86: Fix dell driver init order platform/x86: dell-smbios: Resolve dependency error on ACPI_WMI platform/x86: dell-smbios: Resolve dependency error on DCDBAS platform/x86: Allow for SMBIOS backend defaults platform/x86: dell-smbios: Link all dell-smbios-* modules together platform/x86: dell-smbios: Rename dell-smbios source to dell-smbios-base platform/x86: dell-smbios: Correct some style warnings platform/x86: wmi: Fix misuse of vsprintf extension %pULL platform/x86: intel-hid: Reset wakeup capable flag on removal ...
2018-04-10Merge tag 'dmaengine-4.17-rc1' of git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds
Pull dmaengine updates from Vinod Koul: "This time we have couple of new drivers along with updates to drivers: - new drivers for the DesignWare AXI DMAC and MediaTek High-Speed DMA controllers - stm32 dma and qcom bam dma driver updates - norandom test option for dmatest" * tag 'dmaengine-4.17-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (30 commits) dmaengine: stm32-dma: properly mask irq bits dmaengine: stm32-dma: fix max items per transfer dmaengine: stm32-dma: fix DMA IRQ status handling dmaengine: stm32-dma: Improve memory burst management dmaengine: stm32-dma: fix typo and reported checkpatch warnings dmaengine: stm32-dma: fix incomplete configuration in cyclic mode dmaengine: stm32-dma: threshold manages with bitfield feature dt-bindings: stm32-dma: introduce DMA features bitfield dt-bindings: rcar-dmac: Document r8a77470 support dmaengine: rcar-dmac: Fix too early/late system suspend/resume callbacks dmaengine: dw-axi-dmac: fix spelling mistake: "catched" -> "caught" dmaengine: edma: Check the memory allocation for the memcpy dma device dmaengine: at_xdmac: fix rare residue corruption dmaengine: mediatek: update MAINTAINERS entry with MediaTek DMA driver dmaengine: mediatek: Add MediaTek High-Speed DMA controller for MT7622 and MT7623 SoC dt-bindings: dmaengine: Add MediaTek High-Speed DMA controller bindings dt-bindings: Document the Synopsys DW AXI DMA bindings dmaengine: Introduce DW AXI DMAC driver dmaengine: pl330: fix a race condition in case of threaded irqs dmaengine: imx-sdma: fix pagefault when channel is disabled during interrupt ...
2018-04-10Merge tag 'rproc-v4.17' of git://github.com/andersson/remoteprocLinus Torvalds
Pull remoteproc updates from Bjorn Andersson: - add support for generating coredumps for remoteprocs using devcoredump - add the Qualcomm sysmon driver for intra-remoteproc crash handling - a number of fixes in Qualcomm and IMX drivers * tag 'rproc-v4.17' of git://github.com/andersson/remoteproc: remoteproc: fix null pointer dereference on glink only platforms soc: qcom: qmi: add CONFIG_NET dependency remoteproc: imx_rproc: Slightly simplify code in 'imx_rproc_probe()' remoteproc: imx_rproc: Re-use existing error handling path in 'imx_rproc_probe()' remoteproc: imx_rproc: Fix an error handling path in 'imx_rproc_probe()' samples: Introduce Qualcomm QMI sample client remoteproc: qcom: Introduce sysmon remoteproc: Pass type of shutdown to subdev remove remoteproc: qcom: Register segments for core dump soc: qcom: mdt-loader: Return relocation base remoteproc: Rename "load_rsc_table" to "parse_fw" remoteproc: Add remote processor coredump support remoteproc: Remove null character write of shared mem
2018-04-10Merge tag 'trace-v4.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "New features: - Tom Zanussi's extended histogram work. This adds the synthetic events to have histograms from multiple event data Adds triggers "onmatch" and "onmax" to call the synthetic events Several updates to the histogram code from this - Allow way to nest ring buffer calls in the same context - Allow absolute time stamps in ring buffer - Rewrite of filter code parsing based on Al Viro's suggestions - Setting of trace_clock to global if TSC is unstable (on boot) - Better OOM handling when allocating large ring buffers - Added initcall tracepoints (consolidated initcall_debug code with them) And other various fixes and clean ups" * tag 'trace-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits) init: Have initcall_debug still work without CONFIG_TRACEPOINTS init, tracing: Have printk come through the trace events for initcall_debug init, tracing: instrument security and console initcall trace events init, tracing: Add initcall trace events tracing: Add rcu dereference annotation for test func that touches filter->prog tracing: Add rcu dereference annotation for filter->prog tracing: Fixup logic inversion on setting trace_global_clock defaults tracing: Hide global trace clock from lockdep ring-buffer: Add set/clear_current_oom_origin() during allocations ring-buffer: Check if memory is available before allocation lockdep: Add print_irqtrace_events() to __warn vsprintf: Do not preprocess non-dereferenced pointers for bprintf (%px and %pK) tracing: Uninitialized variable in create_tracing_map_fields() tracing: Make sure variable string fields are NULL-terminated tracing: Add action comparisons when testing matching hist triggers tracing: Don't add flag strings when displaying variable references tracing: Fix display of hist trigger expressions containing timestamps ftrace: Drop a VLA in module_exists() tracing: Mention trace_clock=global when warning about unstable clocks tracing: Default to using trace_global_clock if sched_clock is unstable ...
2018-04-10Merge tag 'libnvdimm-for-4.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This cycle was was not something I ever want to repeat as there were several late changes that have only now just settled. Half of the branch up to commit d2c997c0f145 ("fs, dax: use page->mapping to warn...") have been in -next for several releases. The of_pmem driver and the address range scrub rework were late arrivals, and the dax work was scaled back at the last moment. The of_pmem driver missed a previous merge window due to an oversight. A sense of obligation to rectify that miss is why it is included for 4.17. It has acks from PowerPC folks. Stephen reported a build failure that only occurs when merging it with your latest tree, for now I have fixed that up by disabling modular builds of of_pmem. A test merge with your tree has received a build success report from the 0day robot over 156 configs. An initial version of the ARS rework was submitted before the merge window. It is self contained to libnvdimm, a net code reduction, and passing all unit tests. The filesystem-dax changes are based on the wait_var_event() functionality from tip/sched/core. However, late review feedback showed that those changes regressed truncate performance to a large degree. The branch was rewound to drop the truncate behavior change and now only includes preparation patches and cleanups (with full acks and reviews). The finalization of this dax-dma-vs-trnucate work will need to wait for 4.18. Summary: - A rework of the filesytem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. - The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree. - Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. - The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. - Miscellaneous cleanups and updates to unit test infrastructure" * tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits) libnvdimm, of_pmem: workaround OF_NUMA=n build error nfit, address-range-scrub: add module option to skip initial ars nfit, address-range-scrub: rework and simplify ARS state machine nfit, address-range-scrub: determine one platform max_ars value powerpc/powernv: Create platform devs for nvdimm buses doc/devicetree: Persistent memory region bindings libnvdimm: Add device-tree based driver libnvdimm: Add of_node to region and bus descriptors libnvdimm, region: quiet region probe libnvdimm, namespace: use a safe lookup for dimm device name libnvdimm, dimm: fix dpa reservation vs uninitialized label area libnvdimm, testing: update the default smart ctrl_temperature libnvdimm, testing: Add emulation for smart injection commands nfit, address-range-scrub: introduce nfit_spa->ars_state libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device' nfit, address-range-scrub: fix scrub in-progress reporting dax, dm: allow device-mapper to operate without dax support dax: introduce CONFIG_DAX_DRIVER fs, dax: use page->mapping to warn if truncate collides with a busy page ext2, dax: introduce ext2_dax_aops ...
2018-04-10Merge tag 'rtc-4.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC updates from Alexandre Belloni: "This contains a few series that have been in preparation for a while and that will help systems with RTCs that will fail in 2038, 2069 or 2100. Subsystem: - Add tracepoints - Rework of the RTC/nvmem API to allow drivers to discard struct nvmem_config after registration - New range API, drivers can now expose the useful range of the RTC - New offset API the core is now able to add an offset to the RTC time, modifying the supported range. - Multiple rtc_time64_to_tm fixes - Handle time_t overflow on 32 bit platforms in the core instead of letting drivers do crazy things. - remove rtc_control API New driver: - Intersil ISL12026 Drivers: - Drivers exposing the RTC non volatile memory have been converted to use nvmem - Removed useless time and date validation - Removed an indirection pattern that was a cargo cult from ancient drivers - Removed VLA usage - Fixed a possible race condition in probe functions - AB8540 support is dropped from ab8500 - pcf85363 now has alarm support" * tag 'rtc-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (128 commits) rtc: snvs: Fix usage of snvs_rtc_enable rtc: mt7622: fix module autoloading for OF platform drivers rtc: isl12022: use true and false for boolean values rtc: ab8500: Drop AB8540 support rtc: remove a warning during scripts/kernel-doc step rtc: 88pm860x: remove artificial limitation rtc: 88pm80x: remove artificial limitation rtc: st-lpc: remove artificial limitation rtc: mrst: remove artificial limitation rtc: mv: remove artificial limitation rtc: hctosys: Ensure system time doesn't overflow time_t parisc: time: stop validating rtc_time in .read_time rtc: pcf85063: fix clearing bits in pcf85063_start_clock rtc: at91sam9: Set name of regmap_config rtc: s5m: Remove VLA usage rtc: s5m: Move enum from rtc.h to rtc-s5m.c rtc: remove VLA usage rtc: Add useful timestamp definitions rtc: Add one offset seconds to expand RTC range rtc: Factor out the RTC range validation into rtc_valid_range() ...
2018-04-10backing: silence compiler warning using __printfMathieu Malaterre
__printf marker was added in commit d2cc4dde9206 ("bdi_register: add __printf verification, fix arg mismatch") for function `bdi_register` since it is useful to verify format and arguments. Apply equivalent gcc attribute to `bdi_register_va`. Remove warning triggered with W=1: mm/backing-dev.c:881:2: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format] Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-10blk-mq: remove blk_mq_delay_queue()Ming Lei
No driver uses this interface any more, so remove it. Cc: Stefan Haberland <sth@linux.vnet.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-10cpufreq: Drop cpufreq_table_validate_and_show()Viresh Kumar
This isn't used anymore. Remove the helper and update documentation accordingly. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-04-09Input: atmel_mxt_ts - remove platform data supportDmitry Torokhov
Now that there are no users of custom Atmel platform data, and everyone has switched to the generic device properties, we can remove support for the platform data. Acked-by: Nick Dyer <nick@shmanahar.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Benson Leung <bleung@chromium.org>
2018-04-09vfs: Remove the const from dir_context::actorDavid Howells
Remove the const marking from the actor function pointer in the dir_context struct. The const prevents the structure from being used as part of a kmalloc'd object as it makes the compiler require that the actor member be set at object initialisation time (or not at all), incuring something like the following error if you try and set it later: fs/afs/dir.c:556:20: error: assignment of read-only member 'actor' Marking the member const like this adds very little in the way of sanity checking as the type checking system is likely to provide sufficient - and if not, the kernel is very likely to oops repeatably in this case. Fixes: ac6614b76478 ("[readdir] constify ->actor") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - VHE optimizations - EL2 address space randomization - speculative execution mitigations ("variant 3a", aka execution past invalid privilege register access) - bugfixes and cleanups PPC: - improvements for the radix page fault handler for HV KVM on POWER9 s390: - more kvm stat counters - virtio gpu plumbing - documentation - facilities improvements x86: - support for VMware magic I/O port and pseudo-PMCs - AMD pause loop exiting - support for AMD core performance extensions - support for synchronous register access - expose nVMX capabilities to userspace - support for Hyper-V signaling via eventfd - use Enlightened VMCS when running on Hyper-V - allow userspace to disable MWAIT/HLT/PAUSE vmexits - usual roundup of optimizations and nested virtualization bugfixes Generic: - API selftest infrastructure (though the only tests are for x86 as of now)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits) kvm: x86: fix a prototype warning kvm: selftests: add sync_regs_test kvm: selftests: add API testing infrastructure kvm: x86: fix a compile warning KVM: X86: Add Force Emulation Prefix for "emulate the next instruction" KVM: X86: Introduce handle_ud() KVM: vmx: unify adjacent #ifdefs x86: kvm: hide the unused 'cpu' variable KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown" kvm: Add emulation for movups/movupd KVM: VMX: raise internal error for exception during invalid protected mode state KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending KVM: x86: Fix misleading comments on handling pending exceptions KVM: x86: Rename interrupt.pending to interrupt.injected KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt x86/kvm: use Enlightened VMCS when running on Hyper-V x86/hyper-v: detect nested features x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits ...
2018-04-09Merge branch 'for-4.17/dax' into libnvdimm-for-nextDan Williams
2018-04-09Merge branch 'for-4.17/libnvdimm' into libnvdimm-for-nextDan Williams
2018-04-09Fix subtle macro variable shadowing in min_not_zero()Linus Torvalds
Commit 3c8ba0d61d04 ("kernel.h: Retain constant expression output for max()/min()") rewrote our min/max macros to be very clever, but in the meantime resurrected a variable name shadow issue that we had had previously fixed in commit 589a9785ee3a ("min/max: remove sparse warnings when they're nested"). That commit talks about the sparse warnings that this shadowing causes, which we ignored as just a minor annoyance. But it turns out that the sparse warning is the least of our problems. We actually have a real bug due to the shadowing through the interaction with "min_not_zero()", which ends up doing min(__x, __y) internally, and then the new declaration of "__x" and "__y" as new variables in __cmp_once() results in a complete mess of an expression, and "min_not_zero()" doesn't work at all. For some odd reason, this only ever caused (reported) problems on s390, even though it is a generic issue and most of the (obviously successful) testing of the problematic commit had happened on other architectures. Quoting Sebastian Ott: "What happened is that the bio build by the partition detection code was attempted to be split by the block layer because the block queue had a max_sector setting of 0. blk_queue_max_hw_sectors uses min_not_zero." So re-introduce the use of __UNIQUE_ID() to make sure that the min/max macros do not have these kinds of clashes. [ That said, __UNIQUE_ID() itself has several issues that make it less than wonderful. In particular, the "uniqueness" has a fallback on the line number, which means that it's not actually unique in more complex cases if you don't build with gcc or clang (which have working unique counters that aren't tied to line numbers). That historical broken fallback also means that we have that pointless "prefix" argument that doesn't actually make much sense _except_ for the known-broken case. Oh well. ] Fixes: 3c8ba0d61d04 ("kernel.h: Retain constant expression output for max()/min()") Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-09syscalls/core, syscalls/x86: Clean up compat syscall stub naming conventionDominik Brodowski
Tidy the naming convention for compat syscall subs. Hints which describe the purpose of the stub go in front and receive a double underscore to denote that they are generated on-the-fly by the COMPAT_SYSCALL_DEFINEx() macro. For the generic case, this means: t kernel_waitid # common C function (see kernel/exit.c) __do_compat_sys_waitid # inlined helper doing the actual work # (takes original parameters as declared) T __se_compat_sys_waitid # sign-extending C function calling inlined # helper (takes parameters of type long, # casts them to unsigned long and then to # the declared type) T compat_sys_waitid # alias to __se_compat_sys_waitid() # (taking parameters as declared), to # be included in syscall table For x86, the naming is as follows: t kernel_waitid # common C function (see kernel/exit.c) __do_compat_sys_waitid # inlined helper doing the actual work # (takes original parameters as declared) t __se_compat_sys_waitid # sign-extending C function calling inlined # helper (takes parameters of type long, # casts them to unsigned long and then to # the declared type) T __ia32_compat_sys_waitid # IA32_EMULATION 32-bit-ptregs -> C stub, # calls __se_compat_sys_waitid(); to be # included in syscall table T __x32_compat_sys_waitid # x32 64-bit-ptregs -> C stub, calls # __se_compat_sys_waitid(); to be included # in syscall table If only one of IA32_EMULATION and x32 is enabled, __se_compat_sys_waitid() may be inlined into the stub __{ia32,x32}_compat_sys_waitid(). Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180409105145.5364-3-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-09syscalls/core, syscalls/x86: Clean up syscall stub naming conventionDominik Brodowski
Tidy the naming convention for compat syscall subs. Hints which describe the purpose of the stub go in front and receive a double underscore to denote that they are generated on-the-fly by the SYSCALL_DEFINEx() macro. For the generic case, this means (0xffffffff prefix removed): 810f08d0 t kernel_waitid # common C function (see kernel/exit.c) <inline> __do_sys_waitid # inlined helper doing the actual work # (takes original parameters as declared) 810f1aa0 T __se_sys_waitid # sign-extending C function calling inlined # helper (takes parameters of type long; # casts them to the declared type) 810f1aa0 T sys_waitid # alias to __se_sys_waitid() (taking # parameters as declared), to be included # in syscall table For x86, the naming is as follows: 810efc70 t kernel_waitid # common C function (see kernel/exit.c) <inline> __do_sys_waitid # inlined helper doing the actual work # (takes original parameters as declared) 810efd60 t __se_sys_waitid # sign-extending C function calling inlined # helper (takes parameters of type long; # casts them to the declared type) 810f1140 T __ia32_sys_waitid # IA32_EMULATION 32-bit-ptregs -> C stub, # calls __se_sys_waitid(); to be included # in syscall table 810f1110 T sys_waitid # x86 64-bit-ptregs -> C stub, calls # __se_sys_waitid(); to be included in # syscall table For x86, sys_waitid() will be re-named to __x64_sys_waitid in a follow-up patch. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180409105145.5364-2-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-09cpuidle: menu: Refine idle state selection for running tickRafael J. Wysocki
If the tick isn't stopped, the target residency of the state selected by the menu governor may be greater than the actual time to the next tick and that means lost energy. To avoid that, make tick_nohz_get_sleep_length() return the current time to the next event (before stopping the tick) in addition to the estimated one via an extra pointer argument and make menu_select() use that value to refine the state selection when necessary. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2018-04-09sched: idle: Select idle state before stopping the tickRafael J. Wysocki
In order to address the issue with short idle duration predictions by the idle governor after the scheduler tick has been stopped, reorder the code in cpuidle_idle_call() so that the governor idle state selection runs before tick_nohz_idle_go_idle() and use the "nohz" hint returned by cpuidle_select() to decide whether or not to stop the tick. This isn't straightforward, because menu_select() invokes tick_nohz_get_sleep_length() to get the time to the next timer event and the number returned by the latter comes from __tick_nohz_idle_stop_tick(). Fortunately, however, it is possible to compute that number without actually stopping the tick and with the help of the existing code. Namely, tick_nohz_get_sleep_length() can be made call tick_nohz_next_event(), introduced earlier, to get the time to the next non-highres timer event. If that happens, tick_nohz_next_event() need not be called by __tick_nohz_idle_stop_tick() again. If it turns out that the scheduler tick cannot be stopped going forward or the next timer event is too close for the tick to be stopped, tick_nohz_get_sleep_length() can simply return the time to the next event currently programmed into the corresponding clock event device. In addition to knowing the return value of tick_nohz_next_event(), however, tick_nohz_get_sleep_length() needs to know the time to the next highres timer event, but with the scheduler tick timer excluded, which can be computed with the help of hrtimer_get_next_event(). That minimum of that number and the tick_nohz_next_event() return value is the total time to the next timer event with the assumption that the tick will be stopped. It can be returned to the idle governor which can use it for predicting idle duration (under the assumption that the tick will be stopped) and deciding whether or not it makes sense to stop the tick before putting the CPU into the selected idle state. With the above, the sleep_length field in struct tick_sched is not necessary any more, so drop it. Link: https://bugzilla.kernel.org/show_bug.cgi?id=199227 Reported-by: Doug Smythies <dsmythies@telus.net> Reported-by: Thomas Ilsche <thomas.ilsche@tu-dresden.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
2018-04-07Merge branch 'next-integrity' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull integrity updates from James Morris: "A mixture of bug fixes, code cleanup, and continues to close IMA-measurement, IMA-appraisal, and IMA-audit gaps. Also note the addition of a new cred_getsecid LSM hook by Matthew Garrett: For IMA purposes, we want to be able to obtain the prepared secid in the bprm structure before the credentials are committed. Add a cred_getsecid hook that makes this possible. which is used by a new CREDS_CHECK target in IMA: In ima_bprm_check(), check with both the existing process credentials and the credentials that will be committed when the new process is started. This will not change behaviour unless the system policy is extended to include CREDS_CHECK targets - BPRM_CHECK will continue to check the same credentials that it did previously" * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: ima: Fallback to the builtin hash algorithm ima: Add smackfs to the default appraise/measure list evm: check for remount ro in progress before writing ima: Improvements in ima_appraise_measurement() ima: Simplify ima_eventsig_init() integrity: Remove unused macro IMA_ACTION_RULE_FLAGS ima: drop vla in ima_audit_measurement() ima: Fix Kconfig to select TPM 2.0 CRB interface evm: Constify *integrity_status_msg[] evm: Move evm_hmac and evm_hash from evm_main.c to evm_crypto.c fuse: define the filesystem as untrusted ima: fail signature verification based on policy ima: clear IMA_HASH ima: re-evaluate files on privileged mounted filesystems ima: fail file signature verification on non-init mounted filesystems IMA: Support using new creds in appraisal policy security: Add a cred_getsecid hook
2018-04-07Merge branch 'next-tpm' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull TPM updates from James Morris: "This release contains only bug fixes. There are no new major features added" * 'next-tpm' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: tpm: fix intermittent failure with self tests tpm: add retry logic tpm: self test failure should not cause suspend to fail tpm2: add longer timeouts for creation commands. tpm_crb: use __le64 annotated variable for response buffer address tpm: fix buffer type in tpm_transmit_cmd tpm: tpm-interface: fix tpm_transmit/_cmd kdoc tpm: cmd_ready command can be issued only after granting locality
2018-04-07Merge branch 'i2c/for-4.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: -I2C core now reports proper OF style module alias. I'd like to repeat the note from the commit msg here (Thanks, Javier!): NOTE: This patch may break out-of-tree drivers that were relying on this behavior, and only had an I2C device ID table even when the device was registered via OF. There are no remaining drivers in mainline that do this, but out-of-tree drivers have to be fixed and define a proper OF device ID table to have module auto-loading working. - new driver for the SynQuacer I2C controller - major refactoring of the QUP driver - the piix4 driver now uses request_muxed_region which should fix a long standing resource conflict with the sp5100_tco watchdog - a bunch of small core & driver improvements * 'i2c/for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (53 commits) i2c: add support for Socionext SynQuacer I2C controller dt-bindings: i2c: add binding for Socionext SynQuacer I2C i2c: Update i2c_trace_msg static key to modern api i2c: fix parameter of trace_i2c_result i2c: imx: avoid taking clk_prepare mutex in PM callbacks i2c: imx: use clk notifier for rate changes i2c: make i2c_check_addr_validity() static i2c: rcar: fix mask value of prohibited bit dt-bindings: i2c: document R8A77965 bindings i2c: pca-platform: drop gpio from platform data i2c: pca-platform: use device_property_read_u32 i2c: pca-platform: unconditionally use devm_gpiod_get_optional sh: sh7785lcr: add GPIO lookup table for i2c controller reset i2c: qup: reorganization of driver code to remove polling for qup v2 i2c: qup: reorganization of driver code to remove polling for qup v1 i2c: qup: send NACK for last read sub transfers i2c: qup: fix buffer overflow for multiple msg of maximum xfer len i2c: qup: change completion timeout according to transfer length i2c: qup: use the complete transfer length to choose DMA mode i2c: qup: proper error handling for i2c error in BAM mode ...
2018-04-07Merge tag 'powerpc-4.17-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - Support for 4PB user address space on 64-bit, opt-in via mmap(). - Removal of POWER4 support, which was accidentally broken in 2016 and no one noticed, and blocked use of some modern instructions. - Workarounds so that the hypervisor can enable Transactional Memory on Power9. - A series to disable the DAWR (Data Address Watchpoint Register) on Power9. - More information displayed in the meltdown/spectre_v1/v2 sysfs files. - A vpermxor (Power8 Altivec) implementation for the raid6 Q Syndrome. - A big series to make the allocation of our pacas (per cpu area), kernel page tables, and per-cpu stacks NUMA aware when using the Radix MMU on Power9. And as usual many fixes, reworks and cleanups. Thanks to: Aaro Koskinen, Alexandre Belloni, Alexey Kardashevskiy, Alistair Popple, Andy Shevchenko, Aneesh Kumar K.V, Anshuman Khandual, Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe Lombard, Cyril Bur, Daniel Axtens, Dave Young, Finn Thain, Frederic Barrat, Gustavo Romero, Horia Geantă, Jonathan Neuschäfer, Kees Cook, Larry Finger, Laurent Dufour, Laurent Vivier, Logan Gunthorpe, Madhavan Srinivasan, Mark Greer, Mark Hairgrove, Markus Elfring, Mathieu Malaterre, Matt Brown, Matt Evans, Mauricio Faria de Oliveira, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud, Ram Pai, Rob Herring, Sam Bobroff, Segher Boessenkool, Simon Guo, Simon Horman, Stewart Smith, Sukadev Bhattiprolu, Suraj Jitindar Singh, Thiago Jung Bauermann, Vaibhav Jain, Vaidyanathan Srinivasan, Vasant Hegde, Wei Yongjun" * tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (207 commits) powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep powerpc/64s: Fix POWER9 DD2.2 and above in cputable features powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bit powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead" powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo} powerpc: io.h: move iomap.h include so that it can use readq/writeq defs cxl: Fix possible deadlock when processing page faults from cxllib powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports it powerpc/mm/radix: Update command line parsing for disable_radix powerpc/mm/radix: Parse disable_radix commandline correctly. powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix powerpc/mm/keys: Update documentation and remove unnecessary check powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop() powerpc/powernv: Always stop secondaries before reboot/shutdown powerpc: hard disable irqs in smp_send_stop loop powerpc: use NMI IPI for smp_send_stop powerpc/powernv: Fix SMT4 forcing idle code ...
2018-04-07Merge branch 'next-general' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull general security layer updates from James Morris: - Convert security hooks from list to hlist, a nice cleanup, saving about 50% of space, from Sargun Dhillon. - Only pass the cred, not the secid, to kill_pid_info_as_cred and security_task_kill (as the secid can be determined from the cred), from Stephen Smalley. - Close a potential race in kernel_read_file(), by making the file unwritable before calling the LSM check (vs after), from Kees Cook. * 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: security: convert security hooks to use hlist exec: Set file unwritable before LSM check usb, signal, security: only pass the cred, not the secid, to kill_pid_info_as_cred and security_task_kill