summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-09-01net: mvpp2: make the phy optionalAntoine Tenart
There is not necessarily a PHY between the GoP and the physical port. However, the driver currently makes the "phy" property mandatory, contrary to what is stated in the device tree bindings. This patch makes the PHY optional, and aligns the PPv2 driver on its device tree documentation. However if a PHY is provided, the GoP link interrupt won't be used. With this patch switches directly connected to the serdes lanes and SFP ports on the Armada 8040-db and Armada 7040-db can be used if the link interrupt is described in the device tree. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01net: mvpp2: take advantage of the is_rgmii helperAntoine Tenart
Convert all RGMII checks to use the phy_interface_mode_is_rgmii() helper. This is a cosmetic patch. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Merge branch 'mlxsw-next-fixes'David S. Miller
Jiri Pirko says: ==================== mlxsw: spectrum_router: Couple of fixes Ido Schimmel (2): mlxsw: spectrum_router: Trap packets hitting anycast routes mlxsw: spectrum_router: Set abort trap in all virtual routers ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01mlxsw: spectrum_router: Set abort trap in all virtual routersIdo Schimmel
When the abort mechanism is invoked a default route directing packets to the CPU is programmed in all the virtual routers currently in use. This can result in packet loss in case a new VRF is configured. Upon abort, program the default route in all virtual routers, whether they are in use or not. The patch is directed at net-next since post-abort fixes aren't critical and packet loss due to a missing default route will be insignificant compared to packet loss caused by the CPU port policer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01mlxsw: spectrum_router: Trap packets hitting anycast routesIdo Schimmel
I relied on the fact that anycast routes use the loopback device as their nexthop device to trap packets hitting them to the CPU. After commit 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address") this is no longer the case and such routes are programmed with a forward action (note the 'offload' flag): anycast cafe:: dev enp3s0np7 proto kernel metric 0 offload pref medium This will prevent the router from locally receiving packets destined to the Subnet-Router anycast address. Fix this by specifically programming anycast routes with action trap, which results in the following output: anycast cafe:: dev enp3s0np7 proto kernel metric 0 pref medium Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01mlxsw: spectrum: Forbid linking to devices that have uppersIdo Schimmel
The mlxsw driver relies on NETDEV_CHANGEUPPER events to configure the device in case a port is enslaved to a master netdev such as bridge or bond. Since the driver ignores events unrelated to its ports and their uppers, it's possible to engineer situations in which the device's data path differs from the kernel's. One example to such a situation is when a port is enslaved to a bond that is already enslaved to a bridge. When the bond was enslaved the driver ignored the event - as the bond wasn't one of its uppers - and therefore a bridge port instance isn't created in the device. Until such configurations are supported forbid them by checking that the upper device doesn't have uppers of its own. Fixes: 0d65fc13042f ("mlxsw: spectrum: Implement LAG port join/leave") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Nogah Frankel <nogahf@mellanox.com> Tested-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Merge branch 'bpf-Improve-LRU-map-lookup-performance'David S. Miller
Martin KaFai Lau says: ==================== bpf: Improve LRU map lookup performance This patchset improves the lookup performance of the LRU map. Please see individual patch for details. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01bpf: Only set node->ref = 1 if it has not been setMartin KaFai Lau
This patch writes 'node->ref = 1' only if node->ref is 0. The number of lookups/s for a ~1M entries LRU map increased by ~30% (260097 to 343313). Other writes on 'node->ref = 0' is not changed. In those cases, the same cache line has to be changed anyway. First column: Size of the LRU hash Second column: Number of lookups/s Before: > echo "$((2**20+1)): $(./map_perf_test 1024 1 $((2**20+1)) 10000000 | awk '{print $3}')" 1048577: 260097 After: > echo "$((2**20+1)): $(./map_perf_test 1024 1 $((2**20+1)) 10000000 | awk '{print $3}')" 1048577: 343313 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01bpf: Inline LRU map lookupMartin KaFai Lau
Inline the lru map lookup to save the cost in making calls to bpf_map_lookup_elem() and htab_lru_map_lookup_elem(). Different LRU hash size is tested. The benefit diminishes when the cache miss starts to dominate in the bigger LRU hash. Considering the change is simple, it is still worth to optimize. First column: Size of the LRU hash Second column: Number of lookups/s Before: > for i in $(seq 9 20); do echo "$((2**i+1)): $(./map_perf_test 1024 1 $((2**i+1)) 10000000 | awk '{print $3}')"; done 513: 1132020 1025: 1056826 2049: 1007024 4097: 853298 8193: 742723 16385: 712600 32769: 688142 65537: 677028 131073: 619437 262145: 498770 524289: 316695 1048577: 260038 After: > for i in $(seq 9 20); do echo "$((2**i+1)): $(./map_perf_test 1024 1 $((2**i+1)) 10000000 | awk '{print $3}')"; done 513: 1221851 1025: 1144695 2049: 1049902 4097: 884460 8193: 773731 16385: 729673 32769: 721989 65537: 715530 131073: 671665 262145: 516987 524289: 321125 1048577: 260048 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01bpf: Add lru_hash_lookup performance testMartin KaFai Lau
Create a new case to test the LRU lookup performance. At the beginning, the LRU map is fully loaded (i.e. the number of keys is equal to map->max_entries). The lookup is done through key 0 to num_map_entries and then repeats from 0 again. This patch also creates an anonymous struct to properly name the test params in stress_lru_hmap_alloc() in map_perf_test_kern.c. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2017-09-01 This should be the last ipsec-next pull request for this release cycle: 1) Support netdevice ESP trailer removal when decryption is offloaded. From Yossi Kuperman. 2) Fix overwritten return value of copy_sec_ctx(). Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01Merge branch 'bpf-Add-option-to-set-mark-and-priority-in-cgroup-sock-programs'David S. Miller
David Ahern says: ==================== bpf: Add option to set mark and priority in cgroup sock programs Add option to set mark and priority in addition to bound device for newly created sockets. Also, allow the bpf programs to use the get_current_uid_gid helper meaning socket marks, priority and device can be set based on the uid/gid of the running process. Sample programs are updated to demonstrate the new options. v3 - no changes to Patches 1 and 2 which Alexei acked in previous versions - dropped change related to recursive programs in a cgroup - updated tests per dropped patch v2 - added flag to control recursive behavior as requested by Alexei - added comment to sock_filter_func_proto regarding use of get_current_uid_gid helper - updated test programs for recursive option ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01samples/bpf: Update cgroup socket examples to use uid gid helperDavid Ahern
Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01samples/bpf: Update cgrp2 socket testsDavid Ahern
Update cgrp2 bpf sock tests to check that device, mark and priority can all be set on a socket via bpf programs attached to a cgroup. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01samples/bpf: Add option to dump socket settingsDavid Ahern
Add option to dump socket settings. Will be used in the next patch to verify bpf programs are correctly setting mark, priority and device based on the cgroup attachment for the program run. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01samples/bpf: Add detach option to test_cgrp2_sockDavid Ahern
Add option to detach programs from a cgroup. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01samples/bpf: Update sock test to allow setting mark and priorityDavid Ahern
Update sock test to set mark and priority on socket create. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01bpf: Allow cgroup sock filters to use get_current_uid_gid helperDavid Ahern
Allow BPF programs run on sock create to use the get_current_uid_gid helper. IPv4 and IPv6 sockets are created in a process context so there is always a valid uid/gid Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01bpf: Add mark and priority to sock options that can be setDavid Ahern
Add socket mark and priority to fields that can be set by ebpf program when a socket is created. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-31Merge tag 'cifs-fixes-for-4.13-rc7-and-stable' of ↵Linus Torvalds
git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Two cifs bug fixes for stable" * tag 'cifs-fixes-for-4.13-rc7-and-stable' of git://git.samba.org/sfrench/cifs-2.6: CIFS: remove endian related sparse warning CIFS: Fix maximum SMB2 header size
2017-08-31Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Unfortunately a few issues that warrant sending another pull request, even if I had hoped to avoid it. This contains: - A fix for multiqueue xen-blkback, on tear down / disconnect. - A few fixups for NVMe, including a wrong bit definition, fix for host memory buffers, and an nvme rdma page size fix" * 'for-linus' of git://git.kernel.dk/linux-block: nvme: fix the definition of the doorbell buffer config support bit nvme-pci: use dma memory for the host memory buffer descriptors nvme-rdma: default MR page size to 4k xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect
2017-08-31Merge tag 'for-4.13/dm-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - A couple fixes for bugs introduced as part of the blk_status_t block layer changes during the 4.13 merge window - A printk throttling fix to use discrete rate limiting state for each DM log level - A stable@ fix for DM multipath that delays request requeueing to avoid CPU lockup if/when the request queue is "dying" * tag 'for-4.13/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm mpath: do not lock up a CPU with requeuing activity dm: fix printk() rate limiting code dm mpath: retry BLK_STS_RESOURCE errors dm: fix the second dec_pending() argument in __split_and_process_bio()
2017-08-31Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more fixes from Andrew Morton: "6 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: scripts/dtc: fix '%zx' warning include/linux/compiler.h: don't perform compiletime_assert with -O0 mm, madvise: ensure poisoned pages are removed from per-cpu lists mm, uprobes: fix multiple free of ->uprobes_state.xol_area kernel/kthread.c: kthread_worker: don't hog the cpu mm,page_alloc: don't call __node_reclaim() with oom_lock held.
2017-08-31Merge branch 'mmu_notifier_fixes'Linus Torvalds
Merge mmu_notifier fixes from Jérôme Glisse: "The invalidate_page callback suffered from 2 pitfalls. First it used to happen after page table lock was release and thus a new page might have been setup for the virtual address before the call to invalidate_page(). This is in a weird way fixed by commit c7ab0d2fdc84 ("mm: convert try_to_unmap_one() to use page_vma_mapped_walk()") which moved the callback under the page table lock. Which also broke several existing user of the mmu_notifier API that assumed they could sleep inside this callback. The second pitfall was invalidate_page being the only callback not taking a range of address in respect to invalidation but was giving an address and a page. Lot of the callback implementer assumed this could never be THP and thus failed to invalidate the appropriate range for THP pages. By killing this callback we unify the mmu_notifier callback API to always take a virtual address range as input. There is now two clear API (I am not mentioning the youngess API which is seldomly used): - invalidate_range_start()/end() callback (which allow you to sleep) - invalidate_range() where you can not sleep but happen right after page table update under page table lock Note that a lot of existing user feels broken in respect to range_start/ range_end. Many user only have range_start() callback but there is nothing preventing them to undo what was invalidated in their range_start() callback after it returns but before any CPU page table update take place. The code pattern use in kvm or umem odp is an example on how to properly avoid such race. In a nutshell use some kind of sequence number and active range invalidation counter to block anything that might undo what the range_start() callback did. If you do not care about keeping fully in sync with CPU page table (ie you can live with CPU page table pointing to new different page for a given virtual address) then you can take a reference on the pages inside the range_start callback and drop it in range_end or when your driver is done with those pages. Last alternative is to use invalidate_range() if you can do invalidation without sleeping as invalidate_range() callback happens under the CPU page table spinlock right after the page table is updated. The first two patches convert existing mmu_notifier_invalidate_page() calls to mmu_notifier_invalidate_range() and bracket those call with call to mmu_notifier_invalidate_range_start()/end(). The next ten patches remove existing invalidate_page() callback as it can no longer happen. Finally the last page remove the invalidate_page() callback completely so it can RIP. Changes since v1: - remove more dead code in kvm (no testing impact) - more accurate end address computation (patch 2) in page_mkclean_one and try_to_unmap_one - added tested-by/reviewed-by gotten so far" * emailed patches from Jérôme Glisse <jglisse@redhat.com>: mm/mmu_notifier: kill invalidate_page KVM: update to new mmu_notifier semantic v2 xen/gntdev: update to new mmu_notifier semantic sgi-gru: update to new mmu_notifier semantic misc/mic/scif: update to new mmu_notifier semantic iommu/intel: update to new mmu_notifier semantic iommu/amd: update to new mmu_notifier semantic IB/hfi1: update to new mmu_notifier semantic IB/umem: update to new mmu_notifier semantic drm/amdgpu: update to new mmu_notifier semantic powerpc/powernv: update to new mmu_notifier semantic mm/rmap: update to new mmu_notifier semantic v2 dax: update to new mmu_notifier semantic
2017-08-31jfs should use MAX_LFS_FILESIZE when calculating s_maxbytesDave Kleikamp
jfs had previously avoided the use of MAX_LFS_FILESIZE because it hadn't accounted for the whole 32-bit index range on 32-bit systems. That has been fixed by commit 0cc3b0ec23ce ("Clarify (and fix) MAX_LFS_FILESIZE macros"), so we can simplify the code now. Suggested by Andreas Dilger. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: jfs-discussion@lists.sourceforge.net Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31scripts/dtc: fix '%zx' warningRussell King
dtc uses an incorrect format specifier for printing a uint64_t value. uint64_t may be either 'unsigned long' or 'unsigned long long' depending on the host architecture. Fix this by using %llx and casting to unsigned long long, which ensures that we always have a wide enough variable to print 64 bits of hex. HOSTCC scripts/dtc/checks.o scripts/dtc/checks.c: In function 'check_simple_bus_reg': scripts/dtc/checks.c:876:2: warning: format '%zx' expects argument of type 'size_t', but argument 4 has type 'uint64_t' [-Wformat=] snprintf(unit_addr, sizeof(unit_addr), "%zx", reg); ^ scripts/dtc/checks.c:876:2: warning: format '%zx' expects argument of type 'size_t', but argument 4 has type 'uint64_t' [-Wformat=] Link: http://lkml.kernel.org/r/20170829222034.GJ20805@n2100.armlinux.org.uk Fixes: 828d4cdd012c ("dtc: check.c fix compile error") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Cc: Rob Herring <robh+dt@kernel.org> Cc: Frank Rowand <frowand.list@gmail.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31include/linux/compiler.h: don't perform compiletime_assert with -O0Joe Stringer
Commit c7acec713d14 ("kernel.h: handle pointers to arrays better in container_of()") made use of __compiletime_assert() from container_of() thus increasing the usage of this macro, allowing developers to notice type conflicts in usage of container_of() at compile time. However, the implementation of __compiletime_assert relies on compiler optimizations to report an error. This means that if a developer uses "-O0" with any code that performs container_of(), the compiler will always report an error regardless of whether there is an actual problem in the code. This patch disables compile_time_assert when optimizations are disabled to allow such code to compile with CFLAGS="-O0". Example compilation failure: ./include/linux/compiler.h:547:38: error: call to `__compiletime_assert_94' declared with attribute error: pointer type mismatch in container_of() _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ ./include/linux/compiler.h:530:4: note: in definition of macro `__compiletime_assert' prefix ## suffix(); \ ^~~~~~ ./include/linux/compiler.h:547:2: note: in expansion of macro `_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~~~~~~~~~~~~~~~~~ ./include/linux/build_bug.h:46:37: note: in expansion of macro `compiletime_assert' #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) ^~~~~~~~~~~~~~~~~~ ./include/linux/kernel.h:860:2: note: in expansion of macro `BUILD_BUG_ON_MSG' BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \ ^~~~~~~~~~~~~~~~ [akpm@linux-foundation.org: use do{}while(0), per Michal] Link: http://lkml.kernel.org/r/20170829230114.11662-1-joe@ovn.org Fixes: c7acec713d14c6c ("kernel.h: handle pointers to arrays better in container_of()") Signed-off-by: Joe Stringer <joe@ovn.org> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31mm, madvise: ensure poisoned pages are removed from per-cpu listsMel Gorman
Wendy Wang reported off-list that a RAS HWPOISON-SOFT test case failed and bisected it to the commit 479f854a207c ("mm, page_alloc: defer debugging checks of pages allocated from the PCP"). The problem is that a page that was poisoned with madvise() is reused. The commit removed a check that would trigger if DEBUG_VM was enabled but re-enabling the check only fixes the problem as a side-effect by printing a bad_page warning and recovering. The root of the problem is that an madvise() can leave a poisoned page on the per-cpu list. This patch drains all per-cpu lists after pages are poisoned so that they will not be reused. Wendy reports that the test case in question passes with this patch applied. While this could be done in a targeted fashion, it is over-complicated for such a rare operation. Link: http://lkml.kernel.org/r/20170828133414.7qro57jbepdcyz5x@techsingularity.net Fixes: 479f854a207c ("mm, page_alloc: defer debugging checks of pages allocated from the PCP") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Wang, Wendy <wendy.wang@intel.com> Tested-by: Wang, Wendy <wendy.wang@intel.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Hansen, Dave" <dave.hansen@intel.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31mm, uprobes: fix multiple free of ->uprobes_state.xol_areaEric Biggers
Commit 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for write killable") made it possible to kill a forking task while it is waiting to acquire its ->mmap_sem for write, in dup_mmap(). However, it was overlooked that this introduced an new error path before the new mm_struct's ->uprobes_state.xol_area has been set to NULL after being copied from the old mm_struct by the memcpy in dup_mm(). For a task that has previously hit a uprobe tracepoint, this resulted in the 'struct xol_area' being freed multiple times if the task was killed at just the right time while forking. Fix it by setting ->uprobes_state.xol_area to NULL in mm_init() rather than in uprobe_dup_mmap(). With CONFIG_UPROBE_EVENTS=y, the bug can be reproduced by the same C program given by commit 2b7e8665b4ff ("fork: fix incorrect fput of ->exe_file causing use-after-free"), provided that a uprobe tracepoint has been set on the fork_thread() function. For example: $ gcc reproducer.c -o reproducer -lpthread $ nm reproducer | grep fork_thread 0000000000400719 t fork_thread $ echo "p $PWD/reproducer:0x719" > /sys/kernel/debug/tracing/uprobe_events $ echo 1 > /sys/kernel/debug/tracing/events/uprobes/enable $ ./reproducer Here is the use-after-free reported by KASAN: BUG: KASAN: use-after-free in uprobe_clear_state+0x1c4/0x200 Read of size 8 at addr ffff8800320a8b88 by task reproducer/198 CPU: 1 PID: 198 Comm: reproducer Not tainted 4.13.0-rc7-00015-g36fde05f3fb5 #255 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014 Call Trace: dump_stack+0xdb/0x185 print_address_description+0x7e/0x290 kasan_report+0x23b/0x350 __asan_report_load8_noabort+0x19/0x20 uprobe_clear_state+0x1c4/0x200 mmput+0xd6/0x360 do_exit+0x740/0x1670 do_group_exit+0x13f/0x380 get_signal+0x597/0x17d0 do_signal+0x99/0x1df0 exit_to_usermode_loop+0x166/0x1e0 syscall_return_slowpath+0x258/0x2c0 entry_SYSCALL_64_fastpath+0xbc/0xbe ... Allocated by task 199: save_stack_trace+0x1b/0x20 kasan_kmalloc+0xfc/0x180 kmem_cache_alloc_trace+0xf3/0x330 __create_xol_area+0x10f/0x780 uprobe_notify_resume+0x1674/0x2210 exit_to_usermode_loop+0x150/0x1e0 prepare_exit_to_usermode+0x14b/0x180 retint_user+0x8/0x20 Freed by task 199: save_stack_trace+0x1b/0x20 kasan_slab_free+0xa8/0x1a0 kfree+0xba/0x210 uprobe_clear_state+0x151/0x200 mmput+0xd6/0x360 copy_process.part.8+0x605f/0x65d0 _do_fork+0x1a5/0xbd0 SyS_clone+0x19/0x20 do_syscall_64+0x22f/0x660 return_from_SYSCALL_64+0x0/0x7a Note: without KASAN, you may instead see a "Bad page state" message, or simply a general protection fault. Link: http://lkml.kernel.org/r/20170830033303.17927-1-ebiggers3@gmail.com Fixes: 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for write killable") Signed-off-by: Eric Biggers <ebiggers@google.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> [4.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31kernel/kthread.c: kthread_worker: don't hog the cpuShaohua Li
If the worker thread continues getting work, it will hog the cpu and rcu stall complains. Make it a good citizen. This is triggered in a loop block device test. Link: http://lkml.kernel.org/r/5de0a179b3184e1a2183fc503448b0269f24d75b.1503697127.git.shli@fb.com Signed-off-by: Shaohua Li <shli@fb.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31mm,page_alloc: don't call __node_reclaim() with oom_lock held.Tetsuo Handa
We are doing a last second memory allocation attempt before calling out_of_memory(). But since slab shrinker functions might indirectly wait for other thread's __GFP_DIRECT_RECLAIM && !__GFP_NORETRY memory allocations via sleeping locks, calling slab shrinker functions from node_reclaim() from get_page_from_freelist() with oom_lock held has possibility of deadlock. Therefore, make sure that last second memory allocation attempt does not call slab shrinker functions. Link: http://lkml.kernel.org/r/1503577106-9196-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31mm/mmu_notifier: kill invalidate_pageJérôme Glisse
The invalidate_page callback suffered from two pitfalls. First it used to happen after the page table lock was release and thus a new page might have setup before the call to invalidate_page() happened. This is in a weird way fixed by commit c7ab0d2fdc84 ("mm: convert try_to_unmap_one() to use page_vma_mapped_walk()") that moved the callback under the page table lock but this also broke several existing users of the mmu_notifier API that assumed they could sleep inside this callback. The second pitfall was invalidate_page() being the only callback not taking a range of address in respect to invalidation but was giving an address and a page. Lots of the callback implementers assumed this could never be THP and thus failed to invalidate the appropriate range for THP. By killing this callback we unify the mmu_notifier callback API to always take a virtual address range as input. Finally this also simplifies the end user life as there is now two clear choices: - invalidate_range_start()/end() callback (which allow you to sleep) - invalidate_range() where you can not sleep but happen right after page table update under page table lock Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Cc: Bernhard Held <berny156@gmx.de> Cc: Adam Borowski <kilobyte@angband.pl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: axie <axie@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31KVM: update to new mmu_notifier semantic v2Jérôme Glisse
Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and are now bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Changed since v1 (Linus Torvalds) - remove now useless kvm_arch_mmu_notifier_invalidate_page() Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Tested-by: Mike Galbraith <efault@gmx.de> Tested-by: Adam Borowski <kilobyte@angband.pl> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31xen/gntdev: update to new mmu_notifier semanticJérôme Glisse
Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and are now bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Roger Pau Monné <roger.pau@citrix.com> Cc: xen-devel@lists.xenproject.org (moderated for non-subscribers) Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31sgi-gru: update to new mmu_notifier semanticJérôme Glisse
Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and are now bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31misc/mic/scif: update to new mmu_notifier semanticJérôme Glisse
Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and are now bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31iommu/intel: update to new mmu_notifier semanticJérôme Glisse
Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and are now bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: iommu@lists.linux-foundation.org Cc: Joerg Roedel <jroedel@suse.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31iommu/amd: update to new mmu_notifier semanticJérôme Glisse
Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and are now bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: iommu@lists.linux-foundation.org Cc: Joerg Roedel <jroedel@suse.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31IB/hfi1: update to new mmu_notifier semanticJérôme Glisse
Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and are now bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Cc: linux-rdma@vger.kernel.org Cc: Dean Luick <dean.luick@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31IB/umem: update to new mmu_notifier semanticJérôme Glisse
Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and are now bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Tested-by: Leon Romanovsky <leonro@mellanox.com> Cc: linux-rdma@vger.kernel.org Cc: Artemy Kovalyov <artemyko@mellanox.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31drm/amdgpu: update to new mmu_notifier semanticJérôme Glisse
Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and are now bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31powerpc/powernv: update to new mmu_notifier semanticJérôme Glisse
Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and now are bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: Alistair Popple <alistair@popple.id.au> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31mm/rmap: update to new mmu_notifier semantic v2Jérôme Glisse
Replace all mmu_notifier_invalidate_page() calls by *_invalidate_range() and make sure it is bracketed by calls to *_invalidate_range_start()/end(). Note that because we can not presume the pmd value or pte value we have to assume the worst and unconditionaly report an invalidation as happening. Changed since v2: - try_to_unmap_one() only one call to mmu_notifier_invalidate_range() - compute end with PAGE_SIZE << compound_order(page) - fix PageHuge() case in try_to_unmap_one() Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Bernhard Held <berny156@gmx.de> Cc: Adam Borowski <kilobyte@angband.pl> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: axie <axie@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31dax: update to new mmu_notifier semanticJérôme Glisse
Replace all mmu_notifier_invalidate_page() calls by *_invalidate_range() and make sure it is bracketed by calls to *_invalidate_range_start()/end(). Note that because we can not presume the pmd value or pte value we have to assume the worst and unconditionaly report an invalidation as happening. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Bernhard Held <berny156@gmx.de> Cc: Adam Borowski <kilobyte@angband.pl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: axie <axie@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-01ceph: fix readpage from fscacheYan, Zheng
ceph_readpage() unlocks page prematurely prematurely in the case that page is reading from fscache. Caller of readpage expects that page is uptodate when it get unlocked. So page shoule get locked by completion callback of fscache_read_or_alloc_pages() Cc: stable@vger.kernel.org # 4.1+, needs backporting for < 4.7 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-08-31Merge branch 'mlxsw-Add-IPv6-host-dpipe-table'David S. Miller
Jiri Pirko says: ==================== mlxsw: Add IPv6 host dpipe table This patchset adds IPv6 host dpipe table support. This will provide the ability to observe the hardware offloaded IPv6 neighbors. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-31mlxsw: spectrum_dpipe: Add support for controlling IPv6 neighbor countersArkadi Sharshevsky
Add support for controlling IPv6 neighbor counters via dpipe. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-31mlxsw: spectrum_router: Add support for setting counters on IPv6 neighborsArkadi Sharshevsky
Add support for setting counters on IPv6 neighbors based on dpipe's host6 table counter status. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-31mlxsw: spectrum_dpipe: Add support for IPv6 host table dumpArkadi Sharshevsky
Add support for IPv6 host table dump. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-31mlxsw: spectrum_dpipe: Make host entry fill handler more genericArkadi Sharshevsky
Change the host entry filler helper to be applicable for both IPv4/6 addresses. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>