summaryrefslogtreecommitdiff
path: root/mm
AgeCommit message (Collapse)Author
2017-04-18cma: Introduce cma_for_each_areaLaura Abbott
Frameworks (e.g. Ion) may want to iterate over each possible CMA area to allow for enumeration. Introduce a function to allow a callback. Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-18cma: Store a name in the cma structureLaura Abbott
Frameworks that may want to enumerate CMA heaps (e.g. Ion) will find it useful to have an explicit name attached to each region. Store the name in each CMA structure. Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-14Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-13zsmalloc: expand class bitMinchan Kim
Now 64K page system, zsamlloc has 257 classes so 8 class bit is not enough. With that, it corrupts the system when zsmalloc stores 65536byte data(ie, index number 256) so that this patch increases class bit for simple fix for stable backport. We should clean up this mess soon. index size 0 32 1 288 .. .. 204 52256 256 65536 Fixes: 3783689a1 ("zsmalloc: introduce zspage structure") Link: http://lkml.kernel.org/r/1492042622-12074-3-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13thp: fix MADV_DONTNEED vs. MADV_FREE raceKirill A. Shutemov
Both MADV_DONTNEED and MADV_FREE handled with down_read(mmap_sem). It's critical to not clear pmd intermittently while handling MADV_FREE to avoid race with MADV_DONTNEED: CPU0: CPU1: madvise_free_huge_pmd() pmdp_huge_get_and_clear_full() madvise_dontneed() zap_pmd_range() pmd_trans_huge(*pmd) == 0 (without ptl) // skip the pmd set_pmd_at(); // pmd is re-established It results in MADV_DONTNEED skipping the pmd, leaving it not cleared. It violates MADV_DONTNEED interface and can result is userspace misbehaviour. Basically it's the same race as with numa balancing in change_huge_pmd(), but a bit simpler to mitigate: we don't need to preserve dirty/young flags here due to MADV_FREE functionality. [kirill.shutemov@linux.intel.com: Urgh... Power is special again] Link: http://lkml.kernel.org/r/20170303102636.bhd2zhtpds4mt62a@black.fi.intel.com Link: http://lkml.kernel.org/r/20170302151034.27829-4-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13thp: fix MADV_DONTNEED vs. numa balancing raceKirill A. Shutemov
In case prot_numa, we are under down_read(mmap_sem). It's critical to not clear pmd intermittently to avoid race with MADV_DONTNEED which is also under down_read(mmap_sem): CPU0: CPU1: change_huge_pmd(prot_numa=1) pmdp_huge_get_and_clear_notify() madvise_dontneed() zap_pmd_range() pmd_trans_huge(*pmd) == 0 (without ptl) // skip the pmd set_pmd_at(); // pmd is re-established The race makes MADV_DONTNEED miss the huge pmd and don't clear it which may break userspace. Found by code analysis, never saw triggered. Link: http://lkml.kernel.org/r/20170302151034.27829-3-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13thp: reduce indentation level in change_huge_pmd()Kirill A. Shutemov
Patch series "thp: fix few MADV_DONTNEED races" For MADV_DONTNEED to work properly with huge pages, it's critical to not clear pmd intermittently unless you hold down_write(mmap_sem). Otherwise MADV_DONTNEED can miss the THP which can lead to userspace breakage. See example of such race in commit message of patch 2/4. All these races are found by code inspection. I haven't seen them triggered. I don't think it's worth to apply them to stable@. This patch (of 4): Restructure code in preparation for a fix. Link: http://lkml.kernel.org/r/20170302151034.27829-2-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-13z3fold: fix page locking in z3fold_alloc()Vitaly Wool
Stress testing of the current z3fold implementation on a 8-core system revealed it was possible that a z3fold page deleted from its unbuddied list in z3fold_alloc() would be put on another unbuddied list by z3fold_free() while z3fold_alloc() is still processing it. This has been introduced with commit 5a27aa822 ("z3fold: add kref refcounting") due to the removal of special handling of a z3fold page not on any list in z3fold_free(). To fix this, the z3fold page lock should be taken in z3fold_alloc() before the pool lock is released. To avoid deadlocking, we just try to lock the page as soon as we get a hold of it, and if trylock fails, we drop this page and take the next one. Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: <Oleksiy.Avramchenko@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08mm/mempolicy.c: fix error handling in set_mempolicy and mbind.Chris Salls
In the case that compat_get_bitmap fails we do not want to copy the bitmap to the user as it will contain uninitialized stack data and leak sensitive data. Signed-off-by: Chris Salls <salls@cs.ucsb.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08mm: move pcp and lru-pcp draining into single wqMichal Hocko
We currently have 2 specific WQ_RECLAIM workqueues in the mm code. vmstat_wq for updating pcp stats and lru_add_drain_wq dedicated to drain per cpu lru caches. This seems more than necessary because both can run on a single WQ. Both do not block on locks requiring a memory allocation nor perform any allocations themselves. We will save one rescuer thread this way. On the other hand drain_all_pages() queues work on the system wq which doesn't have rescuer and so this depend on memory allocation (when all workers are stuck allocating and new ones cannot be created). Initially we thought this would be more of a theoretical problem but Hugh Dickins has reported: : 4.11-rc has been giving me hangs after hours of swapping load. At : first they looked like memory leaks ("fork: Cannot allocate memory"); : but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh" : before looking at /proc/meminfo one time, and the stat_refresh stuck : in D state, waiting for completion of flush_work like many kworkers. : kthreadd waiting for completion of flush_work in drain_all_pages(). This worker should be using WQ_RECLAIM as well in order to guarantee a forward progress. We can reuse the same one as for lru draining and vmstat. Link: http://lkml.kernel.org/r/20170307131751.24936-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Tested-by: Yang Li <pku.leo@gmail.com> Tested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()David Rientjes
We got need_resched() warnings in swap_cgroup_swapoff() because swap_cgroup_ctrl[type].length is particularly large. Reschedule when needed. Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1704061315270.80559@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08mm, thp: fix setting of defer+madvise thp defrag modeDavid Rientjes
Setting thp defrag mode of "defer+madvise" actually sets "defer" in the kernel due to the name similarity and the out-of-order way the string is checked in defrag_store(). Check the string in the correct order so that TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG is set appropriately for "defer+madvise". Fixes: 21440d7eb904 ("mm, thp: add new defer+madvise defrag option") Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1704051814420.137626@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08mm/page_alloc.c: fix print order in show_free_areas()Alexander Polakov
Fixes: 11fb998986a72a ("mm: move most file-based accounting to the node") Link: http://lkml.kernel.org/r/1490377730.30219.2.camel@beget.ru Signed-off-by: Alexander Polyakov <apolyakov@beget.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> [4.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08mm: fix page_vma_mapped_walk() for ksm pagesHugh Dickins
Doug Smythies reports oops with KSM in this backtrace, I've been seeing the same: page_vma_mapped_walk+0xe6/0x5b0 page_referenced_one+0x91/0x1a0 rmap_walk_ksm+0x100/0x190 rmap_walk+0x4f/0x60 page_referenced+0x149/0x170 shrink_active_list+0x1c2/0x430 shrink_node_memcg+0x67a/0x7a0 shrink_node+0xe1/0x320 kswapd+0x34b/0x720 Just as observed in commit 4b0ece6fa016 ("mm: migrate: fix remove_migration_pte() for ksm pages"), you cannot use page->index calculations on ksm pages. page_vma_mapped_walk() is relying on __vma_address(), where a ksm page can lead it off the end of the page table, and into whatever nonsense is in the next page, ending as an oops inside check_pte()'s pte_page(). KSM tells page_vma_mapped_walk() exactly where to look for the page, it does not need any page->index calculation: and that's so also for all the normal and file and anon pages - just not for THPs and their subpages. Get out early in most cases: instead of a PageKsm test, move down the earlier not-THP-page test, as suggested by Kirill. I'm also slightly worried that this loop can stray into other vmas, so added a vm_end test to prevent surprises; though I have not imagined anything worse than a very contrived case, in which a page mlocked in the next vma might be reclaimed because it is not mlocked in this vma. Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()") Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1704031104400.1118@eggly.anvils Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Doug Smythies <dsmythies@telus.net> Tested-by: Doug Smythies <dsmythies@telus.net> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-07Merge branch 'for-linus' into for-4.12/blockJens Axboe
We've added a considerable amount of fixes for stalls and issues with the blk-mq scheduling in the 4.11 series since forking off the for-4.12/block branch. We need to do improvements on top of that for 4.12, so pull in the previous fixes to make our lives easier going forward. Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-05mm/usercopy: Drop extra is_vmalloc_or_module() checkLaura Abbott
Previously virt_addr_valid() was insufficient to validate if virt_to_page() could be called on an address on arm64. This has since been fixed up so there is no need for the extra check. Drop it. Signed-off-by: Laura Abbott <labbott@redhat.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-04-05memblock: add memblock_cap_memory_range()AKASHI Takahiro
Add memblock_cap_memory_range() which will remove all the memblock regions except the memory range specified in the arguments. In addition, rework is done on memblock_mem_limit_remove_map() to re-implement it using memblock_cap_memory_range(). This function, like memblock_mem_limit_remove_map(), will not remove memblocks with MEMMAP_NOMAP attribute as they may be mapped and accessed later as "device memory." See the commit a571d4eb55d8 ("mm/memblock.c: add new infrastructure to address the mem limit issue"). This function is used, in a succeeding patch in the series of arm64 kdump suuport, to limit the range of usable memory, or System RAM, on crash dump kernel. (Please note that "mem=" parameter is of little use for this purpose.) Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Reviewed-by: Will Deacon <will.deacon@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Dennis Chen <dennis.chen@arm.com> Cc: linux-mm@kvack.org Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-04-05memblock: add memblock_clear_nomap()AKASHI Takahiro
This function, with a combination of memblock_mark_nomap(), will be used in a later kdump patch for arm64 when it temporarily isolates some range of memory from the other memory blocks in order to create a specific kernel mapping at boot time. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-04-04usercopy: Move enum for arch_within_stack_frames()Sahara
This patch moves the arch_within_stack_frames() return value enum up in the header files so that per-architecture implementations can reuse the same return values. Signed-off-by: Sahara <keun-o.park@darkmatter.ae> Signed-off-by: James Morse <james.morse@arm.com> [kees: adjusted naming and commit log] Signed-off-by: Kees Cook <keescook@chromium.org>
2017-04-04Merge branch 'sched/core' into locking/coreThomas Gleixner
Required for the rtmutex/sched_deadline patches which depend on both branches
2017-04-03ftrace: Have init/main.c call ftrace directly to free init memorySteven Rostedt (VMware)
Relying on free_reserved_area() to call ftrace to free init memory proved to not be sufficient. The issue is that on x86, when debug_pagealloc is enabled, the init memory is not freed, but simply set as not present. Since ftrace was uninformed of this, starting function tracing still tries to update pages that are not present according to the page tables, causing ftrace to bug, as well as killing the kernel itself. Instead of relying on free_reserved_area(), have init/main.c call ftrace directly just before it frees the init memory. Then it needs to use __init_begin and __init_end to know where the init memory location is. Looking at all archs (and testing what I can), it appears that this should work for each of them. Reported-by: kernel test robot <xiaolong.ye@intel.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-03Merge tag 'v4.11-rc5' into x86/mm, to refresh the branchIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-02kernel-api.rst: fix a series of errors when parsing C filesmchehab@s-opensource.com
./lib/string.c:134: WARNING: Inline emphasis start-string without end-string. ./mm/filemap.c:522: WARNING: Inline interpreted text or phrase reference start-string without end-string. ./mm/filemap.c:1283: ERROR: Unexpected indentation. ./mm/filemap.c:3003: WARNING: Inline interpreted text or phrase reference start-string without end-string. ./mm/vmalloc.c:1544: WARNING: Inline emphasis start-string without end-string. ./mm/page_alloc.c:4245: ERROR: Unexpected indentation. ./ipc/util.c:676: ERROR: Unexpected indentation. ./drivers/pci/irq.c:35: WARNING: Block quote ends without a blank line; unexpected unindent. ./security/security.c:109: ERROR: Unexpected indentation. ./security/security.c:110: WARNING: Definition list ends without a blank line; unexpected unindent. ./block/genhd.c:275: WARNING: Inline strong start-string without end-string. ./block/genhd.c:283: WARNING: Inline strong start-string without end-string. ./include/linux/clk.h:134: WARNING: Inline emphasis start-string without end-string. ./include/linux/clk.h:134: WARNING: Inline emphasis start-string without end-string. ./ipc/util.c:477: ERROR: Unknown target name: "s". Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-04-02Merge branch 'parisc-4.11-3' of ↵Al Viro
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux into uaccess.parisc
2017-03-31mm/hugetlb.c: don't call region_abort if region_chg failsMike Kravetz
Changes to hugetlbfs reservation maps is a two step process. The first step is a call to region_chg to determine what needs to be changed, and prepare that change. This should be followed by a call to call to region_add to commit the change, or region_abort to abort the change. The error path in hugetlb_reserve_pages called region_abort after a failed call to region_chg. As a result, the adds_in_progress counter in the reservation map is off by 1. This is caught by a VM_BUG_ON in resv_map_release when the reservation map is freed. syzkaller fuzzer (when using an injected kmalloc failure) found this bug, that resulted in the following: kernel BUG at mm/hugetlb.c:742! Call Trace: hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493 evict+0x481/0x920 fs/inode.c:553 iput_final fs/inode.c:1515 [inline] iput+0x62b/0xa20 fs/inode.c:1542 hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306 newseg+0x422/0xd30 ipc/shm.c:575 ipcget_new ipc/util.c:285 [inline] ipcget+0x21e/0x580 ipc/util.c:639 SYSC_shmget ipc/shm.c:673 [inline] SyS_shmget+0x158/0x230 ipc/shm.c:657 entry_SYSCALL_64_fastpath+0x1f/0xc2 RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742 Link: http://lkml.kernel.org/r/1490821682-23228-1-git-send-email-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31kasan: report only the first error by defaultMark Rutland
Disable kasan after the first report. There are several reasons for this: - Single bug quite often has multiple invalid memory accesses causing storm in the dmesg. - Write OOB access might corrupt metadata so the next report will print bogus alloc/free stacktraces. - Reports after the first easily could be not bugs by itself but just side effects of the first one. Given that multiple reports usually only do harm, it makes sense to disable kasan after the first one. If user wants to see all the reports, the boot-time parameter kasan_multi_shot must be used. [aryabinin@virtuozzo.com: wrote changelog and doc, added missing include] Link: http://lkml.kernel.org/r/20170323154416.30257-1-aryabinin@virtuozzo.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31mm: fix section name for .data..ro_after_initKees Cook
A section name for .data..ro_after_init was added by both: commit d07a980c1b8d ("s390: add proper __ro_after_init support") and commit d7c19b066dcf ("mm: kmemleak: scan .data.ro_after_init") The latter adds incorrect wrapping around the existing s390 section, and came later. I'd prefer the s390 naming, so this moves the s390-specific name up to the asm-generic/sections.h and renames the section as used by kmemleak (and in the future, kernel/extable.c). Link: http://lkml.kernel.org/r/20170327192213.GA129375@beast Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390 parts] Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Eddie Kovsky <ewk@edkovsky.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd()Naoya Horiguchi
I found the race condition which triggers the following bug when move_pages() and soft offline are called on a single hugetlb page concurrently. Soft offlining page 0x119400 at 0x700000000000 BUG: unable to handle kernel paging request at ffffea0011943820 IP: follow_huge_pmd+0x143/0x190 PGD 7ffd2067 PUD 7ffd1067 PMD 0 [61163.582052] Oops: 0000 [#1] SMP Modules linked in: binfmt_misc ppdev virtio_balloon parport_pc pcspkr i2c_piix4 parport i2c_core acpi_cpufreq ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk 8139too crc32c_intel ata_piix serio_raw libata virtio_pci 8139cp virtio_ring virtio mii floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: cap_check] CPU: 0 PID: 22573 Comm: iterate_numa_mo Tainted: P OE 4.11.0-rc2-mm1+ #2 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:follow_huge_pmd+0x143/0x190 RSP: 0018:ffffc90004bdbcd0 EFLAGS: 00010202 RAX: 0000000465003e80 RBX: ffffea0004e34d30 RCX: 00003ffffffff000 RDX: 0000000011943800 RSI: 0000000000080001 RDI: 0000000465003e80 RBP: ffffc90004bdbd18 R08: 0000000000000000 R09: ffff880138d34000 R10: ffffea0004650000 R11: 0000000000c363b0 R12: ffffea0011943800 R13: ffff8801b8d34000 R14: ffffea0000000000 R15: 000077ff80000000 FS: 00007fc977710740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffea0011943820 CR3: 000000007a746000 CR4: 00000000001406f0 Call Trace: follow_page_mask+0x270/0x550 SYSC_move_pages+0x4ea/0x8f0 SyS_move_pages+0xe/0x10 do_syscall_64+0x67/0x180 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7fc976e03949 RSP: 002b:00007ffe72221d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000117 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc976e03949 RDX: 0000000000c22390 RSI: 0000000000001400 RDI: 0000000000005827 RBP: 00007ffe72221e00 R08: 0000000000c2c3a0 R09: 0000000000000004 R10: 0000000000c363b0 R11: 0000000000000246 R12: 0000000000400650 R13: 00007ffe72221ee0 R14: 0000000000000000 R15: 0000000000000000 Code: 81 e4 ff ff 1f 00 48 21 c2 49 c1 ec 0c 48 c1 ea 0c 4c 01 e2 49 bc 00 00 00 00 00 ea ff ff 48 c1 e2 06 49 01 d4 f6 45 bc 04 74 90 <49> 8b 7c 24 20 40 f6 c7 01 75 2b 4c 89 e7 8b 47 1c 85 c0 7e 2a RIP: follow_huge_pmd+0x143/0x190 RSP: ffffc90004bdbcd0 CR2: ffffea0011943820 ---[ end trace e4f81353a2d23232 ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled This bug is triggered when pmd_present() returns true for non-present hugetlb, so fixing the present check in follow_huge_pmd() prevents it. Using pmd_present() to determine present/non-present for hugetlb is not correct, because pmd_present() checks multiple bits (not only _PAGE_PRESENT) for historical reason and it can misjudge hugetlb state. Fixes: e66f17ff7177 ("mm/hugetlb: take page table lock in follow_huge_pmd()") Link: http://lkml.kernel.org/r/1490149898-20231-1-git-send-email-n-horiguchi@ah.jp.nec.com Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: <stable@vger.kernel.org> [4.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31mm: workingset: fix premature shadow node shrinking with cgroupsJohannes Weiner
Commit 0a6b76dd23fa ("mm: workingset: make shadow node shrinker memcg aware") enabled cgroup-awareness in the shadow node shrinker, but forgot to also enable cgroup-awareness in the list_lru the shadow nodes sit on. Consequently, all shadow nodes are sitting on a global (per-NUMA node) list, while the shrinker applies the limits according to the amount of cache in the cgroup its shrinking. The result is excessive pressure on the shadow nodes from cgroups that have very little cache. Enable memcg-mode on the shadow node LRUs, such that per-cgroup limits are applied to per-cgroup lists. Fixes: 0a6b76dd23fa ("mm: workingset: make shadow node shrinker memcg aware") Link: http://lkml.kernel.org/r/20170322005320.8165-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov@tarantool.org> Cc: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> [4.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31mm: rmap: fix huge file mmap accounting in the memcg statsJohannes Weiner
Huge pages are accounted as single units in the memcg's "file_mapped" counter. Account the correct number of base pages, like we do in the corresponding node counter. Link: http://lkml.kernel.org/r/20170322005111.3156-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> [4.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31mm: move mm_percpu_wq initialization earlierMichal Hocko
Yang Li has reported that drain_all_pages triggers a WARN_ON which means that this function is called earlier than the mm_percpu_wq is initialized on arm64 with CMA configured: WARNING: CPU: 2 PID: 1 at mm/page_alloc.c:2423 drain_all_pages+0x244/0x25c Modules linked in: CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.11.0-rc1-next-20170310-00027-g64dfbc5 #127 Hardware name: Freescale Layerscape 2088A RDB Board (DT) task: ffffffc07c4a6d00 task.stack: ffffffc07c4a8000 PC is at drain_all_pages+0x244/0x25c LR is at start_isolate_page_range+0x14c/0x1f0 [...] drain_all_pages+0x244/0x25c start_isolate_page_range+0x14c/0x1f0 alloc_contig_range+0xec/0x354 cma_alloc+0x100/0x1fc dma_alloc_from_contiguous+0x3c/0x44 atomic_pool_init+0x7c/0x208 arm64_dma_init+0x44/0x4c do_one_initcall+0x38/0x128 kernel_init_freeable+0x1a0/0x240 kernel_init+0x10/0xfc ret_from_fork+0x10/0x20 Fix this by moving the whole setup_vmstat which is an initcall right now to init_mm_internals which will be called right after the WQ subsystem is initialized. Link: http://lkml.kernel.org/r/20170315164021.28532-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Yang Li <pku.leo@gmail.com> Tested-by: Yang Li <pku.leo@gmail.com> Tested-by: Xiaolong Ye <xiaolong.ye@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-31mm: migrate: fix remove_migration_pte() for ksm pagesNaoya Horiguchi
I found that calling page migration for ksm pages causes the following bug: page:ffffea0004d51180 count:2 mapcount:2 mapping:ffff88013c785141 index:0x913 flags: 0x57ffffc0040068(uptodate|lru|active|swapbacked) raw: 0057ffffc0040068 ffff88013c785141 0000000000000913 0000000200000001 raw: ffffea0004d5f9e0 ffffea0004d53f60 0000000000000000 ffff88007d81b800 page dumped because: VM_BUG_ON_PAGE(!PageLocked(page)) page->mem_cgroup:ffff88007d81b800 ------------[ cut here ]------------ kernel BUG at /src/linux-dev/mm/rmap.c:1086! invalid opcode: 0000 [#1] SMP Modules linked in: ppdev parport_pc virtio_balloon i2c_piix4 pcspkr parport i2c_core acpi_cpufreq ip_tables xfs libcrc32c ata_generic pata_acpi ata_piix 8139too libata virtio_blk 8139cp crc32c_intel mii virtio_pci virtio_ring serio_raw virtio floppy dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 3162 Comm: bash Not tainted 4.11.0-rc2-mm1+ #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:do_page_add_anon_rmap+0x1ba/0x260 RSP: 0018:ffffc90002473b30 EFLAGS: 00010282 RAX: 0000000000000021 RBX: ffffea0004d51180 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88007dc0dfe0 RBP: ffffc90002473b58 R08: 00000000fffffffe R09: 00000000000001c1 R10: 0000000000000005 R11: 00000000000001c0 R12: ffff880139ab3d80 R13: 0000000000000000 R14: 0000700000000200 R15: 0000160000000000 FS: 00007f5195f50740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd450287000 CR3: 000000007a08e000 CR4: 00000000001406f0 Call Trace: page_add_anon_rmap+0x18/0x20 remove_migration_pte+0x220/0x2c0 rmap_walk_ksm+0x143/0x220 rmap_walk+0x55/0x60 remove_migration_ptes+0x53/0x80 migrate_pages+0x8ed/0xb60 soft_offline_page+0x309/0x8d0 store_soft_offline_page+0xaf/0xf0 dev_attr_store+0x18/0x30 sysfs_kf_write+0x3a/0x50 kernfs_fop_write+0xff/0x180 __vfs_write+0x37/0x160 vfs_write+0xb2/0x1b0 SyS_write+0x55/0xc0 do_syscall_64+0x67/0x180 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f51956339e0 RSP: 002b:00007ffcfa0dffc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f51956339e0 RDX: 000000000000000c RSI: 00007f5195f53000 RDI: 0000000000000001 RBP: 00007f5195f53000 R08: 000000000000000a R09: 00007f5195f50740 R10: 000000000000000b R11: 0000000000000246 R12: 00007f5195907400 R13: 000000000000000c R14: 0000000000000001 R15: 0000000000000000 Code: fe ff ff 48 81 c2 00 02 00 00 48 89 55 d8 e8 2e c3 fd ff 48 8b 55 d8 e9 42 ff ff ff 48 c7 c6 e0 52 a1 81 48 89 df e8 46 ad fe ff <0f> 0b 48 83 e8 01 e9 7f fe ff ff 48 83 e8 01 e9 96 fe ff ff 48 RIP: do_page_add_anon_rmap+0x1ba/0x260 RSP: ffffc90002473b30 ---[ end trace a679d00f4af2df48 ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled ---[ end Kernel panic - not syncing: Fatal exception The problem is in the following lines: new = page - pvmw.page->index + linear_page_index(vma, pvmw.address); The 'new' is calculated with 'page' which is given by the caller as a destination page and some offset adjustment for thp. But this doesn't properly work for ksm pages because pvmw.page->index doesn't change for each address but linear_page_index() changes, which means that 'new' points to different pages for each addresses backed by the ksm page. As a result, we try to set totally unrelated pages as destination pages, and that causes kernel crash. This patch fixes the miscalculation and makes ksm page migration work fine. Fixes: 3fe87967c536 ("mm: convert remove_migration_pte() to use page_vma_mapped_walk()") Link: http://lkml.kernel.org/r/1489717683-29905-1-git-send-email-n-horiguchi@ah.jp.nec.com Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-28new helper: uaccess_kernel()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-26lockdep: Fix per-cpu static objectsPeter Zijlstra
Since commit 383776fa7527 ("locking/lockdep: Handle statically initialized PER_CPU locks properly") we try to collapse per-cpu locks into a single class by giving them all the same key. For this key we choose the canonical address of the per-cpu object, which would be the offset into the per-cpu area. This has two problems: - there is a case where we run !0 lock->key through static_obj() and expect this to pass; it doesn't for canonical pointers. - 0 is a valid canonical address. Cure both issues by redefining the canonical address as the address of the per-cpu variable on the boot CPU. Since I didn't want to rely on CPU0 being the boot-cpu, or even existing at all, track the boot CPU in a variable. Fixes: 383776fa7527 ("locking/lockdep: Handle statically initialized PER_CPU locks properly") Reported-by: kernel test robot <fengguang.wu@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Borislav Petkov <bp@suse.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-mm@kvack.org Cc: wfg@linux.intel.com Cc: kernel test robot <fengguang.wu@intel.com> Cc: LKP <lkp@01.org> Link: http://lkml.kernel.org/r/20170320114108.kbvcsuepem45j5cr@hirez.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-24ftrace: Allow for function tracing to record init functions on boot upSteven Rostedt (VMware)
Adding a hook into free_reserve_area() that informs ftrace that boot up init text is being free, lets ftrace safely remove those init functions from its records, which keeps ftrace from trying to modify text that no longer exists. Note, this still does not allow for tracing .init text of modules, as modules require different work for freeing its init code. Link: http://lkml.kernel.org/r/1488502497.7212.24.camel@linux.intel.com Cc: linux-mm@kvack.org Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Requested-by: Todd Brandt <todd.e.brandt@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-22bdi: Rename cgwb_bdi_destroy() to cgwb_bdi_unregister()Jan Kara
Rename cgwb_bdi_destroy() to cgwb_bdi_unregister() as it gets called from bdi_unregister() which is not necessarily called from bdi_destroy() and thus the name is somewhat misleading. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-22bdi: Do not wait for cgwbs release in bdi_unregister()Jan Kara
Currently we wait for all cgwbs to get released in cgwb_bdi_destroy() (called from bdi_unregister()). That is however unnecessary now when cgwb->bdi is a proper refcounted reference (thus bdi cannot get released before all cgwbs are released) and when cgwb_bdi_destroy() shuts down writeback directly. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-22bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()Jan Kara
Currently we waited for all cgwbs to get freed in cgwb_bdi_destroy() which also means that writeback has been shutdown on them. Since this wait is going away, directly shutdown writeback on cgwbs from cgwb_bdi_destroy() to avoid live writeback structures after bdi_unregister() has finished. To make that safe with concurrent shutdown from cgwb_release_workfn(), we also have to make sure wb_shutdown() returns only after the bdi_writeback structure is really shutdown. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-22bdi: Unify bdi->wb_list handling for root wb_writebackJan Kara
Currently root wb_writeback structure is added to bdi->wb_list in bdi_init() and never removed. That is different from all other wb_writeback structures which get added to the list when created and removed from it before wb_shutdown(). So move list addition of root bdi_writeback to bdi_register() and list removal of all wb_writeback structures to wb_shutdown(). That way a wb_writeback structure is on bdi->wb_list if and only if it can handle writeback and it will make it easier for us to handle shutdown of all wb_writeback structures in bdi_unregister(). Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-22bdi: Make wb->bdi a proper referenceJan Kara
Make wb->bdi a proper refcounted reference to bdi for all bdi_writeback structures except for the one embedded inside struct backing_dev_info. That will allow us to simplify bdi unregistration. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-22bdi: Mark congested->bdi as internalJan Kara
congested->bdi pointer is used only to be able to remove congested structure from bdi->cgwb_congested_tree on structure release. Moreover the pointer can become NULL when we unregister the bdi. Rename the field to __bdi and add a comment to make it more explicit this is internal stuff of memcg writeback code and people should not use the field as such use will be likely race prone. We do not bother with converting congested->bdi to a proper refcounted reference. It will be slightly ugly to special-case bdi->wb.congested to avoid effectively a cyclic reference of bdi to itself and the reference gets cleared from bdi_unregister() making it impossible to reference a freed bdi. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-21mm, swap: Remove WARN_ON_ONCE() in free_swap_slot()Huang Ying
Before commit 452b94b8c8c7 ("mm/swap: don't BUG_ON() due to uninitialized swap slot cache"), the following bug is reported, ------------[ cut here ]------------ kernel BUG at mm/swap_slots.c:270! invalid opcode: 0000 [#1] SMP CPU: 5 PID: 1745 Comm: (sd-pam) Not tainted 4.11.0-rc1-00243-g24c534bb161b #1 Hardware name: System manufacturer System Product Name/Z170-K, BIOS 1803 05/06/2016 RIP: 0010:free_swap_slot+0xba/0xd0 Call Trace: swap_free+0x36/0x40 do_swap_page+0x360/0x6d0 __handle_mm_fault+0x880/0x1080 handle_mm_fault+0xd0/0x240 __do_page_fault+0x232/0x4d0 do_page_fault+0x20/0x70 page_fault+0x22/0x30 ---[ end trace aefc9ede53e0ab21 ]--- This is raised by the BUG_ON(!swap_slot_cache_initialized) in free_swap_slot(). This is incorrect, because even if the swap slots cache fails to be initialized, the swap should operate properly without the swap slots cache. And the use_swap_slot_cache check later in the function will protect the uninitialized swap slots cache case. In commit 452b94b8c8c7, the BUG_ON() is replaced by WARN_ON_ONCE(). In the patch, the WARN_ON_ONCE() is removed too. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-19mm/swap: don't BUG_ON() due to uninitialized swap slot cacheLinus Torvalds
This BUG_ON() triggered for me once at shutdown, and I don't see a reason for the check. The code correctly checks whether the swap slot cache is usable or not, so an uninitialized swap slot cache is not actually problematic afaik. I've temporarily just switched the BUG_ON() to a WARN_ON_ONCE(), since I'm not sure why that seemingly pointless check was there. I suspect the real fix is to just remove it entirely, but for now we'll warn about it but not bring the machine down. Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-18x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementationKirill A. Shutemov
This patch provides all required callbacks required by the generic get_user_pages_fast() code and switches x86 over - and removes the platform specific implementation. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316213906.89528-1-kirill.shutemov@linux.intel.com [ Minor readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-18mm/gup: Provide callback to check if __GUP_fast() is allowed for the rangeKirill A. Shutemov
This is a preparation patch for the transition of x86 to the generic GUP_fast() implementation. On x86, get_user_pages_fast() does a couple of sanity checks to see if we can call __get_user_pages_fast() for the range. This kind of wrapping protection should be useful for the generic code too. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316152655.37789-7-kirill.shutemov@linux.intel.com [ Small readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-18mm/gup: Implement the dev_pagemap() logic in the generic ↵Kirill A. Shutemov
get_user_pages_fast() function This is a preparation patch for the transition of x86 to the generic GUP_fast() implementation. Prepare generic GUP_fast() to handle dev_pagemap(). At the moment, it's only implemented on x86. On non-x86, the new code will be compiled out. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316152655.37789-6-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-18mm/gup: Mark all pages PageReferenced in generic get_user_pages_fast()Kirill A. Shutemov
This is a preparation patch for the transition of x86 to the generic GUP_fast() implementation. Unlike generic GUP_fast(), the x86 version makes all pages it touches referenced. It seems required for GRU and EPT. See the following commit: 8ee53820edfd ("thp: mmu_notifier_test_young") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316152655.37789-5-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-18mm/gup: Move page table entry dereference into helper functionKirill A. Shutemov
This is a preparation patch for the transition of x86 to the generic GUP_fast() implementation. On x86 PAE, page table entry is larger than sizeof(long) and we would need to provide a helper that can read the entry atomically. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316152655.37789-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-18mm/gup: Move permission checks into helpersKirill A. Shutemov
This is a preparation patch for the transition of x86 to the generic GUP_fast() implementation. On x86, we would need to do additional permission checks to determine if access is allowed. Let's abstract it out into separate helpers. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316152655.37789-3-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-18mm/gup: Drop the arch_pte_access_permitted() MMU callbackKirill A. Shutemov
The only arch that defines it to something meaningful is x86. But x86 doesn't use the generic GUP_fast() implementation -- the only place where the callback is called. Let's drop it. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316152655.37789-2-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>