diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2021-06-29 17:29:11 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2021-06-29 17:29:11 -0700 |
commit | 65090f30ab791810a3dc840317e57df05018559c (patch) | |
tree | f417526656da37109777e89613e140ffc59228bc /Documentation | |
parent | 349a2d52ffe59b7a0c5876fa7ee9f3eaf188b830 (diff) | |
parent | 0ed950d1f28142ccd9a9453c60df87853530d778 (diff) |
Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
"191 patches.
Subsystems affected by this patch series: kthread, ia64, scripts,
ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab,
slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap,
mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization,
pagealloc, and memory-failure)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (191 commits)
mm,hwpoison: make get_hwpoison_page() call get_any_page()
mm,hwpoison: send SIGBUS with error virutal address
mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes
mm/page_alloc: allow high-order pages to be stored on the per-cpu lists
mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM
mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA
docs: remove description of DISCONTIGMEM
arch, mm: remove stale mentions of DISCONIGMEM
mm: remove CONFIG_DISCONTIGMEM
m68k: remove support for DISCONTIGMEM
arc: remove support for DISCONTIGMEM
arc: update comment about HIGHMEM implementation
alpha: remove DISCONTIGMEM and NUMA
mm/page_alloc: move free_the_page
mm/page_alloc: fix counting of managed_pages
mm/page_alloc: improve memmap_pages dbg msg
mm: drop SECTION_SHIFT in code comments
mm/page_alloc: introduce vm.percpu_pagelist_high_fraction
mm/page_alloc: limit the number of pages on PCP lists when reclaim is active
mm/page_alloc: scale the number of pages that are batch freed
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/admin-guide/kernel-parameters.txt | 6 | ||||
-rw-r--r-- | Documentation/admin-guide/lockup-watchdogs.rst | 4 | ||||
-rw-r--r-- | Documentation/admin-guide/sysctl/kernel.rst | 10 | ||||
-rw-r--r-- | Documentation/admin-guide/sysctl/vm.rst | 42 | ||||
-rw-r--r-- | Documentation/dev-tools/kasan.rst | 9 | ||||
-rw-r--r-- | Documentation/vm/memory-model.rst | 45 |
6 files changed, 41 insertions, 75 deletions
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index fed6c303734f..2991f6e692bd 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3591,6 +3591,12 @@ off: turn off poisoning (default) on: turn on poisoning + page_reporting.page_reporting_order= + [KNL] Minimal page reporting order + Format: <integer> + Adjust the minimal page reporting order. The page + reporting is disabled when it exceeds (MAX_ORDER-1). + panic= [KNL] Kernel behaviour on panic: delay <timeout> timeout > 0: seconds before rebooting timeout = 0: wait forever diff --git a/Documentation/admin-guide/lockup-watchdogs.rst b/Documentation/admin-guide/lockup-watchdogs.rst index 290840c160af..3e09284a8b9b 100644 --- a/Documentation/admin-guide/lockup-watchdogs.rst +++ b/Documentation/admin-guide/lockup-watchdogs.rst @@ -39,7 +39,7 @@ in principle, they should work in any architecture where these subsystems are present. A periodic hrtimer runs to generate interrupts and kick the watchdog -task. An NMI perf event is generated every "watchdog_thresh" +job. An NMI perf event is generated every "watchdog_thresh" (compile-time initialized to 10 and configurable through sysctl of the same name) seconds to check for hardlockups. If any CPU in the system does not receive any hrtimer interrupt during that time the @@ -47,7 +47,7 @@ does not receive any hrtimer interrupt during that time the generate a kernel warning or call panic, depending on the configuration. -The watchdog task is a high priority kernel thread that updates a +The watchdog job runs in a stop scheduling thread that updates a timestamp every time it is scheduled. If that timestamp is not updated for 2*watchdog_thresh seconds (the softlockup threshold) the 'softlockup detector' (coded inside the hrtimer callback function) diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index 10dd4b111e5c..426162009ce9 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -1297,11 +1297,11 @@ This parameter can be used to control the soft lockup detector. = ================================= The soft lockup detector monitors CPUs for threads that are hogging the CPUs -without rescheduling voluntarily, and thus prevent the 'watchdog/N' threads -from running. The mechanism depends on the CPUs ability to respond to timer -interrupts which are needed for the 'watchdog/N' threads to be woken up by -the watchdog timer function, otherwise the NMI watchdog — if enabled — can -detect a hard lockup condition. +without rescheduling voluntarily, and thus prevent the 'migration/N' threads +from running, causing the watchdog work fail to execute. The mechanism depends +on the CPUs ability to respond to timer interrupts which are needed for the +watchdog work to be queued by the watchdog timer function, otherwise the NMI +watchdog — if enabled — can detect a hard lockup condition. stack_erasing diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index 586cd4b86428..8387ad0b0b83 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -64,7 +64,7 @@ Currently, these files are in /proc/sys/vm: - overcommit_ratio - page-cluster - panic_on_oom -- percpu_pagelist_fraction +- percpu_pagelist_high_fraction - stat_interval - stat_refresh - numa_stat @@ -790,22 +790,24 @@ panic_on_oom=2+kdump gives you very strong tool to investigate why oom happens. You can get snapshot. -percpu_pagelist_fraction -======================== +percpu_pagelist_high_fraction +============================= -This is the fraction of pages at most (high mark pcp->high) in each zone that -are allocated for each per cpu page list. The min value for this is 8. It -means that we don't allow more than 1/8th of pages in each zone to be -allocated in any single per_cpu_pagelist. This entry only changes the value -of hot per cpu pagelists. User can specify a number like 100 to allocate -1/100th of each zone to each per cpu page list. +This is the fraction of pages in each zone that are can be stored to +per-cpu page lists. It is an upper boundary that is divided depending +on the number of online CPUs. The min value for this is 8 which means +that we do not allow more than 1/8th of pages in each zone to be stored +on per-cpu page lists. This entry only changes the value of hot per-cpu +page lists. A user can specify a number like 100 to allocate 1/100th of +each zone between per-cpu lists. -The batch value of each per cpu pagelist is also updated as a result. It is -set to pcp->high/4. The upper limit of batch is (PAGE_SHIFT * 8) +The batch value of each per-cpu page list remains the same regardless of +the value of the high fraction so allocation latencies are unaffected. -The initial value is zero. Kernel does not use this value at boot time to set -the high water marks for each per cpu page list. If the user writes '0' to this -sysctl, it will revert to this default behavior. +The initial value is zero. Kernel uses this value to set the high pcp->high +mark based on the low watermark for the zone and the number of local +online CPUs. If the user writes '0' to this sysctl, it will revert to +this default behavior. stat_interval @@ -936,12 +938,12 @@ allocations, THP and hugetlbfs pages. To make it sensible with respect to the watermark_scale_factor parameter, the unit is in fractions of 10,000. The default value of -15,000 on !DISCONTIGMEM configurations means that up to 150% of the high -watermark will be reclaimed in the event of a pageblock being mixed due -to fragmentation. The level of reclaim is determined by the number of -fragmentation events that occurred in the recent past. If this value is -smaller than a pageblock then a pageblocks worth of pages will be reclaimed -(e.g. 2MB on 64-bit x86). A boost factor of 0 will disable the feature. +15,000 means that up to 150% of the high watermark will be reclaimed in the +event of a pageblock being mixed due to fragmentation. The level of reclaim +is determined by the number of fragmentation events that occurred in the +recent past. If this value is smaller than a pageblock then a pageblocks +worth of pages will be reclaimed (e.g. 2MB on 64-bit x86). A boost factor +of 0 will disable the feature. watermark_scale_factor diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst index d3f335ffc751..83ec4a556c19 100644 --- a/Documentation/dev-tools/kasan.rst +++ b/Documentation/dev-tools/kasan.rst @@ -447,11 +447,10 @@ When a test fails due to a failed ``kmalloc``:: When a test fails due to a missing KASAN report:: - # kmalloc_double_kzfree: EXPECTATION FAILED at lib/test_kasan.c:629 - Expected kasan_data->report_expected == kasan_data->report_found, but - kasan_data->report_expected == 1 - kasan_data->report_found == 0 - not ok 28 - kmalloc_double_kzfree + # kmalloc_double_kzfree: EXPECTATION FAILED at lib/test_kasan.c:974 + KASAN failure expected in "kfree_sensitive(ptr)", but none occurred + not ok 44 - kmalloc_double_kzfree + At the end the cumulative status of all KASAN tests is printed. On success:: diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst index ce398a7dc6cd..30e8fbed6914 100644 --- a/Documentation/vm/memory-model.rst +++ b/Documentation/vm/memory-model.rst @@ -14,15 +14,11 @@ for the CPU. Then there could be several contiguous ranges at completely distinct addresses. And, don't forget about NUMA, where different memory banks are attached to different CPUs. -Linux abstracts this diversity using one of the three memory models: -FLATMEM, DISCONTIGMEM and SPARSEMEM. Each architecture defines what +Linux abstracts this diversity using one of the two memory models: +FLATMEM and SPARSEMEM. Each architecture defines what memory models it supports, what the default memory model is and whether it is possible to manually override that default. -.. note:: - At time of this writing, DISCONTIGMEM is considered deprecated, - although it is still in use by several architectures. - All the memory models track the status of physical page frames using struct page arranged in one or more arrays. @@ -63,43 +59,6 @@ straightforward: `PFN - ARCH_PFN_OFFSET` is an index to the The `ARCH_PFN_OFFSET` defines the first page frame number for systems with physical memory starting at address different from 0. -DISCONTIGMEM -============ - -The DISCONTIGMEM model treats the physical memory as a collection of -`nodes` similarly to how Linux NUMA support does. For each node Linux -constructs an independent memory management subsystem represented by -`struct pglist_data` (or `pg_data_t` for short). Among other -things, `pg_data_t` holds the `node_mem_map` array that maps -physical pages belonging to that node. The `node_start_pfn` field of -`pg_data_t` is the number of the first page frame belonging to that -node. - -The architecture setup code should call :c:func:`free_area_init_node` for -each node in the system to initialize the `pg_data_t` object and its -`node_mem_map`. - -Every `node_mem_map` behaves exactly as FLATMEM's `mem_map` - -every physical page frame in a node has a `struct page` entry in the -`node_mem_map` array. When DISCONTIGMEM is enabled, a portion of the -`flags` field of the `struct page` encodes the node number of the -node hosting that page. - -The conversion between a PFN and the `struct page` in the -DISCONTIGMEM model became slightly more complex as it has to determine -which node hosts the physical page and which `pg_data_t` object -holds the `struct page`. - -Architectures that support DISCONTIGMEM provide :c:func:`pfn_to_nid` -to convert PFN to the node number. The opposite conversion helper -:c:func:`page_to_nid` is generic as it uses the node number encoded in -page->flags. - -Once the node number is known, the PFN can be used to index -appropriate `node_mem_map` array to access the `struct page` and -the offset of the `struct page` from the `node_mem_map` plus -`node_start_pfn` is the PFN of that page. - SPARSEMEM ========= |