summaryrefslogtreecommitdiff
path: root/mm/memory-failure.c
AgeCommit message (Collapse)Author
2009-12-16HWPOISON: detect free buddy pages explicitlyWu Fengguang
Most free pages in the buddy system have no PG_buddy set. Introduce is_free_buddy_page() for detecting them reliably. CC: Nick Piggin <npiggin@suse.de> CC: Mel Gorman <mel@linux.vnet.ibm.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-16HWPOISON: remove the free buddy page handlerWu Fengguang
The buddy page has already be handled in the very beginning. So remove redundant code. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-16HWPOISON: introduce delete_from_lru_cache()Wu Fengguang
Introduce delete_from_lru_cache() to - clear PG_active, PG_unevictable to avoid complains at unpoison time - move the isolate_lru_page() call back to the handlers instead of the entrance of __memory_failure(), this is more hwpoison filter friendly Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-16HWPOISON: comment the possible set_page_dirty() raceWu Fengguang
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-16HWPOISON: abort on failed unmapWu Fengguang
Don't try to isolate a still mapped page. Otherwise we will hit the BUG_ON(page_mapped(page)) in __remove_from_page_cache(). Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-16HWPOISON: Turn ref argument into flags argumentAndi Kleen
Now that "ref" is just a boolean turn it into a flags argument. First step is only a single flag that makes the code's intention more clear, but more may follow. Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-16HWPOISON: avoid grabbing the page count multiple times during madvise injectionWu Fengguang
If page is double referenced in madvise_hwpoison() and __memory_failure(), remove_mapping() will fail because it expects page_count=2. Fix it by not grabbing extra page count in __memory_failure(). Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-16HWPOISON: return ENXIO on invalid page numberWu Fengguang
Use a different errno than the usual EIO for invalid page numbers. This is mainly for better reporting for the injector. This also avoids calling action_result() with invalid pfn. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-16HWPOISON: remove the anonymous entryWu Fengguang
(PG_swapbacked && !PG_lru) pages should not happen. Better to treat them as unknown pages. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-16HWPOISON: Be more aggressive at freeing non LRU cachesAndi Kleen
shake_page handles more types of page caches than lru_drain_all() - per cpu page allocator pages - per CPU LRU Stops early when the page became free. Used in followon patches. Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-15mm: CONFIG_MMU for PG_mlockedHugh Dickins
Remove three degrees of obfuscation, left over from when we had CONFIG_UNEVICTABLE_LRU. MLOCK_PAGES is CONFIG_HAVE_MLOCKED_PAGE_BIT is CONFIG_HAVE_MLOCK is CONFIG_MMU. rmap.o (and memory-failure.o) are only built when CONFIG_MMU, so don't need such conditions at all. Somehow, I feel no compulsion to remove the CONFIG_HAVE_MLOCK* lines from 169 defconfigs: leave those to evolve in due course. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-04tree-wide: fix assorted typos all over the placeAndré Goddard Rosa
That is "success", "unknown", "through", "performance", "[re|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-10-19HWPOISON: fix invalid page count in printk outputWu Fengguang
The madvise injector already holds a reference when passing in a page to the memory-failure code. The code corrects for this additional reference for its checks, but the final printk output didn't. Fix that. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-10-19HWPOISON: fix oops on ksm pagesHugh Dickins
Memory failure on a KSM page currently oopses on its NULL anon_vma in page_lock_anon_vma(): that may not be much worse than the consequence of ignoring it, but it is better to be consistent with how ZERO_PAGE and hugetlb pages and other awkward cases are treated. Just skip it. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-10-19HWPOISON: return early on non-LRU pagesWu Fengguang
Right now we have some trouble with non atomic access to page flags when locking the page. To plug this hole for now, limit error recovery to LRU pages for now. This could be better fixed by defining a suitable protocol, but let's go this simple way for now This avoids unnecessary races with __set_page_locked() and __SetPageSlab*() and maybe more non-atomic page flag operations. This loses isolated pages which are currently in page reclaim, but these are relatively limited compared to the total memory. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> [AK: new description, bug fixes, cleanups]
2009-09-16HWPOISON: The high level memory error handler in the VM v7Andi Kleen
Add the high level memory handler that poisons pages that got corrupted by hardware (typically by a two bit flip in a DIMM or a cache) on the Linux level. The goal is to prevent everyone from accessing these pages in the future. This done at the VM level by marking a page hwpoisoned and doing the appropriate action based on the type of page it is. The code that does this is portable and lives in mm/memory-failure.c To quote the overview comment: High level machine check handler. Handles pages reported by the hardware as being corrupted usually due to a 2bit ECC memory or cache failure. This focuses on pages detected as corrupted in the background. When the current CPU tries to consume corruption the currently running process can just be killed directly instead. This implies that if the error cannot be handled for some reason it's safe to just ignore it because no corruption has been consumed yet. Instead when that happens another machine check will happen. Handles page cache pages in various states. The tricky part here is that we can access any page asynchronous to other VM users, because memory failures could happen anytime and anywhere, possibly violating some of their assumptions. This is why this code has to be extremely careful. Generally it tries to use normal locking rules, as in get the standard locks, even if that means the error handling takes potentially a long time. Some of the operations here are somewhat inefficient and have non linear algorithmic complexity, because the data structures have not been optimized for this case. This is in particular the case for the mapping from a vma to a process. Since this case is expected to be rare we hope we can get away with this. There are in principle two strategies to kill processes on poison: - just unmap the data and wait for an actual reference before killing - kill as soon as corruption is detected. Both have advantages and disadvantages and should be used in different situations. Right now both are implemented and can be switched with a new sysctl vm.memory_failure_early_kill The default is early kill. The patch does some rmap data structure walking on its own to collect processes to kill. This is unusual because normally all rmap data structure knowledge is in rmap.c only. I put it here for now to keep everything together and rmap knowledge has been seeping out anyways Includes contributions from Johannes Weiner, Chris Mason, Fengguang Wu, Nick Piggin (who did a lot of great work) and others. Cc: npiggin@suse.de Cc: riel@redhat.com Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>