summaryrefslogtreecommitdiff
path: root/kernel/power/snapshot.c
AgeCommit message (Collapse)Author
2015-11-06mm, page_alloc: distinguish between being unable to sleep, unwilling to ↵Mel Gorman
sleep and avoiding waking kswapd __GFP_WAIT has been used to identify atomic context in callers that hold spinlocks or are in interrupts. They are expected to be high priority and have access one of two watermarks lower than "min" which can be referred to as the "atomic reserve". __GFP_HIGH users get access to the first lower watermark and can be called the "high priority reserve". Over time, callers had a requirement to not block when fallback options were available. Some have abused __GFP_WAIT leading to a situation where an optimisitic allocation with a fallback option can access atomic reserves. This patch uses __GFP_ATOMIC to identify callers that are truely atomic, cannot sleep and have no alternative. High priority users continue to use __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers that want to wake kswapd for background reclaim. __GFP_WAIT is redefined as a caller that is willing to enter direct reclaim and wake kswapd for background reclaim. This patch then converts a number of sites o __GFP_ATOMIC is used by callers that are high priority and have memory pools for those requests. GFP_ATOMIC uses this flag. o Callers that have a limited mempool to guarantee forward progress clear __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall into this category where kswapd will still be woken but atomic reserves are not used as there is a one-entry mempool to guarantee progress. o Callers that are checking if they are non-blocking should use the helper gfpflags_allow_blocking() where possible. This is because checking for __GFP_WAIT as was done historically now can trigger false positives. Some exceptions like dm-crypt.c exist where the code intent is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to flag manipulations. o Callers that built their own GFP flags instead of starting with GFP_KERNEL and friends now also need to specify __GFP_KSWAPD_RECLAIM. The first key hazard to watch out for is callers that removed __GFP_WAIT and was depending on access to atomic reserves for inconspicuous reasons. In some cases it may be appropriate for them to use __GFP_HIGH. The second key hazard is callers that assembled their own combination of GFP flags instead of starting with something like GFP_KERNEL. They may now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless if it's missed in most cases as other activity will wake kswapd. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-07Revert "PM / hibernate: avoid unsafe pages in e820 reserved regions"Rafael J. Wysocki
Commit 84c91b7ae07c (PM / hibernate: avoid unsafe pages in e820 reserved regions) is reported to make resume from hibernation on Lenovo x230 unreliable, so revert it. We will revisit the issue the commit in question was supposed to fix in the future. Link: https://bugzilla.kernel.org/show_bug.cgi?id=96111 Reported-by: rhn <kebuac.rhn@porcupinefactory.org> Cc: 3.17+ <stable@vger.kernel.org> # 3.17+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-03PM / hibernate: exclude freed pages from allocated pages printoutWonhong Kwon
hibernate_preallocate_memory() prints out that how many pages are allocated, but it doesn't take into consideration the pages freed by free_unnecessary_pages(). Therefore, it always shows the count more than actually allocated. Signed-off-by: Wonhong Kwon <wonhong.kwon@lge.com> [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23PM / hibernate: Remove unused functionRickard Strandqvist
Remove the function get_safe_write_buffer() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-03PM / Hibernate: Migrate to ktime_tTina Ruchandani
This patch migrates swsusp_show_speed and its callers to using ktime_t instead of 'struct timeval' which suffers from the y2038 problem. Changes to swsusp_show_speed: - use ktime_t for start and stop times - pass start and stop times by value Calling functions affected: - load_image - load_image_lzo - save_image - save_image_lzo - hibernate_preallocate_memory Design decisions: - use ktime_t to preserve same granularity of reporting as before - use centisecs logic as before to avoid 'div by zero' issues caused by using seconds and nanoseconds directly - use monotonic time (ktime_get()) since we only care about elapsed time. Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-30PM / hibernate: Iterate over set bits instead of PFNs in swsusp_free()Joerg Roedel
The existing implementation of swsusp_free iterates over all pfns in the system and checks every bit in the two memory bitmaps. This doesn't scale very well with large numbers of pfns, especially when the bitmaps are not populated very densly. Change the algorithm to iterate over the set bits in the bitmaps instead to make it scale better in large memory configurations. Also add a memory_bm_clear_current() helper function that clears the bit for the last position returned from the memory bitmap. This new version adds a !NULL check for the memory bitmaps before they are walked. Not doing so causes a kernel crash when the bitmaps are NULL. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-25Revert "PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()"Rafael J. Wysocki
Revert commit 6efde38f0769 (PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()) that introduced a NULL pointer dereference during system resume from hibernation: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff810a8cc1>] swsusp_free+0x21/0x190 PGD b39c2067 PUD b39c1067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: <irrelevant list of modules> CPU: 1 PID: 4898 Comm: s2disk Tainted: G C 3.17-rc5-amd64 #1 Debian 3.17~rc5-1~exp1 Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011 task: ffff88023155ea40 ti: ffff8800b3b14000 task.ti: ffff8800b3b14000 RIP: 0010:[<ffffffff810a8cc1>] [<ffffffff810a8cc1>] swsusp_free+0x21/0x190 RSP: 0018:ffff8800b3b17ea8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8800b39bab00 RCX: 0000000000000001 RDX: ffff8800b39bab10 RSI: ffff8800b39bab00 RDI: 0000000000000000 RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000000 R10: ffff8800b39bab10 R11: 0000000000000246 R12: ffffea0000000000 R13: ffff880232f485a0 R14: ffff88023ac27cd8 R15: ffff880232927590 FS: 00007f406d83b700(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 00000000b3a62000 CR4: 00000000000007e0 Stack: ffff8800b39bab00 0000000000000010 ffff880232927590 ffffffff810acb4a ffff8800b39bab00 ffffffff811a955a ffff8800b39bab10 0000000000000000 ffff88023155f098 ffffffff81a6b8c0 ffff88023155ea40 0000000000000007 Call Trace: [<ffffffff810acb4a>] ? snapshot_release+0x2a/0xb0 [<ffffffff811a955a>] ? __fput+0xca/0x1d0 [<ffffffff81080627>] ? task_work_run+0x97/0xd0 [<ffffffff81012d89>] ? do_notify_resume+0x69/0xa0 [<ffffffff8151452a>] ? int_signal+0x12/0x17 Code: 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 54 48 8b 05 ba 62 9c 00 49 bc 00 00 00 00 00 ea ff ff 48 8b 3d a1 62 9c 00 55 53 <48> 8b 10 48 89 50 18 48 8b 52 20 48 c7 40 28 00 00 00 00 c7 40 RIP [<ffffffff810a8cc1>] swsusp_free+0x21/0x190 RSP <ffff8800b3b17ea8> CR2: 0000000000000000 ---[ end trace f02be86a1ec0cccb ]--- due to forbidden_pages_map being NULL in swsusp_free(). Fixes: 6efde38f0769 "PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()" Reported-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-08-06PM / hibernate: avoid unsafe pages in e820 reserved regionsLee, Chun-Yi
When the machine doesn't well handle the e820 persistent when hibernate resuming, then it may cause page fault when writing image to snapshot buffer: [ 17.929495] BUG: unable to handle kernel paging request at ffff880069d4f000 [ 17.933469] IP: [<ffffffff810a1cf0>] load_image_lzo+0x810/0xe40 [ 17.933469] PGD 2194067 PUD 77ffff067 PMD 2197067 PTE 0 [ 17.933469] Oops: 0002 [#1] SMP ... The ffff880069d4f000 page is in e820 reserved region of resume boot kernel: [ 0.000000] BIOS-e820: [mem 0x0000000069d4f000-0x0000000069e12fff] reserved ... [ 0.000000] PM: Registered nosave memory: [mem 0x69d4f000-0x69e12fff] So snapshot.c mark the pfn to forbidden pages map. But, this page is also in the memory bitmap in snapshot image because it's an original page used by image kernel, so it will also mark as an unsafe(free) page in prepare_image(). That means the page in e820 when resuming mark as "forbidden" and "free", it causes get_buffer() treat it as an allocated unsafe page. Then snapshot_write_next() return this page to load_image, load_image writing content to this address, but this page didn't really allocated . So, we got page fault. Although the root cause is from BIOS, I think aggressive check and significant message in kernel will better then a page fault for issue tracking, especially when serial console unavailable. This patch adds code in mark_unsafe_pages() for check does free pages in nosave region. If so, then it print message and return fault to stop whole S4 resume process: [ 8.166004] PM: Image loading progress: 0% [ 8.658717] PM: 0x6796c000 in e820 nosave region: [mem 0x6796c000-0x6796cfff] [ 8.918737] PM: Read 2511940 kbytes in 1.04 seconds (2415.32 MB/s) [ 8.926633] PM: Error -14 resuming [ 8.933534] PM: Failed to load hibernation image, recovering. Reviewed-by: Takashi Iwai <tiwai@suse.de> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Lee, Chun-Yi <jlee@suse.com> [rjw: Subject] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-29PM / Hibernate: Touch Soft Lockup Watchdog in rtree_next_nodeJoerg Roedel
When a memory bitmap is fully populated on a large memory machine (several TB of RAM) it can take more than a minute to walk through all bits. This causes the soft lockup detector on these machine to report warnings. Avoid this by touching the soft lockup watchdog in the memory bitmap walking code. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-29PM / Hibernate: Remove the old memory-bitmap implementationJoerg Roedel
The radix tree implementatio is proved to work the same as the old implementation now. So the old implementation can be removed to finish the switch to the radix tree for the memory bitmaps. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-29PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()Joerg Roedel
The existing implementation of swsusp_free iterates over all pfns in the system and checks every bit in the two memory bitmaps. This doesn't scale very well with large numbers of pfns, especially when the bitmaps are not populated very densly. Change the algorithm to iterate over the set bits in the bitmaps instead to make it scale better in large memory configurations. Also add a memory_bm_clear_current() helper function that clears the bit for the last position returned from the memory bitmap. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-29PM / Hibernate: Implement position keeping in radix treeJoerg Roedel
Add code to remember the last position that was requested in the radix tree. Use it as a cache for faster linear walking of the bitmap in the memory_bm_rtree_next_pfn() function which is also added with this patch. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-29PM / Hibernate: Add memory_rtree_find_bit functionJoerg Roedel
Add a function to find a bit in the radix tree for a given pfn. Also add code to the memory bitmap wrapper functions to use the radix tree together with the existing memory bitmap implementation. On read accesses compare the results of both bitmaps to make sure the radix tree behaves the same way. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-29PM / Hibernate: Create a Radix-Tree to store memory bitmapJoerg Roedel
This patch adds the code to allocate and build the radix tree to store the memory bitmap. The old data structure is left in place until the radix tree implementation is finished. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-05asmlinkage: Add explicit __visible to drivers/*, lib/*, kernel/*Andi Kleen
As requested by Linus add explicit __visible to the asmlinkage users. This marks functions visible to assembler. Tree sweep for rest of tree. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1398984278-29319-4-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-04-07kernel: use macros from compiler.h instead of __attribute__((...))Gideon Israel Dsouza
To increase compiler portability there is <linux/compiler.h> which provides convenience macros for various gcc constructs. Eg: __weak for __attribute__((weak)). I've replaced all instances of gcc attributes with the right macro in the kernel subsystem. Signed-off-by: Gideon Israel Dsouza <gidisrael@gmail.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-12PM / Hibernate: Spelling s/anonymouns/anonymous/Geert Uytterhoeven
Spelling fix. Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-21kernel/power/snapshot.c: use memblock apis for early memory allocationsSantosh Shilimkar
Switch to memblock interfaces for early memory allocator instead of bootmem allocator. No functional change in beahvior than what it is in current code from bootmem users points of view. Archs already converted to NO_BOOTMEM now directly use memblock interfaces instead of bootmem wrappers build on top of memblock. And the archs which still uses bootmem, these new apis just fallback to exiting bootmem APIs. Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Paul Walmsley <paul@pwsan.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: Tony Lindgren <tony@atomide.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-19Merge branch 'pm-sleep'Rafael J. Wysocki
* pm-sleep: PM / Hibernate: Do not crash kernel in free_basic_memory_bitmaps()
2013-11-14PM / Hibernate: Do not crash kernel in free_basic_memory_bitmaps()Rafael J. Wysocki
I have received a report about the BUG_ON() in free_basic_memory_bitmaps() triggering mysteriously during an aborted s2disk hibernation attempt. The only way I can explain that is that /dev/snapshot was first opened for writing (resume mode), then closed and then opened again for reading and closed again without freezing tasks. In that case the first invocation of snapshot_open() would set the free_bitmaps flag in snapshot_state, which is a static variable. That flag wouldn't be cleared later and the second invocation of snapshot_open() would just leave it like that, so the subsequent snapshot_release() would see data->frozen set and free_basic_memory_bitmaps() would be called unnecessarily. To prevent that from happening clear data->free_bitmaps in snapshot_open() when the file is being opened for reading (hibernate mode). In addition to that, replace the BUG_ON() in free_basic_memory_bitmaps() with a WARN_ON() as the kernel can continue just fine if the condition checked by that macro occurs. Fixes: aab172891542 (PM / hibernate: Fix user space driven resume regression) Reported-by: Oliver Lorenz <olli@olorenz.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
2013-11-07PM / hibernate: Avoid overflow in hibernate_preallocate_memory()Aaron Lu
When system has a lot of highmem (e.g. 16GiB using a 32 bits kernel), the code to calculate how much memory we need to preallocate in normal zone may cause overflow. As Leon has analysed: It looks that during computing 'alloc' variable there is overflow: alloc = (3943404 - 1970542) - 1978280 = -5418 (signed) And this function goes to err_out. Fix this by avoiding that overflow. References: https://bugzilla.kernel.org/show_bug.cgi?id=60817 Reported-and-tested-by: Leon Drugi <eyak@wp.pl> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-30PM / hibernate: Fix user space driven resume regressionRafael J. Wysocki
Recent commit 8fd37a4 (PM / hibernate: Create memory bitmaps after freezing user space) broke the resume part of the user space driven hibernation (s2disk), because I forgot that the resume utility loaded the image into memory without freezing user space (it still freezes tasks after loading the image). This means that during user space driven resume we need to create the memory bitmaps at the "device open" time rather than at the "freeze tasks" time, so make that happen (that's a special case anyway, so it needs to be treated in a special way). Reported-and-tested-by: Ronald <ronald645@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-11mm: use zone_end_pfn() instead of zone_start_pfn+spanned_pagesXishi Qiu
Use "zone_end_pfn()" instead of "zone->zone_start_pfn + zone->spanned_pages". Simplify the code, no functional change. [akpm@linux-foundation.org: fix build] Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Cc: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03Merge branch 'akpm' (updates from Andrew Morton)Linus Torvalds
Merge first patch-bomb from Andrew Morton: - various misc bits - I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been distracted. There has been quite a bit of activity. - About half the MM queue - Some backlight bits - Various lib/ updates - checkpatch updates - zillions more little rtc patches - ptrace - signals - exec - procfs - rapidio - nbd - aoe - pps - memstick - tools/testing/selftests updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits) tools/testing/selftests: don't assume the x bit is set on scripts selftests: add .gitignore for kcmp selftests: fix clean target in kcmp Makefile selftests: add .gitignore for vm selftests: add hugetlbfstest self-test: fix make clean selftests: exit 1 on failure kernel/resource.c: remove the unneeded assignment in function __find_resource aio: fix wrong comment in aio_complete() drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode drivers/memstick/host/r592.c: convert to module_pci_driver drivers/memstick/host/jmb38x_ms: convert to module_pci_driver pps-gpio: add device-tree binding and support drivers/pps/clients/pps-gpio.c: convert to module_platform_driver drivers/pps/clients/pps-gpio.c: convert to devm_* helpers drivers/parport/share.c: use kzalloc Documentation/accounting/getdelays.c: avoid strncpy in accounting tool aoe: update internal version number to v83 aoe: update copyright date aoe: perform I/O completions in parallel ...
2013-07-03mm: use totalram_pages instead of num_physpages at runtimeJiang Liu
The global variable num_physpages is scheduled to be removed, so use totalram_pages instead of num_physpages at runtime. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-03PM / Hibernate: print physical addresses consistently with other parts of kernelBjorn Helgaas
Print physical address info in a style consistent with the %pR style used elsewhere in the kernel. Commit 69f1d475cc did this for a similar printk in this file, but I must have missed this one. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-03-21Merge tag 'pm-for-3.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates for 3.4 from Rafael Wysocki: "Assorted extensions and fixes including: * Introduction of early/late suspend/hibernation device callbacks. * Generic PM domains extensions and fixes. * devfreq updates from Axel Lin and MyungJoo Ham. * Device PM QoS updates. * Fixes of concurrency problems with wakeup sources. * System suspend and hibernation fixes." * tag 'pm-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (43 commits) PM / Domains: Check domain status during hibernation restore of devices PM / devfreq: add relation of recommended frequency. PM / shmobile: Make MTU2 driver use pm_genpd_dev_always_on() PM / shmobile: Make CMT driver use pm_genpd_dev_always_on() PM / shmobile: Make TMU driver use pm_genpd_dev_always_on() PM / Domains: Introduce "always on" device flag PM / Domains: Fix hibernation restore of devices, v2 PM / Domains: Fix handling of wakeup devices during system resume sh_mmcif / PM: Use PM QoS latency constraint tmio_mmc / PM: Use PM QoS latency constraint PM / QoS: Make it possible to expose PM QoS latency constraints PM / Sleep: JBD and JBD2 missing set_freezable() PM / Domains: Fix include for PM_GENERIC_DOMAINS=n case PM / Freezer: Remove references to TIF_FREEZE in comments PM / Sleep: Add more wakeup source initialization routines PM / Hibernate: Enable usermodehelpers in hibernate() error path PM / Sleep: Make __pm_stay_awake() delete wakeup source timers PM / Sleep: Fix race conditions related to wakeup source timer function PM / Sleep: Fix possible infinite loop during wakeup source destruction PM / Hibernate: print physical addresses consistently with other parts of kernel ...
2012-03-20power: remove the second argument of k[un]map_atomic()Cong Wang
Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Cong Wang <amwang@redhat.com>
2012-02-17PM / Hibernate: print physical addresses consistently with other parts of kernelBjorn Helgaas
Print physical address info in a style consistent with the %pR style used elsewhere in the kernel. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-01-19PM / Hibernate: Correct additional pages number calculationNamhyung Kim
The struct bm_block is allocated by chain_alloc(), so it'd better counting it in LINKED_PAGE_DATA_SIZE. Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-01-10PM/Hibernate: do not count debug pages as savableStanislaw Gruszka
When debugging with CONFIG_DEBUG_PAGEALLOC and debug_guardpage_minorder > 0, we have lot of free pages that are not marked so. Snapshot code account them as savable, what cause hibernate memory preallocation failure. It is pretty hard to make hibernate allocation succeed with debug_guardpage_minorder=1. This change at least make it possible when system has relatively big amount of RAM. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-16PM / Hibernate: Include storage keys in hibernation image on s390Martin Schwidefsky
For s390 there is one additional byte associated with each page, the storage key. This byte contains the referenced and changed bits and needs to be included into the hibernation image. If the storage keys are not restored to their previous state all original pages would appear to be dirty. This can cause inconsistencies e.g. with read-only filesystems. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-07-06PM / Hibernate: Fix free_unnecessary_pages()Rafael J. Wysocki
There is a bug in free_unnecessary_pages() that causes it to attempt to free too many pages in some cases, which triggers the BUG_ON() in memory_bm_clear_bit() for copy_bm. Namely, if count_data_pages() is initially greater than alloc_normal, we get to_free_normal equal to 0 and "save" greater from 0. In that case, if the sum of "save" and count_highmem_pages() is greater than alloc_highmem, we subtract a positive number from to_free_normal. Hence, since to_free_normal was 0 before the subtraction and is an unsigned int, the result is converted to a huge positive number that is used as the number of pages to free. Fix this bug by checking if to_free_normal is actually greater than or equal to the number we're going to subtract from it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reported-and-tested-by: Matthew Garrett <mjg@redhat.com> Cc: stable@kernel.org
2011-05-17Revert "PM / Hibernate: Reduce autotuned default image size"Rafael J. Wysocki
This reverts commit bea3864fb627d110933cfb8babe048b63c4fc76e (PM / Hibernate: Reduce autotuned default image size), because users are now able to resolve the issue this commit was supposed to address in a different way (i.e. by using the new /sys/power/reserved_size interface). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-05-17PM / Hibernate: Add sysfs knob to control size of memory for driversRafael J. Wysocki
Martin reports that on his system hibernation occasionally fails due to the lack of memory, because the radeon driver apparently allocates too much of it during the device freeze stage. It turns out that the amount of memory allocated by radeon during hibernation (and presumably during system suspend too) depends on the utilization of the GPU (e.g. hibernating while there are two KDE 4 sessions with compositing enabled causes radeon to allocate more memory than for one KDE 4 session). In principle it should be possible to use image_size to make the memory preallocation mechanism free enough memory for the radeon driver, but in practice it is not easy to guess the right value because of the way the preallocation code uses image_size. For this reason, it seems reasonable to allow users to control the amount of memory reserved for driver allocations made after the hibernate preallocation, which currently is constant and amounts to 1 MB. Introduce a new sysfs file, /sys/power/reserved_size, whose value will be used as the amount of memory to reserve for the post-preallocation reservations made by device drivers, in bytes. For backwards compatibility, set its default (and initial) value to the currently used number (1 MB). References: https://bugzilla.kernel.org/show_bug.cgi?id=34102 Reported-and-tested-by: Martin Steigerwald <Martin@Lichtvoll.de> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-03-15PM / Hibernate: Reduce autotuned default image sizeRafael J. Wysocki
The hibernate image size autotuning mechanism sets the default image size to 5/2 of the total system RAM, but it is reported that on some systems device drivers allocate substantial amounts of memory during suspend and the creation of the image fails as a result (too little memory is preallocated). Modify the autotuning mechanism to use 1/3 instead of 2/5 of RAM as the default image size, which is reported to be sufficient for the affected systems. References: https://bugzilla.kernel.org/show_bug.cgi?id=30482 Reported-and-tested-by: Martin Steigerwald <Martin@Lichtvoll.de> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-02-16PM / Hibernate: Return error code when alloc_image_page() failsStanislaw Gruszka
Currently we return 0 in swsusp_alloc() when alloc_image_page() fails. Fix that. Also remove unneeded "error" variable since the only useful value of error is -ENOMEM. [rjw: Fixed up the changelog and changed subject.] Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl> Cc: stable@kernel.org Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-10-26use clear_page()/copy_page() in favor of memset()/memcpy() on whole pagesJan Beulich
After all that's what they are intended for. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26mm: strictly nested kmap_atomic()Peter Zijlstra
Ensure kmap_atomic() usage is strictly nested Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-17PM / Hibernate: Make default image size depend on total RAM sizeRafael J. Wysocki
The default hibernation image size is currently hard coded and euqal to 500 MB, which is not a reasonable default on many contemporary systems. Make it equal 2/5 of the total RAM size (this is slightly below the maximum, i.e. 1/2 of the total RAM size, and seems to be generally suitable). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Tested-by: M. Vefa Bicakci <bicave@superonline.com>
2010-10-17PM / Hibernate: Improve comments in hibernate_preallocate_memory()Rafael J. Wysocki
One comment in hibernate_preallocate_memory() is wrong, so fix it and add one more comment to clarify the meaning of the fixed one. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-09-11Merge branch 'pm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM / Hibernate: Avoid hitting OOM during preallocation of memory PM QoS: Correct pr_debug() misuse and improve parameter checks PM: Prevent waiting forever on asynchronous resume after failing suspend
2010-09-11PM / Hibernate: Avoid hitting OOM during preallocation of memoryRafael J. Wysocki
There is a problem in hibernate_preallocate_memory() that it calls preallocate_image_memory() with an argument that may be greater than the total number of available non-highmem memory pages. If that's the case, the OOM condition is guaranteed to trigger, which in turn can cause significant slowdown to occur during hibernation. To avoid that, make preallocate_image_memory() adjust its argument before calling preallocate_image_pages(), so that the total number of saveable non-highem pages left is not less than the minimum size of a hibernation image. Change hibernate_preallocate_memory() to try to allocate from highmem if the number of pages allocated by preallocate_image_memory() is too low. Modify free_unnecessary_pages() to take all possible memory allocation patterns into account. Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Tested-by: M. Vefa Bicakci <bicave@superonline.com>
2010-09-09swap: revert special hibernation allocationHugh Dickins
Please revert 2.6.36-rc commit d2997b1042ec150616c1963b5e5e919ffd0b0ebf "hibernation: freeze swap at hibernation". It complicated matters by adding a second swap allocation path, just for hibernation; without in any way fixing the issue that it was intended to address - page reclaim after fixing the hibernation image might free swap from a page already imaged as swapcache, letting its swap be reallocated to store a different page of the image: resulting in data corruption if the imaged page were freed as clean then swapped back in. Pages freed to si->swap_map were still in danger of being reallocated by the alternative allocation path. I guess it inadvertently fixed slow SSD swap allocation for hibernation, as reported by Nigel Cunningham: by missing out the discards that occur on the usual swap allocation path; but that was unintentional, and needs a separate fix. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ondrej Zary <linux@rainbow-software.org> Cc: Andrea Gelmini <andrea.gelmini@gmail.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nigel Cunningham <nigel@tuxonice.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09hibernation: freeze swap at hibernationKAMEZAWA Hiroyuki
When taking a memory snapshot in hibernate_snapshot(), all (directly called) memory allocations use GFP_ATOMIC. Hence swap misusage during hibernation never occurs. But from a pessimistic point of view, there is no guarantee that no page allcation has __GFP_WAIT. It is better to have a global indication "we enter hibernation, don't use swap!". This patch tries to freeze new-swap-allocation during hibernation. (All user processes are frozenm so swapin is not a concern). This way, no updates will happen to swap_map[] between hibernate_snapshot() and save_image(). Swap is thawed when swsusp_free() is called. We can be assured that swap corruption will not occur. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Hugh Dickins <hughd@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Ondrej Zary <linux@rainbow-software.org> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-19update email addressPavel Machek
pavel@suse.cz no longer works, replace it with working address. Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-05-10PM / Hibernate: Snapshot cleanupJiri Slaby
Remove support of reads with offset. This means snapshot_read/write_next now does not accept count parameter. It allows to clean up the functions and snapshot handle which no longer needs to care about offsets. /dev/snapshot handler is converted to simple_{read_from,write_to}_buffer which take care of offsets. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-02-26PM / Hibernate: Fix preallocating of memoryRafael J. Wysocki
The hibernate memory preallocation code allocates memory to push some user space data out of physical RAM, so that the hibernation image is not too large. It allocates more memory than necessary for creating the image, so it has to release some pages to make room for allocations made while suspending devices and disabling nonboot CPUs, or the system will hang due to the lack of free pages to allocate from. Unfortunately, the function used for freeing these pages, free_unnecessary_pages(), contains a bug that prevents it from doing the job on all systems without highmem. Fix this problem, which is a regression from the 2.6.30 kernel, by using the right condition for the termination of the loop in free_unnecessary_pages(). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reported-and-tested-by: Alan Jenkins <sourcejedi.lkml@googlemail.com> Cc: stable@kernel.org
2010-02-26PM / Hibernate: Remove trailing space in messageFrans Pop
Remove a trailing space from a message in swsusp_save(). Signed-off-by: Frans Pop <elendil@planet.nl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>