Age | Commit message (Collapse) | Author |
|
If the firmware indicates in GHES error data entry that the error threshold
has exceeded for a corrected error event, then we try to soft-offline the
page. This could be called in interrupt context, so we queue this up similar
to how we handle memory failure scenarios.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
online_pages() is called from memory_block_action() when a user requests
to online a memory block via sysfs. This function needs to return a
proper error value in case of error.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
min_free_kbytes is updated during memory hotplug (by
init_per_zone_wmark_min) currently which is right thing to do in most
cases but this could be unexpected if admin increased the value to
prevent from allocation failures and the new min_free_kbytes would be
decreased as a result of memory hotadd.
This patch saves the user defined value and allows updating
min_free_kbytes only if it is higher than the saved one.
A warning is printed when the new value is ignored.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Now memcg has the same life cycle with its corresponding cgroup, and a
cgroup is freed via RCU and then mem_cgroup_css_free() will be called in
a work function, so we can simply call __mem_cgroup_free() in
mem_cgroup_css_free().
This actually reverts commit 59927fb984d ("memcg: free mem_cgroup by RCU
to fix oops").
Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Now memcg has the same life cycle as its corresponding cgroup. Kill the
useless refcnt.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The cgroup core guarantees it's always safe to access the parent.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Use css_get/put instead of mem_cgroup_get/put. A simple replacement
will do.
The historical reason that memcg has its own refcnt instead of always
using css_get/put, is that cgroup couldn't be removed if there're still
css refs, so css refs can't be used as long-lived reference. The
situation has changed so that rmdir a cgroup will succeed regardless css
refs, but won't be freed until css refs goes down to 0.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Use css_get/put instead of mem_cgroup_get/put.
We can't do a simple replacement, because here mem_cgroup_put() is
called during mem_cgroup_css_free(), while mem_cgroup_css_free() won't
be called until css refcnt goes down to 0.
Instead we increment css refcnt in mem_cgroup_css_offline(), and then
check if there's still kmem charges. If not, css refcnt will be
decremented immediately, otherwise the refcnt will be released after the
last kmem allocation is uncahred.
[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Use css_get()/css_put() instead of mem_cgroup_get()/mem_cgroup_put().
There are two things being done in the current code:
First, we acquired a css_ref to make sure that the underlying cgroup
would not go away. That is a short lived reference, and it is put as
soon as the cache is created.
At this point, we acquire a long-lived per-cache memcg reference count
to guarantee that the memcg will still be alive.
so it is:
enqueue: css_get
create : memcg_get, css_put
destroy: memcg_put
So we only need to get rid of the memcg_get, change the memcg_put to
css_put, and get rid of the now extra css_put.
(This changelog is mostly written by Glauber)
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Use css_get/css_put instead of mem_cgroup_get/put.
Note, if at the same time someone is moving @current to a different
cgroup and removing the old cgroup, css_tryget() may return false, and
sock->sk_cgrp won't be initialized, which is fine.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
mem_cgroup_css_online calls mem_cgroup_put if memcg_init_kmem fails.
This is not correct because only memcg_propagate_kmem takes an
additional reference while mem_cgroup_sockets_init is allowed to fail as
well (although no current implementation fails) but it doesn't take any
reference. This all suggests that it should be memcg_propagate_kmem
that should clean up after itself so this patch moves mem_cgroup_put
over there.
Unfortunately this is not that easy (as pointed out by Li Zefan) because
memcg_kmem_mark_dead marks the group dead (KMEM_ACCOUNTED_DEAD) if it is
marked active (KMEM_ACCOUNTED_ACTIVE) which is the case even if
memcg_propagate_kmem fails so the additional reference is dropped in
that case in kmem_cgroup_destroy which means that the reference would be
dropped two times.
The easiest way then would be to simply remove mem_cgrroup_put from
mem_cgroup_css_online and rely on kmem_cgroup_destroy doing the right
thing.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org> [3.8]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This reverts commit e4715f01be697a.
mem_cgroup_put is hierarchy aware so mem_cgroup_put(memcg) already drops
an additional reference from all parents so the additional
mem_cgrroup_put(parent) potentially causes use-after-free.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org> [3.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
It is counterintuitive at best that mmap'ing a hugetlbfs file with
MAP_HUGETLB fails, while mmap'ing it without will a) succeed and b)
return huge pages.
v2: use is_file_hugepages(), as suggested by Jianguo
Signed-off-by: Joern Engel <joern@logfs.org>
Cc: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ZONE_WRITEBACK
After the patch "mm: vmscan: Flatten kswapd priority loop" was merged
the scanning priority of kswapd changed.
The priority now rises until it is scanning enough pages to meet the
high watermark. shrink_inactive_list sets ZONE_WRITEBACK if a number of
pages were encountered under writeback but this value is scaled based on
the priority. As kswapd frequently scans with a higher priority now it
is relatively easy to set ZONE_WRITEBACK. This patch removes the
scaling and treates writeback pages similar to how it treats unqueued
dirty pages and congested pages. The user-visible effect should be that
kswapd will writeback fewer pages from reclaim context.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Direct reclaim is not aborting to allow compaction to go ahead properly.
do_try_to_free_pages is told to abort reclaim which is happily ignores
and instead increases priority instead until it reaches 0 and starts
shrinking file/anon equally. This patch corrects the situation by
aborting reclaim when requested instead of raising priority.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Remove one redundant "nid" in the comment.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When searching a vmap area in the vmalloc space, we use (addr + size -
1) to check if the value is less than addr, which is an overflow. But
we assign (addr + size) to vmap_area->va_end.
So if we come across the below case:
(addr + size - 1) : not overflow
(addr + size) : overflow
we will assign an overflow value (e.g 0) to vmap_area->va_end, And this
will trigger BUG in __insert_vmap_area, causing system panic.
So using (addr + size) to check the overflow should be the correct
behaviour, not (addr + size - 1).
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Reported-by: Ghennadi Procopciuc <unix140@gmail.com>
Tested-by: Daniel Baluta <dbaluta@ixiacom.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
These VM_<READfoo> macros aren't used very often and three of them
aren't used at all.
Expand the ones that are used in-place, and remove all the now unused
#define VM_<foo> macros.
VM_READHINTMASK, VM_NormalReadHint and VM_ClearReadHint were added just
before 2.4 and appears have never been used.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
With CONFIG_MEMORY_HOTREMOVE unset, there is a compile warning:
mm/sparse.c:755: warning: `clear_hwpoisoned_pages' defined but not used
And Bisecting it ended up pointing to 4edd7ceff ("mm, hotplug: avoid
compiling memory hotremove functions when disabled").
This is because the commit above put sparse_remove_one_section() within
the protection of CONFIG_MEMORY_HOTREMOVE but the only user of
clear_hwpoisoned_pages() is sparse_remove_one_section(), and it is not
within the protection of CONFIG_MEMORY_HOTREMOVE.
So put clear_hwpoisoned_pages within CONFIG_MEMORY_HOTREMOVE should fix
the warning.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This function is nowhere used, and it has a confusing name with put_page
in mm/swap.c. So better to remove it.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
vfree() only needs schedule_work(&p->wq) if p->list was empty, otherwise
vfree_deferred->wq is already pending or it is running and didn't do
llist_del_all() yet.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In __rmqueue_fallback(), current_order loops down from MAX_ORDER - 1 to
the order passed. MAX_ORDER is typically 11 and pageblock_order is
typically 9 on x86. Integer division truncates, so pageblock_order / 2
is 4. For the first eight iterations, it's guaranteed that
current_order >= pageblock_order / 2 if it even gets that far!
So just remove the unlikely(), it's completely bogus.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Suggested-by: David Rientjes <rientjes@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The callers of build_zonelists_node always pass MAX_NR_ZONES -1 as the
zone_type argument, so we can directly use the value in
build_zonelists_node and remove zone_type argument.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The memory we used to hold the memcg arrays is currently accounted to
the current memcg. But that creates a problem, because that memory can
only be freed after the last user is gone. Our only way to know which
is the last user, is to hook up to freeing time, but the fact that we
still have some in flight kmallocs will prevent freeing to happen. I
believe therefore to be just easier to account this memory as global
overhead.
Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The memory we used to hold the memcg arrays is currently accounted to
the current memcg. But that creates a problem, because that memory can
only be freed after the last user is gone. Our only way to know which
is the last user, is to hook up to freeing time, but the fact that we
still have some in flight kmallocs will prevent freeing to happen. I
believe therefore to be just easier to account this memory as global
overhead.
This patch (of 2):
Disabling accounting is only relevant for some specific memcg internal
allocations. Therefore we would initially not have such check at
memcg_kmem_newpage_charge, since direct calls to the page allocator that
are marked with GFP_KMEMCG only happen outside memcg core. We are
mostly concerned with cache allocations and by having this test at
memcg_kmem_get_cache we are already able to relay the allocation to the
root cache and bypass the memcg caches altogether.
There is one exception, though: the SLUB allocator does not create large
order caches, but rather service large kmallocs directly from the page
allocator. Therefore, the following sequence, when backed by the SLUB
allocator:
memcg_stop_kmem_account();
kmalloc(<large_number>)
memcg_resume_kmem_account();
would effectively ignore the fact that we should skip accounting, since
it will drive us directly to this function without passing through the
cache selector memcg_kmem_get_cache. Such large allocations are
extremely rare but can happen, for instance, for the cache arrays.
This was never a problem in practice, because we weren't skipping
accounting for the cache arrays. All the allocations we were skipping
were fairly small. However, the fact that we were not skipping those
allocations are a problem and can prevent the memcgs from going away.
As we fix that, we need to make sure that the fix will also work with
the SLUB allocator.
Signed-off-by: Glauber Costa <glommer@openvz.org>
Reported-by: Michal Hocko <mhocko@suze.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We should check the VM_UNITIALIZED flag in s_show(). If this flag is
set, that said, the vm_struct is not fully initialized. So it is
unnecessary to try to show the information contained in vm_struct.
We checked this flag in show_numa_info(), but I think it's better to
check it earlier.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
VM_UNLIST was used to indicate that the vm_struct is not listed in
vmlist.
But after commit 4341fa454796 ("mm, vmalloc: remove list management of
vmlist after initializing vmalloc"), the meaning of this flag changed.
It now means the vm_struct is not fully initialized. So renaming it to
VM_UNINITIALIZED seems more reasonable.
Also change clear_vm_unlist to clear_vm_uninitialized_flag.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Use goto to jump to the fail label to give a failure message before
returning NULL. This makes the failure handling in this function
consistent.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As we have removed the dead code in the vb_alloc, it seems there is no
place to use the alloc_map. So there is no reason to maintain the
alloc_map in vmap_block.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This function is nowhere used now, so remove it.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Space in a vmap block that was once allocated is considered dirty and
not made available for allocation again before the whole block is
recycled. The result is that free space within a vmap block is always
contiguous.
So if a vmap block has enough free space for allocation, the allocation
is impossible to fail. Thus, the fragmented block purging was never
invoked from vb_alloc(). So remove this dead code.
[ Same patches also sent by:
Chanho Min <chanho.min@lge.com>
Johannes Weiner <hannes@cmpxchg.org>
but git doesn't do "multiple authors" ]
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There is an extra semi-colon so the function always returns.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When calculating pages in a node, for each zone in that node, we will
have
zone_spanned_pages_in_node
--> get_pfn_range_for_nid
zone_absent_pages_in_node
--> get_pfn_range_for_nid
That is to say, we call the get_pfn_range_for_nid to get start_pfn and
end_pfn of the node for MAX_NR_ZONES * 2 times. And this is totally
unnecessary if we call the get_pfn_range_for_nid before
zone_*_pages_in_node add two extra arguments node_start_pfn and
node_end_pfn for zone_*_pages_in_node, then we can remove the
get_pfn_range_in_node in zone_*_pages_in_node.
[akpm@linux-foundation.org: make definitions more readable]
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Remove struct mem_cgroup_lru_info and fold its single member, the
variably sized nodeinfo[0], directly into struct mem_cgroup. This
should make it more obvious why it has to be the last member there.
Also move the comment that's above that special last member below it, so
it is more visible to somebody that considers appending to the struct
mem_cgroup.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch is very similar to commit 84d96d897671 ("mm: madvise:
complete input validation before taking lock"): perform some basic
validation of the input to mremap() before taking the
¤t->mm->mmap_sem lock.
This also makes the MREMAP_FIXED => MREMAP_MAYMOVE dependency slightly
more explicit.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Give s_next and s_stop slab-specific names instead of exporting
"s_next" and "s_stop".
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
While doing some code inspection, I noticed that the slob constructor
method can be called with a NULL pointer. If memory is tight and slob
fails to allocate with slob_alloc() or slob_new_pages() it still calls
the ctor() method with a NULL pointer. Looking at the first ctor()
method I found, I noticed that it can not handle a NULL pointer (I'm
sure others probably can't either):
static void sighand_ctor(void *data)
{
struct sighand_struct *sighand = data;
spin_lock_init(&sighand->siglock);
init_waitqueue_head(&sighand->signalfd_wqh);
}
The solution is to only call the ctor() method if allocation succeeded.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
CPU partial support can introduce level of indeterminism that is not
wanted in certain context (like a realtime kernel). Make it
configurable.
This patch is based on Christoph Lameter's "slub: Make cpu partial slab
support configurable V2".
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Some architectures (e.g. powerpc built with CONFIG_PPC_256K_PAGES=y
CONFIG_FORCE_MAX_ZONEORDER=11) get PAGE_SHIFT + MAX_ORDER > 26.
In 3.10 kernels, CONFIG_LOCKDEP=y with PAGE_SHIFT + MAX_ORDER > 26 makes
init_lock_keys() dereference beyond kmalloc_caches[26].
This leads to an unbootable system (kernel panic at initializing SLAB)
if one of kmalloc_caches[26...PAGE_SHIFT+MAX_ORDER-1] is not NULL.
Fix this by making sure that init_lock_keys() does not dereference beyond
kmalloc_caches[26] arrays.
Signed-off-by: Christoph Lameter <cl@linux.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-Love.SAKURA.ne.jp>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: <stable@vger.kernel.org> [3.10.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
In free path, we don't check number of cpu_partial, so one slab can
be linked in cpu partial list even if cpu_partial is 0. To prevent this,
we should check number of cpu_partial in put_cpu_partial().
Acked-by: Christoph Lameeter <cl@linux.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Use existing interface node_nr_slabs and node_nr_objs to get
nr_slabs and nr_objs.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
This patch remove unused nr_partials variable.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Slab have some tunables like limit, batchcount, and sharedfactor can be
tuned through function slabinfo_write. Commit (b7454ad3: mm/sl[au]b: Move
slabinfo processing to slab_common.c) uncorrectly change /proc/slabinfo
unwriteable for slab, this patch fix it by revert to original mode.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
This patch shares s_next and s_stop between slab and slub.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
The drain_freelist is called to drain slabs_free lists for cache reap,
cache shrink, memory hotplug callback etc. The tofree parameter should
be the number of slab to free instead of the number of slab objects to
free.
This patch fix the callers that pass # of objects. Make sure they pass #
of slabs.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
"The usual stuff from trivial tree"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
treewide: relase -> release
Documentation/cgroups/memory.txt: fix stat file documentation
sysctl/net.txt: delete reference to obsolete 2.4.x kernel
spinlock_api_smp.h: fix preprocessor comments
treewide: Fix typo in printk
doc: device tree: clarify stuff in usage-model.txt.
open firmware: "/aliasas" -> "/aliases"
md: bcache: Fixed a typo with the word 'arithmetic'
irq/generic-chip: fix a few kernel-doc entries
frv: Convert use of typedef ctl_table to struct ctl_table
sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
doc: clk: Fix incorrect wording
Documentation/arm/IXP4xx fix a typo
Documentation/networking/ieee802154 fix a typo
Documentation/DocBook/media/v4l fix a typo
Documentation/video4linux/si476x.txt fix a typo
Documentation/virtual/kvm/api.txt fix a typo
Documentation/early-userspace/README fix a typo
Documentation/video4linux/soc-camera.txt fix a typo
lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Ben Herrenschmidt:
"This is the powerpc changes for the 3.11 merge window. In addition to
the usual bug fixes and small updates, the main highlights are:
- Support for transparent huge pages by Aneesh Kumar for 64-bit
server processors. This allows the use of 16M pages as transparent
huge pages on kernels compiled with a 64K base page size.
- Base VFIO support for KVM on power by Alexey Kardashevskiy
- Wiring up of our nvram to the pstore infrastructure, including
putting compressed oopses in there by Aruna Balakrishnaiah
- Move, rework and improve our "EEH" (basically PCI error handling
and recovery) infrastructure. It is no longer specific to pseries
but is now usable by the new "powernv" platform as well (no
hypervisor) by Gavin Shan.
- I fixed some bugs in our math-emu instruction decoding and made it
usable to emulate some optional FP instructions on processors with
hard FP that lack them (such as fsqrt on Freescale embedded
processors).
- Support for Power8 "Event Based Branch" facility by Michael
Ellerman. This facility allows what is basically "userspace
interrupts" for performance monitor events.
- A bunch of Transactional Memory vs. Signals bug fixes and HW
breakpoint/watchpoint fixes by Michael Neuling.
And more ... I appologize in advance if I've failed to highlight
something that somebody deemed worth it."
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
pstore: Add hsize argument in write_buf call of pstore_ftrace_call
powerpc/fsl: add MPIC timer wakeup support
powerpc/mpic: create mpic subsystem object
powerpc/mpic: add global timer support
powerpc/mpic: add irq_set_wake support
powerpc/85xx: enable coreint for all the 64bit boards
powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx
powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig
powerpc/mpic: Add get_version API both for internal and external use
powerpc: Handle both new style and old style reserve maps
powerpc/hw_brk: Fix off by one error when validating DAWR region end
powerpc/pseries: Support compression of oops text via pstore
powerpc/pseries: Re-organise the oops compression code
pstore: Pass header size in the pstore write callback
powerpc/powernv: Fix iommu initialization again
powerpc/pseries: Inform the hypervisor we are using EBB regs
powerpc/perf: Add power8 EBB support
powerpc/perf: Core EBB support for 64-bit book3s
powerpc/perf: Drop MMCRA from thread_struct
powerpc/perf: Don't enable if we have zero events
...
|