summaryrefslogtreecommitdiff
path: root/arch/s390/mm
AgeCommit message (Collapse)Author
2012-10-09s390/mm,vmem: fix vmem_add_mem()/vmem_remove_range()Heiko Carstens
vmem_add_mem() should only then insert a large page if pmd_none() is true for the specific entry. We might have a leftover from a previous mapping. In addition make vmem_remove_range()'s page table walk code more complete and fix a couple of potential endless loops (which can never happen :). Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-10-09s390/vmalloc: have separate modules areaHeiko Carstens
Add a special module area on top of the vmalloc area, which may be only used for modules and bpf jit generated code. This makes sure that inter module branches will always happen without a trampoline and in addition having all the code within a 2GB frame is branch prediction unit friendly. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-10-09s390/mm: fix mapping of read-only kernel text sectionHeiko Carstens
Within the identity mapping the kernel text section is mapped read-only. However when mapping the first and last page of the text section we must round upwards and downwards respectively, if only parts of a page belong to the section. Otherwise potential rw data can be mapped read-only. So the rounding must be done just the other way we have it right now. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-10-09s390/mm: add page table dumperHeiko Carstens
This is more or less the same as the x86 page table dumper which was merged four years ago: 926e5392 "x86: add code to dump the (kernel) page tables for visual inspection by kernel developers". We add a file at /sys/kernel/debug/kernel_page_tables for debugging purposes so it's quite easy to see the kernel page table layout and possible odd mappings: ---[ Identity Mapping ]--- 0x0000000000000000-0x0000000000100000 1M PTE RW ---[ Kernel Image Start ]--- 0x0000000000100000-0x0000000000800000 7M PMD RO 0x0000000000800000-0x00000000008a9000 676K PTE RO 0x00000000008a9000-0x0000000000900000 348K PTE RW 0x0000000000900000-0x0000000001500000 12M PMD RW ---[ Kernel Image End ]--- 0x0000000001500000-0x0000000280000000 10219M PMD RW 0x0000000280000000-0x000003d280000000 3904G PUD I ---[ vmemmap Area ]--- 0x000003d280000000-0x000003d288c00000 140M PTE RW 0x000003d288c00000-0x000003d300000000 1908M PMD I 0x000003d300000000-0x000003e000000000 52G PUD I ---[ vmalloc Area ]--- 0x000003e000000000-0x000003e000009000 36K PTE RW 0x000003e000009000-0x000003e0000ee000 916K PTE I 0x000003e0000ee000-0x000003e000146000 352K PTE RW 0x000003e000146000-0x000003e000200000 744K PTE I 0x000003e000200000-0x000003e080000000 2046M PMD I 0x000003e080000000-0x0000040000000000 126G PUD I This usually makes only sense for kernel developers. The output with CONFIG_DEBUG_PAGEALLOC is not very helpful, because of the huge number of mapped out pages, however I decided for the time being to not add a !DEBUG_PAGEALLOC dependency. Maybe it's helpful for somebody even with that option. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-10-09s390/mm,pageattr: remove superfluous EXPORT_SYMBOLsHeiko Carstens
Remove some EXPORT_SYMBOLs. There is no module code anywhere which calls these pageattr functions. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-10-09s390/mm,pageattr: add more page table walk sanity checksHeiko Carstens
The current page table walk code in pageattr.c only checks for large pages while walking the kernel page table, but happily assumes that everything else is just fine. Add more checks so we never access invalid memory regions. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-10-09s390/mm: fix pmd_huge() usage for kernel mappingHeiko Carstens
pmd_huge() will always return 0 on !HUGETLBFS, however we use that helper function when walking the kernel page tables to decide if we have a 1MB page frame or not. Since we create 1MB frames for the kernel 1:1 mapping independently of HUGETLBFS this can lead to incorrect storage accesses since the code can assume that we have a pointer to a page table instead of a pointer to a 1MB frame. Fix this by adding a pmd_large() primitive like other architectures have it already and remove all references to HUGETLBFS/HUGETLBPAGE from the code that walks kernel page tables. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-10-09readahead: fault retry breaks mmap file read random detectionShaohua Li
.fault now can retry. The retry can break state machine of .fault. In filemap_fault, if page is miss, ra->mmap_miss is increased. In the second try, since the page is in page cache now, ra->mmap_miss is decreased. And these are done in one fault, so we can't detect random mmap file access. Add a new flag to indicate .fault is tried once. In the second try, skip ra->mmap_miss decreasing. The filemap_fault state machine is ok with it. I only tested x86, didn't test other archs, but looks the change for other archs is obvious, but who knows :) Signed-off-by: Shaohua Li <shaohua.li@fusionio.com> Cc: Rik van Riel <riel@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09thp, s390: architecture backend for thp on s390Gerald Schaefer
This implements the architecture backend for transparent hugepages on s390. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09thp, s390: disable thp for kvm host on s390Gerald Schaefer
This patch is part of the architecture backend for thp on s390. It disables thp for kvm hosts, because there is no kvm host hugepage support so far. Existing thp mappings are split by follow_page() with FOLL_SPLIT, and future thp mappings are prevented by setting VM_NOHUGEPAGE in mm->def_flags. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09thp, s390: thp pagetable pre-allocation for s390Gerald Schaefer
This patch is part of the architecture backend for thp on s390. It provides the pagetable pre-allocation functions pgtable_trans_huge_deposit() and pgtable_trans_huge_withdraw(). Unlike other archs, s390 has no struct page * as pgtable_t, but rather a pointer to the page table. So instead of saving the pagetable pre- allocation list info inside the struct page, it is being saved within the pagetable itself. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09thp, s390: thp splitting backend for s390Gerald Schaefer
This patch is part of the architecture backend for thp on s390. It provides the functions related to thp splitting, including serialization against gup. Unlike other archs, pmdp_splitting_flush() cannot use a tlb flushing operation to serialize against gup on s390, because that wouldn't be stopped by the disabled IRQs. So instead, smp_call_function() is called with an empty function, which will have the expected effect. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-26s390/mm: mark free_initrd_mem() as __initHeiko Carstens
Same as 0d26d1d8 "x86/mm: Mark free_initrd_mem() as __init". In addition also add the __init annotation to setup_zero_pages(). Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-09-26s390: add uninitialized_var() to suppress false positive compiler warningsHeiko Carstens
Get rid of these: arch/s390/kernel/smp.c:134:19: warning: ‘status’ may be used uninitialized in this function [-Wuninitialized] arch/s390/mm/pgtable.c:641:10: warning: ‘table’ may be used uninitialized in this function [-Wuninitialized] arch/s390/mm/pgtable.c:644:12: warning: ‘page’ may be used uninitialized in this function [-Wuninitialized] drivers/s390/cio/cio.c:1037:14: warning: ‘schid’ may be used uninitialized in this function [-Wuninitialized] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-09-26s390/exceptions: switch to relative exception table entriesHeiko Carstens
This is the s390 port of 70627654 "x86, extable: Switch to relative exception table entries". Reduces the size of our exception tables by 50% on 64 bit builds. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-09-26s390/mm: implement __get_user_pages_fast()Gerald Schaefer
This patch introduces the __get_user_pages_fast() function on s390, which will be needed with CONFIG_TRANSPARENT_HUGEPAGE and futex. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-09-26s390/mm: rename addressing_mode to s390_user_modeHeiko Carstens
Renaming the globally visible variable "user_mode" to "addressing_mode" in order to fix a name clash was not a good idea. (Commit 37fe1d73 "s390/mm: rename user_mode variable to addressing_mode") Looking at the code after a couple of weeks one thinks: addressing mode of what? So rename the variable again. This time to s390_user_mode. Which hopefully makes more sense. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-09-26s390/cpu hotplug: mask out CPU_TASKS_FROZEN in cu hotplug notifiersHeiko Carstens
Unify all our cpu hotplug notifiers to mask out the CPU_TASKS_FROZEN bit, so we don't have to add all the *_FROZEN variant cases to the notifiers. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-09-26s390: enable large page support with CONFIG_DEBUG_PAGEALLOCGerald Schaefer
So far, large page support was completely disabled with CONFIG_DEBUG_PAGEALLOC, although it would be sufficient if only the large page kernel mapping was disabled. This patch enables large page support with CONFIG_DEBUG_PAGEALLOC, while it prevents the large kernel mapping in that case. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-30s390: make use of user_mode() macro where possibleHeiko Carstens
We use the user_mode() helper already at several places but also have the open coded variant at other places. Convert the code to always use the helper function. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-30s390/mm: rename user_mode variable to addressing_modeHeiko Carstens
Fix name clash with user_mode() define which is also used in common code. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-30s390/mm: fix fault handling for page table walk caseHeiko Carstens
Make sure the kernel does not incorrectly create a SIGBUS signal during user space accesses: For user space accesses in the switched addressing mode case the kernel may walk page tables and access user address space via the kernel mapping. If a page table entry is invalid the function __handle_fault() gets called in order to emulate a page fault and trigger all the usual actions like paging in a missing page etc. by calling handle_mm_fault(). If handle_mm_fault() returns with an error fixup handling is necessary. For the switched addressing mode case all errors need to be mapped to -EFAULT, so that the calling uaccess function can return -EFAULT to user space. Unfortunately the __handle_fault() incorrectly calls do_sigbus() if VM_FAULT_SIGBUS is set. This however should only happen if a page fault was triggered by a user space instruction. For kernel mode uaccesses the correct action is to only return -EFAULT. So user space may incorrectly see SIGBUS signals because of this bug. For current machines this would only be possible for the switched addressing mode case in conjunction with futex operations. Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-30s390/mm: make page faults killableHeiko Carstens
This is the s390 variant of 37b23e05 "x86,mm: make pagefault killable". Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-26s390/mm: downgrade page table after fork of a 31 bit processMartin Schwidefsky
The downgrade of the 4 level page table created by init_new_context is currently done only in start_thread31. If a 31 bit process forks the new mm uses a 4 level page table, including the task size of 2<<42 that goes along with it. This is incorrect as now a 31 bit process can map memory beyond 2GB. Define arch_dup_mmap to do the downgrade after fork. Cc: stable@vger.kernel.org Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-20s390/comments: unify copyright messages and remove file namesHeiko Carstens
Remove the file name from the comment at top of many files. In most cases the file name was wrong anyway, so it's rather pointless. Also unify the IBM copyright statement. We did have a lot of sightly different statements and wanted to change them one after another whenever a file gets touched. However that never happened. Instead people start to take the old/"wrong" statements to use as a template for new files. So unify all of them in one go. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2012-05-30s390/kernel: Introduce memcpy_absolute() functionMichael Holzheu
This patch introduces the new function memcpy_absolute() that allows to copy memory using absolute addressing. This means that the prefix swap does not apply when this function is used. With this patch also all s390 kernel code that accesses absolute zero now uses the new memcpy_absolute() function. The old and less generic copy_to_absolute_zero() function is removed. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-24s390/headers: replace __s390x__ with CONFIG_64BIT where possibleHeiko Carstens
Replace __s390x__ with CONFIG_64BIT in all places that are not exported to userspace or guarded with #ifdef __KERNEL__. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-16s390/pfault: add sanity checkHeiko Carstens
If the task that was found on an initial interrupt doesn't match the current task execute a WARN_ON_ONCE() and don't put the task to sleep. When this happened something went wrong between the interface of the hypervisor and the kernel. In such a case keep the tasks alive to avoid a hanging system. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-16s390/pfault: use __set_task_stateHeiko Carstens
Use __set_task_state() instead of set_task_state(). Saves a couple of instructions, since the memory barrier is not needed here. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-16s390/pfault: always search for task with reported pidHeiko Carstens
Make the code a bit more symmetric and always search for the task of the reported pid. This simplifies the code a bit. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-16s390/pfault: fix task state raceHeiko Carstens
When setting the current task state to TASK_UNINTERRUPTIBLE this can race with a different cpu. The other cpu could set the task state after it inspected it (while it was still TASK_RUNNING) to TASK_RUNNING which would change the state from TASK_UNINTERRUPTIBLE to TASK_RUNNING again. This race was always present in the pfault interrupt code but didn't cause anything harmful before commit f2db2e6c "[S390] pfault: cpu hotplug vs missing completion interrupts" which relied on the fact that after setting the task state to TASK_UNINTERRUPTIBLE the task would really sleep. Since this is not necessarily the case the result may be a list corruption of the pfault_list or, as observed, a use-after-free bug while trying to access the task_struct of a task which terminated itself already. To fix this, we need to get a reference of the affected task when receiving the initial pfault interrupt and add special handling if we receive yet another initial pfault interrupt when the task is already enqueued in the pfault list. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: <stable@vger.kernel.org> # needed for v3.0 and newer Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-16s390/kvm: bad rss-counter stateChristian Borntraeger
commit c3f0327f8e9d7a503f0d64573c311eddd61f197d mm: add rss counters consistency check detected the following problem with kvm on s390: BUG: Bad rss-counter state mm:00000004f73ef000 idx:0 val:-10 BUG: Bad rss-counter state mm:00000004f73ef000 idx:1 val:-5 We have to make sure that we accumulate all rss values into the mm before we replace the mm to avoid triggering this (harmless) bug message. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-16s390/hugepages: clear page table for sw large page emulationGerald Schaefer
The software large page emulation on s390 did not clear the the pre-allocated page table in arch_release_hugepage() before freeing it. This could trigger the WARN_ON(!pte_none(*pte) in mm/vmalloc.c:106 and make vmap_pte_range() fail, because the page table could be reused in page_table_alloc(). This is fixed now by calling clear_table() before page_table_free(). Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-16s390: replace TIF_SIE with PF_VCPUMartin Schwidefsky
Replace the check for TIF_SIE in the fault handler by a check for PF_VCPU. With the last user of TIF_SIE gone we can now remove the bit. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-16s390: allow absolute memory access for /dev/memMichael Holzheu
Currently dev/mem for s390 provides only real memory access. This means that the CPU prefix pages are swapped. The prefix swap for real memory works as follows: Each CPU owns a prefix register that points to a page aligned memory location "P". If this CPU accesses the address range [0,0x1fff], it is translated by the hardware to [P,P+0x1fff]. Accordingly if this CPU accesses the address range [P,P+0x1fff], it is translated by the hardware to [0,0x1fff]. Therefore, if [P,P+0x1fff] or [0,0x1fff] is read from the current /dev/mem device, the incorrectly swapped memory content is returned. With this patch the /dev/mem architecture code is modified to provide absolute memory access. This is done via the arch specific functions xlate_dev_mem_ptr() and unxlate_dev_mem_ptr(). For swapped pages on s390 the function xlate_dev_mem_ptr() now returns a new buffer with a copy of the requested absolute memory. In case the buffer was allocated, the unxlate_dev_mem_ptr() function frees it after /dev/mem code has called copy_to_user(). Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-04-11[S390] fix tlb flushing for page table pagesMartin Schwidefsky
Git commit 36409f6353fc2d7b6516e631415f938eadd92ffa "use generic RCU page-table freeing code" introduced a tlb flushing bug. Partially revert the above git commit and go back to s390 specific page table flush code. For s390 the TLB can contain three types of entries, "normal" TLB page-table entries, TLB combined region-and-segment-table (CRST) entries and real-space entries. Linux does not use real-space entries which leaves normal TLB entries and CRST entries. The CRST entries are intermediate steps in the page-table translation called translation paths. For example a 4K page access in a three-level page table setup will create two CRST TLB entries and one page-table TLB entry. The advantage of that approach is that a page access next to the previous one can reuse the CRST entries and needs just a single read from memory to create the page-table TLB entry. The disadvantage is that the TLB flushing rules are more complicated, before any page-table may be freed the TLB needs to be flushed. In short: the generic RCU page-table freeing code is incorrect for the CRST entries, in particular the check for mm_users < 2 is troublesome. This is applicable to 3.0+ kernels. Cc: <stable@vger.kernel.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-04-11[S390] kernel: Use local_irq_save() for memcpy_real()Michael Holzheu
Currently in the memcpy_real() function interrupts are disabled with __arch_local_irq_stnsm(). In order to notify lockdep that interrupts are disabled, with this patch local_irq_save() is used instead. The function __arch_local_irq_stnsm() is still used for switching to real mode. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-03-28Merge tag 'split-asm_system_h-for-linus-20120328' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system Pull "Disintegrate and delete asm/system.h" from David Howells: "Here are a bunch of patches to disintegrate asm/system.h into a set of separate bits to relieve the problem of circular inclusion dependencies. I've built all the working defconfigs from all the arches that I can and made sure that they don't break. The reason for these patches is that I recently encountered a circular dependency problem that came about when I produced some patches to optimise get_order() by rewriting it to use ilog2(). This uses bitops - and on the SH arch asm/bitops.h drags in asm-generic/get_order.h by a circuituous route involving asm/system.h. The main difficulty seems to be asm/system.h. It holds a number of low level bits with no/few dependencies that are commonly used (eg. memory barriers) and a number of bits with more dependencies that aren't used in many places (eg. switch_to()). These patches break asm/system.h up into the following core pieces: (1) asm/barrier.h Move memory barriers here. This already done for MIPS and Alpha. (2) asm/switch_to.h Move switch_to() and related stuff here. (3) asm/exec.h Move arch_align_stack() here. Other process execution related bits could perhaps go here from asm/processor.h. (4) asm/cmpxchg.h Move xchg() and cmpxchg() here as they're full word atomic ops and frequently used by atomic_xchg() and atomic_cmpxchg(). (5) asm/bug.h Move die() and related bits. (6) asm/auxvec.h Move AT_VECTOR_SIZE_ARCH here. Other arch headers are created as needed on a per-arch basis." Fixed up some conflicts from other header file cleanups and moving code around that has happened in the meantime, so David's testing is somewhat weakened by that. We'll find out anything that got broken and fix it.. * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits) Delete all instances of asm/system.h Remove all #inclusions of asm/system.h Add #includes needed to permit the removal of asm/system.h Move all declarations of free_initmem() to linux/mm.h Disintegrate asm/system.h for OpenRISC Split arch_align_stack() out from asm-generic/system.h Split the switch_to() wrapper out of asm-generic/system.h Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h Create asm-generic/barrier.h Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h Disintegrate asm/system.h for Xtensa Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt] Disintegrate asm/system.h for Tile Disintegrate asm/system.h for Sparc Disintegrate asm/system.h for SH Disintegrate asm/system.h for Score Disintegrate asm/system.h for S390 Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PA-RISC Disintegrate asm/system.h for MN10300 ...
2012-03-28Disintegrate asm/system.h for S390David Howells
Disintegrate asm/system.h for S390. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-s390@vger.kernel.org
2012-03-23[S390] Remove unncessary export of arch_pick_mmap_layoutBen Hutchings
This function is defined for use in exec, not in modules. No other architecture exports its implementation. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-03-11[S390] irq: external interrupt code passingHeiko Carstens
The external interrupt handlers have a parameter called ext_int_code. Besides the name this paramter does not only contain the ext_int_code but in addition also the "cpu address" (POP) which caused the external interrupt. To make the code a bit more obvious pass a struct instead so the called function can easily distinguish between external interrupt code and cpu address. The cpu address field however is named "subcode" since some external interrupt sources do not pass a cpu address but a different parameter (or none at all). Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-03-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: [S390] memory hotplug: prevent memory zone interleave [S390] crash_dump: remove duplicate include [S390] KEYS: Enable the compat keyctl wrapper on s390x
2012-02-27compat: fix compile breakage on s390Heiko Carstens
The new is_compat_task() define for the !COMPAT case in include/linux/compat.h conflicts with a similar define in arch/s390/include/asm/compat.h. This is the minimal patch which fixes the build issues. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-24[S390] memory hotplug: prevent memory zone interleaveGerald Schaefer
This fixes a kernel oops with CONFIG_DEBUG_VM triggered by a VM_BUG_ON(bad_range()): kernel BUG at mm/page_alloc.c:748. With memory hotplug on System z, it is possible that the memory online/offline state is preserved over a system restart, e.g. there may be offline memory blocks in ZONE_DMA or ZONE_NORMAL. So far, the offline memory range has always been added to ZONE_MOVABLE during system start, so that it was possible to have ZONE_MOVABLE interleave with ZONE_DMA or ZONE_NORMAL. This patch fixes that by checking for zone overlap before adding memory. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-02-17[S390] incorrect PageTables counter for kvm page tablesMartin Schwidefsky
The page_table_free_pgste function is used for kvm processes to free page tables that have the pgste extension. It calls pgtable_page_ctor instead of pgtable_page_dtor which increases NR_PAGETABLE instead of decreasing it. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27[S390] cleanup trap handlingMartin Schwidefsky
Move the program interruption code and the translation exception identifier to the pt_regs structure as 'int_code' and 'int_parm_long' and make the first level interrupt handler in entry[64].S store the two values. That makes it possible to drop 'prot_addr' and 'trap_no' from the thread_struct and to reduce the number of arguments to a lot of functions. Finally un-inline do_trap. Overall this saves 5812 bytes in the .text section of the 64 bit kernel. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27[S390] disable MACHINE_IS_VM check for pfaultCarsten Otte
This patch disables the check for MACHINE_IS_VM when initializing the pfault infrastructure. The code checks for successful completion of diag 258 anyway, thus it's safe to try initialization on LPAR anyway. This is needed to use pfault on kvm Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27[S390] add support for physical memory > 4TBMartin Schwidefsky
The kernel address space of a 64 bit kernel currently uses a three level page table and the vmemmap array has a fixed address and a fixed maximum size. A three level page table is good enough for systems with less than 3.8TB of memory, for bigger systems four page table levels need to be used. Each page table level costs a bit of performance, use 3 levels for normal systems and 4 levels only for the really big systems. To avoid bloating sparse.o too much set MAX_PHYSMEM_BITS to 46 for a maximum of 64TB of memory. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27[S390] kvm: fix sleeping function ... at mm/page_alloc.c:2260Christian Borntraeger
commit cc772456ac9b460693492b3a3d89e8c81eda5874 [S390] fix list corruption in gmap reverse mapping added a potential dead lock: BUG: sleeping function called from invalid context at mm/page_alloc.c:2260 in_atomic(): 1, irqs_disabled(): 0, pid: 1108, name: qemu-system-s39 3 locks held by qemu-system-s39/1108: #0: (&kvm->slots_lock){+.+.+.}, at: [<000003e004866542>] kvm_set_memory_region+0x3a/0x6c [kvm] #1: (&mm->mmap_sem){++++++}, at: [<0000000000123790>] gmap_map_segment+0x9c/0x298 #2: (&(&mm->page_table_lock)->rlock){+.+.+.}, at: [<00000000001237a8>] gmap_map_segment+0xb4/0x298 CPU: 0 Not tainted 3.1.3 #45 Process qemu-system-s39 (pid: 1108, task: 00000004f8b3cb30, ksp: 00000004fd5978d0) 00000004fd5979a0 00000004fd597920 0000000000000002 0000000000000000 00000004fd5979c0 00000004fd597938 00000004fd597938 0000000000617e96 0000000000000000 00000004f8b3cf58 0000000000000000 0000000000000000 000000000000000d 000000000000000c 00000004fd597988 0000000000000000 0000000000000000 0000000000100a18 00000004fd597920 00000004fd597960 Call Trace: ([<0000000000100926>] show_trace+0xee/0x144) [<0000000000131f3a>] __might_sleep+0x12a/0x158 [<0000000000217fb4>] __alloc_pages_nodemask+0x224/0xadc [<0000000000123086>] gmap_alloc_table+0x46/0x114 [<000000000012395c>] gmap_map_segment+0x268/0x298 [<000003e00486b014>] kvm_arch_commit_memory_region+0x44/0x6c [kvm] [<000003e004866414>] __kvm_set_memory_region+0x3b0/0x4a4 [kvm] [<000003e004866554>] kvm_set_memory_region+0x4c/0x6c [kvm] [<000003e004867c7a>] kvm_vm_ioctl+0x14a/0x314 [kvm] [<0000000000292100>] do_vfs_ioctl+0x94/0x588 [<0000000000292688>] SyS_ioctl+0x94/0xac [<000000000061e124>] sysc_noemu+0x22/0x28 [<000003fffcd5e7ca>] 0x3fffcd5e7ca 3 locks held by qemu-system-s39/1108: #0: (&kvm->slots_lock){+.+.+.}, at: [<000003e004866542>] kvm_set_memory_region+0x3a/0x6c [kvm] #1: (&mm->mmap_sem){++++++}, at: [<0000000000123790>] gmap_map_segment+0x9c/0x298 #2: (&(&mm->page_table_lock)->rlock){+.+.+.}, at: [<00000000001237a8>] gmap_map_segment+0xb4/0x298 Fix this by freeing the lock on the alloc path. This is ok, since the gmap table is never freed until we call gmap_free, so the table we are walking cannot go. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-11-14[S390] pfault: ignore leftover completion interruptsHeiko Carstens
Ignore completion interrupts if the initial interrupt hasn't been received and the addressed task is not running. This case can only happen if leftover (pending) completion interrupt gets delivered which wasn't removed with the PFAULT CANCEL operation during cpu hotplug. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>