summaryrefslogtreecommitdiff
path: root/arch/tile/mm
AgeCommit message (Collapse)Author
2015-04-17tile: map data region shadow of kernel as R/WChris Metcalf
This is necessary for things like reading /proc/kcore, doing ftrace, etc. It happens by default when using huge pages to map the kernel data, but not when using small pages. Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
2015-04-17tile: support CONTEXT_TRACKING and thus NOHZ_FULLChris Metcalf
Add the TIF_NOHZ flag appropriately. Add call to user_exit() on entry to do_work_pending() and on entry to syscalls via do_syscall_trace_enter(), and also the top of do_syscall_trace_exit() just because it's done in x86. Add call to user_enter() at the bottom of do_work_pending() once we have no more work to do before returning to userspace. Wrap all the trap code in exception_enter() / exception_exit(). Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
2015-04-17tile/elf: reorganize notify_exec()Davidlohr Bueso
In the future mm->exe_file will be done without mmap_sem serialization, thus isolate and reorganize the tile elf code to make the transition easier. Good users will, make use of the more standard get_mm_exe_file(), requiring only holding the mmap_sem to read the value, and relying on reference counting to make sure that the exe file won't dissappear underneath us. The visible effects of this patch are: o We now take and drop the mmap_sem more often. Instead of just in arch_setup_additional_pages(), we also do it in: 1) get_mm_exe_file() 2) to get the mm->vm_file and notify the simulator. [Note that 1) will disappear once we change the locking rules for exe_file.] o We avoid getting a free page and doing d_path() while holding the mmap_sem. This requires reordering the checks. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
2015-02-13tile: use %*pb[l] to print bitmaps including cpumasks and nodemasksTejun Heo
printk and friends can now format bitmaps using '%*pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11mm/hugetlb: reduce arch dependent code around follow_huge_*Naoya Horiguchi
Currently we have many duplicates in definitions around follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this patch tries to remove the m. The basic idea is to put the default implementation for these functions in mm/hugetlb.c as weak symbols (regardless of CONFIG_ARCH_WANT_GENERAL_HUGETL B), and to implement arch-specific code only when the arch needs it. For follow_huge_addr(), only powerpc and ia64 have their own implementation, and in all other architectures this function just returns ERR_PTR(-EINVAL). So this patch sets returning ERR_PTR(-EINVAL) as default. As for follow_huge_(pmd|pud)(), if (pmd|pud)_huge() is implemented to always return 0 in your architecture (like in ia64 or sparc,) it's never called (the callsite is optimized away) no matter how implemented it is. So in such architectures, we don't need arch-specific implementation. In some architecture (like mips, s390 and tile,) their current arch-specific follow_huge_(pmd|pud)() are effectively identical with the common code, so this patch lets these architecture use the common code. One exception is metag, where pmd_huge() could return non-zero but it expects follow_huge_pmd() to always return NULL. This means that we need arch-specific implementation which returns NULL. This behavior looks strange to me (because non-zero pmd_huge() implies that the architecture supports PMD-based hugepage, so follow_huge_pmd() can/should return some relevant value,) but that's beyond this cleanup patch, so let's keep it. Justification of non-trivial changes: - in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE is true when follow_huge_pmd() can be called (note that pmd_huge() has the same check and always returns 0 for !MACHINE_HAS_HPAGE.) - in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common code. This patch forces these archs use PMD_MASK, but it's OK because they are identical in both archs. In s390, both of HPAGE_SHIFT and PMD_SHIFT are 20. In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and PMD_SHIFT is define as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but PTE_ORDER is always 0, so these are identical. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10tile: drop pte_file()-related helpersKirill A. Shutemov
We've replaced remap_file_pages(2) implementation with emulation. Nobody creates non-linear mapping anymore. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Chris Metcalf <cmetcalf@ezchip.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-29vm: add VM_FAULT_SIGSEGV handling supportLinus Torvalds
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a "you should SIGSEGV" error, because the SIGSEGV case was generally handled by the caller - usually the architecture fault handler. That results in lots of duplication - all the architecture fault handlers end up doing very similar "look up vma, check permissions, do retries etc" - but it generally works. However, there are cases where the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV. In particular, when accessing the stack guard page, libsigsegv expects a SIGSEGV. And it usually got one, because the stack growth is handled by that duplicated architecture fault handler. However, when the generic VM layer started propagating the error return from the stack expansion in commit fee7e49d4514 ("mm: propagate error from stack expansion even for guard page"), that now exposed the existing VM_FAULT_SIGBUS result to user space. And user space really expected SIGSEGV, not SIGBUS. To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those duplicate architecture fault handlers about it. They all already have the code to handle SIGSEGV, so it's about just tying that new return value to the existing code, but it's all a bit annoying. This is the mindless minimal patch to do this. A more extensive patch would be to try to gather up the mostly shared fault handling logic into one generic helper routine, and long-term we really should do that cleanup. Just from this patch, you can generally see that most architectures just copied (directly or indirectly) the old x86 way of doing things, but in the meantime that original x86 model has been improved to hold the VM semaphore for shorter times etc and to handle VM_FAULT_RETRY and other "newer" things, so it would be a good idea to bring all those improvements to the generic case and teach other architectures about them too. Reported-and-tested-by: Takashi Iwai <tiwai@suse.de> Tested-by: Jan Engelhardt <jengelh@inai.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots" Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-11tile: Use the more common pr_warn instead of pr_warningJoe Perches
And other message logging neatening. Other miscellanea: o coalesce formats o realign arguments o standardize a couple of macros o use __func__ instead of embedding the function name Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2014-10-15Merge branch 'for-3.18-consistent-ops' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu consistent-ops changes from Tejun Heo: "Way back, before the current percpu allocator was implemented, static and dynamic percpu memory areas were allocated and handled separately and had their own accessors. The distinction has been gone for many years now; however, the now duplicate two sets of accessors remained with the pointer based ones - this_cpu_*() - evolving various other operations over time. During the process, we also accumulated other inconsistent operations. This pull request contains Christoph's patches to clean up the duplicate accessor situation. __get_cpu_var() uses are replaced with with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr(). Unfortunately, the former sometimes is tricky thanks to C being a bit messy with the distinction between lvalues and pointers, which led to a rather ugly solution for cpumask_var_t involving the introduction of this_cpu_cpumask_var_ptr(). This converts most of the uses but not all. Christoph will follow up with the remaining conversions in this merge window and hopefully remove the obsolete accessors" * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits) irqchip: Properly fetch the per cpu offset percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write. percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t Revert "powerpc: Replace __get_cpu_var uses" percpu: Remove __this_cpu_ptr clocksource: Replace __this_cpu_ptr with raw_cpu_ptr sparc: Replace __get_cpu_var uses avr32: Replace __get_cpu_var with __this_cpu_write blackfin: Replace __get_cpu_var uses tile: Use this_cpu_ptr() for hardware counters tile: Replace __get_cpu_var uses powerpc: Replace __get_cpu_var uses alpha: Replace __get_cpu_var ia64: Replace __get_cpu_var uses s390: cio driver &__get_cpu_var replacements s390: Replace __get_cpu_var uses mips: Replace __get_cpu_var uses MIPS: Replace __get_cpu_var uses in FPU emulator. arm: Replace __this_cpu_ptr with raw_cpu_ptr ...
2014-10-02tile: Remove tile-specific _sinitdata and _einitdataGeert Uytterhoeven
Use standard __init_begin and __init_end instead. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2014-08-26tile: Replace __get_cpu_var usesChristoph Lameter
__get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : #define __get_cpu_var(var) (*this_cpu_ptr(&(var))) __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. At the end of the patch set all uses of __get_cpu_var have been removed so the macro is removed too. The patch set includes passes over all arches as well. Once these operations are used throughout then specialized macros can be defined in non -x86 arches as well in order to optimize per cpu access by f.e. using a global register that may be set to the per cpu base. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Acked-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-06-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tileLinus Torvalds
Pull arch/tile changes from Chris Metcalf: "These mostly just address smaller issues reported to me" * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: arch: tile: kernel: unaligned.c: Cleaning up uninitialized variables drivers/tty/hvc/hvc_tile.c: use PTR_ERR_OR_ZERO replace strict_strto* call with kstrto* tile: Update comments for generic idle conversion tile: cleanup the comment in init_pgprot tile: use BOOTMEM_DEFAULT instead of magic number 0 for reserve_bootmem flags
2014-06-04mm: page_alloc: convert hot/cold parameter and immediate callers to boolMel Gorman
cold is a bool, make it one. Make the likely case the "if" part of the block instead of the else as according to the optimisation manual this is preferred. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04hugetlb: restrict hugepage_migration_support() to x86_64Naoya Horiguchi
Currently hugepage migration is available for all archs which support pmd-level hugepage, but testing is done only for x86_64 and there're bugs for other archs. So to avoid breaking such archs, this patch limits the availability strictly to x86_64 until developers of other archs get interested in enabling this feature. Simply disabling hugepage migration on non-x86_64 archs is not enough to fix the reported problem where sys_move_pages() hits the BUG_ON() in follow_page(FOLL_GET), so let's fix this by checking if hugepage migration is supported in vma_migratable(). Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Hugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-28replace strict_strto* call with kstrto*Daniel Walter
remove obsolete calls to strict_strto* and replace them with kstrto* calls accordingly. Signed-off-by: Daniel Walter <dwalter@google.com> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2014-05-13tile: cleanup the comment in init_pgprotWang Sheng-Hui
In tile vmlinux, the rodata area start after the _sdata. The rodata area is included between [_sdata, __end_rodata), and is handled at an earlier point. The page walk starts at __end_rodata, not _sdata. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-11-15tile: handle pgtable_page_ctor() failKirill A. Shutemov
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-13tile: remove HUGE_VMAP dead codeChris Metcalf
A config option to allow a variant vmap() using huge pages that was never upstreamed had some bits of code related to it scattered around the tile architecture; the config option was removed downstream and this commit cleans up the scattered evidence of it from the upstream as well. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-09-13tile: use pmd_pfn() instead of casting via pte_tChris Metcalf
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-09-12arch: mm: pass userspace fault flag to generic fault handlerJohannes Weiner
Unlike global OOM handling, memory cgroup code will invoke the OOM killer in any OOM situation because it has no way of telling faults occuring in kernel context - which could be handled more gracefully - from user-triggered faults. Pass a flag that identifies faults originating in user space from the architecture-specific fault handlers to generic code so that memcg OOM handling can be improved. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: azurIt <azurit@pobox.sk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12arch: mm: remove obsolete init OOM protectionJohannes Weiner
The memcg code can trap tasks in the context of the failing allocation until an OOM situation is resolved. They can hold all kinds of locks (fs, mm) at this point, which makes it prone to deadlocking. This series converts memcg OOM handling into a two step process that is started in the charge context, but any waiting is done after the fault stack is fully unwound. Patches 1-4 prepare architecture handlers to support the new memcg requirements, but in doing so they also remove old cruft and unify out-of-memory behavior across architectures. Patch 5 disables the memcg OOM handling for syscalls, readahead, kernel faults, because they can gracefully unwind the stack with -ENOMEM. OOM handling is restricted to user triggered faults that have no other option. Patch 6 reworks memcg's hierarchical OOM locking to make it a little more obvious wth is going on in there: reduce locked regions, rename locking functions, reorder and document. Patch 7 implements the two-part OOM handling such that tasks are never trapped with the full charge stack in an OOM situation. This patch: Back before smart OOM killing, when faulting tasks were killed directly on allocation failures, the arch-specific fault handlers needed special protection for the init process. Now that all fault handlers call into the generic OOM killer (see commit 609838cfed97: "mm: invoke oom-killer from remaining unconverted page fault handlers"), which already provides init protection, the arch-specific leftovers can be removed. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: azurIt <azurit@pobox.sk> Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc bits] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11mm: migrate: check movability of hugepage in unmap_and_move_huge_page()Naoya Horiguchi
Currently hugepage migration works well only for pmd-based hugepages (mainly due to lack of testing,) so we had better not enable migration of other levels of hugepages until we are ready for it. Some users of hugepage migration (mbind, move_pages, and migrate_pages) do page table walk and check pud/pmd_huge() there, so they are safe. But the other users (softoffline and memory hotremove) don't do this, so without this patch they can try to migrate unexpected types of hugepages. To prevent this, we introduce hugepage_migration_support() as an architecture dependent check of whether hugepage are implemented on a pmd basis or not. And on some architecture multiple sizes of hugepages are available, so hugepage_migration_support() also checks hugepage size. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-03tile: make __write_once a synonym for __read_mostlyChris Metcalf
This was really only useful for TILE64 when we mapped the kernel data with small pages. Now we use a huge page and we really don't want to map different parts of the kernel data in different ways. We retain the __write_once name in case we want to bring it back to life at some point in the future. Note that this change uncovered a latent bug where the "smp_topology" variable happened to always be aligned mod 8 so we could store two "int" values at once, but when we eliminated __write_once it ended up only aligned mod 4. Fix with an explicit annotation. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-09-03tile: remove support for TILE64Chris Metcalf
This chip is no longer being actively developed for (it was superceded by the TILEPro64 in 2008), and in any case the existing compiler and toolchain in the community do not support it. It's unlikely that the kernel works with TILE64 at this point as the configuration has not been tested in years. The support is also awkward as it requires maintaining a significant number of ifdefs. So, just remove it altogether. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-09-03tile: add virt_to_kpte() API and clean up and document behaviorChris Metcalf
We use virt_to_pte(NULL, va) a lot, which isn't very obvious. I added virt_to_kpte(va) as a more obvious wrapper function, that also validates the va as being a kernel adddress. And, I fixed the semantics of virt_to_pte() so that we handle the pud and pmd the same way, and we now document the fact that we handle the final pte level differently. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-09-03tile: parameterize VA and PA space more cleanlyChris Metcalf
The existing code relied on the hardware definition (<arch/chip.h>) to specify how much VA and PA space was available. It's convenient to allow customizing this for some configurations, so provide symbols MAX_PA_WIDTH and MAX_VA_WIDTH in <asm/page.h> that can be modified if desired. Additionally, move away from the MEM_XX_INTRPT nomenclature to define the start of various regions within the VA space. In fact the cleaner symbol is, for example, MEM_SV_START, to indicate the start of the area used for supervisor code; the actual address of the interrupt vectors is not as important, and can be changed if desired. As part of this change, convert from "intrpt1" nomenclature (which built in the old privilege-level 1 model) to a simple "intrpt". Also strip out some tilepro-specific code supporting modifying the PL the kernel could run at, since we don't actually support using different PLs in tilepro, only tilegx. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-09-03tile: don't assume user privilege is zeroChris Metcalf
Technically, user privilege is anything less than kernel privilege. We modify the existing user_mode() macro to have this semantic (and use it in a couple of places it wasn't being used before), and add an IS_KERNEL_EX1() macro to the assembly code as well. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-30tile: handle super huge pages in virt_to_pteChris Metcalf
This tile-specific API had a minor bug, in that if a super huge (>4GB) page mapped a particular address range, we wouldn't handle it correctly. As part of fixing that bug, I also cleaned up some of the pud and pmd accessors to make them more consistent. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-30tile: remove set/clear_fixmap APIsChris Metcalf
Nothing in the codebase was using them, and as written they took "unsigned long" as the physical address rather than "phys_addr_t", which is wrong on tilepro anyway. Rather than fixing stale APIs, just remove them; if there's ever demand for them on this platform, we can put them back. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-30tile: support ASLR fullyTony Lu
With this change, tile Linux now supports address-space layout randomization for shared objects, stack, heap and vdso. Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Tony Lu <zlu@tilera.com> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-30tile: support kprobes on tilegxTony Lu
This change includes support for Kprobes, Jprobes and Return Probes. Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Tony Lu <zlu@tilera.com> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-13tile: provide traceability for hypervisor callsChris Metcalf
This change adds infrastructure (CONFIG_TILE_HVGLUE_TRACE) that provides C code wrappers for the calls the kernel makes to the Tilera hypervisor. This allows standard kernel infrastructure like FTRACE to be able to instrument hypervisor calls. To allow direct calls to the true API, we export their names with a leading underscore as well. This is important for the few contexts where we need to make hypervisor calls without touching the stack. As part of this change, we also switch from creating the symbols with linker magic to creating them with assembler magic. This lets us provide a symbol type and generally make them appear more as symbols and less as just random values in the Elf namespace. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-13tile: avoid struct vm_struct leakChris Metcalf
If ioreamp_prot() fails in ioremap_page_range() due to kernel memory exhaustion, we previously would leak a struct vm_struct. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-13tile: implement gettimeofday() via vDSOChris Metcalf
This change creates the framework for vDSO calls, makes the existing rt_sigreturn() mechanism use it, and adds a fast gettimeofday(). Now that we need to expose the vDSO address to userspace, we add AT_SYSINFO_EHDR to the set of aux entries provided to userspace. (You can disable any extra vDSO support by booting with vdso=0, but the rt_sigreturn vDSO page will still be provided.) Note that glibc has supported the tile vDSO since release 2.17. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-13tile: support simulator notification for ET_DYN objectsChris Metcalf
The tile code notifies the simulator of new ET_EXEC objects starting to execute so that tracing code can properly annotate the objects. However, we didn't support ET_DYN executables like ld.so, so we didn't properly load symbols, etc. This change enables that support; we use a variant of the SIM_CONTROL_DLOPEN simulator notification that newer simulators will recognize and use to set the base address for the next SIM_CONTROL_OS_EXEC notification. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-13tile: support CONFIG_PREEMPTChris Metcalf
This change adds support for CONFIG_PREEMPT (full kernel preemption). In addition to the core support, this change includes a number of places where we fix up uses of smp_processor_id() and per-cpu variables. I also eliminate the PAGE_HOME_HERE and PAGE_HOME_UNKNOWN values for page homing, as it turns out they weren't being used. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-13tile: remove calls to arch_flush_lazy_mmu_mode()Chris Metcalf
Since it's a no-op on tile anyway, there's no reason to be calling it in tile-specific code. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-13tile: fix some issues in hugepage supportChris Metcalf
First, in huge_pte_offset(), we were erroneously checking pgd_present(), which is always true, rather than pud_present(), which is the thing that tells us if there is a top-level (L0) PTE. Fixing this means we properly look up huge page entries only when the Present bit is actually set in the PTE. Second, use the standard pte_alloc_map() instead of the hand-rolled pte_alloc_hugetlb() routine that basically was written to avoid worrying about CONFIG_HIGHPTE. However, we no longer plan to support HIGHPTE, so a separate routine was just unnecessary code duplication. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-13tile: fast-path unaligned memory access for tilegxChris Metcalf
This change enables unaligned userspace memory access via a kernel fast path on tilegx. The kernel tracks user PC/instruction pairs per-thread using a direct-mapped cache in userspace. The cache maps those PC/instruction pairs to JIT'ed instruction sequences that load or store using byte-wide load store intructions and then synthesize 2-, 4- or 8-byte load or store results. Once an instruction has been seen to generate an unaligned access once, subsequent hits on that instruction typically require overhead of only around 50 cycles if cache and TLB is hot. We support the prctl() PR_GET_UNALIGN / PR_SET_UNALIGN sys call to enable or disable unaligned fixups on a per-process basis. To do this we pull some of the tilepro unaligned support out of the single_step.c file; tilepro uses instruction disassembly for both single-step and unaligned access support. Since tilegx actually has hardware singlestep support, though, it's cleaner to keep the tilegx unaligned access code in a separate file. While we're at it, properly rename the tilepro-specific types, etc., to have tilepro suffixes instead of generic tile suffixes. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-12tile: fix tilegx vmalloc_sync_all BUG_ONChris Metcalf
As specified, the test wasn't correct, and in any case it should be a BUILD_BUG_ON. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-07-10mm: remove free_area_cacheMichel Lespinasse
Since all architectures have been converted to use vm_unmapped_area(), there is no remaining use for the free_area_cache. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Helge Deller <deller@gmx.de> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09mm: invoke oom-killer from remaining unconverted page fault handlersJohannes Weiner
A few remaining architectures directly kill the page faulting task in an out of memory situation. This is usually not a good idea since that task might not even use a significant amount of memory and so may not be the optimal victim to resolve the situation. Since 2.6.29's 1c0fe6e ("mm: invoke oom-killer from page fault") there is a hook that architecture page fault handlers are supposed to call to invoke the OOM killer and let it pick the right task to kill. Convert the remaining architectures over to this hook. To have the previous behavior of simply taking out the faulting task the vm.oom_kill_allocating_task sysctl can be set to 1. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc bits] Cc: James Hogan <james.hogan@imgtec.com> Cc: David Howells <dhowells@redhat.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03mm/tile: prepare for removing num_physpages and simplify mem_init()Jiang Liu
Prepare for removing num_physpages and simplify mem_init(). Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03tile: normalize global variables exported by vmlinux.ldsJiang Liu
Normalize global variables exported by vmlinux.lds to conform usage guidelines from include/asm-generic/sections.h. 1) Use _text to mark the start of the kernel image including the head text, and _stext to mark the start of the .text section. 2) Export mandatory global variables __init_begin and __init_end. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03mm: concentrate modification of totalram_pages into the mm coreJiang Liu
Concentrate code to modify totalram_pages into the mm core, so the arch memory initialized code doesn't need to take care of it. With these changes applied, only following functions from mm core modify global variable totalram_pages: free_bootmem_late(), free_all_bootmem(), free_all_bootmem_node(), adjust_managed_page_count(). With this patch applied, it will be much more easier for us to keep totalram_pages and zone->managed_pages in consistence. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: David Howells <dhowells@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: <sworddragon2@aol.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michel Lespinasse <walken@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03mm/tile: use common help functions to free reserved pagesJiang Liu
Use common help functions to free reserved pages. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: <sworddragon2@aol.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michel Lespinasse <walken@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm, vmalloc: change iterating a vmlist to find_vm_area()Joonsoo Kim
This patchset removes vm_struct list management after initializing vmalloc. Adding and removing an entry to vmlist is linear time complexity, so it is inefficient. If we maintain this list, overall time complexity of adding and removing area to vmalloc space is O(N), although we use rbtree for finding vacant place and it's time complexity is just O(logN). And vmlist and vmlist_lock is used many places of outside of vmalloc.c. It is preferable that we hide this raw data structure and provide well-defined function for supporting them, because it makes that they cannot mistake when manipulating theses structure and it makes us easily maintain vmalloc layer. For kexec and makedumpfile, I export vmap_area_list, instead of vmlist. This comes from Atsushi's recommendation. For more information, please refer below link. https://lkml.org/lkml/2012/12/6/184 This patch: The purpose of iterating a vmlist is finding vm area with specific virtual address. find_vm_area() is provided for this purpose and more efficient, because it uses a rbtree. So change it. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Dave Anderson <anderson@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23swap: add per-partition lock for swapfileShaohua Li
swap_lock is heavily contended when I test swap to 3 fast SSD (even slightly slower than swap to 2 such SSD). The main contention comes from swap_info_get(). This patch tries to fix the gap with adding a new per-partition lock. Global data like nr_swapfiles, total_swap_pages, least_priority and swap_list are still protected by swap_lock. nr_swap_pages is an atomic now, it can be changed without swap_lock. In theory, it's possible get_swap_page() finds no swap pages but actually there are free swap pages. But sounds not a big problem. Accessing partition specific data (like scan_swap_map and so on) is only protected by swap_info_struct.lock. Changing swap_info_struct.flags need hold swap_lock and swap_info_struct.lock, because scan_scan_map() will check it. read the flags is ok with either the locks hold. If both swap_lock and swap_info_struct.lock must be hold, we always hold the former first to avoid deadlock. swap_entry_free() can change swap_list. To delete that code, we add a new highest_priority_index. Whenever get_swap_page() is called, we check it. If it's valid, we use it. It's a pity get_swap_page() still holds swap_lock(). But in practice, swap_lock() isn't heavily contended in my test with this patch (or I can say there are other much more heavier bottlenecks like TLB flush). And BTW, looks get_swap_page() doesn't really need the lock. We never free swap_info[] and we check SWAP_WRITEOK flag. The only risk without the lock is we could swapout to some low priority swap, but we can quickly recover after several rounds of swap, so sounds not a big deal to me. But I'd prefer to fix this if it's a real problem. "swap: make each swap partition have one address_space" improved the swapout speed from 1.7G/s to 2G/s. This patch further improves the speed to 2.3G/s, so around 15% improvement. It's a multi-process test, so TLB flush isn't the biggest bottleneck before the patches. [arnd@arndb.de: fix it for nommu] [hughd@google.com: add missing unlock] [minchan@kernel.org: get rid of lockdep whinge on sys_swapon] Signed-off-by: Shaohua Li <shli@fusionio.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23memory-hotplug: introduce new arch_remove_memory() for removing page tableWen Congyang
For removing memory, we need to remove page tables. But it depends on architecture. So the patch introduce arch_remove_memory() for removing page table. Now it only calls __remove_pages(). Note: __remove_pages() for some archtecuture is not implemented (I don't know how to implement it for s390). Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23mm: remove flags argument to mmap_regionMichel Lespinasse
After the MAP_POPULATE handling has been moved to mmap_region() call sites, the only remaining use of the flags argument is to pass the MAP_NORESERVE flag. This can be just as easily handled by do_mmap_pgoff(), so do that and remove the mmap_region() flags parameter. [akpm@linux-foundation.org: remove double parens] Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Andy Lutomirski <luto@amacapital.net> Cc: Greg Ungerer <gregungerer@westnet.com.au> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>