summaryrefslogtreecommitdiff
path: root/include/asm-generic
AgeCommit message (Collapse)Author
2013-10-02mm: Fix generic hugetlb pte check return type.David Miller
The include/asm-generic/hugetlb.h stubs that just vector huge_pte_*() calls to the pte_*() implementations won't work in certain situations. x86 and sparc, for example, return "unsigned long" from the bit checks, and just go "return pte_val(pte) & PTE_BIT_FOO;" But since huge_pte_*() returns 'int', if any high bits on 64-bit are relevant, they get chopped off. The net effect is that we can loop forever trying to COW a huge page, because the huge_pte_write() check signals false all the time. Reported-by: Gurudas Pai <gurudas.pai@oracle.com> Tested-by: Gurudas Pai <gurudas.pai@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: David Rientjes <rientjes@google.com>
2013-09-30include/asm-generic/vtime.h: avoid zero-length fileAndrew Morton
patch(1) can't handle zero-length files - it appears to simply not create the file, so my powerpc build fails. Put something in here to make life easier. Cc: Hugh Dickins <hughd@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-09Merge branch 'for-v3.12' of ↵Linus Torvalds
git://git.linaro.org/people/mszyprowski/linux-dma-mapping Pull DMA mapping update from Marek Szyprowski: "This contains an addition of Device Tree support for reserved memory regions (Contiguous Memory Allocator is one of the drivers for it) and changes required by the KVM extensions for PowerPC architectue" * 'for-v3.12' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: ARM: init: add support for reserved memory defined by device tree drivers: of: add initialization code for dma reserved memory drivers: of: add function to scan fdt nodes given by path drivers: dma-contiguous: clean source code and prepare for device tree
2013-09-04Merge branch 'timers-nohz-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timers/nohz changes from Ingo Molnar: "It mostly contains fixes and full dynticks off-case optimizations, by Frederic Weisbecker" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) nohz: Include local CPU in full dynticks global kick nohz: Optimize full dynticks's sched hooks with static keys nohz: Optimize full dynticks state checks with static keys nohz: Rename a few state variables vtime: Always debug check snapshot source _before_ updating it vtime: Always scale generic vtime accounting results vtime: Optimize full dynticks accounting off case with static keys vtime: Describe overriden functions in dedicated arch headers m68k: hardirq_count() only need preempt_mask.h hardirq: Split preempt count mask definitions context_tracking: Split low level state headers vtime: Fix racy cputime delta update vtime: Remove a few unneeded generic vtime state checks context_tracking: User/kernel broundary cross trace events context_tracking: Optimize context switch off case with static keys context_tracking: Optimize guest APIs off case with static key context_tracking: Optimize main APIs off case with static key context_tracking: Ground setup for static key use context_tracking: Remove full dynticks' hacky dependency on wide context tracking nohz: Only enable context tracking on full dynticks CPUs ...
2013-09-04Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "Main RCU changes this cycle were: - Full-system idle detection. This is for use by Frederic Weisbecker's adaptive-ticks mechanism. Its purpose is to allow the timekeeping CPU to shut off its tick when all other CPUs are idle. - Miscellaneous fixes. - Improved rcutorture test coverage. - Updated RCU documentation" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU nohz_full: Add full-system-idle state machine jiffies: Avoid undefined behavior from signed overflow rcu: Simplify _rcu_barrier() processing rcu: Make rcutorture emit online failures if verbose rcu: Remove unused variable from rcu_torture_writer() rcu: Sort rcutorture module parameters rcu: Increase rcutorture test coverage rcu: Add duplicate-callback tests to rcutorture doc: Fix memory-barrier control-dependency example rcu: Update RTFP documentation nohz_full: Add full-system-idle arguments to API nohz_full: Add full-system idle states and variables nohz_full: Add per-CPU idle-state tracking nohz_full: Add rcu_dyntick data for scalable detection of all-idle state nohz_full: Add Kconfig parameter for scalable detection of all-idle state nohz_full: Add testing information to documentation rcu: Eliminate unused APIs intended for adaptive ticks rcu: Select IRQ_WORK from TREE_PREEMPT_RCU rculist: list_first_or_null_rcu() should use list_entry_rcu() ...
2013-09-03Merge branch 'rcu/next' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: " * Update RCU documentation. These were posted to LKML at https://lkml.org/lkml/2013/8/19/611. * Miscellaneous fixes. These were posted to LKML at https://lkml.org/lkml/2013/8/19/619. * Full-system idle detection. This is for use by Frederic Weisbecker's adaptive-ticks mechanism. Its purpose is to allow the timekeeping CPU to shut off its tick when all other CPUs are idle. These were posted to LKML at https://lkml.org/lkml/2013/8/19/648. * Improve rcutorture test coverage. These were posted to LKML at https://lkml.org/lkml/2013/8/19/675. " Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-29s390/mm: implement software referenced bitsMartin Schwidefsky
The last remaining use for the storage key of the s390 architecture is reference counting. The alternative is to make page table entries invalid while they are old. On access the fault handler marks the pte/pmd as young which makes the pte/pmd valid if the access rights allow read access. The pte/pmd invalidations required for software managed reference bits cost a bit of performance, on the other hand the RRBE/RRBM instructions to read and reset the referenced bits are quite expensive as well. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-08-27drivers: dma-contiguous: clean source code and prepare for device treeMarek Szyprowski
This patch cleans the initialization of dma contiguous framework. The all-in-one dma_declare_contiguous() function is now separated into dma_contiguous_reserve_area() which only steals the the memory from memblock allocator and dma_contiguous_add_device() function, which assigns given device to the specified reserved memory area. This improves the flexibility in defining contiguous memory areas and assigning device to them, because now it is possible to assign more than one device to the given contiguous memory area. Such split in initialization procedure is also required for upcoming device tree support. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Tomasz Figa <t.figa@samsung.com>
2013-08-16Fix TLB gather virtual address range invalidation corner casesLinus Torvalds
Ben Tebulin reported: "Since v3.7.2 on two independent machines a very specific Git repository fails in 9/10 cases on git-fsck due to an SHA1/memory failures. This only occurs on a very specific repository and can be reproduced stably on two independent laptops. Git mailing list ran out of ideas and for me this looks like some very exotic kernel issue" and bisected the failure to the backport of commit 53a59fc67f97 ("mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT"). That commit itself is not actually buggy, but what it does is to make it much more likely to hit the partial TLB invalidation case, since it introduces a new case in tlb_next_batch() that previously only ever happened when running out of memory. The real bug is that the TLB gather virtual memory range setup is subtly buggered. It was introduced in commit 597e1c3580b7 ("mm/mmu_gather: enable tlb flush range in generic mmu_gather"), and the range handling was already fixed at least once in commit e6c495a96ce0 ("mm: fix the TLB range flushed when __tlb_remove_page() runs out of slots"), but that fix was not complete. The problem with the TLB gather virtual address range is that it isn't set up by the initial tlb_gather_mmu() initialization (which didn't get the TLB range information), but it is set up ad-hoc later by the functions that actually flush the TLB. And so any such case that forgot to update the TLB range entries would potentially miss TLB invalidates. Rather than try to figure out exactly which particular ad-hoc range setup was missing (I personally suspect it's the hugetlb case in zap_huge_pmd(), which didn't have the same logic as zap_pte_range() did), this patch just gets rid of the problem at the source: make the TLB range information available to tlb_gather_mmu(), and initialize it when initializing all the other tlb gather fields. This makes the patch larger, but conceptually much simpler. And the end result is much more understandable; even if you want to play games with partial ranges when invalidating the TLB contents in chunks, now the range information is always there, and anybody who doesn't want to bother with it won't introduce subtle bugs. Ben verified that this fixes his problem. Reported-bisected-and-tested-by: Ben Tebulin <tebulin@googlemail.com> Build-testing-by: Stephen Rothwell <sfr@canb.auug.org.au> Build-testing-by: Richard Weinberger <richard.weinberger@gmail.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-14vtime: Describe overriden functions in dedicated arch headersFrederic Weisbecker
If the arch overrides some generic vtime APIs, let it describe these on a dedicated and standalone header. This way it becomes convenient to include it in vtime generic headers without irrelevant stuff in such a low level header. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-08-13mm: save soft-dirty bits on file pagesCyrill Gorcunov
Andy reported that if file page get reclaimed we lose the soft-dirty bit if it was there, so save _PAGE_BIT_SOFT_DIRTY bit when page address get encoded into pte entry. Thus when #pf happens on such non-present pte we can restore it back. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-13mm: save soft-dirty bits on swapped pagesCyrill Gorcunov
Andy Lutomirski reported that if a page with _PAGE_SOFT_DIRTY bit set get swapped out, the bit is getting lost and no longer available when pte read back. To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is saved in pte entry for the page being swapped out. When such page is to be read back from a swap cache we check for bit presence and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY bit back. One of the problem was to find a place in pte entry where we can save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The _PAGE_PSE was chosen for that, it doesn't intersect with swap entry format stored in pte. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-26tracing: Add __tracepoint_string() to export string pointersSteven Rostedt (Red Hat)
There are several tracepoints (mostly in RCU), that reference a string pointer and uses the print format of "%s" to display the string that exists in the kernel, instead of copying the actual string to the ring buffer (saves time and ring buffer space). But this has an issue with userspace tools that read the binary buffers that has the address of the string but has no access to what the string itself is. The end result is just output that looks like: rcu_dyntick: ffffffff818adeaa 1 0 rcu_dyntick: ffffffff818adeb5 0 140000000000000 rcu_dyntick: ffffffff818adeb5 0 140000000000000 rcu_utilization: ffffffff8184333b rcu_utilization: ffffffff8184333b The above is pretty useless when read by the userspace tools. Ideally we would want something that looks like this: rcu_dyntick: Start 1 0 rcu_dyntick: End 0 140000000000000 rcu_dyntick: Start 140000000000000 0 rcu_callback: rcu_preempt rhp=0xffff880037aff710 func=put_cred_rcu 0/4 rcu_callback: rcu_preempt rhp=0xffff880078961980 func=file_free_rcu 0/5 rcu_dyntick: End 0 1 The trace_printk() which also only stores the address of the string format instead of recording the string into the buffer itself, exports the mapping of kernel addresses to format strings via the printk_format file in the debugfs tracing directory. The tracepoint strings can use this same method and output the format to the same file and the userspace tools will be able to decipher the address without any modification. The tracepoint strings need its own section to save the strings because the trace_printk section will cause the trace_printk() buffers to be allocated if anything exists within the section. trace_printk() is only used for debugging and should never exist in the kernel, we can not use the trace_printk sections. Add a new tracepoint_str section that will also be examined by the output of the printk_format file. Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-07Merge branch 'cpuinit-delete' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull first stage of __cpuinit removal from Paul Gortmaker: "The two commits here 1) dummy out all the __cpuinit macros so that we no longer generate such sections, and then 2) remove all the section processing that we used to do for those sections. This makes all the __cpuinit and friends no-ops, so that we can remove the use cases of it at our leisure. Expect stage 2, which does the tree wide removal sweep at the end of the merge window." * 'cpuinit-delete' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: modpost: remove all traces of cpuinit/cpuexit sections init.h: remove __cpuinit sections from the kernel
2013-07-04Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "This is the powerpc changes for the 3.11 merge window. In addition to the usual bug fixes and small updates, the main highlights are: - Support for transparent huge pages by Aneesh Kumar for 64-bit server processors. This allows the use of 16M pages as transparent huge pages on kernels compiled with a 64K base page size. - Base VFIO support for KVM on power by Alexey Kardashevskiy - Wiring up of our nvram to the pstore infrastructure, including putting compressed oopses in there by Aruna Balakrishnaiah - Move, rework and improve our "EEH" (basically PCI error handling and recovery) infrastructure. It is no longer specific to pseries but is now usable by the new "powernv" platform as well (no hypervisor) by Gavin Shan. - I fixed some bugs in our math-emu instruction decoding and made it usable to emulate some optional FP instructions on processors with hard FP that lack them (such as fsqrt on Freescale embedded processors). - Support for Power8 "Event Based Branch" facility by Michael Ellerman. This facility allows what is basically "userspace interrupts" for performance monitor events. - A bunch of Transactional Memory vs. Signals bug fixes and HW breakpoint/watchpoint fixes by Michael Neuling. And more ... I appologize in advance if I've failed to highlight something that somebody deemed worth it." * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits) pstore: Add hsize argument in write_buf call of pstore_ftrace_call powerpc/fsl: add MPIC timer wakeup support powerpc/mpic: create mpic subsystem object powerpc/mpic: add global timer support powerpc/mpic: add irq_set_wake support powerpc/85xx: enable coreint for all the 64bit boards powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig powerpc/mpic: Add get_version API both for internal and external use powerpc: Handle both new style and old style reserve maps powerpc/hw_brk: Fix off by one error when validating DAWR region end powerpc/pseries: Support compression of oops text via pstore powerpc/pseries: Re-organise the oops compression code pstore: Pass header size in the pstore write callback powerpc/powernv: Fix iommu initialization again powerpc/pseries: Inform the hypervisor we are using EBB regs powerpc/perf: Add power8 EBB support powerpc/perf: Core EBB support for 64-bit book3s powerpc/perf: Drop MMCRA from thread_struct powerpc/perf: Don't enable if we have zero events ...
2013-07-03rapidio: convert switch drivers to modulesAlexandre Bounine
Rework RapidIO switch drivers to add an option to build them as loadable kernel modules. This patch removes RapidIO-specific vmlinux section and converts switch drivers to be compatible with LDM driver registration method. To simplify registration of device-specific callback routines this patch introduces rio_switch_ops data structure. The sw_sysfs() callback is removed from the list of device-specific operations because under the new structure its functions can be handled by switch driver's probe() and remove() routines. If a specific switch device driver is not loaded the RapidIO subsystem core will use default standard-based operations to configure a switch. Because the current implementation of RapidIO enumeration/discovery method relies on availability of device-specific operations for error management, switch device drivers must be loaded before the RapidIO enumeration/discovery starts. This patch also moves several common routines from enumeration/discovery module into the RapidIO core code to make switch-specific operations accessible to all components of RapidIO subsystem. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@Prodrive.nl> Cc: Micha Nelissen <micha.nelissen@Prodrive.nl> Cc: Stef van Os <stef.van.os@Prodrive.nl> Cc: Jean Delvare <jdelvare@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03vmlinux.lds: add comments for global variables and clean up useless declarationsJiang Liu
The original goal of this patchset is to fix the bug reported by https://bugzilla.kernel.org/show_bug.cgi?id=53501 Now it has also been expanded to reduce common code used by memory initializion. Patch 1-7: 1) add comments for global variables exported by vmlinux.lds 2) normalize global variables exported by vmlinux.lds Patch 8: Introduce helper functions mem_init_print_info() and get_num_physpages() Patch 9: Avoid using global variable num_physpages at runtime Patch 10: Don't update num_physpages in memory_hotplug.c Patch 11-40: Modify arch mm initialization code to: 1) Simplify mem_init() by using mem_init_print_info() 2) Prepare for killing global variable num_physpages Patch 41: Kill the global variable num_physpages With all patches applied, mem_init(), free_initmem(), free_initrd_mem() could be as simple as below. This patch series has reduced about 1.2K lines of code in total. #ifndef CONFIG_DISCONTIGMEM void __init mem_init(void) { max_mapnr = max_low_pfn; free_all_bootmem(); high_memory = (void *) __va(max_low_pfn * PAGE_SIZE); mem_init_print_info(NULL); } #endif /* CONFIG_DISCONTIGMEM */ void free_initmem(void) { free_initmem_default(-1); } #ifdef CONFIG_BLK_DEV_INITRD void free_initrd_mem(unsigned long start, unsigned long end) { free_reserved_area(start, end, -1, "initrd"); } #endif Due to hardware resource limitations, I have only tested this on x86_64. And the messages reported on an x86_64 system are: Log message before applying patches: Memory: 7745676k/8910848k available (6934k kernel code, 836024k absent, 329148k reserved, 6343k data, 1012k init) Log message after applying patches: Memory: 7744624K/8074824K available (6969K kernel code, 1011K data, 2828K rodata, 1016K init, 9640K bss, 330200K reserved) Great thanks to Vineet Gupta for testing on ARC. This patch: Document global variables exported from vmlinux.lds. 1) Add comments about usage guidelines for global variables exported from vmlinux.lds.S. 2) Remove unused __initdata_begin[] and __initdata_end[]. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03mm: soft-dirty bits for user memory changes trackingPavel Emelyanov
The soft-dirty is a bit on a PTE which helps to track which pages a task writes to. In order to do this tracking one should 1. Clear soft-dirty bits from PTEs ("echo 4 > /proc/PID/clear_refs) 2. Wait some time. 3. Read soft-dirty bits (55'th in /proc/PID/pagemap2 entries) To do this tracking, the writable bit is cleared from PTEs when the soft-dirty bit is. Thus, after this, when the task tries to modify a page at some virtual address the #PF occurs and the kernel sets the soft-dirty bit on the respective PTE. Note, that although all the task's address space is marked as r/o after the soft-dirty bits clear, the #PF-s that occur after that are processed fast. This is so, since the pages are still mapped to physical memory, and thus all the kernel does is finds this fact out and puts back writable, dirty and soft-dirty bits on the PTE. Another thing to note, is that when mremap moves PTEs they are marked with soft-dirty as well, since from the user perspective mremap modifies the virtual memory at mremap's new address. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-02Merge branch 'sched-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull voluntary preemption fixes from Ingo Molnar: "This tree contains a speedup which is achieved through better might_sleep()/might_fault() preemption point annotations for uaccess functions, by Michael S Tsirkin: 1. The only reason uaccess routines might sleep is if they fault. Make this explicit for all architectures. 2. A voluntary preemption point in uaccess functions means compiler can't inline them efficiently, this breaks assumptions that they are very fast and small that e.g. net code seems to make. Remove this preemption point so behaviour matches with what callers assume. 3. Accesses (e.g through socket ops) to kernel memory with KERNEL_DS like net/sunrpc does will never sleep. Remove an unconditinal might_sleep() in the might_fault() inline in kernel.h (used when PROVE_LOCKING is not set). 4. Accesses with pagefault_disable() return EFAULT but won't cause caller to sleep. Check for that and thus avoid might_sleep() when PROVE_LOCKING is set. These changes offer a nice speedup for CONFIG_PREEMPT_VOLUNTARY=y kernels, here's a network bandwidth measurement between a virtual machine and the host: before: incoming: 7122.77 Mb/s outgoing: 8480.37 Mb/s after: incoming: 8619.24 Mb/s [ +21.0% ] outgoing: 9455.42 Mb/s [ +11.5% ] I kept these changes in a separate tree, separate from scheduler changes, because it's a mixed MM and scheduler topic" * 'sched-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: mm, sched: Allow uaccess in atomic with pagefault_disable() mm, sched: Drop voluntary schedule from might_fault() x86: uaccess s/might_sleep/might_fault/ tile: uaccess s/might_sleep/might_fault/ powerpc: uaccess s/might_sleep/might_fault/ mn10300: uaccess s/might_sleep/might_fault/ microblaze: uaccess s/might_sleep/might_fault/ m32r: uaccess s/might_sleep/might_fault/ frv: uaccess s/might_sleep/might_fault/ arm64: uaccess s/might_sleep/might_fault/ asm-generic: uaccess s/might_sleep/might_fault/
2013-07-02Merge branch 'core-mutexes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull WW mutex support from Ingo Molnar: "This tree adds support for wound/wait style locks, which the graphics guys would like to make use of in the TTM graphics subsystem. Wound/wait mutexes are used when other multiple lock acquisitions of a similar type can be done in an arbitrary order. The deadlock handling used here is called wait/wound in the RDBMS literature: The older tasks waits until it can acquire the contended lock. The younger tasks needs to back off and drop all the locks it is currently holding, ie the younger task is wounded. See this LWN.net description of W/W mutexes: https://lwn.net/Articles/548909/ The comments there outline specific usecases for this facility (which have already been implemented for the DRM tree). Also see Documentation/ww-mutex-design.txt for more details" * 'core-mutexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking-selftests: Handle unexpected failures more strictly mutex: Add more w/w tests to test EDEADLK path handling mutex: Add more tests to lib/locking-selftest.c mutex: Add w/w tests to lib/locking-selftest.c mutex: Add w/w mutex slowpath debugging mutex: Add support for wound/wait style locks arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not
2013-07-02Merge tag 'driver-core-3.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here's the big driver core merge for 3.11-rc1 Lots of little things, and larger firmware subsystem updates, all described in the shortlog. Nice thing here is that we finally get rid of CONFIG_HOTPLUG, after 10+ years, thanks to Stephen Rohtwell (it had been always on for a number of kernel releases, now it's just removed)" * tag 'driver-core-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (27 commits) driver core: device.h: fix doc compilation warnings firmware loader: fix another compile warning with PM_SLEEP unset build some drivers only when compile-testing firmware loader: fix compile warning with PM_SLEEP set kobject: sanitize argument for format string sysfs_notify is only possible on file attributes firmware loader: simplify holding module for request_firmware firmware loader: don't export cache_firmware and uncache_firmware drivers/base: Use attribute groups to create sysfs memory files firmware loader: fix compile warning firmware loader: fix build failure with !CONFIG_FW_LOADER_USER_HELPER Documentation: Updated broken link in HOWTO Finally eradicate CONFIG_HOTPLUG driver core: firmware loader: kill FW_ACTION_NOHOTPLUG requests before suspend driver core: firmware loader: don't cache FW_ACTION_NOHOTPLUG firmware Documentation: Tidy up some drivers/base/core.c kerneldoc content. platform_device: use a macro instead of platform_driver_register firmware: move EXPORT_SYMBOL annotations firmware: Avoid deadlock of usermodehelper lock at shutdown dell_rbu: Select CONFIG_FW_LOADER_USER_HELPER explicitly ...
2013-06-29consolidate io_remap_pfn_range definitionsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-26modpost: remove all traces of cpuinit/cpuexit sectionsPaul Gortmaker
Delete all audit rules that were checking how the .cpuXYZ related sections were inter-operating with other __init like sections, now that __cpuinit is gone. Update the linker script to not have any knowledge of .cpuinit sections. [lds.h update courtesy of Ralf Baechle <ralf@linux-mips.org>] Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-06-26arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or notMaarten Lankhorst
This will allow me to call functions that have multiple arguments if fastpath fails. This is required to support ticket mutexes, because they need to be able to pass an extra argument to the fail function. Originally I duplicated the functions, by adding __mutex_fastpath_lock_retval_arg. This ended up being just a duplication of the existing function, so a way to test if fastpath was called ended up being better. This also cleaned up the reservation mutex patch some by being able to call an atomic_set instead of atomic_xchg, and making it easier to detect if the wrong unlock function was previously used. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org Cc: robclark@gmail.com Cc: rostedt@goodmis.org Cc: daniel@ffwll.ch Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20130620113105.4001.83929.stgit@patser Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20mm/THP: add pmd args to pgtable deposit and withdraw APIsAneesh Kumar K.V
This will be later used by powerpc THP support. In powerpc we want to use pgtable for storing the hash index values. So instead of adding them to mm_context list, we would like to store them in the second half of pmd Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-17Merge 3.10-rc6 into driver-core-nextGreg Kroah-Hartman
We want these fixes here too. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-11Merge branch 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm bugfixes from Gleb Natapov: "There is one more fix for MIPS KVM ABI here, MIPS and PPC build breakage fixes and a couple of PPC bug fixes" * 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit() kvm/ppc/booke: Hold srcu lock when calling gfn functions kvm/ppc/booke64: Disable e6500 support kvm/ppc/booke64: Fix AltiVec interrupt numbers and build breakage mips/kvm: Use KVM_REG_MIPS and proper size indicators for *_ONE_REG kvm: Add definition of KVM_REG_MIPS KVM: add kvm_para_available to asm-generic/kvm_para.h
2013-06-06arch, mm: Remove tlb_fast_mode()Peter Zijlstra
Since the introduction of preemptible mmu_gather TLB fast mode has been broken. TLB fast mode relies on there being absolutely no concurrency; it frees pages first and invalidates TLBs later. However now we can get concurrency and stuff goes *bang*. This patch removes all tlb_fast_mode() code; it was found the better option vs trying to patch the hole by entangling tlb invalidation with the scheduler. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Reported-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-05KVM: add kvm_para_available to asm-generic/kvm_para.hJames Hogan
According to include/uapi/linux/kvm_para.h architectures should define kvm_para_available, so add an implementation to asm-generic/kvm_para.h which just returns false. This fixes intel8x0.c build failure on mips with KVM enabled. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-03Finally eradicate CONFIG_HOTPLUGStephen Rothwell
Ever since commit 45f035ab9b8f ("CONFIG_HOTPLUG should be always on"), it has been basically impossible to build a kernel with CONFIG_HOTPLUG turned off. Remove all the remaining references to it. Cc: Russell King <linux@arm.linux.org.uk> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Acked-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-28asm-generic: uaccess s/might_sleep/might_fault/Michael S. Tsirkin
The only reason uaccess routines might sleep is if they fault. Make this explicit. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1369577426-26721-1-git-send-email-mst@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-22kernel: Fix s390 absolute memory access for /dev/memMichael Holzheu
On s390 the prefix page and absolute zero pages are not correctly returned when reading /dev/mem. The reason is that the s390 asm/io.h file includes the asm-generic/io.h file which then defines xlate_dev_mem_ptr() and therefore overwrites the s390 specific version that does the correct swap operation for prefix and absolute zero pages. The problem is a regression that was introduced with git commit cd248341 (s390/pci: base support). To fix the problem add "#ifndef xlate_dev_mem_ptr" in asm-generic/io.h and "#define xlate_dev_mem_ptr" in asm/io.h. This ensures that the s390 version is used. For completeness also add the "#ifndef" construct for xlate_dev_kmem_ptr(). Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-05-05Merge branch 'timers-nohz-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull 'full dynticks' support from Ingo Molnar: "This tree from Frederic Weisbecker adds a new, (exciting! :-) core kernel feature to the timer and scheduler subsystems: 'full dynticks', or CONFIG_NO_HZ_FULL=y. This feature extends the nohz variable-size timer tick feature from idle to busy CPUs (running at most one task) as well, potentially reducing the number of timer interrupts significantly. This feature got motivated by real-time folks and the -rt tree, but the general utility and motivation of full-dynticks runs wider than that: - HPC workloads get faster: CPUs running a single task should be able to utilize a maximum amount of CPU power. A periodic timer tick at HZ=1000 can cause a constant overhead of up to 1.0%. This feature removes that overhead - and speeds up the system by 0.5%-1.0% on typical distro configs even on modern systems. - Real-time workload latency reduction: CPUs running critical tasks should experience as little jitter as possible. The last remaining source of kernel-related jitter was the periodic timer tick. - A single task executing on a CPU is a pretty common situation, especially with an increasing number of cores/CPUs, so this feature helps desktop and mobile workloads as well. The cost of the feature is mainly related to increased timer reprogramming overhead when a CPU switches its tick period, and thus slightly longer to-idle and from-idle latency. Configuration-wise a third mode of operation is added to the existing two NOHZ kconfig modes: - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named as a config option. This is the traditional Linux periodic tick design: there's a HZ tick going on all the time, regardless of whether a CPU is idle or not. - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the periodic tick when a CPU enters idle mode. - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the tick when a CPU is idle, also slows the tick down to 1 Hz (one timer interrupt per second) when only a single task is running on a CPU. The .config behavior is compatible: existing !CONFIG_NO_HZ and CONFIG_NO_HZ=y settings get translated to the new values, without the user having to configure anything. CONFIG_NO_HZ_FULL is turned off by default. This feature is based on a lot of infrastructure work that has been steadily going upstream in the last 2-3 cycles: related RCU support and non-periodic cputime support in particular is upstream already. This tree adds the final pieces and activates the feature. The pull request is marked RFC because: - it's marked 64-bit only at the moment - the 32-bit support patch is small but did not get ready in time. - it has a number of fresh commits that came in after the merge window. The overwhelming majority of commits are from before the merge window, but still some aspects of the tree are fresh and so I marked it RFC. - it's a pretty wide-reaching feature with lots of effects - and while the components have been in testing for some time, the full combination is still not very widely used. That it's default-off should reduce its regression abilities and obviously there are no known regressions with CONFIG_NO_HZ_FULL=y enabled either. - the feature is not completely idempotent: there is no 100% equivalent replacement for a periodic scheduler/timer tick. In particular there's ongoing work to map out and reduce its effects on scheduler load-balancing and statistics. This should not impact correctness though, there are no known regressions related to this feature at this point. - it's a pretty ambitious feature that with time will likely be enabled by most Linux distros, and we'd like you to make input on its design/implementation, if you dislike some aspect we missed. Without flaming us to crisp! :-) Future plans: - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off the periodic tick altogether when there's a single busy task on a CPU. We'd first like 1 Hz to be exposed more widely before we go for the 0 Hz target though. - once we reach 0 Hz we can remove the periodic tick assumption from nr_running>=2 as well, by essentially interrupting busy tasks only as frequently as the sched_latency constraints require us to do - once every 4-40 msecs, depending on nr_running. I am personally leaning towards biting the bullet and doing this in v3.10, like the -rt tree this effort has been going on for too long - but the final word is up to you as usual. More technical details can be found in Documentation/timers/NO_HZ.txt" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits) sched: Keep at least 1 tick per second for active dynticks tasks rcu: Fix full dynticks' dependency on wide RCU nocb mode nohz: Protect smp_processor_id() in tick_nohz_task_switch() nohz_full: Add documentation. cputime_nsecs: use math64.h for nsec resolution conversion helpers nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config nohz: Reduce overhead under high-freq idling patterns nohz: Remove full dynticks' superfluous dependency on RCU tree nohz: Fix unavailable tick_stop tracepoint in dynticks idle nohz: Add basic tracing nohz: Select wide RCU nocb for full dynticks nohz: Disable the tick when irq resume in full dynticks CPU nohz: Re-evaluate the tick for the new task after a context switch nohz: Prepare to stop the tick on irq exit nohz: Implement full dynticks kick nohz: Re-evaluate the tick from the scheduler IPI sched: New helper to prevent from stopping the tick in full dynticks sched: Kick full dynticks CPU that have more than one task enqueued. perf: New helper to prevent full dynticks CPUs from stopping tick perf: Kick full dynticks CPU if events rotation is needed ...
2013-05-05Merge tag 'modules-next-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull mudule updates from Rusty Russell: "We get rid of the general module prefix confusion with a binary config option, fix a remove/insert race which Never Happens, and (my favorite) handle the case when we have too many modules for a single commandline. Seriously, the kernel is full, please go away!" * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: modpost: fix unwanted VMLINUX_SYMBOL_STR expansion X.509: Support parse long form of length octets in Authority Key Identifier module: don't unlink the module until we've removed all exposure. kernel: kallsyms: memory override issue, need check destination buffer length MODSIGN: do not send garbage to stderr when enabling modules signature modpost: handle huge numbers of modules. modpost: add -T option to read module names from file/stdin. modpost: minor cleanup. genksyms: pass symbol-prefix instead of arch module: fix symbol versioning with symbol prefixes CONFIG_SYMBOL_PREFIX: cleanup.
2013-05-02Merge commit '8700c95adb03' into timers/nohzFrederic Weisbecker
The full dynticks tree needs the latest RCU and sched upstream updates in order to fix some dependencies. Merge a common upstream merge point that has these updates. Conflicts: include/linux/perf_event.h kernel/rcutree.h kernel/rcutree_plugin.h Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2013-05-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull compat cleanup from Al Viro: "Mostly about syscall wrappers this time; there will be another pile with patches in the same general area from various people, but I'd rather push those after both that and vfs.git pile are in." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: syscalls.h: slightly reduce the jungles of macros get rid of union semop in sys_semctl(2) arguments make do_mremap() static sparc: no need to sign-extend in sync_file_range() wrapper ppc compat wrappers for add_key(2) and request_key(2) are pointless x86: trim sys_ia32.h x86: sys32_kill and sys32_mprotect are pointless get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC merge compat sys_ipc instances consolidate compat lookup_dcookie() convert vmsplice to COMPAT_SYSCALL_DEFINE switch getrusage() to COMPAT_SYSCALL_DEFINE switch epoll_pwait to COMPAT_SYSCALL_DEFINE convert sendfile{,64} to COMPAT_SYSCALL_DEFINE switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect make HAVE_SYSCALL_WRAPPERS unconditional consolidate cond_syscall and SYSCALL_ALIAS declarations teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long get rid of duplicate logics in __SC_....[1-6] definitions
2013-04-29mm: allow arch code to control the user page table ceilingHugh Dickins
On architectures where a pgd entry may be shared between user and kernel (e.g. ARM+LPAE), freeing page tables needs a ceiling other than 0. This patch introduces a generic USER_PGTABLES_CEILING that arch code can override. It is the responsibility of the arch code setting the ceiling to ensure the complete freeing of the page tables (usually in pgd_free()). [catalin.marinas@arm.com: commit log; shift_arg_pages(), asm-generic/pgtables.h changes] Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: <stable@vger.kernel.org> [3.3+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/hugetlb: add more arch-defined huge_pte functionsGerald Schaefer
Commit abf09bed3cce ("s390/mm: implement software dirty bits") introduced another difference in the pte layout vs. the pmd layout on s390, thoroughly breaking the s390 support for hugetlbfs. This requires replacing some more pte_xxx functions in mm/hugetlbfs.c with a huge_pte_xxx version. This patch introduces those huge_pte_xxx functions and their generic implementation in asm-generic/hugetlb.h, which will now be included on all architectures supporting hugetlbfs apart from s390. This change will be a no-op for those architectures. [akpm@linux-foundation.org: fix warning] Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Hillf Danton <dhillf@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> [for !s390 parts] Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-26cputime_nsecs: use math64.h for nsec resolution conversion helpersKevin Hilman
For the nsec resolution conversions to be useable on non 64-bit architectures, the helpers in <linux/math64.h> need to be used so the right arch-specific 64-bit math helpers can be used (e.g. do_div()) Signed-off-by: Kevin Hilman <khilman@linaro.org> Cc: Christoph Lameter <cl@linux.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2013-04-12x86-32: Fix possible incomplete TLB invalidate with PAE pagetablesDave Hansen
This patch attempts to fix: https://bugzilla.kernel.org/show_bug.cgi?id=56461 The symptom is a crash and messages like this: chrome: Corrupted page table at address 34a03000 *pdpt = 0000000000000000 *pde = 0000000000000000 Bad pagetable: 000f [#1] PREEMPT SMP Ingo guesses this got introduced by commit 611ae8e3f520 ("x86/tlb: enable tlb flush range support for x86") since that code started to free unused pagetables. On x86-32 PAE kernels, that new code has the potential to free an entire PMD page and will clear one of the four page-directory-pointer-table (aka pgd_t entries). The hardware aggressively "caches" these top-level entries and invlpg does not actually affect the CPU's copy. If we clear one we *HAVE* to do a full TLB flush, otherwise we might continue using a freed pmd page. (note, we do this properly on the population side in pud_populate()). This patch tracks whenever we clear one of these entries in the 'struct mmu_gather', and ensures that we follow up with a full tlb flush. BTW, I disassembled and checked that: if (tlb->fullmm == 0) and if (!tlb->fullmm && !tlb->need_flush_all) generate essentially the same code, so there should be zero impact there to the !PAE case. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Artem S Tashkinov <t.artem@mailcity.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-15CONFIG_SYMBOL_PREFIX: cleanup.Rusty Russell
We have CONFIG_SYMBOL_PREFIX, which three archs define to the string "_". But Al Viro broke this in "consolidate cond_syscall and SYSCALL_ALIAS declarations" (in linux-next), and he's not the first to do so. Using CONFIG_SYMBOL_PREFIX is awkward, since we usually just want to prefix it so something. So various places define helpers which are defined to nothing if CONFIG_SYMBOL_PREFIX isn't set: 1) include/asm-generic/unistd.h defines __SYMBOL_PREFIX. 2) include/asm-generic/vmlinux.lds.h defines VMLINUX_SYMBOL(sym) 3) include/linux/export.h defines MODULE_SYMBOL_PREFIX. 4) include/linux/kernel.h defines SYMBOL_PREFIX (which differs from #7) 5) kernel/modsign_certificate.S defines ASM_SYMBOL(sym) 6) scripts/modpost.c defines MODULE_SYMBOL_PREFIX 7) scripts/Makefile.lib defines SYMBOL_PREFIX on the commandline if CONFIG_SYMBOL_PREFIX is set, so that we have a non-string version for pasting. (arch/h8300/include/asm/linkage.h defines SYMBOL_NAME(), too). Let's solve this properly: 1) No more generic prefix, just CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX. 2) Make linux/export.h usable from asm. 3) Define VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR(). 4) Make everyone use them. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: James Hogan <james.hogan@imgtec.com> Tested-by: James Hogan <james.hogan@imgtec.com> (metag)
2013-03-13asm-generic: move cmpxchg*_local defs to cmpxchg.hJonas Bonn
asm/cmpxchg.h can be included on its own and needs to be self-consistent. The definitions for the cmpxchg*_local macros, as such, need to be part of this file. This fixes a build issue on OpenRISC since the system.h smashing patch 96f951edb1f1bdbbc99b0cd458f9808bb83d58ae that introdued the direct inclusion asm/cmpxchg.h into linux/llist.h. CC: David Howells <dhowells@redhat.com> Signed-off-by: Jonas Bonn <jonas@southpole.se> Acked-by: Arnd Bergmann <arnd@arndb.de>
2013-03-03consolidate cond_syscall and SYSCALL_ALIAS declarationsAl Viro
take them to asm/linkage.h, with default in linux/linkage.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-03-03Merge tag 'metag-v3.9-rc1-v4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag Pull new ImgTec Meta architecture from James Hogan: "This adds core architecture support for Imagination's Meta processor cores, followed by some later miscellaneous arch/metag cleanups and fixes which I kept separate to ease review: - Support for basic Meta 1 (ATP) and Meta 2 (HTP) core architecture - A few fixes all over, particularly for symbol prefixes - A few privilege protection fixes - Several cleanups (setup.c includes, split out a lot of metag_ksyms.c) - Fix some missing exports - Convert hugetlb to use vm_unmapped_area() - Copy device tree to non-init memory - Provide dma_get_sgtable()" * tag 'metag-v3.9-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag: (61 commits) metag: Provide dma_get_sgtable() metag: prom.h: remove declaration of metag_dt_memblock_reserve() metag: copy devicetree to non-init memory metag: cleanup metag_ksyms.c includes metag: move mm/init.c exports out of metag_ksyms.c metag: move usercopy.c exports out of metag_ksyms.c metag: move setup.c exports out of metag_ksyms.c metag: move kick.c exports out of metag_ksyms.c metag: move traps.c exports out of metag_ksyms.c metag: move irq enable out of irqflags.h on SMP genksyms: fix metag symbol prefix on crc symbols metag: hugetlb: convert to vm_unmapped_area() metag: export clear_page and copy_page metag: export metag_code_cache_flush_all metag: protect more non-MMU memory regions metag: make TXPRIVEXT bits explicit metag: kernel/setup.c: sort includes perf: Enable building perf tools for Meta metag: add boot time LNKGET/LNKSET check metag: add __init to metag_cache_probe() ...
2013-03-02asm-generic/unistd.h: handle symbol prefixes in cond_syscallJames Hogan
Some architectures have symbol prefixes and set CONFIG_SYMBOL_PREFIX, but this wasn't taken into account by the generic cond_syscall. It's easy enough to fix in a generic fashion, so add the symbol prefix to symbol names in cond_syscall when CONFIG_SYMBOL_PREFIX is set. Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Mike Frysinger <vapier@gentoo.org>
2013-03-02asm-generic/io.h: check CONFIG_VIRT_TO_BUSJames Hogan
Make asm-generic/io.h check CONFIG_VIRT_TO_BUS before defining virt_to_bus() and bus_to_virt(), otherwise it's easy to accidentally have a silently failing incorrect direct mapped definition rather then no definition at all. Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Arnd Bergmann <arnd@arndb.de>
2013-03-02Merge tag 'arc-v3.9-rc1-late' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull new ARC architecture from Vineet Gupta: "Initial ARC Linux port with some fixes on top for 3.9-rc1: I would like to introduce the Linux port to ARC Processors (from Synopsys) for 3.9-rc1. The patch-set has been discussed on the public lists since Nov and has received a fair bit of review, specially from Arnd, tglx, Al and other subsystem maintainers for DeviceTree, kgdb... The arch bits are in arch/arc, some asm-generic changes (acked by Arnd), a minor change to PARISC (acked by Helge). The series is a touch bigger for a new port for 2 main reasons: 1. It enables a basic kernel in first sub-series and adds ptrace/kgdb/.. later 2. Some of the fallout of review (DeviceTree support, multi-platform- image support) were added on top of orig series, primarily to record the revision history. This updated pull request additionally contains - fixes due to our GNU tools catching up with the new syscall/ptrace ABI - some (minor) cross-arch Kconfig updates." * tag 'arc-v3.9-rc1-late' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (82 commits) ARC: split elf.h into uapi and export it for userspace ARC: Fixup the current ABI version ARC: gdbserver using regset interface possibly broken ARC: Kconfig cleanup tracking cross-arch Kconfig pruning in merge window ARC: make a copy of flat DT ARC: [plat-arcfpga] DT arc-uart bindings change: "baud" => "current-speed" ARC: Ensure CONFIG_VIRT_TO_BUS is not enabled ARC: Fix pt_orig_r8 access ARC: [3.9] Fallout of hlist iterator update ARC: 64bit RTSC timestamp hardware issue ARC: Don't fiddle with non-existent caches ARC: Add self to MAINTAINERS ARC: Provide a default serial.h for uart drivers needing BASE_BAUD ARC: [plat-arcfpga] defconfig for fully loaded ARC Linux ARC: [Review] Multi-platform image #8: platform registers SMP callbacks ARC: [Review] Multi-platform image #7: SMP common code to use callbacks ARC: [Review] Multi-platform image #6: cpu-to-dma-addr optional ARC: [Review] Multi-platform image #5: NR_IRQS defined by ARC core ARC: [Review] Multi-platform image #4: Isolate platform headers ARC: [Review] Multi-platform image #3: switch to board callback ...
2013-02-26Merge branch 'next' of git://git.monstr.eu/linux-2.6-microblazeLinus Torvalds
Pull microblaze update from Michal Simek: "Microblaze changes. After my discussion with Arnd I have also added there asm-generic io patch which is Acked by him and Geert." * 'next' of git://git.monstr.eu/linux-2.6-microblaze: asm-generic: io: Fix ioread16/32be and iowrite16/32be microblaze: Do not use module.h in files which are not modules microblaze: Fix coding style issues microblaze: Add missing return from debugfs_tlb microblaze: Makefile clean microblaze: Add .gitignore entries for auto-generated files microblaze: Fix strncpy_from_user macro
2013-02-26Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar. * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cputime: Use local_clock() for full dynticks cputime accounting cputime: Constify timeval_to_cputime(timeval) argument sched: Move RR_TIMESLICE from sysctl.h to rt.h sched: Fix /proc/sched_debug failure on very very large systems sched: Fix /proc/sched_stat failure on very very large systems sched/core: Remove the obsolete and unused nr_uninterruptible() function
2013-02-26Merge tag 'gpio-for-linus' of git://git.secretlab.ca/git/linuxLinus Torvalds
Pull GPIO changes from Grant Likely: "This branch contains the usual set of individual driver improvements and bug fixes, as well as updates to the core code. The more notable changes include: - Internally add new API for referencing GPIOs by gpio_desc instead of number. Eventually this will become a public API - ACPI GPIO binding support" * tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux: (33 commits) arm64: select ARCH_WANT_OPTIONAL_GPIOLIB gpio: em: Use irq_domain_add_simple() to fix runtime error gpio: using common order: let 'static const' instead of 'const static' gpio/vt8500: memory cleanup missing gpiolib: Fix locking on gpio debugfs files gpiolib: let gpio_chip reference its descriptors gpiolib: use descriptors internally gpiolib: use gpio_chips list in gpiochip_find_base gpiolib: use gpio_chips list in sysfs ops gpiolib: use gpio_chips list in gpiochip_find gpiolib: use gpio_chips list in gpiolib_sysfs_init gpiolib: link all gpio_chips using a list gpio/langwell: cleanup driver gpio/langwell: Add Cloverview ids to pci device table gpio/lynxpoint: add chipset gpio driver. gpiolib: add missing braces in gpio_direction_show gpiolib-acpi: Fix error checks in interrupt requesting gpio: mpc8xxx: don't set IRQ_TYPE_NONE when creating irq mapping gpiolib: remove gpiochip_reserve() arm: pxa: tosa: do not use gpiochip_reserve() ...