linux.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2012-11-01	sk-filter: Add ability to get socket filter program (v2)	Pavel Emelyanov
	The SO_ATTACH_FILTER option is set only. I propose to add the get ability by using SO_ATTACH_FILTER in getsockopt. To be less irritating to eyes the SO_GET_FILTER alias to it is declared. This ability is required by checkpoint-restore project to be able to save full state of a socket. There are two issues with getting filter back. First, kernel modifies the sock_filter->code on filter load, thus in order to return the filter element back to user we have to decode it into user-visible constants. Fortunately the modification in question is interconvertible. Second, the BPF_S_ALU_DIV_K code modifies the command argument k to speed up the run-time division by doing kernel_k = reciprocal(user_k). Bad news is that different user_k may result in same kernel_k, so we can't get the original user_k back. Good news is that we don't have to do it. What we need to is calculate a user2_k so, that reciprocal(user2_k) == reciprocal(user_k) == kernel_k i.e. if it's re-loaded back the compiled again value will be exactly the same as it was. That said, the user2_k can be calculated like this user2_k = reciprocal(kernel_k) with an exception, that if kernel_k == 0, then user2_k == 1. The optlen argument is treated like this -- when zero, kernel returns the amount of sock_fprog elements in filter, otherwise it should be large enough for the sock_fprog array. changes since v1: * Declared SO_GET_FILTER in all arch headers * Added decode of vlan-tag codes Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-28	sparc: Wire up sys_kcmp.	David S. Miller
	Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-28	sparc64: Improvde documentation and readability of atomic backoff code.	David S. Miller
	Document what's going on in asm/backoff.h with a large and descriptive comment. Refer to it above the cpu_relax() definition in asm/processor_64.h Rename the pause patching section to have "3insn" in it's name like the other patching sections do. Based upon feedback from Sam Ravnborg. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-27	sparc64: Use pause instruction when available.	David S. Miller
	In atomic backoff and cpu_relax(), use the pause instruction found on SPARC-T4 and later. It makes the cpu strand unselectable for the given number of cycles, unless an intervening disrupting trap occurs. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-27	sparc64: Fix cpu strand yielding.	David S. Miller
	For atomic backoff, we just loop over an exponentially backed off counter. This is extremely ineffective as it doesn't actually yield the cpu strand so that other competing strands can use the cpu core. In cpus previous to SPARC-T4 we have to do this in a slightly hackish way, by doing an operation with no side effects that also happens to mark the strand as unavailable. The mechanism we choose for this is three reads of the %ccr (condition-code) register into %g0 (the zero register). SPARC-T4 has an explicit "pause" instruction, and we'll make use of that in a subsequent commit. Yield strands also in cpu_relax(). We really should have done this a very long time ago. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-26	sparc64: Make montmul/montsqr/mpmul usable in 32-bit threads.	David S. Miller
	The Montgomery Multiply, Montgomery Square, and Multiple-Precision Multiply instructions work by loading a combination of the floating point and multiple register windows worth of integer registers with the inputs. These values are 64-bit. But for 32-bit userland processes we only save the low 32-bits of each integer register during a register spill. This is because the register window save area is in the user stack and has a fixed layout. Therefore, the only way to use these instruction in 32-bit mode is to perform the following sequence: 1) Load the top-32bits of a choosen integer register with a sentinel, say "-1". This will be in the outer-most register window. The idea is that we're trying to see if the outer-most register window gets spilled, and thus the 64-bit values were truncated. 2) Load all the inputs for the montmul/montsqr/mpmul instruction, down to the inner-most register window. 3) Execute the opcode. 4) Traverse back up to the outer-most register window. 5) Check the sentinel, if it's still "-1" store the results. Otherwise retry the entire sequence. This retry is extremely troublesome. If you're just unlucky and an interrupt or other trap happens, it'll push that outer-most window to the stack and clear the sentinel when we restore it. We could retry forever and never make forward progress if interrupts arrive at a fast enough rate (consider perf events as one example). So we have do limited retries and fallback to software which is extremely non-deterministic. Luckily it's very straightforward to provide a mechanism to let 32-bit applications use a 64-bit stack. Stacks in 64-bit mode are biased by 2047 bytes, which means that the lowest bit is set in the actual %sp register value. So if we see bit zero set in a 32-bit application's stack we treat it like a 64-bit stack. Runtime detection of such a facility is tricky, and cumbersome at best. For example, just trying to use a biased stack and seeing if it works is hard to recover from (the signal handler will need to use an alt stack, plus something along the lines of longjmp). Therefore, we add a system call to report a bitmask of arch specific features like this in a cheap and less hairy way. With help from Andy Polyakov. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-25	tty, ioctls -- Add new ioctl definitions for tty flags fetching	Cyrill Gorcunov
	This patch defines new ioctl codes TIOCGPKT, TIOCGPTLCK, TIOCGEXCL for fetching pty's packet mode and locking state, and exclusive mode of tty. [ No real handlers for the codes though, this will be addressed in another patch for easier review and bisectability ] Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> CC: Alan Cox <alan@lxorguk.ukuu.org.uk> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Pavel Emelyanov <xemul@parallels.com> CC: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-17	Merge tag 'uapi-fixes-20121017' of ↵	Linus Torvalds
	git://git.infradead.org/users/dhowells/linux-headers Pull misc UAPI fixes from David Howells: "They do a number of things: (1) Import a patch from Catalin Marinas to extend the generic-y in Kbuild facility to uapi directories. (2) Make arch/tile's ucontext.h file use (1) and remove the header-y line from the kernel internal side of things. (3) Remove some now-empty conditional bits from include/linux/Kbuild. The contents got moved to the UAPI side of things along with new conditionals. (4) Deal with now-empty files: (a) Empty Kbuild files under include/ get removed. (b) Empty Kbuild files under arch/ get comments to hold them as they are likely to end up with generic-y or genhdr-y lines. Deleting them appears to work if we want to go that route. (c) Put a comment into uapi/asm-generic/kvm_para.h to prevent the patch program from deleting that, and made the arches with empty kvm_para.h uapi files use that instead of having their own files. (d) Put comments into four other empty uapi/ headers to prevent the patch program from deleting them. A question: Is this the right way to deal with the now-empty Kbuild files? The ones under include/ are unlikely to be used - even for generated files, I think - so getting rid of them is probably okay. Once all the bits are in, we can probably remove all the Kbuild files under include/ that aren't also under include/uapi/. The ones under arch/ are more of an issue because of the potential for generic-y and genhdr-y." * tag 'uapi-fixes-20121017' of git://git.infradead.org/users/dhowells/linux-headers: UAPI: Make arch/sparc/include/uapi/asm/sigcontext.h non-empty UAPI: Make arch/sh/include/uapi/asm/hw_breakpoint.h non-empty UAPI: Make arch/mn10300/include/uapi/asm/setup.h non-empty UAPI: Put a comment into uapi/asm-generic/kvm_para.h and use it from arches UAPI: The tile arch uses the generic ucontext.h file UAPI: Place comments in empty arch Kbuilds to make them non-empty UAPI: Remove empty non-UAPI Kbuild files UAPI: Remove empty conditionals from include/linux/Kbuild UAPI: Make uapi/linux/irqnr.h non-empty uapi: Allow automatic generation of uapi/asm/ header files
2012-10-17	UAPI: Make arch/sparc/include/uapi/asm/sigcontext.h non-empty	David Howells
	arch/sparc/include/uapi/asm/sigcontext.h was emitted by the UAPI disintegration script as an empty file because the parent file had no UAPI stuff in it, despite being marked with "header-y". Unfortunately, the patch program deletes resultant empty files when applying a kernel patch. So just stick a comment in there as a placeholder. Signed-off-by: David Howells <dhowells@redhat.com> cc: David S. Miller <davem@davemloft.net> cc: sparclinux@vger.kernel.org
2012-10-16	sparc32: switch to generic sys_execve()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-16	sparc32: switch to generic kernel_execve()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-16	sparc32: switch to generic kernel_thread()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-16	sparc32: bury the remnants of LOWSYS tricks	Al Viro
	Time to end that depravity, let's bury the body. It's been 15 years, for crying out loud... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-16	sparc64: convert to generic execve	Al Viro
	We still have wrappers, but nowhere near as scary as they used to be. I'm not sure how necessary that flushw is now, TBH... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-16	sparc64: Fix bit twiddling in sparc_pmu_enable_event().	David S. Miller
	There was a serious disconnect in the logic happening in sparc_pmu_disable_event() vs. sparc_pmu_enable_event(). Event disable is implemented by programming a NOP event into the PCR. However, event enable was not reversing this operation. Instead, it was setting the User/Priv/Hypervisor trace enable bits. That's not sparc_pmu_enable_event()'s job, that's what sparc_pmu_enable() and sparc_pmu_disable() do . The intent of sparc_pmu_enable_event() is clear, since it first clear out the event type encoding field. So fix this by OR'ing in the event encoding rather than the trace enable bits. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-16	sparc64: Add global PMU register dumping via sysrq.	David S. Miller
	Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-14	sparc64: Like x86 we should check current->mm during perf backtrace generation.	David S. Miller
	If the MM is not active, only report the top-level PC. Do not try to access the address space. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-14	sparc64: switch to generic kernel_execve()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-14	sparc64: take fprs_write() and friends to start_thread()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-14	sparc64: switch to generic kernel_thread()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-14	sparc64: clear syscall_noerror on the entry to syscall, not on the exit	Al Viro
	Move that sucker to just before TI_FPDEPTH and replace stb with sth in etrap_save(). Take current_ds to its old place, so that we don't push wsaved into TI_... flags. That allows to lose clearing syscall_noerror on return from syscall. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-14	Merge branch 'modules-next' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module signing support from Rusty Russell: "module signing is the highlight, but it's an all-over David Howells frenzy..." Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG. * 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits) X.509: Fix indefinite length element skip error handling X.509: Convert some printk calls to pr_devel asymmetric keys: fix printk format warning MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking MODSIGN: Make mrproper should remove generated files. MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs MODSIGN: Use the same digest for the autogen key sig as for the module sig MODSIGN: Sign modules during the build process MODSIGN: Provide a script for generating a key ID from an X.509 cert MODSIGN: Implement module signature checking MODSIGN: Provide module signing public keys to the kernel MODSIGN: Automatically generate module signing keys if missing MODSIGN: Provide Kconfig options MODSIGN: Provide gitignore and make clean rules for extra files MODSIGN: Add FIPS policy module: signature checking hook X.509: Add a crypto key parser for binary (DER) X.509 certificates MPILIB: Provide a function to read raw data into an MPI X.509: Add an ASN.1 decoder X.509: Add simple ASN.1 grammar compiler ...
2012-10-13	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc	Linus Torvalds
	Pull Sparc updates from David Miller: 1) Updated syscall tracing fix from Al Viro. 2) SUN4V error reporting was deficient in several areas. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: fix ptrace interaction with force_successful_syscall_return() sparc64: Fix deficiencies in sun4v error reporting.
2012-10-12	vfs: define struct filename and have getname() return it	Jeff Layton
	getname() is intended to copy pathname strings from userspace into a kernel buffer. The result is just a string in kernel space. It would however be quite helpful to be able to attach some ancillary info to the string. For instance, we could attach some audit-related info to reduce the amount of audit-related processing needed. When auditing is enabled, we could also call getname() on the string more than once and not need to recopy it from userspace. This patchset converts the getname()/putname() interfaces to return a struct instead of a string. For now, the struct just tracks the string in kernel space and the original userland pointer for it. Later, we'll add other information to the struct as it becomes convenient. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-12	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull pile 2 of execve and kernel_thread unification work from Al Viro: "Stuff in there: kernel_thread/kernel_execve/sys_execve conversions for several more architectures plus assorted signal fixes and cleanups. There'll be more (in particular, real fixes for the alpha do_notify_resume() irq mess)..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (43 commits) alpha: don't open-code trace_report_syscall_{enter,exit} Uninclude linux/freezer.h m32r: trim masks avr32: trim masks tile: don't bother with SIGTRAP in setup_frame microblaze: don't bother with SIGTRAP in setup_rt_frame() mn10300: don't bother with SIGTRAP in setup_frame() frv: no need to raise SIGTRAP in setup_frame() x86: get rid of duplicate code in case of CONFIG_VM86 unicore32: remove pointless test h8300: trim _TIF_WORK_MASK parisc: decide whether to go to slow path (tracesys) based on thread flags parisc: don't bother looping in do_signal() parisc: fix double restarts bury the rest of TIF_IRET sanitize tsk_is_polling() bury _TIF_RESTORE_SIGMASK unicore32: unobfuscate _TIF_WORK_MASK mips: NOTIFY_RESUME is not needed in TIF masks mips: merge the identical "return from syscall" per-ABI code ... Conflicts: arch/arm/include/asm/thread_info.h
2012-10-10	sparc64: fix ptrace interaction with force_successful_syscall_return()	Al Viro
	we want syscall_trace_leave() called on exit from any syscall; skipping its call in case we'd done force_successful_syscall_return() is broken... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10	sparc64: Fix deficiencies in sun4v error reporting.	David S. Miller
	Missing error types, attributes, and report fields. Pad out to 64-bytes. Make string reporting cleaner and easier to extend in the future using "const char *" arrays that index by either bit position, or absolute field value. Report the raw 64-byte error report as a sequence of u64s before the annotated version. Only report fields which are valid, given the context and the attribute bits which are set. For shutdown requests, use the local copy of the error report not the one we just freed up back to the queue. Also, use orderly_poweroff() just like the Domain Services shutdown request code does. If the real-address reported is "-1" (unknown) try to disassemble the instruction to report the effective address of the access. Only do this in privileged mode. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull generic execve() changes from Al Viro: "This introduces the generic kernel_thread() and kernel_execve() functions, and switches x86, arm, alpha, um and s390 over to them." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (26 commits) s390: convert to generic kernel_execve() s390: switch to generic kernel_thread() s390: fold kernel_thread_helper() into ret_from_fork() s390: fold execve_tail() into start_thread(), convert to generic sys_execve() um: switch to generic kernel_thread() x86, um/x86: switch to generic sys_execve and kernel_execve x86: split ret_from_fork alpha: introduce ret_from_kernel_execve(), switch to generic kernel_execve() alpha: switch to generic kernel_thread() alpha: switch to generic sys_execve() arm: get rid of execve wrapper, switch to generic execve() implementation arm: optimized current_pt_regs() arm: introduce ret_from_kernel_execve(), switch to generic kernel_execve() arm: split ret_from_fork, simplify kernel_thread() [based on patch by rmk] generic sys_execve() generic kernel_execve() new helper: current_pt_regs() preparation for generic kernel_thread() um: kill thread->forking um: let signal_delivered() do SIGTRAP on singlestepping into handler ...
2012-10-09	UAPI: (Scripted) Disintegrate arch/sparc/include/asm	David Howells
	Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
2012-10-09	Merge branch 'akpm' (Andrew's patch-bomb)	Linus Torvalds
	Merge patches from Andrew Morton: "A few misc things and very nearly all of the MM tree. A tremendous amount of stuff (again), including a significant rbtree library rework." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (160 commits) sparc64: Support transparent huge pages. mm: thp: Use more portable PMD clearing sequenece in zap_huge_pmd(). mm: Add and use update_mmu_cache_pmd() in transparent huge page code. sparc64: Document PGD and PMD layout. sparc64: Eliminate PTE table memory wastage. sparc64: Halve the size of PTE tables sparc64: Only support 4MB huge pages and 8KB base pages. memory-hotplug: suppress "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>" warning mm: memcg: clean up mm_match_cgroup() signature mm: document PageHuge somewhat mm: use %pK for /proc/vmallocinfo mm, thp: fix mlock statistics mm, thp: fix mapped pages avoiding unevictable list on mlock memory-hotplug: update memory block's state and notify userspace memory-hotplug: preparation to notify memory block's state at memory hot remove mm: avoid section mismatch warning for memblock_type_name make GFP_NOTRACK definition unconditional cma: decrease cc.nr_migratepages after reclaiming pagelist CMA: migrate mlocked pages kpageflags: fix wrong KPF_THP on non-huge compound pages ...
2012-10-09	sparc64: Support transparent huge pages.	David Miller
	This is relatively easy since PMD's now cover exactly 4MB of memory. Our PMD entries are 32-bits each, so we use a special encoding. The lowest bit, PMD_ISHUGE, determines the interpretation. This is possible because sparc64's page tables are purely software entities so we can use whatever encoding scheme we want. We just have to make the TLB miss assembler page table walkers aware of the layout. set_pmd_at() works much like set_pte_at() but it has to operate in two page from a table of non-huge PTEs, so we have to queue up TLB flushes based upon what mappings are valid in the PTE table. In the second regime we are going from huge-page to non-huge-page, and in that case we need only queue up a single TLB flush to push out the huge page mapping. We still have 5 bits remaining in the huge PMD encoding so we can very likely support any new pieces of THP state tracking that might get added in the future. With lots of help from Johannes Weiner. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09	sparc64: Document PGD and PMD layout.	David Miller
	We're going to be messing around with the PMD interpretation and layout for the sake of transparent huge pages, so we better clearly document what we're starting with. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09	sparc64: Eliminate PTE table memory wastage.	David Miller
	We've split up the PTE tables so that they take up half a page instead of a full page. This is in order to facilitate transparent huge page support, which works much better if our PMDs cover 4MB instead of 8MB. What we do is have a one-behind cache for PTE table allocations in the mm struct. This logic triggers only on allocations. For example, we don't try to keep track of free'd up page table blocks in the style that the s390 port does. There were only two slightly annoying aspects to this change: 1) Changing pgtable_t to be a "pte_t *". There's all of this special logic in the TLB free paths that needed adjustments, as did the PMD populate interfaces. 2) init_new_context() needs to zap the pointer, since the mm struct just gets copied from the parent on fork. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09	sparc64: Halve the size of PTE tables	David Miller
	The reason we want to do this is to facilitate transparent huge page support. Right now PMD's cover 8MB of address space, and our huge page size is 4MB. The current transparent hugepage support is not able to handle HPAGE_SIZE != PMD_SIZE. So make PTE tables be sized to half of a page instead of a full page. We can still map properly the whole supported virtual address range which on sparc64 requires 44 bits. Add a compile time CPP test which ensures that this requirement is always met. There is a minor inefficiency added by this change. We only use half of the page for PTE tables. It's not trivial to use only half of the page yet still get all of the pgtable_page_{ctor,dtor}() stuff working properly. It is doable, and that will come in a subsequent change. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09	sparc64: Only support 4MB huge pages and 8KB base pages.	David Miller
	Narrowing the scope of the page size configurations will make the transparent hugepage changes much simpler. In the end what we really want to do is have the kernel support multiple huge page sizes and use whatever is appropriate as the context dictactes. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09	readahead: fault retry breaks mmap file read random detection	Shaohua Li
	.fault now can retry. The retry can break state machine of .fault. In filemap_fault, if page is miss, ra->mmap_miss is increased. In the second try, since the page is in page cache now, ra->mmap_miss is decreased. And these are done in one fault, so we can't detect random mmap file access. Add a new flag to indicate .fault is tried once. In the second try, skip ra->mmap_miss decreasing. The filemap_fault state machine is ok with it. I only tested x86, didn't test other archs, but looks the change for other archs is obvious, but who knows :) Signed-off-by: Shaohua Li <shaohua.li@fusionio.com> Cc: Rik van Riel <riel@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09	mm: hugetlb: add arch hook for clearing page flags before entering pool	Will Deacon
	The core page allocator ensures that page flags are zeroed when freeing pages via free_pages_check. A number of architectures (ARM, PPC, MIPS) rely on this property to treat new pages as dirty with respect to the data cache and perform the appropriate flushing before mapping the pages into userspace. This can lead to cache synchronisation problems when using hugepages, since the allocator keeps its own pool of pages above the usual page allocator and does not reset the page flags when freeing a page into the pool. This patch adds a new architecture hook, arch_clear_hugepage_flags, so that architectures which rely on the page flags being in a particular state for fresh allocations can adjust the flags accordingly when a page is freed into the pool. Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Michal Hocko <mhocko@suse.cz> Reviewed-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09	mm: kill vma flag VM_RESERVED and mm->reserved_vm counter	Konstantin Khlebnikov
	A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA, currently it lost original meaning but still has some effects: \| effect \| alternative flags -+------------------------+--------------------------------------------- 1\| account as reserved_vm \| VM_IO 2\| skip in core dump \| VM_IO, VM_DONTDUMP 3\| do not merge or expand \| VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP 4\| do not mlock \| VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP This patch removes reserved_vm counter from mm_struct. Seems like nobody cares about it, it does not exported into userspace directly, it only reduces total_vm showed in proc. Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND \| VM_DONTDUMP. remap_pfn_range() and io_remap_pfn_range() set VM_IO\|VM_DONTEXPAND\|VM_DONTDUMP. remap_vmalloc_range() set VM_DONTEXPAND \| VM_DONTDUMP. [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09	Kconfig: clean up the "#if defined(arch)" list for exception-trace sysctl entry	Catalin Marinas
	Introduce SYSCTL_EXCEPTION_TRACE config option and selec it in the architectures requiring support for the "exception-trace" debug_table entry in kernel/sysctl.c. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09	Kconfig: clean up the long arch list for the DEBUG_BUGVERBOSE config option	Catalin Marinas
	Introduce HAVE_DEBUG_BUGVERBOSE config option and select it in corresponding architecture Kconfig files. Architectures that already select GENERIC_BUG don't need to select HAVE_DEBUG_BUGVERBOSE. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: David Howells <dhowells@redhat.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09	Kconfig: clean up the long arch list for the DEBUG_KMEMLEAK config option	Catalin Marinas
	Introduce HAVE_DEBUG_KMEMLEAK config option and select it in corresponding architecture Kconfig files. DEBUG_KMEMLEAK now only depends on HAVE_DEBUG_KMEMLEAK. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09	Kconfig: clean up the long arch list for the UID16 config option	Catalin Marinas
	Introduce HAVE_UID16 config option and select it in corresponding architecture Kconfig files. UID16 now only depends on HAVE_UID16. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09	Merge tag 'asm-generic' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "This has three changes for asm-generic that did not really fit into any other branch as normal asm-generic changes do. One is a fix for a build warning, the other two are more interesting: * A patch from Mark Brown to allow using the common clock infrastructure on all architectures, so we can use the clock API in architecture independent device drivers. * The UAPI split patches from David Howells for the asm-generic files. There are other architecture specific series that are going through the arch maintainer tree and that depend on this one. There may be a few small merge conflicts between Mark's patch and the following arch header file split patches. In each case the solution will be to keep the new "generic-y += clkdev.h" line, even if it ends up being the only line in the Kbuild file." * tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: UAPI: (Scripted) Disintegrate include/asm-generic asm-generic: Add default clkdev.h asm-generic: xor: mark static functions as __maybe_unused
2012-10-09	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc	Linus Torvalds
	Pull sparc changes from David S Miller: "There is an attempt to fix a bad interaction between syscall tracing and force_successful_syscall() from Al Viro, but it needs to be redone as it introduced regressions and thus had to be reverted for now. Al is working on an updated version. But what we do have here are some significant bzero/memset improvements for Niagara-4. An 8K page can be cleared in around 600 cycles, because we essentially have a store that behaves like powerpc's dcbz that we can actually make real use of." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: Revert strace hiccups fix. sparc64: Niagara-4 bzero/memset, plus use MRU stores in page copy. sparc64: Fix strace hiccups when force_successful_syscall() triggers. sparc64: Rearrange thread info to cheaply clear syscall noerror state.
2012-10-05	Revert strace hiccups fix.	David S. Miller
	This reverts commit 40138249c3b7a0762155216b963ec7fd4d09b5b4 and ffa9009c9828db3f74178e459cfbca6e77ff5dd9. There are problems with how the flag bytes were rearranged, in particular we really can't move values down into the lowest 16 bits since those are used for individual state bits. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-05	sparc64: Niagara-4 bzero/memset, plus use MRU stores in page copy.	David S. Miller
	This adds optimized memset/bzero/page-clear routines for Niagara-4. We basically can do what powerpc has been able to do for a decade (via the "dcbz" instruction), which is use cache line clearing stores for bzero and memsets with a 'c' argument of zero. As long as we make the cache initializing store to each 32-byte subblock of the L2 cache line, it works. As with other Niagara-4 optimized routines, the key is to make sure to avoid any usage of the %asi register, as reads and writes to it cost at least 50 cycles. For the user clear cases, we don't use these new routines, we use the Niagara-1 variants instead. Those have to use %asi in an unavoidable way. A Niagara-4 8K page clear costs just under 600 cycles. Add definitions of the MRU variants of the cache initializing store ASIs. By default, cache initializing stores install the line as Least Recently Used. If we know we're going to use the data immediately (which is true for page copies and clears) we can use the Most Recently Used variant, to decrease the likelyhood of the lines being evicted before they get used. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-06	compat: move compat_siginfo_t definition to asm/compat.h	Denys Vlasenko
	This is a preparatory patch for the introduction of NT_SIGINFO elf note. Make the location of compat_siginfo_t uniform across eight architectures which have it. Now it can be pulled in by including asm/compat.h or linux/compat.h. Most of the copies are verbatim. compat_uid[32]_t had to be replaced by __compat_uid[32]_t. compat_uptr_t had to be moved up before compat_siginfo_t in asm/compat.h on a several architectures (tile already had it moved up). compat_sigval_t had to be relocated from linux/compat.h to asm/compat.h. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Amerigo Wang <amwang@redhat.com> Cc: "Jonathan M. Foote" <jmfoote@cert.org> Cc: Roland McGrath <roland@hack.frob.com> Cc: Pedro Alves <palves@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06	cross-arch: don't corrupt personality flags upon exec()	Jiri Kosina
	Historically, the top three bytes of personality have been used for things such as ADDR_NO_RANDOMIZE, which made sense only for specific architectures. We now however have a flag there that is general no matter the architecture (UNAME26); generally we have to be careful to preserve the personality flags across exec(). This patch tries to fix all architectures that forcefully overwrite personality flags during exec() (ppc32 and s390 have been fixed recently by commits f9783ec862ea ("[S390] Do not clobber personality flags on exec") and 59e4c3a2fe9c ("powerpc/32: Don't clobber personality flags on exec") in a similar way already). Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Mark Salter <msalter@redhat.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-04	sparc64: Fix strace hiccups when force_successful_syscall() triggers.	Al Viro
	When force_successful_syscall() triggers, the syscall return status reported the ptrace applications gets garbled. Fix this by reordering the events and tests in the ret_sys_call path. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-04	sparc64: Rearrange thread info to cheaply clear syscall noerror state.	Al Viro
	After fixing a couple of brainos, it even seems to work. What's done here is move of ->syscall_noerror right before FPDEPTH byte in ->flags and using sth to [%g6 + TI_SYS_NOERROR] instead of stb to [%g6 + TI_FPDEPTH] in both branches of etrap_save. AFAICS, that ought to be solid. Again, deciding what to do with now unused delay slot of branch on ->syscall_noerror and dealing with the order of tests in ret_from_sys is a separate question, but at least that way we don't have to clean ->syscall_noerror in there at all. AFAICS, it ought to be a clear win - sth is not going to cost more than stb on etrap_64.S side of things, and we are losing write on syscalls.S one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>