path: root/arch/powerpc
Age  Commit message  Author
2012-07-03  powerpc/pseries: Disable interrupts around IOMMU percpu data accesses  Anton Blanchard
tce_buildmulti_pSeriesLP uses a per cpu page to communicate with the hypervisor. We currently rely on the IOMMU table spinlock but subsequent patches will be removing that so disable interrupts around all accesses of tce_page. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03  powerpc: POWER7 optimised memcpy using VMX and enhanced prefetch  Anton Blanchard
Implement a POWER7 optimised memcpy using VMX and enhanced prefetch instructions.

This is a copy of the POWER7 optimised copy_to_user/copy_from_user loop. Detailed implementation and performance details can be found in commit a66086b8197d (powerpc: POWER7 optimised copy_to_user/copy_from_user using VMX).

I noticed memcpy issues when profiling a RAID6 workload:

    .memcpy
    .async_memcpy
    .async_copy_data
    .__raid_run_ops
    .handle_stripe
    .raid5d
    .md_thread

I created a simplified testcase by building a RAID6 array with 4 1GB ramdisks (booting with brd.rd_size=1048576):

    # mdadm -CR -e 1.2 /dev/md0 --level=6 -n4 /dev/ram[0-3]

I then timed how long it took to write to the entire array:

    # dd if=/dev/zero of=/dev/md0 bs=1M

    Before: 892 MB/s
    After:  999 MB/s

A 12% improvement.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03  powerpc: Use enhanced touch instructions in POWER7 copy_to_user/copy_from_user  Anton Blanchard
Version 2.06 of the POWER ISA introduced enhanced touch instructions, allowing us to specify a number of attributes including the length of a stream.

This patch adds a software stream for both loads and stores in the POWER7 copy_tofrom_user loop. Since the setup is quite complicated and we have to use an eieio to ensure correct ordering of the "GO" command we only do this for copies above 4kB.

To quantify any performance improvements we need a working set bigger than the caches so we operate on a 1GB file:

    # dd if=/dev/zero of=/tmp/foo bs=1M count=1024

And we compare how fast we can read the file:

    # dd if=/tmp/foo of=/dev/null bs=1M

    before: 7.7 GB/s
    after:  9.6 GB/s

A 25% improvement.

The worst case for this patch will be a completely L1 cache contained copy of just over 4kB. We can test this with the copy_to_user testcase we used to tune copy_tofrom_user originally:

    http://ozlabs.org/~anton/junkcode/copy_to_user.c

    # time ./copy_to_user2 -l 4224 -i 10000000

    before: 6.807 s
    after:  6.946 s

A 2% slowdown, which seems reasonable considering our data is unlikely to be completely L1 contained.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03  powerpc/pci: cleanup on duplicate assignment  Gavin Shan
When the PCI core creates the PCI root bus in pci_create_root_bus(), it should already have assigned the secondary bus number for the newly created root bus. The explicit assignment of the secondary bus number in pcibios_scan_phb() is therefore redundant and can be dropped. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03  powerpc/numa: Fix OF node refcounting bug  Gavin Shan
The form affinity for NUMA is set to 1 if the firmware supports OPAL. Otherwise, we have to retrieve that from OF node "/chosen". For the latter case, OF node "/chosen" reference count was never decreased. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03  powerpc: POWER7 optimised copy_page using VMX and enhanced prefetch  Anton Blanchard
Implement a POWER7 optimised copy_page using VMX and enhanced prefetch instructions. We use enhanced prefetch hints to prefetch both the load and store side. We copy a cacheline at a time and fall back to regular loads and stores if we are unable to use VMX (eg we are in an interrupt).

The following microbenchmark was used to assess the impact of the patch:

    http://ozlabs.org/~anton/junkcode/page_fault_file.c

We test MAP_PRIVATE page faults across a 1GB file, 100 times:

    # time ./page_fault_file -p -l 1G -i 100

    Before: 22.25s
    After:  18.89s

17% faster

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03  powerpc: Rename copyuser_power7_vmx.c to vmx-helper.c  Anton Blanchard
Subsequent patches will add more VMX library functions and it makes sense to keep all the c-code helper functions in the one file. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03  powerpc: Clear RI and EE at the same time in system call exit  Anton Blanchard
mtmsrd is an expensive instruction; we save a few cycles by doing it once instead of twice. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03  powerpc: Use enhanced touch instructions in POWER7 copy_to_user/copy_from_user  Anton Blanchard
Version 2.06 of the POWER ISA introduced enhanced touch instructions, allowing us to specify a number of attributes including the length of a stream.

This patch adds a software stream for both loads and stores in the POWER7 copy_tofrom_user loop. Since the setup is quite complicated and we have to use an eieio to ensure correct ordering of the "GO" command we only do this for copies above 4kB.

To quantify any performance improvements we need a working set bigger than the caches so we operate on a 1GB file:

    # dd if=/dev/zero of=/tmp/foo bs=1M count=1024

And we compare how fast we can read the file:

    # dd if=/tmp/foo of=/dev/null bs=1M

    before: 7.7 GB/s
    after:  9.6 GB/s

A 25% improvement.

The worst case for this patch will be a completely L1 cache contained copy of just over 4kB. We can test this with the copy_to_user testcase we used to tune copy_tofrom_user originally:

    http://ozlabs.org/~anton/junkcode/copy_to_user.c

    # time ./copy_to_user2 -l 4224 -i 10000000

    before: 6.807 s
    after:  6.946 s

A 2% slowdown, which seems reasonable considering our data is unlikely to be completely L1 contained.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03  powerpc/smp: remove call to ipi_call_lock()/ipi_call_unlock()  Yong Zhang
1) call_function.lock used in smp_call_function_many() only protects call_function.queue and &data->refs; cpu_online_mask is outside of the lock. It is not necessary to protect cpu_online_mask, because data->cpumask is pre-calculated, and even if a cpu is brought up when calling arch_send_call_function_ipi_mask(), it's harmless because the validation test in generic_smp_call_function_interrupt() will take care of it.

2) For the cpu down issue, stop_machine() will guarantee that no concurrent smp_call_function() is processing.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03  powerpc: 64bit optimised __clear_user  Anton Blanchard
I noticed __clear_user high up in a profile of one of my RAID stress tests. The testcase was doing a dd from /dev/zero which ends up calling __clear_user.

__clear_user is basically a loop with a single 4 byte store which is horribly slow. We can do much better by aligning the destination and doing 32 bytes of 8 byte stores in a loop.

The following testcase was used to verify the patch:

    http://ozlabs.org/~anton/junkcode/stress_clear_user.c

To show the improvement in performance I ran a dd from /dev/zero to /dev/null on a POWER7 box:

Before:

    # dd if=/dev/zero of=/dev/null bs=1M count=10000
    10485760000 bytes (10 GB) copied, 3.72379 s, 2.8 GB/s

After:

    # time dd if=/dev/zero of=/dev/null bs=1M count=10000
    10485760000 bytes (10 GB) copied, 0.728318 s, 14.4 GB/s

Over 5x faster.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
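The approach described in the __clear_user entry above (align the destination, then clear 32 bytes per iteration with four 8-byte stores) can be sketched in portable user-space C. This is only an illustration under stated assumptions: the kernel routine is written in assembly and handles user-access faults, none of which is modelled here, and the names sketch_clear and sketch_store64_zero are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store eight zero bytes at p; memcpy of a fixed size compiles to one store. */
static inline void sketch_store64_zero(unsigned char *p)
{
	const uint64_t zero = 0;
	memcpy(p, &zero, sizeof zero);
}

/* Hypothetical user-space analogue of the optimised clearing loop. */
static void sketch_clear(void *dst, size_t n)
{
	unsigned char *p = dst;

	assert(dst != NULL || n == 0);

	/* Head: byte stores until the destination is 8-byte aligned. */
	while (n > 0 && ((uintptr_t)p & 7) != 0) {
		*p++ = 0;
		n--;
	}

	/* Main loop: 32 bytes per iteration as four 8-byte stores. */
	while (n >= 32) {
		sketch_store64_zero(p);
		sketch_store64_zero(p + 8);
		sketch_store64_zero(p + 16);
		sketch_store64_zero(p + 24);
		p += 32;
		n -= 32;
	}

	/* Tail: remaining 8-byte stores, then single bytes. */
	while (n >= 8) {
		sketch_store64_zero(p);
		p += 8;
		n -= 8;
	}
	while (n > 0) {
		*p++ = 0;
		n--;
	}
}
```

The head and tail loops exist only to keep the wide stores aligned and in bounds; the speedup reported above comes from replacing many narrow stores with far fewer wide ones.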
2012-07-03  powerpc: tracing: Avoid tracepoint duplication with DECLARE_EVENT_CLASS  Anton Blanchard
irq_entry, irq_exit, timer_interrupt_entry and timer_interrupt_exit all do the same thing so use DECLARE_EVENT_CLASS to avoid duplicating everything 4 times.

This saves quite a lot of space in both instruction text and data:

       text    data     bss     dec     hex filename
       9265   19622      16   28903    70e7 arch/powerpc/kernel/irq.o
       6817   19019      16   25852    64fc arch/powerpc/kernel/irq.o

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03  powerpc: Enable jump label support  Anton Blanchard
When looking through some instruction traces I noticed our tracepoint checks were inline. It turns out we don't have CONFIG_JUMP_LABEL enabled.

By enabling CONFIG_JUMP_LABEL we replace a load/compare/branch with a nop at every tracepoint call. For example in do_IRQ:

CONFIG_JUMP_LABEL disabled:

    stdx    3,11,9
    lwz     0,8(29)
    cmpwi   7,0,0
    bne-    7,.L124
    bl      .irq_enter

CONFIG_JUMP_LABEL enabled:

    stdx    3,11,9
    nop
    bl      .irq_enter

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03  powerpc/pseries/cpuidle: Replace pseries_notify_cpuidle_add call with notifier  Deepthi Dharwar
This patch removes the pseries_notify_add_cpu() call and replaces it with a hotplug notifier. This prevents cpuidle resources from being released and reallocated each time a cpu comes online on pseries. The earlier design caused a lockdep problem in start_secondary, as reported in this thread: https://lkml.org/lkml/2012/5/17/2

This applies on 3.4-rc7.

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03  powerpc/pseries/iommu: remove default window before attempting DDW manipulation  Nishanth Aravamudan
An upcoming release of firmware will add DDW extensions, in particular an API to "reset" the DMA window to the original configuration (32-bit, 2GB in size). With that API available, we can safely remove the default window, increasing the resources available to firmware for creation of larger windows for the slot in question -- if we encounter an error, we can use the new API to reset the state of the slot. Further, this same release of firmware will make it a hard requirement for OSes to release the existing window before any other windows will be shown as available, to avoid conflicts in addressing between the two windows. In anticipation of these changes, always remove the default window before we do any DDW manipulations. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03  powerpc/ftrace: Use patch_instruction instead of probe_kernel_write()  Steven Rostedt
The patch_instruction() interface is made to modify kernel text. It is safer to use that than probe_kernel_write() when modifying kernel code. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03  powerpc: Have patch_instruction detect faults  Steven Rostedt
For ftrace to use the patch_instruction code, it needs to check for faults on write. Ftrace updates code all over the kernel, and we need to know if code is updated or not due to protections that are placed on some portions of the kernel. If ftrace does not detect a fault, it will error later on, and it will be much more difficult to find the problem. By changing patch_instruction() to detect faults, then ftrace will be able to make use of it too. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03  powerpc/ftrace: Have PPC skip updating with stop_machine()  Steven Rostedt
PowerPC does not have the synchronization issues that x86 has with modifying code on one CPU while another CPU is executing it. The other CPU will either see the old or new code without any issues, unlike x86 which may issue a GPF. Instead of calling the heavy stop_machine, just update the code. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-03  powerpc/boot: Only build board support files when required.  Tony Breeds
Currently we build all board files regardless of the final zImage target. This is sub-optimal (in terms of compilation) and leads to problems in one platform needlessly causing failures for other platforms. Use the Kconfig variables to selectively construct the list of board files to build. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-02  powerpc/kvm: sldi should be sld  Michael Neuling
Since we are taking a register, this should never have been an sldi. Talking to paulus offline, this is the correct fix.

Was introduced by:

    commit 19ccb76a1938ab364a412253daec64613acbf3df
    Author: Paul Mackerras <paulus@samba.org>
    Date:   Sat Jul 23 17:42:46 2011 +1000

Talking to paulus, this shouldn't be a literal.

Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: <stable@kernel.org> [v3.2+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-02  powerpc/xmon: Use cpumask iterator to avoid warning  Anton Blanchard
We have a bug report where the kernel hits a warning in the cpumask code:

    WARNING: at include/linux/cpumask.h:107

Which is:

    WARN_ON_ONCE(cpu >= nr_cpumask_bits);

The backtrace is:

    cpu_cmd
    cmds
    xmon_core
    xmon
    die

xmon is iterating through 0 to NR_CPUS. I'm not sure why we are still open coding this but iterating above nr_cpu_ids is definitely a bug.

This patch iterates through all possible cpus, in case we issue a system reset and CPUs in an offline state call in.

Perhaps the old code was trying to handle CPUs that were in the partition but were never started (eg kexec into a kernel with an nr_cpus= boot option). They are going to die way before we get into xmon since we haven't set any kernel state up for them.

Signed-off-by: Anton Blanchard <anton@samba.org>
CC: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
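The difference between open-coding a loop up to the compile-time capacity and walking a mask with a bounded iterator, as discussed in the xmon entry above, can be illustrated with a small user-space model. Everything here is hypothetical (the struct, the SKETCH_NR_CPUS capacity and the helper names); the kernel's own cpumask helpers are not reproduced.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 64u	/* compile-time capacity, assumed for this sketch */

struct sketch_cpumask {
	uint64_t bits;		/* bit n set means cpu n is in the mask */
};

/*
 * Return the first cpu >= start that is set in the mask, or nr_ids if
 * there is none. Bits at or above nr_ids are never inspected, which is
 * the property an open-coded 0..NR_CPUS loop lacks.
 */
static unsigned sketch_next_cpu(const struct sketch_cpumask *m,
				unsigned start, unsigned nr_ids)
{
	assert(nr_ids <= SKETCH_NR_CPUS);
	for (unsigned cpu = start; cpu < nr_ids; cpu++)
		if (m->bits & ((uint64_t)1 << cpu))
			return cpu;
	return nr_ids;
}

/* Count the cpus in the mask, using only the bounded iterator. */
static unsigned sketch_count_cpus(const struct sketch_cpumask *m,
				  unsigned nr_ids)
{
	unsigned n = 0;

	for (unsigned cpu = sketch_next_cpu(m, 0, nr_ids);
	     cpu < nr_ids;
	     cpu = sketch_next_cpu(m, cpu + 1, nr_ids))
		n++;
	return n;
}
```

With nr_ids smaller than the capacity, a stray bit above nr_ids is simply never visited, so the loop cannot trip a range check like the WARN_ON_ONCE quoted above.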
2012-06-29  powerpc/pseries: Fix software invalidate TCE  Michael Neuling
The following added support for powernv but broke pseries/BML:

    1f1616e powerpc/powernv: Add TCE SW invalidation support

TCE_PCI_SW_INVAL was split into FREE and CREATE flags but the tests in the pseries code were not updated to reflect this.

Signed-off-by: Michael Neuling <mikey@neuling.org>
cc: stable@kernel.org [v3.3+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-06-29  powerpc: check_and_cede_processor() never cedes  Anton Blanchard
Commit f948501b36c6 ("Make hard_irq_disable() actually hard-disable interrupts") caused check_and_cede_processor to stop working. ->irq_happened will never be zero right after a hard_irq_disable so the compiler removes the call to cede_processor completely. The bug was introduced back in the lazy interrupt handling rework of 3.4 but was hidden until recently because hard_irq_disable did nothing. This issue will eventually appear in 3.4 stable since the hard_irq_disable fix is marked stable, so mark this one for stable too. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-06-29  powerpc/ftrace: Do not trace restore_interrupts()  Steven Rostedt
As I was adding code that affects all archs, I started testing function tracer against PPC64 and found that it currently locks up with 3.4 kernel. I figured it was due to tracing a function that shouldn't be, so I went through the following process to bisect to find the culprit:

    cat /debug/tracing/available_filter_functions > t
    num=$(( $(wc -l < t) / 2 ))
    sed -ne "1,${num}p" t > t1
    let num=num+1
    sed -ne "${num},\$p" t > t2
    cat t1 > /debug/tracing/set_ftrace_filter
    echo function > /debug/tracing/current_tracer
    # if the lockup reproduces, bisect t1; if not, bisect t2

It finally came down to this function: restore_interrupts()

I'm not sure why this locks up the system. It just seems to prevent scheduling from occurring. Interrupts seem to still work, as I can ping the box. But all user processes freeze.

When restore_interrupts() is not traced, function tracing works fine.

Cc: stable@kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-06-29  powerpc: Fix Section mismatch warnings in prom_init.c  Li Zhong
This patch tries to fix a couple of section mismatch warnings like the following one:

    WARNING: arch/powerpc/kernel/built-in.o(.text+0x2923c): Section mismatch in reference from the function .prom_query_opal() to the function .init.text:.call_prom()
    The function .prom_query_opal() references the function __init .call_prom().
    This is often because .prom_query_opal lacks a __init annotation or the annotation of .call_prom is wrong.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-06-29  ppc64: fix missing to check all bits of _TIF_USER_WORK_MASK in preempt  Tiejun Chen
In the entry_64.S version of ret_from_except_lite, you'll notice that in the !preempt case, after we've checked MSR_PR we test for any TIF flag in _TIF_USER_WORK_MASK to decide whether to go to do_work or not. However, in the preempt case, we do a convoluted trick to test SIGPENDING only if PR was set and always test NEED_RESCHED ... but we forget to test any other bit of _TIF_USER_WORK_MASK !!! So that means that with preempt, we completely fail to test for things like single step, syscall tracing, etc...

This should be fixed with the following path:

 - Test PR. If it is not set, go to resume_kernel; otherwise continue.
 - In resume_kernel, do what the original do_work did.
 - Otherwise, always test _TIF_USER_WORK_MASK to decide whether to do the original user_work; if no bit is set, restore directly.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
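The decision flow listed in the _TIF_USER_WORK_MASK entry above can be written out as a small C function. This is a sketch of the control flow only: the real code is assembly in entry_64.S, and all bit values and the membership of the mask below are hypothetical placeholders, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define SKETCH_MSR_PR			(1u << 0)	/* came from user mode */
#define SKETCH_TIF_SIGPENDING		(1u << 0)
#define SKETCH_TIF_NEED_RESCHED		(1u << 1)
#define SKETCH_TIF_OTHER_WORK		(1u << 2)	/* stands for any further mask bit */
#define SKETCH_TIF_USER_WORK_MASK \
	(SKETCH_TIF_SIGPENDING | SKETCH_TIF_NEED_RESCHED | SKETCH_TIF_OTHER_WORK)

enum sketch_exit_path {
	SKETCH_RESUME_KERNEL,	/* returning to kernel mode */
	SKETCH_USER_WORK,	/* returning to user mode with pending work */
	SKETCH_RESTORE		/* returning to user mode, nothing pending */
};

/* Choose the exit path: test PR first, then the whole work mask. */
static enum sketch_exit_path sketch_choose_exit(uint32_t msr, uint32_t tif_flags)
{
	if (!(msr & SKETCH_MSR_PR))
		return SKETCH_RESUME_KERNEL;
	if (tif_flags & SKETCH_TIF_USER_WORK_MASK)
		return SKETCH_USER_WORK;
	return SKETCH_RESTORE;
}

/* Usage: a work bit other than SIGPENDING/NEED_RESCHED must not be missed. */
static void sketch_exit_example(void)
{
	assert(sketch_choose_exit(SKETCH_MSR_PR, SKETCH_TIF_OTHER_WORK)
	       == SKETCH_USER_WORK);
}
```

Testing the whole mask in one place is what keeps the preempt and !preempt cases from drifting apart when a new work bit is added.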
2012-06-29  powerpc: Fix uninitialised error in numa.c  Michael Neuling
chroma_defconfig currently gives me this with gcc 4.6:

    arch/powerpc/mm/numa.c:638:13: error: 'dm' may be used uninitialized in this function [-Werror=uninitialized]

It's a bogus warning/error since of_get_drconf_memory() only writes it anyway.

Signed-off-by: Michael Neuling <mikey@neuling.org>
cc: <stable@kernel.org> [v3.3+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-06-29  powerpc: Fix BPF_JIT code to link with multiple TOCs  Michael Ellerman
If the kernel is big enough (eg. allyesconfig), the linker may need to switch TOCs when calling from the BPF JIT code out to the external helpers (skb_copy_bits() & bpf_internal_load_pointer_neg_helper()). In order to do that we need to leave space after the bl for the linker to insert a reload of our TOC pointer. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-06-24  Merge git://git.kernel.org/pub/scm/virt/kvm/kvm  Linus Torvalds
Pull KVM fixes from Avi Kivity:
 "Fixing a scheduling-while-atomic bug in the ppc code, and a bug which allowed pci bridges to be assigned to guests."

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PPC: Book3S HV: Drop locks around call to kvmppc_pin_guest_page
  KVM: Fix PCI header check on device assignment
2012-06-20  Merge tag 'driver-core-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core  Linus Torvalds
Pull driver core and printk fixes from Greg Kroah-Hartman:
 "Here are some fixes for 3.5-rc4 that resolve the kmsg problems that people have reported showing up after the printk and kmsg changes went into 3.5-rc1. There are also a smattering of other tiny fixes for the extcon and hyper-v drivers that people have reported.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'driver-core-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  extcon: max8997: Add missing kfree for info->edev in max8997_muic_remove()
  extcon: Set platform drvdata in gpio_extcon_probe() and fix irq leak
  extcon: Fix wrong index in max8997_extcon_cable[]
  kmsg - kmsg_dump() fix CONFIG_PRINTK=n compilation
  printk: return -EINVAL if the message len is bigger than the buf size
  printk: use mutex lock to stop syslog_seq from going wild
  kmsg - kmsg_dump() use iterator to receive log buffer content
  vme: change maintainer e-mail address
  Extcon: Don't try to create duplicate link names
  driver core: fixup reversed deferred probe order
  printk: Fix alignment of buf causing crash on ARM EABI
  Tools: hv: verify origin of netlink connector message
2012-06-19  KVM: PPC: Book3S HV: Drop locks around call to kvmppc_pin_guest_page  Paul Mackerras
At the moment we call kvmppc_pin_guest_page() in kvmppc_update_vpa() with two spinlocks held: the vcore lock and the vcpu->vpa_update_lock. This is not good, since kvmppc_pin_guest_page() calls down_read() and get_user_pages_fast(), both of which can sleep. This bug was introduced in 2e25aa5f ("KVM: PPC: Book3S HV: Make virtual processor area registration more robust"). This arranges to drop those spinlocks before calling kvmppc_pin_guest_page() and re-take them afterwards. Dropping the vcore lock in kvmppc_run_core() means we have to set the vcore_state field to VCORE_RUNNING before we drop the lock, so that other vcpus won't try to run this vcore. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-15  kmsg - kmsg_dump() use iterator to receive log buffer content  Kay Sievers
Provide an iterator to receive the log buffer content, and convert all kmsg_dump() users to it. The structured data in the kmsg buffer now contains binary data, which should no longer be copied verbatim to the kmsg_dump() users. The iterator should provide reliable access to the buffer data, and also supports proper log line-aware chunking of data while iterating. Signed-off-by: Kay Sievers <kay@vrfy.org> Tested-by: Tony Luck <tony.luck@intel.com> Reported-by: Anton Vorontsov <anton.vorontsov@linaro.org> Tested-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-15  Make hard_irq_disable() actually hard-disable interrupts  Paul Mackerras
At present, hard_irq_disable() does nothing on powerpc because of this code in include/linux/interrupt.h:

    #ifndef hard_irq_disable
    #define hard_irq_disable()	do { } while(0)
    #endif

So we need to make our hard_irq_disable be a macro. It was previously a macro until commit 7230c56441 ("powerpc: Rework lazy-interrupt handling") changed it to a static inline function.

Cc: stable@vger.kernel.org
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
--
 arch/powerpc/include/asm/hw_irq.h | 3 +++
 1 file changed, 3 insertions(+)
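The interaction described in the hard_irq_disable() entry above can be reproduced in a few lines of user-space C. In this sketch the "arch" side defines a static inline function and then a same-named macro so that the generic `#ifndef` fallback is skipped. The self-referential define is one common way to do this; whether the commit uses exactly this form is an assumption, and the call counter is purely illustrative.

```c
#include <assert.h>

static int sketch_hard_disable_calls;	/* counts real calls in this sketch */

/* "Arch" side: the real work lives in a static inline function. */
static inline void hard_irq_disable(void)
{
	sketch_hard_disable_calls++;
}

/*
 * The generic fallback below only checks for a macro. Without this
 * define, every later hard_irq_disable() call would expand to an empty
 * statement and the inline function above would never run.
 */
#define hard_irq_disable hard_irq_disable

/* "Generic" side, same shape as the fallback quoted in the commit message. */
#ifndef hard_irq_disable
#define hard_irq_disable() do { } while (0)
#endif

/* Usage: one call must reach the inline function exactly once. */
static void sketch_hard_irq_example(void)
{
	sketch_hard_disable_calls = 0;
	hard_irq_disable();
	assert(sketch_hard_disable_calls == 1);
}
```

Removing the self-referential define makes the assertion fail, because the preprocessor then replaces the call with `do { } while (0)`.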
2012-06-08  powerpc: Fix kernel panic during kernel module load  Steffen Rumler
This fixes a problem which can cause kernel oopses while loading a kernel module. According to the PowerPC EABI specification, GPR r11 is assigned the dedicated function to point to the previous stack frame. In the powerpc-specific kernel module loader, do_plt_call() (in arch/powerpc/kernel/module_32.c), GPR r11 is also used to generate trampoline code. This combination crashes the kernel, in the case where the compiler chooses to use a helper function for saving GPRs on entry, and the module loader has placed the .init.text section far away from the .text section, meaning that it has to generate a trampoline for functions in the .init.text section to call the GPR save helper. Because the trampoline trashes r11, references to the stack frame using r11 can cause an oops. The fix just uses GPR r12 instead of GPR r11 for generating the trampoline code. According to the statements from Freescale, this is safe from an EABI perspective. I've tested the fix for kernel 2.6.33 on MPC8541. Cc: stable@vger.kernel.org Signed-off-by: Steffen Rumler <steffen.rumler.ext@nsn.com> [paulus@samba.org: reworded the description] Signed-off-by: Paul Mackerras <paulus@samba.org>
2012-06-08  powerpc/time: Sanity check of decrementer expiration is necessary  Paul Mackerras
This reverts 68568add2c ("powerpc/time: Remove unnecessary sanity check of decrementer expiration"). We do need to check whether we have reached the expiration time of the next event, because we sometimes get an early decrementer interrupt, most notably when we set the decrementer to 1 in arch_irq_work_raise(). The effect of not having the sanity check is that if timer_interrupt() gets called early, we leave the decrementer set to its maximum value, which means we then don't get any more decrementer interrupts for about 4 seconds (or longer, depending on timebase frequency). I saw these pauses as a consequence of getting a stray hypervisor decrementer interrupt left over from exiting a KVM guest. This isn't quite a straight revert because of changes to the surrounding code, but it restores the same algorithm as was previously used. Cc: stable@vger.kernel.org Acked-by: Anton Blanchard <anton@samba.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
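The sanity check discussed in the decrementer entry above amounts to comparing the current timebase with the next event time and re-arming the decrementer for the remainder when the interrupt arrived early. The following is a user-space model under stated assumptions: the struct, the SKETCH_DEC_MAX value and the function name are hypothetical, and the real timer_interrupt() does considerably more.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DEC_MAX 0x7fffffffu	/* assumed maximum decrementer value */

struct sketch_timer {
	uint64_t next_event_tb;	/* timebase value at which the next event is due */
	uint32_t decrementer;	/* value last programmed into the decrementer */
	int events_handled;	/* how many times the event handler ran */
};

/* Model of the interrupt path; returns 1 if the event handler ran. */
static int sketch_timer_interrupt(struct sketch_timer *st, uint64_t now_tb)
{
	uint64_t remaining;

	assert(st != NULL);

	/* On entry the decrementer is pushed out to its maximum. */
	st->decrementer = SKETCH_DEC_MAX;

	if (now_tb >= st->next_event_tb) {
		/* The event is really due: run it and clear the deadline. */
		st->next_event_tb = UINT64_MAX;
		st->events_handled++;
		return 1;
	}

	/*
	 * Early interrupt: without this branch the decrementer would stay
	 * at its maximum and the pending event would be badly delayed.
	 */
	remaining = st->next_event_tb - now_tb;
	if (remaining <= SKETCH_DEC_MAX)
		st->decrementer = (uint32_t)remaining;
	return 0;
}
```

For example, with the next event due at timebase 1000, an interrupt at 990 re-arms the decrementer with 10 ticks instead of leaving it at SKETCH_DEC_MAX.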
2012-06-02  powerpc: Fix size of st_nlink on 64bit  Anton Blanchard
commit e57f93cc53b7 (powerpc: get rid of nlink_t uses, switch to explicitly-sized type) changed the size of st_nlink on ppc64 from a long to a short, resulting in boot failures. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal  Linus Torvalds
Pull third pile of signal handling patches from Al Viro:
 "This time it's mostly helpers and conversions to them; there's a lot of stuff remaining in the tree, but that'll either go in -rc2 (isolated bug fixes, ideally via arch maintainers' trees) or will sit there until the next cycle."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  x86: get rid of calling do_notify_resume() when returning to kernel mode
  blackfin: check __get_user() return value
  whack-a-mole with TIF_FREEZE
  FRV: Optimise the system call exit path in entry.S [ver #2]
  FRV: Shrink TIF_WORK_MASK [ver #2]
  FRV: Prevent syscall exit tracing and notify_resume at end of kernel exceptions
  new helper: signal_delivered()
  powerpc: get rid of restore_sigmask()
  most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set
  set_restore_sigmask() is never called without SIGPENDING (and never should be)
  TIF_RESTORE_SIGMASK can be set only when TIF_SIGPENDING is set
  don't call try_to_freeze() from do_signal()
  pull clearing RESTORE_SIGMASK into block_sigmask()
  sh64: failure to build sigframe != signal without handler
  openrisc: tracehook_signal_handler() is supposed to be called on success
  new helper: sigmask_to_save()
  new helper: restore_saved_sigmask()
  new helpers: {clear,test,test_and_clear}_restore_sigmask()
  HAVE_RESTORE_SIGMASK is defined on all architectures now
2012-06-01  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  Linus Torvalds
Pull vfs changes from Al Viro.
 "A lot of misc stuff. The obvious groups:
   * Miklos' atomic_open series; kills the damn abuse of ->d_revalidate() by NFS, which was the major stumbling block for all work in that area.
   * ripping security_file_mmap() and dealing with deadlocks in the area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in general.
   * ->encode_fh() switched to saner API; insane fake dentry in mm/cleancache.c gone.
   * assorted annotations in fs (endianness, __user)
   * parts of Artem's ->s_dirty work (jffs2 and reiserfs parts)
   * ->update_time() work from Josef.
   * other bits and pieces all over the place.

  Normally it would've been in two or three pull requests, but signal.git stuff had eaten a lot of time during this cycle ;-/"

Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the 'truncate_range' inode method was removed by the VM changes, the VFS update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due to sparse fix added twice, with other changes nearby).

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
  nfs: don't open in ->d_revalidate
  vfs: retry last component if opening stale dentry
  vfs: nameidata_to_filp(): don't throw away file on error
  vfs: nameidata_to_filp(): inline __dentry_open()
  vfs: do_dentry_open(): don't put filp
  vfs: split __dentry_open()
  vfs: do_last() common post lookup
  vfs: do_last(): add audit_inode before open
  vfs: do_last(): only return EISDIR for O_CREAT
  vfs: do_last(): check LOOKUP_DIRECTORY
  vfs: do_last(): make ENOENT exit RCU safe
  vfs: make follow_link check RCU safe
  vfs: do_last(): use inode variable
  vfs: do_last(): inline walk_component()
  vfs: do_last(): make exit RCU safe
  vfs: split do_lookup()
  Btrfs: move over to use ->update_time
  fs: introduce inode operation ->update_time
  reiserfs: get rid of resierfs_sync_super
  reiserfs: mark the superblock as dirty a bit later
  ...
2012-06-01  new helper: signal_delivered()  Al Viro
Does block_sigmask() + tracehook_signal_handler(); called when sigframe has been successfully built. All architectures converted to it; block_sigmask() itself is gone now (merged into this one). I'm still not too happy with the signature, but that's a separate story (IMO we need a structure that would contain signal number + siginfo + k_sigaction, so that get_signal_to_deliver() would fill one, signal_delivered(), handle_signal() and probably setup...frame() - take one). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01  powerpc: get rid of restore_sigmask()  Al Viro
... it's just a call of set_current_blocked() now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01  most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set  Al Viro
Only 3 out of 63 do not. Renamed the current variant to __set_current_blocked(), added set_current_blocked() that will exclude unblockable signals, switched open-coded instances to it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
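The split described in the set_current_blocked() entry above (a raw setter plus a wrapper that strips the unblockable signals first) can be modelled with a 64-bit mask. This is a hypothetical user-space sketch: the names, the single global mask and the signal numbers (9 for SIGKILL, 19 for SIGSTOP, 2 for SIGINT, as on common Linux architectures) are assumptions, and the kernel's sigset_t handling and locking are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed signal numbers, for illustration only. */
#define SKETCH_SIGINT	2
#define SKETCH_SIGKILL	9
#define SKETCH_SIGSTOP	19

static uint64_t sketch_blocked;	/* stand-in for the task's blocked mask */

/* One bit per signal: signal n lives in bit n - 1. */
static uint64_t sketch_sigmask(int sig)
{
	assert(sig >= 1 && sig <= 64);
	return (uint64_t)1 << (sig - 1);
}

/* Raw variant: installs the mask exactly as given. */
static void sketch_set_blocked_raw(uint64_t newset)
{
	sketch_blocked = newset;
}

/* Default variant: SIGKILL and SIGSTOP can never end up blocked. */
static void sketch_set_blocked(uint64_t newset)
{
	newset &= ~(sketch_sigmask(SKETCH_SIGKILL) |
		    sketch_sigmask(SKETCH_SIGSTOP));
	sketch_set_blocked_raw(newset);
}
```

Making the filtering variant the default name means the many callers that want the filtering no longer have to open-code it, while the few that do not call the raw variant explicitly.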
2012-06-01  set_restore_sigmask() is never called without SIGPENDING (and never should be)  Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01  pull clearing RESTORE_SIGMASK into block_sigmask()  Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01  new helper: sigmask_to_save()  Al Viro
replace boilerplate "should we use ->saved_sigmask or ->blocked?" with calls of obvious inlined helper... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01  new helper: restore_saved_sigmask()  Al Viro
first fruits of ..._restore_sigmask() helpers: now we can take boilerplate "signal didn't have a handler, clear RESTORE_SIGMASK and restore the blocked mask from ->saved_mask" into a common helper. Open-coded instances switched... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01  new helpers: {clear,test,test_and_clear}_restore_sigmask()  Al Viro
helpers parallel to set_restore_sigmask(), used in the next commits Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-31  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal  Linus Torvalds
Pull second pile of signal handling patches from Al Viro:
 "This one is just task_work_add() series + remaining prereqs for it. There probably will be another pull request from that tree this cycle - at least for helpers, to get them out of the way for per-arch fixes remaining in the tree."

Fix trivial conflict in kernel/irq/manage.c: the merge of Andrew's pile had brought in commit 97fd75b7b8e0 ("kernel/irq/manage.c: use the pr_foo() infrastructure to prefix printks") which changed one of the pr_err() calls that this merge moves around.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  keys: kill task_struct->replacement_session_keyring
  keys: kill the dummy key_replace_session_keyring()
  keys: change keyctl_session_to_parent() to use task_work_add()
  genirq: reimplement exit_irq_thread() hook via task_work_add()
  task_work_add: generic process-context callbacks
  avr32: missed _TIF_NOTIFY_RESUME on one of do_notify_resume callers
  parisc: need to check NOTIFY_RESUME when exiting from syscall
  move key_repace_session_keyring() into tracehook_notify_resume()
  TIF_NOTIFY_RESUME is defined on all targets now
2012-05-31  powerpc: use clear_tasks_mm_cpumask()  Anton Vorontsov
Current CPU hotplug code has some task->mm handling issues:

1. Working with task->mm w/o getting mm or grabbing the task lock is dangerous as ->mm might disappear (exit_mm() assigns NULL under task_lock(), so tasklist lock is not enough).

   We can't use get_task_mm()/mmput() pair as mmput() might sleep, so we must take the task lock while handling its mm.

2. Checking for process->mm is not enough because process' main thread may exit or detach its mm via use_mm(), but other threads may still have a valid mm.

   To fix this we would need to use find_lock_task_mm(), which would walk up all threads and return an appropriate task (with task lock held).

clear_tasks_mm_cpumask() has all the issues fixed, so let's use it.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-30  bury __kernel_nlink_t, make internal nlink_t consistent  Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30  powerpc: get rid of nlink_t uses, switch to explicitly-sized type  Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>