path: root/arch/mips/include/asm/barrier.h
Age  Commit message  Author
2019-10-07  MIPS: barrier: Make __smp_mb__before_atomic() a no-op for Loongson3  (Paul Burton)
Loongson3 systems with CONFIG_CPU_LOONGSON3_WORKAROUNDS enabled already emit a full completion barrier as part of the inline assembly containing LL/SC loops for atomic operations. As such the barrier emitted by __smp_mb__before_atomic() is redundant, and we can remove it.
Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Cc: Huacai Chen <chenhc@lemote.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: linux-kernel@vger.kernel.org
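For illustration, a rough sketch of the shape this change takes (an assumed form based on the description above, not the verbatim kernel diff): when the Loongson3 workaround is configured in, the LL/SC asm already starts with the needed completion barrier, so the before-atomic hook can emit nothing.

	/* Sketch only: assumed form, the real header may differ in detail. */
	#ifdef CONFIG_CPU_LOONGSON3_WORKAROUNDS
	/* The LL/SC asm already begins with a completion barrier, so emit nothing here. */
	# define __smp_mb__before_atomic()
	#else
	# define __smp_mb__before_atomic()	__smp_mb__before_llsc()
	#endif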
2019-10-07  MIPS: barrier: Remove loongson_llsc_mb()  (Paul Burton)
The loongson_llsc_mb() macro is no longer used - instead barriers are emitted as part of inline asm using the __SYNC() macro. Remove the now-defunct loongson_llsc_mb() macro.
Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Cc: Huacai Chen <chenhc@lemote.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: linux-kernel@vger.kernel.org
2019-10-07  MIPS: futex: Emit Loongson3 sync workarounds within asm  (Paul Burton)
Generate the sync instructions required to work around Loongson3 LL/SC errata within inline asm blocks. This feels a little safer than doing it from C where, strictly speaking, the compiler would be well within its rights to insert a memory access between the separate asm statements we previously had, containing sync & ll instructions respectively.
Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Cc: Huacai Chen <chenhc@lemote.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: linux-kernel@vger.kernel.org
2019-10-07  MIPS: barrier: Clean up sync_ginv()  (Paul Burton)
Use the new __SYNC() infrastructure to implement sync_ginv(), for consistency with much of the rest of asm/barrier.h.
Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Cc: Huacai Chen <chenhc@lemote.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: linux-kernel@vger.kernel.org
2019-10-07  MIPS: barrier: Clean up __sync() definition  (Paul Burton)
Implement __sync() using the new __SYNC() infrastructure, which will take care of not emitting an instruction for old R3k CPUs that don't support it. The only behavioral difference is that __sync() will now provide a compiler barrier on these old CPUs, but that seems like reasonable behavior anyway.
Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Cc: Huacai Chen <chenhc@lemote.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: linux-kernel@vger.kernel.org
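As a sketch of what such a definition can look like (assumed form; the real header may differ in details such as the exact asm qualifiers):

	/* Emit a full sync where supported; on R3k-era CPUs __SYNC(full, always)
	 * expands to nothing, leaving only the "memory" clobber, i.e. the
	 * compiler barrier mentioned above. */
	static inline void __sync(void)
	{
		__asm__ __volatile__(__SYNC(full, always) : : : "memory");
	}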
2019-10-07  MIPS: barrier: Remove fast_mb() Octeon #ifdef'ery  (Paul Burton)
The definition of fast_mb() is the same in both the Octeon & non-Octeon cases, so remove the duplication & define it only once.
Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Cc: Huacai Chen <chenhc@lemote.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: linux-kernel@vger.kernel.org
2019-10-07  MIPS: barrier: Clean up __smp_mb() definition  (Paul Burton)
We #ifdef on Cavium Octeon CPUs, but emit the same sync instruction in both cases. Remove the #ifdef & simply expand to the __sync() macro. Whilst here, indent the strong ordering case definitions to match the indentation of the weak ordering ones, helping readability.
Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Cc: Huacai Chen <chenhc@lemote.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: linux-kernel@vger.kernel.org
2019-10-07  MIPS: barrier: Clean up rmb() & wmb() definitions  (Paul Burton)
Simplify our definitions of rmb() & wmb() using the new __SYNC() infrastructure.

The fast_rmb() & fast_wmb() macros are removed, since they only provided a level of indirection that made the code less readable & weren't directly used anywhere in the kernel tree.

The Octeon #ifdef'ery is removed, since the "syncw" instruction previously used is merely an alias for "sync 4" which __SYNC() will emit for the wmb sync type when the kernel is configured for an Octeon CPU. Similarly __SYNC() will emit nothing for the rmb sync type in Octeon configurations.

Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Cc: Huacai Chen <chenhc@lemote.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: linux-kernel@vger.kernel.org
2019-10-07  MIPS: barrier: Add __SYNC() infrastructure  (Paul Burton)
Introduce an asm/sync.h header which provides infrastructure that can be used to generate sync instructions of various types, and for various reasons. For example if we need a sync instruction that provides a full completion barrier but only on systems which have weak memory ordering, we can generate the appropriate assembly code using:

	__SYNC(full, weak_ordering)

When the kernel is configured to run on systems with weak memory ordering (ie. CONFIG_WEAK_ORDERING is selected) we'll emit a sync instruction. When the kernel is configured to run on systems with strong memory ordering (ie. CONFIG_WEAK_ORDERING is not selected) we'll emit nothing. The caller doesn't need to know which happened - it simply says what it needs & when, with no concern for checking the kernel configuration.

There are some scenarios in which we may want to emit code only when we *didn't* emit a sync instruction. For example, some Loongson3 CPUs suffer from a bug that requires us to emit a sync instruction prior to each ll instruction (enabled by CONFIG_CPU_LOONGSON3_WORKAROUNDS). In cases where this bug workaround is enabled, it's wasteful to then have more generic code emit another sync instruction to provide barriers we need in general. A __SYNC_ELSE() macro allows for this, providing an extra argument that contains code to be assembled only in cases where the sync instruction was not emitted. For example if we have a scenario in which we generally want to emit a release barrier but for affected Loongson3 configurations upgrade that to a full completion barrier, we can do that like so:

	__SYNC_ELSE(full, loongson3_war, __SYNC(rl, always))

The assembly generated by these macros can be used either as inline assembly or in assembly source files.

Differing types of sync as provided by MIPSr6 are defined, but currently they all generate a full completion barrier except in kernels configured for Cavium Octeon systems. There the wmb sync-type is used, and rmb syncs are omitted, as has been the case since commit 6b07d38aaa52 ("MIPS: Octeon: Use optimized memory barrier primitives."). Using __SYNC() with the wmb or rmb types will abstract away the Octeon specific behavior and allow us to later clean up asm/barrier.h code that currently includes a plethora of #ifdef's.

Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Cc: Huacai Chen <chenhc@lemote.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: linux-kernel@vger.kernel.org
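To make the intended usage concrete, here is a sketch of an LL/SC loop built with __SYNC(): an illustrative atomic add, not the kernel's actual atomic.h code, with deliberately simplified operand constraints.

	#include <asm/sync.h>

	static inline void example_atomic_add(int i, int *v)
	{
		int temp;

		__asm__ __volatile__(
		/* Placing the label before __SYNC() means the retry path
		 * re-executes the workaround sync too; it expands to nothing
		 * unless CONFIG_CPU_LOONGSON3_WORKAROUNDS is enabled. */
		"1:	" __SYNC(full, loongson3_war) "	\n"
		"	ll	%0, %1			\n"	/* load-linked */
		"	addu	%0, %2			\n"
		"	sc	%0, %1			\n"	/* store-conditional */
		"	beqz	%0, 1b			\n"	/* retry if sc failed */
		: "=&r" (temp), "+m" (*v)
		: "r" (i)
		: "memory");
	}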
2019-08-31  mips/atomic: Fix smp_mb__{before,after}_atomic()  (Peter Zijlstra)
Recent probing at the Linux Kernel Memory Model uncovered a 'surprise'. Strongly ordered architectures where the atomic RmW primitive implies full memory ordering and smp_mb__{before,after}_atomic() are a simple barrier() (such as MIPS without WEAK_REORDERING_BEYOND_LLSC) fail for:

	*x = 1;
	atomic_inc(u);
	smp_mb__after_atomic();
	r0 = *y;

Because, while the atomic_inc() implies memory order, it (surprisingly) does not provide a compiler barrier. This then allows the compiler to re-order like so:

	atomic_inc(u);
	*x = 1;
	smp_mb__after_atomic();
	r0 = *y;

Which the CPU is then allowed to re-order (under TSO rules) like:

	atomic_inc(u);
	r0 = *y;
	*x = 1;

And this very much was not intended. Therefore strengthen the atomic RmW ops to include a compiler barrier.

Reported-by: Andrea Parri <andrea.parri@amarulasolutions.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul Burton <paul.burton@mips.com>
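A "memory" clobber is the standard way an inline-asm RmW doubles as a compiler barrier; the sketch below only illustrates that idea (it is not the actual kernel diff, and the constraints are simplified).

	/* barrier(): the classic compiler-only barrier - an empty asm with a
	 * "memory" clobber. It emits no instruction but stops the compiler
	 * from moving memory accesses across it. */
	#define barrier()	__asm__ __volatile__("" : : : "memory")

	/* Without "memory" in the clobber list the compiler may move an
	 * unrelated store such as "*x = 1" past the asm; with it, the LL/SC
	 * sequence also acts as a compiler barrier. */
	static inline void example_atomic_inc(int *v)
	{
		int temp;

		__asm__ __volatile__(
		"1:	ll	%0, %1		\n"
		"	addiu	%0, %0, 1	\n"
		"	sc	%0, %1		\n"
		"	beqz	%0, 1b		\n"
		: "=&r" (temp), "+m" (*v)
		:
		: "memory");		/* the compiler-barrier part */
	}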
2019-08-31  mips/atomic: Fix loongson_llsc_mb() wreckage  (Peter Zijlstra)
The comment describing the loongson_llsc_mb() reorder case doesn't make any sense whatsoever. Instruction re-ordering is not an SMP artifact, but rather a CPU-local phenomenon. Clarify the comment by explaining that these issues cause a coherence failure.

For the branch speculation case: if futex_atomic_cmpxchg_inatomic() needs one at the bne branch target, then surely the normal __cmpxchg_asm() implementation does too. We cannot rely on the barriers from cmpxchg() because cmpxchg_local() is implemented with the same macro, and branch prediction and speculation are, too, CPU local.

Fixes: e02e07e3127d ("MIPS: Loongson: Introduce and use loongson_llsc_mb()") Cc: Huacai Chen <chenhc@lemote.com> Cc: Huang Pei <huangpei@loongson.cn> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul Burton <paul.burton@mips.com>
2019-03-05  Merge tag 'mips_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux  (Linus Torvalds)
Pull MIPS updates from Paul Burton:

- Support for the MIPSr6 MemoryMapID register & Global INValidate TLB (GINVT) instructions, allowing for more efficient TLB maintenance when running on a CPU such as the I6500 that supports these.
- Enable huge page support for MIPS64r6.
- Optimize post-DMA cache sync by removing that code entirely for kernel configurations in which we know it won't be needed.
- The number of pages allocated for interrupt stacks is now calculated correctly, where before we would wastefully allocate too much memory in some configurations.
- The ath79 platform migrates to devicetree.
- The bcm47xx platform sees fixes for the Buffalo WHR-G54S board.
- The ingenic/jz4740 platform gains support for appended devicetrees.
- The cavium_octeon, lantiq, loongson32 & sgi-ip27 platforms all see cleanups as do various pieces of core architecture code.

* tag 'mips_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (66 commits)
  MIPS: lantiq: Remove separate GPHY Firmware loader
  MIPS: ingenic: Add support for appended devicetree
  MIPS: SGI-IP27: rework HUB interrupts
  MIPS: SGI-IP27: do boot CPU init later
  MIPS: SGI-IP27: do xtalk scanning later
  MIPS: SGI-IP27: use pr_info/pr_emerg and pr_cont to fix output
  MIPS: SGI-IP27: clean up bridge access and header files
  MIPS: SGI-IP27: get rid of volatile and hubreg_t
  MIPS: irq: Allocate accurate order pages for irq stack
  MIPS: dma-noncoherent: Remove bogus condition in dma_sync_phys()
  MIPS: eBPF: Remove REG_32BIT_ZERO_EX
  MIPS: eBPF: Always return sign extended 32b values
  MIPS: CM: Fix indentation
  MIPS: BCM47XX: Fix/improve Buffalo WHR-G54S support
  MIPS: OCTEON: program rx/tx-delay always from DT
  MIPS: OCTEON: delete board-specific link status
  MIPS: OCTEON: don't lie about interface type of CN3005 board
  MIPS: OCTEON: warn if deprecated link status is being used
  MIPS: OCTEON: add fixed-link nodes to in-kernel device tree
  MIPS: Delete unused flush_cache_sigtramp()
  ...
2019-02-04  MIPS: Add GINVT instruction helpers  (Paul Burton)
Add a family of ginvt_* functions making it easy to emit a GINVT instruction to globally invalidate TLB entries. We make use of the _ASM_MACRO infrastructure to support emitting the instructions even if the assembler isn't new enough to support them natively. An associated STYPE_GINV definition & sync_ginv() function are added to emit a sync instruction of type 0x14, which operates as a completion barrier for these new GINVT (and GINVI) instructions.
Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org
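A sketch of the barrier half of this (assumed shape, not verified against the header): sync_ginv() boils down to a SYNC instruction carrying the new stype value.

	#define STYPE_GINV	0x14	/* completion barrier for GINVT/GINVI, per the description above */

	/* Sketch only: illustrative helper name and assumed asm form. */
	static inline void example_sync_ginv(void)
	{
		__asm__ __volatile__("sync\t%0" : : "i" (STYPE_GINV) : "memory");
	}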
2019-02-04  MIPS: Loongson: Introduce and use loongson_llsc_mb()  (Huacai Chen)
On the Loongson-2G/2H/3A/3B there is a hardware flaw that makes ll/sc and lld/scd very weakly ordered. We should add sync instructions "before each ll/lld" and "at the branch-target between ll/sc" to work around it. Otherwise, this flaw will occasionally cause deadlock (e.g. when doing heavy load tests with LTP). Below is the explanation from the CPU designer:

"For Loongson 3 family, when a memory access instruction (load, store, or prefetch)'s executing occurs between the execution of LL and SC, the success or failure of SC is not predictable. Although programmer would not insert memory access instructions between LL and SC, the memory instructions before LL in program-order, may dynamically executed between the execution of LL/SC, so a memory fence (SYNC) is needed before LL/LLD to avoid this situation. Since Loongson-3A R2 (3A2000), we have improved our hardware design to handle this case. But we later deduce a rarely circumstance that some speculatively executed memory instructions due to branch misprediction between LL/SC still fall into the above case, so a memory fence (SYNC) at branch-target (if its target is not between LL/SC) is needed for Loongson 3A1000, 3B1500, 3A2000 and 3A3000. Our processor is continually evolving and we aim to remove all these workaround-SYNCs around LL/SC for new-come processor."

Here is an example: cpu1 and cpu2 simultaneously run atomic_add by 1 on the same atomic var. This bug causes the 'sc' run by both cpus (in atomic_add) to succeed at the same time ('sc' returns 1), so the variable is only *added by 1*, which is wrong and unacceptable (it should be added by 2).

Why is fix-loongson3-llsc disabled in the compiler? Because the compiler fix would cause problems in the kernel's __ex_table section.

This patch fixes all the cases in the kernel, but:

+ The fix at the end of futex_atomic_cmpxchg_inatomic is for the branch-target of 'bne'; there are other cases in which smp_mb__before_llsc() and smp_llsc_mb() fix the ll and the branch-target coincidentally, such as atomic_sub_if_positive/cmpxchg/xchg, just like this one.
+ Loongson 3 does not support CONFIG_EDAC_ATOMIC_SCRUB, so there is no need to touch edac.h.
+ local_ops and cmpxchg_local should not be affected by this bug since only the owner can write.
+ mips_atomic_set for syscall.c is deprecated and rarely used, so just let it go.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Huang Pei <huangpei@loongson.cn>
[paul.burton@mips.com:
  - Simplify the addition of -mno-fix-loongson3-llsc to cflags, and add a comment describing why it's there.
  - Make loongson_llsc_mb() a no-op when CONFIG_CPU_LOONGSON3_WORKAROUNDS=n, rather than a compiler memory barrier.
  - Add a comment describing the bug & how loongson_llsc_mb() helps in asm/barrier.h.]
Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: ambrosehua@gmail.com Cc: Steven J . Hill <Steven.Hill@cavium.com> Cc: linux-mips@linux-mips.org Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Cc: Li Xuefeng <lixuefeng@loongson.cn> Cc: Xu Chenghua <xuchenghua@loongson.cn>
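Reflecting the bracketed note above, the shape of the resulting macro is roughly the following (assumed form, not necessarily the verbatim kernel definition): a full SYNC when the workaround is configured in, and nothing at all otherwise.

	#ifdef CONFIG_CPU_LOONGSON3_WORKAROUNDS
	#define loongson_llsc_mb()	__asm__ __volatile__("sync" : : : "memory")
	#else
	#define loongson_llsc_mb()	do { } while (0)	/* no-op, not even a compiler barrier */
	#endif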
2016-10-04  MIPS: Barrier: Add definitions of SYNC stype values  (Matt Redfearn)
Add the definitions of sync stype 0 (global completion barrier) and sync stype 0x10 (local ordering barrier) to barrier.h for use with the sync instruction. These types are defined by the MIPS Instruction Set since R2 of the architecture and are documented in document MD00087 table 6.5.
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Reviewed-by: Paul Burton <paul.burton@imgtec.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/14222/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
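In other words, something along these lines (the macro names here are assumed for illustration, not verified against the header): the stype is simply the immediate operand of the SYNC instruction.

	#define STYPE_SYNC	0x00	/* completion barrier: earlier loads/stores complete before later ones start */
	#define STYPE_SYNC_MB	0x10	/* ordering barrier: earlier accesses are merely ordered before later ones */

	/* Usage sketch: emit a lightweight ordering barrier. */
	#define example_sync_mb()	__asm__ __volatile__("sync 0x10" : : : "memory")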
2016-01-12  mips: define __smp_xxx  (Michael S. Tsirkin)
This defines __smp_xxx barriers for mips, for use by virtualization. smp_xxx barriers are removed as they are defined correctly by asm-generic/barrier.h. Note: the only exception is smp_mb__before_llsc, which is mips-specific. We define both the __smp_mb__before_llsc variant (for use in asm/barrier.h) and smp_mb__before_llsc (for use elsewhere on this architecture).
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-01-12  mips: reuse asm-generic/barrier.h  (Michael S. Tsirkin)
On mips, dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends, smp_read_barrier_depends, smp_store_release and smp_load_acquire match the asm-generic variants exactly. Drop the local definitions and pull in asm-generic/barrier.h instead. This is in preparation for refactoring this code area.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2015-08-03  locking, arch: use WRITE_ONCE()/READ_ONCE() in smp_store_release()/smp_load_acquire()  (Andrey Konovalov)
Replace the ACCESS_ONCE() macro in smp_store_release() and smp_load_acquire() with WRITE_ONCE() and READ_ONCE() on x86, arm, arm64, ia64, metag, mips, powerpc, s390, sparc and asm-generic, since ACCESS_ONCE() does not work reliably on non-scalar types. WRITE_ONCE() and READ_ONCE() were introduced in the following commits:

	230fa253df63 ("kernel: Provide READ_ONCE and ASSIGN_ONCE")
	43239cbe79fc ("kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)")

Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Alexander Duyck <alexander.h.duyck@redhat.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@suse.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/1438528264-714-1-git-send-email-andreyknvl@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
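For reference, a sketch of the asm-generic shape after this change (simplified: the compile-time type checks are omitted, and these are the generic fallbacks rather than any MIPS-tuned versions):

	#define smp_store_release(p, v)			\
	do {						\
		smp_mb();				\
		WRITE_ONCE(*(p), (v));			\
	} while (0)

	#define smp_load_acquire(p)			\
	({						\
		typeof(*(p)) ___p1 = READ_ONCE(*(p));	\
		smp_mb();				\
		___p1;					\
	})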
2015-05-19  locking/arch: Rename set_mb() to smp_store_mb()  (Peter Zijlstra)
Since set_mb() is really about an smp_mb() - not an IO/DMA barrier like mb() - rename it to match the recent smp_load_acquire() and smp_store_release().
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19  locking/arch: Add WRITE_ONCE() to set_mb()  (Peter Zijlstra)
Since we assume set_mb() to result in a single store followed by a full memory barrier, employ WRITE_ONCE().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
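Taken together with the rename in the entry above, the generic definition ends up looking roughly like this (assumed form of the generic fallback, not an architecture-specific version):

	#define smp_store_mb(var, value)	\
	do {					\
		WRITE_ONCE(var, value);		\
		smp_mb();			\
	} while (0)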
2014-12-11  arch: Add lightweight memory barriers dma_rmb() and dma_wmb()  (Alexander Duyck)
There are a number of situations where the mandatory barriers rmb() and wmb() are used to order memory/memory operations in the device drivers and those barriers are much heavier than they actually need to be. For example in the case of PowerPC wmb() calls the heavy-weight sync instruction when for coherent memory operations all that is really needed is an lsync or eieio instruction.

This commit adds a coherent only version of the mandatory memory barriers rmb() and wmb(). In most cases this should result in the barrier being the same as the SMP barriers for the SMP case, however in some cases we use a barrier that is somewhere in between rmb() and smp_rmb(). For example on ARM the rmb barriers break down as follows:

	Barrier    Call      Explanation
	---------  --------  ----------------------------------
	rmb()      dsb()     Data synchronization barrier - system
	dma_rmb()  dmb(osh)  data memory barrier - outer sharable
	smp_rmb()  dmb(ish)  data memory barrier - inner sharable

These new barriers are not as safe as the standard rmb() and wmb(). Specifically they do not guarantee ordering between coherent and incoherent memories. The primary use case for these would be to enforce ordering of reads and writes when accessing coherent memory that is shared between the CPU and a device.

It may also be noted that there is no dma_mb(). Most architectures don't provide a good mechanism for performing a coherent only full barrier without resorting to the same mechanism used in mb(). As such there isn't much to be gained in trying to define such a function.

Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: David Miller <davem@davemloft.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
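A usage sketch in the spirit of the descriptor example in Documentation/memory-barriers.txt (the descriptor layout, flag and function names here are hypothetical, not from a real driver): dma_rmb() orders the ownership check against the payload reads, and dma_wmb() orders the descriptor writes against handing ownership back to the device.

	struct example_desc {
		u32 status;		/* DESC_DEV_OWN flag lives here */
		u32 data;
	};
	#define DESC_DEV_OWN	0x1

	static void example_process_desc(struct example_desc *desc, u32 new_data)
	{
		if (READ_ONCE(desc->status) & DESC_DEV_OWN)
			return;				/* device still owns it; don't touch */

		dma_rmb();				/* don't read the payload before the ownership check */
		/* ... consume desc->data here ... */

		desc->data = new_data;			/* refill the descriptor */
		dma_wmb();				/* flush the refill before flipping ownership */
		WRITE_ONCE(desc->status, DESC_DEV_OWN);	/* hand it back to the device */
	}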
2014-12-11  arch: Cleanup read_barrier_depends() and comments  (Alexander Duyck)
This patch is meant to clean up the handling of read_barrier_depends and smp_read_barrier_depends. In multiple spots in the kernel headers read_barrier_depends is defined as "do {} while (0)"; however, we then go into the SMP vs non-SMP sections and have the SMP version reference read_barrier_depends, and the non-SMP define it as yet another empty do/while. With this commit I went through and cleaned out the duplicate definitions and reduced the number of definitions down to 2 per header. In addition I moved the 50 line comments for the macro from the x86 and mips headers that defined it as an empty do/while to those that were actually defining the macro, alpha and blackfin.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-18  arch,mips: Convert smp_mb__*()  (Peter Zijlstra)
MIPS is interesting and has hardware variants that reorder over ll/sc as well as those that do not. Implement the 2 new barrier functions as per the old barriers.
Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-9ph49jbae3hol9v721sbc2g6@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Maciej W. Rozycki" <macro@codesourcery.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12  arch: Introduce smp_load_acquire(), smp_store_release()  (Peter Zijlstra)
A number of situations currently require the heavyweight smp_mb(), even though there is no need to order prior stores against later loads. Many architectures have much cheaper ways to handle these situations, but the Linux kernel currently has no portable way to make use of them.

This commit therefore supplies smp_load_acquire() and smp_store_release() to remedy this situation. The new smp_load_acquire() primitive orders the specified load against any subsequent reads or writes, while the new smp_store_release() primitive orders the specified store against any prior reads or writes. These primitives allow array-based circular FIFOs to be implemented without an smp_mb(), and also allow a theoretical hole in rcu_assign_pointer() to be closed at no additional expense on most architectures.

In addition, the RCU experience transitioning from explicit smp_read_barrier_depends() and smp_wmb() to rcu_dereference() and rcu_assign_pointer(), respectively, resulted in substantial improvements in readability. It therefore seems likely that replacing other explicit barriers with smp_load_acquire() and smp_store_release() will provide similar benefits. It appears that roughly half of the explicit barriers in core kernel code might be so replaced.

[Changelog by PaulMck]

Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
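A sketch of the message-passing pattern these primitives target (hypothetical variable and function names): the release store publishes the data, and the acquire load on the other CPU is guaranteed to observe it.

	static int data;
	static int ready;

	/* CPU 0 */
	static void example_publish(void)
	{
		data = 42;			/* plain store: ordered by the release below */
		smp_store_release(&ready, 1);	/* publish */
	}

	/* CPU 1 */
	static int example_consume(void)
	{
		if (smp_load_acquire(&ready))	/* acquire: later reads cannot hoist above it */
			return data;		/* guaranteed to see 42 */
		return -1;			/* not published yet */
	}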
2013-11-04  MIPS: Random whitespace clean-ups  (Maciej W. Rozycki)
Another whitespace clean-up, this removes tabs from between sentences in some comments.
Signed-off-by: Maciej W. Rozycki <macro@codesourcery.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/6103/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2013-02-01  MIPS: Whitespace cleanup.  (Ralf Baechle)
Having received another series of whitespace patches I decided to do this once and for all rather than dealing with this kind of patch trickling in forever.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-03-28  Disintegrate asm/system.h for MIPS  (David Howells)
Disintegrate asm/system.h for MIPS.
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> cc: linux-mips@linux-mips.org
2010-02-27  MIPS: Optimize spinlocks.  (David Daney)
The current locking mechanism uses an ll/sc sequence to release a spinlock. This is slower than a wmb() followed by a store to unlock. The branching forward to .subsection 2 on sc failure slows down the contended case, so we get rid of that part too. Since we are now working on naturally aligned u16 values, we can get rid of a masking operation as the LHU already does the right thing. The ANDI are reversed for better scheduling on multi-issue CPUs.

On a 12 CPU 750MHz Octeon cn5750 this patch improves ipv4 UDP packet forwarding rates from 3.58*10^6 PPS to 3.99*10^6 PPS, or about 11%.

Signed-off-by: David Daney <ddaney@caviumnetworks.com> To: linux-mips@linux-mips.org Patchwork: http://patchwork.linux-mips.org/patch/937/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2010-02-27  MIPS: Octeon: Use optimized memory barrier primitives.  (David Daney)
In order to achieve correct synchronization semantics, the Octeon port had defined CONFIG_WEAK_REORDERING_BEYOND_LLSC. This resulted in code that looks like:

	sync
	ll ...
	.
	.
	.
	sc ...
	.
	.
	sync

The second SYNC was redundant, but harmless.

Octeon has a SYNCW instruction that acts as a write-memory-barrier (due to an erratum in some parts two SYNCW are used). It is much faster than SYNC because it imposes ordering on the writes, but doesn't otherwise stall the execution pipeline. On Octeon, SYNC stalls execution until all preceding writes are committed to the coherent memory system.

Using:

	syncw;syncw
	ll
	.
	.
	.
	sc
	.
	.

has identical semantics to the first sequence, but is much faster. The SYNCW orders the writes, and the SC will not complete successfully until the write is committed to the coherent memory system. So at the end all preceding writes have been committed. Since Octeon does not do speculative reads, this functions as a full barrier.

The patch removes CONFIG_WEAK_REORDERING_BEYOND_LLSC, and substitutes SYNCW for SYNC in write-memory-barriers.

Signed-off-by: David Daney <ddaney@caviumnetworks.com> To: linux-mips@linux-mips.org Patchwork: http://patchwork.linux-mips.org/patch/850/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
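For concreteness, a sketch of what an Octeon-flavoured write barrier built this way can look like (assumed form with an illustrative name, not the verbatim kernel macro; the doubled SYNCW is the erratum workaround mentioned above):

	#define example_octeon_wmb()					\
		__asm__ __volatile__("syncw\n\tsyncw" : : : "memory")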
2010-02-27  MIPS: New macro smp_mb__before_llsc.  (David Daney)
Replace some instances of smp_llsc_mb() with a new macro smp_mb__before_llsc(). It is used before ll/sc sequences that are documented as needing write barrier semantics. The default implementation of smp_mb__before_llsc() is just smp_llsc_mb(), so there are no changes in semantics. Also simplify the definition of smp_mb(), smp_rmb(), and smp_wmb() to be just barrier() in the non-SMP case.
Signed-off-by: David Daney <ddaney@caviumnetworks.com> To: linux-mips@linux-mips.org Patchwork: http://patchwork.linux-mips.org/patch/851/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
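A placement sketch (an illustrative release-style increment with simplified constraints and a hypothetical function name, not kernel code): the macro sits immediately before the ll/sc sequence whose store must be ordered after earlier writes.

	static inline void example_release_increment(unsigned int *word)
	{
		unsigned int tmp;

		smp_mb__before_llsc();		/* prior stores are ordered before the ll/sc */
		__asm__ __volatile__(
		"1:	ll	%0, %1		\n"
		"	addiu	%0, %0, 1	\n"
		"	sc	%0, %1		\n"
		"	beqz	%0, 1b		\n"	/* retry if sc failed */
		: "=&r" (tmp), "+m" (*word)
		:
		: "memory");
	}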
2010-02-27  MIPS: Remove unused macros from barrier.h  (David Daney)
The smp_llsc_rmb() and smp_llsc_wmb() macros are not used in the tree; remove them.
Signed-off-by: David Daney <ddaney@caviumnetworks.com> To: linux-mips@linux-mips.org Patchwork: http://patchwork.linux-mips.org/patch/848/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-10-11  MIPS: Move headfiles to new location below arch/mips/include  (Ralf Baechle)
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>