summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-07-07libceph: make sure need_resend targets reflect latest mapIlya Dryomov
Otherwise we may miss events like PG splits, pool deletions, etc when we get multiple incremental maps at once. Because check_pool_dne() can now be fed an unlinked request, finish_request() needed to be taught to handle unlinked requests. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: delete from need_resend_linger before check_linger_pool_dne()Ilya Dryomov
When processing a map update consisting of multiple incrementals, we may end up running check_linger_pool_dne() on a lingering request that was previously added to need_resend_linger list. If it is concluded that the target pool doesn't exist, the request is killed off while still on need_resend_linger list, which leads to a crash on a NULL lreq->osd in kick_requests(): libceph: linger_id 18446462598732840961 pool does not exist BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: ceph_osdc_handle_map+0x4ae/0x870 Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: resend on PG splits if OSD has RESEND_ON_SPLITIlya Dryomov
Note that ceph_osd_request_target fields are updated regardless of RESEND_ON_SPLIT. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: drop need_resend from calc_target()Ilya Dryomov
Replace it with more fine-grained bools to separate updating ceph_osd_request_target fields and the decision to resend. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: MOSDOp v8 encoding (actual spgid + full hash)Ilya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: ceph_connection_operations::reencode_message() methodIlya Dryomov
Give upper layers a chance to reencode the message after the connection is negotiated and ->peer_features is set. OSD client will use this to support both luminous and pre-luminous OSDs (in a single cluster): the former need MOSDOp v8; the latter will continue to be sent MOSDOp v4. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: encode_{pgid,oloc}() helpersIlya Dryomov
Factor out encode_{pgid,oloc}() and use ceph_encode_string() for oid. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: introduce ceph_spg, ceph_pg_to_primary_shard()Ilya Dryomov
Store both raw pgid and actual spgid in ceph_osd_request_target. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: new pi->last_force_request_resendIlya Dryomov
The old (v15) pi->last_force_request_resend has been repurposed to make pre-RESEND_ON_SPLIT clients that don't check for PG splits but do obey pi->last_force_request_resend resend on splits. See ceph.git commit 189ca7ec6420 ("mon/OSDMonitor: make pre-luminous clients resend ops on split"). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: fold [l]req->last_force_resend into ceph_osd_request_targetIlya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: support SERVER_JEWEL feature bitsIlya Dryomov
Only MON_STATEFUL_SUB, really. MON_ROUTE_OSDMAP and OSDSUBOP_NO_SNAPCONTEXT are irrelevant. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: advertise support for OSD_POOLRESENDIlya Dryomov
The code has been in place since commit 63244fa123a7 ("libceph: introduce ceph_osd_request_target, calc_target()"), and, with the ceph_{oloc,oid}_copy() issue fixed in the previous commit, is now in working order. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: handle non-empty dest in ceph_{oloc,oid}_copy()Ilya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: new features macrosIlya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: remove ceph_sanitize_features() workaroundIlya Dryomov
Reflects ceph.git commit ff1959282826ae6acd7134e1b1ede74ffd1cc04a. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07ceph: update ceph_dentry_info::lease_session when necessaryYan, Zheng
Current code does not update ceph_dentry_info::lease_session once it is set. If auth mds of corresponding dentry changes, dentry lease keeps in an invalid state. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07ceph: new mount option that specifies fscache uniquifierYan, Zheng
Current ceph uses FSID as primary index key of fscache data. This allows ceph to retain cached data across remount. But this causes problem (kernel opps, fscache does not support sharing data) when a filesystem get mounted several times (with fscache enabled, with different mount options). The fix is adding a new mount option, which specifies uniquifier for fscache. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07ceph: avoid accessing freeing inode in ceph_check_delayed_caps()Yan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07ceph: avoid invalid memory dereference in the middle of umountYan, Zheng
extra_mon_dispatch() and debugfs' foo_show functions dereference fsc->mdsc. we should clean up fsc->client->extra_mon_dispatch and debugfs before destroying fsc->mds. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07ceph: getattr before read on ceph.* xattrsYan, Zheng
Previously we were returning values for quota, layout xattrs without any kind of update -- the user just got whatever happened to be in our cache. Clearly this extra round trip has a cost, but reads of these xattrs are fairly rare, happening on admin intervention rather than in normal operation. Link: http://tracker.ceph.com/issues/17939 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07ceph: don't re-send interrupted flock requestYan, Zheng
Don't re-send interrupted flock request in cases of mds failover and receiving request forward. Because corresponding 'lock intr' request may have been finished, it won't get re-sent. Link: http://tracker.ceph.com/issues/20170 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07ceph: cleanup writepage_nounlock()Yan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07ceph: redirty page when writepage_nounlock() skips unwritable pageYan, Zheng
Ceph needs to flush dirty page in the order in which in which snap context they belong to. Dirty pages belong to older snap context should be flushed earlier. if writepage_nounlock() can not flush a page, it should redirty the page. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07ceph: remove useless page->mapping check in writepage_nounlock()Yan, Zheng
Callers of writepage_nounlock() have already ensured non-null page->mapping. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07ceph: update the 'approaching max_size' codeYan, Zheng
The old 'approaching max_size' code expects MDS set max_size to '2 * reported_size'. This is no longer true. The new code reports file size when half of previous max_size increment has been used. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07ceph: re-request max size after importing capsYan, Zheng
The 'wanted max size' could be sent to inode's old auth mds, re-send it to inode's new auth mds if necessary. Otherwise write syscall may hang. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-02Linux 4.12Linus Torvalds
2017-07-02moduleparam: fix doc: hwparam_irq configures an IRQSylvain 'ythier' Hitier
Signed-off-by: Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-02Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds
Pull MIPS fixes from Ralf Baechle: "Here's a final round of fixes for 4.12: - Fix misordered instructions in assembly code making kenel startup via UHB unreliable. - Fix special case of MADDF and MADDF emulation. - Fix alignment issue in address calculation in pm-cps on 64 bit. - Fix IRQ tracing & lockdep when rescheduling - Systems with MAARs require post-DMA cache flushes. The reordering fix and the MADDF/MSUBF fix have sat in linux-next for a number of days. The others haven't propagated from my pull tree to linux-next yet but all have survived manual testing and Imagination's automated test system and there are no pending bug reports" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MIPS: Avoid accidental raw backtrace MIPS: Perform post-DMA cache flushes on systems with MAARs MIPS: Fix IRQ tracing & lockdep when rescheduling MIPS: pm-cps: Drop manual cache-line alignment of ready_count MIPS: math-emu: Handle zero accumulator case in MADDF and MSUBF separately MIPS: head: Reorder instructions missing a delay slot
2017-07-02Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM fix from Russell King: "One final fix for 4.12 - Doug found a boot failure case triggered by requesting a non-even MB vmalloc size" * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8685/1: ensure memblock-limit is pmd-aligned
2017-07-01Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Fixlets for x86: - Prevent kexec crash when KASLR is enabled, which was caused by an address calculation bug - Restore the freeing of PUDs on memory hot remove - Correct a negated pointer check in the intel uncore performance monitoring driver - Plug a memory leak in an error exit path in the RDT code" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel_rdt: Fix memory leak on mount failure x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug x86/boot/KASLR: Add checking for the offset of kernel virtual address randomization perf/x86/intel/uncore: Fix wrong box pointer check x86/mm/hotplug: Fix BUG_ON() after hot-remove by not freeing PUD
2017-07-01Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Thomas Gleixner: "The last fix for perf for this cycles: - Prevent a segfault when kernel.kptr_restrict=2 is set by avoiding a null pointer dereference" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf machine: Fix segfault for kernel.kptr_restrict=2
2017-07-01Merge tag 'pinctrl-v4.12-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pinctrl fix from Linus Walleij: "Brian noticed that this regression has not got a proper fix for the entire merge window and consequently we need to revert the offending commit. It's part of the RT-mainstream work, the dance goes like this, two steps forward, one step back. Summary: - A last fix for v4.12, an IRQ problem reported early in the merge window appears not to have been properly fixed, so the offending commit will be reverted and we will find the proper fix for v4.13. Hopefully" * tag 'pinctrl-v4.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: Revert "pinctrl: rockchip: avoid hardirq-unsafe functions in irq_chip"
2017-07-01Merge tag 'gpio-v4.12-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull last minute fixes for GPIO from Linus Walleij: - Fix another ACPI problem with broken BIOSes. - Filter out the right GPIO events, making a very user-visible bug go away. * tag 'gpio-v4.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: acpi: Skip _AEI entries without a handler rather then aborting the scan gpiolib: fix filtering out unwanted events
2017-06-30Merge tag 'trace-v4.12-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull last-minute tracing fixes from Steven Rostedt: "Two fixes: One is for a crash when using the :mod: trace probe command into stack_trace_filter. This bug was introduced during the last merge window. The other was there forever. It's a small bug that makes it impossible to name a module function for kprobes when the module starts with a digit" * tag 'trace-v4.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/kprobes: Allow to create probe with a module name starting with a digit ftrace: Fix regression with module command in stack_trace_filter
2017-06-30uapi/linux/a.out.h: don't use deprecated system-specific predefines.Zack Weinberg
uapi/linux/a.out.h uses a number of predefined macros that are deprecated because they're in the application namespace (e.g. '#ifdef linux' instead of '#ifdef __linux__'). This patch either corrects or just removes them if they are not applicable to Linux. The primary reason this is worth bothering to fix, considering how obsolete a.out binary support is, is that the GCC build process considers this such a severe error that it will copy the header into a private directory and change the macro names, which causes future updates to the header to be masked. This header probably doesn't get updated very often anymore, but it is the _only_ uapi header that gets this treatment, so IMHO it is worth patching just to drive that number all the way to zero. Signed-off-by: Zack Weinberg <zackw@panix.com> [hch: removed dead conditionals] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-30hashtable: remove repeated phrase from a commentJakub Kicinski
"in a rcu enabled hashtable" is repeated twice in a comment. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-30x86/intel_rdt: Fix memory leak on mount failureVikas Shivappa
If mount fails, the kn_info directory is not freed causing memory leak. Add the missing error handling path. Fixes: 4e978d06dedb ("x86/intel_rdt: Add "info" files to resctrl file system") Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: vikas.shivappa@intel.com Cc: andi.kleen@intel.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1498503368-20173-3-git-send-email-vikas.shivappa@linux.intel.com
2017-06-30Merge tag 'powerpc-4.12-8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Hopefully the last two powerpc fixes for 4.12. The CXL one is larger than I'd usually send at rc7, but it fixes new code this cycle, so better to have it working for the release. It was actually sent a few weeks back but got blocked in testing behind another fix that was causing issues. We are still tracking one crash in v4.12-rc7, but only one person has reproduced it and the commit identified by bisect doesn't touch any of the relevant code, so I think it's 50/50 whether that commit is actually the problem or it's some code layout / toolchain issue. Two fixes for code we merged this cycle: - cxl: Fixes for Coherent Accelerator Interface Architecture 2.0 - Avoid miscompilation w/GCC 4.6.3 on 32-bit - don't inline copy_to/from_user() Thanks to Al Viro, Larry Finger, Christophe Lombard" * tag 'powerpc-4.12-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/32: Avoid miscompilation w/GCC 4.6.3 - don't inline copy_to/from_user() cxl: Fixes for Coherent Accelerator Interface Architecture 2.0
2017-06-30Merge tag 'iommu-fixes-v4.12-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU fixes from Joerg Roedel: "Two fixes: - A fix for AMD IOMMU interrupt remapping code when IRQs are forwarded directly to KVM guests - Fixed check in the recently merged code to allow tboot with Intel VT-d disabled" * tag 'iommu-fixes-v4.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/amd: Fix interrupt remapping when disable guest_mode iommu/vt-d: Correctly disable Intel IOMMU force on
2017-06-30Merge tag 'sound-4.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Two last-minute HD-audio fixes" * tag 'sound-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Fix endless loop of codec configure ALSA: hda - set input_path bitmap to zero after moving it to new place
2017-06-30Merge branch 'overlayfs-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "Fix two bugs in copy-up code. One introduced in 4.11 and one in 4.12-rc" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: don't set origin on broken lower hardlink ovl: copy-up: don't unlock between lookup and link
2017-06-30x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bugBaoquan He
Kernel text KASLR is separated into physical address and virtual address randomization. And for virtual address randomization, we only randomiza to get an offset between 16M and KERNEL_IMAGE_SIZE. So the initial value of 'virt_addr' should be LOAD_PHYSICAL_ADDR, but not the original kernel loading address 'output'. The bug will cause kernel boot failure if kernel is loaded at a different position than the address, 16M, which is decided at compiled time. Kexec/kdump is such practical case. To fix it, just assign LOAD_PHYSICAL_ADDR to virt_addr as initial value. Tested-by: Dave Young <dyoung@redhat.com> Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 8391c73 ("x86/KASLR: Randomize virtual address separately") Link: http://lkml.kernel.org/r/1498567146-11990-3-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30x86/boot/KASLR: Add checking for the offset of kernel virtual address ↵Baoquan He
randomization For kernel text KASLR, the virtual address is confined to area of 1G, [0xffffffff80000000, 0xffffffffc0000000). For the implemenataion of virtual address randomization, we only randomize to get an offset between 16M and 1G, then add this offset to the starting address, 0xffffffff80000000. Here 16M is the offset which is decided at linking stage. So the amount of the local variable 'virt_addr' which respresents the offset plus the kernel output size can not exceed KERNEL_IMAGE_SIZE. Add a debug check for the offset. If out of bounds, print error message and hang there. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1498567146-11990-2-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-29tracing/kprobes: Allow to create probe with a module name starting with a digitSabrina Dubroca
Always try to parse an address, since kstrtoul() will safely fail when given a symbol as input. If that fails (which will be the case for a symbol), try to parse a symbol instead. This allows creating a probe such as: p:probe/vlan_gro_receive 8021q:vlan_gro_receive+0 Which is necessary for this command to work: perf probe -m 8021q -a vlan_gro_receive Link: http://lkml.kernel.org/r/fd72d666f45b114e2c5b9cf7e27b91de1ec966f1.1498122881.git.sd@queasysnail.net Cc: stable@vger.kernel.org Fixes: 413d37d1e ("tracing: Add kprobe-based event tracer") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-30MIPS: Avoid accidental raw backtraceJames Hogan
Since commit 81a76d7119f6 ("MIPS: Avoid using unwind_stack() with usermode") show_backtrace() invokes the raw backtracer when cp0_status & ST0_KSU indicates user mode to fix issues on EVA kernels where user and kernel address spaces overlap. However this is used by show_stack() which creates its own pt_regs on the stack and leaves cp0_status uninitialised in most of the code paths. This results in the non deterministic use of the raw back tracer depending on the previous stack content. show_stack() deals exclusively with kernel mode stacks anyway, so explicitly initialise regs.cp0_status to KSU_KERNEL (i.e. 0) to ensure we get a useful backtrace. Fixes: 81a76d7119f6 ("MIPS: Avoid using unwind_stack() with usermode") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Cc: <stable@vger.kernel.org> # 3.15+ Patchwork: https://patchwork.linux-mips.org/patch/16656/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-30MIPS: Perform post-DMA cache flushes on systems with MAARsPaul Burton
Recent CPUs from Imagination Technologies such as the I6400 or P6600 are able to speculatively fetch data from memory into caches. This means that if used in a system with non-coherent DMA they require that caches be invalidated after a device performs DMA, and before the CPU reads the DMA'd data, in order to ensure that stale values weren't speculatively prefetched. Such CPUs also introduced Memory Accessibility Attribute Registers (MAARs) in order to control the regions in which they are allowed to speculate. Thus we can use the presence of MAARs as a good indication that the CPU requires the above cache maintenance. Use the presence of MAARs to determine the result of cpu_needs_post_dma_flush() in the default case, in order to handle these recent CPUs correctly. Note that the return type of cpu_needs_post_dma_flush() is changed to bool, such that it's clearer what's happening when cpu_has_maar is cast to bool for the return value. If this patch were backported to a pre-v4.7 kernel then MIPS_CPU_MAAR was 1ull<<34, so when cast to an int we would incorrectly return 0. It so happens that MIPS_CPU_MAAR is currently 1ull<<30, so when truncated to an int gives a non-zero value anyway, but even so the implicit conversion from long long int to bool makes it clearer to understand what will happen than the implicit conversion from long long int to int would. The bool return type also fits this usage better semantically, so seems like an all-round win. Thanks to Ed for spotting the issue for pre-v4.7 kernels & suggesting the return type change. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Reviewed-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Tested-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Cc: Ed Blake <ed.blake@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16363/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-30MIPS: Fix IRQ tracing & lockdep when reschedulingPaul Burton
When the scheduler sets TIF_NEED_RESCHED & we call into the scheduler from arch/mips/kernel/entry.S we disable interrupts. This is true regardless of whether we reach work_resched from syscall_exit_work, resume_userspace or by looping after calling schedule(). Although we disable interrupts in these paths we don't call trace_hardirqs_off() before calling into C code which may acquire locks, and we therefore leave lockdep with an inconsistent view of whether interrupts are disabled or not when CONFIG_PROVE_LOCKING & CONFIG_DEBUG_LOCKDEP are both enabled. Without tracing this interrupt state lockdep will print warnings such as the following once a task returns from a syscall via syscall_exit_partial with TIF_NEED_RESCHED set: [ 49.927678] ------------[ cut here ]------------ [ 49.934445] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3687 check_flags.part.41+0x1dc/0x1e8 [ 49.946031] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) [ 49.946355] CPU: 0 PID: 1 Comm: init Not tainted 4.10.0-00439-gc9fd5d362289-dirty #197 [ 49.963505] Stack : 0000000000000000 ffffffff81bb5d6a 0000000000000006 ffffffff801ce9c4 [ 49.974431] 0000000000000000 0000000000000000 0000000000000000 000000000000004a [ 49.985300] ffffffff80b7e487 ffffffff80a24498 a8000000ff160000 ffffffff80ede8b8 [ 49.996194] 0000000000000001 0000000000000000 0000000000000000 0000000077c8030c [ 50.007063] 000000007fd8a510 ffffffff801cd45c 0000000000000000 a8000000ff127c88 [ 50.017945] 0000000000000000 ffffffff801cf928 0000000000000001 ffffffff80a24498 [ 50.028827] 0000000000000000 0000000000000001 0000000000000000 0000000000000000 [ 50.039688] 0000000000000000 a8000000ff127bd0 0000000000000000 ffffffff805509bc [ 50.050575] 00000000140084e0 0000000000000000 0000000000000000 0000000000040a00 [ 50.061448] 0000000000000000 ffffffff8010e1b0 0000000000000000 ffffffff805509bc [ 50.072327] ... [ 50.076087] Call Trace: [ 50.079869] [<ffffffff8010e1b0>] show_stack+0x80/0xa8 [ 50.086577] [<ffffffff805509bc>] dump_stack+0x10c/0x190 [ 50.093498] [<ffffffff8015dde0>] __warn+0xf0/0x108 [ 50.099889] [<ffffffff8015de34>] warn_slowpath_fmt+0x3c/0x48 [ 50.107241] [<ffffffff801c15b4>] check_flags.part.41+0x1dc/0x1e8 [ 50.114961] [<ffffffff801c239c>] lock_is_held_type+0x8c/0xb0 [ 50.122291] [<ffffffff809461b8>] __schedule+0x8c0/0x10f8 [ 50.129221] [<ffffffff80946a60>] schedule+0x30/0x98 [ 50.135659] [<ffffffff80106278>] work_resched+0x8/0x34 [ 50.142397] ---[ end trace 0cb4f6ef5b99fe21 ]--- [ 50.148405] possible reason: unannotated irqs-off. [ 50.154600] irq event stamp: 400463 [ 50.159566] hardirqs last enabled at (400463): [<ffffffff8094edc8>] _raw_spin_unlock_irqrestore+0x40/0xa8 [ 50.171981] hardirqs last disabled at (400462): [<ffffffff8094eb98>] _raw_spin_lock_irqsave+0x30/0xb0 [ 50.183897] softirqs last enabled at (400450): [<ffffffff8016580c>] __do_softirq+0x4ac/0x6a8 [ 50.195015] softirqs last disabled at (400425): [<ffffffff80165e78>] irq_exit+0x110/0x128 Fix this by using the TRACE_IRQS_OFF macro to call trace_hardirqs_off() when CONFIG_TRACE_IRQFLAGS is enabled. This is done before invoking schedule() following the work_resched label because: 1) Interrupts are disabled regardless of the path we take to reach work_resched() & schedule(). 2) Performing the tracing here avoids the need to do it in paths which disable interrupts but don't call out to C code before hitting a path which uses the RESTORE_SOME macro that will call trace_hardirqs_on() or trace_hardirqs_off() as appropriate. We call trace_hardirqs_on() using the TRACE_IRQS_ON macro before calling syscall_trace_leave() for similar reasons, ensuring that lockdep has a consistent view of state after we re-enable interrupts. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: linux-mips@linux-mips.org Cc: stable <stable@vger.kernel.org> Patchwork: https://patchwork.linux-mips.org/patch/15385/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-30MIPS: pm-cps: Drop manual cache-line alignment of ready_countPaul Burton
We allocate memory for a ready_count variable per-CPU, which is accessed via a cached non-coherent TLB mapping to perform synchronisation between threads within the core using LL/SC instructions. In order to ensure that the variable is contained within its own data cache line we allocate 2 lines worth of memory & align the resulting pointer to a line boundary. This is however unnecessary, since kmalloc is guaranteed to return memory which is at least cache-line aligned (see ARCH_DMA_MINALIGN). Stop the redundant manual alignment. Besides cleaning up the code & avoiding needless work, this has the side effect of avoiding an arithmetic error found by Bryan on 64 bit systems due to the 32 bit size of the former dlinesz. This led the ready_count variable to have its upper 32b cleared erroneously for MIPS64 kernels, causing problems when ready_count was later used on MIPS64 via cpuidle. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Fixes: 3179d37ee1ed ("MIPS: pm-cps: add PM state entry code for CPS systems") Reported-by: Bryan O'Donoghue <bryan.odonoghue@imgtec.com> Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@imgtec.com> Tested-by: Bryan O'Donoghue <bryan.odonoghue@imgtec.com> Cc: linux-mips@linux-mips.org Cc: stable <stable@vger.kernel.org> # v3.16+ Patchwork: https://patchwork.linux-mips.org/patch/15383/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-29ARM: 8685/1: ensure memblock-limit is pmd-alignedDoug Berger
The pmd containing memblock_limit is cleared by prepare_page_table() which creates the opportunity for early_alloc() to allocate unmapped memory if memblock_limit is not pmd aligned causing a boot-time hang. Commit 965278dcb8ab ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM") attempted to resolve this problem, but there is a path through the adjust_lowmem_bounds() routine where if all memory regions start and end on pmd-aligned addresses the memblock_limit will be set to arm_lowmem_limit. Since arm_lowmem_limit can be affected by the vmalloc early parameter, the value of arm_lowmem_limit may not be pmd-aligned. This commit corrects this oversight such that memblock_limit is always rounded down to pmd-alignment. Fixes: 965278dcb8ab ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM") Signed-off-by: Doug Berger <opendmb@gmail.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>