summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-08-27mm: ZONE_DEVICE for "device memory"Dan Williams
While pmem is usable as a block device or via DAX mappings to userspace there are several usage scenarios that can not target pmem due to its lack of struct page coverage. In preparation for "hot plugging" pmem into the vmemmap add ZONE_DEVICE as a new zone to tag these pages separately from the ones that are subject to standard page allocations. Importantly "device memory" can be removed at will by userspace unbinding the driver of the device. Having a separate zone prevents allocation and otherwise marks these pages that are distinct from typical uniform memory. Device memory has different lifetime and performance characteristics than RAM. However, since we have run out of ZONES_SHIFT bits this functionality currently depends on sacrificing ZONE_DMA. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Jerome Glisse <j.glisse@gmail.com> [hch: various simplifications in the arch interface] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-27mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.hChristoph Hellwig
Three architectures already define these, and we'll need them genericly soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-27dax: drop size parameter to ->direct_access()Dan Williams
None of the implementations currently use it. The common bdev_direct_access() entry point handles all the size checks before calling ->direct_access(). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-27Merge branch 'pmem-api' into libnvdimm-for-nextDan Williams
2015-08-27nd_blk: change aperture mapping from WC to WBRoss Zwisler
This should result in a pretty sizeable performance gain for reads. For rough comparison I did some simple read testing using PMEM to compare reads of write combining (WC) mappings vs write-back (WB). This was done on a random lab machine. PMEM reads from a write combining mapping: # dd of=/dev/null if=/dev/pmem0 bs=4096 count=100000 100000+0 records in 100000+0 records out 409600000 bytes (410 MB) copied, 9.2855 s, 44.1 MB/s PMEM reads from a write-back mapping: # dd of=/dev/null if=/dev/pmem0 bs=4096 count=1000000 1000000+0 records in 1000000+0 records out 4096000000 bytes (4.1 GB) copied, 3.44034 s, 1.2 GB/s To be able to safely support a write-back aperture I needed to add support for the "read flush" _DSM flag, as outlined in the DSM spec: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf This flag tells the ND BLK driver that it needs to flush the cache lines associated with the aperture after the aperture is moved but before any new data is read. This ensures that any stale cache lines from the previous contents of the aperture will be discarded from the processor cache, and the new data will be read properly from the DIMM. We know that the cache lines are clean and will be discarded without any writeback because either a) the previous aperture operation was a read, and we never modified the contents of the aperture, or b) the previous aperture operation was a write and we must have written back the dirtied contents of the aperture to the DIMM before the I/O was completed. In order to add support for the "read flush" flag I needed to add a generic routine to invalidate cache lines, mmio_flush_range(). This is protected by the ARCH_HAS_MMIO_FLUSH Kconfig variable, and is currently only supported on x86. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-27nvdimm: change to use generic kvfree()yalin wang
Signed-off-by: yalin wang <yalin.wang2010@gmail.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-20pmem, dax: have direct_access use __pmem annotationRoss Zwisler
Update the annotation for the kaddr pointer returned by direct_access() so that it is a __pmem pointer. This is consistent with the PMEM driver and with how this direct_access() pointer is used in the DAX code. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-20dax: update I/O path to do proper PMEM flushingRoss Zwisler
Update the DAX I/O path so that all operations that store data (I/O writes, zeroing blocks, punching holes, etc.) properly synchronize the stores to media using the PMEM API. This ensures that the data DAX is writing is durable on media before the operation completes. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-20pmem: add copy_from_iter_pmem() and clear_pmem()Ross Zwisler
Add support for two new PMEM APIs, copy_from_iter_pmem() and clear_pmem(). copy_from_iter_pmem() is used to copy data from an iterator into a PMEM buffer. clear_pmem() zeros a PMEM memory range. Both of these new APIs must be explicitly ordered using a wmb_pmem() function call and are implemented in such a way that the wmb_pmem() will make the stores to PMEM durable. Because both APIs are unordered they can be called as needed without introducing any unwanted memory barriers. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-20pmem, x86: clean up conditional pmem includesRoss Zwisler
Prior to this change x86_64 used the pmem defines in arch/x86/include/asm/pmem.h, and UM used the default ones at the top of include/linux/pmem.h. The inclusion or exclusion in linux/pmem.h was controlled by CONFIG_ARCH_HAS_PMEM_API, but the ones in asm/pmem.h were controlled by ARCH_HAS_NOCACHE_UACCESS. Instead, control them both with CONFIG_ARCH_HAS_PMEM_API so that it's clear that they are related and we don't run into the possibility where they are both included or excluded. Also remove a bunch of stale function prototypes meant for UM in asm/pmem.h - these just conflicted with the inline defaults in linux/pmem.h and gave compile errors. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-20pmem: remove layer when calling arch_has_wmb_pmem()Ross Zwisler
Prior to this change arch_has_wmb_pmem() was only called by arch_has_pmem_api(). Both arch_has_wmb_pmem() and arch_has_pmem_api() checked to make sure that CONFIG_ARCH_HAS_PMEM_API was enabled. Instead, remove the old arch_has_wmb_pmem() wrapper to be rid of one extra layer of indirection and the redundant CONFIG_ARCH_HAS_PMEM_API check. Rename __arch_has_wmb_pmem() to arch_has_wmb_pmem() since we no longer have a wrapper, and just have arch_has_pmem_api() call the architecture specific arch_has_wmb_pmem() directly. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-20pmem, x86: move x86 PMEM API to new pmem.h headerRoss Zwisler
Move the x86 PMEM API implementation out of asm/cacheflush.h and into its own header asm/pmem.h. This will allow members of the PMEM API to be more easily identified on this and other architectures. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-19libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate optionDan Williams
We currently register a platform device for e820 type-12 memory and register a nvdimm bus beneath it. Registering the platform device triggers the device-core machinery to probe for a driver, but that search currently comes up empty. Building the nvdimm-bus registration into the e820_pmem platform device registration in this way forces libnvdimm to be built-in. Instead, convert the built-in portion of CONFIG_X86_PMEM_LEGACY to simply register a platform device and move the rest of the logic to the driver for e820_pmem, for the following reasons: 1/ Letting e820_pmem support be a module allows building and testing libnvdimm.ko changes without rebooting 2/ All the normal policy around modules can be applied to e820_pmem (unbind to disable and/or blacklisting the module from loading by default) 3/ Moving the driver to a generic location and converting it to scan "iomem_resource" rather than "e820.map" means any other architecture can take advantage of this simple nvdimm resource discovery mechanism by registering a resource named "Persistent Memory (legacy)" Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-14pmem: switch to devm_ allocationsChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> [djbw: tools/testing/nvdimm/ and memunmap_pmem support] Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-14devres: add devm_memremapChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-14libnvdimm, btt: write and validate parent_uuidVishal Verma
When a BTT is instantiated on a namespace it must validate the namespace uuid matches the 'parent_uuid' stored in the btt superblock. This property enforces that changing the namespace UUID invalidates all former BTT instances on that storage. For "IO namespaces" that don't have a label or UUID, the parent_uuid is set to zero, and this validation is skipped. For such cases, old BTTs have to be invalidated by forcing the namespace to raw mode, and overwriting the BTT info blocks. Based on a patch by Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-14libnvdimm, btt: consolidate arena validationVishal Verma
Use arena_is_valid as a common routine for checking the validity of an info block from both discover_arenas, and nd_btt_probe. As a result, don't check for validity of the BTT's UUID, and lbasize. The checksum in the BTT info block guarantees self-consistency, and when we're called from nd_btt_probe, we don't have a valid uuid or lbasize available to check against. Also cleanup to return a bool instead of an int. Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-14libnvdimm, btt: clean up internal interfacesVishal Verma
Consolidate the parameters passed to arena_is_valid into just nd_btt, and an info block to increase re-usability. Similarly, btt_arena_write_layout doesn't need to be passed a uuid, as it can be obtained from arena->nd_btt. Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-14pmem: convert to generic memremapDan Williams
Kill arch_memremap_pmem() and just let the architecture specify the flags to be passed to memremap(). Default to writethrough by default. Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-14visorbus: switch from ioremap_cache to memremapDan Williams
In preparation for deprecating ioremap_cache() convert its usage in visorbus to memremap. Cc: Benjamin Romer <benjamin.romer@unisys.com> Cc: David Kershner <david.kershner@unisys.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-14arch: introduce memremap()Dan Williams
Existing users of ioremap_cache() are mapping memory that is known in advance to not have i/o side effects. These users are forced to cast away the __iomem annotation, or otherwise neglect to fix the sparse errors thrown when dereferencing pointers to this memory. Provide memremap() as a non __iomem annotated ioremap_*() in the case when ioremap is otherwise a pointer to cacheable memory. Empirically, ioremap_<cacheable-type>() call sites are seeking memory-like semantics (e.g. speculative reads, and prefetching permitted). memremap() is a break from the ioremap implementation pattern of adding a new memremap_<type>() for each mapping type and having silent compatibility fall backs. Instead, the implementation defines flags that are passed to the central memremap() and if a mapping type is not supported by an arch memremap returns NULL. We introduce a memremap prototype as a trivial wrapper of ioremap_cache() and ioremap_wt(). Later, once all ioremap_cache() and ioremap_wt() usage has been removed from drivers we teach archs to implement arch_memremap() with the ability to strictly enforce the mapping type. Cc: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-10cleanup IORESOURCE_CACHEABLE vs ioremap()Dan Williams
Quoting Arnd: I was thinking the opposite approach and basically removing all uses of IORESOURCE_CACHEABLE from the kernel. There are only a handful of them.and we can probably replace them all with hardcoded ioremap_cached() calls in the cases they are actually useful. All existing usages of IORESOURCE_CACHEABLE call ioremap() instead of ioremap_nocache() if the resource is cacheable, however ioremap() is uncached by default. Clearly none of the existing usages care about the cacheability. Particularly devm_ioremap_resource() never worked as advertised since it always fell back to plain ioremap(). Clean this up as the new direction we want is to convert ioremap_<type>() usages to memremap(..., flags). Suggested-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-10arch, drivers: don't include <asm/io.h> directly, use <linux/io.h> insteadDan Williams
Preparation for uniform definition of ioremap, ioremap_wc, ioremap_wt, and ioremap_cache, tree-wide. Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-10mm: enhance region_is_ram() to region_intersects()Dan Williams
region_is_ram() is used to prevent the establishment of aliased mappings to physical "System RAM" with incompatible cache settings. However, it uses "-1" to indicate both "unknown" memory ranges (ranges not described by platform firmware) and "mixed" ranges (where the parameters describe a range that partially overlaps "System RAM"). Fix this up by explicitly tracking the "unknown" vs "mixed" resource cases and returning REGION_INTERSECTS, REGION_MIXED, or REGION_DISJOINT. This re-write also adds support for detecting when the requested region completely eclipses all of a resource. Note, the implementation treats overlaps between "unknown" and the requested memory type as REGION_INTERSECTS. Finally, other memory types can be passed in by name, for now the only usage "System RAM". Suggested-by: Luis R. Rodriguez <mcgrof@suse.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-31nvdimm: fix inline function return type warningRandy Dunlap
Fix multiple build warnings when CONFIG_BTT is not enabled: In file included from ../drivers/nvdimm/bus.c:29:0: ../drivers/nvdimm/nd.h:169:15: warning: return type defaults to 'int' [-Wreturn-type] static inline nd_btt_probe(struct nd_namespace_common *ndns, void *drvdata) ^ Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-nvdimm@lists.01.org Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-27nfit: Don't check _STA on NVDIMM devicesLinda Knippers
The _STA only applies to the root device, not the individual NVDIMMS, so don't check here. NVDIMM device state flags are checked elsewhere. Signed-off-by: Linda Knippers <linda.knippers@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-27libnvdimm, pmem: Change pmem physical sector size to PAGE_SIZEVishal Verma
Based on a patch: c8fa317 brd: Request from fdisk 4k alignment by Boaz Harrosh, allow fdisk to create properly aligned partitions for DAX. This will also cause mkfs.ext4 to emit a warning if using a file system block size of less than PAGE_SIZE. Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Elliott, Robert <Elliott@hp.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Acked-by: Boaz Harrosh <boaz@plexistor.com> Acked-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-27libnvdimm: Add DSM support for Address Range Scrub commandsVishal Verma
Add support for the three ARS DSM commands: - Query ARS Capabilities - Queries the firmware to check if a given range supports scrub, and if so, which type (persistent vs. volatile) - Start ARS - Starts a scrub for a given range/type - Query ARS Status - Checks status of a previously started scrub, and provides the error logs if any. The commands are described by the example DSM spec at: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf Also add these commands to the nfit_test test framework, and return canned data. Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-27libnvdimm: Update name of the ars_status_record mask fieldVishal Verma
The spec suggests that this is a simple 'length' field, not a mask. Update the name accordingly. Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-27libnvdimm, btt: sparse fixDan Williams
Fix: drivers/nvdimm/btt.c:635:29: warning: restricted __le64 degrades to integer Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-26Linux 4.2-rc4Linus Torvalds
2015-07-26Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Thomas Gleixner: "A single fix for the intel cqm perf facility to prevent IPIs from interrupt context" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/cqm: Return cached counter value from IRQ context
2015-07-26Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "This update contains: - the manual revert of the SYSCALL32 changes which caused a regression - a fix for the MPX vma handling - three fixes for the ioremap 'is ram' checks. - PAT warning fixes - a trivial fix for the size calculation of TLB tracepoints - handle old EFI structures gracefully This also contains a PAT fix from Jan plus a revert thereof. Toshi explained why the code is correct" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/pat: Revert 'Adjust default caching mode translation tables' x86/asm/entry/32: Revert 'Do not use R9 in SYSCALL32' commit x86/mm: Fix newly introduced printk format warnings mm: Fix bugs in region_is_ram() x86/mm: Remove region_is_ram() call from ioremap x86/mm: Move warning from __ioremap_check_ram() to the call site x86/mm/pat, drivers/media/ivtv: Move the PAT warning and replace WARN() with pr_warn() x86/mm/pat, drivers/infiniband/ipath: Replace WARN() with pr_warn() x86/mm/pat: Adjust default caching mode translation tables x86/fpu: Disable dependent CPU features on "noxsave" x86/mpx: Do not set ->vm_ops on MPX VMAs x86/mm: Add parenthesis for TLB tracepoint size calculation efi: Handle memory error structures produced based on old versions of standard
2015-07-26x86/mm/pat: Revert 'Adjust default caching mode translation tables'Thomas Gleixner
Toshi explains: "No, the default values need to be set to the fallback types, i.e. minimal supported mode. For WC and WT, UC is the fallback type. When PAT is disabled, pat_init() does update the tables below to enable WT per the default BIOS setup. However, when PAT is enabled, but CPU has PAT -errata, WT falls back to UC per the default values." Revert: ca1fec58bc6a 'x86/mm/pat: Adjust default caching mode translation tables' Requested-by: Toshi Kani <toshi.kani@hp.com> Cc: Jan Beulich <jbeulich@suse.de> Link: http://lkml.kernel.org/r/1437577776.3214.252.camel@hp.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-26perf/x86/intel/cqm: Return cached counter value from IRQ contextMatt Fleming
Peter reported the following potential crash which I was able to reproduce with his test program, [ 148.765788] ------------[ cut here ]------------ [ 148.765796] WARNING: CPU: 34 PID: 2840 at kernel/smp.c:417 smp_call_function_many+0xb6/0x260() [ 148.765797] Modules linked in: [ 148.765800] CPU: 34 PID: 2840 Comm: perf Not tainted 4.2.0-rc1+ #4 [ 148.765803] ffffffff81cdc398 ffff88085f105950 ffffffff818bdfd5 0000000000000007 [ 148.765805] 0000000000000000 ffff88085f105990 ffffffff810e413a 0000000000000000 [ 148.765807] ffffffff82301080 0000000000000022 ffffffff8107f640 ffffffff8107f640 [ 148.765809] Call Trace: [ 148.765810] <NMI> [<ffffffff818bdfd5>] dump_stack+0x45/0x57 [ 148.765818] [<ffffffff810e413a>] warn_slowpath_common+0x8a/0xc0 [ 148.765822] [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60 [ 148.765824] [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60 [ 148.765825] [<ffffffff810e422a>] warn_slowpath_null+0x1a/0x20 [ 148.765827] [<ffffffff811613f6>] smp_call_function_many+0xb6/0x260 [ 148.765829] [<ffffffff8107f640>] ? intel_cqm_stable+0x60/0x60 [ 148.765831] [<ffffffff81161748>] on_each_cpu_mask+0x28/0x60 [ 148.765832] [<ffffffff8107f6ef>] intel_cqm_event_count+0x7f/0xe0 [ 148.765836] [<ffffffff811cdd35>] perf_output_read+0x2a5/0x400 [ 148.765839] [<ffffffff811d2e5a>] perf_output_sample+0x31a/0x590 [ 148.765840] [<ffffffff811d333d>] ? perf_prepare_sample+0x26d/0x380 [ 148.765841] [<ffffffff811d3497>] perf_event_output+0x47/0x60 [ 148.765843] [<ffffffff811d36c5>] __perf_event_overflow+0x215/0x240 [ 148.765844] [<ffffffff811d4124>] perf_event_overflow+0x14/0x20 [ 148.765847] [<ffffffff8107e7f4>] intel_pmu_handle_irq+0x1d4/0x440 [ 148.765849] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765853] [<ffffffff81219bad>] ? vunmap_page_range+0x19d/0x2f0 [ 148.765854] [<ffffffff81219d11>] ? unmap_kernel_range_noflush+0x11/0x20 [ 148.765859] [<ffffffff814ce6fe>] ? ghes_copy_tofrom_phys+0x11e/0x2a0 [ 148.765863] [<ffffffff8109e5db>] ? native_apic_msr_write+0x2b/0x30 [ 148.765865] [<ffffffff8109e44d>] ? x2apic_send_IPI_self+0x1d/0x20 [ 148.765869] [<ffffffff81065135>] ? arch_irq_work_raise+0x35/0x40 [ 148.765872] [<ffffffff811c8d86>] ? irq_work_queue+0x66/0x80 [ 148.765875] [<ffffffff81075306>] perf_event_nmi_handler+0x26/0x40 [ 148.765877] [<ffffffff81063ed9>] nmi_handle+0x79/0x100 [ 148.765879] [<ffffffff81064422>] default_do_nmi+0x42/0x100 [ 148.765880] [<ffffffff81064563>] do_nmi+0x83/0xb0 [ 148.765884] [<ffffffff818c7c0f>] end_repeat_nmi+0x1e/0x2e [ 148.765886] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765888] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765890] [<ffffffff811d07a6>] ? __perf_event_task_sched_in+0x36/0xa0 [ 148.765891] <<EOE>> [<ffffffff8110ab66>] finish_task_switch+0x156/0x210 [ 148.765898] [<ffffffff818c1671>] __schedule+0x341/0x920 [ 148.765899] [<ffffffff818c1c87>] schedule+0x37/0x80 [ 148.765903] [<ffffffff810ae1af>] ? do_page_fault+0x2f/0x80 [ 148.765905] [<ffffffff818c1f4a>] schedule_user+0x1a/0x50 [ 148.765907] [<ffffffff818c666c>] retint_careful+0x14/0x32 [ 148.765908] ---[ end trace e33ff2be78e14901 ]--- The CQM task events are not safe to be called from within interrupt context because they require performing an IPI to read the counter value on all sockets. And performing IPIs from within IRQ context is a "no-no". Make do with the last read counter value currently event in event->count when we're invoked in this context. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vikas Shivappa <vikas.shivappa@intel.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Will Auld <will.auld@intel.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1437490509-15373-1-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-25Merge tag 'usb-4.2-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here's a few USB and PHY fixes for 4.2-rc4. Nothing major, the shortlog has the full details. All of these have been in linux-next successfully" * tag 'usb-4.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (21 commits) USB: OHCI: fix bad #define in ohci-tmio.c cdc-acm: Destroy acm_minors IDR on module exit usb-storage: Add ignore-device quirk for gm12u320 based usb mini projectors usb-storage: ignore ZTE MF 823 card reader in mode 0x1225 USB: OHCI: Fix race between ED unlink and URB submission usb: core: lpm: set lpm_capable for root hub device xhci: do not report PLC when link is in internal resume state xhci: prevent bus_suspend if SS port resuming in phase 1 xhci: report U3 when link is in resume state xhci: Calculate old endpoints correctly on device reset usb: xhci: Bugfix for NULL pointer deference in xhci_endpoint_init() function xhci: Workaround to get D3 working in Intel xHCI xhci: call BIOS workaround to enable runtime suspend on Intel Braswell usb: dwc3: Reset the transfer resource index on SET_INTERFACE usb: gadget: udc: core: Fix argument of dma_map_single for IOMMU usb: gadget: mv_udc_core: fix phy_regs I/O memory leak usb: ulpi: ulpi_init should be executed in subsys_initcall phy: berlin-usb: fix divider for BG2 phy: berlin-usb: fix divider for BG2CD phy/pxa: add HAS_IOMEM dependency ...
2015-07-25Merge tag 'tty-4.2-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver fixes from Greg KH: "Here are a number of small serial and tty fixes for reported issues. All have been in linux-next successfully" * tag 'tty-4.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: tty: vt: Fix !TASK_RUNNING diagnostic warning from paste_selection() serial: core: Fix crashes while echoing when closing m32r: Add ioreadXX/iowriteXX big-endian mmio accessors Revert "serial: imx: initialized DMA w/o HW flow enabled" sc16is7xx: fix FIFO address of secondary UART sc16is7xx: fix Kconfig dependencies serial: etraxfs-uart: Fix release etraxfs_uart_ports tty/vt: Fix the memory leak in visual_init serial: amba-pl011: Fix devm_ioremap_resource return value check n_tty: signal and flush atomically
2015-07-25Merge tag 'staging-4.2-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver fixes from Greg KH: "Here are a number of iio and staging driver fixes for reported issues for 4.2-rc4. All have been in linux-next for a while with no problems" * tag 'staging-4.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (34 commits) iio:light:stk3310: make endianness independent of host iio:light:stk3310: move device register to end of probe iio: mma8452: use iio event type IIO_EV_TYPE_MAG iio: mcp320x: Fix NULL pointer dereference iio: adc: vf610: fix the adc register read fail issue iio: mlx96014: Replace offset sign iio: magnetometer: mmc35240: fix SET/RESET sequence iio: magnetometer: mmc35240: Fix SET/RESET mask iio: magnetometer: mmc35240: Fix crash in pm suspend iio:magnetometer:bmc150_magn: output intended variable iio:magnetometer:bmc150_magn: add regmap dependency staging: vt6656: check ieee80211_bss_conf bssid not NULL staging: vt6655: check ieee80211_bss_conf bssid not NULL iio: tmp006: Check channel info on write iio: sx9500: Add missing init in sx9500_buffer_pre{en,dis}able() iio:light:ltr501: fix regmap dependency iio:light:ltr501: fix variable in ltr501_init iio: sx9500: fix bug in compensation code iio: sx9500: rework error handling of raw readings iio: magnetometer: mmc35240: fix available sampling frequencies ...
2015-07-25Merge tag 'char-misc-4.2-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some char and misc driver fixes for reported issues. One parport patch is reverted as it was incorrect, thanks to testing by the 0-day bot" * tag 'char-misc-4.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: parport: Revert "parport: fix memory leak" mei: prevent unloading mei hw modules while the device is opened. misc: mic: scif bug fix for vmalloc_to_page crash parport: fix freeing freed memory parport: fix memory leak parport: fix error handling
2015-07-25parport: Revert "parport: fix memory leak"Sudip Mukherjee
This reverts commit 23c405912b88 ("parport: fix memory leak") par_dev->state was already being removed in parport_unregister_device(). Reported-by: Ying Huang <ying.huang@intel.com> Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-07-25Merge tag 'trace-v4.2-rc2-fix3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fix from Steven Rostedt: "Back in 3.16 the ftrace code was redesigned and cleaned up to remove the double iteration list (one for registered ftrace ops, and one for registered "global" ops), to just use one list. That simplified the code but also broke the function tracing filtering on pid. This updates the code to handle the filtering again with the new logic" * tag 'trace-v4.2-rc2-fix3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Fix breakage of set_ftrace_pid
2015-07-25Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm Pull libnvdimm fix from Dan Williams: "A minor fix for the libnvdimm subsystem. This is not critical. The problem can be worked around in userspace by putting the namespace temporarily into raw mode (ndctl_namespace_set_raw_mode() from libndctl), but that is awkward for management utilities. * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm: libnvdimm: fix namespace seed creation
2015-07-25Merge tag 'md/4.2-fixes' of git://neil.brown.name/mdLinus Torvalds
Pull md fixes from Neil Brown: "Some md fixes for 4.2 Several are tagged for -stable. A few aren't because they are not very, serious or because they are in the 'experimental' cluster code" * tag 'md/4.2-fixes' of git://neil.brown.name/md: md/raid5: clear R5_NeedReplace when no longer needed. Fix read-balancing during node failure md-cluster: fix bitmap sub-offset in bitmap_read_sb md: Return error if request_module fails and returns positive value md: Skip cluster setup in case of error while reading bitmap md/raid1: fix test for 'was read error from last working device'. md: Skip cluster setup for dm-raid md: flush ->event_work before stopping array. md/raid10: always set reshape_safe when initializing reshape_position. md/raid5: avoid races when changing cache size.
2015-07-25Merge tag 'for-linus-20150724' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull MTD fixes from Brian Norris: "Two trivial updates. I meant to send these much earlier, but I've been preoccupied. - Add MAINTAINERS entry for diskonchip g3 driver - Fix an overlooked conflict in bitfield value assignments The latter update is a bit overdue, but there's no reason to wait any longer" * tag 'for-linus-20150724' of git://git.infradead.org/linux-mtd: mtd: nand: Fix NAND_USE_BOUNCE_BUFFER flag conflict MAINTAINERS: mtd: docg3: add docg3 maintainer
2015-07-25libnvdimm: fix namespace seed creationDan Williams
A new BLK namespace "seed" device is created whenever the current seed is successfully probed. However, if that namespace is assigned to a BTT it may never directly experience a successful probe as it is a subordinate device to a BTT configuration. The effect of the current code is that no new namespaces can be instantiated, after the seed namespace, to consume available BLK DPA capacity. Fix this by treating a successful BTT probe event as a successful probe event for the backing namespace. Reported-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-24Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Four smaller fixes for the current series. This contains: - A fix for clones of discard bio's, that can cause data corruption. From Martin. - A fix for null_blk, where in certain queue modes it could access a request after it had been freed. From Mike Krinkin. - An error handling leak fix for blkcg, from Tejun. - Also from Tejun, export of the functions that a file system needs to implement cgroup writeback support" * 'for-linus' of git://git.kernel.dk/linux-block: block: Do a full clone when splitting discard bios block: export bio_associate_*() and wbc_account_io() blkcg: fix gendisk reference leak in blkg_conf_prep() null_blk: fix use-after-free problem
2015-07-24Merge branch 'for-4.2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata fixes from Tejun Heo: "A couple important fixes. - A block layer change which removed restriction on max transfer size led to silent data corruption on some devices. A new quirk is added to restore the old size limit for the reported device. If it gets reported on more devices, we might have to consider restoring the restriction for ATA devices by default. - There finally is a SSD which is confirmed to cause data corruption on TRIM regardless of which flavor is used. A new quirk is added and the device is blacklisted - Other device-specific workarounds" * 'for-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: libata: Do not blacklist M510DC libata: increase the timeout when setting transfer mode libata: add ATA_HORKAGE_MAX_SEC_1024 to revert back to previous max_sectors limit libata: force disable trim for SuperSSpeed S238 libata: add ATA_HORKAGE_NOTRIM libata: add ATA_HORKAGE_BROKEN_FPDMA_AA quirk for HP 250GB SATA disk VB0250EAVER ata: pmp: add quirk for Marvell 4140 SATA PMP
2015-07-24Merge tag 'mmc-4.2-rc3' of git://git.linaro.org/people/ulf.hansson/mmcLinus Torvalds
Pull MMC fixes from Ulf Hansson: "Here are some mmc fixes intended for v4.2 rc4. Note, most of the changes are for the sdhci-esdhc-imx controller, which also required us to modify some related DTS files. Those changes have been acked by the SoC maintainer. MMC core: - Fix a reference inbalance issue for power_ro_lock_show() sysfs handler MMC host: - omap_hsmmc: Fix IRQ errorhandling for CD, DTO, and CRC - sdhci: Prevent a kernel panic while using DMA - mtk-sd: Let it depend on HAS_DMA to prevent build errors - sdhci-esdhc: Make 8BIT bus work - sdhci-esdhc-imx: Fix some regressions for DT based platforms - sdhci-pxav3: Fix a regression for DT based platforms" * tag 'mmc-4.2-rc3' of git://git.linaro.org/people/ulf.hansson/mmc: mmc: sdhci-pxav3: fix platform_data is not initialized dts: mmc: fsl-imx-esdhc: remove fsl,cd-controller support mmc: sdhci-esdhc-imx: clear f_max in boarddata mmc: sdhci-esdhc-imx: remove duplicated dts parsing mmc: sdhci: make max-frequency property in device tree work mmc: sdhci-esdhc-imx: move all non dt probe code into one function mmc: sdhci-esdhc-imx: fix cd regression for dt platform dts: imx7: fix sd card gpio polarity specified in device tree dts: imx25: fix sd card gpio polarity specified in device tree dts: imx6: fix sd card gpio polarity specified in device tree dts: imx53: fix sd card gpio polarity specified in device tree dts: imx51: fix sd card gpio polarity specified in device tree mmc: sdhci-esdhc: Make 8BIT bus work mmc: block: Add missing mmc_blk_put() in power_ro_lock_show() mmc: MMC_MTK should depend on HAS_DMA mmc: sdhci check parameters before call dma_free_coherent mmc: omap_hsmmc: Handle BADA, DEB and CEB interrupts mmc: omap_hsmmc: Fix DTO and DCRC handling
2015-07-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "A fix for the warnings/oops when handling HID devices with "unnamed" LEDs and couple of other driver fixups"" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: goodix - fix touch coordinates on WinBook TW100 and TW700 Input: LEDs - skip unnamed LEDs Input: usbtouchscreen - avoid unresponsive TSC-30 touch screen Input: elantech - force resolution of 31 u/mm Input: zforce - don't overwrite the stack
2015-07-24Merge tag 'regulator-fix-v4.2-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "As well as some driver specific fixes there's several fixes here for the core support for regulators supplying other regulators fixing both an issue with ACPI support (which had never been tested before) and some error handling and device removal issues that Javier noticed" * tag 'regulator-fix-v4.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: core: Fix memory leak in regulator_resolve_supply() regulator: core: Increase refcount for regulator supply's module regulator: core: Handle full constraints systems when resolving supplies regulator: 88pm800: fix LDO vsel_mask value regulator: max8973: Fix up control flag option for bias control regulator: s2mps11: Fix GPIO suspend enable shift wrapping bug