summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-05-20af_unix: fix hard linked sockets on overlayMiklos Szeredi
Overlayfs uses separate inodes even in the case of hard links on the underlying filesystems. This is a problem for AF_UNIX socket implementation which indexes sockets based on the inode. This resulted in hard linked sockets not working. The fix is to use the real, underlying inode. Test case follows: -- ovl-sock-test.c -- #include <unistd.h> #include <err.h> #include <sys/socket.h> #include <sys/un.h> #define SOCK "test-sock" #define SOCK2 "test-sock2" int main(void) { int fd, fd2; struct sockaddr_un addr = { .sun_family = AF_UNIX, .sun_path = SOCK, }; struct sockaddr_un addr2 = { .sun_family = AF_UNIX, .sun_path = SOCK2, }; unlink(SOCK); unlink(SOCK2); if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) err(1, "socket"); if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1) err(1, "bind"); if (listen(fd, 0) == -1) err(1, "listen"); if (link(SOCK, SOCK2) == -1) err(1, "link"); if ((fd2 = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) err(1, "socket"); if (connect(fd2, (struct sockaddr *) &addr2, sizeof(addr2)) == -1) err (1, "connect"); return 0; } ---- Reported-by: Alexander Morozov <alexandr.morozov@docker.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
2016-05-20vfs: add d_real_inode() helperMiklos Szeredi
Needed by the following fix. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
2016-05-15Linux 4.6Linus Torvalds
2016-05-15Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "Just the missing compat entry for the new pread/writev2" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Use compat version for preadv2 and pwritev2
2016-05-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix mvneta/bm dependencies, from Arnd Bergmann. 2) RX completion hw bug workaround in bnxt_en, from Michael Chan. 3) Kernel pointer leak in nf_conntrack, from Linus. 4) Hoplimit route attribute limits not enforced properly, from Paolo Abeni. 5) qlcnic driver NULL deref fix from Dan Carpenter. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: arm64: bpf: jit JMP_JSET_{X,K} net/route: enforce hoplimit max value nf_conntrack: avoid kernel pointer value leak in slab name drivers: net: xgene: fix register offset drivers: net: xgene: fix statistics counters race condition drivers: net: xgene: fix ununiform latency across queues drivers: net: xgene: fix sharing of irqs drivers: net: xgene: fix IPv4 forward crash xen-netback: fix extra_info handling in xenvif_tx_err() net: mvneta: bm: fix dependencies again bnxt_en: Add workaround to detect bad opaque in rx completion (part 2) bnxt_en: Add workaround to detect bad opaque in rx completion (part 1) qlcnic: potential NULL dereference in qlcnic_83xx_get_minidump_template()
2016-05-14arm64: bpf: jit JMP_JSET_{X,K}Zi Shen Lim
Original implementation commit e54bcde3d69d ("arm64: eBPF JIT compiler") had the relevant code paths, but due to an oversight always fail jiting. As a result, we had been falling back to BPF interpreter whenever a BPF program has JMP_JSET_{X,K} instructions. With this fix, we confirm that the corresponding tests in lib/test_bpf continue to pass, and also jited. ... [ 2.784553] test_bpf: #30 JSET jited:1 188 192 197 PASS [ 2.791373] test_bpf: #31 tcpdump port 22 jited:1 325 677 625 PASS [ 2.808800] test_bpf: #32 tcpdump complex jited:1 323 731 991 PASS ... [ 3.190759] test_bpf: #237 JMP_JSET_K: if (0x3 & 0x2) return 1 jited:1 110 PASS [ 3.192524] test_bpf: #238 JMP_JSET_K: if (0x3 & 0xffffffff) return 1 jited:1 98 PASS [ 3.211014] test_bpf: #249 JMP_JSET_X: if (0x3 & 0x2) return 1 jited:1 120 PASS [ 3.212973] test_bpf: #250 JMP_JSET_X: if (0x3 & 0xffffffff) return 1 jited:1 89 PASS ... Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler") Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Yang Shi <yang.shi@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-14net/route: enforce hoplimit max valuePaolo Abeni
Currently, when creating or updating a route, no check is performed in both ipv4 and ipv6 code to the hoplimit value. The caller can i.e. set hoplimit to 256, and when such route will be used, packets will be sent with hoplimit/ttl equal to 0. This commit adds checks for the RTAX_HOPLIMIT value, in both ipv4 ipv6 route code, substituting any value greater than 255 with 255. This is consistent with what is currently done for ADVMSS and MTU in the ipv4 code. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-14nf_conntrack: avoid kernel pointer value leak in slab nameLinus Torvalds
The slab name ends up being visible in the directory structure under /sys, and even if you don't have access rights to the file you can see the filenames. Just use a 64-bit counter instead of the pointer to the 'net' structure to generate a unique name. This code will go away in 4.7 when the conntrack code moves to a single kmemcache, but this is the backportable simple solution to avoiding leaking kernel pointers to user space. Fixes: 5b3501faa874 ("netfilter: nf_conntrack: per netns nf_conntrack_cachep") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Overlayfs fixes from Miklos, assorted fixes from me. Stable fodder of varying severity, all sat in -next for a while" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ovl: ignore permissions on underlying lookup vfs: add lookup_hash() helper vfs: rename: check backing inode being equal vfs: add vfs_select_inode() helper get_rock_ridge_filename(): handle malformed NM entries ecryptfs: fix handling of directory opening atomic_open(): fix the handling of create_error fix the copy vs. map logics in blk_rq_map_user_iov() do_splice_to(): cap the size before passing to ->splice_read()
2016-05-13Merge branch 'xgene-fixes'David S. Miller
Iyappan Subramanian says: ==================== drivers: net: xgene: Bug fixes This patch set addresses the following bug fixes that were found during testing. 1. IPv4 forward test crash - drivers: net: xgene: fix IPv4 forward crash 2. Sharing of irqs - drivers: net: xgene: fix sharing of irqs 3. Ununiform latency across queues - drivers: net: xgene: fix ununiform latency across queues 4. Fix statistics counters race condition - drivers: net: xgene: fix statistics counters race condition 5. Correcting register offset and field lengths - drivers: net: xgene: fix register offset v2: Address review comments from v1 - Defer TSO fix, and reposting all other patches from v1 v1: - Initial version ==================== Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13drivers: net: xgene: fix register offsetIyappan Subramanian
This patch fixes SG_RX_DV_GATE_REG_0_ADDR register offset and ring state field lengths. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Tested-by: Toan Le <toanle@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13drivers: net: xgene: fix statistics counters race conditionIyappan Subramanian
This patch fixes the race condition on updating the statistics counters by moving the counters to the ring structure. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Tested-by: Toan Le <toanle@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13drivers: net: xgene: fix ununiform latency across queuesIyappan Subramanian
This patch addresses ununiform latency across queues by adding more queues to match with, upto number of CPU cores. Also, number of interrupts are increased and the channel numbers are reordered. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Tested-by: Toan Le <toanle@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13drivers: net: xgene: fix sharing of irqsIyappan Subramanian
Since hardware doesn't allow sharing of interrupts, this patch fixes the same by removing IRQF_SHARED flag. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Tested-by: Toan Le <toanle@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13drivers: net: xgene: fix IPv4 forward crashIyappan Subramanian
This patch fixes the crash observed during IPv4 forward test by setting the drop field in the dbptr. Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Tested-by: Toan Le <toanle@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13Merge branch 'for-4.6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "During v4.6-rc1 cgroup namespace support was merged. There is an issue where it's impossible to tell whether a given cgroup mount point is bind mounted or namespaced. Serge has been working on the issue but it took longer than expected to resolve, so the late pull request. Given that it's a completely new feature and the patches don't touch anything else, the risk seems acceptable. However, if this is too late, an alternative is plugging new cgroup ns creation for v4.6 and retrying for v4.7" * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fix compile warning kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry call cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces kernfs_path_from_node_locked: don't overwrite nlen
2016-05-13Merge branch 'for-4.6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: "CPU hotplug callbacks can invoke DOWN_FAILED w/o preceding DOWN_PREPARE which can trigger a WARN_ON() in workqueue. The bug has been there for a very long time. It only triggers if CPU down fails at a specific point and I don't think it has adverse effects other than the warning messages. The fix is very low impact" * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix rebind bound workers warning
2016-05-13Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "This is a revert to fix an interactivity problem. The proper fixes for the problems that the reverted commit exposed are now in sched/core (consisting of 3 patches), but were too risky for v4.6 and will arrive in the v4.7 merge window" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "sched/fair: Fix fairness issue on migration"
2016-05-13Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "An uncharacteristically large number of bugs popped up in the last week: - various tooling fixes, two crashes and build problems - two Intel PT fixes - an KNL uncore driver fix - an Intel PMU driver fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf stat: Fallback to user only counters when perf_event_paranoid > 1 perf evsel: Handle EACCESS + perf_event_paranoid=2 in fallback() perf evsel: Improve EPERM error handling in open_strerror() tools lib traceevent: Do not reassign parg after collapse_tree() perf probe: Check if dwarf_getlocations() is available perf dwarf: Guard !x86_64 definitions under #ifdef else clause perf tools: Use readdir() instead of deprecated readdir_r() perf thread_map: Use readdir() instead of deprecated readdir_r() perf script: Use readdir() instead of deprecated readdir_r() perf tools: Use readdir() instead of deprecated readdir_r() perf/core: Disable the event on a truncated AUX record perf/x86/intel/pt: Generate PMI in the STOP region as well perf/x86: Fix undefined shift on 32-bit kernels perf/x86/msr: Fix SMI overflow perf/x86/intel/uncore: Fix CHA registers configuration procedure for Knights Landing platform perf diff: Fix duplicated output column
2016-05-13Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "Three more bug fixes for ARM SoCs this week: - The Atmel sama5d2 was registering the wrong NFC device type - On Atmel sam9x5, the power management controller had an incorrect register area size - On ARM64 Allwinner machine was not secting the generic irqchip code, causing build errors in some configurations" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: dts: at91: sam9x5: Fix the memory range assigned to the PMC arm64/sunxi: 4.6-rc1: Add dependency on generic irq chip ARM: dts: at91: sama5d2: use "atmel,sama5d3-nfc" compatible for nfc
2016-05-13Merge tag 'regulator-fix-v4.6-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "A small collection of driver specific fixes for the regulator subsysetem: - Fix handling of probe deferral for GPIO regulators - Fix a typo in the module alias for DA9053 - Fix the definition of BUCK9 in the S2MPS11 driver. This change looks larger than it is because an irregularity in the hardware means that the macro used to define bucks 6-10 needs duplicating and tweaking to have a separate macro for 9 - Fix a series of errors in the definitions of the LDOs the AXP20x regulators, some of which had always been present and some of which were introduced in the merge window" * tag 'regulator-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: da9063: Correct module alias prefix to fix module autoloading regulator: axp20x: Fix axp22x ldo_io registration error on cold boot regulator: axp20x: Fix axp22x ldo_io voltage ranges regulator: axp20x: Fix LDO4 linear voltage range regulator: s2mps11: Fix invalid selector mask and voltages for buck9 regulator: gpio: check return value of of_get_named_gpio
2016-05-13Merge tag 'regmap-fix-v4.6-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fixes from Mark Brown: "This is rather too late so it'd be completely understandable if you don't want to pull it at this point, I had thought I'd sent this earlier but it seems I didn't. Everything has been in -next for some time now. The main set of fixes here are mopping up some more issues with MMIO, fixing handling of endianness configuration in DT (which just wasn't working at all) and cases where the register and value endianness are different. There is also a fix for bulk register reads on SPMI" * tag 'regmap-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: spmi: Fix regmap_spmi_ext_read in multi-byte case regmap: mmio: Explicitly say little endian is the defualt in the bus config regmap: mmio: Parse endianness definitions from DT regmap: Fix implicit inclusion of device.h regmap: mmio: Fix value endianness selection regmap: fix documentation to match code
2016-05-13Merge tag 'media/v4.6-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fix from Mauro Carvalho Chehab: "A revert fixing a breakage that caused an OOPS on all VB2-based DVB drivers. We already have a proper fix, but it sounds safer to keep it being tested for a while and not hurry, to avoid the risk of another regression, specially since this is meant to be c/c to stable. So, for now, let's just revert the broken patch" * tag 'media/v4.6-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: Revert "[media] videobuf2-v4l2: Verify planes array in buffer dequeueing"
2016-05-13Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm fixes from Dave Airlie: "A bunch of radeon displayport mode setting fixes, and some misc i915 fixes. There is one revert, the MST audio code in i915 was causing some oopses, so we've decided just to drop it until next kernel when we can fix it properly" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/amdgpu: fix DP mode validation drm/radeon: fix DP mode validation drm/i915: Bail out of pipe config compute loop on LPT drm/radeon: fix PLL sharing on DCE6.1 (v2) drm/radeon: fix DP link training issue with second 4K monitor Revert "drm/i915: start adding dp mst audio" drm/i915/bdw: Add missing delay during L3 SQC credit programming drm/i915/lvds: separate border enable readout from panel fitter drm/i915: Update CDCLK_FREQ register on BDW after changing cdclk frequency
2016-05-13Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This fixes a bug in the RSA self-test that may cause crashes on some architectures such as SPARC" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: testmgr - Use kmalloc memory for RSA input
2016-05-13Merge remote-tracking branches 'regulator/fix/axp20x', ↵Mark Brown
'regulator/fix/da9063', 'regulator/fix/gpio' and 'regulator/fix/s2mps11' into regulator-linus
2016-05-13Merge remote-tracking branches 'regmap/fix/be', 'regmap/fix/doc' and ↵Mark Brown
'regmap/fix/spmi' into regmap-linus
2016-05-13Merge remote-tracking branch 'regmap/fix/mmio' into regmap-linusMark Brown
2016-05-13Merge branch 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-fixes DP mode validation regression fix. * 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux: drm/amdgpu: fix DP mode validation drm/radeon: fix DP mode validation
2016-05-13xen-netback: fix extra_info handling in xenvif_tx_err()Paul Durrant
Patch 562abd39 "xen-netback: support multiple extra info fragments passed from frontend" contained a mistake which can result in an in- correct number of responses being generated when handling errors encountered when processing packets containing extra info fragments. This patch fixes the problem. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reported-by: Jan Beulich <JBeulich@suse.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13Merge tag 'perf-urgent-for-mingo-20160512' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: - Fallback to usermode-only counters when perf_event_paranoid > 1, which is the case now (Arnaldo Carvalho de Melo) - Do not reassign parg after collapse_tree() in libtraceevent, which may cause tool crashes (Steven Rostedt) - Fix the build on Fedora Rawhide, where readdir_r() is deprecated and also wrt -Werror=unused-const-variable= + x86_32_regoffset_table on !x86_64 (Arnaldo Carvalho de Melo) - Fix the build on Ubuntu 12.04.5, where dwarf_getlocations() isn't available, i.e. libdw-dev < 0.157 (Arnaldo Carvalho de Melo) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge fixes from Andrew Morton: "4 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: thp: calculate the mapcount correctly for THP pages during WP faults ksm: fix conflict between mmput and scan_get_next_rmap_item ocfs2: fix posix_acl_create deadlock ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang
2016-05-12mm: thp: calculate the mapcount correctly for THP pages during WP faultsAndrea Arcangeli
This will provide fully accuracy to the mapcount calculation in the write protect faults, so page pinning will not get broken by false positive copy-on-writes. total_mapcount() isn't the right calculation needed in reuse_swap_page(), so this introduces a page_trans_huge_mapcount() that is effectively the full accurate return value for page_mapcount() if dealing with Transparent Hugepages, however we only use the page_trans_huge_mapcount() during COW faults where it strictly needed, due to its higher runtime cost. This also provide at practical zero cost the total_mapcount information which is needed to know if we can still relocate the page anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we can reuse the page no matter if it's a pte or a pmd_trans_huge triggering the fault, but we can only relocate the page anon_vma to the local vma->anon_vma if we're sure it's only this "vma" mapping the whole THP physical range. Kirill A. Shutemov discovered the problem with moving the page anon_vma to the local vma->anon_vma in a previous version of this patch and another problem in the way page_move_anon_rmap() was called. Andrew Morton discovered that CONFIG_SWAP=n wouldn't build in a previous version, because reuse_swap_page must be a macro to call page_trans_huge_mapcount from swap.h, so this uses a macro again instead of an inline function. With this change at least it's a less dangerous usage than it was before, because "page" is used only once now, while with the previous code reuse_swap_page(page++) would have called page_mapcount on page+1 and it would have increased page twice instead of just once. Dean Luick noticed an uninitialized variable that could result in a rmap inefficiency for the non-THP case in a previous version. Mike Marciniszyn said: : Our RDMA tests are seeing an issue with memory locking that bisects to : commit 61f5d698cc97 ("mm: re-enable THP") : : The test program registers two rather large MRs (512M) and RDMA : writes data to a passive peer using the first and RDMA reads it back : into the second MR and compares that data. The sizes are chosen randomly : between 0 and 1024 bytes. : : The test will get through a few (<= 4 iterations) and then gets a : compare error. : : Tracing indicates the kernel logical addresses associated with the individual : pages at registration ARE correct , the data in the "RDMA read response only" : packets ARE correct. : : The "corruption" occurs when the packet crosse two pages that are not physically : contiguous. The second page reads back as zero in the program. : : It looks like the user VA at the point of the compare error no longer points to : the same physical address as was registered. : : This patch totally resolves the issue! Link: http://lkml.kernel.org/r/1462547040-1737-2-git-send-email-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name> Reviewed-by: Dean Luick <dean.luick@intel.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Tested-by: Josh Collier <josh.d.collier@intel.com> Cc: Marc Haber <mh+linux-kernel@zugschlus.de> Cc: <stable@vger.kernel.org> [4.5] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-12ksm: fix conflict between mmput and scan_get_next_rmap_itemZhou Chengming
A concurrency issue about KSM in the function scan_get_next_rmap_item. task A (ksmd): |task B (the mm's task): | mm = slot->mm; | down_read(&mm->mmap_sem); | | ... | | spin_lock(&ksm_mmlist_lock); | | ksm_scan.mm_slot go to the next slot; | | spin_unlock(&ksm_mmlist_lock); | |mmput() -> | ksm_exit(): | |spin_lock(&ksm_mmlist_lock); |if (mm_slot && ksm_scan.mm_slot != mm_slot) { | if (!mm_slot->rmap_list) { | easy_to_free = 1; | ... | |if (easy_to_free) { | mmdrop(mm); | ... | |So this mm_struct may be freed in the mmput(). | up_read(&mm->mmap_sem); | As we can see above, the ksmd thread may access a mm_struct that already been freed to the kmem_cache. Suppose a fork will get this mm_struct from the kmem_cache, the ksmd thread then call up_read(&mm->mmap_sem), will cause mmap_sem.count to become -1. As suggested by Andrea Arcangeli, unmerge_and_remove_all_rmap_items has the same SMP race condition, so fix it too. My prev fix in function scan_get_next_rmap_item will introduce a different SMP race condition, so just invert the up_read/spin_unlock order as Andrea Arcangeli said. Link: http://lkml.kernel.org/r/1462708815-31301-1-git-send-email-zhouchengming1@huawei.com Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com> Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Geliang Tang <geliangtang@163.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: Li Bin <huawei.libin@huawei.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-12ocfs2: fix posix_acl_create deadlockJunxiao Bi
Commit 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure") refactored code to use posix_acl_create. The problem with this function is that it is not mindful of the cluster wide inode lock making it unsuitable for use with ocfs2 inode creation with ACLs. For example, when used in ocfs2_mknod, this function can cause deadlock as follows. The parent dir inode lock is taken when calling posix_acl_create -> get_acl -> ocfs2_iop_get_acl which takes the inode lock again. This can cause deadlock if there is a blocked remote lock request waiting for the lock to be downconverted. And same deadlock happened in ocfs2_reflink. This fix is to revert back using ocfs2_init_acl. Fixes: 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure") Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-12ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hangJunxiao Bi
Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") introduced this issue. ocfs2_setattr called by chmod command holds cluster wide inode lock when calling posix_acl_chmod. This latter function in turn calls ocfs2_iop_get_acl and ocfs2_iop_set_acl. These two are also called directly from vfs layer for getfacl/setfacl commands and therefore acquire the cluster wide inode lock. If a remote conversion request comes after the first inode lock in ocfs2_setattr, OCFS2_LOCK_BLOCKED will be set. And this will cause the second call to inode lock from the ocfs2_iop_get_acl() to block indefinetly. The deleted version of ocfs2_acl_chmod() calls __posix_acl_chmod() which does not call back into the filesystem. Therefore, we restore ocfs2_acl_chmod(), modify it slightly for locking as needed, and use that instead. Fixes: 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-12net: mvneta: bm: fix dependencies againArnd Bergmann
I tried to fix this before, but my previous fix was incomplete and we can still get the same link error in randconfig builds because of the way that Kconfig treats the default y if MVNETA=y && MVNETA_BM_ENABLE line that does not actually trigger when MVNETA_BM_ENABLE=m, unlike I intended. Changing the line to use MVNETA_BM_ENABLE!=n however has the desired effect and hopefully makes all configurations work as expected. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 019ded3aa7c9 ("net: mvneta: bm: clarify dependencies") Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-12Merge tag 'keys-fixes-20160512' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull keyring fix from David Howells: "Fix ASN.1 indefinite length object parsing" * tag 'keys-fixes-20160512' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: KEYS: Fix ASN.1 indefinite length object parsing
2016-05-12Merge tag 'sound-4.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "This is a pretty boring pull request as you wish: including a few small and trivial HD-audio and USB-audio quirks and a couple of small regression fixes in HD-audio" * tag 'sound-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: usb-audio: Yet another Phoneix Audio device quirk ALSA: hda - Fix regression on ATI HDMI audio ALSA: hda - Fix subwoofer pin on ASUS N751 and N551 ALSA: hda - Fix broken reconfig ALSA: hda - Fix white noise on Asus UX501VW headset ALSA: usb-audio: Quirk for yet another Phoenix Audio devices (v2)
2016-05-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input subsystem fixes from Dmitry Torokhov. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: twl6040-vibra - fix DT node memory management Input: max8997-haptic - fix NULL pointer dereference Input: byd - update copyright header
2016-05-12perf stat: Fallback to user only counters when perf_event_paranoid > 1Arnaldo Carvalho de Melo
After 0161028b7c8a ("perf/core: Change the default paranoia level to 2") 'perf stat' fails for users without CAP_SYS_ADMIN, so just use 'perf_evsel__fallback()' to have the same behaviour as 'perf record', i.e. set perf_event_attr.exclude_kernel to 1. Now: [acme@jouet linux]$ perf stat usleep 1 Performance counter stats for 'usleep 1': 0.352536 task-clock:u (msec) # 0.423 CPUs utilized 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 49 page-faults:u # 0.139 M/sec 309,407 cycles:u # 0.878 GHz 243,791 instructions:u # 0.79 insn per cycle 49,622 branches:u # 140.757 M/sec 3,884 branch-misses:u # 7.83% of all branches 0.000834174 seconds time elapsed [acme@jouet linux]$ Reported-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-b20jmx4dxt5hpaa9t2rroi0o@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12perf evsel: Handle EACCESS + perf_event_paranoid=2 in fallback()Arnaldo Carvalho de Melo
Now with the default for the kernel.perf_event_paranoid sysctl being 2 [1] we need to fall back to :u, i.e. to set perf_event_attr.exclude_kernel to 1. Before: [acme@jouet linux]$ perf record usleep 1 Error: You may not have permission to collect stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN [acme@jouet linux]$ After: [acme@jouet linux]$ perf record usleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.016 MB perf.data (7 samples) ] [acme@jouet linux]$ perf evlist cycles:u [acme@jouet linux]$ perf evlist -v cycles:u: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 [acme@jouet linux]$ And if the user turns on verbose mode, an explanation will appear: [acme@jouet linux]$ perf record -v usleep 1 Warning: kernel.perf_event_paranoid=2, trying to fall back to excluding kernel samples mmap size 528384B [ perf record: Woken up 1 times to write data ] Looking at the vmlinux_path (8 entries long) Using /lib/modules/4.6.0-rc7+/build/vmlinux for symbols [ perf record: Captured and wrote 0.016 MB perf.data (7 samples) ] [acme@jouet linux]$ [1] 0161028b7c8a ("perf/core: Change the default paranoia level to 2") Reported-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-b20jmx4dxt5hpaa9t2rroi0o@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12drm/amdgpu: fix DP mode validationAlex Deucher
Switch the order of the loops to walk the rates on the top so we exhaust all DP 1.1 rate/lane combinations before trying DP 1.2 rate/lane combos. This avoids selecting rates that are supported by the monitor, but not the connector leading to valid modes getting rejected. bug: https://bugs.freedesktop.org/show_bug.cgi?id=95206 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2016-05-12drm/radeon: fix DP mode validationAlex Deucher
Switch the order of the loops to walk the rates on the top so we exhaust all DP 1.1 rate/lane combinations before trying DP 1.2 rate/lane combos. This avoids selecting rates that are supported by the monitor, but not the connector leading to valid modes getting rejected. bug: https://bugs.freedesktop.org/show_bug.cgi?id=95206 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2016-05-12perf evsel: Improve EPERM error handling in open_strerror()Arnaldo Carvalho de Melo
We were showing a hardcoded default value for the kernel.perf_event_paranoid sysctl, now that it became more paranoid (1 -> 2 [1]), this would need to be updated, instead show the current value: [acme@jouet linux]$ perf record ls Error: You may not have permission to collect stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN [acme@jouet linux]$ [1] 0161028b7c8a ("perf/core: Change the default paranoia level to 2") Reported-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-0gc4rdpg8d025r5not8s8028@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-12Merge tag 'pinctrl-v4.6-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pinctrl fix from Linus Walleij: "A single last pin control fix for v4.6. t's tagged for stable and only hits a single driver with two added lines so should be safe. Tested in linux-next. - The pull up/down logic for the AT91 PIO4 controller was tilted: we need to mask the reverse pull when unmasking a pull direction. Setting both pull up & pull down is illegal and makes no sense" * tag 'pinctrl-v4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: at91-pio4: fix pull-up/down logic
2016-05-12workqueue: fix rebind bound workers warningWanpeng Li
------------[ cut here ]------------ WARNING: CPU: 0 PID: 16 at kernel/workqueue.c:4559 rebind_workers+0x1c0/0x1d0 Modules linked in: CPU: 0 PID: 16 Comm: cpuhp/0 Not tainted 4.6.0-rc4+ #31 Hardware name: IBM IBM System x3550 M4 Server -[7914IUW]-/00Y8603, BIOS -[D7E128FUS-1.40]- 07/23/2013 0000000000000000 ffff881037babb58 ffffffff8139d885 0000000000000010 0000000000000000 0000000000000000 0000000000000000 ffff881037babba8 ffffffff8108505d ffff881037ba0000 000011cf3e7d6e60 0000000000000046 Call Trace: dump_stack+0x89/0xd4 __warn+0xfd/0x120 warn_slowpath_null+0x1d/0x20 rebind_workers+0x1c0/0x1d0 workqueue_cpu_up_callback+0xf5/0x1d0 notifier_call_chain+0x64/0x90 ? trace_hardirqs_on_caller+0xf2/0x220 ? notify_prepare+0x80/0x80 __raw_notifier_call_chain+0xe/0x10 __cpu_notify+0x35/0x50 notify_down_prepare+0x5e/0x80 ? notify_prepare+0x80/0x80 cpuhp_invoke_callback+0x73/0x330 ? __schedule+0x33e/0x8a0 cpuhp_down_callbacks+0x51/0xc0 cpuhp_thread_fun+0xc1/0xf0 smpboot_thread_fn+0x159/0x2a0 ? smpboot_create_threads+0x80/0x80 kthread+0xef/0x110 ? wait_for_completion+0xf0/0x120 ? schedule_tail+0x35/0xf0 ret_from_fork+0x22/0x50 ? __init_kthread_worker+0x70/0x70 ---[ end trace eb12ae47d2382d8f ]--- notify_down_prepare: attempt to take down CPU 0 failed This bug can be reproduced by below config w/ nohz_full= all cpus: CONFIG_BOOTPARAM_HOTPLUG_CPU0=y CONFIG_DEBUG_HOTPLUG_CPU0=y CONFIG_NO_HZ_FULL=y As Thomas pointed out: | If a down prepare callback fails, then DOWN_FAILED is invoked for all | callbacks which have successfully executed DOWN_PREPARE. | | But, workqueue has actually two notifiers. One which handles | UP/DOWN_FAILED/ONLINE and one which handles DOWN_PREPARE. | | Now look at the priorities of those callbacks: | | CPU_PRI_WORKQUEUE_UP = 5 | CPU_PRI_WORKQUEUE_DOWN = -5 | | So the call order on DOWN_PREPARE is: | | CB 1 | CB ... | CB workqueue_up() -> Ignores DOWN_PREPARE | CB ... | CB X ---> Fails | | So we call up to CB X with DOWN_FAILED | | CB 1 | CB ... | CB workqueue_up() -> Handles DOWN_FAILED | CB ... | CB X-1 | | So the problem is that the workqueue stuff handles DOWN_FAILED in the up | callback, while it should do it in the down callback. Which is not a good idea | either because it wants to be called early on rollback... | | Brilliant stuff, isn't it? The hotplug rework will solve this problem because | the callbacks become symetric, but for the existing mess, we need some | workaround in the workqueue code. The boot CPU handles housekeeping duty(unbound timers, workqueues, timekeeping, ...) on behalf of full dynticks CPUs. It must remain online when nohz full is enabled. There is a priority set to every notifier_blocks: workqueue_cpu_up > tick_nohz_cpu_down > workqueue_cpu_down So tick_nohz_cpu_down callback failed when down prepare cpu 0, and notifier_blocks behind tick_nohz_cpu_down will not be called any more, which leads to workers are actually not unbound. Then hotplug state machine will fallback to undo and online cpu 0 again. Workers will be rebound unconditionally even if they are not unbound and trigger the warning in this progress. This patch fix it by catching !DISASSOCIATED to avoid rebind bound workers. Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: stable@vger.kernel.org Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
2016-05-12Merge tag 'at91-fixes2' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/nferre/linux-at91 into fixes Merge "Second AT91 fix PR for 4.6" from Nicolas Ferre: - fix a regression on the clock subsystem while switching to syscon/regmap due to a stricter check of the register map. * tag 'at91-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/nferre/linux-at91: ARM: dts: at91: sam9x5: Fix the memory range assigned to the PMC
2016-05-12cgroup: fix compile warningFelipe Balbi
commit 4f41fc59620f ("cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces") added the following compile warning: kernel/cgroup.c: In function ‘cgroup_show_path’: kernel/cgroup.c:1634:15: warning: unused variable ‘ret’ [-Wunused-variable] int len = 0, ret = 0; ^ fix it. Fixes: 4f41fc59620f ("cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces") Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-05-12kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry callSerge E. Hallyn
Our caller expects 0 on success, not >0. This fixes a bug in the patch cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces where /sys does not show up in mountinfo, breaking criu. Thanks for catching this, Andrei. Reported-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Tejun Heo <tj@kernel.org>