summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-04-28IB/rdmavt: Fix send schedulingJubin John
call_send is used to determine whether to send immediately or schedule a send for later. The current logic in rdmavt is inverted and has a negative impact on the latency of the hfi1 and qib drivers. Fix this regression by correctly calling send immediately when call_send is set. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28IB/hfi1: Prevent unpinning of wrong pagesMitko Haralanov
The routine used by the SDMA cache to handle already cached nodes can extend an already existing node. In its error handling code, the routine will unpin pages when not all pages of the buffer extension were pinned. There was a bug in that part of the routine, which would mistakenly unpin pages from the original set rather than the newly pinned pages. This commit fixes that bug by offsetting the page array to the proper place pointing at the beginning of the newly pinned pages. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28IB/hfi1: Fix deadlock caused by locking with wrong scopeMitko Haralanov
The locking around the interval RB tree is designed to prevent access to the tree while it's being modified. The locking in its current form is too overzealous, which is causing a deadlock in certain cases with the following backtrace: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0 CPU: 0 PID: 5836 Comm: IMB-MPI1 Tainted: G O 3.12.18-wfr+ #1 0000000000000000 ffff88087f206c50 ffffffff814f1caa ffffffff817b53f0 ffff88087f206cc8 ffffffff814ecd56 0000000000000010 ffff88087f206cd8 ffff88087f206c78 0000000000000000 0000000000000000 0000000000001662 Call Trace: <NMI> [<ffffffff814f1caa>] dump_stack+0x45/0x56 [<ffffffff814ecd56>] panic+0xc2/0x1cb [<ffffffff810d4370>] ? restart_watchdog_hrtimer+0x50/0x50 [<ffffffff810d4432>] watchdog_overflow_callback+0xc2/0xd0 [<ffffffff81109b4e>] __perf_event_overflow+0x8e/0x2b0 [<ffffffff8110a714>] perf_event_overflow+0x14/0x20 [<ffffffff8101c906>] intel_pmu_handle_irq+0x1b6/0x390 [<ffffffff814f927b>] perf_event_nmi_handler+0x2b/0x50 [<ffffffff814f8ad8>] nmi_handle.isra.3+0x88/0x180 [<ffffffff814f8d39>] do_nmi+0x169/0x310 [<ffffffff814f8177>] end_repeat_nmi+0x1e/0x2e [<ffffffff81272600>] ? unmap_single+0x30/0x30 [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40 [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40 [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40 <<EOE>> <IRQ> [<ffffffffa056c4a8>] hfi1_mmu_rb_search+0x38/0x70 [hfi1] [<ffffffffa05919cb>] user_sdma_free_request+0xcb/0x120 [hfi1] [<ffffffffa0593393>] user_sdma_txreq_cb+0x263/0x350 [hfi1] [<ffffffffa057fad7>] ? sdma_txclean+0x27/0x1c0 [hfi1] [<ffffffffa0593130>] ? user_sdma_send_pkts+0x1710/0x1710 [hfi1] [<ffffffffa057fdd6>] sdma_make_progress+0x166/0x480 [hfi1] [<ffffffff810762c9>] ? ttwu_do_wakeup+0x19/0xd0 [<ffffffffa0581c7e>] sdma_engine_interrupt+0x8e/0x100 [hfi1] [<ffffffffa0546bdd>] sdma_interrupt+0x5d/0xa0 [hfi1] [<ffffffff81097e57>] handle_irq_event_percpu+0x47/0x1d0 [<ffffffff81098017>] handle_irq_event+0x37/0x60 [<ffffffff8109aa5f>] handle_edge_irq+0x6f/0x120 [<ffffffff810044af>] handle_irq+0xbf/0x150 [<ffffffff8104c9b7>] ? irq_enter+0x17/0x80 [<ffffffff8150168d>] do_IRQ+0x4d/0xc0 [<ffffffff814f7c6a>] common_interrupt+0x6a/0x6a <EOI> [<ffffffff81073524>] ? finish_task_switch+0x54/0xe0 [<ffffffff814f56c6>] __schedule+0x3b6/0x7e0 [<ffffffff810763a6>] __cond_resched+0x26/0x30 [<ffffffff814f5eda>] _cond_resched+0x3a/0x50 [<ffffffff814f4f82>] down_write+0x12/0x30 [<ffffffffa0591619>] hfi1_release_user_pages+0x69/0x90 [hfi1] [<ffffffffa059173a>] sdma_rb_remove+0x9a/0xc0 [hfi1] [<ffffffffa056c00d>] __mmu_rb_remove.isra.5+0x5d/0x70 [hfi1] [<ffffffffa056c536>] hfi1_mmu_rb_remove+0x56/0x70 [hfi1] [<ffffffffa059427b>] hfi1_user_sdma_process_request+0x74b/0x1160 [hfi1] [<ffffffffa055c763>] hfi1_aio_write+0xc3/0x100 [hfi1] [<ffffffff8116a14c>] do_sync_readv_writev+0x4c/0x80 [<ffffffff8116b58b>] do_readv_writev+0xbb/0x230 [<ffffffff811a9da1>] ? fsnotify+0x241/0x320 [<ffffffff81073524>] ? finish_task_switch+0x54/0xe0 [<ffffffff8116b795>] vfs_writev+0x35/0x60 [<ffffffff8116b8c9>] SyS_writev+0x49/0xc0 [<ffffffff810cd876>] ? __audit_syscall_exit+0x1f6/0x2a0 [<ffffffff814ff992>] system_call_fastpath+0x16/0x1b As evident from the backtrace above, the process was being put to sleep while holding the lock. Limiting the scope of the lock only to the RB tree operation fixes the above error allowing for proper locking and the process being put to sleep when needed. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28IB/hfi1: Prevent NULL pointer deferences in caching codeMitko Haralanov
There is a potential kernel crash when the MMU notifier calls the invalidation routines in the hfi1 pinned page caching code for sdma. The invalidation routine could call the remove callback for the node, which in turn ends up dereferencing the current task_struct to get a pointer to the mm_struct. However, the mm_struct pointer could be NULL resulting in the following backtrace: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 IP: [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1] 15 task: ffff88085e66e080 ti: ffff88085c244000 task.ti: ffff88085c244000 RIP: 0010:[<ffffffffa041f75a>] [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1] RSP: 0000:ffff88085c245878 EFLAGS: 00010002 RAX: 0000000000000000 RBX: ffff88105b9bbd40 RCX: ffffea003931a830 RDX: 0000000000000004 RSI: ffff88105754a9c0 RDI: ffff88105754a9c0 RBP: ffff88085c245890 R08: ffff88105b9bbd70 R09: 00000000fffffffb R10: ffff88105b9bbd58 R11: 0000000000000013 R12: ffff88105754a9c0 R13: 0000000000000001 R14: 0000000000000001 R15: ffff88105b9bbd40 FS: 0000000000000000(0000) GS:ffff88107ef40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000a8 CR3: 0000000001a0b000 CR4: 00000000001407e0 Stack: ffff88105b9bbd40 ffff88080ec481a8 ffff88080ec481b8 ffff88085c2458c0 ffffffffa03fa00e ffff88080ec48190 ffff88080ed9cd00 0000000001024000 0000000000000000 ffff88085c245920 ffffffffa03fa0e7 0000000000000282 Call Trace: [<ffffffffa03fa00e>] __mmu_rb_remove.isra.5+0x5e/0x70 [hfi1] [<ffffffffa03fa0e7>] mmu_notifier_mem_invalidate+0xc7/0xf0 [hfi1] [<ffffffffa03fa143>] mmu_notifier_page+0x13/0x20 [hfi1] [<ffffffff81156dd0>] __mmu_notifier_invalidate_page+0x50/0x70 [<ffffffff81140bbb>] try_to_unmap_one+0x20b/0x470 [<ffffffff81141ee7>] try_to_unmap_anon+0xa7/0x120 [<ffffffff81141fad>] try_to_unmap+0x4d/0x60 [<ffffffff8111fd7b>] shrink_page_list+0x2eb/0x9d0 [<ffffffff81120ab3>] shrink_inactive_list+0x243/0x490 [<ffffffff81121491>] shrink_lruvec+0x4c1/0x640 [<ffffffff81121641>] shrink_zone+0x31/0x100 [<ffffffff81121b0f>] kswapd_shrink_zone.constprop.62+0xef/0x1c0 [<ffffffff811229e3>] kswapd+0x403/0x7e0 [<ffffffff811225e0>] ? shrink_all_memory+0xf0/0xf0 [<ffffffff81068ac0>] kthread+0xc0/0xd0 [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40 [<ffffffff814ff8ec>] ret_from_fork+0x7c/0xb0 [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40 To correct this, the mm_struct passed to us by the MMU notifier is used (which is what should have been done to begin with). This avoids the broken derefences and ensures that the correct mm_struct is used. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28MAINTAINERS: Update iser/isert maintainer contact infoSagi Grimberg
Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28IB/mlx5: Expose correct max_sge_rd limitSagi Grimberg
mlx5 devices (Connect-IB, ConnectX-4, ConnectX-4-LX) has a limitation where rdma read work queue entries cannot exceed 512 bytes. A rdma_read wqe needs to fit in 512 bytes: - wqe control segment (16 bytes) - rdma segment (16 bytes) - scatter elements (16 bytes each) So max_sge_rd should be: (512 - 16 - 16) / 16 = 30. Cc: linux-stable@vger.kernel.org Reported-by: Christoph Hellwig <hch@lst.de> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagig@grimberg.me> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28mmc: sunxi: Disable eMMC HS-DDR (MMC_CAP_1_8V_DDR) for Allwinner A80Chen-Yu Tsai
eMMC HS-DDR no longer works on the A80, despite it working when support for this developed. Disable it for now. Signed-off-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-04-28perf/x86/intel: Fix incorrect lbr_sel_mask valueKan Liang
This patch fixes a bug which was introduced by: b16a5b52eb90 ("perf/x86: Add option to disable reading branch flags/cycles") In this patch, lbr_sel_mask is used to mask the lbr_select. But LBR_SEL_MASK doesn't include the bit for LBR_CALL_STACK. So LBR call stack will never be set in lbr_select. This patch corrects the LBR_SEL_MASK by including all valid bits in LBR_SELECT. Also, the LBR_CALL_STACK bit is different as other bit in LBR_SELECT. It does not operate in suppress mode, so it needs to be specially handled in intel_pmu_setup_hw_lbr_filter. Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1461231010-4399-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28perf/x86/intel/pt: Don't die on VMXONAlexander Shishkin
Some versions of Intel PT do not support tracing across VMXON, more specifically, VMXON will clear TraceEn control bit and any attempt to set it before VMXOFF will throw a #GP, which in the current state of things will crash the kernel. Namely: $ perf record -e intel_pt// kvm -nographic on such a machine will kill it. To avoid this, notify the intel_pt driver before VMXON and after VMXOFF so that it knows when not to enable itself. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Gleb Natapov <gleb@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/87oa9dwrfk.fsf@ashishki-desk.ger.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28perf/core: Fix perf_event_open() vs. execve() racePeter Zijlstra
Jann reported that the ptrace_may_access() check in find_lively_task_by_vpid() is racy against exec(). Specifically: perf_event_open() execve() ptrace_may_access() commit_creds() ... if (get_dumpable() != SUID_DUMP_USER) perf_event_exit_task(); perf_install_in_context() would result in installing a counter across the creds boundary. Fix this by wrapping lots of perf_event_open() in cred_guard_mutex. This should be fine as perf_event_exit_task() is already called with cred_guard_mutex held, so all perf locks already nest inside it. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28perf/x86/amd: Set the size of event map array to PERF_COUNT_HW_MAXAdam Borowski
The entry for PERF_COUNT_HW_REF_CPU_CYCLES is not used on AMD, but is referenced by filter_events() which expects undefined events to have a value of 0. Found via KASAN: UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:30 index 9 is out of range for type 'u64 [9]' UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:9 load of address ffffffff81c021c8 with insufficient space for an object of type 'const u64' Signed-off-by: Adam Borowski <kilobyte@angband.pl> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1461749731-30979-1-git-send-email-kilobyte@angband.pl Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28rbd: report unsupported features to syslogIlya Dryomov
... instead of just returning an error. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-04-28rbd: fix rbd map vs notify racesIlya Dryomov
A while ago, commit 9875201e1049 ("rbd: fix use-after free of rbd_dev->disk") fixed rbd unmap vs notify race by introducing an exported wrapper for flushing notifies and sticking it into do_rbd_remove(). A similar problem exists on the rbd map path, though: the watch is registered in rbd_dev_image_probe(), while the disk is set up quite a few steps later, in rbd_dev_device_setup(). Nothing prevents a notify from coming in and crashing on a NULL rbd_dev->disk: BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 Call Trace: [<ffffffffa0508344>] rbd_watch_cb+0x34/0x180 [rbd] [<ffffffffa04bd290>] do_event_work+0x40/0xb0 [libceph] [<ffffffff8109d5db>] process_one_work+0x17b/0x470 [<ffffffff8109e3ab>] worker_thread+0x11b/0x400 [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400 [<ffffffff810a5acf>] kthread+0xcf/0xe0 [<ffffffff810b41b3>] ? finish_task_switch+0x53/0x170 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140 [<ffffffff81645dd8>] ret_from_fork+0x58/0x90 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140 RIP [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd] If an error occurs during rbd map, we have to error out, potentially tearing down a watch. Just like on rbd unmap, notifies have to be flushed, otherwise rbd_watch_cb() may end up trying to read in the image header after rbd_dev_image_release() has run: Assertion failure in rbd_dev_header_info() at line 4722: rbd_assert(rbd_image_format_valid(rbd_dev->image_format)); Call Trace: [<ffffffff81cccee0>] ? rbd_parent_request_create+0x150/0x150 [<ffffffff81cd4e59>] rbd_dev_refresh+0x59/0x390 [<ffffffff81cd5229>] rbd_watch_cb+0x69/0x290 [<ffffffff81fde9bf>] do_event_work+0x10f/0x1c0 [<ffffffff81107799>] process_one_work+0x689/0x1a80 [<ffffffff811076f7>] ? process_one_work+0x5e7/0x1a80 [<ffffffff81132065>] ? finish_task_switch+0x225/0x640 [<ffffffff81107110>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0 [<ffffffff81108c69>] worker_thread+0xd9/0x1320 [<ffffffff81108b90>] ? process_one_work+0x1a80/0x1a80 [<ffffffff8111b02d>] kthread+0x21d/0x2e0 [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550 [<ffffffff82022802>] ret_from_fork+0x22/0x40 [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550 RIP [<ffffffff81ccd8f9>] rbd_dev_header_info+0xa19/0x1e30 To fix this, a) check if RBD_DEV_FLAG_EXISTS is set before calling revalidate_disk(), b) move ceph_osdc_flush_notifies() call into rbd_dev_header_unwatch_sync() to cover rbd map error paths and c) turn header read-in into a critical section. The latter also happens to take care of rbd map foo@bar vs rbd snap rm foo@bar race. Fixes: http://tracker.ceph.com/issues/15490 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-04-28x86/apic: Handle zero vector gracefully in clear_vector_irq()Keith Busch
If x86_vector_alloc_irq() fails x86_vector_free_irqs() is invoked to cleanup the already allocated vectors. This subsequently calls clear_vector_irq(). The failed irq has no vector assigned, which triggers the BUG_ON(!vector) in clear_vector_irq(). We cannot suppress the call to x86_vector_free_irqs() for the failed interrupt, because the other data related to this irq must be cleaned up as well. So calling clear_vector_irq() with vector == 0 is legitimate. Remove the BUG_ON and return if vector is zero, [ tglx: Massaged changelog ] Fixes: b5dc8e6c21e7 "x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors" Signed-off-by: Keith Busch <keith.busch@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-04-27thermal: use %d to print S32 parametersLeo Yan
Power allocator's parameters are S32 type, so use %d to print them. Acked-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-04-27thermal: hisilicon: increase temperature resolutionLeo Yan
When calculate temperature, old code firstly do division and then convert to "millicelsius" unit. This will lose resolution and only can read back temperature with "Celsius" unit. So firstly scale step value to "millicelsius" and then do division, so finally we can increase resolution for temperature value. Also refine the calculation from temperature value to step value. Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-04-27Merge branch 'for-4.6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: "So, it turns out we had a silly bug in the most fundamental part of workqueue for a very long time. AFAICS, this dates back to pre-git era and has quite likely been there from the time workqueue was first introduced. A work item uses its PENDING bit to synchronize multiple queuers. Anyone who wins the PENDING bit owns the pending state of the work item. Whether a queuer wins or loses the race, one thing should be guaranteed - there will soon be at least one execution of the work item - where "after" means that the execution instance would be able to see all the changes that the queuer has made prior to the queueing attempt. Unfortunately, we were missing a smp_mb() after clearing PENDING for execution, so nothing guaranteed visibility of the changes that a queueing loser has made, which manifested as a reproducible blk-mq stall. Lots of kudos to Roman for debugging the problem. The patch for -stable is the minimal one. For v3.7, Peter is working on a patch to make the code path slightly more efficient and less fragile" * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix ghost PENDING flag while doing MQ IO
2016-04-27Merge branch 'for-4.6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Two patches to fix a deadlock which can be easily triggered if memcg charge moving is used. This bug was introduced while converting threadgroup locking to a global percpu_rwsem and is caused by cgroup controller task migration path depending on the ability to create new kthreads. cpuset had a similar issue which was fixed by performing heavy-lifting operations asynchronous to task migration. The two patches fix the same issue in memcg in a similar way. The first patch makes the mechanism generic and the second relocates memcg charge moving outside the migration path. Given that we don't want to perform heavy operations while writelocking threadgroup lock anyway, moving them out of the way is a desirable solution. One thing to note is that the problem was difficult to debug because lockdep couldn't figure out the deadlock condition. Looking into how to improve that" * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: memcg: relocate charge moving from ->attach to ->post_attach cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback
2016-04-27Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "I2C has one buildfix, one ABBA deadlock fix, and three simple 'add ID' patches" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: exynos5: Fix possible ABBA deadlock by keeping I2C clock prepared i2c: cpm: Fix build break due to incompatible pointer types i2c: ismt: Add Intel DNV PCI ID i2c: xlp9xx: add support for Broadcom Vulcan i2c: rk3x: add support for rk3228
2016-04-27Merge tag 'arc-4.6-rc6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC fixes from Vineet Gupta: - lockdep now works for ARCv2 builds - enable DT reserved-memory binding (for forthcoming HDMI driver) * tag 'arc-4.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: add support for reserved memory defined by device tree ARC: support generic per-device coherent dma mem Documentation: dt: arc: fix spelling mistakes ARCv2: Enable LOCKDEP
2016-04-27Merge tag 'nios2-v4.6-fix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2 Pull arch/nios2 fix from Ley Foon Tan: "memset: use the right constraint modifier for the %4 output operand" * tag 'nios2-v4.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2: nios2: memset: use the right constraint modifier for the %4 output operand
2016-04-27drm/amdgpu: disable vm interrupts with vm_fault_stop=2Flora Cui
V2: disable all vm interrupts in late_init() Signed-off-by: Flora Cui <Flora.Cui@amd.com> Reviewed-by: Ken Wang <Qingqing.Wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2016-04-27drm/amdgpu: print a message if ATPX dGPU power control is missingAlex Deucher
It will help identify problematic boards. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2016-04-27Revert "drm/amdgpu: disable runtime pm on PX laptops without dGPU power control"Alex Deucher
This reverts commit bedf2a65c1aa8fb29ba8527fd00c0f68ec1f55f1. See the radeon revert for an extended description. Cc: stable@vger.kernel.org
2016-04-27drm/radeon: fix vertical bars appear on monitor (v2)Vitaly Prosyak
When crtc/timing is disabled on boot the dig block should be stopped in order ignore timing from crtc, reset the steering fifo otherwise we get display corruption or hung in dp sst mode. v2: agd: fix coding style Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2016-04-27drm/ttm: fix kref count mess in ttm_bo_move_to_lru_tailFlora Cui
Fixes the following scenario: 1. Page table bo allocated in vram and linked to man->lru. tbo->list_kref.refcount=2 2. Page table bo is swapped out and removed from man->lru. tbo->list_kref.refcount=1 3. Command submission from userspace. Page table bo is moved to vram. ttm_bo_move_to_lru_tail() link it to man->lru and don't increase the kref count. Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Flora Cui <Flora.Cui@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2016-04-27Merge tag 'platform-drivers-x86-v4.6-3' of ↵Linus Torvalds
git://git.infradead.org/users/dvhart/linux-platform-drivers-x86 Pull x86 platform driver fix from Darren Hart: "Fix regression caused by hotkey enabling value in toshiba_acpi" * tag 'platform-drivers-x86-v4.6-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: toshiba_acpi: Fix regression caused by hotkey enabling value
2016-04-27Merge tag 'asoc-fix-v4.6-rc5' of ↵Takashi Iwai
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v4.6 This is a fairly large collection of fixes but almost all driver specific ones, especially to the new Intel drivers which have had a lot of recent development. The one core fix is a change to the debugfs code to avoid crashes in some relatively unusual configurations.
2016-04-27ARC: add support for reserved memory defined by device treeAlexey Brodkin
Enable reserved memory initialization from device tree. Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-04-27ARC: support generic per-device coherent dma memAlexey Brodkin
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-04-27nios2: memset: use the right constraint modifier for the %4 output operandRomain Perier
Depending on the size of the area to be memset'ed, the nios2 memset implementation either uses a naive loop (for buffers smaller or equal than 8 bytes) or a more optimized implementation (for buffers larger than 8 bytes). This implementation does 4-byte stores rather than 1-byte stores to speed up memset. However, we discovered that on our nios2 platform, memset() was not properly setting the buffer to the expected value. A memset of 0xff would not set the entire buffer to 0xff, but to: 0xff 0x00 0xff 0x00 0xff 0x00 0xff 0x00 ... Which is obviously incorrect. Our investigation has revealed that the problem lies in the incorrect constraints used in the inline assembly. The following piece of assembly, from the nios2 memset implementation, is supposed to create a 4-byte value that repeats 4 times the 1-byte pattern passed as memset argument: /* fill8 %3, %5 (c & 0xff) */ " slli %4, %5, 8\n" " or %4, %4, %5\n" " slli %3, %4, 16\n" " or %3, %3, %4\n" However, depending on the compiler and optimization level, this code might be compiled as: 34: 280a923a slli r5,r5,8 38: 294ab03a or r5,r5,r5 3c: 2808943a slli r4,r5,16 40: 2148b03a or r4,r4,r5 This is wrong because r5 gets used both for %5 and %4, which leads to the final pattern stored in r4 to be 0xff00ff00 rather than the expected 0xffffffff. %4 is defined with the "=r" constraint, i.e as an output operand. However, as explained in http://www.ethernut.de/en/documents/arm-inline-asm.html, this does not prevent gcc from using the same register for an output operand (%4) and input operand (%5). By using the constraint modifier '&', we indicate that the register should be used for output only. With this change, we get the following assembly output: 34: 2810923a slli r8,r5,8 38: 4150b03a or r8,r8,r5 3c: 400e943a slli r7,r8,16 40: 3a0eb03a or r7,r7,r8 Which correctly produces the 0xffffffff pattern when 0xff is passed as the memset() pattern. It is worth mentioning the observed consequence of this bug: we were hitting the kernel BUG() in mm/bootmem.c:__free() that verifies when marking a page as free that it was previously marked as occupied (i.e that the bit was set to 1). The entire bootmem bitmap is set to 0xff bit via a memset() during the bootmem initialization. The bootmem_free() call right after the initialization was finding some bits to be set to 0, which didn't make sense since the bitmap has just been memset'ed to 0xff. Except that due to the bug explained above, the bitmap was in fact initialized to 0xff00ff00. Thanks to Marek Vasut for his help and feedback. Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Acked-by: Marek Vasut <marex@denx.de> Acked-by: Ley Foon Tan <lftan@altera.com>
2016-04-27s390/sclp_ctl: fix potential information leak with /dev/sclpMartin Schwidefsky
The sclp_ctl_ioctl_sccb function uses two copy_from_user calls to retrieve the sclp request from user space. The first copy_from_user fetches the length of the request which is stored in the first two bytes of the request. The second copy_from_user gets the complete sclp request, but this copies the length field a second time. A malicious user may have changed the length in the meantime. Reported-by: Pengfei Wang <wpengfeinudt@gmail.com> Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-04-27powerpc: wire up preadv2 and pwritev2 syscallsRui Salvaterra
Wire up preadv2/pwritev2 in the same way as preadv/pwritev. Fixes two build warnings on ppc64. mpe: Lightly tested with fio (slightly hacked to add the syscall wrappers): fio-4217 [009] .... 1304.635300: sys_preadv2(fd: 3, vec: 10025821de0, vlen: 1, pos_l: 6253000, pos_h: 0, flags: 1) fio-4217 [009] .... 1304.635474: sys_preadv2 -> 0x1000 Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-27cxl: Poll for outstanding IRQs when detaching a contextMichael Neuling
When detaching contexts, we may still have interrupts in the system which are yet to be delivered to any CPU and be acked in the PSL. This can result in a subsequent unrelated process getting an spurious IRQ or an interrupt for a non-existent context. This polls the PSL to ensure that the PSL is clear of IRQs for the detached context, before removing the context from the idr. Signed-off-by: Michael Neuling <mikey@neuling.org> Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-27cxl: Keep IRQ mappings on context teardownMichael Neuling
Keep IRQ mappings on context teardown. This won't leak IRQs as if we allocate the mapping again, the generic code will give the same mapping used last time. Doing this works around a race in the generic code. Masking the interrupt introduces a race which can crash the kernel or result in IRQ that is never EOIed. The lost of EOI results in all subsequent mappings to the same HW IRQ never receiving an interrupt. We've seen this race with cxl test cases which are doing heavy context startup and teardown at the same time as heavy interrupt load. A fix to the generic code is being investigated also. Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: stable@vger.kernel.org # 3.8 Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-27drm/virtio: send vblank event after crtc updatesGustavo Padovan
virtio_gpu was failing to send vblank events when using the atomic IOCTL with the DRM_MODE_PAGE_FLIP_EVENT flag set. This patch fixes each and enables atomic pageflips updates. Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-27drm/dp/mst: Restore primary hub guid on resumeLyude
Some hubs are forgetful, and end up forgetting whatever GUID we set previously after we do a suspend/resume cycle. This can lead to hotplugging breaking (along with probably other things) since the hub will start sending connection notifications with the wrong GUID. As such, we need to check on resume whether or not the GUID the hub is giving us is valid. Signed-off-by: Lyude <cpaul@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1460580618-7421-1-git-send-email-cpaul@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-27drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()cpaul@redhat.com
We can thank KASAN for finding this, otherwise I probably would have spent hours on it. This fixes a somewhat harder to trigger kernel panic, occuring while enabling MST where the port we were currently updating the payload on would have all of it's refs dropped before we finished what we were doing: ================================================================== BUG: KASAN: use-after-free in drm_dp_update_payload_part1+0xb3f/0xdb0 [drm_kms_helper] at addr ffff8800d29de018 Read of size 4 by task Xorg/973 ============================================================================= BUG kmalloc-2048 (Tainted: G B W ): kasan: bad access detected ----------------------------------------------------------------------------- INFO: Allocated in drm_dp_add_port+0x1aa/0x1ed0 [drm_kms_helper] age=16477 cpu=0 pid=2175 ___slab_alloc+0x472/0x490 __slab_alloc+0x20/0x40 kmem_cache_alloc_trace+0x151/0x190 drm_dp_add_port+0x1aa/0x1ed0 [drm_kms_helper] drm_dp_send_link_address+0x526/0x960 [drm_kms_helper] drm_dp_check_and_send_link_address+0x1ac/0x210 [drm_kms_helper] drm_dp_mst_link_probe_work+0x77/0xd0 [drm_kms_helper] process_one_work+0x562/0x1350 worker_thread+0xd9/0x1390 kthread+0x1c5/0x260 ret_from_fork+0x22/0x40 INFO: Freed in drm_dp_free_mst_port+0x50/0x60 [drm_kms_helper] age=7521 cpu=0 pid=2175 __slab_free+0x17f/0x2d0 kfree+0x169/0x180 drm_dp_free_mst_port+0x50/0x60 [drm_kms_helper] drm_dp_destroy_connector_work+0x2b8/0x490 [drm_kms_helper] process_one_work+0x562/0x1350 worker_thread+0xd9/0x1390 kthread+0x1c5/0x260 ret_from_fork+0x22/0x40 which on this T460s, would eventually lead to kernel panics in somewhat random places later in intel_mst_enable_dp() if we got lucky enough. Signed-off-by: Lyude <cpaul@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Handle v4/v6 mixed sockets properly in soreuseport, from Craig Gallak. 2) Bug fixes for the new macsec facility (missing kmalloc NULL checks, missing locking around netdev list traversal, etc.) from Sabrina Dubroca. 3) Fix handling of host routes on ifdown in ipv6, from David Ahern. 4) Fix double-fdput in bpf verifier. From Jann Horn. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (31 commits) bpf: fix double-fdput in replace_map_fd_with_map_ptr() net: ipv6: Delete host routes on an ifdown Revert "ipv6: Revert optional address flusing on ifdown." net/mlx4_en: fix spurious timestamping callbacks net: dummy: remove note about being Y by default cxgbi: fix uninitialized flowi6 ipv6: Revert optional address flusing on ifdown. ipv4/fib: don't warn when primary address is missing if in_dev is dead net/mlx5: Add pci shutdown callback net/mlx5_core: Remove static from local variable net/mlx5e: Use vport MTU rather than physical port MTU net/mlx5e: Fix minimum MTU net/mlx5e: Device's mtu field is u16 and not int net/mlx5_core: Add ConnectX-5 to list of supported devices net/mlx5e: Fix MLX5E_100BASE_T define net/mlx5_core: Fix soft lockup in steering error flow qlcnic: Update version to 5.3.64 net: stmmac: socfpga: Remove re-registration of reset controller macsec: fix netlink attribute validation macsec: add missing macsec prefix in uapi ...
2016-04-27Merge branch 'drm-etnaviv-fixes' of git://git.pengutronix.de:/git/lst/linux ↵Dave Airlie
into drm-fixes just a single fix to not move the GPU linear window on cores where it might lead to inconsistent views of the memory by different engines in the core, thus breaking relocs and possibly causing other fun. * 'drm-etnaviv-fixes' of git://git.pengutronix.de:/git/lst/linux: drm/etnaviv: don't move linear memory window on 3D cores without MC2.0
2016-04-26Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "Here are the latest bug fixes for ARM SoCs, mostly addressing recent regressions. Changes are across several platforms, so I'm listing every change separately here. Regressions since 4.5: - A correction of the psci firmware DT binding, to prevent users from relying on unintended semantics - Actually getting the newly merged clock driver for some OMAP platforms to work - A revert of patches for the Qualcomm BAM, these need to be reworked for 4.7 to avoid breaking boards other than the one they were intended for - A correction for the I2C device nodes on the Socionext Uniphier platform - i.MX SDHCI was broken for non-DT platforms due to a change with the setting of the DMA mask - A revert of a patch that accidentally added a nonexisting clock on the Rensas "Porter" board - A couple of OMAP fixes that are all related to suspend after the power domain changes for dra7 - On Mediatek, revert part of the power domain initialization changes that broke mt8173-evb Fixes for older bugs: - Workaround for an "external abort" in the omap34xx suspend/resume code. - The USB1/eSATA should not be listed as an excon device on am57xx-beagle-x15 (broken since v4.0) - A v4.5 regression in the TI AM33xx and AM43XX DT specifying incorrect DMA request lines for the GPMC - The jiffies calibration on Renesas platforms was incorrect for some modern CPU cores. - A hardware errata woraround for clockdomains on TI DRA7" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: drivers: firmware: psci: unify enable-method binding on ARM {64,32}-bit systems arm64: dts: uniphier: fix I2C nodes of PH1-LD20 ARM: shmobile: timer: Fix preset_lpj leading to too short delays Revert "ARM: dts: porter: Enable SCIF_CLK frequency and pins" ARM: dts: r8a7791: Don't disable referenced optional clocks Revert "ARM: OMAP: Catch callers of revision information prior to it being populated" ARM: OMAP3: Fix external abort on 36xx waking from off mode idle ARM: dts: am57xx-beagle-x15: remove extcon_usb1 ARM: dts: am437x: Fix GPMC dma properties ARM: dts: am33xx: Fix GPMC dma properties Revert "soc: mediatek: SCPSYS: Fix double enabling of regulators" ARM: mach-imx: sdhci-esdhc-imx: initialize DMA mask ARM: DRA7: clockdomain: Implement timer workaround for errata i874 ARM: OMAP: Catch callers of revision information prior to it being populated ARM: dts: dra7: Correct clock tree for sys_32k_ck ARM: OMAP: DRA7: Provide proper class to omap2_set_globals_tap ARM: OMAP: DRA7: wakeupgen: Skip SAR save for wakeupgen Revert "dts: msm8974: Add dma channels for blsp2_i2c1 node" Revert "dts: msm8974: Add blsp2_bam dma node" ARM: dts: Add clocks for dm814x ADPLL
2016-04-26devpts: more pty driver interface cleanupsLinus Torvalds
This is more prep-work for the upcoming pty changes. Still just code cleanup with no actual semantic changes. This removes a bunch pointless complexity by just having the slave pty side remember the dentry associated with the devpts slave rather than the inode. That allows us to remove all the "look up the dentry" code for when we want to remove it again. Together with moving the tty pointer from "inode->i_private" to "dentry->d_fsdata" and getting rid of pointless inode locking, this removes about 30 lines of code. Not only is the end result smaller, it's simpler and easier to understand. The old code, for example, depended on the d_find_alias() to not just find the dentry, but also to check that it is still hashed, which in turn validated the tty pointer in the inode. That is a _very_ roundabout way to say "invalidate the cached tty pointer when the dentry is removed". The new code just does dentry->d_fsdata = NULL; in devpts_pty_kill() instead, invalidating the tty pointer rather more directly and obviously. Don't do something complex and subtle when the obvious straightforward approach will do. The rest of the patch (ie apart from code deletion and the above tty pointer clearing) is just switching the calling convention to pass the dentry or file pointer around instead of the inode. Cc: Eric Biederman <ebiederm@xmission.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Jann Horn <jann@thejh.net> Cc: Greg KH <greg@kroah.com> Cc: Jiri Slaby <jslaby@suse.com> Cc: Florian Weimer <fw@deneb.enyo.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-26bpf: fix double-fdput in replace_map_fd_with_map_ptr()Jann Horn
When bpf(BPF_PROG_LOAD, ...) was invoked with a BPF program whose bytecode references a non-map file descriptor as a map file descriptor, the error handling code called fdput() twice instead of once (in __bpf_map_get() and in replace_map_fd_with_map_ptr()). If the file descriptor table of the current task is shared, this causes f_count to be decremented too much, allowing the struct file to be freed while it is still in use (use-after-free). This can be exploited to gain root privileges by an unprivileged user. This bug was introduced in commit 0246e64d9a5f ("bpf: handle pseudo BPF_LD_IMM64 insn"), but is only exploitable since commit 1be7f75d1668 ("bpf: enable non-root eBPF programs") because previously, CAP_SYS_ADMIN was required to reach the vulnerable code. (posted publicly according to request by maintainer) Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26Merge remote-tracking branches 'asoc/fix/rt5640' and 'asoc/fix/wm8962' into ↵Mark Brown
asoc-linus
2016-04-26Merge remote-tracking branches 'asoc/fix/arizona', 'asoc/fix/cs35l32', ↵Mark Brown
'asoc/fix/hdac', 'asoc/fix/nau8825' and 'asoc/fix/rt5616' into asoc-linus
2016-04-26Merge remote-tracking branch 'asoc/fix/intel' into asoc-linusMark Brown
2016-04-26Merge remote-tracking branch 'asoc/fix/dapm' into asoc-linusMark Brown
2016-04-26Revert "x86/mm/32: Set NX in __supported_pte_mask before enabling paging"Andy Lutomirski
This reverts commit 320d25b6a05f8b73c23fc21025d2906ecdd2d4fc. This change was problematic for a couple of reasons: 1. It missed a some entry points (Xen things and 64-bit native). 2. The entry it changed can be executed more than once. This isn't really a problem, but it conflated per-cpu state setup and global state setup. 3. It broke 64-bit non-NX. 64-bit non-NX worked the other way around from 32-bit -- __supported_pte_mask had NX set initially and was *cleared* in x86_configure_nx. With the patch applied, it never got cleared. Reported-and-tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/59bd15f7f4b56b633a611b7f70876c6d2ad01a98.1461685884.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-26RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chipsHariprasad S
For T4, kernel mode qps don't use the user doorbell. User mode qps during flow control db ringing are forced into kernel, where user doorbell is treated as kernel doorbell and proper bar2 offset in bar2 virtual space is calculated, which incase of T4 is a bogus address, causing a kernel panic due to illegal write during doorbell ringing. In case of T4, kernel mode qp bar2 virtual address should be 0. Added T4 check during bar2 virtual address calculation to return 0. Fixed Bar2 range checks based on bar2 physical address. The below oops will be fixed <1>BUG: unable to handle kernel paging request at 000000000002aa08 <1>IP: [<ffffffffa011d800>] c4iw_uld_control+0x4e0/0x880 [iw_cxgb4] <4>PGD 1416a8067 PUD 15bf35067 PMD 0 <4>Oops: 0002 [#1] SMP <4>last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.4/infiniband/cxgb4_0/node_guid <4>CPU 5 <4>Modules linked in: rdma_ucm rdma_cm ib_cm ib_sa ib_mad ib_uverbs ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4 target_core_iblock target_core_file target_core_pscsi target_core_mod configfs bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf vhost_net macvtap macvlan tun kvm uinput microcode iTCO_wdt iTCO_vendor_support sg joydev serio_raw i2c_i801 i2c_core lpc_ich mfd_core e1000e ptp pps_core ioatdma dca i7core_edac edac_core shpchp ext3 jbd mbcache sd_mod crc_t10dif pata_acpi ata_generic ata_piix iw_cxgb4 iw_cm ib_core ib_addr cxgb4 ipv6 dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] <4> Supermicro X8ST3/X8ST3 <4>RIP: 0010:[<ffffffffa011d800>] [<ffffffffa011d800>] c4iw_uld_control+0x4e0/0x880 [iw_cxgb4] <4>RSP: 0000:ffff880155a03db0 EFLAGS: 00010006 <4>RAX: 000000000000001d RBX: ffff88013ae5fc00 RCX: ffff880155adb180 <4>RDX: 000000000002aa00 RSI: 0000000000000001 RDI: ffff88013ae5fdf8 <4>RBP: ffff880155a03e10 R08: 0000000000000000 R09: 0000000000000001 <4>R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 <4>R13: 000000000000001d R14: ffff880156414ab0 R15: ffffe8ffffc05b88 <4>FS: 0000000000000000(0000) GS:ffff8800282a0000(0000) knlGS:0000000000000000 <4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b <4>CR2: 000000000002aa08 CR3: 000000015bd0e000 CR4: 00000000000007e0 <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 <4>Process cxgb4 (pid: 394, threadinfo ffff880155a00000, task ffff880156414ab0) <4>Stack: <4> ffff880156415068 ffff880155adb180 ffff880155a03df0 ffffffffa00a344b <4><d> 00000000000003e8 ffff880155920000 0000000000000004 ffff880155920000 <4><d> ffff88015592d438 ffffffffa00a3860 ffff880155a03fd8 ffffe8ffffc05b88 <4>Call Trace: <4> [<ffffffffa00a344b>] ? enable_txq_db+0x2b/0x80 [cxgb4] <4> [<ffffffffa00a3860>] ? process_db_full+0x0/0xa0 [cxgb4] <4> [<ffffffffa00a38a6>] process_db_full+0x46/0xa0 [cxgb4] <4> [<ffffffff8109fda0>] worker_thread+0x170/0x2a0 <4> [<ffffffff810a6aa0>] ? autoremove_wake_function+0x0/0x40 <4> [<ffffffff8109fc30>] ? worker_thread+0x0/0x2a0 <4> [<ffffffff810a660e>] kthread+0x9e/0xc0 <4> [<ffffffff8100c28a>] child_rip+0xa/0x20 <4> [<ffffffff810a6570>] ? kthread+0x0/0xc0 <4> [<ffffffff8100c280>] ? child_rip+0x0/0x20 <4>Code: e9 ba 00 00 00 66 0f 1f 44 00 00 44 8b 05 29 07 02 00 45 85 c0 0f 85 71 02 00 00 8b 83 70 01 00 00 45 0f b7 ed c1 e0 0f 44 09 e8 <89> 42 08 0f ae f8 66 c7 83 82 01 00 00 00 00 44 0f b7 ab dc 01 <1>RIP [<ffffffffa011d800>] c4iw_uld_control+0x4e0/0x880 [iw_cxgb4] <4> RSP <ffff880155a03db0> <4>CR2: 000000000002aa08` Based on original work by Bharat Potnuri <bharat@chelsio.com> Fixes: 74217d4c6a4fb0d8 ("iw_cxgb4: support for bar2 qid densities exceeding the page size") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Reviewed-by: Leon Romanovsky <leon@leon.nu> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-26iw_cxgb4: handle draining an idle qpSteve Wise
In c4iw_drain_sq/rq(), if the particular queue is already empty then don't block. Fixes: ce4af14d94aa ('iw_cxgb4: add queue drain functions') Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>