summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-09-07Merge tag 'acpi-4.19-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix a regression from the 4.18 cycle in the ACPI driver for Intel SoCs (LPSS) and prevent dmi_check_system() from being called on non-x86 systems in the ACPI core. Specifics: - Fix a power management regression in the ACPI driver for Intel SoCs (LPSS) introduced by a system-wide suspend/resume fix during the 4.18 cycle (Zhang Rui). - Prevent dmi_check_system() from being called on non-x86 systems in the ACPI core (Jean Delvare)" * tag 'acpi-4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / LPSS: Force LPSS quirks on boot ACPI / bus: Only call dmi_check_system() on X86
2018-09-07Merge tag 'sound-4.19-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Just a few small fixes: - a fix for the recursive work cancellation in a specific HD-audio operation mode - a fix for potentially uninitialized memory access via rawmidi - the register bit access fixes for ASoC HD-audio" * tag 'sound-4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda: Fix several mismatch for register mask and value ALSA: rawmidi: Initialize allocated buffers ALSA: hda - Fix cancel_work_sync() stall from jackpoll work
2018-09-07KVM: LAPIC: Fix pv ipis out-of-bounds accessWanpeng Li
Dan Carpenter reported that the untrusted data returns from kvm_register_read() results in the following static checker warning: arch/x86/kvm/lapic.c:576 kvm_pv_send_ipi() error: buffer underflow 'map->phys_map' 's32min-s32max' KVM guest can easily trigger this by executing the following assembly sequence in Ring0: mov $10, %rax mov $0xFFFFFFFF, %rbx mov $0xFFFFFFFF, %rdx mov $0, %rsi vmcall As this will cause KVM to execute the following code-path: vmx_handle_exit() -> handle_vmcall() -> kvm_emulate_hypercall() -> kvm_pv_send_ipi() which will reach out-of-bounds access. This patch fixes it by adding a check to kvm_pv_send_ipi() against map->max_apic_id, ignoring destinations that are not present and delivering the rest. We also check whether or not map->phys_map[min + i] is NULL since the max_apic_id is set to the max apic id, some phys_map maybe NULL when apic id is sparse, especially kvm unconditionally set max_apic_id to 255 to reserve enough space for any xAPIC ID. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> [Add second "if (min > map->max_apic_id)" to complete the fix. -Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-09-07KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2Liran Alon
Consider the case L1 had a IRQ/NMI event until it executed VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME, L0 needs to evaluate if this pending event should cause an exit from L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept EXTERNAL_INTERRUPT). Usually this would be handled by L0 requesting a IRQ/NMI window by setting VMCS accordingly. However, this setting was done on VMCS01 and now VMCS02 is active instead. Thus, when L1 executes VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by requesting a KVM_REQ_EVENT. Note that above scenario exists when L1 KVM is about to enter L2 but requests an "immediate-exit". As in this case, L1 will disable-interrupts and then send a self-IPI before entering L2. Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-09-07Merge tag 'kvm-arm-fixes-for-v4.19-v2' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm Fixes for KVM/ARM for Linux v4.19 v2: - Fix a VFP corruption in 32-bit guest - Add missing cache invalidation for CoW pages - Two small cleanups
2018-09-07Merge tag 'kvm-s390-master-4.19-1' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux KVM: s390: Fixes for 4.19 - Fallout from the hugetlbfs support: pfmf interpretion and locking - VSIE: fix keywrapping for nested guests
2018-09-07arm64: KVM: Remove pgd_lockSteven Price
The lock has never been used and the page tables are protected by mmu_lock in struct kvm. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2018-09-07KVM: Remove obsolete kvm_unmap_hva notifier backendMarc Zyngier
kvm_unmap_hva is long gone, and we only have kvm_unmap_hva_range to deal with. Drop the now obsolete code. Fixes: fb1522e099f0 ("KVM: update to new mmu_notifier semantic v2") Cc: James Hogan <jhogan@kernel.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2018-09-07arm64: KVM: Only force FPEXC32_EL2.EN if trapping FPSIMDMarc Zyngier
If trapping FPSIMD in the context of an AArch32 guest, it is critical to set FPEXC32_EL2.EN to 1 so that the trapping is taken to EL2 and not EL1. Conversely, it is just as critical *not* to set FPEXC32_EL2.EN to 1 if we're not going to trap FPSIMD, as we then corrupt the existing VFP state. Moving the call to __activate_traps_fpsimd32 to the point where we know for sure that we are going to trap ensures that we don't set that bit spuriously. Fixes: e6b673b741ea ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing") Cc: stable@vger.kernel.org # v4.18 Cc: Dave Martin <dave.martin@arm.com> Reported-by: Alexander Graf <agraf@suse.de> Tested-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2018-09-07KVM: arm/arm64: Clean dcache to PoC when changing PTE due to CoWMarc Zyngier
When triggering a CoW, we unmap the RO page via an MMU notifier (invalidate_range_start), and then populate the new PTE using another one (change_pte). In the meantime, we'll have copied the old page into the new one. The problem is that the data for the new page is sitting in the cache, and should the guest have an uncached mapping to that page (or its MMU off), following accesses will bypass the cache. In a way, this is similar to what happens on a translation fault: We need to clean the page to the PoC before mapping it. So let's just do that. This fixes a KVM unit test regression observed on a HiSilicon platform, and subsequently reproduced on Seattle. Fixes: a9c0e12ebee5 ("KVM: arm/arm64: Only clean the dcache on translation fault") Cc: stable@vger.kernel.org # v4.16+ Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2018-09-07i2c: xiic: Record xilinx i2c with Zynq fragmentMichal Simek
Include xilinx soft i2c controller to Zynq fragment to make clear who is responsible for it. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-09-07Merge branch 'acpi-bus'Rafael J. Wysocki
Merge ACPI core fix to avoid calling dmi_check_system() on non-x86. * acpi-bus: ACPI / bus: Only call dmi_check_system() on X86
2018-09-06tipc: call start and done ops directly in __tipc_nl_compat_dumpit()Cong Wang
__tipc_nl_compat_dumpit() uses a netlink_callback on stack, so the only way to align it with other ->dumpit() call path is calling tipc_dump_start() and tipc_dump_done() directly inside it. Otherwise ->dumpit() would always get NULL from cb->args[]. But tipc_dump_start() uses sock_net(cb->skb->sk) to retrieve net pointer, the cb->skb here doesn't set skb->sk, the net pointer is saved in msg->net instead, so introduce a helper function __tipc_dump_start() to pass in msg->net. Ying pointed out cb->args[0...3] are already used by other callbacks on this call path, so we can't use cb->args[0] any more, use cb->args[4] instead. Fixes: 9a07efa9aea2 ("tipc: switch to rhashtable iterator") Reported-and-tested-by: syzbot+e93a2c41f91b8e2c7d9b@syzkaller.appspotmail.com Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-06Merge tag 'drm-fixes-2018-09-07' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Seems to have been overly quiet this week so I expect next week will be more stuff, just one pull from Rodrigo with i915 fixes in it. Quoting Rodrigo: 'The critical fix here on display side is the DP MST regression one. But this pull also include fixes for DP SST, small VDSC register fix and GVT's bucked with "BXT fixes, two guest warning fixes, dmabuf format mod fix and one for recent multiple VM timeout failure'." * tag 'drm-fixes-2018-09-07' of git://anongit.freedesktop.org/drm/drm: drm/i915/dp_mst: Fix enabling pipe clock for all streams drm/i915/dsc: Fix PPS register definition macros for 2nd VDSC engine drm/i915: Re-apply "Perform link quality check, unconditionally during long pulse" drm/i915/gvt: Give new born vGPU higher scheduling chance drm/i915/gvt: Fix drm_format_mod value for vGPU plane drm/i915/gvt: move intel_runtime_pm_get out of spin_lock in stop_schedule drm/i915/gvt: Handle GEN9_WM_CHICKEN3 with F_CMD_ACCESS. drm/i915/gvt: Make correct handling to vreg BXT_PHY_CTL_FAMILY drm/i915/gvt: emulate gen9 dbuf ctl register access
2018-09-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Pull m68knommu fix from Greg Ungerer: "A single change to fix booting on ColdFire platforms that have RAM starting at a non-0 address" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k: fix early memory reservation for ColdFire MMU systems
2018-09-07Merge tag 'drm-intel-fixes-2018-09-05' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes The critical fix here on display side is the DP MST regression one. But this pull also include fixes for DP SST, small VDSC register fix and GVT's bucked with "BXT fixes, two guest warning fixes, dmabuf format mod fix and one for recent multiple VM timeout failure." Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180905183000.GA2151@intel.com
2018-09-06Merge tag 'mips_fixes_4.19_1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fix from Paul Burton: "A single fix for v4.19-rc3, resolving a problem with our VDSO data page for systems with dcache aliasing. Those systems could previously observe stale data, causing clock_gettime() & gettimeofday() to return incorrect values" * tag 'mips_fixes_4.19_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: VDSO: Match data page cache colouring when D$ aliases
2018-09-06Merge tag '4.19-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Four small SMB3 fixes, three for stable, and one minor debug clarification" * tag '4.19-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: connect to servername instead of IP for IPC$ share smb3: check for and properly advertise directory lease support smb3: minor debugging clarifications in rfc1001 len processing SMB3: Backup intent flag missing for directory opens with backupuid mounts fs/cifs: don't translate SFM_SLASH (U+F026) to backslash
2018-09-06clocksource: Revert "Remove kthread"Peter Zijlstra
I turns out that the silly spawn kthread from worker was actually needed. clocksource_watchdog_kthread() cannot be called directly from clocksource_watchdog_work(), because clocksource_select() calls timekeeping_notify() which uses stop_machine(). One cannot use stop_machine() from a workqueue() due lock inversions wrt CPU hotplug. Revert the patch but add a comment that explain why we jump through such apparently silly hoops. Fixes: 7197e77abcb6 ("clocksource: Remove kthread") Reported-by: Siegfried Metz <frame@mailbox.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Niklas Cassel <niklas.cassel@linaro.org> Tested-by: Kevin Shanahan <kevin@shanahan.id.au> Tested-by: viktor_jaegerskuepper@freenet.de Tested-by: Siegfried Metz <frame@mailbox.org> Cc: rafael.j.wysocki@intel.com Cc: len.brown@intel.com Cc: diego.viola@gmail.com Cc: rui.zhang@intel.com Cc: bjorn.andersson@linaro.org Link: https://lkml.kernel.org/r/20180905084158.GR24124@hirez.programming.kicks-ass.net
2018-09-06dm raid: bump target version, update comments and documentationHeinz Mauelshagen
Bump target version to reflect the documented fixes are available. Also fix some code comments (typos and clarity). Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-06dm raid: fix RAID leg rebuild errorsHeinz Mauelshagen
On fast devices such as NVMe, a flaw in rs_get_progress() results in false target status output when userspace lvm2 requests leg rebuilds (symptom of the failure is device health chars 'aaaaaaaa' instead of expected 'aAaAAAAA' causing lvm2 to fail). The correct sync action state definitions already exist in decipher_sync_action() so fix rs_get_progress() to use it. Change decipher_sync_action() to return an enum rather than a string for the sync states and call it from rs_get_progress(). Introduce sync_str() to translate from enum to the string that is needed by raid_status(). Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-06dm raid: fix rebuild of specific devices by updating superblockHeinz Mauelshagen
Update superblock when particular devices are requested via rebuild (e.g. lvconvert --replace ...) to avoid spurious failure with the "New device injected into existing raid set without 'delta_disks' or 'rebuild' parameter specified" error message. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-06dm raid: fix stripe adding reshape deadlockHeinz Mauelshagen
When initiating a stripe adding reshape, a deadlock between md_stop_writes() waiting for the sync thread to stop and the running sync thread waiting for inactive stripes occurs (this frequently happens on single-core but rarely on multi-core systems). Fix this deadlock by setting MD_RECOVERY_WAIT to have the main MD resynchronization thread worker (md_do_sync()) bail out when initiating the reshape via constructor arguments. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-06Merge tag 'for-linus-20180906' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Small collection of fixes that should go into this release. This contains: - Small series that fixes a race between blkcg teardown and writeback (Dennis Zhou) - Fix disallowing invalid block size settings from the nbd ioctl (me) - BFQ fix for a use-after-free on last release of a bfqg (Konstantin Khlebnikov) - Fix for the "don't warn for flush" fix (Mikulas)" * tag 'for-linus-20180906' of git://git.kernel.dk/linux-block: block: bfq: swap puts in bfqg_and_blkg_put block: don't warn when doing fsync on read-only devices nbd: don't allow invalid blocksize settings blkcg: use tryget logic when associating a blkg with a bio blkcg: delay blkg destruction until after writeback has finished Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()"
2018-09-07drm/nouveau/disp/gm200-: enforce identity-mapped SOR assignment for LVDS/eDP ↵Ben Skeggs
panels Fixes eDP backlight issues on more recent laptops. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/disp: fix DP disable raceBen Skeggs
If a HPD pulse signalling the need to retrain the link occurs between the KMS driver releasing the output and the supervisor interrupt that finishes the teardown, it was possible get a NULL-ptr deref. Avoid this by marking the link as inactive earlier. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/disp: move eDP panel power handlingBen Skeggs
We need to do this earlier to prevent aux channel timeouts in resume paths on certain systems. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/disp: remove unused struct memberBen Skeggs
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/TBDdevinit: don't fail when PMU/PRE_OS is missing from VBIOSBen Skeggs
This Falcon application doesn't appear to be present on some newer systems, so let's not fail init if we can't find it. TBD: is there a way to determine whether it *should* be there? Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/mmu: don't attempt to dereference vmm without valid instance pointerBen Skeggs
Fixes oopses in certain failure paths. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau: fix oops in client init failure pathBen Skeggs
The NV_ERROR macro requires drm->client to be initialised, which it may not be at this stage of the init process. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau: Fix nouveau_connector_ddc_detect()Lyude Paul
It looks like that when we moved over to using drm_connector_for_each_possible_encoder() in nouveau, that one rather important part of this function got dropped by accident: /* Right v here */ for (i = 0; nv_encoder = NULL, i < DRM_CONNECTOR_MAX_ENCODER; i++) { int id = connector->encoder_ids[i]; if (id == 0) break; Since it's rather difficult to notice: the conditional in this loop is actually: nv_encoder = NULL, i < DRM_CONNECTOR_MAX_ENCODER Meaning that all early breaks result in nv_encoder keeping it's value, otherwise nv_encoder = NULL. Ugh. Since this got dropped, nouveau_connector_ddc_detect() now returns an encoder for every single connector, regardless of whether or not it's detected: [ 1780.056185] nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for DP-2 So: fix this to ensure we only return an encoder if we actually found one, and clean up the rest of the function while we're at it since it's nearly impossible to read properly. Changes since v1: - Don't skip ddc probing for LVDS if we can't switch DDC through vga-switcheroo, just do the DDC probing without calling vga_switcheroo_lock_ddc() - skeggsb Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Fixes: ddba766dd07e ("drm/nouveau: Use drm_connector_for_each_possible_encoder()") Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/drm/nouveau: Don't forget to cancel hpd_work on suspend/unloadLyude Paul
Currently, there's nothing in nouveau that actually cancels this work struct. So, cancel it on suspend/unload. Otherwise, if we're unlucky enough hpd_work might try to keep running up until the system is suspended. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/drm/nouveau: Prevent handling ACPI HPD events too earlyLyude Paul
On most systems with ACPI hotplugging support, it seems that we always receive a hotplug event once we re-enable EC interrupts even if the GPU hasn't even been resumed yet. This can cause problems since even though we schedule hpd_work to handle connector reprobing for us, hpd_work synchronizes on pm_runtime_get_sync() to wait until the device is ready to perform reprobing. Since runtime suspend/resume callbacks are disabled before the PM core calls ->suspend(), any calls to pm_runtime_get_sync() during this period will grab a runtime PM ref and return immediately with -EACCES. Because we schedule hpd_work from our ACPI HPD handler, and hpd_work synchronizes on pm_runtime_get_sync(), this causes us to launch a connector reprobe immediately even if the GPU isn't actually resumed just yet. This causes various warnings in dmesg and occasionally, also prevents some displays connected to the dedicated GPU from coming back up after suspend. Example: usb 1-4: USB disconnect, device number 14 usb 1-4.1: USB disconnect, device number 15 WARNING: CPU: 0 PID: 838 at drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h:170 nouveau_dp_detect+0x17e/0x370 [nouveau] CPU: 0 PID: 838 Comm: kworker/0:6 Not tainted 4.17.14-201.Lyude.bz1477182.V3.fc28.x86_64 #1 Hardware name: LENOVO 20EQS64N00/20EQS64N00, BIOS N1EET77W (1.50 ) 03/28/2018 Workqueue: events nouveau_display_hpd_work [nouveau] RIP: 0010:nouveau_dp_detect+0x17e/0x370 [nouveau] RSP: 0018:ffffa15143933cf0 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8cb4f656c400 RCX: 0000000000000000 RDX: ffffa1514500e4e4 RSI: ffffa1514500e4e4 RDI: 0000000001009002 RBP: ffff8cb4f4a8a800 R08: ffffa15143933cfd R09: ffffa15143933cfc R10: 0000000000000000 R11: 0000000000000000 R12: ffff8cb4fb57a000 R13: ffff8cb4fb57a000 R14: ffff8cb4f4a8f800 R15: ffff8cb4f656c418 FS: 0000000000000000(0000) GS:ffff8cb51f400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f78ec938000 CR3: 000000073720a003 CR4: 00000000003606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? _cond_resched+0x15/0x30 nouveau_connector_detect+0x2ce/0x520 [nouveau] ? _cond_resched+0x15/0x30 ? ww_mutex_lock+0x12/0x40 drm_helper_probe_detect_ctx+0x8b/0xe0 [drm_kms_helper] drm_helper_hpd_irq_event+0xa8/0x120 [drm_kms_helper] nouveau_display_hpd_work+0x2a/0x60 [nouveau] process_one_work+0x187/0x340 worker_thread+0x2e/0x380 ? pwq_unbound_release_workfn+0xd0/0xd0 kthread+0x112/0x130 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x35/0x40 Code: 4c 8d 44 24 0d b9 00 05 00 00 48 89 ef ba 09 00 00 00 be 01 00 00 00 e8 e1 09 f8 ff 85 c0 0f 85 b2 01 00 00 80 7c 24 0c 03 74 02 <0f> 0b 48 89 ef e8 b8 07 f8 ff f6 05 51 1b c8 ff 02 0f 84 72 ff ---[ end trace 55d811b38fc8e71a ]--- So, to fix this we attempt to grab a runtime PM reference in the ACPI handler itself asynchronously. If the GPU is already awake (it will have normal hotplugging at this point) or runtime PM callbacks are currently disabled on the device, we drop our reference without updating the autosuspend delay. We only schedule connector reprobes when we successfully managed to queue up a resume request with our asynchronous PM ref. This also has the added benefit of preventing redundant connector reprobes from ACPI while the GPU is runtime resumed! Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Cc: Karol Herbst <kherbst@redhat.com> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1477182#c41 Signed-off-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau: Reset MST branching unit before enablingLyude Paul
When probing a new MST device, it's not safe to make any assumptions about it's current state. While most well mannered MST hubs will just disable the branching unit on hotplug disconnects, this isn't enough to save us from various other scenarios that might have resulted in something writing to the MST branching unit before we got control of it. This could happen if a previous probe we tried failed, if we're booting in kexec context and the hub is still in the state the last kernel put it in, etc. Luckily; there is no reason we can't just reset the branching unit every time we enable a new topology. So, fix this by resetting it on enabling new topologies to ensure that we always start off with a clean, unmodified topology state on MST sinks. This fixes occasional hard-lockups on my P50's laptop dock (e.g. AUX times out all DPCD trasactions) observed after multiple docks, undocks, and module reloads. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Cc: Karol Herbst <karolherbst@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau: Only write DP_MSTM_CTRL when neededLyude Paul
Currently, nouveau will re-write the DP_MSTM_CTRL register for an MST hub every time it receives a long HPD pulse on DP. This isn't actually necessary and additionally, has some unintended side effects. With the P50 I've got here, rewriting DP_MSTM_CTRL constantly seems to make it rather likely (1 out of 5 times usually) that bringing up MST with it's ThinkPad dock will fail and result in sideband messages timing out in the middle. Afterwards, successive probes don't manage to get the dock to communicate properly over MST sideband properly. Many times sideband message timeouts from MST hubs are indicative of either the source or the sink dropping an ESI event, which can cause DRM's perspective of the topology's current state to go out of sync with reality. While it's tough to really know for sure what's happening to the dock, using userspace tools to write to DP_MSTM_CTRL in the middle of the MST link probing process does appear to make things flaky. It's possible that when we write to DP_MSTM_CTRL, the function that gets triggered to respond in the dock's firmware temporarily puts it in a state where it might end up not reporting an ESI to the source, or ends up dropping a sideband message we sent it. So, to fix this we make it so that when probing an MST topology, we respect it's current state. If the dock's already enabled, we simply read DP_MSTM_CTRL and disable the topology if it's value is not what we expected. Otherwise, we perform the normal MST probing dance. We avoid taking any action except if the state of the MST topology actually changes. This fixes MST sideband message timeouts and detection failures on my P50 with its ThinkPad dock. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Cc: Karol Herbst <karolherbst@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau: Remove useless poll_enable() call in drm_load()Lyude Paul
Again, this doesn't do anything. drm_kms_helper_poll_enable() will have already been called in nouveau_display_init() Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau: Remove useless poll_disable() call in switcheroo_set_state()Lyude Paul
This won't do anything but potentially make us miss hotplugs. We already call drm_kms_helper_poll_disable() in nouveau_pmops_suspend()->nouveau_display_suspend()->nouveau_display_fini() Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau: Remove useless poll_enable() call in switcheroo_set_state()Lyude Paul
This doesn't do anything, drm_kms_helper_poll_enable() gets called in nouveau_pmops_resume()->nouveau_display_resume()->nouveau_display_init() already. Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau: Fix deadlocks in nouveau_connector_detect()Lyude Paul
When we disable hotplugging on the GPU, we need to be able to synchronize with each connector's hotplug interrupt handler before the interrupt is finally disabled. This can be a problem however, since nouveau_connector_detect() currently grabs a runtime power reference when handling connector probing. This will deadlock the runtime suspend handler like so: [ 861.480896] INFO: task kworker/0:2:61 blocked for more than 120 seconds. [ 861.483290] Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.485158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 861.486332] kworker/0:2 D 0 61 2 0x80000000 [ 861.487044] Workqueue: events nouveau_display_hpd_work [nouveau] [ 861.487737] Call Trace: [ 861.488394] __schedule+0x322/0xaf0 [ 861.489070] schedule+0x33/0x90 [ 861.489744] rpm_resume+0x19c/0x850 [ 861.490392] ? finish_wait+0x90/0x90 [ 861.491068] __pm_runtime_resume+0x4e/0x90 [ 861.491753] nouveau_display_hpd_work+0x22/0x60 [nouveau] [ 861.492416] process_one_work+0x231/0x620 [ 861.493068] worker_thread+0x44/0x3a0 [ 861.493722] kthread+0x12b/0x150 [ 861.494342] ? wq_pool_ids_show+0x140/0x140 [ 861.494991] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.495648] ret_from_fork+0x3a/0x50 [ 861.496304] INFO: task kworker/6:2:320 blocked for more than 120 seconds. [ 861.496968] Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.497654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 861.498341] kworker/6:2 D 0 320 2 0x80000080 [ 861.499045] Workqueue: pm pm_runtime_work [ 861.499739] Call Trace: [ 861.500428] __schedule+0x322/0xaf0 [ 861.501134] ? wait_for_completion+0x104/0x190 [ 861.501851] schedule+0x33/0x90 [ 861.502564] schedule_timeout+0x3a5/0x590 [ 861.503284] ? mark_held_locks+0x58/0x80 [ 861.503988] ? _raw_spin_unlock_irq+0x2c/0x40 [ 861.504710] ? wait_for_completion+0x104/0x190 [ 861.505417] ? trace_hardirqs_on_caller+0xf4/0x190 [ 861.506136] ? wait_for_completion+0x104/0x190 [ 861.506845] wait_for_completion+0x12c/0x190 [ 861.507555] ? wake_up_q+0x80/0x80 [ 861.508268] flush_work+0x1c9/0x280 [ 861.508990] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0 [ 861.509735] nvif_notify_put+0xb1/0xc0 [nouveau] [ 861.510482] nouveau_display_fini+0xbd/0x170 [nouveau] [ 861.511241] nouveau_display_suspend+0x67/0x120 [nouveau] [ 861.511969] nouveau_do_suspend+0x5e/0x2d0 [nouveau] [ 861.512715] nouveau_pmops_runtime_suspend+0x47/0xb0 [nouveau] [ 861.513435] pci_pm_runtime_suspend+0x6b/0x180 [ 861.514165] ? pci_has_legacy_pm_support+0x70/0x70 [ 861.514897] __rpm_callback+0x7a/0x1d0 [ 861.515618] ? pci_has_legacy_pm_support+0x70/0x70 [ 861.516313] rpm_callback+0x24/0x80 [ 861.517027] ? pci_has_legacy_pm_support+0x70/0x70 [ 861.517741] rpm_suspend+0x142/0x6b0 [ 861.518449] pm_runtime_work+0x97/0xc0 [ 861.519144] process_one_work+0x231/0x620 [ 861.519831] worker_thread+0x44/0x3a0 [ 861.520522] kthread+0x12b/0x150 [ 861.521220] ? wq_pool_ids_show+0x140/0x140 [ 861.521925] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.522622] ret_from_fork+0x3a/0x50 [ 861.523299] INFO: task kworker/6:0:1329 blocked for more than 120 seconds. [ 861.523977] Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.524644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 861.525349] kworker/6:0 D 0 1329 2 0x80000000 [ 861.526073] Workqueue: events nvif_notify_work [nouveau] [ 861.526751] Call Trace: [ 861.527411] __schedule+0x322/0xaf0 [ 861.528089] schedule+0x33/0x90 [ 861.528758] rpm_resume+0x19c/0x850 [ 861.529399] ? finish_wait+0x90/0x90 [ 861.530073] __pm_runtime_resume+0x4e/0x90 [ 861.530798] nouveau_connector_detect+0x7e/0x510 [nouveau] [ 861.531459] ? ww_mutex_lock+0x47/0x80 [ 861.532097] ? ww_mutex_lock+0x47/0x80 [ 861.532819] ? drm_modeset_lock+0x88/0x130 [drm] [ 861.533481] drm_helper_probe_detect_ctx+0xa0/0x100 [drm_kms_helper] [ 861.534127] drm_helper_hpd_irq_event+0xa4/0x120 [drm_kms_helper] [ 861.534940] nouveau_connector_hotplug+0x98/0x120 [nouveau] [ 861.535556] nvif_notify_work+0x2d/0xb0 [nouveau] [ 861.536221] process_one_work+0x231/0x620 [ 861.536994] worker_thread+0x44/0x3a0 [ 861.537757] kthread+0x12b/0x150 [ 861.538463] ? wq_pool_ids_show+0x140/0x140 [ 861.539102] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.539815] ret_from_fork+0x3a/0x50 [ 861.540521] Showing all locks held in the system: [ 861.541696] 2 locks held by kworker/0:2/61: [ 861.542406] #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.543071] #1: 0000000076868126 ((work_completion)(&drm->hpd_work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.543814] 1 lock held by khungtaskd/64: [ 861.544535] #0: 0000000059db4b53 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185 [ 861.545160] 3 locks held by kworker/6:2/320: [ 861.545896] #0: 00000000d9e1bc59 ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.546702] #1: 00000000c9f92d84 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.547443] #2: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: nouveau_display_fini+0x96/0x170 [nouveau] [ 861.548146] 1 lock held by dmesg/983: [ 861.548889] 2 locks held by zsh/1250: [ 861.549605] #0: 00000000348e3cf6 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 [ 861.550393] #1: 000000007009a7a8 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870 [ 861.551122] 6 locks held by kworker/6:0/1329: [ 861.551957] #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.552765] #1: 00000000ddb499ad ((work_completion)(&notify->work)#2){+.+.}, at: process_one_work+0x1b3/0x620 [ 861.553582] #2: 000000006e013cbe (&dev->mode_config.mutex){+.+.}, at: drm_helper_hpd_irq_event+0x6c/0x120 [drm_kms_helper] [ 861.554357] #3: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: drm_helper_hpd_irq_event+0x78/0x120 [drm_kms_helper] [ 861.555227] #4: 0000000044f294d9 (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_detect_ctx+0x3d/0x100 [drm_kms_helper] [ 861.556133] #5: 00000000db193642 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x4b/0x130 [drm] [ 861.557864] ============================================= [ 861.559507] NMI backtrace for cpu 2 [ 861.560363] CPU: 2 PID: 64 Comm: khungtaskd Tainted: G O 4.18.0-rc6Lyude-Test+ #1 [ 861.561197] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET78W (1.51 ) 05/18/2018 [ 861.561948] Call Trace: [ 861.562757] dump_stack+0x8e/0xd3 [ 861.563516] nmi_cpu_backtrace.cold.3+0x14/0x5a [ 861.564269] ? lapic_can_unplug_cpu.cold.27+0x42/0x42 [ 861.565029] nmi_trigger_cpumask_backtrace+0xa1/0xae [ 861.565789] arch_trigger_cpumask_backtrace+0x19/0x20 [ 861.566558] watchdog+0x316/0x580 [ 861.567355] kthread+0x12b/0x150 [ 861.568114] ? reset_hung_task_detector+0x20/0x20 [ 861.568863] ? kthread_create_worker_on_cpu+0x70/0x70 [ 861.569598] ret_from_fork+0x3a/0x50 [ 861.570370] Sending NMI from CPU 2 to CPUs 0-1,3-7: [ 861.571426] NMI backtrace for cpu 6 skipped: idling at intel_idle+0x7f/0x120 [ 861.571429] NMI backtrace for cpu 7 skipped: idling at intel_idle+0x7f/0x120 [ 861.571432] NMI backtrace for cpu 3 skipped: idling at intel_idle+0x7f/0x120 [ 861.571464] NMI backtrace for cpu 5 skipped: idling at intel_idle+0x7f/0x120 [ 861.571467] NMI backtrace for cpu 0 skipped: idling at intel_idle+0x7f/0x120 [ 861.571469] NMI backtrace for cpu 4 skipped: idling at intel_idle+0x7f/0x120 [ 861.571472] NMI backtrace for cpu 1 skipped: idling at intel_idle+0x7f/0x120 [ 861.572428] Kernel panic - not syncing: hung_task: blocked tasks So: fix this by making it so that normal hotplug handling /only/ happens so long as the GPU is currently awake without any pending runtime PM requests. In the event that a hotplug occurs while the device is suspending or resuming, we can simply defer our response until the GPU is fully runtime resumed again. Changes since v4: - Use a new trick I came up with using pm_runtime_get() instead of the hackish junk we had before Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: stable@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/drm/nouveau: Use pm_runtime_get_noresume() in connector_detect()Lyude Paul
It's true we can't resume the device from poll workers in nouveau_connector_detect(). We can however, prevent the autosuspend timer from elapsing immediately if it hasn't already without risking any sort of deadlock with the runtime suspend/resume operations. So do that instead of entirely avoiding grabbing a power reference. Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: stable@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/drm/nouveau: Fix deadlock with fb_helper with async RPM requestsLyude Paul
Currently, nouveau uses the generic drm_fb_helper_output_poll_changed() function provided by DRM as it's output_poll_changed callback. Unfortunately however, this function doesn't grab runtime PM references early enough and even if it did-we can't block waiting for the device to resume in output_poll_changed() since it's very likely that we'll need to grab the fb_helper lock at some point during the runtime resume process. This currently results in deadlocking like so: [ 246.669625] INFO: task kworker/4:0:37 blocked for more than 120 seconds. [ 246.673398] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.676527] kworker/4:0 D 0 37 2 0x80000000 [ 246.677580] Workqueue: events output_poll_execute [drm_kms_helper] [ 246.678704] Call Trace: [ 246.679753] __schedule+0x322/0xaf0 [ 246.680916] schedule+0x33/0x90 [ 246.681924] schedule_preempt_disabled+0x15/0x20 [ 246.683023] __mutex_lock+0x569/0x9a0 [ 246.684035] ? kobject_uevent_env+0x117/0x7b0 [ 246.685132] ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.686179] mutex_lock_nested+0x1b/0x20 [ 246.687278] ? mutex_lock_nested+0x1b/0x20 [ 246.688307] drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.689420] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper] [ 246.690462] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper] [ 246.691570] output_poll_execute+0x198/0x1c0 [drm_kms_helper] [ 246.692611] process_one_work+0x231/0x620 [ 246.693725] worker_thread+0x214/0x3a0 [ 246.694756] kthread+0x12b/0x150 [ 246.695856] ? wq_pool_ids_show+0x140/0x140 [ 246.696888] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.697998] ret_from_fork+0x3a/0x50 [ 246.699034] INFO: task kworker/0:1:60 blocked for more than 120 seconds. [ 246.700153] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.702278] kworker/0:1 D 0 60 2 0x80000000 [ 246.703293] Workqueue: pm pm_runtime_work [ 246.704393] Call Trace: [ 246.705403] __schedule+0x322/0xaf0 [ 246.706439] ? wait_for_completion+0x104/0x190 [ 246.707393] schedule+0x33/0x90 [ 246.708375] schedule_timeout+0x3a5/0x590 [ 246.709289] ? mark_held_locks+0x58/0x80 [ 246.710208] ? _raw_spin_unlock_irq+0x2c/0x40 [ 246.711222] ? wait_for_completion+0x104/0x190 [ 246.712134] ? trace_hardirqs_on_caller+0xf4/0x190 [ 246.713094] ? wait_for_completion+0x104/0x190 [ 246.713964] wait_for_completion+0x12c/0x190 [ 246.714895] ? wake_up_q+0x80/0x80 [ 246.715727] ? get_work_pool+0x90/0x90 [ 246.716649] flush_work+0x1c9/0x280 [ 246.717483] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0 [ 246.718442] __cancel_work_timer+0x146/0x1d0 [ 246.719247] cancel_delayed_work_sync+0x13/0x20 [ 246.720043] drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper] [ 246.721123] nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau] [ 246.721897] pci_pm_runtime_suspend+0x6b/0x190 [ 246.722825] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.723737] __rpm_callback+0x7a/0x1d0 [ 246.724721] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.725607] rpm_callback+0x24/0x80 [ 246.726553] ? pci_has_legacy_pm_support+0x70/0x70 [ 246.727376] rpm_suspend+0x142/0x6b0 [ 246.728185] pm_runtime_work+0x97/0xc0 [ 246.728938] process_one_work+0x231/0x620 [ 246.729796] worker_thread+0x44/0x3a0 [ 246.730614] kthread+0x12b/0x150 [ 246.731395] ? wq_pool_ids_show+0x140/0x140 [ 246.732202] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.732878] ret_from_fork+0x3a/0x50 [ 246.733768] INFO: task kworker/4:2:422 blocked for more than 120 seconds. [ 246.734587] Not tainted 4.18.0-rc5Lyude-Test+ #2 [ 246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 246.736113] kworker/4:2 D 0 422 2 0x80000080 [ 246.736789] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper] [ 246.737665] Call Trace: [ 246.738490] __schedule+0x322/0xaf0 [ 246.739250] schedule+0x33/0x90 [ 246.739908] rpm_resume+0x19c/0x850 [ 246.740750] ? finish_wait+0x90/0x90 [ 246.741541] __pm_runtime_resume+0x4e/0x90 [ 246.742370] nv50_disp_atomic_commit+0x31/0x210 [nouveau] [ 246.743124] drm_atomic_commit+0x4a/0x50 [drm] [ 246.743775] restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper] [ 246.744603] restore_fbdev_mode+0x31/0x140 [drm_kms_helper] [ 246.745373] drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper] [ 246.746220] drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper] [ 246.746884] drm_fb_helper_hotplug_event.part.28+0x96/0xb0 [drm_kms_helper] [ 246.747675] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper] [ 246.748544] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper] [ 246.749439] nv50_mstm_hotplug+0x15/0x20 [nouveau] [ 246.750111] drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper] [ 246.750764] drm_dp_check_and_send_link_address+0xa8/0xd0 [drm_kms_helper] [ 246.751602] drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper] [ 246.752314] process_one_work+0x231/0x620 [ 246.752979] worker_thread+0x44/0x3a0 [ 246.753838] kthread+0x12b/0x150 [ 246.754619] ? wq_pool_ids_show+0x140/0x140 [ 246.755386] ? kthread_create_worker_on_cpu+0x70/0x70 [ 246.756162] ret_from_fork+0x3a/0x50 [ 246.756847] Showing all locks held in the system: [ 246.758261] 3 locks held by kworker/4:0/37: [ 246.759016] #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.759856] #1: 00000000e6065461 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.760670] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper] [ 246.761516] 2 locks held by kworker/0:1/60: [ 246.762274] #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.762982] #1: 000000005ab44fb4 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.763890] 1 lock held by khungtaskd/64: [ 246.764664] #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185 [ 246.765588] 5 locks held by kworker/4:2/422: [ 246.766440] #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.767390] #1: 00000000bb59b134 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x1b3/0x620 [ 246.768154] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper] [ 246.768966] #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper] [ 246.769921] #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8a/0x1b0 [drm] [ 246.770839] 1 lock held by dmesg/1038: [ 246.771739] 2 locks held by zsh/1172: [ 246.772650] #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 [ 246.773680] #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870 [ 246.775522] ============================================= After trying dozens of different solutions, I found one very simple one that should also have the benefit of preventing us from having to fight locking for the rest of our lives. So, we work around these deadlocks by deferring all fbcon hotplug events that happen after the runtime suspend process starts until after the device is resumed again. Changes since v7: - Fixup commit message - Daniel Vetter Changes since v6: - Remove unused nouveau_fbcon_hotplugged_in_suspend() - Ilia Changes since v5: - Come up with the (hopefully final) solution for solving this dumb problem, one that is a lot less likely to cause issues with locking in the future. This should work around all deadlock conditions with fbcon brought up thus far. Changes since v4: - Add nouveau_fbcon_hotplugged_in_suspend() to workaround deadlock condition that Lukas described - Just move all of this out of drm_fb_helper. It seems that other DRM drivers have already figured out other workarounds for this. If other drivers do end up needing this in the future, we can just move this back into drm_fb_helper again. Changes since v3: - Actually check if fb_helper is NULL in both new helpers - Actually check drm_fbdev_emulation in both new helpers - Don't fire off a fb_helper hotplug unconditionally; only do it if the following conditions are true (as otherwise, calling this in the wrong spot will cause Bad Things to happen): - fb_helper hotplug handling was actually inhibited previously - fb_helper actually has a delayed hotplug pending - fb_helper is actually bound - fb_helper is actually initialized - Add __must_check to drm_fb_helper_suspend_hotplug(). There's no situation where a driver would actually want to use this without checking the return value, so enforce that - Rewrite and clarify the documentation for both helpers. - Make sure to return true in the drm_fb_helper_suspend_hotplug() stub that's provided in drm_fb_helper.h when CONFIG_DRM_FBDEV_EMULATION isn't enabled - Actually grab the toplevel fb_helper lock in drm_fb_helper_resume_hotplug(), since it's possible other activity (such as a hotplug) could be going on at the same time the driver calls drm_fb_helper_resume_hotplug(). We need this to check whether or not drm_fb_helper_hotplug_event() needs to be called anyway Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: stable@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau: Remove duplicate poll_enable() in pmops_runtime_suspend()Lyude Paul
Since actual hotplug notifications don't get disabled until nouveau_display_fini() is called, all this will do is cause any hotplugs that happen between this drm_kms_helper_poll_disable() call and the actual hotplug disablement to potentially be dropped if ACPI isn't around to help us. Signed-off-by: Lyude Paul <lyude@redhat.com> Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: stable@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-07drm/nouveau/drm/nouveau: Fix bogus drm_kms_helper_poll_enable() placementLyude Paul
Turns out this part is my fault for not noticing when reviewing 9a2eba337cace ("drm/nouveau: Fix drm poll_helper handling"). Currently we call drm_kms_helper_poll_enable() from nouveau_display_hpd_work(). This makes basically no sense however, because that means we're calling drm_kms_helper_poll_enable() every time we schedule the hotplug detection work. This is also against the advice mentioned in drm_kms_helper_poll_enable()'s documentation: Note that calls to enable and disable polling must be strictly ordered, which is automatically the case when they're only call from suspend/resume callbacks. Of course, hotplugs can't really be ordered. They could even happen immediately after we called drm_kms_helper_poll_disable() in nouveau_display_fini(), which can lead to all sorts of issues. Additionally; enabling polling /after/ we call drm_helper_hpd_irq_event() could also mean that we'd miss a hotplug event anyway, since drm_helper_hpd_irq_event() wouldn't bother trying to probe connectors so long as polling is disabled. So; simply move this back into nouveau_display_init() again. The race condition that both of these patches attempted to work around has already been fixed properly in d61a5c106351 ("drm/nouveau: Fix deadlock on runtime suspend") Fixes: 9a2eba337cace ("drm/nouveau: Fix drm poll_helper handling") Signed-off-by: Lyude Paul <lyude@redhat.com> Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Cc: Lukas Wunner <lukas@wunner.de> Cc: Peter Ujfalusi <peter.ujfalusi@ti.com> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-09-06RDMA/mlx4: Ensure that maximal send/receive SGE less than supported by HWLeon Romanovsky
In calculating the global maximum number of the Scatter/Gather elements supported, the following four maximum parameters must be taken into consideration: max_sg_rq, max_sg_sq, max_desc_sz_rq and max_desc_sz_sq. However instead of bringing this complexity to query_device, which still won't be sufficient anyway (the calculations are dependent on QP type), the safer approach will be to restore old code, which will give us 32 SGEs. Fixes: 33023fb85a42 ("IB/core: add max_send_sge and max_recv_sge attributes") Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-06RDMA/cma: Protect cma dev list with lockParav Pandit
When AF_IB addresses are used during rdma_resolve_addr() a lock is not held. A cma device can get removed while list traversal is in progress which may lead to crash. ie CPU0 CPU1 ==== ==== rdma_resolve_addr() cma_resolve_ib_dev() list_for_each() cma_remove_one() cur_dev->device mutex_lock(&lock) list_del(); mutex_unlock(&lock); cma_process_remove(); Therefore, hold a lock while traversing the list which avoids such situation. Cc: <stable@vger.kernel.org> # 3.10 Fixes: f17df3b0dede ("RDMA/cma: Add support for AF_IB to rdma_resolve_addr()") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-06i2c: xiic: Make the start and the byte count write atomicShubhrajyoti Datta
Disable interrupts while configuring the transfer and enable them back. We have below as the programming sequence 1. start and slave address 2. byte count and stop In some customer platform there was a lot of interrupts between 1 and 2 and after slave address (around 7 clock cyles) if 2 is not executed then the transaction is nacked. To fix this case make the 2 writes atomic. Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com> [wsa: added a newline for better readability] Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
2018-09-06irqchip/gic-v3-its: Cap lpi_id_bits to reduce memory footprintJia He
Commit fe8e93504ce8 ("irqchip/gic-v3-its: Use full range of LPIs"), removes the cap for lpi_id_bits, which causes the following warning to trigger on a QDF2400 server: WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:4066 __alloc_pages_nodemask ... Call trace: __alloc_pages_nodemask+0x2d8/0x1188 alloc_pages_current+0x8c/0xd8 its_allocate_prop_table+0x5c/0xb8 its_init+0x220/0x3c0 gic_init_bases+0x250/0x380 gic_acpi_init+0x16c/0x2a4 In its_alloc_lpi_tables(), lpi_id_bits is 24 in QDF2400. The allocation in allocate_prop_table() tries therefore to allocate 16M (order 12 if pagesize=4k), which triggers the warning. As said by MarcL Capping lpi_id_bits at 16 (which is what we had before) is plenty, will save a some memory, and gives some margin before we need to push it up again. Bring the upper limit of lpi_id_bits back to prevent Fixes: fe8e93504ce8 ("irqchip/gic-v3-its: Use full range of LPIs") Suggested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Jia He <jia.he@hxt-semitech.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Olof Johansson <olof@lixom.net> Cc: Jason Cooper <jason@lakedaemon.net> Cc: linux-arm-kernel@lists.infradead.org Link: https://lkml.kernel.org/r/1535432006-2304-1-git-send-email-jia.he@hxt-semitech.com
2018-09-06dm raid: fix reshape race on small devicesHeinz Mauelshagen
Loading a new mapping table, the dm-raid target's constructor retrieves the volatile reshaping state from the raid superblocks. When the new table is activated in a following resume, the actual reshape position is retrieved. The reshape driven by the previous mapping can already have finished on small and/or fast devices thus updating raid superblocks about the new raid layout. This causes the actual array state (e.g. stripe size reshape finished) to be inconsistent with the one in the new mapping, causing hangs with left behind devices. This race does not occur with usual raid device sizes but with small ones (e.g. those created by the lvm2 test suite). Fix by no longer transferring stale/inconsistent raid_set state during preresume. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-06block: bfq: swap puts in bfqg_and_blkg_putKonstantin Khlebnikov
Fix trivial use-after-free. This could be last reference to bfqg. Fixes: 8f9bebc33dd7 ("block, bfq: access and cache blkg data only when safe") Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>