Age | Commit message (Collapse) | Author |
|
This patch fixes a bug associated with iscsit_reset_np_thread()
that can occur during parallel configfs rmdir of a single iscsi_np
used across multiple iscsi-target instances, that would result in
hung task(s) similar to below where configfs rmdir process context
was blocked indefinately waiting for iscsi_np->np_restart_comp
to finish:
[ 6726.112076] INFO: task dcp_proxy_node_:15550 blocked for more than 120 seconds.
[ 6726.119440] Tainted: G W O 4.1.26-3321 #2
[ 6726.125045] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6726.132927] dcp_proxy_node_ D ffff8803f202bc88 0 15550 1 0x00000000
[ 6726.140058] ffff8803f202bc88 ffff88085c64d960 ffff88083b3b1ad0 ffff88087fffeb08
[ 6726.147593] ffff8803f202c000 7fffffffffffffff ffff88083f459c28 ffff88083b3b1ad0
[ 6726.155132] ffff88035373c100 ffff8803f202bca8 ffffffff8168ced2 ffff8803f202bcb8
[ 6726.162667] Call Trace:
[ 6726.165150] [<ffffffff8168ced2>] schedule+0x32/0x80
[ 6726.170156] [<ffffffff8168f5b4>] schedule_timeout+0x214/0x290
[ 6726.176030] [<ffffffff810caef2>] ? __send_signal+0x52/0x4a0
[ 6726.181728] [<ffffffff8168d7d6>] wait_for_completion+0x96/0x100
[ 6726.187774] [<ffffffff810e7c80>] ? wake_up_state+0x10/0x10
[ 6726.193395] [<ffffffffa035d6e2>] iscsit_reset_np_thread+0x62/0xe0 [iscsi_target_mod]
[ 6726.201278] [<ffffffffa0355d86>] iscsit_tpg_disable_portal_group+0x96/0x190 [iscsi_target_mod]
[ 6726.210033] [<ffffffffa0363f7f>] lio_target_tpg_store_enable+0x4f/0xc0 [iscsi_target_mod]
[ 6726.218351] [<ffffffff81260c5a>] configfs_write_file+0xaa/0x110
[ 6726.224392] [<ffffffff811ea364>] vfs_write+0xa4/0x1b0
[ 6726.229576] [<ffffffff811eb111>] SyS_write+0x41/0xb0
[ 6726.234659] [<ffffffff8169042e>] system_call_fastpath+0x12/0x71
It would happen because each iscsit_reset_np_thread() sets state
to ISCSI_NP_THREAD_RESET, sends SIGINT, and then blocks waiting
for completion on iscsi_np->np_restart_comp.
However, if iscsi_np was active processing a login request and
more than a single iscsit_reset_np_thread() caller to the same
iscsi_np was blocked on iscsi_np->np_restart_comp, iscsi_np
kthread process context in __iscsi_target_login_thread() would
flush pending signals and only perform a single completion of
np->np_restart_comp before going back to sleep within transport
specific iscsit_transport->iscsi_accept_np code.
To address this bug, add a iscsi_np->np_reset_count and update
__iscsi_target_login_thread() to keep completing np->np_restart_comp
until ->np_reset_count has reached zero.
Reported-by: Gary Guo <ghg@datera.io>
Tested-by: Gary Guo <ghg@datera.io>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"A large number of ext4 bug fixes and cleanups for v4.13"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix copy paste error in ext4_swap_extents()
ext4: fix overflow caused by missing cast in ext4_resize_fs()
ext4, project: expand inode extra size if possible
ext4: cleanup ext4_expand_extra_isize_ea()
ext4: restructure ext4_expand_extra_isize
ext4: fix forgetten xattr lock protection in ext4_expand_extra_isize
ext4: make xattr inode reads faster
ext4: inplace xattr block update fails to deduplicate blocks
ext4: remove unused mode parameter
ext4: fix warning about stack corruption
ext4: fix dir_nlink behaviour
ext4: silence array overflow warning
ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize
ext4: release discard bio after sending discard commands
ext4: convert swap_inode_data() over to use swap() on most of the fields
ext4: error should be cleared if ea_inode isn't added to the cache
ext4: Don't clear SGID when inheriting ACLs
ext4: preserve i_mode if __ext4_set_acl() fails
ext4: remove unused metadata accounting variables
ext4: correct comment references to ext4_ext_direct_IO()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"This series is larger than I would like to submit for -rc4. My
original intent were to sent it to either -rc2 or -rc3. Unfortunately,
due to my vacations, I got a lot of pending stuff after my return, and
had to do some biz trips, with prevented me to send this earlier.
Several fixes:
- some fixes at atomisp staging driver
- several gcc 7 warning fixes
- cleanup media SVG files, in order to fix PDF build on some distros
- fix random Kconfig build of venus driver
- some fixes for the venus driver
- some changes from semaphone to mutex in ngene's driver
- some locking fixes at dib0700 driver
- several fixes on ngene's driver and frontends to make it properly
support some new boards added on Kernel 4.13
- some fixes to CEC drivers
- omap_vout: vrfb: convert to dmaengine
- docs-rst: document EBUSY for VIDIOC_S_FMT
Please notice that the big diffstat changes here are at the SVG files.
Visually, the images look the same, but the file size is now a lot
smaller than before, and they don't use some XML tags that would cause
them to be badly parsed by some ImageMagick versions, or to require a
lot of memory by TeTex, with would break PDF output on some
distributions"
* tag 'media/v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (68 commits)
media: atomisp2: array underflow in imx_enum_frame_size()
media: atomisp2: array underflow in ap1302_enum_frame_size()
media: atomisp2: Array underflow in atomisp_enum_input()
media: platform: davinci: drop VPFE_CMD_S_CCDC_RAW_PARAMS
media: platform: davinci: return -EINVAL for VPFE_CMD_S_CCDC_RAW_PARAMS ioctl
media: venus: don't abuse dma_alloc for non-DMA allocations
media: venus: hfi: fix error handling in hfi_sys_init_done()
media: venus: fix compile-test build on non-qcom ARM platform
media: venus: mark PM functions as __maybe_unused
media: cec-notifier: small improvements
media: pulse8-cec: persistent_config should be off by default
media: cec: cec_transmit_attempt_done: ignore CEC_TX_STATUS_MAX_RETRIES
media: staging: atomisp: array underflow in ioctl
media: lirc: LIRC_GET_REC_RESOLUTION should return microseconds
media: svg: avoid too long lines
media: svg files: simplify files
media: selection.svg: simplify the SVG file
media: vimc: set id_table for platform drivers
media: staging: atomisp: disable warnings with cc-disable-warning
media: davinci: variable 'common' set but not used
...
|
|
Pull KVM fixes from Radim Krčmář:
"ARM:
- Yet another race with VM destruction plugged
- A set of small vgic fixes
x86:
- Preserve pending INIT
- RCU fixes in paravirtual async pf, VM teardown, and VMXOFF
emulation
- nVMX interrupt injection and dirty tracking fixes
- initialize to make UBSAN happy"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: arm/arm64: vgic: Use READ_ONCE fo cmpxchg
KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"
KVM: nVMX: mark vmcs12 pages dirty on L2 exit
kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown
KVM: nVMX: do not pin the VMCS12
KVM: avoid using rcu_dereference_protected
KVM: X86: init irq->level in kvm_pv_kick_cpu_op
KVM: X86: Fix loss of pending INIT due to race
KVM: async_pf: make rcu irq exit if not triggered from idle task
KVM: nVMX: fixes to nested virt interrupt injection
KVM: nVMX: do not fill vm_exit_intr_error_code in prepare_vmcs12
KVM: arm/arm64: Handle hva aging while destroying the vm
KVM: arm/arm64: PMU: Fix overflow interrupt injection
KVM: arm/arm64: Fix bug in advertising KVM_CAP_MSI_DEVID capability
|
|
The extended address vector is the highest bit in be32 variable,
but it was compared with the lowest. This patch fixes the endianness
of that check and removes already declared define.
Fixes: 17d2f88f92ce ("IB/mlx5: Add ODP atomics support")
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Pull ceph fixes from Ilya Dryomov:
"A bunch of fixes and follow-ups for -rc1 Luminous patches: issues with
->reencode_message() and last minute RADOS semantic changes in
v12.1.2"
* tag 'ceph-for-4.13-rc4' of git://github.com/ceph/ceph-client:
libceph: make RECOVERY_DELETES feature create a new interval
libceph: upmap semantic changes
crush: assume weight_set != null imples weight_set_size > 0
libceph: fallback for when there isn't a pool-specific choose_arg
libceph: don't call ->reencode_message() more than once per message
libceph: make encode_request_*() work with r_mempool requests
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Now we hit the usual ASoC-fix-flood in the middle of release.
Most of the changes are trivial and device-specific, while one
significant change is the fix for unbalanced of_graph_*() refcounts.
This involved a change in the graph API itself that had been a bit
messy"
* tag 'sound-4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Fix speaker output from VAIO VPCL14M1R
device property: Fix usecount for of_graph_get_port_parent()
ASoC: rt5665: fix wrong register for bclk ratio control
ASoC: Intel: Use MCLK instead of BLCK as the sysclock for RT5514 codec on kabylake platform
ASoC: Intel: Enabling ASRC for RT5663 codec on kabylake platform
ASoC: codecs: msm8916-analog: fix DIG_CLK_CTL_RXD3_CLK_EN define
ASoC: Intel: Skylake: Fix missing sentinels in sst_acpi_mach
ASoC: sh: hac: add missing "int ret"
ASoC: samsung: odroid: Fix EPLL frequency values
ASoC: sgtl5000: Use snd_soc_kcontrol_codec()
ASoC: rt5665: fix GPIO6 pin function define
ASoC: ux500: Restore platform DAI assignments
ASoC: fix pcm-creation regression
ASoC: do not close shared backend dailink
ASoC: pxa: SND_PXA2XX_SOC should depend on HAS_DMA
ASoC: Intel: Skylake: Fix default dma_buffer_size
ASoC: rt5663: Update the HW default values based on the shipping version
ASoC: imx-ssi: add check on platform_get_irq return value
|
|
Pure refactor. This helper will be required in the xmit timer fix
later in the patch series. (Because the TLP logic will want to make
this calculation.)
Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull VFIO fixes from Alex Williamson:
- SPAPR/EEH config build fix (Murilo Opsfelder Araujo)
- Fix possible device lock deadlock (Alex Williamson)
- Correctly size integrated endpoint PCIe capabilities (Alex
Williamson)
* tag 'vfio-v4.13-rc4' of git://github.com/awilliam/linux-vfio:
vfio/pci: Fix handling of RC integrated endpoint PCIe capability size
vfio/pci: Use pci_try_reset_function() on initial open
include/linux/vfio.h: Guard powerpc-specific functions with CONFIG_VFIO_SPAPR_EEH
|
|
Merge misc fixes from Andrew Morton:
"15 fixes"
[ This does not merge the "fortify: use WARN instead of BUG for now"
patch, which needs a bit of extra work to build cleanly with all
configurations. Arnd is on it. - Linus ]
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
ocfs2: don't clear SGID when inheriting ACLs
mm: allow page_cache_get_speculative in interrupt context
userfaultfd: non-cooperative: flush event_wqh at release time
ipc: add missing container_of()s for randstruct
cpuset: fix a deadlock due to incomplete patching of cpusets_enabled()
userfaultfd_zeropage: return -ENOSPC in case mm has gone
mm: take memory hotplug lock within numa_zonelist_order_handler()
mm/page_io.c: fix oops during block io poll in swapin path
zram: do not free pool->size_class
kthread: fix documentation build warning
kasan: avoid -Wmaybe-uninitialized warning
userfaultfd: non-cooperative: notify about unmap of destination during mremap
mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries
pid: kill pidhash_size in pidhash_init()
mm/hugetlb.c: __get_user_pages ignores certain follow_hugetlb_page errors
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
KVM/ARM Fixes for v4.13-rc4
- Yet another race with VM destruction plugged
- A set of small vgic fixes
|
|
Pull NFS client fixes from Anna Schumaker:
"Two fixes from Trond this time, now that he's back from his vacation.
The first is a stable fix for the EXCHANGE_ID issue on the mailing
list, and the other fixes a double-free situation that he found at the
same time.
Stable fix:
- Fix EXCHANGE_ID corrupt verifier issue
Other fix:
- Fix double frees in nfs4_test_session_trunk()"
* tag 'nfs-for-4.13-4' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4: Fix double frees in nfs4_test_session_trunk()
NFSv4: Fix EXCHANGE_ID corrupt verifier issue
|
|
Kernel panic when calling the IRQ-safe __get_user_pages_fast in NMI
handler.
The bug was introduced by commit 2947ba054a4d ("x86/mm/gup: Switch GUP
to the generic get_user_page_fast() implementation").
The original x86 __get_user_page_fast used plain get_page() or
page_ref_add(). However, the generic __get_user_page_fast uses
page_cache_get_speculative(), which has VM_BUG_ON(in_interrupt()).
There is no reason to prevent page_cache_get_speculative from using in
interrupt context. According to the author, putting a BUG_ON there is
just because the code is not verifying correctness of interrupt races.
I did some tests in interrupt context. There is no issue found.
Removing VM_BUG_ON(in_interrupt()) for page_cache_get_speculative().
Link: http://lkml.kernel.org/r/1501609146-59730-1-git-send-email-kan.liang@intel.com
Fixes: 2947ba054a4d ("x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation")
Signed-off-by: Kan Liang <kan.liang@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ying Huang <ying.huang@intel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In codepaths that use the begin/retry interface for reading
mems_allowed_seq with irqs disabled, there exists a race condition that
stalls the patch process after only modifying a subset of the
static_branch call sites.
This problem manifested itself as a deadlock in the slub allocator,
inside get_any_partial. The loop reads mems_allowed_seq value (via
read_mems_allowed_begin), performs the defrag operation, and then
verifies the consistency of mem_allowed via the read_mems_allowed_retry
and the cookie returned by xxx_begin.
The issue here is that both begin and retry first check if cpusets are
enabled via cpusets_enabled() static branch. This branch can be
rewritted dynamically (via cpuset_inc) if a new cpuset is created. The
x86 jump label code fully synchronizes across all CPUs for every entry
it rewrites. If it rewrites only one of the callsites (specifically the
one in read_mems_allowed_retry) and then waits for the
smp_call_function(do_sync_core) to complete while a CPU is inside the
begin/retry section with IRQs off and the mems_allowed value is changed,
we can hang.
This is because begin() will always return 0 (since it wasn't patched
yet) while retry() will test the 0 against the actual value of the seq
counter.
The fix is to use two different static keys: one for begin
(pre_enable_key) and one for retry (enable_key). In cpuset_inc(), we
first bump the pre_enable key to ensure that cpuset_mems_allowed_begin()
always return a valid seqcount if are enabling cpusets. Similarly, when
disabling cpusets via cpuset_dec(), we first ensure that callers of
cpuset_mems_allowed_retry() will start ignoring the seqcount value
before we let cpuset_mems_allowed_begin() return 0.
The relevant stack traces of the two stuck threads:
CPU: 1 PID: 1415 Comm: mkdir Tainted: G L 4.9.36-00104-g540c51286237 #4
Hardware name: Default string Default string/Hardware, BIOS 4.29.1-20170526215256 05/26/2017
task: ffff8817f9c28000 task.stack: ffffc9000ffa4000
RIP: smp_call_function_many+0x1f9/0x260
Call Trace:
smp_call_function+0x3b/0x70
on_each_cpu+0x2f/0x90
text_poke_bp+0x87/0xd0
arch_jump_label_transform+0x93/0x100
__jump_label_update+0x77/0x90
jump_label_update+0xaa/0xc0
static_key_slow_inc+0x9e/0xb0
cpuset_css_online+0x70/0x2e0
online_css+0x2c/0xa0
cgroup_apply_control_enable+0x27f/0x3d0
cgroup_mkdir+0x2b7/0x420
kernfs_iop_mkdir+0x5a/0x80
vfs_mkdir+0xf6/0x1a0
SyS_mkdir+0xb7/0xe0
entry_SYSCALL_64_fastpath+0x18/0xad
...
CPU: 2 PID: 1 Comm: init Tainted: G L 4.9.36-00104-g540c51286237 #4
Hardware name: Default string Default string/Hardware, BIOS 4.29.1-20170526215256 05/26/2017
task: ffff8818087c0000 task.stack: ffffc90000030000
RIP: int3+0x39/0x70
Call Trace:
<#DB> ? ___slab_alloc+0x28b/0x5a0
<EOE> ? copy_process.part.40+0xf7/0x1de0
__slab_alloc.isra.80+0x54/0x90
copy_process.part.40+0xf7/0x1de0
copy_process.part.40+0xf7/0x1de0
kmem_cache_alloc_node+0x8a/0x280
copy_process.part.40+0xf7/0x1de0
_do_fork+0xe7/0x6c0
_raw_spin_unlock_irq+0x2d/0x60
trace_hardirqs_on_caller+0x136/0x1d0
entry_SYSCALL_64_fastpath+0x5/0xad
do_syscall_64+0x27/0x350
SyS_clone+0x19/0x20
do_syscall_64+0x60/0x350
entry_SYSCALL64_slow_path+0x25/0x25
Link: http://lkml.kernel.org/r/20170731040113.14197-1-dmitriyz@waymo.com
Fixes: 46e700abc44c ("mm, page_alloc: remove unnecessary taking of a seqlock when cpusets are disabled")
Signed-off-by: Dima Zavin <dmitriyz@waymo.com>
Reported-by: Cliff Spradlin <cspradlin@waymo.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The kerneldoc comment for kthread_create() had an incorrect argument
name, leading to a warning in the docs build.
Correct it, and make one more small step toward a warning-free build.
Link: http://lkml.kernel.org/r/20170724135916.7f486c6f@lwn.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
leaving stale TLB entries
Nadav Amit identified a theoritical race between page reclaim and
mprotect due to TLB flushes being batched outside of the PTL being held.
He described the race as follows:
CPU0 CPU1
---- ----
user accesses memory using RW PTE
[PTE now cached in TLB]
try_to_unmap_one()
==> ptep_get_and_clear()
==> set_tlb_ubc_flush_pending()
mprotect(addr, PROT_READ)
==> change_pte_range()
==> [ PTE non-present - no flush ]
user writes using cached RW PTE
...
try_to_unmap_flush()
The same type of race exists for reads when protecting for PROT_NONE and
also exists for operations that can leave an old TLB entry behind such
as munmap, mremap and madvise.
For some operations like mprotect, it's not necessarily a data integrity
issue but it is a correctness issue as there is a window where an
mprotect that limits access still allows access. For munmap, it's
potentially a data integrity issue although the race is massive as an
munmap, mmap and return to userspace must all complete between the
window when reclaim drops the PTL and flushes the TLB. However, it's
theoritically possible so handle this issue by flushing the mm if
reclaim is potentially currently batching TLB flushes.
Other instances where a flush is required for a present pte should be ok
as either the page lock is held preventing parallel reclaim or a page
reference count is elevated preventing a parallel free leading to
corruption. In the case of page_mkclean there isn't an obvious path
that userspace could take advantage of without using the operations that
are guarded by this patch. Other users such as gup as a race with
reclaim looks just at PTEs. huge page variants should be ok as they
don't race with reclaim. mincore only looks at PTEs. userfault also
should be ok as if a parallel reclaim takes place, it will either fault
the page back in or read some of the data before the flush occurs
triggering a fault.
Note that a variant of this patch was acked by Andy Lutomirski but this
was for the x86 parts on top of his PCID work which didn't make the 4.13
merge window as expected. His ack is dropped from this version and
there will be a follow-on patch on top of PCID that will include his
ack.
[akpm@linux-foundation.org: tweak comments]
[akpm@linux-foundation.org: fix spello]
Link: http://lkml.kernel.org/r/20170717155523.emckq2esjro6hf3z@suse.de
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: <stable@vger.kernel.org> [v4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
During teardown, accesses to memslots and buses are using
rcu_dereference_protected with an always-true condition because
these accesses are done outside the usual mutexes. This
is because the last reference is gone and there cannot be any
concurrent modifications, but rcu_dereference_protected is
ugly and unobvious.
Instead, check the refcount in kvm_get_bus and __kvm_memslots.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
Currently when WoL is supported but disabled, ethtool reports:
"Supports Wake-on: d".
Fix the indication of Wol support, so that the indication
remains "g" all the time if the NIC supports WoL.
Tested:
As accepted, when NIC supports WoL- ethtool reports:
Supports Wake-on: g
Wake-on: d
when NIC doesn't support WoL- ethtool reports:
Supports Wake-on: d
Wake-on: d
Fixes: 14c07b1358ed ("mlx4: Wake on LAN support")
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All timings in nand_sdr_timings are expressed in picoseconds but some
of them may not fit in an u32.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Fixes: 204e7ecd47e2 ("mtd: nand: Add a few more timings to nand_sdr_timings")
Reported-by: Alexander Dahl <ada@thorsis.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Alexander Dahl <ada@thorsis.com>
Tested-by: Alexander Dahl <ada@thorsis.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
|
|
The implementation of PCI workarounds may require that the device is reset
from its probe function. This implies that the PCI device lock is already
held, and makes calling pci_reset_function() impossible (since it will
itself try to take that lock).
Add pci_reset_function_locked(), which is the equivalent of
pci_reset_function(), except that it requires the PCI device lock to be
already held by the caller.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[bhelgaas: folded in fix for conflict with 52354b9d1f46 ("PCI: Remove
__pci_dev_reset() and pci_dev_reset()")]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # 4.11: 52354b9d1f46: PCI: Remove __pci_dev_reset() and pci_dev_reset()
Cc: stable@vger.kernel.org # 4.11
|
|
__user should be used to identify user pointers and not __u64
variables containing pointers.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
|
|
Many PTP drivers required to perform some asynchronous or periodic work,
like periodically handling PHC counter overflow or handle delayed timestamp
for RX/TX network packets. In most of the cases, such work is implemented
using workqueues. Unfortunately, Kernel workqueues might introduce
significant delay in work scheduling under high system load and on -RT,
which could cause misbehavior of PTP drivers due to internal counter
overflow, for example, and there is no way to tune its execution policy and
priority manuallly.
Hence, The kthread_worker can be used insted of workqueues, as it create
separte named kthread for each worker and its its execution policy and
priority can be configured using chrt tool.
This prblem was reported for two drivers TI CPSW CPTS and dp83640, so
instead of modifying each of these driver it was proposed to add PTP
auxiliary worker to the PHC subsystem.
The patch adds PTP auxiliary worker in PHC subsystem using kthread_worker
and kthread_delayed_work and introduces two new PHC subsystem APIs:
- long (*do_aux_work)(struct ptp_clock_info *ptp) callback in
ptp_clock_info structure, which driver should assign if it require to
perform asynchronous or periodic work. Driver should return the delay of
the PTP next auxiliary work scheduling time (>=0) or negative value in case
further scheduling is not required.
- int ptp_schedule_worker(struct ptp_clock *ptp, unsigned long delay) which
allows schedule PTP auxiliary work.
The name of kthread_worker thread corresponds PTP PHC device name "ptp%d".
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The verifier is allocated on the stack, but the EXCHANGE_ID RPC call was
changed to be asynchronous by commit 8d89bd70bc939. If we interrrupt
the call to rpc_wait_for_completion_task(), we can therefore end up
transmitting random stack contents in lieu of the verifier.
Fixes: 8d89bd70bc939 ("NFS setup async exchange_id")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
This is needed so that the OSDs can regenerate the missing set at the
start of a new interval where support for recovery deletes changed.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
|
|
There is now a fallback to a choose_arg index of -1 if there isn't
a pool-specific choose_arg set. If you create a per-pool weight-set,
that works for that pool. Otherwise we try the compat/default one. If
that doesn't exist either, then we use the normal CRUSH weights.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
|
|
'asoc/fix/msm8916', 'asoc/fix/multi-pcm', 'asoc/fix/of-graph' and 'asoc/fix/pxa' into asoc-linus
|
|
Pull networking fixes from David Miller:
1) Handle notifier registry failures properly in tun/tap driver, from
Tonghao Zhang.
2) Fix bpf verifier handling of subtraction bounds and add a testcase
for this, from Edward Cree.
3) Increase reset timeout in ftgmac100 driver, from Ben Herrenschmidt.
4) Fix use after free in prd_retire_rx_blk_timer_exired() in AF_PACKET,
from Cong Wang.
5) Fix SElinux regression due to recent UDP optimizations, from Paolo
Abeni.
6) We accidently increment IPSTATS_MIB_FRAGFAILS in the ipv6 code
paths, fix from Stefano Brivio.
7) Fix some mem leaks in dccp, from Xin Long.
8) Adjust MDIO_BUS kconfig deps to avoid build errors, from Arnd
Bergmann.
9) Mac address length check and buffer size fixes from Cong Wang.
10) Don't leak sockets in ipv6 udp early demux, from Paolo Abeni.
11) Fix return value when copy_from_user() fails in
bpf_prog_get_info_by_fd(), from Daniel Borkmann.
12) Handle PHY_HALTED properly in phy library state machine, from
Florian Fainelli.
13) Fix OOPS in fib_sync_down_dev(), from Ido Schimmel.
14) Fix truesize calculation in virtio_net which led to performance
regressions, from Michael S Tsirkin.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
samples/bpf: fix bpf tunnel cleanup
udp6: fix jumbogram reception
ppp: Fix a scheduling-while-atomic bug in del_chan
Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config"
virtio_net: fix truesize for mergeable buffers
mv643xx_eth: fix of_irq_to_resource() error check
MAINTAINERS: Add more files to the PHY LIBRARY section
ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev()
net: phy: Correctly process PHY_HALTED in phy_stop_machine()
sunhme: fix up GREG_STAT and GREG_IMASK register offsets
bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len
tcp: avoid bogus gcc-7 array-bounds warning
net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt"
bpf: don't indicate success when copy_from_user fails
udp6: fix socket leak on early demux
net: thunderx: Fix BGX transmit stall due to underflow
Revert "vhost: cache used event for better performance"
team: use a larger struct for mac address
net: check dev->addr_len for dev_set_mac_address()
phy: bcm-ns-usb3: fix MDIO_BUS dependency
...
|
|
Since commit 67a51780aebb ("ipv6: udp: leverage scratch area
helpers") udp6_recvmsg() read the skb len from the scratch area,
to avoid a cache miss.
But the UDP6 rx path support RFC 2675 UDPv6 jumbograms, and their
length exceeds the 16 bits available in the scratch area. As a side
effect the length returned by recvmsg() is:
<ingress datagram len> % (1<<16)
This commit addresses the issue allocating one more bit in the
IP6CB flags field and setting it for incoming jumbograms.
Such field is still in the first cacheline, so at recvmsg()
time we can check it and fallback to access skb->len if
required, without a measurable overhead.
Fixes: 67a51780aebb ("ipv6: udp: leverage scratch area helpers")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
"Two notable fixes.
- While adding NUMA affinity support to unbound workqueues, the
assumption that an unbound workqueue with max_active == 1 is
ordered was broken.
The plan was to use explicit alloc_ordered_workqueue() for those
cases. Unfortunately, I forgot to update the documentation properly
and we grew a handful of use cases which depend on that assumption.
While we want to convert them to alloc_ordered_workqueue(), we
don't really lose anything by enforcing ordered execution on
unbound max_active == 1 workqueues and it doesn't make sense to
risk subtle bugs. Restore the assumption.
- Workqueue assumes that CPU <-> NUMA node mapping remains static.
This is a general assumption - we don't have any synchronization
mechanism around CPU <-> node mapping. Unfortunately, powerpc may
change the mapping dynamically leading to crashes. Michael added a
workaround so that we at least don't crash while powerpc hotplug
code gets updated"
* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Work around edge cases for calc of pool's cpumask
workqueue: implicit ordered attribute should be overridable
workqueue: restore WQ_UNBOUND/max_active==1 to be ordered
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"Dan found a really old bug where libata hotplug code wasn't sanitizing
index value from userland and may end up indexing with a negative
number. It is scary but fortunately can only be triggered by root.
Other than that, minor fixes"
* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata: fix a couple of doc build warnings
libata: array underflow in ata_find_dev()
ata: sata_rcar: add gen[23] fallback compatibility strings
libata: remove unused rc in ata_eh_handle_port_resume
libata: Cleanup ata_read_log_page()
ata: fix gemini Kconfig dependencies
|
|
Hopefully making clear that it is not needed for new drivers.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
Up until recently sync_file were create to export a single dma-fence to
userspace, and so we could canabalise a bit insie dma-fence to mark
whether or not we had enable polling for the sync_file itself. However,
with the advent of syncobj, we do allow userspace to create multiple
sync_files for a single dma-fence. (Similarly, that the sw-sync
validation framework also started returning multiple sync-files wrapping
a single dma-fence for a syncpt also triggering the problem.)
This patch reverts my suggestion in commit e24165537312
("dma-buf/sync_file: only enable fence signalling on poll()") to use a
single bit in the shared dma-fence and restores the sync_file->flags for
tracking the bits individually.
Reported-by: Gustavo Padovan <gustavo.padovan@collabora.com>
Fixes: f1e8c67123cf ("dma-buf/sw-sync: Use an rbtree to sort fences in the timeline")
Fixes: e9083420bbac ("drm: introduce sync objects (v4)")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: dri-devel@lists.freedesktop.org
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.13-rc1+
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170728212951.7818-1-chris@chris-wilson.co.uk
(cherry picked from commit db1fc97ca0c0d3fdeeadf314d99a26188438940a)
|
|
Two variables in ext4_inode_info, i_reserved_meta_blocks and
i_allocated_meta_blocks, are unused. Removing them saves a little
memory per in-memory inode and cleans up clutter in several tracepoints.
Adjust tracepoint output from ext4_alloc_da_blocks() for consistency
and fix a typo and whitespace near these changes.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"Two patches addressing build warnings caused by inconsistent kernel
doc comments"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/wait: Clean up some documentation warnings
sched/core: Fix some documentation build warnings
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"Fix for a regression caused by the conversion of x86 to the generic
hotplug code.
Instead of doing a plain single line revert, this adds a pile of
comments so the semantics of the force argument are clear"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/cpuhotplug: Revert "Set force affinity flag on hotplug migration"
|
|
The work-around for Qualcomm Technologies QDF2400 Erratum 44 hinges on a
global variable defined in the pl011 driver. The ACPI SPCR parsing code
determines whether the work-around is needed, and if so, it changes the
console name from "pl011" to "qdf2400_e44". The expectation is that
the pl011 driver will implement the work-around when it sees the console
name. The global variable qdf2400_e44_present is set when that happens.
The problem is that work-around needs to be enabled when the pl011
driver probes, not when the console name is queried. However, sbsa_probe()
is called before pl011_console_match(). The work-around appeared to work
previously because the default console on QDF2400 platforms was always
ttyAMA1. The first time sbsa_probe() is called (for ttyAMA0),
qdf2400_e44_present is still false. Then pl011_console_match() is called,
and it sets qdf2400_e44_present to true. All subsequent calls to
sbsa_probe() enable the work-around.
The solution is to move the global variable into spcr.c and let the
pl011 driver query it during probe time. This works because all QDF2400
platforms require SPCR, so parse_spcr() will always be called.
pl011_console_match still checks for the "qdf2400_e44" console name,
but it doesn't do anything else special.
Fixes: 5a0722b898f8 ("tty: pl011: use "qdf2400_e44" as the earlycon name for QDF2400 E44")
Tested-by: Jeffrey Hugo <jhugo@codeaurora.org>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When an early demuxed packet reaches __udp6_lib_lookup_skb(), the
sk reference is retrieved and used, but the relevant reference
count is leaked and the socket destructor is never called.
Beyond leaking the sk memory, if there are pending UDP packets
in the receive queue, even the related accounted memory is leaked.
In the long run, this will cause persistent forward allocation errors
and no UDP skbs (both ipv4 and ipv6) will be able to reach the
user-space.
Fix this by explicitly accessing the early demux reference before
the lookup, and properly decreasing the socket reference count
after usage.
Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and
the now obsoleted comment about "socket cache".
The newly added code is derived from the current ipv4 code for the
similar path.
v1 -> v2:
fixed the __udp6_lib_rcv() return code for resubmission,
as suggested by Eric
Reported-by: Sam Edwards <CFSworks@gmail.com>
Reported-by: Marc Haber <mh+netdev@zugschlus.de>
Fixes: 5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull KVM fixes from Paolo Bonzini:
"s390:
- SRCU fix
PPC:
- host crash fixes
x86:
- bugfixes, including making nested posted interrupts really work
Generic:
- tweaks to kvm_stat and to uevents"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: LAPIC: Fix reentrancy issues with preempt notifiers
tools/kvm_stat: add '-f help' to get the available event list
tools/kvm_stat: use variables instead of hard paths in help output
KVM: nVMX: Fix loss of L2's NMI blocking state
KVM: nVMX: Fix posted intr delivery when vcpu is in guest mode
x86: irq: Define a global vector for nested posted interrupts
KVM: x86: do mask out upper bits of PAE CR3
KVM: make pid available for uevents without debugfs
KVM: s390: take srcu lock when getting/setting storage keys
KVM: VMX: remove unused field
KVM: PPC: Book3S HV: Fix host crash on changing HPT size
KVM: PPC: Book3S HV: Enable TM before accessing TM registers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"I'd been collecting these whilst we debugged a CPU hotplug failure,
but we ended up diagnosing that one to tglx, who has taken a fix via
the -tip tree separately.
We're seeing some NFS issues that we haven't gotten to the bottom of
yet, and we've uncovered some issues with our backtracing too so there
might be another fixes pull before we're done.
Summary:
- Ensure we have a guard page after the kernel image in vmalloc
- Fix incorrect prefetch stride in copy_page
- Ensure irqs are disabled in die()
- Fix for event group validation in QCOM L2 PMU driver
- Fix requesting of PMU IRQs on AMD Seattle
- Minor cleanups and fixes"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mmu: Place guard page after mapping of kernel image
drivers/perf: arm_pmu: Request PMU SPIs with IRQF_PER_CPU
arm64: sysreg: Fix unprotected macro argmuent in write_sysreg
perf: qcom_l2: fix column exclusion check
arm64/lib: copy_page: use consistent prefetch stride
arm64/numa: Drop duplicate message
perf: Convert to using %pOF instead of full_name
arm64: Convert to using %pOF instead of full_name
arm64: traps: disable irq in die()
arm64: atomics: Remove '&' from '+&' asm constraint in lse atomics
arm64: uaccess: Remove redundant __force from addr cast in __range_ok
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- a few DM integrity fixes that improve performance. One that address
inefficiencies in the on-disk journal device layout. Another that
makes use of the block layer's on-stack plugging when writing the
journal.
- a dm-bufio fix for the blk_status_t conversion that went in during
the merge window.
- a few DM raid fixes that address correctness when suspending the
device and a validation fix for validation that occurs during device
activation.
- a couple DM zoned target fixes. Important one being the fix to not
use GFP_KERNEL in the IO path due to concerns about deadlock in
low-memory conditions (e.g. swap over a DM zoned device, etc).
- a DM DAX device fix to make sure dm_dax_flush() is called if the
underlying DAX device is operating as a write cache.
* tag 'for-4.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm, dax: Make sure dm_dax_flush() is called if device supports it
dm verity fec: fix GFP flags used with mempool_alloc()
dm zoned: use GFP_NOIO in I/O path
dm zoned: remove test for impossible REQ_OP_FLUSH conditions
dm raid: bump target version
dm raid: avoid mddev->suspended access
dm raid: fix activation check in validate_raid_redundancy()
dm raid: remove WARN_ON() in raid10_md_layout_to_format()
dm bufio: fix error code in dm_bufio_write_dirty_buffers()
dm integrity: test for corrupted disk format during table load
dm integrity: WARN_ON if variables representing journal usage get out of sync
dm integrity: use plugging when writing the journal
dm integrity: fix inefficient allocation of journal space
|
|
Pull block fixes from Jens Axboe:
"A small collection of fixes that should go into this series. This
contains:
- NVMe pull request from Christoph, with various fixes for nvme
proper and nvme-fc.
- disable runtime PM for blk-mq for now.
With scsi now defaulting to using blk-mq, this reared its head as
an issue. Longer term we'll fix up runtime PM for blk-mq, for now
just disable it to prevent a hang on laptop resume for some folks.
- blk-mq CPU <-> hw queue map fix from Christoph.
- xen/blkfront pull request from Konrad, with two small fixes for the
blkfront driver.
- a few fixups for nbd from Joseph.
- a stable fix for pblk from Javier"
* 'for-linus' of git://git.kernel.dk/linux-block:
lightnvm: pblk: advance bio according to lba index
nvme: validate admin queue before unquiesce
nbd: clear disconnected on reconnect
nvme-pci: fix HMB size calculation
nvme-fc: revise TRADDR parsing
nvme-fc: address target disconnect race conditions in fcp io submit
nvme: fabrics commands should use the fctype field for data direction
nvme: also provide a UUID in the WWID sysfs attribute
xen/blkfront: always allocate grants first from per-queue persistent grants
xen-blkfront: fix mq start/stop race
blk-mq: map queues to all present CPUs
block: disable runtime-pm for blk-mq
xen-blkfront: Fix handling of non-supported operations
nbd: only set sndtimeo if we have a timeout set
nbd: take tx_lock before disconnecting
nbd: allow multiple disconnects to be sent
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"Here are a couple of mmc fixes intended for v4.13-rc1.
I have also included a couple of cleanup patches in this pull request
for OMAP2+, related to the omap_hsmmc driver. The reason is because of
the changes are also depending on OMAP SoC specific code, so this
simplifies how to deal with this.
Summary:
MMC host:
- sunxi: Correct time phase settings
- omap_hsmmc: Clean up some dead code
- dw_mmc: Fix message printed for deprecated num-slots DT binding
- dw_mmc: Fix DT documentation"
* tag 'mmc-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
Documentation: dw-mshc: deprecate num-slots
mmc: dw_mmc: fix the wrong condition check of getting num-slots from DT
mmc: host: omap_hsmmc: remove unused platform callbacks
ARM: OMAP2+: hsmmc.c: Remove dead code
mmc: sunxi: Keep default timing phase settings for new timing mode
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These are two fixups for the suspend-to-idle handling in the ACPI
subsystem after recent changes in that area and two simple fixes of
the ACPI NUMA code.
Specifics:
- Add an ACPI module parameter to allow users to override the new
default behavior on some systems where the EC GPE is not disabled
during suspend-to-idle in case the EC on their systems generates
excessive wakeup events and they want to sacrifice some
functionality (like power button wakeups) for extra battery life
while suspended (Rafael Wysocki).
- Fix flushing of the outstanding EC work in the ACPI core
suspend-to-idle code (Rafael Wysocki).
- Add a missing include and fix a messed-up comment in the ACPI NUMA
code (Ross Zwisler)"
* tag 'acpi-4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: NUMA: Fix typo in the full name of SRAT
ACPI: NUMA: add missing include in acpi_numa.h
ACPI / PM / EC: Flush all EC work in acpi_freeze_sync()
ACPI / EC: Add parameter to force disable the GPE on suspend
|
|
In order to mark relevant fields while setting the MTPPS register
add field select. Otherwise it can cause a misconfiguration in
firmware.
Fixes: ee7f12205abc ('net/mlx5e: Implement 1PPS support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Fix miscalculation in reserved_at_1a0 field.
Fixes: ee7f12205abc ('net/mlx5e: Implement 1PPS support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
That commit was part of the changes moving x86 to the generic CPU hotplug
interrupt migration code. The force flag was required on x86 before the
hierarchical irqdomain rework, but invoking set_affinity() with force=true
stayed and had no side effects.
At some point in the past, the force flag got repurposed to support the
exynos timer interrupt affinity setting to a not yet online CPU, so the
interrupt controller callback does not verify the supplied affinity mask
against cpu_online_mask.
Setting the flag in the CPU hotplug code causes the cpu online masking to
be blocked on these irq controllers and results in potentially affining an
interrupt to the CPU which is unplugged, i.e. instead of moving it away,
it's just reassigned to it.
As the force flags is not longer needed on x86, it's safe to revert that
patch so the ARM irqchips which use the force flag work again.
Add comments to that effect, so this won't happen again.
Note: The online mask handling should be done in the generic code and the
force flag and the masking in the irq chips removed all together, but
that's not a change possible for 4.13.
Fixes: 77f85e66aa8b ("genirq/cpuhotplug: Set force affinity flag on hotplug migration")
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707271217590.3109@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Since the PMU register interface is banked per CPU, CPU PMU interrrupts
cannot be handled by a CPU other than the one with the PMU asserting the
interrupt. This means that migrating PMU SPIs, as we do during a CPU
hotplug operation doesn't make any sense and can lead to the IRQ being
disabled entirely if we route a spurious IRQ to the new affinity target.
This has been observed in practice on AMD Seattle, where CPUs on the
non-boot cluster appear to take a spurious PMU IRQ when coming online,
which is routed to CPU0 where it cannot be handled.
This patch passes IRQF_PERCPU for PMU SPIs and forcefully sets their
affinity prior to requesting them, ensuring that they cannot
be migrated during hotplug events. This interacts badly with the DB8500
erratum workaround that ping-pongs the interrupt affinity from the handler,
so we avoid passing IRQF_PERCPU in that case by allowing the IRQ flags
to be overridden in the platdata.
Fixes: 3cf7ee98b848 ("drivers/perf: arm_pmu: move irq request/free into probe")
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Commit b1f5bfc27a19 ("sctp: don't dereference ptr before leaving
_sctp_walk_{params, errors}()") tried to fix the issue that it
may overstep the chunk end for _sctp_walk_{params, errors} with
'chunk_end > offset(length) + sizeof(length)'.
But it introduced a side effect: When processing INIT, it verifies
the chunks with 'param.v == chunk_end' after iterating all params
by sctp_walk_params(). With the check 'chunk_end > offset(length)
+ sizeof(length)', it would return when the last param is not yet
accessed. Because the last param usually is fwdtsn supported param
whose size is 4 and 'chunk_end == offset(length) + sizeof(length)'
This is a badly issue even causing sctp couldn't process 4-shakes.
Client would always get abort when connecting to server, due to
the failure of INIT chunk verification on server.
The patch is to use 'chunk_end <= offset(length) + sizeof(length)'
instead of 'chunk_end < offset(length) + sizeof(length)' for both
_sctp_walk_params and _sctp_walk_errors.
Fixes: b1f5bfc27a19 ("sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The CPU hotplug related code of this driver can be simplified by:
1) Consolidating the callbacks into a single state. The CPU thread can be
torn down on the CPU which goes offline. There is no point in delaying
that to the CPU dead state
2) Let the core code invoke the online/offline callbacks and remove the
extra for_each_online_cpu() loops.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The CPU hotplug related code of this driver can be simplified by:
1) Consolidating the callbacks into a single state. The CPU thread can be
torn down on the CPU which goes offline. There is no point in delaying
that to the CPU dead state
2) Let the core code invoke the online/offline callbacks and remove the
extra for_each_online_cpu() loops.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|