Age | Commit message (Collapse) | Author |
|
The per-cpu bpf_redirect_info is shared among all skb_do_redirect()
and BPF redirect helpers. Callers on RX path are all in BH context,
disabling preemption is not sufficient to prevent BH interruption.
In production, we observed strange packet drops because of the race
condition between LWT xmit and TC ingress, and we verified this issue
is fixed after we disable BH.
Although this bug was technically introduced from the beginning, that
is commit 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure"),
at that time call_rcu() had to be call_rcu_bh() to match the RCU context.
So this patch may not work well before RCU flavor consolidation has been
completed around v5.0.
Update the comments above the code too, as call_rcu() is now BH friendly.
Signed-off-by: Dongdong Wang <wangdongdong.6@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/bpf/20201205075946.497763-1-xiyou.wangcong@gmail.com
|
|
If force_zc is set, we should exit out with an error, not fall back to
copy mode.
Fixes: 921b68692abb ("xsk: Enable sharing of dma mappings")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/1607077277-41995-1-git-send-email-zhangchangzhong@huawei.com
|
|
Modify the tx writeable condition from the queue is not full to the
number of present tx queues is less than the half of the total number
of queues. Because the tx queue not full is a very short time, this will
cause a large number of EPOLLOUT events, and cause a large number of
process wake up.
Fixes: 35fcde7f8deb ("xsk: support for Tx")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/508fef55188d4e1160747ead64c6dcda36735880.1606555939.git.xuanzhuo@linux.alibaba.com
|
|
datagram_poll will judge the current socket status (EPOLLIN, EPOLLOUT)
based on the traditional socket information (eg: sk_wmem_alloc), but
this does not apply to xsk. So this patch uses sock_poll_wait instead of
datagram_poll, and the mask is calculated by xsk_poll.
Fixes: c497176cb2e4 ("xsk: add Rx receive functions and poll support")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/e82f4697438cd63edbf271ebe1918db8261b7c09.1606555939.git.xuanzhuo@linux.alibaba.com
|
|
Avoid occasional test failures due to the last sample being delayed to
another ring_buffer__poll() call. Instead, drain samples completely with
ring_buffer__consume(). This is supposed to fix a rare and non-deterministic
test failure in libbpf CI.
Fixes: cb1c9ddd5525 ("selftests/bpf: Add BPF ringbuf selftests")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201130223336.904192-2-andrii@kernel.org
|
|
Fix ring_buffer__poll() to return the number of non-discarded records
consumed, just like its documentation states. It's also consistent with
ring_buffer__consume() return. Fix up selftests with wrong expected results.
Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
Fixes: cb1c9ddd5525 ("selftests/bpf: Add BPF ringbuf selftests")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201130223336.904192-1-andrii@kernel.org
|
|
It turns out that it does exist a path where xdp_return_buff() is
being passed an XDP buffer of type MEM_TYPE_XSK_BUFF_POOL. This path
is when AF_XDP zero-copy mode is enabled, and a buffer is redirected
to a DEVMAP with an attached XDP program that drops the buffer.
This change simply puts the handling of MEM_TYPE_XSK_BUFF_POOL back
into xdp_return_buff().
Fixes: 82c41671ca4f ("xdp: Simplify xdp_return_{frame, frame_rx_napi, buff}")
Reported-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Link: https://lore.kernel.org/bpf/20201127171726.123627-1-bjorn.topel@gmail.com
|
|
GPIO_ACTIVE_x flags are not correct in the context of interrupt flags.
These are simple defines so they could be used in DTS but they will not
have the same meaning:
1. GPIO_ACTIVE_HIGH = 0 = IRQ_TYPE_NONE
2. GPIO_ACTIVE_LOW = 1 = IRQ_TYPE_EDGE_RISING
Correct the interrupt flags, assuming the author of the code wanted same
logical behavior behind the name "ACTIVE_xxx", this is:
ACTIVE_LOW => IRQ_TYPE_LEVEL_LOW
ACTIVE_HIGH => IRQ_TYPE_LEVEL_HIGH
Fixes: a1a8b4594f8d ("NFC: pn544: i2c: Add DTS Documentation")
Fixes: 6be88670fc59 ("NFC: nxp-nci_i2c: Add I2C support to NXP NCI driver")
Fixes: e3b329221567 ("dt-bindings: can: tcan4x5x: Update binding to use interrupt property")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for tcan4x5x.txt
Link: https://lore.kernel.org/r/20201026153620.89268-1-krzk@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Dany Madden says:
====================
ibmvnic: assorted bug fixes
Assorted fixes for ibmvnic originated from "[PATCH net 00/15] ibmvnic:
assorted bug fixes" sent by Lijun Pan.
====================
Link: https://lore.kernel.org/r/20201126000432.29897-1-drt@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Reduce the wait time for Command Response Queue response from 30 seconds
to 20 seconds, as recommended by VIOS and Power Hypervisor teams.
Fixes: bd0b672313941 ("ibmvnic: Move login and queue negotiation into ibmvnic_open")
Fixes: 53da09e92910f ("ibmvnic: Add set_link_state routine for setting adapter link state")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Reset timeout is going off right after adapter reset. This patch ensures
that timeout is scheduled if it has been 5 seconds since the last reset.
5 seconds is the default watchdog timeout.
Fixes: ed651a10875f1 ("ibmvnic: Updated reset handling")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
send_login() does not check for the result of ibmvnic_send_crq() of the
login request. This results in the driver needlessly retrying the login
10 times even when CRQ is no longer active. Check the return code and
give up in case of errors in sending the CRQ.
The only time we want to retry is if we get a PARITALSUCCESS response
from the partner.
Fixes: 032c5e82847a2 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If after ibmvnic sends a LOGIN it gets a FAILOVER, it is possible that
the worker thread will start reset process and free the login response
buffer before it gets a (now stale) LOGIN_RSP. The ibmvnic tasklet will
then try to access the login response buffer and crash.
Have ibmvnic track pending logins and discard any stale login responses.
Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If auto-priority failover is enabled, the backing device needs time
to settle if hard resetting fails for any reason. Add a delay of 60
seconds before retrying the hard-reset.
Fixes: 2770a7984db5 ("ibmvnic: Introduce hard reset recovery")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a failed reset, driver could end up in VNIC_PROBED or VNIC_CLOSED
state and cannot recover in subsequent resets, leaving it offline.
This patch restores the adapter state to reset_state, the original
state when reset was called.
Fixes: b27507bb59ed5 ("net/ibmvnic: unlock rtnl_lock in reset so linkwatch_event can run")
Fixes: 2770a7984db58 ("ibmvnic: Introduce hard reset recovery")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
scrq->msgs could be NULL during device reset, causing Linux to crash.
So, check before memset scrq->msgs.
Fixes: c8b2ad0a4a901 ("ibmvnic: Sanitize entire SCRQ buffer on reset")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When ibmvnic fails to reset, it breaks out of the reset loop and frees
all of the remaining resets from the workqueue. Doing so prevents the
adapter from recovering if no reset is scheduled after that. Instead,
have the driver continue to process resets on the workqueue.
Remove the no longer need free_all_rwi().
Fixes: ed651a10875f1 ("ibmvnic: Updated reset handling")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Inconsistent login with the vnicserver is causing the device to be
removed. This does not give the device a chance to recover from error
state. This patch schedules a FATAL reset instead to bring the adapter
up.
Fixes: 032c5e82847a2 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Fix insufficient validation of IPSET_ATTR_IPADDR_IPV6 reported
by syzbot.
2) Remove spurious reports on nf_tables when lockdep gets disabled,
from Florian Westphal.
3) Fix memleak in the error path of error path of
ip_vs_control_net_init(), from Wang Hai.
4) Fix missing control data in flow dissector, otherwise IP address
matching in hardware offload infra does not work.
5) Fix hardware offload match on prefix IP address when userspace
does not send a bitwise expression to represent the prefix.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
netfilter: nftables_offload: build mask based from the matching bytes
netfilter: nftables_offload: set address type in control dissector
ipvs: fix possible memory leak in ip_vs_control_net_init
netfilter: nf_tables: avoid false-postive lockdep splat
netfilter: ipset: prevent uninit-value in hash_ip6_add
====================
Link: https://lore.kernel.org/r/20201127190313.24947-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
a proper kernel configuration for running kselftest can be obtained with:
$ yes | make kselftest-merge
enable compile support for the 'red' qdisc: otherwise, tdc kselftest fail
when trying to run tdc test items contained in red.json.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/cfa23f2d4f672401e6cebca3a321dd1901a9ff07.1606416464.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When inet_rtm_getroute() was converted to use the RCU variants of
ip_route_input() and ip_route_output_key(), the TOS parameters
stopped being masked with IPTOS_RT_MASK before doing the route lookup.
As a result, "ip route get" can return a different route than what
would be used when sending real packets.
For example:
$ ip route add 192.0.2.11/32 dev eth0
$ ip route add unreachable 192.0.2.11/32 tos 2
$ ip route get 192.0.2.11 tos 2
RTNETLINK answers: No route to host
But, packets with TOS 2 (ECT(0) if interpreted as an ECN bit) would
actually be routed using the first route:
$ ping -c 1 -Q 2 192.0.2.11
PING 192.0.2.11 (192.0.2.11) 56(84) bytes of data.
64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.173 ms
--- 192.0.2.11 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.173/0.173/0.173/0.000 ms
This patch re-applies IPTOS_RT_MASK in inet_rtm_getroute(), to
return results consistent with real route lookups.
Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/b2d237d08317ca55926add9654a48409ac1b8f5b.1606412894.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2020-11-28
1) Do not reference the skb for xsk's generic TX side since when looped
back into RX it might crash in generic XDP, from Björn Töpel.
2) Fix umem cleanup on a partially set up xsk socket when being destroyed,
from Magnus Karlsson.
3) Fix an incorrect netdev reference count when failing xsk_bind() operation,
from Marek Majtyka.
4) Fix bpftool to set an error code on failed calloc() in build_btf_type_table(),
from Zhen Lei.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Add MAINTAINERS entry for BPF LSM
bpftool: Fix error return value in build_btf_type_table
net, xsk: Avoid taking multiple skbuff references
xsk: Fix incorrect netdev reference count
xsk: Fix umem cleanup bug at socket destruct
MAINTAINERS: Update XDP and AF_XDP entries
====================
Link: https://lore.kernel.org/r/20201128005104.1205-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- Fix head/tailroom issues for fragments, by Sven Eckelmann (3 patches)
* tag 'batadv-net-pullrequest-20201127' of git://git.open-mesh.org/linux-merge:
batman-adv: Don't always reallocate the fragmentation skb head
batman-adv: Reserve needed_*room for fragments
batman-adv: Consider fragmentation for needed_headroom
====================
Link: https://lore.kernel.org/r/20201127173849.19208-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Netfilter changes PACKET_OTHERHOST to PACKET_HOST before invoking the
hooks as, while it's an expected value for a bridge, routing expects
PACKET_HOST. The change is undone later on after hook traversal. This
can be seen with pairs of functions updating skb>pkt_type and then
reverting it to its original value:
For hook NF_INET_PRE_ROUTING:
setup_pre_routing / br_nf_pre_routing_finish
For hook NF_INET_FORWARD:
br_nf_forward_ip / br_nf_forward_finish
But the third case where netfilter does this, for hook
NF_INET_POST_ROUTING, the packet type is changed in br_nf_post_routing
but never reverted. A comment says:
/* We assume any code from br_dev_queue_push_xmit onwards doesn't care
* about the value of skb->pkt_type. */
But when having a tunnel (say vxlan) attached to a bridge we have the
following call trace:
br_nf_pre_routing
br_nf_pre_routing_ipv6
br_nf_pre_routing_finish
br_nf_forward_ip
br_nf_forward_finish
br_nf_post_routing <- pkt_type is updated to PACKET_HOST
br_nf_dev_queue_xmit <- but not reverted to its original value
vxlan_xmit
vxlan_xmit_one
skb_tunnel_check_pmtu <- a check on pkt_type is performed
In this specific case, this creates issues such as when an ICMPv6 PTB
should be sent back. When CONFIG_BRIDGE_NETFILTER is enabled, the PTB
isn't sent (as skb_tunnel_check_pmtu checks if pkt_type is PACKET_HOST
and returns early).
If the comment is right and no one cares about the value of
skb->pkt_type after br_dev_queue_push_xmit (which isn't true), resetting
it to its original value should be safe.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20201123174902.622102-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic fix from Arnd Bergmann:
"Add correct MAX_POSSIBLE_PHYSMEM_BITS setting to asm-generic.
This is a single bugfix for a bug that Stefan Agner found on 32-bit
Arm, but that exists on several other architectures"
* tag 'asm-generic-fixes-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"Another set of patches for devicetree files and Arm SoC specific
drivers:
- A fix for OP-TEE shared memory on non-SMP systems
- multiple code fixes for the OMAP platform, including one regression
for the CPSW network driver and a few runtime warning fixes
- Some DT patches for the Rockchip RK3399 platform, in particular
fixing the MMC device ordering that recently became
nondeterministic with async probe.
- Multiple DT fixes for the Tegra platform, including a regression
fix for suspend/resume on TX2
- A regression fix for a user-triggered fault in the NXP dpio driver
- A regression fix for a bug caused by an earlier bug fix in the
xilinx firmware driver
- Two more DTC warning fixes
- Sylvain Lemieux steps down as maintainer for the NXP LPC32xx
platform"
* tag 'arm-soc-fixes-v5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (24 commits)
arm64: tegra: Fix Tegra234 VDK node names
arm64: tegra: Wrong AON HSP reg property size
arm64: tegra: Fix USB_VBUS_EN0 regulator on Jetson TX1
arm64: tegra: Correct the UART for Jetson Xavier NX
arm64: tegra: Disable the ACONNECT for Jetson TX2
optee: add writeback to valid memory type
firmware: xilinx: Use hash-table for api feature check
firmware: xilinx: Fix SD DLL node reset issue
soc: fsl: dpio: Get the cpumask through cpumask_of(cpu)
ARM: dts: dra76x: m_can: fix order of clocks
bus: ti-sysc: suppress err msg for timers used as clockevent/source
MAINTAINERS: Remove myself as LPC32xx maintainers
arm64: dts: qcom: clear the warnings caused by empty dma-ranges
arm64: dts: broadcom: clear the warnings caused by empty dma-ranges
ARM: dts: am437x-l4: fix compatible for cpsw switch dt node
arm64: dts: rockchip: Reorder LED triggers from mmc devices on rk3399-roc-pc.
arm64: dts: rockchip: Assign a fixed index to mmc devices on rk3399 boards.
arm64: dts: rockchip: Remove system-power-controller from pmic on Odroid Go Advance
arm64: dts: rockchip: fix NanoPi R2S GMAC clock name
ARM: OMAP2+: Manage MPU state properly for omap_enter_idle_coupled()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.10-rc6, including fixes from the WiFi driver,
and CAN subtrees.
Current release - regressions:
- gro_cells: reduce number of synchronize_net() calls
- ch_ktls: release a lock before jumping to an error path
Current release - always broken:
- tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header
Previous release - regressions:
- net/tls: fix missing received data after fast remote close
- vsock/virtio: discard packets only when socket is really closed
- sock: set sk_err to ee_errno on dequeue from errq
- cxgb4: fix the panic caused by non smac rewrite
Previous release - always broken:
- tcp: fix corner cases around setting ECN with BPF selection of
congestion control
- tcp: fix race condition when creating child sockets from syncookies
on loopback interface
- usbnet: ipheth: fix connectivity with iOS 14
- tun: honor IOCB_NOWAIT flag
- net/packet: fix packet receive on L3 devices without visible hard
header
- devlink: Make sure devlink instance and port are in same net
namespace
- net: openvswitch: fix TTL decrement action netlink message format
- bonding: wait for sysfs kobject destruction before freeing struct
slave
- net: stmmac: fix upstream patch applied to the wrong context
- bnxt_en: fix return value and unwind in probe error paths
Misc:
- devlink: add extra layer of categorization to the reload stats uAPI
before it's released"
* tag 'net-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits)
sock: set sk_err to ee_errno on dequeue from errq
mptcp: fix NULL ptr dereference on bad MPJ
net: openvswitch: fix TTL decrement action netlink message format
can: af_can: can_rx_unregister(): remove WARN() statement from list operation sanity check
can: m_can: m_can_dev_setup(): add support for bosch mcan version 3.3.0
can: m_can: fix nominal bitiming tseg2 min for version >= 3.1
can: m_can: m_can_open(): remove IRQF_TRIGGER_FALLING from request_threaded_irq()'s flags
can: mcp251xfd: mcp251xfd_probe(): bail out if no IRQ was given
can: gs_usb: fix endianess problem with candleLight firmware
ch_ktls: lock is not freed
net/tls: Protect from calling tls_dev_del for TLS RX twice
devlink: Make sure devlink instance and port are in same net namespace
devlink: Hold rtnl lock while reading netdev attributes
ptp: clockmatrix: bug fix for idtcm_strverscmp
enetc: Let the hardware auto-advance the taprio base-time of 0
gro_cells: reduce number of synchronize_net() calls
net: stmmac: fix incorrect merge of patch upstream
ipv6: addrlabel: fix possible memory leak in ip6addrlbl_net_init
Documentation: netdev-FAQ: suggest how to post co-dependent series
ibmvnic: enhance resetting status check during module exit
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three small fixes in the UFS driver: two are for power management
issues and the third is to fix a slew of problem in the sysfs code"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: Fix race between shutdown and runtime resume flow
scsi: ufs: Make sure clk scaling happens only when HBA is runtime ACTIVE
scsi: ufs: Fix unexpected values from ufshcd_read_desc_param()
|
|
Pull io_uring fixes from Jens Axboe:
- Out of bounds fix for the cq size cap from earlier this release (Joseph)
- iov_iter type check fix (Pavel)
- Files grab + cancelation fix (Pavel)
* tag 'io_uring-5.10-2020-11-27' of git://git.kernel.dk/linux-block:
io_uring: fix files grab/cancel race
io_uring: fix ITER_BVEC check
io_uring: fix shift-out-of-bounds when round up cq size
|
|
Pull block fix from Jens Axboe:
"Just a single fix, for a crash in the keyslot manager"
* tag 'block-5.10-2020-11-27' of git://git.kernel.dk/linux-block:
block/keyslot-manager: prevent crash when num_slots=1
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few fixes for various warnings that accumulated over past two weeks:
- tree-checker: add missing return values for some errors
- lockdep fixes
- when reading qgroup config and starting quota rescan
- reverse order of quota ioctl lock and VFS freeze lock
- avoid accessing potentially stale fs info during device scan,
reported by syzbot
- add scope NOFS protection around qgroup relation changes
- check for running transaction before flushing qgroups
- fix tracking of new delalloc ranges for some cases"
* tag 'for-5.10-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix lockdep splat when enabling and disabling qgroups
btrfs: do nofs allocations when adding and removing qgroup relations
btrfs: fix lockdep splat when reading qgroup config on mount
btrfs: tree-checker: add missing returns after data_ref alignment checks
btrfs: don't access possibly stale fs_info data for printing duplicate device
btrfs: tree-checker: add missing return after error in root_item
btrfs: qgroup: don't commit transaction when we already hold the handle
btrfs: fix missing delalloc new bit for new delalloc ranges
|
|
Pull rdma fixes from Jason Gunthorpe:
"Two security issues and several small bug fixes. Things seem to have
stabilized for this release here.
Summary:
- Significant out of bounds access security issue in i40iw
- Fix misuse of mmu notifiers in hfi1
- Several errors in the register map/usage in hns
- Missing error returns in mthca"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/hns: Bugfix for memory window mtpt configuration
RDMA/hns: Fix retry_cnt and rnr_cnt when querying QP
RDMA/hns: Fix wrong field of SRQ number the device supports
IB/hfi1: Ensure correct mm is used at all times
RDMA/i40iw: Address an mmap handler exploit in i40iw
IB/mthca: fix return value of error branch in mthca_init_cq()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull mtd fixes from Miquel Raynal:
"Because of a recent change in the core, NAND controller drivers
initializing the ECC engine too early in the probe path are broken.
Drivers should wait for the NAND device to be discovered and its
memory layout known before doing any ECC related initialization, so
instead of reverting the faulty change which is actually moving in the
right direction, let's fix the drivers directly: socrates, sharpsl,
r852, plat_nand, pasemi, tmio, txx9ndfmc, orion, mpc5121, lpc32xx_slc,
lpc32xx_mlc, fsmc, diskonchip, davinci, cs553x, au1550, ams-delta,
xway and gpio"
* tag 'mtd/fixes-for-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: socrates: Move the ECC initialization to ->attach_chip()
mtd: rawnand: sharpsl: Move the ECC initialization to ->attach_chip()
mtd: rawnand: r852: Move the ECC initialization to ->attach_chip()
mtd: rawnand: plat_nand: Move the ECC initialization to ->attach_chip()
mtd: rawnand: pasemi: Move the ECC initialization to ->attach_chip()
mtd: rawnand: tmio: Move the ECC initialization to ->attach_chip()
mtd: rawnand: txx9ndfmc: Move the ECC initialization to ->attach_chip()
mtd: rawnand: orion: Move the ECC initialization to ->attach_chip()
mtd: rawnand: mpc5121: Move the ECC initialization to ->attach_chip()
mtd: rawnand: lpc32xx_slc: Move the ECC initialization to ->attach_chip()
mtd: rawnand: lpc32xx_mlc: Move the ECC initialization to ->attach_chip()
mtd: rawnand: fsmc: Move the ECC initialization to ->attach_chip()
mtd: rawnand: diskonchip: Move the ECC initialization to ->attach_chip()
mtd: rawnand: davinci: Move the ECC initialization to ->attach_chip()
mtd: rawnand: cs553x: Move the ECC initialization to ->attach_chip()
mtd: rawnand: au1550: Move the ECC initialization to ->attach_chip()
mtd: rawnand: ams-delta: Move the ECC initialization to ->attach_chip()
mtd: rawnand: xway: Move the ECC initialization to ->attach_chip()
mtd: rawnand: gpio: Move the ECC initialization to ->attach_chip()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few fixes for v5.10, one for the core which fixes some potential
races for controllers with multiple chip selects when configuration of
the chip select for one client device races with the addition and
initial setup of an additional client"
* tag 'spi-fix-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: dw: Fix spi registration for controllers overriding CS
spi: imx: fix the unbalanced spi runtime pm management
spi: spi-nxp-fspi: fix fspi panic by unexpected interrupts
spi: Take the SPI IO-mutex in the spi_setup() method
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull virtual digital TV driver fixes from Mauro Carvalho Chehab:
"A series of fixes for the new virtual digital TV driver (vidtv), which
is meant to help doing tests with the digital TV core and media
userspace apps and libraries.
They cover a series of issues I found on it, together with a few new
things in order to make it easier to detect problems at the DVB core"
* tag 'media/v5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (36 commits)
media: vidtv.rst: add kernel-doc markups
media: vidtv.rst: update vidtv documentation
media: vidtv: simplify EIT write function
media: vidtv: simplify NIT write function
media: vidtv: simplify SDT write function
media: vidtv: cleanup PMT write table function
media: vidtv: cleanup PAT write function
media: vidtv: cleanup PSI table header function
media: vidtv: cleanup PSI descriptor write function
media: vidtv: simplify the crc writing logic
media: vidtv: simplify PSI write function
media: vidtv: add date to the current event
media: vidtv: fix service_id at SDT table
media: vidtv: fix service type
media: vidtv: add a PID entry for the NIT table
media: vidtv: properly fill EIT service_id
media: vidtv: fix the network ID range
media: vidtv: improve EIT data
media: vidtv: cleanup null packet initialization logic
media: vidtv: pre-initialize mux arrays
...
|
|
Pull drm fixes from Dave Airlie:
"Unfortunately this has a bit of thanksgiving stuffing in it, as it a
bit larger (at least the vc4 patches) than I like at this point in
time.
The main thing is it has a bunch of regressions fixes for reports in
the last couple of weeks, ast, nouveau and the amdgpu ttm init fix,
along with the usual selection of amdgpu and i915 fixes.
The vc4 fixes are a few but they are fixes and the nastiest one is a
fix for when you have a 2.4Ghz Wifi and a HDMI signal with a clock in
that range and there isn't enough shielding and interference happen
between the two, the fix adjusts the mode clock to try and avoid the
wifi channels in that case.
Hopefully you can merge this between turkey slices, and next week
should be quieter.
ast:
- LUT loading regression fix
nouveau:
- relocations regression fix
amdgpu:
- ttm init oops fix
- Runtime pm fix
- SI UVD suspend/resume fix
- HDCP fix for headless cards
- Sienna Cichlid golden register update
i915:
- Fix Perf/OA workaround register corruption (Lionel)
- Correct a comment statement in GVT (Yan)
- Fix GT enable/disable iterrupts, including a race condition that
prevented GPU to go idle (Chris)
- Free stale request on destroying the virtual engine (Chris)
exynos:
- config dependency fix
mediatek:
- unused var removal
- horizonal front/back porch formula fix
vc4:
- wifi and hdmi interference fix
- mode rejection fixes
- use after free fix
- cleanup some code"
* tag 'drm-fixes-2020-11-27-1' of git://anongit.freedesktop.org/drm/drm: (28 commits)
drm/nouveau: fix relocations applying logic and a double-free
drm/ast: Reload gamma LUT after changing primary plane's color format
drm/amdgpu: Fix size calculation when init onchip memory
drm/amdgpu: update golden setting for sienna_cichlid
drm/amd/display: Avoid HDCP initialization in devices without output
drm/i915/gt: Free stale request on destroying the virtual engine
drm/i915/gt: Don't cancel the interrupt shadow too early
drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock
drm/amdgpu: fix a page fault
drm/amdgpu: fix SI UVD firmware validate resume fail
drm/amd/amdgpu: fix null pointer in runtime pm
drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission
drm/i915/gvt: correct a false comment of flag F_UNALIGN
drm/i915/perf: workaround register corruption in OATAILPTR
drm/vc4: kms: Don't disable the muxing of an active CRTC
drm/vc4: kms: Store the unassigned channel list in the state
drm/exynos: depend on COMMON_CLK to fix compile tests
drm/mediatek: dsi: Modify horizontal front/back porch byte formula
drm/vc4: hdmi: Disable Wifi Frequencies
dt-bindings: display: Add a property to deal with WiFi coexistence
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2020-11-27
The first patch is by me and target the gs_usb driver and fixes the endianess
problem with candleLight firmware.
Another patch by me for the mcp251xfd driver add sanity checking to bail out if
no IRQ is configured.
The next three patches target the m_can driver. A patch by me removes the
hardcoded IRQF_TRIGGER_FALLING from the request_threaded_irq() as this clashes
with the trigger level specified in the DT. Further a patch by me fixes the
nominal bitiming tseg2 min value for modern m_can cores. Pankaj Sharma's patch
add support for cores version 3.3.x.
The last patch by Oliver Hartkopp is for af_can and converts a WARN() into a
pr_warn(), which is triggered by the syzkaller. It was able to create a
situation where the closing of a socket runs simultaneously to the notifier
call chain for removing the CAN network device in use.
* tag 'linux-can-fixes-for-5.10-20201127' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: af_can: can_rx_unregister(): remove WARN() statement from list operation sanity check
can: m_can: m_can_dev_setup(): add support for bosch mcan version 3.3.0
can: m_can: fix nominal bitiming tseg2 min for version >= 3.1
can: m_can: m_can_open(): remove IRQF_TRIGGER_FALLING from request_threaded_irq()'s flags
can: mcp251xfd: mcp251xfd_probe(): bail out if no IRQ was given
can: gs_usb: fix endianess problem with candleLight firmware
====================
Link: https://lore.kernel.org/r/20201127100301.512603-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
- thinkpad_acpi fixes: two bug-fixes and three model specific quirks
- fixes for misc other drivers: two bug-fixes and three model specific
quirks
* tag 'platform-drivers-x86-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: touchscreen_dmi: Add info for the Irbis TW118 tablet
platform/x86: touchscreen_dmi: Add info for the Predia Basic tablet
platform/x86: intel-vbtn: Support for tablet mode on HP Pavilion 13 x360 PC
platform/x86: toshiba_acpi: Fix the wrong variable assignment
platform/x86: acer-wmi: add automatic keyboard background light toggle key as KEY_LIGHTS_TOGGLE
platform/x86: thinkpad_acpi: Whitelist P15 firmware for dual fan control
platform/x86: thinkpad_acpi: Send tablet mode switch at wakeup time
platform/x86: thinkpad_acpi: Add BAT1 is primary battery quirk for Thinkpad Yoga 11e 4th gen
platform/x86: thinkpad_acpi: Do not report SW_TABLET_MODE on Yoga 11e
platform/x86: thinkpad_acpi: add P1 gen3 second fan support
|
|
When setting sk_err, set it to ee_errno, not ee_origin.
Commit f5f99309fa74 ("sock: do not set sk_err in
sock_dequeue_err_skb") disabled updating sk_err on errq dequeue,
which is correct for most error types (origins):
- sk->sk_err = err;
Commit 38b257938ac6 ("sock: reset sk_err when the error queue is
empty") reenabled the behavior for IMCP origins, which do require it:
+ if (icmp_next)
+ sk->sk_err = SKB_EXT_ERR(skb_next)->ee.ee_origin;
But read from ee_errno.
Fixes: 38b257938ac6 ("sock: reset sk_err when the error queue is empty")
Reported-by: Ayush Ranjan <ayushranjan@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Link: https://lore.kernel.org/r/20201126151220.2819322-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If an msk listener receives an MPJ carrying an invalid token, it
will zero the request socket msk entry. That should later
cause fallback and subflow reset - as per RFC - at
subflow_syn_recv_sock() time due to failing hmac validation.
Since commit 4cf8b7e48a09 ("subflow: introduce and use
mptcp_can_accept_new_subflow()"), we unconditionally dereference
- in mptcp_can_accept_new_subflow - the subflow request msk
before performing hmac validation. In the above scenario we
hit a NULL ptr dereference.
Address the issue doing the hmac validation earlier.
Fixes: 4cf8b7e48a09 ("subflow: introduce and use mptcp_can_accept_new_subflow()")
Tested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/03b2cfa3ac80d8fc18272edc6442a9ddf0b1e34e.1606400227.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix alignment of the new HYP sections
- Fix GICR_TYPER access from userspace
S390:
- do not reset the global diag318 data for per-cpu reset
- do not mark memory as protected too early
- fix for destroy page ultravisor call
x86:
- fix for SEV debugging
- fix incorrect return code
- fix for 'noapic' with PIC in userspace and LAPIC in kernel
- fix for 5-level paging"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT
KVM: x86: Fix split-irqchip vs interrupt injection window request
KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extint
MAINTAINERS: Update email address for Sean Christopherson
MAINTAINERS: add uv.c also to KVM/s390
s390/uv: handle destroy page legacy interface
KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace
KVM: SVM: fix error return code in svm_create_vcpu()
KVM: SVM: Fix offset computation bug in __sev_dbg_decrypt().
KVM: arm64: Correctly align nVHE percpu data
KVM: s390: remove diag318 reset code
KVM: s390: pv: Mark mm as protected after the set secure parameters and improve cleanup
|
|
Currently, the openvswitch module is not accepting the correctly formated
netlink message for the TTL decrement action. For both setting and getting
the dec_ttl action, the actions should be nested in the
OVS_DEC_TTL_ATTR_ACTION attribute as mentioned in the openvswitch.h uapi.
When the original patch was sent, it was tested with a private OVS userspace
implementation. This implementation was unfortunately not upstreamed and
reviewed, hence an erroneous version of this patch was sent out.
Leaving the patch as-is would cause problems as the kernel module could
interpret additional attributes as actions and vice-versa, due to the
actions not being encapsulated/nested within the actual attribute, but
being concatinated after it.
Fixes: 744676e77720 ("openvswitch: add TTL decrement action")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/160622121495.27296.888010441924340582.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 5.10:
- regression fix for a boot failure on some 32-bit machines.
- fix for host crashes in the KVM system reset handling.
- fix for a possible oops in the KVM XIVE interrupt handling on
Power9.
- fix for host crashes triggerable via the KVM emulated MMIO handling
when running HPT guests.
- a couple of small build fixes.
Thanks to Andreas Schwab, Cédric Le Goater, Christophe Leroy, Erhard
Furtner, Greg Kurz, Greg Kurz, Németh Márton, Nicholas Piggin, Nick
Desaulniers, Serge Belyshev, and Stephen Rothwell"
* tag 'powerpc-5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Fix allnoconfig build since uaccess flush
powerpc/64s/exception: KVM Fix for host DSI being taken in HPT guest MMU context
powerpc: Drop -me200 addition to build flags
KVM: PPC: Book3S HV: XIVE: Fix possible oops when accessing ESB page
powerpc/64s: Fix KVM system reset handling when CONFIG_PPC_PSERIES=y
powerpc/32s: Use relocation offset when setting early hash table
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The main changes are relating to our handling of access/dirty bits,
where our low-level page-table helpers could lead to stale young
mappings and loss of the dirty bit in some cases (the latter has not
been observed in practice, but could happen when clearing "soft-dirty"
if we enabled that). These were posted as part of a larger series, but
the rest of that is less urgent and needs a v2 which I'll get to
shortly.
In other news, we've now got a set of fixes to resolve the
lockdep/tracing problems that have been plaguing us for a while, but
they're still a bit "fresh" and I plan to send them to you next week
after we've got some more confidence in them (although initial CI
results look good).
Summary:
- Fix kerneldoc warnings generated by ACPI IORT code
- Fix pte_accessible() so that access flag is ignored
- Fix missing header #include
- Fix loss of software dirty bit across pte_wrprotect() when HW DBM
is enabled"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect()
arm64: pgtable: Fix pte_accessible()
ACPI/IORT: Fix doc warnings in iort.c
arm64/fpsimd: add <asm/insn.h> to <asm/kprobes.h> to fix fpsimd build
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull iommu fixes from Will Deacon:
"Here's another round of IOMMU fixes for -rc6 consisting mainly of a
bunch of independent driver fixes. Thomas agreed for me to take the
x86 'tboot' fix here, as it fixes a regression introduced by a vt-d
change.
- Fix intel iommu driver when running on devices without VCCAP_REG
- Fix swiotlb and "iommu=pt" interaction under TXT (tboot)
- Fix missing return value check during device probe()
- Fix probe ordering for Qualcomm SMMU implementation
- Ensure page-sized mappings are used for AMD IOMMU buffers with SNP
RMP"
* tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
iommu/vt-d: Don't read VCCAP register unless it exists
x86/tboot: Don't disable swiotlb when iommu is forced on
iommu: Check return of __iommu_attach_device()
arm-smmu-qcom: Ensure the qcom_scm driver has finished probing
iommu/amd: Enforce 4k mapping for certain IOMMU data structures
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk fixes from Petr Mladek:
- do not lose trailing newline in pr_cont() calls
- two trivial fixes for a dead store and a config description
* tag 'printk-for-5.10-rc6-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: finalize records with trailing newlines
printk: remove unneeded dead-store assignment
init/Kconfig: Fix CPU number in LOG_CPU_MAX_BUF_SHIFT description
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull writeback fix from Jan Kara:
"A fix of possible missing string termination in writeback tracepoints"
* tag 'writeback_for_v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
trace: fix potenial dangerous pointer
|
|
git://git.linaro.org/people/jens.wiklander/linux-tee into arm/fixes
Add writeback to valid OP-TEE shared memory types
Allows OP-TEE to work with ARMv7 based single CPU systems by allowing
writeback cache policy for shared memory.
* tag 'optee-valid-memory-type-for-v5.11' of git://git.linaro.org/people/jens.wiklander/linux-tee:
optee: add writeback to valid memory type
Link: https://lore.kernel.org/r/20201125120134.GA1642471@jade
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Commit 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU") caused
the following WARNING on an Intel Ice Lake CPU:
get_mmio_spte: detect reserved bits on spte, addr 0xb80a0, dump hierarchy:
------ spte 0xb80a0 level 5.
------ spte 0xfcd210107 level 4.
------ spte 0x1004c40107 level 3.
------ spte 0x1004c41107 level 2.
------ spte 0x1db00000000b83b6 level 1.
WARNING: CPU: 109 PID: 10254 at arch/x86/kvm/mmu/mmu.c:3569 kvm_mmu_page_fault.cold.150+0x54/0x22f [kvm]
...
Call Trace:
? kvm_io_bus_get_first_dev+0x55/0x110 [kvm]
vcpu_enter_guest+0xaa1/0x16a0 [kvm]
? vmx_get_cs_db_l_bits+0x17/0x30 [kvm_intel]
? skip_emulated_instruction+0xaa/0x150 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0xca/0x520 [kvm]
The guest triggering this crashes. Note, this happens with the traditional
MMU and EPT enabled, not with the newly introduced TDP MMU. Turns out,
there was a subtle change in the above mentioned commit. Previously,
walk_shadow_page_get_mmio_spte() was setting 'root' to 'iterator.level'
which is returned by shadow_walk_init() and this equals to
'vcpu->arch.mmu->shadow_root_level'. Now, get_mmio_spte() sets it to
'int root = vcpu->arch.mmu->root_level'.
The difference between 'root_level' and 'shadow_root_level' on CPUs
supporting 5-level page tables is that in some case we don't want to
use 5-level, in particular when 'cpuid_maxphyaddr(vcpu) <= 48'
kvm_mmu_get_tdp_level() returns '4'. In case upper layer is not used,
the corresponding SPTE will fail '__is_rsvd_bits_set()' check.
Revert to using 'shadow_root_level'.
Fixes: 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20201126110206.2118959-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
kvm_cpu_accept_dm_intr and kvm_vcpu_ready_for_interrupt_injection are
a hodge-podge of conditions, hacked together to get something that
more or less works. But what is actually needed is much simpler;
in both cases the fundamental question is, do we have a place to stash
an interrupt if userspace does KVM_INTERRUPT?
In userspace irqchip mode, that is !vcpu->arch.interrupt.injected.
Currently kvm_event_needs_reinjection(vcpu) covers it, but it is
unnecessarily restrictive.
In split irqchip mode it's a bit more complicated, we need to check
kvm_apic_accept_pic_intr(vcpu) (the IRQ window exit is basically an INTACK
cycle and thus requires ExtINTs not to be masked) as well as
!pending_userspace_extint(vcpu). However, there is no need to
check kvm_event_needs_reinjection(vcpu), since split irqchip keeps
pending ExtINT state separate from event injection state, and checking
kvm_cpu_has_interrupt(vcpu) is wrong too since ExtINT has higher
priority than APIC interrupts. In fact the latter fixes a bug:
when userspace requests an IRQ window vmexit, an interrupt in the
local APIC can cause kvm_cpu_has_interrupt() to be true and thus
kvm_vcpu_ready_for_interrupt_injection() to return false. When this
happens, vcpu_run does not exit to userspace but the interrupt window
vmexits keep occurring. The VM loops without any hope of making progress.
Once we try to fix these with something like
return kvm_arch_interrupt_allowed(vcpu) &&
- !kvm_cpu_has_interrupt(vcpu) &&
- !kvm_event_needs_reinjection(vcpu) &&
- kvm_cpu_accept_dm_intr(vcpu);
+ (!lapic_in_kernel(vcpu)
+ ? !vcpu->arch.interrupt.injected
+ : (kvm_apic_accept_pic_intr(vcpu)
+ && !pending_userspace_extint(v)));
we realize two things. First, thanks to the previous patch the complex
conditional can reuse !kvm_cpu_has_extint(vcpu). Second, the interrupt
window request in vcpu_enter_guest()
bool req_int_win =
dm_request_for_irq_injection(vcpu) &&
kvm_cpu_accept_dm_intr(vcpu);
should be kept in sync with kvm_vcpu_ready_for_interrupt_injection():
it is unnecessary to ask the processor for an interrupt window
if we would not be able to return to userspace. Therefore,
kvm_cpu_accept_dm_intr(vcpu) is basically !kvm_cpu_has_extint(vcpu)
ANDed with the existing check for masked ExtINT. It all makes sense:
- we can accept an interrupt from userspace if there is a place
to stash it (and, for irqchip split, ExtINTs are not masked).
Interrupts from userspace _can_ be accepted even if right now
EFLAGS.IF=0.
- in order to tell userspace we will inject its interrupt ("IRQ
window open" i.e. kvm_vcpu_ready_for_interrupt_injection), both
KVM and the vCPU need to be ready to accept the interrupt.
... and this is what the patch implements.
Reported-by: David Woodhouse <dwmw@amazon.co.uk>
Analyzed-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Nikos Tsironis <ntsironis@arrikto.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
|