summaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)Author
2021-04-29Merge tag 'net-next-5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - bpf: - allow bpf programs calling kernel functions (initially to reuse TCP congestion control implementations) - enable task local storage for tracing programs - remove the need to store per-task state in hash maps, and allow tracing programs access to task local storage previously added for BPF_LSM - add bpf_for_each_map_elem() helper, allowing programs to walk all map elements in a more robust and easier to verify fashion - sockmap: support UDP and cross-protocol BPF_SK_SKB_VERDICT redirection - lpm: add support for batched ops in LPM trie - add BTF_KIND_FLOAT support - mostly to allow use of BTF on s390 which has floats in its headers files - improve BPF syscall documentation and extend the use of kdoc parsing scripts we already employ for bpf-helpers - libbpf, bpftool: support static linking of BPF ELF files - improve support for encapsulation of L2 packets - xdp: restructure redirect actions to avoid a runtime lookup, improving performance by 4-8% in microbenchmarks - xsk: build skb by page (aka generic zerocopy xmit) - improve performance of software AF_XDP path by 33% for devices which don't need headers in the linear skb part (e.g. virtio) - nexthop: resilient next-hop groups - improve path stability on next-hops group changes (incl. offload for mlxsw) - ipv6: segment routing: add support for IPv4 decapsulation - icmp: add support for RFC 8335 extended PROBE messages - inet: use bigger hash table for IP ID generation - tcp: deal better with delayed TX completions - make sure we don't give up on fast TCP retransmissions only because driver is slow in reporting that it completed transmitting the original - tcp: reorder tcp_congestion_ops for better cache locality - mptcp: - add sockopt support for common TCP options - add support for common TCP msg flags - include multiple address ids in RM_ADDR - add reset option support for resetting one subflow - udp: GRO L4 improvements - improve 'forward' / 'frag_list' co-existence with UDP tunnel GRO, allowing the first to take place correctly even for encapsulated UDP traffic - micro-optimize dev_gro_receive() and flow dissection, avoid retpoline overhead on VLAN and TEB GRO - use less memory for sysctls, add a new sysctl type, to allow using u8 instead of "int" and "long" and shrink networking sysctls - veth: allow GRO without XDP - this allows aggregating UDP packets before handing them off to routing, bridge, OvS, etc. - allow specifing ifindex when device is moved to another namespace - netfilter: - nft_socket: add support for cgroupsv2 - nftables: add catch-all set element - special element used to define a default action in case normal lookup missed - use net_generic infra in many modules to avoid allocating per-ns memory unnecessarily - xps: improve the xps handling to avoid potential out-of-bound accesses and use-after-free when XPS change race with other re-configuration under traffic - add a config knob to turn off per-cpu netdev refcnt to catch underflows in testing Device APIs: - add WWAN subsystem to organize the WWAN interfaces better and hopefully start driving towards more unified and vendor- independent APIs - ethtool: - add interface for reading IEEE MIB stats (incl. mlx5 and bnxt support) - allow network drivers to dump arbitrary SFP EEPROM data, current offset+length API was a poor fit for modern SFP which define EEPROM in terms of pages (incl. mlx5 support) - act_police, flow_offload: add support for packet-per-second policing (incl. offload for nfp) - psample: add additional metadata attributes like transit delay for packets sampled from switch HW (and corresponding egress and policy-based sampling in the mlxsw driver) - dsa: improve support for sandwiched LAGs with bridge and DSA - netfilter: - flowtable: use direct xmit in topologies with IP forwarding, bridging, vlans etc. - nftables: counter hardware offload support - Bluetooth: - improvements for firmware download w/ Intel devices - add support for reading AOSP vendor capabilities - add support for virtio transport driver - mac80211: - allow concurrent monitor iface and ethernet rx decap - set priority and queue mapping for injected frames - phy: add support for Clause-45 PHY Loopback - pci/iov: add sysfs MSI-X vector assignment interface to distribute MSI-X resources to VFs (incl. mlx5 support) New hardware/drivers: - dsa: mv88e6xxx: add support for Marvell mv88e6393x - 11-port Ethernet switch with 8x 1-Gigabit Ethernet and 3x 10-Gigabit interfaces. - dsa: support for legacy Broadcom tags used on BCM5325, BCM5365 and BCM63xx switches - Microchip KSZ8863 and KSZ8873; 3x 10/100Mbps Ethernet switches - ath11k: support for QCN9074 a 802.11ax device - Bluetooth: Broadcom BCM4330 and BMC4334 - phy: Marvell 88X2222 transceiver support - mdio: add BCM6368 MDIO mux bus controller - r8152: support RTL8153 and RTL8156 (USB Ethernet) chips - mana: driver for Microsoft Azure Network Adapter (MANA) - Actions Semi Owl Ethernet MAC - can: driver for ETAS ES58X CAN/USB interfaces Pure driver changes: - add XDP support to: enetc, igc, stmmac - add AF_XDP support to: stmmac - virtio: - page_to_skb() use build_skb when there's sufficient tailroom (21% improvement for 1000B UDP frames) - support XDP even without dedicated Tx queues - share the Tx queues with the stack when necessary - mlx5: - flow rules: add support for mirroring with conntrack, matching on ICMP, GTP, flex filters and more - support packet sampling with flow offloads - persist uplink representor netdev across eswitch mode changes - allow coexistence of CQE compression and HW time-stamping - add ethtool extended link error state reporting - ice, iavf: support flow filters, UDP Segmentation Offload - dpaa2-switch: - move the driver out of staging - add spanning tree (STP) support - add rx copybreak support - add tc flower hardware offload on ingress traffic - ionic: - implement Rx page reuse - support HW PTP time-stamping - octeon: support TC hardware offloads - flower matching on ingress and egress ratelimitting. - stmmac: - add RX frame steering based on VLAN priority in tc flower - support frame preemption (FPE) - intel: add cross time-stamping freq difference adjustment - ocelot: - support forwarding of MRP frames in HW - support multiple bridges - support PTP Sync one-step timestamping - dsa: mv88e6xxx, dpaa2-switch: offload bridge port flags like learning, flooding etc. - ipa: add IPA v4.5, v4.9 and v4.11 support (Qualcomm SDX55, SM8350, SC7280 SoCs) - mt7601u: enable TDLS support - mt76: - add support for 802.3 rx frames (mt7915/mt7615) - mt7915 flash pre-calibration support - mt7921/mt7663 runtime power management fixes" * tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2451 commits) net: selftest: fix build issue if INET is disabled net: netrom: nr_in: Remove redundant assignment to ns net: tun: Remove redundant assignment to ret net: phy: marvell: add downshift support for M88E1240 net: dsa: ksz: Make reg_mib_cnt a u8 as it never exceeds 255 net/sched: act_ct: Remove redundant ct get and check icmp: standardize naming of RFC 8335 PROBE constants bpf, selftests: Update array map tests for per-cpu batched ops bpf: Add batched ops support for percpu array bpf: Implement formatted output helpers with bstr_printf seq_file: Add a seq_bprintf function sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues net:nfc:digital: Fix a double free in digital_tg_recv_dep_req net: fix a concurrency bug in l2tp_tunnel_register() net/smc: Remove redundant assignment to rc mpls: Remove redundant assignment to err llc2: Remove redundant assignment to rc net/tls: Remove redundant initialization of record rds: Remove redundant assignment to nr_sig dt-bindings: net: mdio-gpio: add compatible for microchip,mdio-smi0 ...
2021-04-28Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "This consists of the usual driver updates (ufs, target, tcmu, smartpqi, lpfc, zfcp, qla2xxx, mpt3sas, pm80xx). The major core change is using a sbitmap instead of an atomic for queue tracking" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (412 commits) scsi: target: tcm_fc: Fix a kernel-doc header scsi: target: Shorten ALUA error messages scsi: target: Fix two format specifiers scsi: target: Compare explicitly with SAM_STAT_GOOD scsi: sd: Introduce a new local variable in sd_check_events() scsi: dc395x: Open-code status_byte(u8) calls scsi: 53c700: Open-code status_byte(u8) calls scsi: smartpqi: Remove unused functions scsi: qla4xxx: Remove an unused function scsi: myrs: Remove unused functions scsi: myrb: Remove unused functions scsi: mpt3sas: Fix two kernel-doc headers scsi: fcoe: Suppress a compiler warning scsi: libfc: Fix a format specifier scsi: aacraid: Remove an unused function scsi: core: Introduce enum scsi_disposition scsi: core: Modify the scsi_send_eh_cmnd() return value for the SDEV_BLOCK case scsi: core: Rename scsi_softirq_done() into scsi_complete() scsi: core: Remove an incorrect comment scsi: core: Make the scsi_alloc_sgtables() documentation more accurate ...
2021-04-28Merge tag 'for-5.13/drivers-2021-04-27' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: - MD changes via Song: - raid5 POWER fix - raid1 failure fix - UAF fix for md cluster - mddev_find_or_alloc() clean up - Fix NULL pointer deref with external bitmap - Performance improvement for raid10 discard requests - Fix missing information of /proc/mdstat - rsxx const qualifier removal (Arnd) - Expose allocated brd pages (Calvin) - rnbd via Gioh Kim: - Change maintainer - Change domain address of maintainers' email - Add polling IO mode and document update - Fix memory leak and some bug detected by static code analysis tools - Code refactoring - Series of floppy cleanups/fixes (Denis) - s390 dasd fixes (Julian) - kerneldoc fixes (Lee) - null_blk double free (Lv) - null_blk virtual boundary addition (Max) - Remove xsysace driver (Michal) - umem driver removal (Davidlohr) - ataflop fixes (Dan) - Revalidate disk removal (Christoph) - Bounce buffer cleanups (Christoph) - Mark lightnvm as deprecated (Christoph) - mtip32xx init cleanups (Shixin) - Various fixes (Tian, Gustavo, Coly, Yang, Zhang, Zhiqiang) * tag 'for-5.13/drivers-2021-04-27' of git://git.kernel.dk/linux-block: (143 commits) async_xor: increase src_offs when dropping destination page drivers/block/null_blk/main: Fix a double free in null_init. md/raid1: properly indicate failure when ending a failed write request md-cluster: fix use-after-free issue when removing rdev nvme: introduce generic per-namespace chardev nvme: cleanup nvme_configure_apst nvme: do not try to reconfigure APST when the controller is not live nvme: add 'kato' sysfs attribute nvme: sanitize KATO setting nvmet: avoid queuing keep-alive timer if it is disabled brd: expose number of allocated pages in debugfs ataflop: fix off by one in ataflop_probe() ataflop: potential out of bounds in do_format() drbd: Fix fall-through warnings for Clang block/rnbd: Use strscpy instead of strlcpy block/rnbd-clt-sysfs: Remove copy buffer overlap in rnbd_clt_get_path_name block/rnbd-clt: Remove max_segment_size block/rnbd-clt: Generate kobject_uevent when the rnbd device state changes block/rnbd-srv: Remove unused arguments of rnbd_srv_rdma_ev Documentation/ABI/rnbd-clt: Add description for nr_poll_queues ...
2021-04-27Merge tag 'cfi-v5.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull CFI on arm64 support from Kees Cook: "This builds on last cycle's LTO work, and allows the arm64 kernels to be built with Clang's Control Flow Integrity feature. This feature has happily lived in Android kernels for almost 3 years[1], so I'm excited to have it ready for upstream. The wide diffstat is mainly due to the treewide fixing of mismatched list_sort prototypes. Other things in core kernel are to address various CFI corner cases. The largest code portion is the CFI runtime implementation itself (which will be shared by all architectures implementing support for CFI). The arm64 pieces are Acked by arm64 maintainers rather than coming through the arm64 tree since carrying this tree over there was going to be awkward. CFI support for x86 is still under development, but is pretty close. There are a handful of corner cases on x86 that need some improvements to Clang and objtool, but otherwise works well. Summary: - Clean up list_sort prototypes (Sami Tolvanen) - Introduce CONFIG_CFI_CLANG for arm64 (Sami Tolvanen)" * tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: arm64: allow CONFIG_CFI_CLANG to be selected KVM: arm64: Disable CFI for nVHE arm64: ftrace: use function_nocfi for ftrace_call arm64: add __nocfi to __apply_alternatives arm64: add __nocfi to functions that jump to a physical address arm64: use function_nocfi with __pa_symbol arm64: implement function_nocfi psci: use function_nocfi for cpu_resume lkdtm: use function_nocfi treewide: Change list_sort to use const pointers bpf: disable CFI in dispatcher functions kallsyms: strip ThinLTO hashes from static functions kthread: use WARN_ON_FUNCTION_MISMATCH workqueue: use WARN_ON_FUNCTION_MISMATCH module: ensure __cfi_check alignment mm: add generic function_nocfi macro cfi: add __cficanonical add support for Clang CFI
2021-04-20block/rnbd-clt: Remove max_segment_sizeJack Wang
We always map with SZ_4K, so do not need max_segment_size. Cc: Leon Romanovsky <leonro@nvidia.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20210419073722.15351-18-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-20block/rnbd-srv: Remove unused arguments of rnbd_srv_rdma_evGioh Kim
struct rtrs_srv is not used when handling rnbd_srv_rdma_ev messages, so cleaned up rdma_ev function pointer in rtrs_srv_ops also is changed. Cc: Leon Romanovsky <leonro@nvidia.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Aleksei Marov <aleksei.marov@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20210419073722.15351-16-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-20block/rnbd-clt: Support polling mode for IO latency optimizationGioh Kim
RNBD can make double-queues for irq-mode and poll-mode. For example, on 4-CPU system 8 request-queues are created, 4 for irq-mode and 4 for poll-mode. If the IO has HIPRI flag, the block-layer will call .poll function of RNBD. Then IO is sent to the poll-mode queue. Add optional nr_poll_queues argument for map_devices interface. To support polling of RNBD, RTRS client creates connections for both of irq-mode and direct-poll-mode. For example, on 4-CPU system it could've create 5 connections: con[0] => user message (softirq cq) con[1:4] => softirq cq After this patch, it can create 9 connections: con[0] => user message (softirq cq) con[1:4] => softirq cq con[5:8] => DIRECT-POLL cq Cc: Leon Romanovsky <leonro@nvidia.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20210419073722.15351-14-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-20block/rnbd-clt: Replace {NO_WAIT,WAIT} with RTRS_PERMIT_{WAIT,NOWAIT}Gioh Kim
They are defined with the same value and similar meaning, let's remove one of them, then we can remove {WAIT,NOWAIT}. Also change the type of 'wait' from 'int' to 'enum wait_type' to make it clear. Cc: Leon Romanovsky <leonro@nvidia.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20210419073722.15351-9-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Conflicts: MAINTAINERS - keep Chandrasekar drivers/net/ethernet/mellanox/mlx5/core/en_main.c - simple fix + trust the code re-added to param.c in -next is fine include/linux/bpf.h - trivial include/linux/ethtool.h - trivial, fix kdoc while at it include/linux/skmsg.h - move to relevant place in tcp.c, comment re-wrapped net/core/skmsg.c - add the sk = sk // sk = NULL around calls net/tipc/crypto.c - trivial Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09Merge branch 'mlx5-next' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-next 2021-04-09 This pr contains changes from mlx5-next branch, already reviewed on netdev and rdma mailing lists, links below. 1) From Leon, Dynamically assign MSI-X vectors count Already Acked by Bjorn Helgaas. https://patchwork.kernel.org/project/netdevbpf/cover/20210314124256.70253-1-leon@kernel.org/ 2) Cleanup series: https://patchwork.kernel.org/project/netdevbpf/cover/20210311070915.321814-1-saeed@kernel.org/ From Mark, E-Switch cleanups and refactoring, and the addition of single FDB mode needed HW bits. From Mikhael, Remove unused struct field From Saeed, Cleanup W=1 prototype warning From Zheng, Esw related cleanup From Tariq, User order-0 page allocation for EQs * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Implement sriov_get_vf_total_msix/count() callbacks net/mlx5: Dynamically assign MSI-X vectors count net/mlx5: Add dynamic MSI-X capabilities bits PCI/IOV: Add sysfs MSI-X vector assignment interface net/mlx5: Use order-0 allocations for EQs net/mlx5: Add IFC bits needed for single FDB mode net/mlx5: E-Switch, Refactor send to vport to be more generic RDMA/mlx5: Use representor E-Switch when getting netdev and metadata net/mlx5: E-Switch, Add eswitch pointer to each representor net/mlx5: E-Switch, Add match on vhca id to default send rules net/mlx5: Remove unused mlx5_core_health member recover_work net/mlx5: simplify the return expression of mlx5_esw_offloads_pair() net/mlx5: Cleanup prototype warning ==================== Link: https://lore.kernel.org/r/20210409200704.10886-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-08treewide: Change list_sort to use const pointersSami Tolvanen
list_sort() internally casts the comparison function passed to it to a different type with constant struct list_head pointers, and uses this pointer to call the functions, which trips indirect call Control-Flow Integrity (CFI) checking. Instead of removing the consts, this change defines the list_cmp_func_t type and changes the comparison function types of all list_sort() callers to use const pointers, thus avoiding type mismatches. Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
2021-04-08RDMA/addr: Be strict with gid sizeLeon Romanovsky
The nla_len() is less than or equal to 16. If it's less than 16 then end of the "gid" buffer is uninitialized. Fixes: ae43f8286730 ("IB/core: Add IP to GID netlink offload") Link: https://lore.kernel.org/r/20210405074434.264221-1-leon@kernel.org Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07RDMA/qedr: Fix kernel panic when trying to access recv_cqKamal Heib
As INI QP does not require a recv_cq, avoid the following null pointer dereference by checking if the qp_type is not INI before trying to extract the recv_cq. BUG: kernel NULL pointer dereference, address: 00000000000000e0 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 54250 Comm: mpitests-IMB-MP Not tainted 5.12.0-rc5 #1 Hardware name: Dell Inc. PowerEdge R320/0KM5PX, BIOS 2.7.0 08/19/2019 RIP: 0010:qedr_create_qp+0x378/0x820 [qedr] Code: 02 00 00 50 e8 29 d4 a9 d1 48 83 c4 18 e9 65 fe ff ff 48 8b 53 10 48 8b 43 18 44 8b 82 e0 00 00 00 45 85 c0 0f 84 10 74 00 00 <8b> b8 e0 00 00 00 85 ff 0f 85 50 fd ff ff e9 fd 73 00 00 48 8d bd RSP: 0018:ffff9c8f056f7a70 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff9c8f056f7b58 RCX: 0000000000000009 RDX: ffff8c41a9744c00 RSI: ffff9c8f056f7b58 RDI: ffff8c41c0dfa280 RBP: ffff8c41c0dfa280 R08: 0000000000000002 R09: 0000000000000001 R10: 0000000000000000 R11: ffff8c41e06fc608 R12: ffff8c4194052000 R13: 0000000000000000 R14: ffff8c4191546070 R15: ffff8c41c0dfa280 FS: 00007f78b2787b80(0000) GS:ffff8c43a3200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000e0 CR3: 00000001011d6002 CR4: 00000000001706f0 Call Trace: ib_uverbs_handler_UVERBS_METHOD_QP_CREATE+0x4e4/0xb90 [ib_uverbs] ? ib_uverbs_cq_event_handler+0x30/0x30 [ib_uverbs] ib_uverbs_run_method+0x6f6/0x7a0 [ib_uverbs] ? ib_uverbs_handler_UVERBS_METHOD_QP_DESTROY+0x70/0x70 [ib_uverbs] ? __cond_resched+0x15/0x30 ? __kmalloc+0x5a/0x440 ib_uverbs_cmd_verbs+0x195/0x360 [ib_uverbs] ? xa_load+0x6e/0x90 ? cred_has_capability+0x7c/0x130 ? avc_has_extended_perms+0x17f/0x440 ? vma_link+0xae/0xb0 ? vma_set_page_prot+0x2a/0x60 ? mmap_region+0x298/0x6c0 ? do_mmap+0x373/0x520 ? selinux_file_ioctl+0x17f/0x220 ib_uverbs_ioctl+0xa7/0x110 [ib_uverbs] __x64_sys_ioctl+0x84/0xc0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f78b120262b Fixes: 06e8d1df46ed ("RDMA/qedr: Add support for user mode XRC-SRQ's") Link: https://lore.kernel.org/r/20210404125501.154789-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07IB/hfi1: Fix probe time panic when AIP is enabled with a buggy BIOSMike Marciniszyn
A panic can result when AIP is enabled: BUG: unable to handle kernel NULL pointer dereference at 000000000000000 PGD 0 P4D 0 Oops: 0000 1 SMP PTI CPU: 70 PID: 981 Comm: systemd-udevd Tainted: G OE --------- - - 4.18.0-240.el8.x86_64 #1 Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.01.01.0005.101720141054 10/17/2014 RIP: 0010:__bitmap_and+0x1b/0x70 RSP: 0018:ffff99aa0845f9f0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8d5a6fc18000 RCX: 0000000000000048 RDX: 0000000000000000 RSI: ffffffffc06336f0 RDI: ffff8d5a8fa67750 RBP: 0000000000000079 R08: 0000000fffffffff R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffffffffc06336f0 R13: 00000000000000a0 R14: ffff8d5a6fc18000 R15: 0000000000000003 FS: 00007fec137a5980(0000) GS:ffff8d5a9fa80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000a04b48002 CR4: 00000000001606e0 Call Trace: hfi1_num_netdev_contexts+0x7c/0x110 [hfi1] hfi1_init_dd+0xd7f/0x1a90 [hfi1] ? pci_bus_read_config_dword+0x49/0x70 ? pci_mmcfg_read+0x3e/0xe0 do_init_one.isra.18+0x336/0x640 [hfi1] local_pci_probe+0x41/0x90 pci_device_probe+0x105/0x1c0 really_probe+0x212/0x440 driver_probe_device+0x49/0xc0 device_driver_attach+0x50/0x60 __driver_attach+0x61/0x130 ? device_driver_attach+0x60/0x60 bus_for_each_dev+0x77/0xc0 ? klist_add_tail+0x3b/0x70 bus_add_driver+0x14d/0x1e0 ? dev_init+0x10b/0x10b [hfi1] driver_register+0x6b/0xb0 ? dev_init+0x10b/0x10b [hfi1] hfi1_mod_init+0x1e6/0x20a [hfi1] do_one_initcall+0x46/0x1c3 ? free_unref_page_commit+0x91/0x100 ? _cond_resched+0x15/0x30 ? kmem_cache_alloc_trace+0x140/0x1c0 do_init_module+0x5a/0x220 load_module+0x14b4/0x17e0 ? __do_sys_finit_module+0xa8/0x110 __do_sys_finit_module+0xa8/0x110 do_syscall_64+0x5b/0x1a0 The issue happens when pcibus_to_node() returns NO_NUMA_NODE. Fix this issue by moving the initialization of dd->node to hfi1_devdata allocation and remove the other pcibus_to_node() calls in the probe path and use dd->node instead. Affinity logic is adjusted to use a new field dd->affinity_entry as a guard instead of dd->node. Fixes: 4730f4a6c6b2 ("IB/hfi1: Activate the dummy netdev") Link: https://lore.kernel.org/r/1617025700-31865-4-git-send-email-dennis.dalessandro@cornelisnetworks.com Cc: stable@vger.kernel.org Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-07RDMA/cxgb4: check for ipv6 address properly while destroying listenerPotnuri Bharat Teja
ipv6 bit is wrongly set by the below which causes fatal adapter lookup engine errors for ipv4 connections while destroying a listener. Fix it to properly check the local address for ipv6. Fixes: 3408be145a5d ("RDMA/cxgb4: Fix adapter LE hash errors while destroying ipv6 listening server") Link: https://lore.kernel.org/r/20210331135715.30072-1-bharat@chelsio.com Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-01RDMA/rtrs-clt: Close rtrs client conn before destroying rtrs clt session filesMd Haris Iqbal
KASAN detected the following BUG: BUG: KASAN: use-after-free in rtrs_clt_update_wc_stats+0x41/0x100 [rtrs_client] Read of size 8 at addr ffff88bf2fb4adc0 by task swapper/0/0 CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 5.4.84-pserver #5.4.84-1+feature+linux+5.4.y+dbg+20201216.1319+b6b887b~deb10 Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.00 09/04/2012 Call Trace: <IRQ> dump_stack+0x96/0xe0 print_address_description.constprop.4+0x1f/0x300 ? irq_work_claim+0x2e/0x50 __kasan_report.cold.8+0x78/0x92 ? rtrs_clt_update_wc_stats+0x41/0x100 [rtrs_client] kasan_report+0x10/0x20 rtrs_clt_update_wc_stats+0x41/0x100 [rtrs_client] rtrs_clt_rdma_done+0xb1/0x760 [rtrs_client] ? lockdep_hardirqs_on+0x1a8/0x290 ? process_io_rsp+0xb0/0xb0 [rtrs_client] ? mlx4_ib_destroy_cq+0x100/0x100 [mlx4_ib] ? add_interrupt_randomness+0x1a2/0x340 __ib_process_cq+0x97/0x100 [ib_core] ib_poll_handler+0x41/0xb0 [ib_core] irq_poll_softirq+0xe0/0x260 __do_softirq+0x127/0x672 irq_exit+0xd1/0xe0 do_IRQ+0xa3/0x1d0 common_interrupt+0xf/0xf </IRQ> RIP: 0010:cpuidle_enter_state+0xea/0x780 Code: 31 ff e8 99 48 47 ff 80 7c 24 08 00 74 12 9c 58 f6 c4 02 0f 85 53 05 00 00 31 ff e8 b0 6f 53 ff e8 ab 4f 5e ff fb 8b 44 24 04 <85> c0 0f 89 f3 01 00 00 48 8d 7b 14 e8 65 1e 77 ff c7 43 14 00 00 RSP: 0018:ffffffffab007d58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffca RAX: 0000000000000002 RBX: ffff88b803d69800 RCX: ffffffffa91a8298 RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffffffffab021414 RBP: ffffffffab6329e0 R08: 0000000000000002 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002 R13: 000000bf39d82466 R14: ffffffffab632aa0 R15: ffffffffab632ae0 ? lockdep_hardirqs_on+0x1a8/0x290 ? cpuidle_enter_state+0xe5/0x780 cpuidle_enter+0x3c/0x60 do_idle+0x2fb/0x390 ? arch_cpu_idle_exit+0x40/0x40 ? schedule+0x94/0x120 cpu_startup_entry+0x19/0x1b start_kernel+0x5da/0x61b ? thread_stack_cache_init+0x6/0x6 ? load_ucode_amd_bsp+0x6f/0xc4 ? init_amd_microcode+0xa6/0xa6 ? x86_family+0x5/0x20 ? load_ucode_bsp+0x182/0x1fd secondary_startup_64+0xa4/0xb0 Allocated by task 5730: save_stack+0x19/0x80 __kasan_kmalloc.constprop.9+0xc1/0xd0 kmem_cache_alloc_trace+0x15b/0x350 alloc_sess+0xf4/0x570 [rtrs_client] rtrs_clt_open+0x3b4/0x780 [rtrs_client] find_and_get_or_create_sess+0x649/0x9d0 [rnbd_client] rnbd_clt_map_device+0xd7/0xf50 [rnbd_client] rnbd_clt_map_device_store+0x4ee/0x970 [rnbd_client] kernfs_fop_write+0x141/0x240 vfs_write+0xf3/0x280 ksys_write+0xba/0x150 do_syscall_64+0x68/0x270 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 5822: save_stack+0x19/0x80 __kasan_slab_free+0x125/0x170 kfree+0xe7/0x3f0 kobject_put+0xd3/0x240 rtrs_clt_destroy_sess_files+0x3f/0x60 [rtrs_client] rtrs_clt_close+0x3c/0x80 [rtrs_client] close_rtrs+0x45/0x80 [rnbd_client] rnbd_client_exit+0x10f/0x2bd [rnbd_client] __x64_sys_delete_module+0x27b/0x340 do_syscall_64+0x68/0x270 entry_SYSCALL_64_after_hwframe+0x49/0xbe When rtrs_clt_close is triggered, it iterates over all the present rtrs_clt_sess and triggers close on them. However, the call to rtrs_clt_destroy_sess_files is done before the rtrs_clt_close_conns. This is incorrect since during the initialization phase we allocate rtrs_clt_sess first, and then we go ahead and create rtrs_clt_con for it. If we free the rtrs_clt_sess structure before closing the rtrs_clt_con, it may so happen that an inflight IO completion would trigger the function rtrs_clt_rdma_done, which would lead to the above UAF case. Hence close the rtrs_clt_con connections first, and then trigger the destruction of session files. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/20210325153308.1214057-12-gi-oh.kim@ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Not much going on, just some small bug fixes: - Typo causing a regression in mlx5 devx - Regression in the recent hns rework causing the HW to get out of sync - Long-standing cxgb4 adaptor crash when destroying cm ids" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/cxgb4: Fix adapter LE hash errors while destroying ipv6 listening server RDMA/hns: Fix bug during CMDQ initialization RDMA/mlx5: Fix typo in destroy_mkey inbox
2021-03-25RDMA/cxgb4: Fix adapter LE hash errors while destroying ipv6 listening serverPotnuri Bharat Teja
Not setting the ipv6 bit while destroying ipv6 listening servers may result in potential fatal adapter errors due to lookup engine memory hash errors. Therefore always set ipv6 field while destroying ipv6 listening servers. Fixes: 830662f6f032 ("RDMA/cxgb4: Add support for active and passive open connection with IPv6 address") Link: https://lore.kernel.org/r/20210324190453.8171-1-bharat@chelsio.com Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-22RDMA/hns: Fix bug during CMDQ initializationLang Cheng
When reloading driver, the head/tail pointer of CMDQ may be not at position 0. Then during initialization of CMDQ, if head is reset first, the firmware will start to handle CMDQ because the head is not equal to the tail. The driver can reset tail first since the firmware will be triggerred only by head. This bug is introduced by changing macros of head/tail register without changing the order of initialization. Fixes: 292b3352bd5b ("RDMA/hns: Adjust fields and variables about CMDQ tail/head") Link: https://lore.kernel.org/r/1615602611-7963-1-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-12net/mlx5: E-Switch, Refactor send to vport to be more genericMark Bloch
Now that each representor stores a pointer to the managing E-Switch use that information when creating the send-to-vport rules. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12RDMA/mlx5: Use representor E-Switch when getting netdev and metadataMark Bloch
Now that a pointer to the managing E-Switch is stored in the representor use it. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-11RDMA/mlx5: Fix typo in destroy_mkey inboxMark Zhang
Set mkey_index to the correct place when preparing the destroy_mkey inbox, this will fix the following syndrome: mlx5_core 0000:08:00.0: mlx5_cmd_check:782:(pid 980716): DEALLOC_PD(0x801) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x31b635) Link: https://lore.kernel.org/r/20210311142024.1282004-1-leon@kernel.org Fixes: 1368ead04c36 ("RDMA/mlx5: Use strict get/set operations for obj_id") Reviewed-by: Ido Kalir <idok@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-10RDMA/mlx5: Fix timestamp default modeMaor Gottlieb
1. Don't set the ts_format bit to default when it reserved - device is running in the old mode (free running). 2. XRC doesn't have a CQ therefore the ts format in the QP context should be default / free running. 3. Set ts_format to WQ. Fixes: 2fe8d4b87802 ("RDMA/mlx5: Fail QP creation if the device can not support the CQE TS") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-05RDMA/rxe: Fix errant WARN_ONCE in rxe_completer()Bob Pearson
In rxe_comp.c in rxe_completer() the function free_pkt() did not clear skb which triggered a warning at 'done:' and could possibly at 'exit:'. The WARN_ONCE() calls are not actually needed. The call to free_pkt() is moved to the end to clearly show that all skbs are freed. Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()") Link: https://lore.kernel.org/r/20210304192048.2958-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-05RDMA/rxe: Fix extra deref in rxe_rcv_mcast_pkt()Bob Pearson
rxe_rcv_mcast_pkt() dropped a reference to ib_device when no error occurred causing an underflow on the reference counter. This code is cleaned up to be clearer and easier to read. Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()") Link: https://lore.kernel.org/r/20210304192048.2958-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-05RDMA/rxe: Fix missed IB reference counting in loopbackBob Pearson
When the noted patch below extending the reference taken by rxe_get_dev_from_net() in rxe_udp_encap_recv() until each skb is freed it was not matched by a reference in the loopback path resulting in underflows. Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()") Link: https://lore.kernel.org/r/20210304192048.2958-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-04scsi: target: core: Add gfp_t arg to target_cmd_init_cdb()Mike Christie
tcm_loop could be used like a normal block device, so we can't use GFP_KERNEL and should use GFP_NOIO. This adds a gfp_t arg to target_cmd_init_cdb() and converts the users. For every driver but loop GFP_KERNEL is kept. This will also be useful in subsequent patches where loop needs to do target_submit_prep() from interrupt context to get a ref to the se_device, and so it will need to use GFP_ATOMIC. Link: https://lore.kernel.org/r/20210227170006.5077-16-michael.christie@oracle.com Tested-by: Laurence Oberman <loberman@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04scsi: target: srpt: Convert to new submission APIMike Christie
target_submit_cmd_map_sgls() is being removed, so convert srpt to the new submission API. srpt uses target_stop_session() to sync session shutdown with LIO core, so we use target_init_cmd()/target_submit_prep()/target_submit(), because target_init_cmd() will detect the target_stop_session() call and return an error. Link: https://lore.kernel.org/r/20210227170006.5077-6-michael.christie@oracle.com Cc: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-03RDMA/uverbs: Fix kernel-doc warning of _uverbs_allocLeon Romanovsky
Fix the following W=1 compilation warning: drivers/infiniband/core/uverbs_ioctl.c:108: warning: expecting prototype for uverbs_alloc(). Prototype was for _uverbs_alloc() instead Fixes: 461bb2eee4e1 ("IB/uverbs: Add a simple allocator to uverbs_attr_bundle") Link: https://lore.kernel.org/r/20210302074214.1054299-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-03RDMA/mlx5: Set correct kernel-doc identifierLeon Romanovsky
The W=1 allmodconfig build produces the following warning: drivers/infiniband/hw/mlx5/odp.c:1086: warning: wrong kernel-doc identifier on line: * Parse a series of data segments for page fault handling. Fix it by changing /** to be /* as it is written in kernel-doc documentation. Fixes: 5e769e444d26 ("RDMA/hw/mlx5/odp: Fix formatting and add missing descriptions in 'pagefault_data_segments()'") Link: https://lore.kernel.org/r/20210302074214.1054299-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-01IB/mlx5: Add missing error codeYueHaibing
Set err to -ENOMEM if kzalloc fails instead of 0. Fixes: 759738537142 ("IB/mlx5: Enable subscription for device events over DEVX") Link: https://lore.kernel.org/r/20210222122343.19720-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-01RDMA/rxe: Fix missing kconfig dependency on CRYPTOJulian Braha
When RDMA_RXE is enabled and CRYPTO is disabled, Kbuild gives the following warning: WARNING: unmet direct dependencies detected for CRYPTO_CRC32 Depends on [n]: CRYPTO [=n] Selected by [y]: - RDMA_RXE [=y] && (INFINIBAND_USER_ACCESS [=y] || !INFINIBAND_USER_ACCESS [=y]) && INET [=y] && PCI [=y] && INFINIBAND [=y] && INFINIBAND_VIRT_DMA [=y] This is because RDMA_RXE selects CRYPTO_CRC32, without depending on or selecting CRYPTO, despite that config option being subordinate to CRYPTO. Fixes: cee2688e3cd6 ("IB/rxe: Offload CRC calculation when possible") Signed-off-by: Julian Braha <julianbraha@gmail.com> Link: https://lore.kernel.org/r/21525878.NYvzQUHefP@ubuntu-mate-laptop Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-01RDMA/cm: Fix IRQ restore in ib_send_cm_sidr_repSaeed Mahameed
ib_send_cm_sidr_rep() { spin_lock_irqsave() cm_send_sidr_rep_locked() { ... spin_lock_irq() .... spin_unlock_irq() <--- this will enable interrupts } spin_unlock_irqrestore() } spin_unlock_irqrestore() expects interrupts to be disabled but the internal spin_unlock_irq() will always enable hard interrupts. Fix this by replacing the internal spin_{lock,unlock}_irq() with irqsave/restore variants. It fixes the following kernel trace: raw_local_irq_restore() called with IRQs enabled WARNING: CPU: 2 PID: 20001 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x1d/0x20 Call Trace: _raw_spin_unlock_irqrestore+0x4e/0x50 ib_send_cm_sidr_rep+0x3a/0x50 [ib_cm] cma_send_sidr_rep+0xa1/0x160 [rdma_cm] rdma_accept+0x25e/0x350 [rdma_cm] ucma_accept+0x132/0x1cc [rdma_ucm] ucma_write+0xbf/0x140 [rdma_ucm] vfs_write+0xc1/0x340 ksys_write+0xb3/0xe0 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 87c4c774cbef ("RDMA/cm: Protect access to remote_sidr_table") Link: https://lore.kernel.org/r/20210301081844.445823-1-leon@kernel.org Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-18Merge tag 'v5.11' into rdma.git for-nextJason Gunthorpe
Linux 5.11 Merged to resolve conflicts with RDMA rc commits - drivers/infiniband/sw/rxe/rxe_net.c The final logic is to call rxe_get_dev_from_net() again with the master netdev if the packet was rx'd on a vlan. To keep the elimination of the local variables requires a trivial edit to the code in -rc Link: https://lore.kernel.org/r/20210210131542.215ea67c@canb.auug.org.au Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16RDMA/rtrs-srv: Do not pass a valid pointer to PTR_ERR()Jack Wang
smatch gives the warning: drivers/infiniband/ulp/rtrs/rtrs-srv.c:1805 rtrs_rdma_connect() warn: passing zero to 'PTR_ERR' Which is trying to say smatch has shown that srv is not an error pointer and thus cannot be passed to PTR_ERR. The solution is to move the list_add() down after full initilization of rtrs_srv. To avoid holding the srv_mutex too long, only hold it during the list operation as suggested by Leon. Fixes: 03e9b33a0fd6 ("RDMA/rtrs: Only allow addition of path to an already established session") Link: https://lore.kernel.org/r/20210216143807.65923-1-jinpu.wang@cloud.ionos.com Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16RDMA/srp: Fix support for unpopulated and unbalanced NUMA nodesNicolas Morey-Chaisemartin
The current code computes a number of channels per SRP target and spreads them equally across all online NUMA nodes. Each channel is then assigned a CPU within this node. In the case of unbalanced, or even unpopulated nodes, some channels do not get a CPU associated and thus do not get connected. This causes the SRP connection to fail. This patch solves the issue by rewriting channel computation and allocation: - Drop channel to node/CPU association as it had no real effect on locality but added unnecessary complexity. - Tweak the number of channels allocated to reduce CPU contention when possible: - Up to one channel per CPU (instead of up to 4 by node) - At least 4 channels per node, unless ch_count module parameter is used. Link: https://lore.kernel.org/r/9cb4d9d3-30ad-2276-7eff-e85f7ddfb411@suse.com Signed-off-by: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16Merge branch 'mlx5_timestamp' into rdma.git for-nextJason Gunthorpe
Leon Romanovsky says: ==================== Add an extra timestamp format for mlx5_ib device. ==================== Based on the mlx5-next branch at git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux due to dependencies. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> * branch 'mlx5_timestamp': RDMA/mlx5: Fail QP creation if the device can not support the CQE TS net/mlx5: Add new timestamp mode bits
2021-02-16RDMA/mlx5: Fail QP creation if the device can not support the CQE TSAharon Landau
In ConnectX6Dx device, HW can work in real time timestamp mode according to the device capabilities per RQ/SQ/QP. When the flag IB_UVERBS_CQ_FLAGS_TIMESTAMP_COMPLETION is set, the user expect to get TS on the CQEs in free running format, so we need to fail the QP creation if the current mode of the device doesn't support it. Link: https://lore.kernel.org/r/20210209131107.698833-3-leon@kernel.org Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16RDMA/mlx5: Allow CQ creation without attached EQsTal Gilboa
The traditional DevX CQ creation flow goes through mlx5_core_create_cq() which checks that the given EQN corresponds to an existing EQ and attaches a devx handler to the EQN for the CQ. In some cases the EQ will not be a kernel EQ, but will be controlled by modify CQ, don't block creating these just because the EQN can't be found in the kernel. Link: https://lore.kernel.org/r/20210211085549.1277674-1-leon@kernel.org Signed-off-by: Tal Gilboa <talgi@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16RDMA/rtrs-srv-sysfs: fix missing put_deviceGioh Kim
put_device() decreases the ref-count and then the device will be cleaned-up, while at is also add missing put_device in rtrs_srv_create_once_sysfs_root_folders This patch solves a kmemleak error as below: unreferenced object 0xffff88809a7a0710 (size 8): comm "kworker/4:1H", pid 113, jiffies 4295833049 (age 6212.380s) hex dump (first 8 bytes): 62 6c 61 00 6b 6b 6b a5 bla.kkk. backtrace: [<0000000054413611>] kstrdup+0x2e/0x60 [<0000000078e3120a>] kobject_set_name_vargs+0x2f/0xb0 [<00000000f1a17a6b>] dev_set_name+0xab/0xe0 [<00000000d5502e32>] rtrs_srv_create_sess_files+0x2fb/0x314 [rtrs_server] [<00000000ed11a1ef>] rtrs_srv_info_req_done+0x631/0x800 [rtrs_server] [<000000008fc5aa8f>] __ib_process_cq+0x94/0x100 [ib_core] [<00000000a9599cb4>] ib_cq_poll_work+0x32/0xc0 [ib_core] [<00000000cfc376be>] process_one_work+0x4bc/0x980 [<0000000016e5c96a>] worker_thread+0x78/0x5c0 [<00000000c20b8be0>] kthread+0x191/0x1e0 [<000000006c9c0003>] ret_from_fork+0x3a/0x50 Fixes: baa5b28b7a47 ("RDMA/rtrs-srv: Replace device_register with device_initialize and device_add") Link: https://lore.kernel.org/r/20210212134525.103456-5-jinpu.wang@cloud.ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16RDMA/rtrs-srv: fix memory leak by missing kobject freeGioh Kim
kmemleak reported an error as below: unreferenced object 0xffff8880674b7640 (size 64): comm "kworker/4:1H", pid 113, jiffies 4296403507 (age 507.840s) hex dump (first 32 bytes): 69 70 3a 31 39 32 2e 31 36 38 2e 31 32 32 2e 31 ip:192.168.122.1 31 30 40 69 70 3a 31 39 32 2e 31 36 38 2e 31 32 10@ip:192.168.12 backtrace: [<0000000054413611>] kstrdup+0x2e/0x60 [<0000000078e3120a>] kobject_set_name_vargs+0x2f/0xb0 [<00000000ca2be3ee>] kobject_init_and_add+0xb0/0x120 [<0000000062ba5e78>] rtrs_srv_create_sess_files+0x14c/0x314 [rtrs_server] [<00000000b45b7217>] rtrs_srv_info_req_done+0x5b1/0x800 [rtrs_server] [<000000008fc5aa8f>] __ib_process_cq+0x94/0x100 [ib_core] [<00000000a9599cb4>] ib_cq_poll_work+0x32/0xc0 [ib_core] [<00000000cfc376be>] process_one_work+0x4bc/0x980 [<0000000016e5c96a>] worker_thread+0x78/0x5c0 [<00000000c20b8be0>] kthread+0x191/0x1e0 [<000000006c9c0003>] ret_from_fork+0x3a/0x50 It is caused by the not-freed kobject of rtrs_srv_sess. The kobject embedded in rtrs_srv_sess has ref-counter 2 after calling process_info_req(). Therefore it must call kobject_put twice. Currently it calls kobject_put only once at rtrs_srv_destroy_sess_files because kobject_del removes the state_in_sysfs flag and then kobject_put in free_sess() is not called. This patch moves kobject_del() into free_sess() so that the kobject of rtrs_srv_sess can be freed. And also this patch adds the missing call of sysfs_remove_group() to clean-up the sysfs directory. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20210212134525.103456-4-jinpu.wang@cloud.ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16RDMA/rtrs: Only allow addition of path to an already established sessionMd Haris Iqbal
While adding a path from the client side to an already established session, it was possible to provide the destination IP to a different server. This is dangerous. This commit adds an extra member to the rtrs_msg_conn_req structure, named first_conn; which is supposed to notify if the connection request is the first for that session or not. On the server side, if a session does not exist but the first_conn received inside the rtrs_msg_conn_req structure is 1, the connection request is failed. This signifies that the connection request is for an already existing session, and since the server did not find one, it is an wrong connection request. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20210212134525.103456-3-jinpu.wang@cloud.ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Reviewed-by: Lutz Pogrell <lutz.pogrell@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16RDMA/rtrs-srv: Fix stack-out-of-boundsJack Wang
BUG: KASAN: stack-out-of-bounds in _mlx4_ib_post_send+0x1bd2/0x2770 [mlx4_ib] Read of size 4 at addr ffff8880d5a7f980 by task kworker/0:1H/565 CPU: 0 PID: 565 Comm: kworker/0:1H Tainted: G O 5.4.84-storage #5.4.84-1+feature+linux+5.4.y+dbg+20201216.1319+b6b887b~deb10 Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.00 09/04/2012 Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] Call Trace: dump_stack+0x96/0xe0 print_address_description.constprop.4+0x1f/0x300 ? irq_work_claim+0x2e/0x50 __kasan_report.cold.8+0x78/0x92 ? _mlx4_ib_post_send+0x1bd2/0x2770 [mlx4_ib] kasan_report+0x10/0x20 _mlx4_ib_post_send+0x1bd2/0x2770 [mlx4_ib] ? check_chain_key+0x1d7/0x2e0 ? _mlx4_ib_post_recv+0x630/0x630 [mlx4_ib] ? lockdep_hardirqs_on+0x1a8/0x290 ? stack_depot_save+0x218/0x56e ? do_profile_hits.isra.6.cold.13+0x1d/0x1d ? check_chain_key+0x1d7/0x2e0 ? save_stack+0x4d/0x80 ? save_stack+0x19/0x80 ? __kasan_slab_free+0x125/0x170 ? kfree+0xe7/0x3b0 rdma_write_sg+0x5b0/0x950 [rtrs_server] The problem is when we send imm_wr, the type should be ib_rdma_wr, so hw driver like mlx4 can do rdma_wr(wr), so fix it by use the ib_rdma_wr as type for imm_wr. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20210212134525.103456-2-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16RDMA/rxe: Remove unused pkt->offsetBob Pearson
The pkt->offset field is never used except to assign it to 0. But it adds lots of unneeded code. This patch removes the field and related code. This causes a measurable improvement in performance. Link: https://lore.kernel.org/r/20210211210455.3274-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16RDMA/ucma: Fix use-after-free bug in ucma_create_ueventAvihai Horon
ucma_process_join() allocates struct ucma_multicast mc and frees it if an error occurs during its run. Specifically, if an error occurs in copy_to_user(), a use-after-free might happen in the following scenario: 1. mc struct is allocated. 2. rdma_join_multicast() is called and succeeds. During its run, cma_iboe_join_multicast() enqueues a work that will later use the aforementioned mc struct. 3. copy_to_user() is called and fails. 4. mc struct is deallocated. 5. The work that was enqueued by cma_iboe_join_multicast() is run and calls ucma_create_uevent() which tries to access mc struct (which is freed by now). Fix this bug by cancelling the work enqueued by cma_iboe_join_multicast(). Since cma_work_handler() frees struct cma_work, we don't use it in cma_iboe_join_multicast() so we can safely cancel the work later. The following syzkaller report revealed it: BUG: KASAN: use-after-free in ucma_create_uevent+0x2dd/0x;3f0 drivers/infiniband/core/ucma.c:272 Read of size 8 at addr ffff88810b3ad110 by task kworker/u8:1/108 CPU: 1 PID: 108 Comm: kworker/u8:1 Not tainted 5.10.0-rc6+ #257 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: rdma_cm cma_work_handler Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xbe/0xf9 lib/dump_stack.c:118 print_address_description.constprop.0+0x3e/0×60 mm/kasan/report.c:385 __kasan_report mm/kasan/report.c:545 [inline] kasan_report.cold+0x1f/0×37 mm/kasan/report.c:562 ucma_create_uevent+0x2dd/0×3f0 drivers/infiniband/core/ucma.c:272 ucma_event_handler+0xb7/0×3c0 drivers/infiniband/core/ucma.c:349 cma_cm_event_handler+0x5d/0×1c0 drivers/infiniband/core/cma.c:1977 cma_work_handler+0xfa/0×190 drivers/infiniband/core/cma.c:2718 process_one_work+0x54c/0×930 kernel/workqueue.c:2272 worker_thread+0x82/0×830 kernel/workqueue.c:2418 kthread+0x1ca/0×220 kernel/kthread.c:292 ret_from_fork+0x1f/0×30 arch/x86/entry/entry_64.S:296 Allocated by task 359: kasan_save_stack+0x1b/0×40 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc mm/kasan/common.c:461 [inline] __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:434 kmalloc include/linux/slab.h:552 [inline] kzalloc include/linux/slab.h:664 [inline] ucma_process_join+0x16e/0×3f0 drivers/infiniband/core/ucma.c:1453 ucma_join_multicast+0xda/0×140 drivers/infiniband/core/ucma.c:1538 ucma_write+0x1f7/0×280 drivers/infiniband/core/ucma.c:1724 vfs_write fs/read_write.c:603 [inline] vfs_write+0x191/0×4c0 fs/read_write.c:585 ksys_write+0x1a1/0×1e0 fs/read_write.c:658 do_syscall_64+0x2d/0×40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 359: kasan_save_stack+0x1b/0×40 mm/kasan/common.c:48 kasan_set_track+0x1c/0×30 mm/kasan/common.c:56 kasan_set_free_info+0x1b/0×30 mm/kasan/generic.c:355 __kasan_slab_free+0x112/0×160 mm/kasan/common.c:422 slab_free_hook mm/slub.c:1544 [inline] slab_free_freelist_hook mm/slub.c:1577 [inline] slab_free mm/slub.c:3142 [inline] kfree+0xb3/0×3e0 mm/slub.c:4124 ucma_process_join+0x22d/0×3f0 drivers/infiniband/core/ucma.c:1497 ucma_join_multicast+0xda/0×140 drivers/infiniband/core/ucma.c:1538 ucma_write+0x1f7/0×280 drivers/infiniband/core/ucma.c:1724 vfs_write fs/read_write.c:603 [inline] vfs_write+0x191/0×4c0 fs/read_write.c:585 ksys_write+0x1a1/0×1e0 fs/read_write.c:658 do_syscall_64+0x2d/0×40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at ffff88810b3ad100 which belongs to the cache kmalloc-192 of size 192 The buggy address is located 16 bytes inside of 192-byte region [ffff88810b3ad100, ffff88810b3ad1c0) Fixes: b5de0c60cc30 ("RDMA/cma: Fix use after free race in roce multicast join") Link: https://lore.kernel.org/r/20210211090517.1278415-1-leon@kernel.org Reported-by: Amit Matityahu <mitm@nvidia.com> Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16RDMA/core: Fix kernel doc warnings for ib_port_immutable_read()Leon Romanovsky
drivers/infiniband/core/device.c:859: warning: Function parameter or member 'dev' not described in 'ib_port_immutable_read' drivers/infiniband/core/device.c:859: warning: Function parameter or member 'port' not described in 'ib_port_immutable_read' Fixes: 7416790e2245 ("RDMA/core: Introduce and use API to read port immutable data") Link: https://lore.kernel.org/r/20210210151421.1108809-1-leon@kernel.org Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16RDMA/qedr: Use true and false for bool variableJiapeng Chong
Fix the following coccicheck warning: ./drivers/infiniband/hw/qedr/qedr.h:629:9-10: WARNING: return of 0/1 in function 'qedr_qp_has_rq' with return type bool. ./drivers/infiniband/hw/qedr/qedr.h:620:9-10: WARNING: return of 0/1 in function 'qedr_qp_has_sq' with return type bool. Link: https://lore.kernel.org/r/1612949901-109873-1-git-send-email-jiapeng.chong@linux.alibaba.com Reported-by: Abaci Robot<abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16RDMA/hns: Adjust definition of FRMR fieldsYixing Liu
FRMR is not well-supported on HIP08, it is re-designed for HIP09 and the position of related fields is changed. Then the ULPs should be forbidden to use FRMR on older hardwares. Link: https://lore.kernel.org/r/1612924424-28217-1-git-send-email-liweihang@huawei.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16RDMA/hns: Refactor process of posting CMDQLang Cheng
Simplify __hns_roce_cmq_send() then remove the redundant variables. Link: https://lore.kernel.org/r/1612688143-28226-6-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16RDMA/hns: Adjust fields and variables about CMDQ tail/headLang Cheng
The register 0x07014 is actually the head pointer of CMDQ, and 0x07010 means tail pointer. Current definitions are confusing, so rename them and related variables. The next_to_use of structure hns_roce_v2_cmq_ring has the same semantics as head, merge them into one member. The next_to_clean of structure hns_roce_v2_cmq_ring has the same semantics as tail. After deleting next_to_clean, tail should also be deleted. Link: https://lore.kernel.org/r/1612688143-28226-5-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>