summaryrefslogtreecommitdiff
path: root/drivers/infiniband/sw
AgeCommit message (Collapse)Author
2016-08-04Merge tag 'for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull second round of rdma updates from Doug Ledford: "This can be split out into just two categories: - fixes to the RDMA R/W API in regards to SG list length limits (about 5 patches) - fixes/features for the Intel hfi1 driver (everything else) The hfi1 driver is still being brought to full feature support by Intel, and they have a lot of people working on it, so that amounts to almost the entirety of this pull request" * tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits) IB/hfi1: Add cache evict LRU list IB/hfi1: Fix memory leak during unexpected shutdown IB/hfi1: Remove unneeded mm argument in remove function IB/hfi1: Consistently call ops->remove outside spinlock IB/hfi1: Use evict mmu rb operation IB/hfi1: Add evict operation to the mmu rb handler IB/hfi1: Fix TID caching actions IB/hfi1: Make the cache handler own its rb tree root IB/hfi1: Make use of mm consistent IB/hfi1: Fix user SDMA racy user request claim IB/hfi1: Fix error condition that needs to clean up IB/hfi1: Release node on insert failure IB/hfi1: Validate SDMA user iovector count IB/hfi1: Validate SDMA user request index IB/hfi1: Use the same capability state for all shared contexts IB/hfi1: Prevent null pointer dereference IB/hfi1: Rename TID mmu_rb_* functions IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister() IB/hfi1: Restructure hfi1_file_open IB/hfi1: Make iovec loop index easy to understand ...
2016-08-04Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull base rdma updates from Doug Ledford: "Round one of 4.8 code: while this is mostly normal, there is a new driver in here (the driver was hosted outside the kernel for several years and is actually a fairly mature and well coded driver). It amounts to 13,000 of the 16,000 lines of added code in here. Summary: - Updates/fixes for iw_cxgb4 driver - Updates/fixes for mlx5 driver - Add flow steering and RSS API - Add hardware stats to mlx4 and mlx5 drivers - Add firmware version API for RDMA driver use - Add the rxe driver (this is a software RoCE driver that makes any Ethernet device a RoCE device) - Fixes for i40iw driver - Support for send only multicast joins in the cma layer - Other minor fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits) Soft RoCE driver IB/core: Support for CMA multicast join flags IB/sa: Add cached attribute containing SM information to SA port IB/uverbs: Fix race between uverbs_close and remove_one IB/mthca: Clean up error unwind flow in mthca_reset() IB/mthca: NULL arg to pci_dev_put is OK IB/hfi1: NULL arg to sc_return_credits is OK IB/mlx4: Add diagnostic hardware counters net/mlx4: Query performance and diagnostics counters net/mlx4: Add diagnostic counters capability bit Use smaller 512 byte messages for portmapper messages IB/ipoib: Report SG feature regardless of HW UD CSUM capability IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct IB/hfi1: Disable by default IB/rdmavt: Disable by default IB/mlx5: Fix port counter ID association to QP offset IB/mlx5: Fix iteration overrun in GSI qps i40iw: Add NULL check for puda buffer i40iw: Change dup_ack_thresh to u8 i40iw: Remove unnecessary check for moving CQ head ...
2016-08-04Merge branches 'misc' and 'rxe' into k.o/for-4.8-1Doug Ledford
2016-08-04Soft RoCE driverMoni Shoua
Soft RoCE (RXE) - The software RoCE driver ib_rxe implements the RDMA transport and registers to the RDMA core device as a kernel verbs provider. It also implements the packet IO layer. On the other hand ib_rxe registers to the Linux netdev stack as a udp encapsulating protocol, in that case RDMA, for sending and receiving packets over any Ethernet device. This yields a RDMA transport over the UDP/Ethernet network layer forming a RoCEv2 compatible device. The configuration procedure of the Soft RoCE drivers requires binding to any existing Ethernet network device. This is done with /sys interface. A userspace Soft RoCE library (librxe) provides user applications the ability to run with Soft RoCE devices. The use of rxe verbs ins user space requires the inclusion of librxe as a device specifics plug-in to libibverbs. librxe is packaged separately. Architecture: +-----------------------------------------------------------+ | Application | +-----------------------------------------------------------+ +-----------------------------------+ | libibverbs | User +-----------------------------------+ +----------------+ +----------------+ | librxe | | HW RoCE lib | +----------------+ +----------------+ +---------------------------------------------------------------+ +--------------+ +------------+ | Sockets | | RDMA ULP | +--------------+ +------------+ +--------------+ +---------------------+ | TCP/IP | | ib_core | +--------------+ +---------------------+ +------------+ +----------------+ Kernel | ib_rxe | | HW RoCE driver | +------------+ +----------------+ +------------------------------------+ | NIC driver | +------------------------------------+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +-----------------------------------------------------------+ | Application | +-----------------------------------------------------------+ +-----------------------------------+ | libibverbs | User +-----------------------------------+ +----------------+ +----------------+ | librxe | | HW RoCE lib | +----------------+ +----------------+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--------------+ +------------+ | Sockets | | RDMA ULP | +--------------+ +------------+ +--------------+ +---------------------+ | TCP/IP | | ib_core | +--------------+ +---------------------+ +------------+ +----------------+ Kernel | ib_rxe | | HW RoCE driver | +------------+ +----------------+ +------------------------------------+ | NIC driver | +------------------------------------+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Soft RoCE resources: [1[ https://github.com/SoftRoCE/librxe-dev librxe - source code in Github [2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE Wiki page [3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03IB/rdmavt: Disable by defaultBart Van Assche
There is a strict policy in the Linux kernel that new drivers must be disabled by default. Hence leave out the "default m" line from Kconfig. Fixes: 0194621b2253 ("IB/rdmavt: Create module framework and handle driver registration") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Jubin John <jubin.john@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: <stable@vger.kernel.org> # v4.6+ Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/rdmavt: Eliminate redundant opcode test in mr ref clearIra Weiny
The use of the specific opcode test is redundant since all ack entry users correctly manipulate the mr pointer to selectively trigger the reference clearing. The overly specific test hinders the use of implementation specific operations. The change needs to get rid of the union to insure that an atomic value is not seen as an MR pointer. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/rdmavt, hfi1: Fix NFSoRDMA failure with FRMR enabledJianxin Xiong
Hanging has been observed while writing a file over NFSoRDMA. Dmesg on the server contains messages like these: [ 931.992501] svcrdma: Error -22 posting RDMA_READ [ 952.076879] svcrdma: Error -22 posting RDMA_READ [ 982.154127] svcrdma: Error -22 posting RDMA_READ [ 1012.235884] svcrdma: Error -22 posting RDMA_READ [ 1042.319194] svcrdma: Error -22 posting RDMA_READ Here is why: With the base memory management extension enabled, FRMR is used instead of FMR. The xprtrdma server issues each RDMA read request as the following bundle: (1)IB_WR_REG_MR, signaled; (2)IB_WR_RDMA_READ, signaled; (3)IB_WR_LOCAL_INV, signaled & fencing. These requests are signaled. In order to generate completion, the fast register work request is processed by the hfi1 send engine after being posted to the work queue, and the corresponding lkey is not valid until the request is processed. However, the rdmavt driver validates lkey when the RDMA read request is posted and thus it fails immediately with error -EINVAL (-22). This patch changes the work flow of local operations (fast register and local invalidate) so that fast register work requests are always processed immediately to ensure that the corresponding lkey is valid when subsequent work requests are posted. Local invalidate requests are processed immediately if fencing is not required and no previous local invalidate request is pending. To allow completion generation for signaled local operations that have been processed before posting to the work queue, an internal send flag RVT_SEND_COMPLETION_ONLY is added. The hfi1 send engine checks this flag and only generates completion for such requests. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/hfi1: Add the capability for reserved operationsMike Marciniszyn
This fix allows for support of in-kernel reserved operations without impacting the ULP user. The low level driver can register a non-zero value which will be transparently added to the send queue size and hidden from the ULP in every respect. ULP post sends will never see a full queue due to a reserved post send and reserved operations will never exceed that registered value. The s_avail will continue to track the ULP swqe availability and the difference between the reserved value and the reserved in use will track reserved availabity. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/rdmavt: Add missing spin_lock_init call for rdi->n_cqs_lockJianxin Xiong
This fixes the following warning with PROV_LOCKING enabled kernel: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 15 PID: 12286 Comm: modprobe Not tainted 4.7.0-rc5.prove_rcu+ #1 Hardware name: Intel Corporation S2600WT2R/S2600WT2R, ...... Call Trace: [<ffffffff8139ec0d>] dump_stack+0x85/0xc8 [<ffffffff810eb765>] register_lock_class+0x415/0x4b0 [<ffffffff810ede1c>] ? __lock_acquire+0x40c/0x1960 [<ffffffff810edaa9>] __lock_acquire+0x99/0x1960 [<ffffffff8120ab62>] ? find_vmap_area+0x42/0x60 [<ffffffff8120ab39>] ? find_vmap_area+0x19/0x60 [<ffffffff810ef9d3>] lock_acquire+0xd3/0x200 [<ffffffffa049d598>] ? rvt_create_cq+0xc8/0x250 [rdmavt] [<ffffffff81763391>] _raw_spin_lock+0x31/0x40 [<ffffffffa049d598>] ? rvt_create_cq+0xc8/0x250 [rdmavt] [<ffffffffa049d598>] rvt_create_cq+0xc8/0x250 [rdmavt] [<ffffffff810ead46>] ? static_obj+0x36/0x50 [<ffffffffa0469e39>] ib_alloc_cq+0x49/0x180 [ib_core] [<ffffffffa047bed4>] ib_mad_init_device+0x204/0x6d0 [ib_core] [<ffffffff810e968f>] ? up_write+0x1f/0x40 [<ffffffffa046e2c0>] ib_register_device+0x3d0/0x510 [ib_core] [<ffffffffa0752410>] ? read_cc_setting_bin+0x200/0x200 [hfi1] [<ffffffff810ead46>] ? static_obj+0x36/0x50 [<ffffffff810eb888>] ? lockdep_init_map+0x88/0x200 [<ffffffffa049cbff>] rvt_register_device+0x17f/0x320 [rdmavt] [<ffffffffa0766caa>] hfi1_register_ib_device+0x6ca/0x7c0 [hfi1] [<ffffffffa0733de4>] init_one+0x2b4/0x430 [hfi1] [<ffffffff813e40a5>] local_pci_probe+0x45/0xa0 [<ffffffff813e5110>] ? pci_match_device+0xe0/0x110 [<ffffffff813e550c>] pci_device_probe+0xfc/0x140 [<ffffffff814daee9>] driver_probe_device+0x239/0x460 [<ffffffff814db1dd>] __driver_attach+0xcd/0xf0 [<ffffffff814db110>] ? driver_probe_device+0x460/0x460 [<ffffffff814d89b3>] bus_for_each_dev+0x73/0xc0 [<ffffffff814da74e>] driver_attach+0x1e/0x20 [<ffffffff814da1b3>] bus_add_driver+0x1d3/0x290 [<ffffffffa04cc114>] ? dev_init+0x114/0x114 [hfi1] [<ffffffff814dbf60>] driver_register+0x60/0xe0 [<ffffffffa04cc114>] ? dev_init+0x114/0x114 [hfi1] [<ffffffff813e39d0>] __pci_register_driver+0x60/0x70 [<ffffffffa04cc2aa>] hfi1_mod_init+0x196/0x1fe [hfi1] [<ffffffff81002190>] do_one_initcall+0x50/0x190 [<ffffffff8110be72>] ? rcu_read_lock_sched_held+0x62/0x70 [<ffffffff8122d4aa>] ? kmem_cache_alloc_trace+0x23a/0x2a0 [<ffffffff811c1881>] ? do_init_module+0x27/0x1dc [<ffffffff811c18ba>] do_init_module+0x60/0x1dc [<ffffffff811360cc>] load_module+0x132c/0x1ac0 [<ffffffff81132c40>] ? __symbol_put+0x60/0x60 [<ffffffff8133e50d>] ? ima_post_read_file+0x3d/0x80 Cc: Stable <stable@vger.kernel.org> # 4.6+ Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/rdmavt: Handle local operations in post sendJianxin Xiong
Some work requests are local operations, such as IB_WR_REG_MR and IB_WR_LOCAL_INV. They differ from non-local operations in that: (1) Local operations can be processed immediately without being posted to the send queue if neither fencing nor completion generation is needed. However, to ensure correct ordering, once a local operation is posted to the work queue due to fencing or completion requiement, all subsequent local operations must also be posted to the work queue until all the local operations on the work queue have completed. (2) Local operations don't send packets over the wire and thus don't need (and shouldn't update) the packet sequence numbers. Define a new a flag bit for the post send table to identify local operations. Add a new field to the QP structure to track the number of local operations on the send queue to determine if direct processing of new local operations should be enabled/disabled. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/rdmavt: Add mechanism to invalidate MR keysJianxin Xiong
In order to support extended memory management, add the mechanism to invalidate MR keys. This includes a flag "lkey_invalid" in the MR data structure that is to be checked when validating access to the MR via the associated key, and two utility functions to perform fast memory registration and memory key invalidate operations. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/rdmavt: Add support for ib_map_mr_sgJianxin Xiong
This implements the device specific function needed by the verbs API function ib_map_mr_sg(). Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/rdmavt: Use new driver specific post send tableMike Marciniszyn
Change rvt_post_one_wr to use the new table mechanism for post send. Validate that each low level driver specifies the table. Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/rdmavt: Add data structures and routines for table driven post sendMike Marciniszyn
Add flexibility for driver dependent operations in post send because different drivers will have differing post send operation support. This includes data structure definitions to support a table driven scheme along with the necessary validation routine using the new table. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/rdmavt: Correct qp_priv_alloc() return value testMike Marciniszyn
The current drivers return errors from this calldown wrapped in an ERR_PTR(). The rdmavt code incorrectly tests for NULL. The code is fixed to use IS_ERR() and change ret according to the driver return value. Cc: Stable <stable@vger.kernel.org> # 4.6+ Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/hfi1: Don't zero out qp->s_ack_queue in rvt_reset_qpAshutosh Dixit
Since rvt_reset_qp already zero's out qp->s_ack_queue head and tail pointers, there is no need to zero out qp->s_ack_queue itself. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-17IB/rdmavt: Correct warning during QPN allocationBrian Welty
Correct calculation of the low order bits which should be unset based on use of qos_shift parameter when assigning QPN. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-17IB/rdmavt: Correct required callback functions for MODIFY_QPBrian Welty
Functions required for MODIFY_PORT were incorrectly being required for MODIFY_QP. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06IB/rdmavt: Annotate rvt_reset_qp()Bart Van Assche
This patch avoids that sparse reports the following warning: rdmavt/qp.c:507:17: warning: context imbalance in 'rvt_reset_qp' - unexpected unlock Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/rdamvt: Fix rdmavt s_ack_queue sizingMike Marciniszyn
rdmavt allows the driver to specify the size of the ack queue, but only uses it for the modify QP limit testing for setting the atomic limit value. The driver dependent size is now used to size the s_ack_queue ring dynamicially. Since the driver knows its size, the driver will use its define for any ring size dependent code. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/rdmavt: Use kzalloc_nodeJubin John
Use kzalloc_node instead of kzalloc for rdmavt memory region segment allocation to optimize for performance on NUMA platforms. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/rdmavt: Insure QP vmalloc variants zero memoryMike Marciniszyn
The usage of the various vmalloc APIs do not consistently zero memory when allocating the swqe. Insure zeroing variants are used. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/rdmavt: Increase CQ callback thread priorityMike Marciniszyn
The priority of the send engines is higher than the CQ completion thread potentially causing completions to be starved for very fast interfaces. Change the CQ kthread to match the send engine threads to minimize this delay for ULP completion processing. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28IB/hfi1: Use global defines for upper bits in opcodeMike Marciniszyn
The awkward coding for setting the allowed_ops field was tripping an smatch warning. This patch uses the more appropriate defines from include/rdma to avoid the issue. As part of the patch remove a mask that was duplicated in rdmavt include files and use that mask as appropriate. Fixes: 8bea6b1cfe6f ("IB/rdmavt: Add create queue pair functionality") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28IB/rdmavt,hfi1,qib: Fix memory leakJubin John
rdi->ports has memory allocated in rvt_alloc_device(), but does not get freed because the hfi1 and qib drivers drivers call ib_dealloc_device() directly instead of going through rdmavt. Add a rvt_dealloc_device() that frees rdi->ports and then calls ib_dealloc_device(). Switch hfi1 and qib drivers to calling rvt_dealloc_device() instead of ib_dealloc_device() directly. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28IB/rdmavt: Fix send schedulingJubin John
call_send is used to determine whether to send immediately or schedule a send for later. The current logic in rdmavt is inverted and has a negative impact on the latency of the hfi1 and qib drivers. Fix this regression by correctly calling send immediately when call_send is set. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/rdmavt: Post receive for QP in ERR stateAlex Estrin
Accordingly IB Spec post WR to receive queue must complete with error if QP is in Error state. Please refer to C10-42, C10-97.2.1 Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/hfi1: Report pid in qp_stats to aid debugMike Marciniszyn
Tracking user/QP ownership is needed to debug issues with user ULPs like OpenMPI. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Check lkey_table_size value before useJubin John
The lkey_table_size driver specific parameter value is used before its value is sanity checked and restricted to RVT_MAX_LKEY_TABLE_BITS. This causes a vmalloc allocation failure for large values. Fix this by moving the value check before the first usage of the value. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdamvt: fix cross build with rdmavtMike Marciniszyn
The new check routine causes a larger than supported frame size on s390. Changing the check routine to noinline fixes the issue. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/qib, staging/rdma/hfi1, IB/rdmavt: progress selection changesMike Marciniszyn
The non-rdamvt versions of qib and hfi1 allow for a differing heuristic to override a schedule progress in favor of a direct call the the progress routine. This patch adds that for both drivers and rdmavt. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Remove unnecessary exported functionsDennis Dalessandro
Remove exported functions which are no longer required as the functionality has moved into rdmavt. This also requires re-ordering some of the functions since their prototype no longer appears in a header file. Rather than add forward declarations it is just cleaner to re-order some of the functions. Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Remove RVT_FLAGsDennis Dalessandro
While hfi1 and qib were still supporting bits and pieces of core verbs components there needed to be a way to convey if rdmavt should handle allocation and initialize of resources like the queue pair table. Now that all of this is moved into rdmavt there is no need for these flags. They are no longer used in the drivers. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Add per verb driver callback checkingDennis Dalessandro
For each verb validate that all requirements for driver callbacks are met. If a function is called without checking for a valid pointer, it is a required function. Also document what each callback function does. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Clean up comments and add more documentationDennis Dalessandro
Add, remove, and otherwise clean up existing comments that are leftover from the initial code postings of rdmavt. Many of the comments were added to provide an idea on the direction we were thinking of going. Now that the design is solidified make a pass over and clean everything up. Also add details where lacking. Ensure all non static functions have nano comments. Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Add trace and error print statements in post_one_wrHarish Chegondi
These trace and error print statements would help in debugging issues which are caused due to messed up QP ring buffer pointers. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/qib, staging/rdma/hfi1: add s_hlock for use in post sendMike Marciniszyn
This patch adds an additional lock to reduce contention on the s_lock. This lock is used in post_send() so that the post_send is not serialized with the send engine and other send related processing. To do this the s_next_psn is now maintained on post_send() while post_send() related fields are moved to a new cache line. There is an s_avail maintained for the post_send() to mitigate trading cache lines with the send engine. The lock is released/acquired around releasing the just built packet to the egress mechanism. Reviewed-by: Jubin John <jubin.john@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt, staging/rdma/hfi1: use qps to dynamically scale timeout valueVennila Megavannan
A busy_jiffies variable is maintained and updated when rc qps are created and deleted. busy_jiffies is a scaled value of the number of rc qps in the device. busy_jiffies is incremented every rc qp scaling interval. busy_jiffies is added to the rc timeout in add_retry_timer and mod_retry_timer. The rc qp scaling interval is selected based on extensive performance evaluation of targeted workloads. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Vennila Megavannan <vennila.megavannan@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10staging/rdma/hfi1: use new RNR timerMike Marciniszyn
Use the new RNR timer for hfi1. For qib, this timer doesn't exist, so exploit driver callbacks to use the new timer as appropriate. Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10staging/rdma/hfi1: Remove modify queue pair from hfi1Dennis Dalessandro
In addition to removing the modify queue pair verb from hfi1 we also remove ancillary functions which existed only for modify queue pair and are also already present in hfi1. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10staging/rdma/hfi1: Clean up return handlingDennis Dalessandro
Return directly from rvt_resize_cq rather than use a goto/label. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Properly pass gfp to hw driver functionIra Weiny
alloc_qpn must use GFP and the hardware drivers should use it as well. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Add support for query_port, modify_port and get_port_immutableHarish Chegondi
rvt_query_port calls into the driver through a call back function query_port_state to populate the rest of ib_port_attr elements. rvt_modify_port calls into the driver if needed through a call back function shut_down_port() Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Add query gid support.Dennis Dalessandro
Addin query gid support. Rdmavt still relies on the driver to maintain the gid table. Rdmavt simply calls into the driver to retrive the guid for a particular port. Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Clean up distinction between port number and indexDennis Dalessandro
IB core uses 1 relative indexing for ports. All of our data structures use 0 based indexing. Add an inline function that we can use whenever we need to validate a legal value and try to convert a port number to a port index at the entrance into rdmavt. Try to follow the policy that when we are talking about a port from IB core point of view we refer to it as a port number. When port is an index into our arrays refer to it as a port index. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Add Mem affinity supportMitko Haralanov
Change verbs memory allocations to the device numa node. This keeps memory close to the device for optimal performance. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Add hardware driver send work request checkIra Weiny
Some hardware drivers requires additional checks on send WRs. Create an optional call back to allow hardware drivers to reject a send WR. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Add srq functionality to rdmavtJubin John
Fill in srq function stubs with code derived from hfi1 and qib. Move necessary functions and data structure members as well. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Add support for rvt_query_qpHarish Chegondi
Drivers using rdmavt can rely on rvt_query_qp instead of defining their own query_qp functions. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Fix copyright dateDennis Dalessandro
Update all files added by rdmavt which do not yet have 2016 as the copyright year. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>