summaryrefslogtreecommitdiff
path: root/drivers/infiniband/ulp
AgeCommit message (Collapse)Author
2016-10-09Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull main rdma updates from Doug Ledford: "This is the main pull request for the rdma stack this release. The code has been through 0day and I had it tagged for linux-next testing for a couple days. Summary: - updates to mlx5 - updates to mlx4 (two conflicts, both minor and easily resolved) - updates to iw_cxgb4 (one conflict, not so obvious to resolve, proper resolution is to keep the code in cxgb4_main.c as it is in Linus' tree as attach_uld was refactored and moved into cxgb4_uld.c) - improvements to uAPI (moved vendor specific API elements to uAPI area) - add hns-roce driver and hns and hns-roce ACPI reset support - conversion of all rdma code away from deprecated create_singlethread_workqueue - security improvement: remove unsafe ib_get_dma_mr (breaks lustre in staging)" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits) staging/lustre: Disable InfiniBand support iw_cxgb4: add fast-path for small REG_MR operations cxgb4: advertise support for FR_NSMR_TPTE_WR IB/core: correctly handle rdma_rw_init_mrs() failure IB/srp: Fix infinite loop when FMR sg[0].offset != 0 IB/srp: Remove an unused argument IB/core: Improve ib_map_mr_sg() documentation IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets IB/mthca: Move user vendor structures IB/nes: Move user vendor structures IB/ocrdma: Move user vendor structures IB/mlx4: Move user vendor structures IB/cxgb4: Move user vendor structures IB/cxgb3: Move user vendor structures IB/mlx5: Move and decouple user vendor structures IB/{core,hw}: Add constant for node_desc ipoib: Make ipoib_warn ratelimited IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue IB/ipoib: Remove deprecated create_singlethread_workqueue ...
2016-10-07IB/srp: Fix infinite loop when FMR sg[0].offset != 0Bart Van Assche
Avoid that mapping an sg-list in which the first element has a non-zero offset triggers an infinite loop when using FMR. This patch makes the FMR mapping code similar to that of ib_sg_to_pages(). Note: older Mellanox HCAs do not support non-zero offsets for FMR. See also commit 8c4037b501ac ("IB/srp: always avoid non-zero offsets into an FMR"). Reported-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: <stable@vger.kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/srp: Remove an unused argumentBart Van Assche
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07ipoib: Make ipoib_warn ratelimitedkernel@kyup.com
In certain cases it's possible to be flooded by warning messages. To cope with such situations make the ipoib_warn macro be ratelimited. To prevent accidental limiting of legitimate, bursty messages make the limit fairly liberal by allowing up to 100 messages in 10 seconds. Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/ipoib_verbs: Remove deprecated create_singlethread_workqueueBhaktipriya Shridhar
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces deprecated create_singlethread_workqueue(). This is the identity conversion. The workqueue "wq" queues mulitple work items viz &priv->restart_task, &priv->cm.rx_reap_task, &priv->cm.skb_task, &priv->neigh_reap_task, &priv->ah_reap_task, &priv->mcast_task and &priv->carrier_on_task. The work items require strict execution ordering. Hence, an ordered dedicated workqueue has been used. WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/ipoib: Remove deprecated create_singlethread_workqueueBhaktipriya Shridhar
alloc_ordered_workqueue() replaces deprecated create_singlethread_workqueue(). The workqueue "ipoib_workqueue" that is used for all flush operations for the device. WQ_MEM_RECLAIM has been set since the flush operations may need to complete in order for other network functions to continue, and the memory reclaim operation might need the network functioning in order to make progress. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/srp: use IB_PD_UNSAFE_GLOBAL_RKEYChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/iser: use IB_PD_UNSAFE_GLOBAL_RKEYChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/core: add support to create a unsafe global rkey to ib_create_pdChristoph Hellwig
Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or less unchecked, this moves the capability of creating a global rkey into the RDMA core, where it can be easily audited. It also prints a warning everytime this feature is used as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/ipoib: Don't allow MC joins during light MC flushAlex Vesker
This fix solves a race between light flush and on the fly joins. Light flush doesn't set the device to down and unset IPOIB_OPER_UP flag, this means that if while flushing we have a MC join in progress and the QP was attached to BC MGID we can have a mismatches when re-attaching a QP to the BC MGID. The light flush would set the broadcast group to NULL causing an on the fly join to rejoin and reattach to the BC MCG as well as adding the BC MGID to the multicast list. The flush process would later on remove the BC MGID and detach it from the QP. On the next flush the BC MGID is present in the multicast list but not found when trying to detach it because of the previous double attach and single detach. [18332.714265] ------------[ cut here ]------------ [18332.717775] WARNING: CPU: 6 PID: 3767 at drivers/infiniband/core/verbs.c:280 ib_dealloc_pd+0xff/0x120 [ib_core] ... [18332.775198] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011 [18332.779411] 0000000000000000 ffff8800b50dfbb0 ffffffff813fed47 0000000000000000 [18332.784960] 0000000000000000 ffff8800b50dfbf0 ffffffff8109add1 0000011832f58300 [18332.790547] ffff880226a596c0 ffff880032482000 ffff880032482830 ffff880226a59280 [18332.796199] Call Trace: [18332.798015] [<ffffffff813fed47>] dump_stack+0x63/0x8c [18332.801831] [<ffffffff8109add1>] __warn+0xd1/0xf0 [18332.805403] [<ffffffff8109aebd>] warn_slowpath_null+0x1d/0x20 [18332.809706] [<ffffffffa025d90f>] ib_dealloc_pd+0xff/0x120 [ib_core] [18332.814384] [<ffffffffa04f3d7c>] ipoib_transport_dev_cleanup+0xfc/0x1d0 [ib_ipoib] [18332.820031] [<ffffffffa04ed648>] ipoib_ib_dev_cleanup+0x98/0x110 [ib_ipoib] [18332.825220] [<ffffffffa04e62c8>] ipoib_dev_cleanup+0x2d8/0x550 [ib_ipoib] [18332.830290] [<ffffffffa04e656f>] ipoib_uninit+0x2f/0x40 [ib_ipoib] [18332.834911] [<ffffffff81772a8a>] rollback_registered_many+0x1aa/0x2c0 [18332.839741] [<ffffffff81772bd1>] rollback_registered+0x31/0x40 [18332.844091] [<ffffffff81773b18>] unregister_netdevice_queue+0x48/0x80 [18332.848880] [<ffffffffa04f489b>] ipoib_vlan_delete+0x1fb/0x290 [ib_ipoib] [18332.853848] [<ffffffffa04df1cd>] delete_child+0x7d/0xf0 [ib_ipoib] [18332.858474] [<ffffffff81520c08>] dev_attr_store+0x18/0x30 [18332.862510] [<ffffffff8127fe4a>] sysfs_kf_write+0x3a/0x50 [18332.866349] [<ffffffff8127f4e0>] kernfs_fop_write+0x120/0x170 [18332.870471] [<ffffffff81207198>] __vfs_write+0x28/0xe0 [18332.874152] [<ffffffff810e09bf>] ? percpu_down_read+0x1f/0x50 [18332.878274] [<ffffffff81208062>] vfs_write+0xa2/0x1a0 [18332.881896] [<ffffffff812093a6>] SyS_write+0x46/0xa0 [18332.885632] [<ffffffff810039b7>] do_syscall_64+0x57/0xb0 [18332.889709] [<ffffffff81883321>] entry_SYSCALL64_slow_path+0x25/0x25 [18332.894727] ---[ end trace 09ebbe31f831ef17 ]--- Fixes: ee1e2c82c245 ("IPoIB: Refresh paths instead of flushing them on SM change events") Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/ipoib: Fix memory corruption in ipoib cm mode connect flowErez Shitrit
When a new CM connection is being requested, ipoib driver copies data from the path pointer in the CM/tx object, the path object might be invalid at the point and memory corruption will happened later when now the CM driver will try using that data. The next scenario demonstrates it: neigh_add_path --> ipoib_cm_create_tx --> queue_work (pointer to path is in the cm/tx struct) #while the work is still in the queue, #the port goes down and causes the ipoib_flush_paths: ipoib_flush_paths --> path_free --> kfree(path) #at this point the work scheduled starts. ipoib_cm_tx_start --> copy from the (invalid)path pointer: (memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);) -> memory corruption. To fix that the driver now starts the CM/tx connection only if that specific path exists in the general paths database. This check is protected with the relevant locks, and uses the gid from the neigh member in the CM/tx object which is valid according to the ref count that was taken by the CM/tx. Fixes: 839fcaba35 ('IPoIB: Connected mode experimental support') Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/isert: Properly release resources on DEVICE_REMOVALRaju Rangoju
When the low level driver exercises the hot unplug they would call rdma_cm cma_remove_one which would fire DEVICE_REMOVAL event to all cma consumers. Now, if consumer doesn't make sure they destroy all IB objects created on that IB device instance prior to finalizing all processing of DEVICE_REMOVAL callback, rdma_cm will let the lld to de-register with IB core and destroy the IB device instance. And if the consumer calls (say) ib_dereg_mr(), it will crash since that dev object is NULL. In the current implementation, iser-target just initiates the cleanup and returns from DEVICE_REMOVAL callback. This deferred work creates a race between iser-target cleaning IB objects(say MR) and lld destroying IB device instance. This patch includes the following fixes -> make sure that consumer frees all IB objects associated with device instance -> return non-zero from the callback to destroy the rdma_cm id Signed-off-by: Raju Rangoju <rajur@chelsio.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-24IB/srpt: Update sport->port_guid with each port refreshDoug Ledford
If port_guid is set with the default subnet_prefix, then we get a change event and run a port refresh, we don't update the port_guid. As a result, attempts to create a target device that uses the new subnet_prefix in the wwn will fail to find a match and be rejected by the ib_srpt driver. This makes it impossible to configure a port if it was initialized with a default subnet_prefix and later changed to any non-default subnet-prefix. Updating the port refresh task to always update the wwn based upon the current subnext_prefix solves this problem. Cc: Bart Van Assche <bart.vanassche@sandisk.com> Cc: nab@linux-iscsi.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-22IB/isert: fix error return code in isert_alloc_login_buf()Wei Yongjun
Fix to return error code -ENOMEM from the alloc error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-04Merge tag 'for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull second round of rdma updates from Doug Ledford: "This can be split out into just two categories: - fixes to the RDMA R/W API in regards to SG list length limits (about 5 patches) - fixes/features for the Intel hfi1 driver (everything else) The hfi1 driver is still being brought to full feature support by Intel, and they have a lot of people working on it, so that amounts to almost the entirety of this pull request" * tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits) IB/hfi1: Add cache evict LRU list IB/hfi1: Fix memory leak during unexpected shutdown IB/hfi1: Remove unneeded mm argument in remove function IB/hfi1: Consistently call ops->remove outside spinlock IB/hfi1: Use evict mmu rb operation IB/hfi1: Add evict operation to the mmu rb handler IB/hfi1: Fix TID caching actions IB/hfi1: Make the cache handler own its rb tree root IB/hfi1: Make use of mm consistent IB/hfi1: Fix user SDMA racy user request claim IB/hfi1: Fix error condition that needs to clean up IB/hfi1: Release node on insert failure IB/hfi1: Validate SDMA user iovector count IB/hfi1: Validate SDMA user request index IB/hfi1: Use the same capability state for all shared contexts IB/hfi1: Prevent null pointer dereference IB/hfi1: Rename TID mmu_rb_* functions IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister() IB/hfi1: Restructure hfi1_file_open IB/hfi1: Make iovec loop index easy to understand ...
2016-08-04Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull base rdma updates from Doug Ledford: "Round one of 4.8 code: while this is mostly normal, there is a new driver in here (the driver was hosted outside the kernel for several years and is actually a fairly mature and well coded driver). It amounts to 13,000 of the 16,000 lines of added code in here. Summary: - Updates/fixes for iw_cxgb4 driver - Updates/fixes for mlx5 driver - Add flow steering and RSS API - Add hardware stats to mlx4 and mlx5 drivers - Add firmware version API for RDMA driver use - Add the rxe driver (this is a software RoCE driver that makes any Ethernet device a RoCE device) - Fixes for i40iw driver - Support for send only multicast joins in the cma layer - Other minor fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits) Soft RoCE driver IB/core: Support for CMA multicast join flags IB/sa: Add cached attribute containing SM information to SA port IB/uverbs: Fix race between uverbs_close and remove_one IB/mthca: Clean up error unwind flow in mthca_reset() IB/mthca: NULL arg to pci_dev_put is OK IB/hfi1: NULL arg to sc_return_credits is OK IB/mlx4: Add diagnostic hardware counters net/mlx4: Query performance and diagnostics counters net/mlx4: Add diagnostic counters capability bit Use smaller 512 byte messages for portmapper messages IB/ipoib: Report SG feature regardless of HW UD CSUM capability IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct IB/hfi1: Disable by default IB/rdmavt: Disable by default IB/mlx5: Fix port counter ID association to QP offset IB/mlx5: Fix iteration overrun in GSI qps i40iw: Add NULL check for puda buffer i40iw: Change dup_ack_thresh to u8 i40iw: Remove unnecessary check for moving CQ head ...
2016-08-04Merge branches 'misc' and 'rxe' into k.o/for-4.8-1Doug Ledford
2016-08-03IB/ipoib: Report SG feature regardless of HW UD CSUM capabilityYuval Shaia
Decouple SG support from HW ability to do UD checksum. This coupling is for historical reasons and removed with 'commit ec5f06156423 ("net: Kill link between CSUM and SG features.")' During driver load it is assumed that device does not supports SG. The final decision is taken after creating UD QP based on device capability. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/isert: Remove an unused member variableBart Van Assche
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Parav Pandit <pandit.parav@gmail.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/srpt: Simplify srpt_queue_response()Bart Van Assche
Initialize first_wr to &send_wr. This allows to remove a ternary operator and an else branch. This patch does not change the behavior of srpt_queue_response(). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Parav Pandit <pandit.parav@gmail.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/srpt: Limit the number of SG elements per work requestBart Van Assche
Limit the number of SG elements per work request to what the HCA and the queue pair support. Fixes: 34693573fde0 ("IB/srpt: Reduce QP buffer size") Reported-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Parav Pandit <pandit.parav@gmail.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: Laurence Oberman <loberman@redhat.com> Cc: <stable@vger.kernel.org> #v4.7+ Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/ipoib: Use new device FW version stringIra Weiny
Using this allows for devices to specify the format of their firmware version rather than forcing a format. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/srpt: Reduce QP buffer sizeBart Van Assche
The memory needed for the send and receive queues associated with a QP is proportional to the max_sge parameter. The current value of that parameter is such that with an mlx4 HCA the QP buffer size is 8 MB. Since DMA is used for communication between HCA and CPU that buffer either has to be allocated coherently or map_single() must succeed for that buffer. Since large contiguous allocations are fragile and since the maximum segment size for e.g. swiotlb is 256 KB, reduce the max_sge parameter. This patch avoids that the following text appears on the console after SRP logout and relogin on a system equipped with multiple IB HCAs: mlx4_core 0000:05:00.0: swiotlb buffer is full (sz: 8388608 bytes) swiotlb: coherent allocation failed for device 0000:05:00.0 size=8388608 CPU: 11 PID: 148 Comm: kworker/11:1 Not tainted 4.7.0-rc4-dbg+ #1 Call Trace: [<ffffffff812c6d35>] dump_stack+0x67/0x92 [<ffffffff812efe71>] swiotlb_alloc_coherent+0x141/0x150 [<ffffffff810458be>] x86_swiotlb_alloc_coherent+0x3e/0x50 [<ffffffffa03861fa>] mlx4_buf_direct_alloc.isra.5+0x9a/0x120 [mlx4_core] [<ffffffffa0386545>] mlx4_buf_alloc+0x165/0x1a0 [mlx4_core] [<ffffffffa035053d>] create_qp_common.isra.29+0x57d/0xff0 [mlx4_ib] [<ffffffffa03510da>] mlx4_ib_create_qp+0x12a/0x3f0 [mlx4_ib] [<ffffffffa031154a>] ib_create_qp+0x3a/0x250 [ib_core] [<ffffffffa055dd4b>] srpt_cm_handler+0x4bb/0xcad [ib_srpt] [<ffffffffa02c1ab0>] cm_process_work+0x20/0xf0 [ib_cm] [<ffffffffa02c3640>] cm_work_handler+0x1ac0/0x2059 [ib_cm] [<ffffffff810737ed>] process_one_work+0x19d/0x490 [<ffffffff81073b29>] worker_thread+0x49/0x490 [<ffffffff8107a0ea>] kthread+0xea/0x100 [<ffffffff815b25af>] ret_from_fork+0x1f/0x40 Fixes: b99f8e4d7bcd ("IB/srpt: convert to the generic RDMA READ/WRITE API") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/IPoIB: Don't update neigh validity for unresolved entriesErez Shitrit
ipoib_neigh_get unconditionally updates the "alive" variable member on any packet send. This prevents the neighbor garbage collection from cleaning out a dead neighbor entry if we are still queueing packets for it. If the queue for this neighbor is full, then don't update the alive timestamp. That way the neighbor can time out even if packets are still being queued as long as none of them are being sent. Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/IPoIB: Disable bottom half when dealing with device addressMark Bloch
Align locking usage when touching device address with rest of the kernel. Lock the bottom half when doing so using netif_addr_lock_bh. This also solves the following case as reported by lockdep: CPU0 CPU1 ---- ---- lock(_xmit_INFINIBAND); local_irq_disable(); lock(&(&mc->mca_lock)->rlock); lock(_xmit_INFINIBAND); <Interrupt> lock(&(&mc->mca_lock)->rlock); *** DEADLOCK *** Fixes: 492a7e67ff83 ("IB/IPoIB: Allow setting the device address") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/IPoIB: Fix race between ipoib_remove_one to sysfs functionsErez Shitrit
In ipoib_remove_one the driver holds the rtnl_lock and tries to do some operation like dev_change_flags or unregister_netdev, while sysfs callback like ipoib_vlan_delete holds sysfs mutex and tries to hold the rtnl_lock via rtnl_trylock() and restart_syscall() if the lock is not free, meanwhile ipoib_remove_one tries to get the sysfs lock in order to free its sysfs directory, and we will get a->b, b->a deadlock. Trace like the following: schedule+0x37/0x80 schedule_preempt_disabled+0xe/0x10 __mutex_lock_slowpath+0xb5/0x120 mutex_lock+0x23/0x40 rtnl_lock+0x15/0x20 netdev_run_todo+0x17c/0x320 rtnl_unlock+0xe/0x10 ipoib_vlan_delete+0x11b/0x1b0 [ib_ipoib] delete_child+0x54/0x80 [ib_ipoib] dev_attr_store+0x18/0x30 sysfs_kf_write+0x37/0x40 mutex_lock+0x16/0x40 SyS_write+0x55/0xc0 entry_SYSCALL_64_fastpath+0x16/0x75 And schedule+0x37/0x80 __kernfs_remove+0x1a8/0x260 ? wake_atomic_t_function+0x60/0x60 kernfs_remove+0x25/0x40 sysfs_remove_dir+0x50/0x80 kobject_del+0x18/0x50 device_del+0x19f/0x260 netdev_unregister_kobject+0x6a/0x80 rollback_registered_many+0x1fd/0x340 rollback_registered+0x3c/0x70 unregister_netdevice_queue+0x55/0xc0 unregister_netdev+0x20/0x30 ipoib_remove_one+0x114/0x1b0 [ib_ipoib] ib_unregister_client+0x4a/0x170 [ib_core] ? find_module_all+0x71/0xa0 ipoib_cleanup_module+0x10/0x94 [ib_ipoib] SyS_delete_module+0x1b5/0x210 entry_SYSCALL_64_fastpath+0x16/0x75 The fix is by checking the flag IPOIB_FLAG_INTF_ON_DESTROY in order to get out from the sysfs function. Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks") Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06IB/srp: Fix srp_map_sg_dma()Bart Van Assche
Because patch "IB/srp: Move common code into the caller" was applied partially srp_map_sg_dma() doesn't work properly. Fix this by applying the remainder of that patch. See also http://thread.gmane.org/gmane.linux.drivers.rdma/35803/focus=35811. Fixes: 3849e44d1c4b ("IB/srp: Move common code into the caller") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Sagi Grimberg <sai@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Cc: Laurence Oberman <loberman@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06IB/srp: Always initialize use_fast_reg and use_fmrBart Van Assche
Avoid that mapping fails due to use_fast_reg != 0 or use_fmr != 0 if both member variables should be zero (if never_register == 1 or if neither FMR nor FR is supported). Remove an initialization that became superfluous due to changing a kmalloc() into a kzalloc() call. Fixes: 509c5f33f4f6 ("IB/srp: Prevent mapping failures") Cc: Sagi Grimberg <sai@grimberg.m> Cc: Christoph Hellwig <hch@lst.de> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-28Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "Here are the outstanding target pending updates for v4.7-rc1. The highlights this round include: - Allow external PR/ALUA metadata path be defined at runtime via top level configfs attribute (Lee) - Fix target session shutdown bug for ib_srpt multi-channel (hch) - Make TFO close_session() and shutdown_session() optional (hch) - Drop se_sess->sess_kref + convert tcm_qla2xxx to internal kref (hch) - Add tcm_qla2xxx endpoint attribute for basic FC jammer (Laurence) - Refactor iscsi-target RX/TX PDU encode/decode into common code (Varun) - Extend iscsit_transport with xmit_pdu, release_cmd, get_rx_pdu, validate_parameters, and get_r2t_ttt for generic ISO offload (Varun) - Initial merge of cxgb iscsi-segment offload target driver (Varun) The bulk of the changes are Chelsio's new driver, along with a number of iscsi-target common code improvements made by Varun + Co along the way" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (29 commits) iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race cxgbit: Use type ISCSI_CXGBIT + cxgbit tpg_np attribute iscsi-target: Convert transport drivers to signal rdma_shutdown iscsi-target: Make iscsi_tpg_np driver show/store use generic code tcm_qla2xxx Add SCSI command jammer/discard capability iscsi-target: graceful disconnect on invalid mapping to iovec target: need_to_release is always false, remove redundant check and kfree target: remove sess_kref and ->shutdown_session iscsi-target: remove usage of ->shutdown_session tcm_qla2xxx: introduce a private sess_kref target: make close_session optional target: make ->shutdown_session optional target: remove acl_stop target: consolidate and fix session shutdown cxgbit: add files for cxgbit.ko iscsi-target: export symbols iscsi-target: call complete on conn_logout_comp iscsi-target: clear tx_thread_active iscsi-target: add new offload transport type iscsi-target: use conn_transport->transport_type in text rsp ...
2016-05-28Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull more rdma updates from Doug Ledford: "This is the second group of code for the 4.7 merge window. It looks large, but only in one sense. I'll get to that in a minute. The list of changes here breaks down as follows: - Dynamic counter infrastructure in the IB drivers This is a sysfs based code to allow free form access to the hardware counters RDMA devices might support so drivers don't need to code this up repeatedly themselves - SendOnlyFullMember multicast support - IB router support - A couple misc fixes - The big item on the list: hfi1 driver updates, plus moving the hfi1 driver out of staging There was a group of 15 patches in the hfi1 list that I thought I had in the first pull request but they weren't. So that added to the length of the hfi1 section here. As far as these go, everything but the hfi1 is pretty straight forward. The hfi1 is, if you recall, the driver that Al had complaints about how it used the write/writev interfaces in an overloaded fashion. The write portion of their interface behaved like the write handler in the IB stack proper and did bi-directional communications. The writev interface, on the other hand, only accepts SDMA request structures. The completions for those structures are sent back via an entirely different event mechanism. With the security patch, we put security checks on the write interface, however, we also knew they would be going away soon. Now, we've converted the write handler in the hfi1 driver to use ioctls from the IB reserved magic area for its bidirectional communications. With that change, Intel has addressed all of the items originally on their TODO when they went into staging (as well as many items added to the list later). As such, I moved them out, and since they were the last item in the staging/rdma directory, and I don't have immediate plans to use the staging area again, I removed the staging/rdma area. Because of the move out of staging, as well as a series of 5 patches in the hfi1 driver that removed code people thought should be done in a different way and was optional to begin with (a snoop debug interface, an eeprom driver for an eeprom connected directory to their hfi1 chip and not via an i2c bus, and a few other things like that), the line count, especially the removal count, is high" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (56 commits) staging/rdma: Remove the entire rdma subdirectory of staging IB/core: Make device counter infrastructure dynamic IB/hfi1: Fix pio map initialization IB/hfi1: Correct 8051 link parameter settings IB/hfi1: Update pkey table properly after link down or FM start IB/rdamvt: Fix rdmavt s_ack_queue sizing IB/rdmavt: Max atomic value should be a u8 IB/hfi1: Fix hard lockup due to not using save/restore spin lock IB/hfi1: Add tracing support for send with invalidate opcode IB/hfi1, qib: Add ieth to the packet header definitions IB/hfi1: Move driver out of staging IB/hfi1: Do not free hfi1 cdev parent structure early IB/hfi1: Add trace message in user IOCTL handling IB/hfi1: Remove write(), use ioctl() for user cmds IB/hfi1: Add ioctl() interface for user commands IB/hfi1: Remove unused user command IB/hfi1: Remove snoop/diag interface IB/hfi1: Remove EPROM functionality from data device IB/hfi1: Remove UI char device IB/hfi1: Remove multiple device cdev ...
2016-05-25IB/IPoIB: Allow setting the device addressMark Bloch
In IB networks, and specifically in IPoIB/rdmacm traffic, the device address of an IPoIB interface is used as a means to exchange information between nodes needed for communication. Currently an IPoIB interface will always be created with a device address based on its node GUID without a way to change that. This change adds the ability to set the device address of an IPoIB interface by value. We use the set mac address ndo to do that. The flow should be broken down to two: 1) The GID value is already in the GID table, in this case the interface will be able to set carrier up. 2) The GID value is not yet in the GID table, in this case the interface won't try to join the multicast group and will wait (listen on GID_CHANGE event) until the GID is inserted. In order to track those changes, we add a new flag: * IPOIB_FLAG_DEV_ADDR_SET. When set, it means the dev_addr is a based on a value in the gid table. this bit will be cleared upon a dev_addr change triggered by the user and set after validation. Per IB spec the port GUID can't change if the module is loaded. port GUID is the basis for GID at index 0 which is the basis for the default device address of a ipoib interface. The issue is that there are devices that don't follow the spec, they change the port GUID while HCA is powered on, so in order not to break userspace applications. We need to check if the user wanted to control the device address and we assume that if he sets the device address back to be based on GID index 0, he no longer wishs to control it. In order to track this, we add an additional flag: * IPOIB_FLAG_DEV_ADDR_CTRL When setting the device address, there is no validation of the upper twelve bytes of the device address (flags, qpn, subnet prefix) as those bytes are not under the control of the user. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-25IB/ipoib: Support SendOnlyFullMember MCG for SendOnly joinErez Shitrit
Check (via an SA query) if the SM supports the new option for SendOnly multicast joins. If the SM supports that option it will use the new join state to create such multicast group. If SendOnlyFullMember is supported, we wouldn't use faked FullMember state join for SendOnly MCG, use the correct state if supported. This check is performed at every invocation of mcast_restart task, to be sure that the driver stays in sync with the current state of the SM. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-25IB/core: Introduce capabilitymask2 field in ClassPortInfo madErez Shitrit
Change struct ib_class_port_info to conform to IB Spec 1.3 That in order to get specific capability mask from ClassPortInfo mad. >From the IB Spec, ClassPortInfo section: "CapabilityMask2 Bits 0-26: Additional class-specific capabilities... RespTimeValue the rest 5 bits" The new struct now has one field for capabilitymask2 (previously was the reserved field) and the resp_time field. And it fixes up qib and srpt, use of the field repurposed to be used as capabilitymask2: IB/qib: Change pma_get_classportinfo IB/srpt: Adjust the use of ib_class_port_info Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-20Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "Primary 4.7 merge window changes - Updates to the new Intel X722 iWARP driver - Updates to the hfi1 driver - Fixes for the iw_cxgb4 driver - Misc core fixes - Generic RDMA READ/WRITE API addition - SRP updates - Misc ipoib updates - Minor mlx5 updates" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (148 commits) IB/mlx5: Fire the CQ completion handler from tasklet net/mlx5_core: Use tasklet for user-space CQ completion events IB/core: Do not require CAP_NET_ADMIN for packet sniffing IB/mlx4: Fix unaligned access in send_reply_to_slave IB/mlx5: Report Scatter FCS device capability when supported IB/mlx5: Add Scatter FCS support for Raw Packet QP IB/core: Add Scatter FCS create flag IB/core: Add Raw Scatter FCS device capability IB/core: Add extended device capability flags i40iw: pass hw_stats by reference rather than by value i40iw: Remove unnecessary synchronize_irq() before free_irq() i40iw: constify i40iw_vf_cqp_ops structure IB/mlx5: Add UARs write-combining and non-cached mapping IB/mlx5: Allow mapping the free running counter on PROT_EXEC IB/mlx4: Use list_for_each_entry_safe IB/SA: Use correct free function IB/core: Fix a potential array overrun in CMA and SA agent IB/core: Remove unnecessary check in ibnl_rcv_msg IB/IWPM: Fix a potential skb leak RDMA/nes: replace custom print_hex_dump() ...
2016-05-18Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "First round of SCSI updates for the 4.6+ merge window. This batch includes the usual quota of driver updates (bnx2fc, mp3sas, hpsa, ncr5380, lpfc, hisi_sas, snic, aacraid, megaraid_sas). There's also a multiqueue update for scsi_debug, assorted bug fixes and a few other minor updates (refactor of scsi_sg_pools into generic code, alua and VPD updates, and struct timeval conversions)" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (138 commits) mpt3sas: Used "synchronize_irq()"API to synchronize timed-out IO & TMs mpt3sas: Set maximum transfer length per IO to 4MB for VDs mpt3sas: Updating mpt3sas driver version to 13.100.00.00 mpt3sas: Fix initial Reference tag field for 4K PI drives. mpt3sas: Handle active cable exception event mpt3sas: Update MPI header to 2.00.42 Revert "lpfc: Delete unnecessary checks before the function call mempool_destroy" eata_pio: missing break statement hpsa: Fix type ZBC conditional checks scsi_lib: Decode T10 vendor IDs scsi_dh_alua: do not fail for unknown VPD identification scsi_debug: use locally assigned naa scsi_debug: uuid for lu name scsi_debug: vpd and mode page work scsi_debug: add multiple queue support bfa: fix bfa_fcb_itnim_alloc() error handling megaraid_sas: Downgrade two success messages to info cxlflash: Fix to resolve dead-lock during EEH recovery scsi_debug: rework resp_report_luns scsi_debug: use pdt constants ...
2016-05-16iscsi-target: Convert transport drivers to signal rdma_shutdownNicholas Bellinger
Instead of special casing the handful of callers that check for iser-target rdma verbs specific shutdown, use a simple flag at iscsit_transport->rdma_shutdown so each driver can signal this. Also, update iscsi-target/tcp + cxgbit to rdma_shutdown = false. Cc: Varun Prakash <varun@chelsio.com> Cc: Hariprasad Shenai <hariprasad@chelsio.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2016-05-13Merge branches 'cxgb4-2', 'i40iw-2', 'ipoib', 'misc-4.7' and 'mlx5-fcs' into ↵Doug Ledford
k.o/for-4.7
2016-05-13IB/srp: Fix a debug kernel crashBart Van Assche
Avoid that the following BUG() is triggered against a debug kernel: kernel BUG at include/linux/scatterlist.h:92! RIP: 0010:[<ffffffffa0467199>] [<ffffffffa0467199>] srp_map_idb+0x199/0x1a0 [ib_srp] Call Trace: [<ffffffffa04685fa>] srp_map_data+0x84a/0x890 [ib_srp] [<ffffffffa0469674>] srp_queuecommand+0x1e4/0x610 [ib_srp] [<ffffffff813f5a5e>] scsi_dispatch_cmd+0x9e/0x180 [<ffffffff813f8b07>] scsi_request_fn+0x477/0x610 [<ffffffff81298ffe>] __blk_run_queue+0x2e/0x40 [<ffffffff81299070>] blk_delay_work+0x20/0x30 [<ffffffff81071f07>] process_one_work+0x197/0x480 [<ffffffff81072239>] worker_thread+0x49/0x490 [<ffffffff810787ea>] kthread+0xea/0x100 [<ffffffff8159b632>] ret_from_fork+0x22/0x40 Fixes: f7f7aab1a5c0 ("IB/srp: Convert to new registration API") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> # v4.4+ Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/ipoib: Add readout of statistics using ethtoolHans Westgaard Ry
IPoIB collects statistics of traffic including number of packets sent/received, number of bytes transferred, and certain errors. This patch makes these statistics available to be queried by ethtool. Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Tested-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13infiniband/ulp/ipoib: remove pkey_mutexSebastian Andrzej Siewior
The last user of pkey_mutex was removed in db84f8803759 ("IB/ipoib: Use P_Key change event instead of P_Key polling mechanism") but the lock remained. This patch removes it. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/srp: Do not register memory if never_register has been setBart Van Assche
This makes it easier to test the code path that does not use memory registration (srp_map_sg_dma()). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/srp: Prevent mapping failuresBart Van Assche
If both max_sectors and the queue_depth are high enough it can happen that the MR pool is depleted temporarily. This causes the SRP initiator to report mapping failures. Although the SRP initiator recovers from such mapping failures, prevent that this can happen by allocating more memory regions. Additionally, only enable memory registration if at least two pages can be registered per memory region. Reported-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/srp: Swap two code blocks in srp_add_one()Bart Van Assche
This patch does not change any functionality but makes the next patch in this series easier to read. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: Enhance ib_map_mr_sg()Bart Van Assche
The SRP initiator allows to set max_sectors to a value that exceeds the largest amount of data that can be mapped at once with an mlx4 HCA using fast registration and a page size of 4 KB. Hence modify ib_map_mr_sg() such that it can map partial sg-elements. If an sg-element has been mapped partially, let the caller know which fraction has been mapped by adjusting *sg_offset. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/srp: Fix srp_create_target() error handlingBart Van Assche
Avoid that the following kernel oops occurs if memory pool allocation fails: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffffa048d0a0>] ib_drain_rq+0x0/0x20 [ib_core] Call Trace: [<ffffffffa04af386>] srp_create_target+0xca6/0x13a9 [ib_srp] [<ffffffff813cc863>] dev_attr_store+0x13/0x20 [<ffffffff81214b50>] sysfs_kf_write+0x40/0x50 [<ffffffff81213f1c>] kernfs_fop_write+0x13c/0x180 [<ffffffff81197683>] __vfs_write+0x23/0xf0 [<ffffffff81198744>] vfs_write+0xa4/0x1a0 [<ffffffff81199a44>] SyS_write+0x44/0xa0 [<ffffffff8159e3e9>] entry_SYSCALL_64_fastpath+0x1c/0xac Fixes: 1dc7b1f10dcb ("IB/srp: use the new CQ API") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: <stable@vger.kernel.org> # v4.5+ Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/srp: Fix a memory descriptor leak in an error pathBart Van Assche
If an error occurs after srp_fr_pool_get() succeeded and before the descriptor is stored in srp_map_state (*state->fr.next++ = desc) then srp_unmap_data() won't free the newly allocated memory descriptor. Hence free the descriptor explicitly. Fixes: f7f7aab1a5c0 ("IB/srp: Convert to new registration API") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Sagi Grimberg <sai@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> # v4.4+ Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/srp: Print "ib_srp: " prefix onceBart Van Assche
pr_debug() already prints prefix PFX. Avoid that PFX is printed twice if the debug statement in srp_add_target() is enabled. Fixes: 34aa654ecb8e ("IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/isert: convert to the generic RDMA READ/WRITE APIChristoph Hellwig
Replace the homegrown RDMA READ/WRITE code in isert with the generic API, which also adds iWarp support to the I/O path as a side effect. Note that full iWarp operation will need a few additional patches from Steve. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/srpt: convert to the generic RDMA READ/WRITE APIChristoph Hellwig
Replace the homegrown RDMA READ/WRITE code in srpt with the generic API. The only real twist here is that we need to allocate one Linux scatterlist per direct buffer in the SRP command, and chain them before handing them off to the target core. As a side-effect of the conversion the driver will also chain the SEND of the SRP response to the RDMA WRITE WRs for a DATA OUT command, and properly account for RDMA WRITE WRs instead of just for RDMA READ WRs like the driver previously did. We now allocate half of the SQ size to RDMA READ/WRITE contexts, assuming by default one RDMA READ or WRITE operation per command. If a command has multiple operations it will eat into the budget but will still succeed, possible after waiting for WQEs to be available. Also ensure the QPs request the maximum allowed SGEs so that RDMA R/W API works correctly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: Add passing an offset into the SG to ib_map_mr_sgChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>