summaryrefslogtreecommitdiff
path: root/net/rds/ib_recv.c
AgeCommit message (Collapse)Author
2015-09-09Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull inifiniband/rdma updates from Doug Ledford: "This is a fairly sizeable set of changes. I've put them through a decent amount of testing prior to sending the pull request due to that. There are still a few fixups that I know are coming, but I wanted to go ahead and get the big, sizable chunk into your hands sooner rather than waiting for those last few fixups. Of note is the fact that this creates what is intended to be a temporary area in the drivers/staging tree specifically for some cleanups and additions that are coming for the RDMA stack. We deprecated two drivers (ipath and amso1100) and are waiting to hear back if we can deprecate another one (ehca). We also put Intel's new hfi1 driver into this area because it needs to be refactored and a transfer library created out of the factored out code, and then it and the qib driver and the soft-roce driver should all be modified to use that library. I expect drivers/staging/rdma to be around for three or four kernel releases and then to go away as all of the work is completed and final deletions of deprecated drivers are done. Summary of changes for 4.3: - Create drivers/staging/rdma - Move amso1100 driver to staging/rdma and schedule for deletion - Move ipath driver to staging/rdma and schedule for deletion - Add hfi1 driver to staging/rdma and set TODO for move to regular tree - Initial support for namespaces to be used on RDMA devices - Add RoCE GID table handling to the RDMA core caching code - Infrastructure to support handling of devices with differing read and write scatter gather capabilities - Various iSER updates - Kill off unsafe usage of global mr registrations - Update SRP driver - Misc mlx4 driver updates - Support for the mr_alloc verb - Support for a netlink interface between kernel and user space cache daemon to speed path record queries and route resolution - Ininitial support for safe hot removal of verbs devices" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits) IB/ipoib: Suppress warning for send only join failures IB/ipoib: Clean up send-only multicast joins IB/srp: Fix possible protection fault IB/core: Move SM class defines from ib_mad.h to ib_smi.h IB/core: Remove unnecessary defines from ib_mad.h IB/hfi1: Add PSM2 user space header to header_install IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY mlx5: Fix incorrect wc pkey_index assignment for GSI messages IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow IB/uverbs: reject invalid or unknown opcodes IB/cxgb4: Fix if statement in pick_local_ip6adddrs IB/sa: Fix rdma netlink message flags IB/ucma: HW Device hot-removal support IB/mlx4_ib: Disassociate support IB/uverbs: Enable device removal when there are active user space applications IB/uverbs: Explicitly pass ib_dev to uverbs commands IB/uverbs: Fix race between ib_uverbs_open and remove_one IB/uverbs: Fix reference counting usage of event files IB/core: Make ib_dealloc_pd return void IB/srp: Create an insecure all physical rkey only if needed ...
2015-08-30rds/ib: Remove ib_get_dma_mr callsJason Gunthorpe
The pd now has a local_dma_lkey member which completely replaces ib_get_dma_mr, use it instead. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-25RDS: fix the dangling reference to rds_ib_incoming_slabsantosh.shilimkar@oracle.com
On rds_ib_frag_slab allocation failure, ensure rds_ib_incoming_slab is not pointing to the detsroyed memory. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25rds: Fix improper gfp_t usage.David S. Miller
>> net/rds/ib_recv.c:382:28: sparse: incorrect type in initializer (different base types) net/rds/ib_recv.c:382:28: expected int [signed] can_wait net/rds/ib_recv.c:382:28: got restricted gfp_t net/rds/ib_recv.c:828:23: sparse: cast to restricted __le64 Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: make sure we post recv bufferssantosh.shilimkar@oracle.com
If we get an ENOMEM during rds_ib_recv_refill, we might never come back and refill again later. Patch makes sure to kick krdsd into helping out. To achieve this we add RDS_RECV_REFILL flag and update in the refill path based on that so that at least some therad will keep posting receive buffers. Since krdsd and softirq both might race for refill, we decide to schedule on work queue based on ring_low instead of ring_empty. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: always free recv frag as we free its ring entrysantosh.shilimkar@oracle.com
We were still seeing rare occurrences of the WARN_ON(recv->r_frag) which indicates that the recv refill path was finding allocated frags in ring entries that were marked free. These were usually followed by OOM crashes. They only seem to be occurring in the presence of completion errors and connection resets. This patch ensures that we free the frag as we mark the ring entry free. This should stop the refill path from finding allocated frags in ring entries that were marked free. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-18RDS: Switch to generic logging helpersSagi Grimberg
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2014-11-24rds: switch ->inc_copy_to_user() to passing iov_iterAl Viro
instances get considerably simpler from that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-18arch: Mass conversion of smp_mb__*()Peter Zijlstra
Mostly scripted conversion of the smp_mb__* barriers. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-17net: rds: fix per-cpu helper usageGerald Schaefer
commit ae4b46e9d "net: rds: use this_cpu_* per-cpu helper" broke per-cpu handling for rds. chpfirst is the result of __this_cpu_read(), so it is an absolute pointer and not __percpu. Therefore, __this_cpu_write() should not operate on chpfirst, but rather on cache->percpu->first, just like __this_cpu_read() did before. Cc: <stable@vger.kernel.org> # 3.8+ Signed-off-byd Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26IB/rds: Correct ib_api use with gs_dma_address/sg_dma_lenMarciniszyn, Mike
0b088e00 ("RDS: Use page_remainder_alloc() for recv bufs") added uses of sg_dma_len() and sg_dma_address(). This makes RDS DOA with the qib driver. IB ulps should use ib_sg_dma_len() and ib_sg_dma_address respectively since some HCAs overload ib_sg_dma* operations. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19net: rds: use this_cpu_* per-cpu helperShan Wei
Signed-off-by: Shan Wei <davidshan@tencent.com> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-21Merge branch 'kmap_atomic' of git://github.com/congwang/linuxLinus Torvalds
Pull kmap_atomic cleanup from Cong Wang. It's been in -next for a long time, and it gets rid of the (no longer used) second argument to k[un]map_atomic(). Fix up a few trivial conflicts in various drivers, and do an "evil merge" to catch some new uses that have come in since Cong's tree. * 'kmap_atomic' of git://github.com/congwang/linux: (59 commits) feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename] drbd: remove the second argument of k[un]map_atomic() zcache: remove the second argument of k[un]map_atomic() gma500: remove the second argument of k[un]map_atomic() dm: remove the second argument of k[un]map_atomic() tomoyo: remove the second argument of k[un]map_atomic() sunrpc: remove the second argument of k[un]map_atomic() rds: remove the second argument of k[un]map_atomic() net: remove the second argument of k[un]map_atomic() mm: remove the second argument of k[un]map_atomic() lib: remove the second argument of k[un]map_atomic() power: remove the second argument of k[un]map_atomic() kdb: remove the second argument of k[un]map_atomic() udf: remove the second argument of k[un]map_atomic() ubifs: remove the second argument of k[un]map_atomic() squashfs: remove the second argument of k[un]map_atomic() reiserfs: remove the second argument of k[un]map_atomic() ocfs2: remove the second argument of k[un]map_atomic() ntfs: remove the second argument of k[un]map_atomic() ...
2012-03-20rds: remove the second argument of k[un]map_atomic()Cong Wang
Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com>
2012-02-09rds: Fix typo in iw_recv.c and ib_recv.cMasanari Iida
Correct spelling "inclue" to "include" in net/rds/iw_recv.c and net/rds/ib_recv.c Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-09-08RDS/IB: print string constants in more placesZach Brown
This prints the constant identifier for work completion status and rdma cm event types, like we already do for IB event types. A core string array helper is added that each string type uses. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08RDS: properly use sg_init_tableChris Mason
This is only needed to keep debugging code from bugging. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-09-08RDS: remove __init and __exit annotationZach Brown
The trivial amount of memory saved isn't worth the cost of dealing with section mismatches. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08RDS/IB: Use SLAB_HWCACHE_ALIGN flag for kmem_cache_create()Andy Grover
We are *definitely* counting cycles as closely as DaveM, so ensure hwcache alignment for our recv ring control structs. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS/IB: always process recv completionsZach Brown
The recv refill path was leaking fragments because the recv event handler had marked a ring element as free without freeing its frag. This was happening because it wasn't processing receives when the conn wasn't marked up or connecting, as can be the case if it races with rmmod. Two observations support always processing receives in the callback. First, buildup should only post receives, thus triggering recv event handler calls, once it has built up all the state to handle them. Teardown should destroy the CQ and drain the ring before tearing down the state needed to process recvs. Both appear to be true today. Second, this test was fundamentally racy. There is nothing to stop rmmod and connection destruction from swooping in the moment after the conn state was sampled but before real receive procesing starts. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08RDS/IB: Make ib_recv_refill return voidAndy Grover
Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: use friendly gfp masks for prefillChris Mason
When prefilling the rds frags, we end up doing a lot of allocations. We're not in atomic context here, and so there's no reason to dip into atomic reserves. This changes the prefills to use masks that allow waiting. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-09-08RDS/IB: Add caching of frags and incsChris Mason
This patch is based heavily on an initial patch by Chris Mason. Instead of freeing slab memory and pages, it keeps them, and funnels them back to be reused. The lock minimization strategy uses xchg and cmpxchg atomic ops for manipulation of pointers to list heads. We anchor the lists with a pointer to a list_head struct instead of a static list_head struct. We just have to carefully use the existing primitives with the difference between a pointer and a static head struct. For example, 'list_empty()' means that our anchor pointer points to a list with a single item instead of meaning that our static head element doesn't point to any list items. Original patch by Chris, with significant mods and fixes by Andy and Zach. Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08RDS/IB: Remove ib_recv_unmap_page()Andy Grover
All it does is call unmap_sg(), so just call that directly. The comment above unmap_page also may be incorrect, so we shouldn't hold on to it, either. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: Assume recv->r_frag is always NULL in refill_one()Andy Grover
refill_one() should never be called on a recv struct that doesn't need a new r_frag allocated. Add a WARN and remove conditional around r_frag alloc code. Also, add a comment to explain why r_ibinc may or may not need refilling. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: Use page_remainder_alloc() for recv bufsAndy Grover
Instead of splitting up a page into RDS_FRAG_SIZE chunks ourselves, ask rds_page_remainder_alloc() to do it. While it is possible PAGE_SIZE > FRAG_SIZE, on x86en it isn't, so having duplicate "carve up a page into buffers" code seems excessive. The other modification this spawns is the use of a single struct scatterlist in rds_page_frag instead of a bare page ptr. This causes verbosity to increase in some places, and decrease in others. Finally, I decided to unify the lifetimes and alloc/free of rds_page_frag and its page. This is a nice simplification in itself, but will be extra-nice once we come to adding cmason's recycling patch. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08rds: don't let RDS shutdown a connection while senders are presentChris Mason
This is the first in a long line of patches that tries to fix races between RDS connection shutdown and RDS traffic. Here we are maintaining a count of active senders to make sure the connection doesn't go away while they are using it. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-09-08RDS: Refill recv ring directly from taskletAndy Grover
Performance is better if we use allocations that don't block to refill the receive ring. Since the whole reason we were kicking out to the worker thread was so we could do blocking allocs, we no longer need to do this. Remove gfp params from rds_ib_recv_refill(); we always use GFP_NOWAIT. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS/IB: Remove ib_[header/data]_sge() functionsAndy Grover
These functions were to cope with differently ordered sg entries depending on RDS 3.0 or 3.1+. Now that we've dropped 3.0 compatibility we no longer need them. Also, modify usage sites for these to refer to sge[0] or [1] directly. Reorder code to initialize header sgs first. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS/IB: Disallow connections less than RDS 3.1Andy Grover
RDS 3.0 connections (in OFED 1.3 and earlier) put the header at the end. 3.1 connections put it at the head. The code has significant added complexity in order to handle both configurations. In OFED 1.6 we can drop this and simplify the code by only supporting "header-first" configuration. This patch checks the protocol version, and if prior to 3.1, does not complete the connection. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: inc_purge() transport function unused - remove itAndy Grover
Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: cleanup: remove "== NULL"s and "!= NULL"s in ptr comparisonsAndy Grover
Favor "if (foo)" style over "if (foo != NULL)". Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-04-11Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/stmmac/stmmac_main.c drivers/net/wireless/wl12xx/wl1271_cmd.c drivers/net/wireless/wl12xx/wl1271_main.c drivers/net/wireless/wl12xx/wl1271_spi.c net/core/ethtool.c net/mac80211/scan.c
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-16RDS: Do not BUG() on error returned from ib_post_sendAndy Grover
BUGging on a runtime error code should be avoided. This patch also eliminates all other BUG()s that have no real reason to exist. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-29net: Move && and || to end of previous lineJoe Perches
Not including net/atm/ Compiled tested x86 allyesconfig only Added a > 80 column line or two, which I ignored. Existing checkpatch plaints willfully, cheerfully ignored. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30RDS/IB+IW: Move recv processing to a taskletAndy Grover
Move receive processing from event handler to a tasklet. This should help prevent hangcheck timer from going off when RDS is under heavy load. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30RDS: Fix potential race around rds_i[bw]_allocationAndy Grover
"At rds_ib_recv_refill_one(), it first executes atomic_read(&rds_ib_allocation) for if-condition checking, and then executes atomic_inc(&rds_ib_allocation) if the condition was not satisfied. However, if any other code which updates rds_ib_allocation executes between these two atomic operation executions, it seems that it may result race condition. (especially when rds_ib_allocation + 1 == rds_ib_sysctl_max_recv_allocation)" This patch fixes this by using atomic_inc_unless to eliminate the possibility of allocating more than rds_ib_sysctl_max_recv_allocation and then decrementing the count if the allocation fails. It also makes an identical change to the iwarp transport. Reported-by: Shin Hong <hongshin@gmail.com> Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-20RDS/IB: Rename byte_len to data_len to enhance readabilityAndy Grover
Of course len is in bytes. Calling it data_len hopefully indicates a little better what the variable is actually for. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-20RDS/IB: Handle connections using RDS 3.0 wire protocolAndy Grover
The big differences between RDS 3.0 and 3.1 are protocol-level flow control, and with 3.1 the header is in front of the data. The header always ends up in the header buffer, and the data goes in the data page. In 3.0 our "header" is a trailer, and will end up either in the data page, the header buffer, or split across the two. Since 3.1 is backwards- compatible with 3.0, we need to continue to support these cases. This patch does that -- if using RDS 3.0 wire protocol, it will copy the header from wherever it ended up into the header buffer. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-09RDS/IW+IB: Allow max credit advertise window.Steve Wise
Fix hack that restricts the credit advertisement to 127. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-02RDS: Use spinlock to protect 64b value update on 32b archsAndy Grover
We have a 64bit value that needs to be set atomically. This is easy and quick on all 64bit archs, and can also be done on x86/32 with set_64bit() (uses cmpxchg8b). However other 32b archs don't have this. I actually changed this to the current state in preparation for mainline because the old way (using a spinlock on 32b) resulted in unsightly #ifdefs in the code. But obviously, being correct takes precedence. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26RDS/IB: Receive datagrams via IBAndy Grover
Header parsing, ring refill. It puts the incoming data into an rds_incoming struct, which is passed up to rds-core. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>