summaryrefslogtreecommitdiff
path: root/net/rds/ib_send.c
AgeCommit message (Collapse)Author
2017-10-26rds: Fix inaccurate accounting of unsignaled wrsHåkon Bugge
The number of unsignaled work-requests posted to the IB send queue is tracked by a counter in the rds_ib_connection struct. When it reaches zero, or the caller explicitly asks for it, the send-signaled bit is set in send_flags and the counter is reset. This is performed by the rds_ib_set_wr_signal_state() function. However, this function is not always used which yields inaccurate accounting. This commit fixes this, re-factors a code bloat related to the matter, and makes the actual parameter type to the function consistent. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-26rds: ib: Fix uninitialized variableHåkon Bugge
send_flags needs to be initialized before calling rds_ib_set_wr_signal_state(). Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17rds:Remove unnecessary ib_ring unallocZhu Yanjun
In the function rds_ib_xmit_atomic, ib_ring is not allocated successfully. As such, it is not necessary to unalloc it. Cc: Joe Jin <joe.jin@oracle.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02RDS: RDMA: Fix the composite message user notificationSantosh Shilimkar
When application sends an RDS RDMA composite message consist of RDMA transfer to be followed up by non RDMA payload, it expect to be notified *only* when the full message gets delivered. RDS RDMA notification doesn't behave this way though. Thanks to Venkat for debug and root casuing the issue where only first part of the message(RDMA) was successfully delivered but remainder payload delivery failed. In that case, application should not be notified with a false positive of message delivery success. Fix this case by making sure the user gets notified only after the full message delivery. Reviewed-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: IB: include faddr in connection logSantosh Shilimkar
Also use pr_* for it. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2016-07-01RDS: Rework path specific indirectionsSowmini Varadhan
Refactor code to avoid separate indirections for single-path and multipath transports. All transports (both single and mp-capable) will get a pointer to the rds_conn_path, and can trivially derive the rds_connection from the ->cp_conn. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-14RDS: split out connection specific state from rds_connection to rds_conn_pathSowmini Varadhan
In preparation for multipath RDS, split the rds_connection structure into a base structure, and a per-path struct rds_conn_path. The base structure tracks information and locks common to all paths. The workqs for send/recv/shutdown etc are tracked per rds_conn_path. Thus the workq callbacks now work with rds_conn_path. This commit allows for one rds_conn_path per rds_connection, and will be extended into multiple conn_paths in subsequent commits. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02RDS: IB: Remove the RDS_IB_SEND_OP dependencysantosh.shilimkar@oracle.com
This helps to combine asynchronous fastreg MR completion handler with send completion handler. No functional change. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-07Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "This is my initial round of 4.4 merge window patches. There are a few other things I wish to get in for 4.4 that aren't in this pull, as this represents what has gone through merge/build/run testing and not what is the last few items for which testing is not yet complete. - "Checksum offload support in user space" enablement - Misc cxgb4 fixes, add T6 support - Misc usnic fixes - 32 bit build warning fixes - Misc ocrdma fixes - Multicast loopback prevention extension - Extend the GID cache to store and return attributes of GIDs - Misc iSER updates - iSER clustering update - Network NameSpace support for rdma CM - Work Request cleanup series - New Memory Registration API" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (76 commits) IB/core, cma: Make __attribute_const__ declarations sparse-friendly IB/core: Remove old fast registration API IB/ipath: Remove fast registration from the code IB/hfi1: Remove fast registration from the code RDMA/nes: Remove old FRWR API IB/qib: Remove old FRWR API iw_cxgb4: Remove old FRWR API RDMA/cxgb3: Remove old FRWR API RDMA/ocrdma: Remove old FRWR API IB/mlx4: Remove old FRWR API support IB/mlx5: Remove old FRWR API support IB/srp: Dont allocate a page vector when using fast_reg IB/srp: Remove srp_finish_mapping IB/srp: Convert to new registration API IB/srp: Split srp_map_sg RDS/IW: Convert to new memory registration API svcrdma: Port to new memory registration API xprtrdma: Port to new memory registration API iser-target: Port to new memory registration API IB/iser: Port to new fast registration API ...
2015-10-08IB: split struct ib_send_wrChristoph Hellwig
This patch split up struct ib_send_wr so that all non-trivial verbs use their own structure which embedds struct ib_send_wr. This dramaticly shrinks the size of a WR for most common operations: sizeof(struct ib_send_wr) (old): 96 sizeof(struct ib_send_wr): 48 sizeof(struct ib_rdma_wr): 64 sizeof(struct ib_atomic_wr): 96 sizeof(struct ib_ud_wr): 88 sizeof(struct ib_fast_reg_wr): 88 sizeof(struct ib_bind_mw_wr): 96 sizeof(struct ib_sig_handover_wr): 80 And with Sagi's pending MR rework the fast registration WR will also be down to a reasonable size: sizeof(struct ib_fastreg_wr): 64 Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt] Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc] Tested-by: Haggai Eran <haggaie@mellanox.com> Tested-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Steve Wise <swise@opengridcomputing.com>
2015-10-05RDS: IB: split send completion handling and do batch ackSantosh Shilimkar
Similar to what we did with receive CQ completion handling, we split the transmit completion handler so that it lets us implement batched work completion handling. We re-use the cq_poll routine and makes use of RDS_IB_SEND_OP to identify the send vs receive completion event handler invocation. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-09-09Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull inifiniband/rdma updates from Doug Ledford: "This is a fairly sizeable set of changes. I've put them through a decent amount of testing prior to sending the pull request due to that. There are still a few fixups that I know are coming, but I wanted to go ahead and get the big, sizable chunk into your hands sooner rather than waiting for those last few fixups. Of note is the fact that this creates what is intended to be a temporary area in the drivers/staging tree specifically for some cleanups and additions that are coming for the RDMA stack. We deprecated two drivers (ipath and amso1100) and are waiting to hear back if we can deprecate another one (ehca). We also put Intel's new hfi1 driver into this area because it needs to be refactored and a transfer library created out of the factored out code, and then it and the qib driver and the soft-roce driver should all be modified to use that library. I expect drivers/staging/rdma to be around for three or four kernel releases and then to go away as all of the work is completed and final deletions of deprecated drivers are done. Summary of changes for 4.3: - Create drivers/staging/rdma - Move amso1100 driver to staging/rdma and schedule for deletion - Move ipath driver to staging/rdma and schedule for deletion - Add hfi1 driver to staging/rdma and set TODO for move to regular tree - Initial support for namespaces to be used on RDMA devices - Add RoCE GID table handling to the RDMA core caching code - Infrastructure to support handling of devices with differing read and write scatter gather capabilities - Various iSER updates - Kill off unsafe usage of global mr registrations - Update SRP driver - Misc mlx4 driver updates - Support for the mr_alloc verb - Support for a netlink interface between kernel and user space cache daemon to speed path record queries and route resolution - Ininitial support for safe hot removal of verbs devices" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits) IB/ipoib: Suppress warning for send only join failures IB/ipoib: Clean up send-only multicast joins IB/srp: Fix possible protection fault IB/core: Move SM class defines from ib_mad.h to ib_smi.h IB/core: Remove unnecessary defines from ib_mad.h IB/hfi1: Add PSM2 user space header to header_install IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY mlx5: Fix incorrect wc pkey_index assignment for GSI messages IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow IB/uverbs: reject invalid or unknown opcodes IB/cxgb4: Fix if statement in pick_local_ip6adddrs IB/sa: Fix rdma netlink message flags IB/ucma: HW Device hot-removal support IB/mlx4_ib: Disassociate support IB/uverbs: Enable device removal when there are active user space applications IB/uverbs: Explicitly pass ib_dev to uverbs commands IB/uverbs: Fix race between ib_uverbs_open and remove_one IB/uverbs: Fix reference counting usage of event files IB/core: Make ib_dealloc_pd return void IB/srp: Create an insecure all physical rkey only if needed ...
2015-08-30rds/ib: Remove ib_get_dma_mr callsJason Gunthorpe
The pd now has a local_dma_lkey member which completely replaces ib_get_dma_mr, use it instead. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-25RDS: Make sure we do a signaled send for large-sendsantosh.shilimkar@oracle.com
WR(Work Requests )always generate a WC(Work Completion) with signaled send. Default RDS ib code is setup for un-signaled completion. Since RDS connction is persistent, we can end up sending the data even after large-send when the remote end is not active(for any reason). By doing a signaled send at least once per large-send, we can at least detect the problem in work completion handler there by avoiding sending more data to inactive remote. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-02Merge branch 'for-4.2-misc' into k.o/for-4.2Doug Ledford
2015-06-02rds: re-entry of rds_ib_xmit/rds_iw_xmitWengang Wang
The BUG_ON at line 452/453 is triggered in function rds_send_xmit. 441 while (ret) { 442 tmp = min_t(int, ret, sg->length - 443 conn->c_xmit_data_off); 444 conn->c_xmit_data_off += tmp; 445 ret -= tmp; 446 if (conn->c_xmit_data_off == sg->length) { 447 conn->c_xmit_data_off = 0; 448 sg++; 449 conn->c_xmit_sg++; 450 if (ret != 0 && conn->c_xmit_sg == rm->data.op_nents) 451 printk(KERN_ERR "conn %p rm %p sg %p ret %d\n", conn, rm, sg, ret); 452 BUG_ON(ret != 0 && 453 conn->c_xmit_sg == rm->data.op_nents); 454 } 455 } it is complaining the total sent length is bigger that we want to send. rds_ib_xmit() is wrong for the second entry for the same rds_message returning wrong value. the sg and off passed by rds_send_xmit to rds_ib_xmit is based on scatterlist.offset/length, but the rds_ib_xmit action is based on scatterlist.dma_address/dma_length. in case dma_length is larger than length there is problem. for the 2nd and later entries of rds_ib_xmit for same rds_message, at least one of the following two is wrong: 1) the scatterlist to start with, the choosen one can far beyond the correct one. 2) the offset to start with within the scatterlist. fix: add op_dmasg and op_dmaoff to rm_data_op structure indicating the scatterlist and offset within the it to start with for rds_ib_xmit respectively. op_dmasg and op_dmaoff are initialized to zero when doing dma mapping for the first see of the message and are changed when filling send slots. the same applies to rds_iw_xmit too. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18RDS: Switch to generic logging helpersSagi Grimberg
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-02-07net: rds: Remove repeated function names from debug outputRasmus Villemoes
The macro rdsdebug is defined as pr_debug("%s(): " fmt, __func__ , ##args) Hence it doesn't make sense to include the name of the calling function explicitly in the format string passed to rdsdebug. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18net: rds: Use time_after() for time comparisonManuel Schölling
To be future-proof and for better readability the time comparisons are modified to use time_after() instead of raw math. Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-03rds: prevent BUG_ON triggered on congestion update to loopbackVenkat Venkatsubra
After congestion update on a local connection, when rds_ib_xmit returns less bytes than that are there in the message, rds_send_xmit calls back rds_ib_xmit with an offset that causes BUG_ON(off & RDS_FRAG_SIZE) to trigger. For a 4Kb PAGE_SIZE rds_ib_xmit returns min(8240,4096)=4096 when actually the message contains 8240 bytes. rds_send_xmit thinks there is more to send and calls rds_ib_xmit again with a data offset "off" of 4096-48(rds header) =4048 bytes thus hitting the BUG_ON(off & RDS_FRAG_SIZE) [RDS_FRAG_SIZE=4k]. The commit 6094628bfd94323fc1cea05ec2c6affd98c18f7f "rds: prevent BUG_ON triggering on congestion map updates" introduced this regression. That change was addressing the triggering of a different BUG_ON in rds_send_xmit() on PowerPC architecture with 64Kbytes PAGE_SIZE: BUG_ON(ret != 0 && conn->c_xmit_sg == rm->data.op_nents); This was the sequence it was going through: (rds_ib_xmit) /* Do not send cong updates to IB loopback */ if (conn->c_loopback && rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) { rds_cong_map_updated(conn->c_fcong, ~(u64) 0); return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES; } rds_ib_xmit returns 8240 rds_send_xmit: c_xmit_data_off = 0 + 8240 - 48 (rds header accounted only the first time) = 8192 c_xmit_data_off < 65536 (sg->length), so calls rds_ib_xmit again rds_ib_xmit returns 8240 rds_send_xmit: c_xmit_data_off = 8192 + 8240 = 16432, calls rds_ib_xmit again and so on (c_xmit_data_off 24672,32912,41152,49392,57632) rds_ib_xmit returns 8240 On this iteration this sequence causes the BUG_ON in rds_send_xmit: while (ret) { tmp = min_t(int, ret, sg->length - conn->c_xmit_data_off); [tmp = 65536 - 57632 = 7904] conn->c_xmit_data_off += tmp; [c_xmit_data_off = 57632 + 7904 = 65536] ret -= tmp; [ret = 8240 - 7904 = 336] if (conn->c_xmit_data_off == sg->length) { conn->c_xmit_data_off = 0; sg++; conn->c_xmit_sg++; BUG_ON(ret != 0 && conn->c_xmit_sg == rm->data.op_nents); [c_xmit_sg = 1, rm->data.op_nents = 1] What the current fix does: Since the congestion update over loopback is not actually transmitted as a message, all that rds_ib_xmit needs to do is let the caller think the full message has been transmitted and not return partial bytes. It will return 8240 (RDS_CONG_MAP_BYTES+48) when PAGE_SIZE is 4Kb. And 64Kb+48 when page size is 64Kb. Reported-by: Josh Hunt <joshhunt00@gmail.com> Tested-by: Honggang Li <honli@redhat.com> Acked-by: Bang Nguyen <bang.nguyen@oracle.com> Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-17net/rds: use prink_ratelimited() instead of printk_ratelimit()Manuel Zerpies
Since printk_ratelimit() shouldn't be used anymore (see comment in include/linux/printk.h), replace it with printk_ratelimited() Signed-off-by: Manuel Zerpies <manuel.f.zerpies@ww.stud.uni-erlangen.de> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
2011-03-31Fix common misspellingsLucas De Marchi
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-08rds: prevent BUG_ON triggering on congestion map updatesNeil Horman
Recently had this bug halt reported to me: kernel BUG at net/rds/send.c:329! Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=1024 NUMA pSeries Modules linked in: rds sunrpc ipv6 dm_mirror dm_region_hash dm_log ibmveth sg ext4 jbd2 mbcache sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt dm_mod [last unloaded: scsi_wait_scan] NIP: d000000003ca68f4 LR: d000000003ca67fc CTR: d000000003ca8770 REGS: c000000175cab980 TRAP: 0700 Not tainted (2.6.32-118.el6.ppc64) MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 44000022 XER: 00000000 TASK = c00000017586ec90[1896] 'krdsd' THREAD: c000000175ca8000 CPU: 0 GPR00: 0000000000000150 c000000175cabc00 d000000003cb7340 0000000000002030 GPR04: ffffffffffffffff 0000000000000030 0000000000000000 0000000000000030 GPR08: 0000000000000001 0000000000000001 c0000001756b1e30 0000000000010000 GPR12: d000000003caac90 c000000000fa2500 c0000001742b2858 c0000001742b2a00 GPR16: c0000001742b2a08 c0000001742b2820 0000000000000001 0000000000000001 GPR20: 0000000000000040 c0000001742b2814 c000000175cabc70 0800000000000000 GPR24: 0000000000000004 0200000000000000 0000000000000000 c0000001742b2860 GPR28: 0000000000000000 c0000001756b1c80 d000000003cb68e8 c0000001742b27b8 NIP [d000000003ca68f4] .rds_send_xmit+0x4c4/0x8a0 [rds] LR [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds] Call Trace: [c000000175cabc00] [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds] (unreliable) [c000000175cabd30] [d000000003ca7e64] .rds_send_worker+0x54/0x100 [rds] [c000000175cabdb0] [c0000000000b475c] .worker_thread+0x1dc/0x3c0 [c000000175cabed0] [c0000000000baa9c] .kthread+0xbc/0xd0 [c000000175cabf90] [c000000000032114] .kernel_thread+0x54/0x70 Instruction dump: 4bfffd50 60000000 60000000 39080001 935f004c f91f0040 41820024 813d017c 7d094a78 7d290074 7929d182 394a0020 <0b090000> 40e2ff68 4bffffa4 39200000 Kernel panic - not syncing: Fatal exception Call Trace: [c000000175cab560] [c000000000012e04] .show_stack+0x74/0x1c0 (unreliable) [c000000175cab610] [c0000000005a365c] .panic+0x80/0x1b4 [c000000175cab6a0] [c00000000002fbcc] .die+0x21c/0x2a0 [c000000175cab750] [c000000000030000] ._exception+0x110/0x220 [c000000175cab910] [c000000000004b9c] program_check_common+0x11c/0x180 Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08RDS: Implement masked atomic operationsAndy Grover
Add two CMSGs for masked versions of cswp and fadd. args struct modified to use a union for different atomic op type's arguments. Change IB to do masked atomic ops. Atomic op type in rds_message similarly unionized. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS/IB: print string constants in more placesZach Brown
This prints the constant identifier for work completion status and rdma cm event types, like we already do for IB event types. A core string array helper is added that each string type uses. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08RDS/IB: track signaled sendsZach Brown
We're seeing bugs today where IB connection shutdown clears the send ring while the tasklet is processing completed sends. Implementation details cause this to dereference a null pointer. Shutdown needs to wait for send completion to stop before tearing down the connection. We can't simply wait for the ring to empty because it may contain unsignaled sends that will never be processed. This patch tracks the number of signaled sends that we've posted and waits for them to complete. It also makes sure that the tasklet has finished executing. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08rds: fix rds_send_xmit() serializationZach Brown
rds_send_xmit() was changed to hold an interrupt masking spinlock instead of a mutex so that it could be called from the IB receive tasklet path. This broke the TCP transport because its xmit method can block and masks and unmasks interrupts. This patch serializes callers to rds_send_xmit() with a simple bit instead of the current spinlock or previous mutex. This enables rds_send_xmit() to be called from any context and to call functions which block. Getting rid of the c_send_lock exposes the bare c_lock acquisitions which are changed to block interrupts. A waitqueue is added so that rds_conn_shutdown() can wait for callers to leave rds_send_xmit() before tearing down partial send state. This lets us get rid of c_senders. rds_send_xmit() is changed to check the conn state after acquiring the RDS_IN_XMIT bit to resolve races with the shutdown path. Previously both worked with the conn state and then the lock in the same order, allowing them to race and execute the paths concurrently. rds_send_reset() isn't racing with rds_send_xmit() now that rds_conn_shutdown() properly ensures that rds_send_xmit() can't start once the conn state has been changed. We can remove its previous use of the spinlock. Finally, c_send_generation is redundant. Callers can race to test the c_flags bit by simply retrying instead of racing to test the c_send_generation atomic. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08RDS/IB: get the xmit max_sge from the RDS IB device on the connectionZach Brown
rds_ib_xmit_rdma() was calling ib_get_client_data() to get at the rds_ibdevice just to get the max_sge for the transmit. This patch instead has it get it directly off the rds_ibdev which is stored on the connection. The current code won't free the rds_ibdev until all the IB connections that use it are freed. So it's safe to reference the rds_ibdev this way. In the future it also makes it easier to support proper reference counting of the rds_ibdev struct. As an additional bonus, this gets rid of the performance hit of calling in to the IB stack to look up the rds_ibdev. The current implementation in the IB stack acquires an interrupt blocking spinlock to protect the registration of client callback data. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08rds: Fix reference counting on the for xmit_atomic and xmit_rdmaChris Mason
This makes sure we have the proper number of references in rds_ib_xmit_atomic and rds_ib_xmit_rdma. We also consistently drop references the same way for all message types as the IOs end. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-09-08rds: Fix RDMA message reference countingChris Mason
The RDS send_xmit code was trying to get fancy with message counting and was dropping the final reference on the RDMA messages too early. This resulted in memory corruption and oopsen. The fix here is to always add a ref as the parts of the message passes through rds_send_xmit, and always drop a ref as the parts of the message go through completion handling. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-09-08RDS: Move atomic stats from general to ib-specific areaAndy Grover
Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: Perform unmapping ops in stagesAndy Grover
Previously, RDS would wait until the final send WR had completed and then handle cleanup. With silent ops, we do not know if an atomic, rdma, or data op will be last. This patch handles any of these cases by keeping a pointer to the last op in the message in m_last_op. When the TX completion event fires, rds dispatches to per-op-type cleanup functions, and then does whole-message cleanup, if the last op equalled m_last_op. This patch also moves towards having op-specific functions take the op struct, instead of the overall rm struct. rds_ib_connection has a pointer to keep track of a a partially- completed data send operation. This patch changes it from an rds_message pointer to the narrower rm_data_op pointer, and modifies places that use this pointer as needed. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: Rename data op members prefix from m_ to op_Andy Grover
For consistency. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: Remove struct rds_rdma_opAndy Grover
A big changeset, but it's all pretty dumb. struct rds_rdma_op was already embedded in struct rm_rdma_op. Remove rds_rdma_op and put its members in rm_rdma_op. Rename members with "op_" prefix instead of "r_", for consistency. Of course this breaks a lot, so fixup the code accordingly. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: Implement silent atomicsAndy Grover
Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS/IB: Make all flow control code conditional on i_flowctlAndy Grover
Maybe things worked fine with the flow control code running even in the non-flow-control case, but making it explicitly conditional helps the non-fc case be easier to read. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: Remove unsignaled_bytes sysctlAndy Grover
Removed unsignaled_bytes sysctl and code to signal based on it. I believe unsignaled_wrs is more than sufficient for our purposes. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: rewrite rds_ib_xmitAndy Grover
Now that the header always goes first, it is possible to simplify rds_ib_xmit. Instead of having a path to handle 0-byte dgrams and another path to handle >0, these can both be handled in one path. This lets us eliminate xmit_populate_wr(). Rename sent to bytes_sent, to differentiate better from other variable named "send". Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS/IB: Remove ib_[header/data]_sge() functionsAndy Grover
These functions were to cope with differently ordered sg entries depending on RDS 3.0 or 3.1+. Now that we've dropped 3.0 compatibility we no longer need them. Also, modify usage sites for these to refer to sge[0] or [1] directly. Reorder code to initialize header sgs first. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS/IB: Remove dead codeAndy Grover
Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS/IB: eliminate duplicate codeAndy Grover
both atomics and rdmas need to convert ib-specific completion codes into RDS status codes. Rename rds_ib_rdma_send_complete to rds_ib_send_complete, and have it take a pointer to the function to call with the new error code. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: Implement atomic operationsAndy Grover
Implement a CMSG-based interface to do FADD and CSWP ops. Alter send routines to handle atomic ops. Add atomic counters to stats. Add xmit_atomic() to struct rds_transport Inline rds_ib_send_unmap_rdma into unmap_rm Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: make m_rdma_op a member of rds_messageAndy Grover
This eliminates a separate memory alloc, although it is now necessary to add an "r_active" flag, since it is no longer to use the m_rdma_op pointer as an indicator of if an rdma op is present. rdma SGs allocated from rm sg pool. rds_rm_size also gets bigger. It's a little inefficient to run through CMSGs twice, but it makes later steps a lot smoother. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: fold rdma.h into rds.hAndy Grover
RDMA is now an intrinsic part of RDS, so it's easier to just have a single header. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: break out rdma and data ops into nested structs in rds_messageAndy Grover
Clearly separate rdma-related variables in rm from data-related ones. This is in anticipation of adding atomic support. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: cleanup: remove "== NULL"s and "!= NULL"s in ptr comparisonsAndy Grover
Favor "if (foo)" style over "if (foo != NULL)". Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-03-16RDS: Properly unmap when getting a remote access errorSherman Pun
If the RDMA op has aborted with a remote access error, in addition to what we already do (tell userspace it has completed with an error) also unmap it and put() the rm. Otherwise, hangs may occur on arches that track maps and will not exit without proper cleanup. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16RDS: Fix congestion issues for loopbackAndy Grover
We have two kinds of loopback: software (via loop transport) and hardware (via IB). sw is used for 127.0.0.1, and doesn't support rdma ops. hw is used for sends to local device IPs, and supports rdma. Both are used in different cases. For both of these, when there is a congestion map update, we want to call rds_cong_map_updated() but not actually send anything -- since loopback local and foreign congestion maps point to the same spot, they're already in sync. The old code never called sw loop's xmit_cong_map(),so rds_cong_map_updated() wasn't being called for it. sw loop ports would not work right with the congestion monitor. Fixing that meant that hw loopback now would send congestion maps to itself. This is also undesirable (racy), so we check for this case in the ib-specific xmit code. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16RDS: Do not BUG() on error returned from ib_post_sendAndy Grover
BUGging on a runtime error code should be avoided. This patch also eliminates all other BUG()s that have no real reason to exist. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-29net: Move && and || to end of previous lineJoe Perches
Not including net/atm/ Compiled tested x86 allyesconfig only Added a > 80 column line or two, which I ignored. Existing checkpatch plaints willfully, cheerfully ignored. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>