path: root/net
Age Commit message Author
2015-09-22nl80211: put current TX power in interface infoRafał Miłecki
Many drivers implement reading current TX power (using either cfg80211 or ieee80211 op) but userspace can't get it using nl80211. Right now the only way to access it is to call some wext ioctl. Let's put TX power in interface info reply (callback is wdev specific) just like we do with current channel. To be consistent (e.g. NL80211_CMD_SET_WIPHY) let's use mBm as a unit. Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
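Editor's note: the message above settles on mBm as the unit. As a minimal sketch (the helper names are hypothetical and not part of nl80211), mBm is hundredths of a dBm, so the conversion a userspace consumer would apply looks roughly like this:

```python
# Illustrative helpers only; these names are assumptions, not an nl80211 API.

def dbm_to_mbm(dbm):
    """Convert dBm to mBm (1 dBm == 100 mBm)."""
    return dbm * 100


def mbm_to_dbm(mbm):
    """Convert mBm back to dBm, keeping the fractional part."""
    return mbm / 100
```

For example, a TX power reported as 1750 mBm corresponds to 17.5 dBm.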
2015-09-22mac80211: use DECLARE_EWMA for ave_beacon_signalJohannes Berg
It doesn't seem problematic to change the weight for the average beacon signal from 3 to 4, so use DECLARE_EWMA. This also makes the code easier to maintain since bugs like the one fixed in the previous patch can't happen as easily. With a fix from Avraham Stern to invert the sign since EWMA uses unsigned values only. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
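Editor's note: a rough model of the idea in the message above, not the kernel macro itself. The `Ewma` class and the `factor=8` default are illustrative assumptions; only the weight of 4 and the sign inversion come from the message. The average is kept in a scaled internal form for precision, so it has to be scaled back down before it is compared with anything, and negative signal values are negated before being fed in because the average only handles unsigned values.

```python
class Ewma:
    """Integer EWMA with a scaled internal value (illustrative sketch)."""

    def __init__(self, factor=8, weight=4):
        # Both parameters must be powers of two so shifts can replace division.
        assert factor > 0 and factor & (factor - 1) == 0
        assert weight > 0 and weight & (weight - 1) == 0
        self.factor_shift = factor.bit_length() - 1
        self.weight_shift = weight.bit_length() - 1
        self.internal = 0  # scaled average; 0 means "no sample yet"

    def add(self, val):
        # Only unsigned samples are supported, hence the sign inversion below.
        assert val >= 0
        scaled = val << self.factor_shift
        if self.internal == 0:
            self.internal = scaled
        else:
            # new = (old * (weight - 1) + sample) / weight, using shifts
            self.internal = (
                (self.internal << self.weight_shift) - self.internal + scaled
            ) >> self.weight_shift

    def read(self):
        # Undo the scaling before using the value in comparisons.
        return self.internal >> self.factor_shift


avg = Ewma()
for signal_dbm in (-60, -64):
    avg.add(-signal_dbm)           # invert the sign: feed 60, then 64
average_signal_dbm = -avg.read()   # invert again when reading back
```

Comparing `avg.internal` directly against a threshold would use the scaled value rather than the average, which is the kind of mistake the scaled representation invites.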
2015-09-22mac80211: fix driver RSSI event calculationsJohannes Berg
The ifmgd->ave_beacon_signal value cannot be taken as is for comparisons, it must be divided by since it's represented like that for better accuracy of the EWMA calculations. This would lead to invalid driver RSSI events. Fix the used value. Fixes: 615f7b9bb1f8 ("mac80211: add driver RSSI threshold events") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-22mac80211: remove last_beacon/ave_beacon debugfs filesJohannes Berg
These files aren't really useful: - if per beacon data is required then you need to use radiotap or similar anyway, debugfs won't help much - average beacon signal is reported in station info in nl80211 and can be looked up with iw Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-22mac80211: implement VHT support for meshBob Copeland
Implement the basics required for supporting very high throughput with mesh: include VHT information elements in beacons, probe responses, and peering action frames, and check for compatible VHT configurations when peering. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-22mac80211: zero center freq segment 2 in VHT oper IEChun-Yeow Yeoh
Clear the Channel Center Frequency Segment 2 in VHT operation IEs to avoid sending non-zero values if the SKB wasn't zeroed before adding the VHT operation IE. Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com> [change commit message a bit - not necessarily just mesh related] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-22mac80211: allow the driver to advertise A-MSDU within A-MPDU Rx supportEmmanuel Grumbach
Drivers may be interested in receiving A-MSDU within A-MPDU. Not all devices may be able to do so, so make it configurable. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-22mac80211: remove direct probe step before authenticationJohannes Berg
The direct probe step before authentication was done mostly for two reasons: 1) the BSS data could be stale 2) the beacon might not have included all IEs The concern (1) doesn't really seem to be relevant any more as we time out BSS information after about 30 seconds, and in fact the original patch only did the direct probe if the data was older than the BSS timeout to begin with. This condition got (likely inadvertently) removed later though. Analysing this in more detail shows that since we mostly use data from the association response, the only real reason for needing the probe response was that the code validates the WMM parameters, and those are optional in beacons. As the previous patches removed that behaviour, we can now remove the direct probe step entirely. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-22mac80211: allow to transmit A-MSDU within A-MPDUEmmanuel Grumbach
Advertise the capability to send A-MSDU within A-MPDU in the AddBA request sent by mac80211. Let the driver know about the peer's capabilities. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-22mac80211: introduce per vif frame registration APIAndrei Otcheretianski
Currently the cfg80211's frame registration api receives wdev, however mac80211 assumes per device filter configuration and ignores wdev. Per device filtering is too wasteful, especially for multi-channel devices. Introduce new per vif frame registration API and use it for probe request registrations in ieee80211_mgmt_frame_register() Also call directly to ieee80211_configure_filter instead of using a work since it is now allowed to sleep in ieee80211_mgmt_frame_register. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-22nl80211: support vendor dumpit commandsJohannes Berg
In order to transfer many items in vendor commands, support the dumpit netlink method for them. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-22mac80211: TDLS: check reg with IR-relax on chandef upgradeArik Nemtsov
When checking if a TDLS chandef can be upgraded, IR-relaxation can be taken into account to allow more channels. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-22mac80211: debugfs: add file to disallow TDLS wider-bwArik Nemtsov
Sometimes we are interested in testing TDLS performance in a specific width setting. Add the ability to disable the wider-band feature, thereby allowing the TDLS channel width to be controlled by the BSS width. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-22mac80211: process skb_queue while scanning in HWAndrei Otcheretianski
Queued frames aren't processed during scan, which results in an inability to complete the BA session establishment until the scan ends. Since we can't tx frames until the BA agreement setup is complete, it might result in a very large latency during scan. Fix this by allowing to process queued skbs while scanning in HW. This should be ok since the devices which support hw scan should be able to handle tx/rx while scanning. During SW scan, mac80211 drops any txed frames besides probes and NDPs, so it is still needed to delay processing of the queued frames till the SW scan is done. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-09-22wireless: make __freq_reg_info staticJohannes Berg
As pointed out by sparse, this symbol should be static, make it so. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-08-26bpf: fix bpf_skb_set_tunnel_key() helperAlexei Starovoitov
Make sure to indicate to tunnel driver that key.tun_id is set, otherwise gre won't recognize the metadata. Fixes: d3aa45ce6b94 ("bpf: add helpers to access tunnel metadata") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-26net_sched: act_bpf: remove spinlock in fast pathAlexei Starovoitov
Similar to act_gact/act_mirred, act_bpf can be lockless in packet processing with extra care taken to free bpf programs after rcu grace period. Replacement of existing act_bpf (very rare) is done with synchronize_rcu() and final destruction is done from tc_action_ops->cleanup() callback that is called from tcf_exts_destroy()->tcf_action_destroy()->__tcf_hash_release() when bind and refcnt reach zero which is only possible when classifier is destroyed. Previous two patches fixed the last two classifiers (tcindex and rsvp) to call tcf_exts_destroy() from rcu callback. Similar to gact/mirred there is a race between prog->filter and prog->tcf_action. Meaning that the program being replaced may use previous default action if it happened to return TC_ACT_UNSPEC. act_mirred race between tcf_action and tcfm_dev is similar. In all cases the race is harmless. Long term we may want to improve the situation by replacing the whole tc_action->priv as single pointer instead of updating inner fields one by one. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-26net_sched: convert rsvp to call tcf_exts_destroy from rcu callbackAlexei Starovoitov
Adjust destroy path of cls_rsvp to call tcf_exts_destroy() after rcu grace period. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-26net_sched: convert tcindex to call tcf_exts_destroy from rcu callbackAlexei Starovoitov
Adjust destroy path of cls_tcindex to call tcf_exts_destroy() after rcu grace period. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-26net_sched: act_bpf: remove unnecessary copyAlexei Starovoitov
Fix harmless typo and avoid unnecessary copy of empty 'prog' into unused 'struct tcf_bpf_cfg old'. Fixes: f4eaed28c783 ("act_bpf: fix memory leaks when replacing bpf programs") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-26net_sched: make tcf_hash_destroy() staticAlexei Starovoitov
tcf_hash_destroy() is used only once. Make it static. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: remove superfluous from rds_ib_alloc_fmr()santosh.shilimkar@oracle.com
Memory allocated for 'ibmr' uses kzalloc_node() which already initialises the memory to zero. There is no need to do memset() 0 on that memory. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: flush the FMR pool less oftensantosh.shilimkar@oracle.com
FMR flush is an expensive and time consuming operation. Reduce the frequency of FMR pool flush by 50% so that more FMR work gets accumulated for more efficient flushing. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: push FMR pool flush work to its own workersantosh.shilimkar@oracle.com
The RDS FMR flush operation also races with connect/reconnect, which happens a lot with RDS. FMR flush being on the common rds_wq aggravates the problem. Let's push the RDS FMR pool flush work to its own worker. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: fix fmr pool dirty_countWengang Wang
In rds_ib_flush_mr_pool(), dirty_count also accounts for the clean ones, which is wrong. This can lead to a negative dirty count value. Let's fix it. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: Fix rds MR reference count in rds_rdma_unuse()santosh.shilimkar@oracle.com
rds_rdma_unuse() drops the mr reference count which it hasn't taken. The correct way of removing the mr is to remove it from the tree and rdma_destroy_mr() it first, then rds_mr_put() to decrement its reference count. Whichever thread holds the last reference will free the mr via rds_mr_put(). This bug was triggering weird null pointer crashes. One of the traces for it is captured below. BUG: unable to handle kernel NULL pointer dereference at 0000000000000104 IP: [<ffffffffa0899471>] rds_ib_free_mr+0x31/0x130 [rds_rdma] PGD 4366fa067 PUD 4366f9067 PMD 0 Oops: 0000 [#1] SMP [...] task: ffff88046da6a000 ti: ffff88046da6c000 task.ti: ffff88046da6c000 RIP: 0010:[<ffffffffa0899471>] [<ffffffffa0899471>] rds_ib_free_mr+0x31/0x130 [rds_rdma] RSP: 0018:ffff88046fa43bd8 EFLAGS: 00010286 RAX: 0000000071d38b80 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff880079e7ff40 RBP: ffff88046fa43bf8 R08: 0000000000000000 R09: 0000000000000000 R10: ffff88046fa43ca8 R11: ffff88046a802ed8 R12: ffff880079e7fa40 R13: 0000000000000000 R14: ffff880079e7ff40 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88046fa40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000104 CR3: 00000004366fb000 CR4: 00000000000006e0 Stack: ffff880079e7fa40 ffff880671d38f08 ffff880079e7ff40 0000000000000296 ffff88046fa43c28 ffffffffa087a38b ffff880079e7fa40 ffff880671d38f10 0000000000000000 0000000000000292 ffff88046fa43c48 ffffffffa087a3b6 Call Trace: <IRQ> [<ffffffffa087a38b>] rds_destroy_mr+0x8b/0xa0 [rds] [<ffffffffa087a3b6>] __rds_put_mr_final+0x16/0x30 [rds] [<ffffffffa087a492>] rds_rdma_unuse+0xc2/0x120 [rds] [<ffffffffa08766d3>] rds_recv_incoming_exthdrs+0x83/0xa0 [rds] [<ffffffffa0876782>] rds_recv_incoming+0x92/0x200 [rds] [<ffffffffa0895269>] rds_ib_process_recv+0x259/0x320 [rds_rdma] [<ffffffffa08962a8>] rds_ib_recv_tasklet_fn+0x1a8/0x490 [rds_rdma] [<ffffffff810dcd78>] ? __remove_hrtimer+0x58/0x90 [<ffffffff810799e1>] tasklet_action+0xb1/0xc0 [<ffffffff81079b52>] __do_softirq+0xe2/0x290 [<ffffffff81079df6>] irq_exit+0xa6/0xb0 [<ffffffff81613915>] do_IRQ+0x65/0xf0 [<ffffffff816118ab>] common_interrupt+0x6b/0x6b Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: fix the dangling reference to rds_ib_incoming_slabsantosh.shilimkar@oracle.com
On rds_ib_frag_slab allocation failure, ensure rds_ib_incoming_slab is not pointing to the destroyed memory. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Antonio Quartulli says: ==================== Included changes: - code restyling and beautification - use int kernel types instead of C99 - update kerneldoc - prevent potential hlist double deletion of VLAN objects - fix gw bandwidth calculation - convert list to hlist when needed - add lockdep_asserts calls in function with lock requirements described in kerneldoc ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25rds: Fix improper gfp_t usage.David S. Miller
>> net/rds/ib_recv.c:382:28: sparse: incorrect type in initializer (different base types) net/rds/ib_recv.c:382:28: expected int [signed] can_wait net/rds/ib_recv.c:382:28: got restricted gfp_t net/rds/ib_recv.c:828:23: sparse: cast to restricted __le64 Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25route: fix a use-after-freeWANG Cong
This patch fixes the following crash: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc7+ #166 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff88010656d280 ti: ffff880106570000 task.ti: ffff880106570000 RIP: 0010:[<ffffffff8182f91b>] [<ffffffff8182f91b>] dst_destroy+0xa6/0xef RSP: 0018:ffff880107603e38 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff8800d225a000 RCX: ffffffff82250fd0 RDX: 0000000000000001 RSI: ffffffff82250fd0 RDI: 6b6b6b6b6b6b6b6b RBP: ffff880107603e58 R08: 0000000000000001 R09: 0000000000000001 R10: 000000000000b530 R11: ffff880107609000 R12: 0000000000000000 R13: ffffffff82343c40 R14: 0000000000000000 R15: ffffffff8182fb4f FS: 0000000000000000(0000) GS:ffff880107600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fcabd9d3000 CR3: 00000000d7279000 CR4: 00000000000006e0 Stack: ffffffff82250fd0 ffff8801077d6f00 ffffffff82253c40 ffff8800d225a000 ffff880107603e68 ffffffff8182fb5d ffff880107603f08 ffffffff810d795e ffffffff810d7648 ffff880106574000 ffff88010656d280 ffff88010656d280 Call Trace: <IRQ> [<ffffffff8182fb5d>] dst_destroy_rcu+0xe/0x1d [<ffffffff810d795e>] rcu_process_callbacks+0x618/0x7eb [<ffffffff810d7648>] ? rcu_process_callbacks+0x302/0x7eb [<ffffffff8182fb4f>] ? dst_gc_task+0x1eb/0x1eb [<ffffffff8107e11b>] __do_softirq+0x178/0x39f [<ffffffff8107e52e>] irq_exit+0x41/0x95 [<ffffffff81a4f215>] smp_apic_timer_interrupt+0x34/0x40 [<ffffffff81a4d5cd>] apic_timer_interrupt+0x6d/0x80 <EOI> [<ffffffff8100b968>] ? default_idle+0x21/0x32 [<ffffffff8100b966>] ? default_idle+0x1f/0x32 [<ffffffff8100bf19>] arch_cpu_idle+0xf/0x11 [<ffffffff810b0bc7>] default_idle_call+0x1f/0x21 [<ffffffff810b0dce>] cpu_startup_entry+0x1ad/0x273 [<ffffffff8102fe67>] start_secondary+0x135/0x156 dst is freed right before lwtstate_put(), this is not correct... 
Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry") Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25net-next: Fix warning while make xmldocs caused by skbuff.cMasanari Iida
This patch fixes the following warnings: .//net/core/skbuff.c:407: warning: No description found for parameter 'len' .//net/core/skbuff.c:407: warning: Excess function parameter 'length' description in '__netdev_alloc_skb' .//net/core/skbuff.c:476: warning: No description found for parameter 'len' .//net/core/skbuff.c:476: warning: Excess function parameter 'length' description in '__napi_alloc_skb' Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25ah4: Fix error return in ah_input().David S. Miller
Noticed by Herbert Xu. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25ah6: fix error return codeJulia Lawall
Return a negative error code on failure. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier ret; expression e1,e2; @@ ( if (\(ret < 0\|ret != 0\)) { ... return ret; } | ret = 0 ) ... when != ret = e1 when != &ret *if(...) { ... when != ret = e2 when forall return ret; } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: check for valid cm_id before initiating connectionsantosh.shilimkar@oracle.com
Connection could have been dropped while the route is being resolved so check for valid cm_id before initiating the connection. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: return EMSGSIZE for oversize requests before processing/queueingMukesh Kacker
rds_send_queue_rm() allows for the "current datagram" being queued to exceed SO_SNDBUF thresholds by checking bytes queued without counting in length of current datagram. (Since sk_sndbuf is set to twice requested SO_SNDBUF value as a kernel heuristic this is usually fine!) If this "current datagram" squeezing past the threshold is itself many times the size of the sk_sndbuf threshold itself then even twice the SO_SNDBUF does not save us and it gets queued but cannot be transmitted. Threads block and deadlock and device becomes unusable. The check for this datagram not exceeding SNDBUF thresholds (EMSGSIZE) is not done on this datagram as that check is only done if queueing attempt fails. (Datagrams that follow this datagram fail queueing attempts, go through the check and eventually trip EMSGSIZE error but zero length datagrams silently fail!) This fix moves the check for datagrams exceeding SNDBUF limits before any processing or queueing is attempted and returns EMSGSIZE early in the rds_sndmsg() code. This change also ensures that all datagrams get checked for exceeding SNDBUF/sk_sndbuf size limits and the large datagrams that exceed those limits do not get to rds_send_queue_rm() code for processing. Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
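Editor's note: the ordering change described above can be pictured with a small sketch. The function name and parameters are hypothetical, and the real check lives in the RDS send path in C; the point is only that the size test runs before any processing or queueing, so every datagram is checked.

```python
import errno


def check_send_size(payload_len, sndbuf):
    """Return 0 if the datagram may proceed to queueing, else -EMSGSIZE.

    Hypothetical helper: `sndbuf` stands in for the socket's send-buffer
    limit, and the check is meant to run before anything is queued.
    """
    if payload_len > sndbuf:
        return -errno.EMSGSIZE
    return 0
```

With the check placed first, an oversized datagram is rejected immediately instead of being queued where it can never be transmitted.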
2015-08-25RDS: make sure rds_send_drop_to properly takes the m_rs_locksantosh.shilimkar@oracle.com
rds_send_drop_to() is used during socket tear down to find all the messages on the socket and flush them. It can race with the acking code unless it takes the m_rs_lock on each and every message. This plugs a hole where we didn't take m_rs_lock on any message that didn't have the RDS_MSG_ON_CONN set. Taking m_rs_lock avoids double frees and other memory corruptions as the ack code trusts the message m_rs pointer on a socket that had actually been freed. We must take m_rs_lock to access m_rs. Because of lock nesting and rs access, we also need to acquire rs_lock. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: Don't destroy the rdma id until after we're done using itSantosh Shilimkar
During connection resets, we are destroying the rdma id too soon. We can't destroy it when it is still in use. So let's move rdma_destroy_id() after we clear the rings. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: Fix assertion level from fatal to warningsantosh.shilimkar@oracle.com
Fix the assertion level since it's not fatal and can be hit in normal execution paths. There is no need to take the system down. We keep the WARN_ON() to detect the condition if we get here with bad pages. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: Make sure we do a signaled send for large-sendsantosh.shilimkar@oracle.com
A WR (Work Request) always generates a WC (Work Completion) with a signaled send. The default RDS ib code is set up for un-signaled completion. Since the RDS connection is persistent, we can end up sending data even after a large-send when the remote end is not active (for any reason). By doing a signaled send at least once per large-send, we can at least detect the problem in the work completion handler, thereby avoiding sending more data to an inactive remote. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: Mark message mapped before transmitsantosh.shilimkar@oracle.com
rds_send_xmit() marks the rds message map flag after xmit_[rdma/atomic]() which is clearly wrong. We need to maintain the ownership between transport and rds. Also take care of error path. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: add a sock_destruct callback debug aidsantosh.shilimkar@oracle.com
This helps to detect the accidental processes/apps trying to destroy the RDS socket which they are sharing with other processes/apps. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: check for congestion updates during rds_send_xmitsantosh.shilimkar@oracle.com
Ensure we don't keep sending the data if the link is congested. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: make sure we post recv bufferssantosh.shilimkar@oracle.com
If we get an ENOMEM during rds_ib_recv_refill, we might never come back and refill again later. Patch makes sure to kick krdsd into helping out. To achieve this we add RDS_RECV_REFILL flag and update in the refill path based on that so that at least some thread will keep posting receive buffers. Since krdsd and softirq both might race for refill, we decide to schedule on work queue based on ring_low instead of ring_empty. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: don't update ip address tables if the address hasn't changedsantosh.shilimkar@oracle.com
If the ip address hasn't changed, there is no need to remove it from the tables only to add it back again. Let's fix it. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: destroy the ib state earlier during shutdownsantosh.shilimkar@oracle.com
Destroy ib state early during shutdown. Otherwise we can get callbacks after the QP isn't really able to handle them. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: always free recv frag as we free its ring entrysantosh.shilimkar@oracle.com
We were still seeing rare occurrences of the WARN_ON(recv->r_frag) which indicates that the recv refill path was finding allocated frags in ring entries that were marked free. These were usually followed by OOM crashes. They only seem to be occurring in the presence of completion errors and connection resets. This patch ensures that we free the frag as we mark the ring entry free. This should stop the refill path from finding allocated frags in ring entries that were marked free. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: restore return value in rds_cmsg_rdma_args()santosh.shilimkar@oracle.com
In rds_cmsg_rdma_args() 'ret' is used by rds_pin_pages() which returns number of pinned pages on success. And the same value is returned to the caller of rds_cmsg_rdma_args() on success which is not intended. Commit f4a3fc03c1d7 ("RDS: Clean up error handling in rds_cmsg_rdma_args") removed the 'ret = 0' line which broke RDS RDMA mode. Fix it by restoring the return value on rds_pin_pages() success keeping the clean-up in place. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25tcp: refine pacing rate determinationEric Dumazet
When TCP pacing was added back in linux-3.12, we chose to apply a fixed ratio of 200 % against current rate, to allow probing for optimal throughput even during slow start phase, where cwnd can be doubled every other gRTT. At Google, we found it was better applying a different ratio while in Congestion Avoidance phase. This ratio was set to 120 %. We've used the normal tcp_in_slow_start() helper for a while, then tuned the condition to select the conservative ratio as soon as cwnd >= ssthresh/2 : - After cwnd reduction, it is safer to ramp up more slowly, as we approach optimal cwnd. - Initial ramp up (ssthresh == INFINITY) still allows doubling cwnd every other RTT. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
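Editor's note: a minimal sketch of the ratio selection described above, using integer arithmetic. The function name, the millisecond RTT parameter and the constant names are assumptions for illustration; only the 200 % and 120 % ratios and the `cwnd >= ssthresh/2` condition come from the message.

```python
INFINITE_SSTHRESH = 0x7FFFFFFF  # stand-in for "ssthresh == INFINITY"

SLOW_START_RATIO = 200  # percent of the current rate, allows cwnd doubling
CONG_AVOID_RATIO = 120  # percent, the more conservative ratio


def pacing_rate(mss, cwnd, srtt_ms, ssthresh):
    """Return a pacing rate in bytes per second (illustrative sketch)."""
    # Use the conservative ratio as soon as cwnd >= ssthresh / 2.
    if cwnd < ssthresh // 2:
        ratio = SLOW_START_RATIO
    else:
        ratio = CONG_AVOID_RATIO
    # current rate = mss * cwnd / srtt, then scaled by ratio / 100
    return mss * cwnd * 1000 * ratio // (srtt_ms * 100)
```

With a 1000-byte MSS and a 100 ms RTT, a connection in its initial ramp-up with cwnd 10 is paced at twice its current rate, while a connection with cwnd 20 and ssthresh 30 is paced at 1.2 times its current rate.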
2015-08-25xfrm: Use VRF master index if output device is enslavedDavid Ahern
Directs route lookups to VRF table. Compiles out if NET_VRF is not enabled. With this patch, we are able to successfully bring up ipsec tunnels in VRFs, even with duplicate network configuration. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25tcp: fix slow start after idle vs TSO/GSOEric Dumazet
slow start after idle might reduce cwnd, but we perform this after first packet was cooked and sent. With TSO/GSO, it means that we might send a full TSO packet even if cwnd should have been reduced to IW10. Moving the SSAI check in skb_entail() makes sense, because we slightly reduce number of times this check is done, especially for large send() and TCP Small queue callbacks from softirq context. As Neal pointed out, we also need to perform the check if/when receive window opens. Tested: Following packetdrill test demonstrates the problem // Test of slow start after idle `sysctl -q net.ipv4.tcp_slow_start_after_idle=1` 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 65535 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6> +.100 < . 1:1(0) ack 1 win 511 +0 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_SOCKET, SO_SNDBUF, [200000], 4) = 0 +0 write(4, ..., 26000) = 26000 +0 > . 1:5001(5000) ack 1 +0 > . 5001:10001(5000) ack 1 +0 %{ assert tcpi_snd_cwnd == 10 }% +.100 < . 1:1(0) ack 10001 win 511 +0 %{ assert tcpi_snd_cwnd == 20, tcpi_snd_cwnd }% +0 > . 10001:20001(10000) ack 1 +0 > P. 20001:26001(6000) ack 1 +.100 < . 1:1(0) ack 26001 win 511 +0 %{ assert tcpi_snd_cwnd == 36, tcpi_snd_cwnd }% +4 write(4, ..., 20000) = 20000 // If slow start after idle works properly, we should send 5 MSS here (cwnd/2) +0 > . 26001:31001(5000) ack 1 +0 %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }% +0 > . 31001:36001(5000) ack 1 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
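Editor's note: the cwnd reduction that the packetdrill test above expects (36 before the idle period, 10 after it) can be sketched as follows. This is a simplified model, not the kernel code: the function name is hypothetical, and the RTO value used in the example is an assumption, since the test does not state it. The window is halved once for each RTO of idle time, but never below the initial window.

```python
def cwnd_after_idle(cwnd, idle_ms, rto_ms, initial_cwnd=10):
    """Return the congestion window after an idle period (sketch)."""
    # Never restart below the initial window (IW10 in the test above),
    # and never raise a window that was already smaller than that.
    restart_cwnd = min(initial_cwnd, cwnd)
    remaining = idle_ms - rto_ms
    # Halve the window once per elapsed RTO while it is above the floor.
    while remaining > 0 and cwnd > restart_cwnd:
        cwnd >>= 1
        remaining -= rto_ms
    return max(cwnd, restart_cwnd)


# 4 seconds idle, assuming an RTO of roughly 300 ms
restarted = cwnd_after_idle(36, idle_ms=4000, rto_ms=300)
```

The fix in the message is about when this reduction is applied: it has to happen before the first packet is built, otherwise a full TSO packet sized for the old window can still be sent.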