summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2015-08-27batman-adv: rearrange batadv_neigh_node_new() arguments to follow conventionMarek Lindner
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Acked-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-27batman-adv: remove obsolete deleted attribute for gateway nodeSimon Wunderlich
With rcu, the gateway node deleted attribute is not needed anymore. In fact, it may delay the free of the gateway node and its referenced structures. Therefore remove it altogether and simplify purging as well. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-27batman-adv: move neigh_node list add into batadv_neigh_node_new()Marek Lindner
All batadv_neigh_node_* functions expect the neigh_node list item to be part of the orig_node->neigh_list, therefore the constructor of said list item should be adding the newly created neigh_node to the respective list. Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Acked-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-27batman-adv: remove redundant hard_iface assignmentMarek Lindner
The batadv_neigh_node_new() function already sets the hard_iface pointer. Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Acked-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-27batman-adv: move hardif refcount inc to batadv_neigh_node_new()Marek Lindner
The batadv_neigh_node cleanup function 'batadv_neigh_node_free_rcu()' takes care of reducing the hardif refcounter, hence it's only logical to assume the creating function of that same object 'batadv_neigh_node_new()' takes care of increasing the same refcounter. Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Acked-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-26bpf: fix bpf_skb_set_tunnel_key() helperAlexei Starovoitov
Make sure to indicate to tunnel driver that key.tun_id is set, otherwise gre won't recognize the metadata. Fixes: d3aa45ce6b94 ("bpf: add helpers to access tunnel metadata") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-26Merge tag 'ipvs2-for-v4.3' of ↵Pablo Neira Ayuso
https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Simon Horman says: ==================== Second Round of IPVS Updates for v4.3 I realise these are a little late in the cycle, so if you would prefer me to repost them for v4.4 then just let me know. The updates include: * A new scheduler from Raducu Deaconu * Enhanced configurability of the sync daemon from Julian Anastasov ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-26netfilter: ip6t_REJECT: added missing icmpv6 codesAndreas Herz
RFC 4443 added two new codes values for ICMPv6 type 1: 5 - Source address failed ingress/egress policy 6 - Reject route to destination And RFC 7084 states in L-14 that IPv6 Router MUST send ICMPv6 Destination Unreachable with code 5 for packets forwarded to it that use an address from a prefix that has been invalidated. Codes 5 and 6 are more informative subsets of code 1. Signed-off-by: Andreas Herz <andi@geekosphere.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-26net_sched: act_bpf: remove spinlock in fast pathAlexei Starovoitov
Similar to act_gact/act_mirred, act_bpf can be lockless in packet processing with extra care taken to free bpf programs after rcu grace period. Replacement of existing act_bpf (very rare) is done with synchronize_rcu() and final destruction is done from tc_action_ops->cleanup() callback that is called from tcf_exts_destroy()->tcf_action_destroy()->__tcf_hash_release() when bind and refcnt reach zero which is only possible when classifier is destroyed. Previous two patches fixed the last two classifiers (tcindex and rsvp) to call tcf_exts_destroy() from rcu callback. Similar to gact/mirred there is a race between prog->filter and prog->tcf_action. Meaning that the program being replaced may use previous default action if it happened to return TC_ACT_UNSPEC. act_mirred race betwen tcf_action and tcfm_dev is similar. In all cases the race is harmless. Long term we may want to improve the situation by replacing the whole tc_action->priv as single pointer instead of updating inner fields one by one. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-26net_sched: convert rsvp to call tcf_exts_destroy from rcu callbackAlexei Starovoitov
Adjust destroy path of cls_rsvp to call tcf_exts_destroy() after rcu grace period. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-26net_sched: convert tcindex to call tcf_exts_destroy from rcu callbackAlexei Starovoitov
Adjust destroy path of cls_tcindex to call tcf_exts_destroy() after rcu grace period. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-26net_sched: act_bpf: remove unnecessary copyAlexei Starovoitov
Fix harmless typo and avoid unnecessary copy of empty 'prog' into unused 'strcut tcf_bpf_cfg old'. Fixes: f4eaed28c783 ("act_bpf: fix memory leaks when replacing bpf programs") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-26net_sched: make tcf_hash_destroy() staticAlexei Starovoitov
tcf_hash_destroy() used once. Make it static. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25cls_u32: complete the check for non-forced case in u32_destroy()WANG Cong
In commit 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone") I added a check in u32_destroy() to see if all real filters are gone for each tp, however, that is only done for root_ht, same is needed for others. This can be reproduced by the following tc commands: tc filter add dev eth0 parent 1:0 prio 5 handle 15: protocol ip u32 divisor 256 tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:2 u32 ht 15:2: match ip src 10.0.0.2 flowid 1:10 tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32 ht 15:2: match ip src 10.0.0.3 flowid 1:10 Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone") Reported-by: Akshat Kakkar <akshat.1984@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: remove superfluous from rds_ib_alloc_fmr()santosh.shilimkar@oracle.com
Memory allocated for 'ibmr' uses kzalloc_node() which already initialises the memory to zero. There is no need to do memset() 0 on that memory. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: flush the FMR pool less oftensantosh.shilimkar@oracle.com
FMR flush is an expensive and time consuming operation. Reduce the frequency of FMR pool flush by 50% so that more FMR work gets accumulated for more efficient flushing. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: push FMR pool flush work to its own workersantosh.shilimkar@oracle.com
RDS FMR flush operation and also it races with connect/reconect which happes a lot with RDS. FMR flush being on common rds_wq aggrevates the problem. Lets push RDS FMR pool flush work to its own worker. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: fix fmr pool dirty_countWengang Wang
In rds_ib_flush_mr_pool(), dirty_count accounts the clean ones which is wrong. This can lead to a negative dirty count value. Lets fix it. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: Fix rds MR reference count in rds_rdma_unuse()santosh.shilimkar@oracle.com
rds_rdma_unuse() drops the mr reference count which it hasn't taken. Correct way of removing mr is to remove mr from the tree and then rdma_destroy_mr() it first, then rds_mr_put() to decrement its reference count. Whichever thread holds last reference will free the mr via rds_mr_put() This bug was triggering weird null pointer crashes. One if the trace for it is captured below. BUG: unable to handle kernel NULL pointer dereference at 0000000000000104 IP: [<ffffffffa0899471>] rds_ib_free_mr+0x31/0x130 [rds_rdma] PGD 4366fa067 PUD 4366f9067 PMD 0 Oops: 0000 [#1] SMP [...] task: ffff88046da6a000 ti: ffff88046da6c000 task.ti: ffff88046da6c000 RIP: 0010:[<ffffffffa0899471>] [<ffffffffa0899471>] rds_ib_free_mr+0x31/0x130 [rds_rdma] RSP: 0018:ffff88046fa43bd8 EFLAGS: 00010286 RAX: 0000000071d38b80 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff880079e7ff40 RBP: ffff88046fa43bf8 R08: 0000000000000000 R09: 0000000000000000 R10: ffff88046fa43ca8 R11: ffff88046a802ed8 R12: ffff880079e7fa40 R13: 0000000000000000 R14: ffff880079e7ff40 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88046fa40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000104 CR3: 00000004366fb000 CR4: 00000000000006e0 Stack: ffff880079e7fa40 ffff880671d38f08 ffff880079e7ff40 0000000000000296 ffff88046fa43c28 ffffffffa087a38b ffff880079e7fa40 ffff880671d38f10 0000000000000000 0000000000000292 ffff88046fa43c48 ffffffffa087a3b6 Call Trace: <IRQ> [<ffffffffa087a38b>] rds_destroy_mr+0x8b/0xa0 [rds] [<ffffffffa087a3b6>] __rds_put_mr_final+0x16/0x30 [rds] [<ffffffffa087a492>] rds_rdma_unuse+0xc2/0x120 [rds] [<ffffffffa08766d3>] rds_recv_incoming_exthdrs+0x83/0xa0 [rds] [<ffffffffa0876782>] rds_recv_incoming+0x92/0x200 [rds] [<ffffffffa0895269>] rds_ib_process_recv+0x259/0x320 [rds_rdma] [<ffffffffa08962a8>] rds_ib_recv_tasklet_fn+0x1a8/0x490 [rds_rdma] [<ffffffff810dcd78>] ? __remove_hrtimer+0x58/0x90 [<ffffffff810799e1>] tasklet_action+0xb1/0xc0 [<ffffffff81079b52>] __do_softirq+0xe2/0x290 [<ffffffff81079df6>] irq_exit+0xa6/0xb0 [<ffffffff81613915>] do_IRQ+0x65/0xf0 [<ffffffff816118ab>] common_interrupt+0x6b/0x6b Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: fix the dangling reference to rds_ib_incoming_slabsantosh.shilimkar@oracle.com
On rds_ib_frag_slab allocation failure, ensure rds_ib_incoming_slab is not pointing to the detsroyed memory. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Antonio Quartulli says: ==================== Included changes: - code restyling and beautification - use int kernel types instead of C99 - update kereldoc - prevent potential hlist double deletion of VLAN objects - fix gw bandwidth calculation - convert list to hlist when needed - add lockdep_asserts calls in function with lock requirements described in kerneldoc ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25rds: Fix improper gfp_t usage.David S. Miller
>> net/rds/ib_recv.c:382:28: sparse: incorrect type in initializer (different base types) net/rds/ib_recv.c:382:28: expected int [signed] can_wait net/rds/ib_recv.c:382:28: got restricted gfp_t net/rds/ib_recv.c:828:23: sparse: cast to restricted __le64 Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25ip6_gre: release cached dst on tunnel removalhuaibin Wang
When a tunnel is deleted, the cached dst entry should be released. This problem may prevent the removal of a netns (seen with a x-netns IPv6 gre tunnel): unregister_netdevice: waiting for lo to become free. Usage count = 3 CC: Dmitry Kozlov <xeb@mail.ru> Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Signed-off-by: huaibin Wang <huaibin.wang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25route: fix a use-after-freeWANG Cong
This patch fixes the following crash: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc7+ #166 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff88010656d280 ti: ffff880106570000 task.ti: ffff880106570000 RIP: 0010:[<ffffffff8182f91b>] [<ffffffff8182f91b>] dst_destroy+0xa6/0xef RSP: 0018:ffff880107603e38 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff8800d225a000 RCX: ffffffff82250fd0 RDX: 0000000000000001 RSI: ffffffff82250fd0 RDI: 6b6b6b6b6b6b6b6b RBP: ffff880107603e58 R08: 0000000000000001 R09: 0000000000000001 R10: 000000000000b530 R11: ffff880107609000 R12: 0000000000000000 R13: ffffffff82343c40 R14: 0000000000000000 R15: ffffffff8182fb4f FS: 0000000000000000(0000) GS:ffff880107600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fcabd9d3000 CR3: 00000000d7279000 CR4: 00000000000006e0 Stack: ffffffff82250fd0 ffff8801077d6f00 ffffffff82253c40 ffff8800d225a000 ffff880107603e68 ffffffff8182fb5d ffff880107603f08 ffffffff810d795e ffffffff810d7648 ffff880106574000 ffff88010656d280 ffff88010656d280 Call Trace: <IRQ> [<ffffffff8182fb5d>] dst_destroy_rcu+0xe/0x1d [<ffffffff810d795e>] rcu_process_callbacks+0x618/0x7eb [<ffffffff810d7648>] ? rcu_process_callbacks+0x302/0x7eb [<ffffffff8182fb4f>] ? dst_gc_task+0x1eb/0x1eb [<ffffffff8107e11b>] __do_softirq+0x178/0x39f [<ffffffff8107e52e>] irq_exit+0x41/0x95 [<ffffffff81a4f215>] smp_apic_timer_interrupt+0x34/0x40 [<ffffffff81a4d5cd>] apic_timer_interrupt+0x6d/0x80 <EOI> [<ffffffff8100b968>] ? default_idle+0x21/0x32 [<ffffffff8100b966>] ? default_idle+0x1f/0x32 [<ffffffff8100bf19>] arch_cpu_idle+0xf/0x11 [<ffffffff810b0bc7>] default_idle_call+0x1f/0x21 [<ffffffff810b0dce>] cpu_startup_entry+0x1ad/0x273 [<ffffffff8102fe67>] start_secondary+0x135/0x156 dst is freed right before lwtstate_put(), this is not correct... Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry") Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25net-next: Fix warning while make xmldocs caused by skbuff.cMasanari Iida
This patch fix following warnings. .//net/core/skbuff.c:407: warning: No description found for parameter 'len' .//net/core/skbuff.c:407: warning: Excess function parameter 'length' description in '__netdev_alloc_skb' .//net/core/skbuff.c:476: warning: No description found for parameter 'len' .//net/core/skbuff.c:476: warning: Excess function parameter 'length' description in '__napi_alloc_skb' Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25ah4: Fix error return in ah_input().David S. Miller
Noticed by Herbert Xu. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25ah6: fix error return codeJulia Lawall
Return a negative error code on failure. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier ret; expression e1,e2; @@ ( if (\(ret < 0\|ret != 0\)) { ... return ret; } | ret = 0 ) ... when != ret = e1 when != &ret *if(...) { ... when != ret = e2 when forall return ret; } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: check for valid cm_id before initiating connectionsantosh.shilimkar@oracle.com
Connection could have been dropped while the route is being resolved so check for valid cm_id before initiating the connection. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: return EMSGSIZE for oversize requests before processing/queueingMukesh Kacker
rds_send_queue_rm() allows for the "current datagram" being queued to exceed SO_SNDBUF thresholds by checking bytes queued without counting in length of current datagram. (Since sk_sndbuf is set to twice requested SO_SNDBUF value as a kernel heuristic this is usually fine!) If this "current datagram" squeezing past the threshold is itself many times the size of the sk_sndbuf threshold itself then even twice the SO_SNDBUF does not save us and it gets queued but cannot be transmitted. Threads block and deadlock and device becomes unusable. The check for this datagram not exceeding SNDBUF thresholds (EMSGSIZE) is not done on this datagram as that check is only done if queueing attempt fails. (Datagrams that follow this datagram fail queueing attempts, go through the check and eventually trip EMSGSIZE error but zero length datagrams silently fail!) This fix moves the check for datagrams exceeding SNDBUF limits before any processing or queueing is attempted and returns EMSGSIZE early in the rds_sndmsg() code. This change also ensures that all datagrams get checked for exceeding SNDBUF/sk_sndbuf size limits and the large datagrams that exceed those limits do not get to rds_send_queue_rm() code for processing. Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: make sure rds_send_drop_to properly takes the m_rs_locksantosh.shilimkar@oracle.com
rds_send_drop_to() is used during socket tear down to find all the messages on the socket and flush them . It can race with the acking code unless it takes the m_rs_lock on each and every message. This plugs a hole where we didn't take m_rs_lock on any message that didn't have the RDS_MSG_ON_CONN set. Taking m_rs_lock avoids double frees and other memory corruptions as the ack code trusts the message m_rs pointer on a socket that had actually been freed. We must take m_rs_lock to access m_rs. Because of lock nesting and rs access, we also need to acquire rs_lock. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: Don't destroy the rdma id until after we're done using itSantosh Shilimkar
During connection resets, we are destroying the rdma id too soon. We can't destroy it when it is still in use. So lets move rdma_destroy_id() after we clear the rings. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: Fix assertion level from fatal to warningsantosh.shilimkar@oracle.com
Fix the asserion level since its not fatal and can be hit in normal execution paths. There is no need to take the system down. We keep the WARN_ON() to detect the condition if we get here with bad pages. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: Make sure we do a signaled send for large-sendsantosh.shilimkar@oracle.com
WR(Work Requests )always generate a WC(Work Completion) with signaled send. Default RDS ib code is setup for un-signaled completion. Since RDS connction is persistent, we can end up sending the data even after large-send when the remote end is not active(for any reason). By doing a signaled send at least once per large-send, we can at least detect the problem in work completion handler there by avoiding sending more data to inactive remote. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: Mark message mapped before transmitsantosh.shilimkar@oracle.com
rds_send_xmit() marks the rds message map flag after xmit_[rdma/atomic]() which is clearly wrong. We need to maintain the ownership between transport and rds. Also take care of error path. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: add a sock_destruct callback debug aidsantosh.shilimkar@oracle.com
This helps to detect the accidental processes/apps trying to destroy the RDS socket which they are sharing with other processes/apps. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: check for congestion updates during rds_send_xmitsantosh.shilimkar@oracle.com
Ensure we don't keep sending the data if the link is congested. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: make sure we post recv bufferssantosh.shilimkar@oracle.com
If we get an ENOMEM during rds_ib_recv_refill, we might never come back and refill again later. Patch makes sure to kick krdsd into helping out. To achieve this we add RDS_RECV_REFILL flag and update in the refill path based on that so that at least some therad will keep posting receive buffers. Since krdsd and softirq both might race for refill, we decide to schedule on work queue based on ring_low instead of ring_empty. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: don't update ip address tables if the address hasn't changedsantosh.shilimkar@oracle.com
If the ip address tables hasn't changed, there is no need to remove them only to be added back again. Lets fix it. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: destroy the ib state earlier during shutdownsantosh.shilimkar@oracle.com
Destroy ib state early during shutdown. Otherwise we can get callbacks after the QP isn't really able to handle them. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: always free recv frag as we free its ring entrysantosh.shilimkar@oracle.com
We were still seeing rare occurrences of the WARN_ON(recv->r_frag) which indicates that the recv refill path was finding allocated frags in ring entries that were marked free. These were usually followed by OOM crashes. They only seem to be occurring in the presence of completion errors and connection resets. This patch ensures that we free the frag as we mark the ring entry free. This should stop the refill path from finding allocated frags in ring entries that were marked free. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: restore return value in rds_cmsg_rdma_args()santosh.shilimkar@oracle.com
In rds_cmsg_rdma_args() 'ret' is used by rds_pin_pages() which returns number of pinned pages on success. And the same value is returned to the caller of rds_cmsg_rdma_args() on success which is not intended. Commit f4a3fc03c1d7 ("RDS: Clean up error handling in rds_cmsg_rdma_args") removed the 'ret = 0' line which broke RDS RDMA mode. Fix it by restoring the return value on rds_pin_pages() success keeping the clean-up in place. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25tcp: refine pacing rate determinationEric Dumazet
When TCP pacing was added back in linux-3.12, we chose to apply a fixed ratio of 200 % against current rate, to allow probing for optimal throughput even during slow start phase, where cwnd can be doubled every other gRTT. At Google, we found it was better applying a different ratio while in Congestion Avoidance phase. This ratio was set to 120 %. We've used the normal tcp_in_slow_start() helper for a while, then tuned the condition to select the conservative ratio as soon as cwnd >= ssthresh/2 : - After cwnd reduction, it is safer to ramp up more slowly, as we approach optimal cwnd. - Initial ramp up (ssthresh == INFINITY) still allows doubling cwnd every other RTT. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25xfrm: Use VRF master index if output device is enslavedDavid Ahern
Directs route lookups to VRF table. Compiles out if NET_VRF is not enabled. With this patch able to successfully bring up ipsec tunnels in VRFs, even with duplicate network configuration. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25tcp: fix slow start after idle vs TSO/GSOEric Dumazet
slow start after idle might reduce cwnd, but we perform this after first packet was cooked and sent. With TSO/GSO, it means that we might send a full TSO packet even if cwnd should have been reduced to IW10. Moving the SSAI check in skb_entail() makes sense, because we slightly reduce number of times this check is done, especially for large send() and TCP Small queue callbacks from softirq context. As Neal pointed out, we also need to perform the check if/when receive window opens. Tested: Following packetdrill test demonstrates the problem // Test of slow start after idle `sysctl -q net.ipv4.tcp_slow_start_after_idle=1` 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 65535 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6> +.100 < . 1:1(0) ack 1 win 511 +0 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_SOCKET, SO_SNDBUF, [200000], 4) = 0 +0 write(4, ..., 26000) = 26000 +0 > . 1:5001(5000) ack 1 +0 > . 5001:10001(5000) ack 1 +0 %{ assert tcpi_snd_cwnd == 10 }% +.100 < . 1:1(0) ack 10001 win 511 +0 %{ assert tcpi_snd_cwnd == 20, tcpi_snd_cwnd }% +0 > . 10001:20001(10000) ack 1 +0 > P. 20001:26001(6000) ack 1 +.100 < . 1:1(0) ack 26001 win 511 +0 %{ assert tcpi_snd_cwnd == 36, tcpi_snd_cwnd }% +4 write(4, ..., 20000) = 20000 // If slow start after idle works properly, we should send 5 MSS here (cwnd/2) +0 > . 26001:31001(5000) ack 1 +0 %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }% +0 > . 31001:36001(5000) ack 1 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25batman-adv: beautify supported routing algorithm listMarek Lindner
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-25batman-adv: Fix conditional statements indentationSven Eckelmann
commit 29b9256e6631 ("batman-adv: consider outgoing interface in OGM sending") incorrectly indented the interface check code. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-25batman-adv: Add lockdep_asserts for documented external locksSven Eckelmann
Some functions already have documentation about locks they require inside their kerneldoc header. These can be directly tested during runtime using the lockdep asserts. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-25batman-adv: Annotate deleting functions with external lock via lockdepSven Eckelmann
Functions which use (h)list_del* are requiring correct locking when they operate on global lists. Most of the time the search in the list and the delete are done in the same function. All other cases should have it visible that they require a special lock to avoid race conditions. Lockdep asserts can be used to check these problem during runtime when the lockdep functionality is enabled. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-25batman-adv: convert bat_priv->tt.req_list to hlistMarek Lindner
Since the list's tail is never accessed using a double linked list head wastes memory. Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-25batman-adv: Fix gw_bandwidth calculation on 32 bit systemsSven Eckelmann
The TVLV for the gw_bandwidth stores everything as u32. But the gw_bandwidth reads the signed long which limits the maximum value to (2 ** 31 - 1) on systems with 4 byte long. Also the input value is always converted from either Mibit/s or Kibit/s to 100Kibit/s. This reduces the values even further when the user sets it via the default unit Kibit/s. It may even cause an integer overflow and end up with a value the user never intended. Instead read the values as u64, check for possible overflows, do the unit adjustments and then reduce the size to u32. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>