summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-06-17net: remove DST_NOGC flagWei Wang
Now that all the components have been changed to release dst based on refcnt only and not depend on dst gc anymore, we can remove the temporary flag DST_NOGC. Note that we also need to remove the DST_NOCACHE check in dst_release() and dst_hold_safe() because now all the dst are released based on refcnt and behaves as DST_NOCACHE. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17net: remove dst gc related codeWei Wang
This patch removes all dst gc related code and all the dst free functions Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17decnet: take dst->__refcnt when struct dn_route is createdWei Wang
struct dn_route is inserted into dn_rt_hash_table but no dst->__refcnt is taken. This patch makes sure the dn_rt_hash_table's reference to the dst is ref counted. As the dst is always ref counted properly, we can safely mark DST_NOGC flag so dst_release() will release dst based on refcnt only. And dst gc is no longer needed and all dst_free() or its related function calls should be replaced with dst_release() or dst_release_immediate(). And dst_dev_put() is called when removing dst from the hash table to release the reference on dst->dev before we lose pointer to it. Also, correct the logic in dn_dst_check_expire() and dn_dst_gc() to check dst->__refcnt to be > 1 to indicate it is referenced by other users. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17xfrm: take refcnt of dst when creating struct xfrm_dst bundleWei Wang
During the creation of xfrm_dst bundle, always take ref count when allocating the dst. This way, xfrm_bundle_create() will form a linked list of dst with dst->child pointing to a ref counted dst child. And the returned dst pointer is also ref counted. This makes the link from the flow cache to this dst now ref counted properly. As the dst is always ref counted properly, we can safely mark DST_NOGC flag so dst_release() will release dst based on refcnt only. And dst gc is no longer needed and all dst_free() and its related function calls should be replaced with dst_release() or dst_release_immediate(). The special handling logic for dst->child in dst_destroy() can be replaced with a simple dst_release_immediate() call on the child to release the whole list linked by dst->child pointer. Previously used DST_NOHASH flag is not needed anymore as well. The reason that DST_NOHASH is used in the existing code is mainly to prevent the dst inserted in the fib tree to be wrongly destroyed during the deletion of the xfrm_dst bundle. So in the existing code, DST_NOHASH flag is marked in all the dst children except the one which is in the fib tree. However, with this patch series to remove dst gc logic and release dst only based on ref count, it is safe to release all the children from a xfrm_dst bundle as long as the dst children are all ref counted properly which is already the case in the existing code. So, this patch removes the use of DST_NOHASH flag. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17ipv6: get rid of icmp6 dst garbage collectorWei Wang
icmp6 dst route is currently ref counted during creation and will be freed by user during its call of dst_release(). So no need of a garbage collector for it. Remove all icmp6 dst garbage collector related code. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17ipv6: mark DST_NOGC and remove the operation of dst_free()Wei Wang
With the previous preparation patches, we are ready to get rid of the dst gc operation in ipv6 code and release dst based on refcnt only. So this patch adds DST_NOGC flag for all IPv6 dst and remove the calls to dst_free() and its related functions. At this point, all dst created in ipv6 code do not use the dst gc anymore and will be destroyed at the point when refcnt drops to 0. Also, as icmp6 dst route is refcounted during creation and will be freed by user during its call of dst_release(), there is no need to add this dst to the icmp6 gc list as well. Instead, we need to add it into uncached list so that when a NETDEV_DOWN/NETDEV_UNREGISRER event comes, we can properly go through these icmp6 dst as well and release the net device properly. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17ipv6: call dst_hold_safe() properlyWei Wang
Similar as ipv4, ipv6 path also needs to call dst_hold_safe() when necessary to avoid double free issue on the dst. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17ipv6: call dst_dev_put() properlyWei Wang
As the intend of this patch series is to completely remove dst gc, we need to call dst_dev_put() to release the reference to dst->dev when removing routes from fib because we won't keep the gc list anymore and will lose the dst pointer right after removing the routes. Without the gc list, there is no way to find all the dst's that have dst->dev pointing to the going-down dev. Hence, we are doing dst_dev_put() immediately before we lose the last reference of the dst from the routing code. The next dst_check() will trigger a route re-lookup to find another route (if there is any). Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17ipv6: take dst->__refcnt for insertion into fib6 treeWei Wang
In IPv6 routing code, struct rt6_info is created for each static route and RTF_CACHE route and inserted into fib6 tree. In both cases, dst ref count is not taken. As explained in the previous patch, this leads to the need of the dst garbage collector. This patch holds ref count of dst before inserting the route into fib6 tree and properly releases the dst when deleting it from the fib6 tree as a preparation in order to fully get rid of dst gc later. Also, correct fib6_age() logic to check dst->__refcnt to be 1 to indicate no user is referencing the dst. And remove dst_hold() in vrf_rt6_create() as ip6_dst_alloc() already puts dst->__refcnt to 1. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17ipv4: mark DST_NOGC and remove the operation of dst_free()Wei Wang
With the previous preparation patches, we are ready to get rid of the dst gc operation in ipv4 code and release dst based on refcnt only. So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls to dst_free(). At this point, all dst created in ipv4 code do not use the dst gc anymore and will be destroyed at the point when refcnt drops to 0. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17ipv4: call dst_hold_safe() properlyWei Wang
This patch checks all the calls to dst_hold()/skb_dst_force()/dst_clone()/dst_use() to see if dst_hold_safe() is needed to avoid double free issue if dst gc is removed and dst_release() directly destroys dst when dst->__refcnt drops to 0. In tx path, TCP hold sk->sk_rx_dst ref count and also hold sock_lock(). UDP and other similar protocols always hold refcount for skb->_skb_refdst. So both paths seem to be safe. In rx path, as it is lockless and skb_dst_set_noref() is likely to be used, dst_hold_safe() should always be used when trying to hold dst. In the routing code, if dst is held during an rcu protected session, it is necessary to call dst_hold_safe() as the current dst might be in its rcu grace period. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17ipv4: call dst_dev_put() properlyWei Wang
As the intend of this patch series is to completely remove dst gc, we need to call dst_dev_put() to release the reference to dst->dev when removing routes from fib because we won't keep the gc list anymore and will lose the dst pointer right after removing the routes. Without the gc list, there is no way to find all the dst's that have dst->dev pointing to the going-down dev. Hence, we are doing dst_dev_put() immediately before we lose the last reference of the dst from the routing code. The next dst_check() will trigger a route re-lookup to find another route (if there is any). Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17ipv4: take dst->__refcnt when caching dst in fibWei Wang
In IPv4 routing code, fib_nh and fib_nh_exception can hold pointers to struct rtable but they never increment dst->__refcnt. This leads to the need of the dst garbage collector because when user is done with this dst and calls dst_release(), it can only decrement dst->__refcnt and can not free the dst even it sees dst->__refcnt drops from 1 to 0 (unless DST_NOCACHE flag is set) because the routing code might still hold reference to it. And when the routing code tries to delete a route, it has to put the dst to the gc_list if dst->__refcnt is not yet 0 and have a gc thread running periodically to check on dst->__refcnt and finally to free dst when refcnt becomes 0. This patch increments dst->__refcnt when fib_nh/fib_nh_exception holds reference to this dst and properly release the dst when fib_nh/fib_nh_exception has been updated with a new dst. This patch is a preparation in order to fully get rid of dst gc later. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17net: introduce a new function dst_dev_put()Wei Wang
This function should be called when removing routes from fib tree after the dst gc is no longer in use. We first mark DST_OBSOLETE_DEAD on this dst to make sure next dst_ops->check() fails and returns NULL. Secondly, as we no longer keep the gc_list, we need to properly release dst->dev right at the moment when the dst is removed from the fib/fib6 tree. It does the following: 1. change dst->input and output pointers to dst_discard/dst_dscard_out to discard all packets 2. replace dst->dev with loopback interface Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17net: introduce DST_NOGC in dst_release() to destroy dst based on refcntWei Wang
The current mechanism of freeing dst is a bit complicated. dst has its ref count and when user grabs the reference to the dst, the ref count is properly taken in most cases except in IPv4/IPv6/decnet/xfrm routing code due to some historic reasons. If the reference to dst is always taken properly, we should be able to simplify the logic in dst_release() to destroy dst when dst->__refcnt drops from 1 to 0. And this should be the only condition to determine if we can call dst_destroy(). And as dst is always ref counted, there is no need for a dst garbage list to hold the dst entries that already get removed by the routing code but are still held by other users. And the task to periodically check the list to free dst if ref count become 0 is also not needed anymore. This patch introduces a temporary flag DST_NOGC(no garbage collector). If it is set in the dst, dst_release() will call dst_destroy() when dst->__refcnt drops to 0. dst_hold_safe() will also check for this flag and do atomic_inc_not_zero() similar as DST_NOCACHE to avoid double free issue. This temporary flag is mainly used so that we can make the transition component by component without breaking other parts. This flag will be removed after all components are properly transitioned. This patch also introduces a new function dst_release_immediate() which destroys dst without waiting on the rcu when refcnt drops to 0. It will be used in later patches. Follow-up patches will correct all the places to properly take ref count on dst and mark DST_NOGC. dst_release() or dst_release_immediate() will be used to release the dst instead of dst_free() and its related functions. And final clean-up patch will remove the DST_NOGC flag. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17net: use loopback dev when generating blackhole routeWei Wang
Existing ipv4/6_blackhole_route() code generates a blackhole route with dst->dev pointing to the passed in dst->dev. It is not necessary to hold reference to the passed in dst->dev because the packets going through this route are dropped anyway. A loopback interface is good enough so that we don't need to worry about releasing this dst->dev when this dev is going down. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17udp: call dst_hold_safe() in udp_sk_rx_set_dst()Wei Wang
In udp_v4/6_early_demux() code, we try to hold dst->__refcnt for dst with DST_NOCACHE flag. This is because later in udp_sk_rx_dst_set() function, we will try to cache this dst in sk for connected case. However, a better way to achieve this is to not try to hold dst in early_demux(), but in udp_sk_rx_dst_set(), call dst_hold_safe(). This approach is also more consistant with how tcp is handling it. And it will make later changes simpler. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17ipv6: remove unnecessary dst_hold() in ip6_fragment()Wei Wang
In ipv6 tx path, rcu_read_lock() is taken so that dst won't get freed during the execution of ip6_fragment(). Hence, no need to hold dst in it. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16net: dsa: add cross-chip multicast supportVivien Didelot
Similarly to how cross-chip VLAN works, define a bitmap of multicast group members for a switch, now including its DSA ports, so that multicast traffic can be sent to all switches of the fabric. A switch may drop the frames if no user port is a member. This brings support for multicast in a multi-chip environment. As of now, all switches of the fabric must support the multicast operations in order to program a single fabric port. Reported-by: Jason Cobham <jcobham@questertangent.com> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Tested-by: Jason Cobham <jcobham@questertangent.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16decnet: always not take dst->__refcnt when inserting dst into hash tableWei Wang
In the existing dn_route.c code, dn_route_output_slow() takes dst->__refcnt before calling dn_insert_route() while dn_route_input_slow() does not take dst->__refcnt before calling dn_insert_route(). This makes the whole routing code very buggy. In dn_dst_check_expire(), dnrt_free() is called when rt expires. This makes the routes inserted by dn_route_output_slow() not able to be freed as the refcnt is not released. In dn_dst_gc(), dnrt_drop() is called to release rt which could potentially cause the dst->__refcnt to be dropped to -1. In dn_run_flush(), dst_free() is called to release all the dst. Again, it makes the dst inserted by dn_route_output_slow() not able to be released and also, it does not wait on the rcu and could potentially cause crash in the path where other users still refer to this dst. This patch makes sure both input and output path do not take dst->__refcnt before calling dn_insert_route() and also makes sure dnrt_free()/dst_free() is called when removing dst from the hash table. The only difference between those 2 calls is that dnrt_free() waits on the rcu while dst_free() does not. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16rds: tcp: Set linger when rejecting an incoming conn in rds_tcp_accept_oneSowmini Varadhan
Each time we get an incoming SYN to the RDS_TCP_PORT, the TCP layer accepts the connection and then the rds_tcp_accept_one() callback is invoked to process the incoming connection. rds_tcp_accept_one() may reject the incoming syn for a number of reasons, e.g., commit 1a0e100fb2c9 ("RDS: TCP: Force every connection to be initiated by numerically smaller IP address"), or because we are getting spammed by a malicious node that is triggering a flood of connection attempts to RDS_TCP_PORT. If the incoming syn is rejected, no data would have been sent on the TCP socket, and we do not need to be in TIME_WAIT state, so we set linger on the TCP socket before closing, thereby closing the socket efficiently with a RST. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Imanti Mendez <imanti.mendez@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16rds: tcp: various endian-ness fixesSowmini Varadhan
Found when testing between sparc and x86 machines on different subnets, so the address comparison patterns hit the corner cases and brought out some bugs fixed by this patch. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Imanti Mendez <imanti.mendez@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16rds: tcp: remove cp_outgoingSowmini Varadhan
After commit 1a0e100fb2c9 ("RDS: TCP: Force every connection to be initiated by numerically smaller IP address") we no longer need the logic associated with cp_outgoing, so clean up usage of this field. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Imanti Mendez <imanti.mendez@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16net: Add IFLA_XDP_PROG_IDMartin KaFai Lau
Expose prog_id through IFLA_XDP_PROG_ID. This patch makes modification to generic_xdp. The later patches will modify other xdp-supported drivers. prog_id is added to struct net_dev_xdp. iproute2 patch will be followed. Here is how the 'ip link' will look like: > ip link show eth0 3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp(prog_id:1) qdisc fq_codel state UP mode DEFAULT group default qlen 1000 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16networking: add and use skb_put_u8()Johannes Berg
Joe and Bjørn suggested that it'd be nicer to not have the cast in the fairly common case of doing *(u8 *)skb_put(skb, 1) = c; Add skb_put_u8() for this case, and use it across the code, using the following spatch: @@ expression SKB, C, S; typedef u8; identifier fn = {skb_put}; fresh identifier fn2 = fn ## "_u8"; @@ - *(u8 *)fn(SKB, S) = C; + fn2(SKB, C); Note that due to the "S", the spatch isn't perfect, it should have checked that S is 1, but there's also places that use a sizeof expression like sizeof(var) or sizeof(u8) etc. Turns out that nobody ever did something like *(u8 *)skb_put(skb, 2) = c; which would be wrong anyway since the second byte wouldn't be initialized. Suggested-by: Joe Perches <joe@perches.com> Suggested-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16networking: make skb_push & __skb_push return void pointersJohannes Berg
It seems like a historic accident that these return unsigned char *, and in many places that means casts are required, more often than not. Make these functions return void * and remove all the casts across the tree, adding a (u8 *) cast only where the unsigned char pointer was used directly, all done with the following spatch: @@ expression SKB, LEN; typedef u8; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; @@ - *(fn(SKB, LEN)) + *(u8 *)fn(SKB, LEN) @@ expression E, SKB, LEN; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; type T; @@ - E = ((T *)(fn(SKB, LEN))) + E = fn(SKB, LEN) @@ expression SKB, LEN; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; @@ - fn(SKB, LEN)[0] + *(u8 *)fn(SKB, LEN) Note that the last part there converts from push(...)[0] to the more idiomatic *(u8 *)push(...). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16networking: make skb_pull & friends return void pointersJohannes Berg
It seems like a historic accident that these return unsigned char *, and in many places that means casts are required, more often than not. Make these functions return void * and remove all the casts across the tree, adding a (u8 *) cast only where the unsigned char pointer was used directly, all done with the following spatch: @@ expression SKB, LEN; typedef u8; identifier fn = { skb_pull, __skb_pull, skb_pull_inline, __pskb_pull_tail, __pskb_pull, pskb_pull }; @@ - *(fn(SKB, LEN)) + *(u8 *)fn(SKB, LEN) @@ expression E, SKB, LEN; identifier fn = { skb_pull, __skb_pull, skb_pull_inline, __pskb_pull_tail, __pskb_pull, pskb_pull }; type T; @@ - E = ((T *)(fn(SKB, LEN))) + E = fn(SKB, LEN) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16networking: make skb_put & friends return void pointersJohannes Berg
It seems like a historic accident that these return unsigned char *, and in many places that means casts are required, more often than not. Make these functions (skb_put, __skb_put and pskb_put) return void * and remove all the casts across the tree, adding a (u8 *) cast only where the unsigned char pointer was used directly, all done with the following spatch: @@ expression SKB, LEN; typedef u8; identifier fn = { skb_put, __skb_put }; @@ - *(fn(SKB, LEN)) + *(u8 *)fn(SKB, LEN) @@ expression E, SKB, LEN; identifier fn = { skb_put, __skb_put }; type T; @@ - E = ((T *)(fn(SKB, LEN))) + E = fn(SKB, LEN) which actually doesn't cover pskb_put since there are only three users overall. A handful of stragglers were converted manually, notably a macro in drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many instances in net/bluetooth/hci_sock.c. In the former file, I also had to fix one whitespace problem spatch introduced. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16networking: introduce and use skb_put_data()Johannes Berg
A common pattern with skb_put() is to just want to memcpy() some data into the new space, introduce skb_put_data() for this. An spatch similar to the one for skb_put_zero() converts many of the places using it: @@ identifier p, p2; expression len, skb, data; type t, t2; @@ ( -p = skb_put(skb, len); +p = skb_put_data(skb, data, len); | -p = (t)skb_put(skb, len); +p = skb_put_data(skb, data, len); ) ( p2 = (t2)p; -memcpy(p2, data, len); | -memcpy(p, data, len); ) @@ type t, t2; identifier p, p2; expression skb, data; @@ t *p; ... ( -p = skb_put(skb, sizeof(t)); +p = skb_put_data(skb, data, sizeof(t)); | -p = (t *)skb_put(skb, sizeof(t)); +p = skb_put_data(skb, data, sizeof(t)); ) ( p2 = (t2)p; -memcpy(p2, data, sizeof(*p)); | -memcpy(p, data, sizeof(*p)); ) @@ expression skb, len, data; @@ -memcpy(skb_put(skb, len), data, len); +skb_put_data(skb, data, len); (again, manually post-processed to retain some comments) Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16networking: convert many more places to skb_put_zero()Johannes Berg
There were many places that my previous spatch didn't find, as pointed out by yuan linyu in various patches. The following spatch found many more and also removes the now unnecessary casts: @@ identifier p, p2; expression len; expression skb; type t, t2; @@ ( -p = skb_put(skb, len); +p = skb_put_zero(skb, len); | -p = (t)skb_put(skb, len); +p = skb_put_zero(skb, len); ) ... when != p ( p2 = (t2)p; -memset(p2, 0, len); | -memset(p, 0, len); ) @@ type t, t2; identifier p, p2; expression skb; @@ t *p; ... ( -p = skb_put(skb, sizeof(t)); +p = skb_put_zero(skb, sizeof(t)); | -p = (t *)skb_put(skb, sizeof(t)); +p = skb_put_zero(skb, sizeof(t)); ) ... when != p ( p2 = (t2)p; -memset(p2, 0, sizeof(*p)); | -memset(p, 0, sizeof(*p)); ) @@ expression skb, len; @@ -memset(skb_put(skb, len), 0, len); +skb_put_zero(skb, len); Apply it to the tree (with one manual fixup to keep the comment in vxlan.c, which spatch removed.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16tls: Depend upon INET not plain NET.David S. Miller
We refer to TCP et al. symbols so have to use INET as the dependency. ERROR: "tcp_prot" [net/tls/tls.ko] undefined! >> ERROR: "tcp_rate_check_app_limited" [net/tls/tls.ko] undefined! ERROR: "tcp_register_ulp" [net/tls/tls.ko] undefined! ERROR: "tcp_unregister_ulp" [net/tls/tls.ko] undefined! ERROR: "do_tcp_sendpages" [net/tls/tls.ko] undefined! Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15net: dsa: assign default CPU port to all portsVivien Didelot
The current code only assigns the default cpu_dp to all user ports of the switch to which the CPU port belongs. The user ports of the other switches of the fabric thus don't have a default CPU port. This patch fixes this by assigning the cpu_dp of all user ports of all switches of the fabric when the tree is fully parsed. Fixes: a29342e73911 ("net: dsa: Associate slave network device with CPU port") Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15net: sched: act_tunnel_key: make UDP checksum configurableJiri Benc
Allow requesting of zero UDP checksum for encapsulated packets. The name and meaning of the attribute is "NO_CSUM" in order to have the same meaning of the attribute missing and being 0. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15net: sched: act_tunnel_key: request UDP checksum by defaultJiri Benc
There's currently no way to request (outer) UDP checksum with act_tunnel_key. This is problem especially for IPv6. Right now, tunnel_key action with IPv6 does not work without going through hassles: both sides have to have udp6zerocsumrx configured on the tunnel interface. This is obviously not a good solution universally. It makes more sense to compute the UDP checksum by default even for IPv4. Just set the default to request the checksum when using act_tunnel_key. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15tls: kernel TLS supportDave Watson
Software implementation of transport layer security, implemented using ULP infrastructure. tcp proto_ops are replaced with tls equivalents of sendmsg and sendpage. Only symmetric crypto is done in the kernel, keys are passed by setsockopt after the handshake is complete. All control messages are supported via CMSG data - the actual symmetric encryption is the same, just the message type needs to be passed separately. For user API, please see Documentation patch. Pieces that can be shared between hw and sw implementation are in tls_main.c Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15tcp: export do_tcp_sendpages and tcp_rate_check_app_limited functionsDave Watson
Export do_tcp_sendpages and tcp_rate_check_app_limited, since tls will need to sendpages while the socket is already locked. tcp_sendpage is exported, but requires the socket lock to not be held already. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15tcp: ULP infrastructureDave Watson
Add the infrustructure for attaching Upper Layer Protocols (ULPs) over TCP sockets. Based on a similar infrastructure in tcp_cong. The idea is that any ULP can add its own logic by changing the TCP proto_ops structure to its own methods. Example usage: setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")); modules will call: tcp_register_ulp(&tcp_tls_ulp_ops); to register/unregister their ulp, with an init function and name. A list of registered ulps will be returned by tcp_get_available_ulp, which is hooked up to /proc. Example: $ cat /proc/sys/net/ipv4/tcp_available_ulp tls There is currently no functionality to remove or chain ULPs, but it should be possible to add these in the future if needed. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The conflicts were two cases of overlapping changes in batman-adv and the qed driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14rxrpc: Cache the congestion window settingDavid Howells
Cache the congestion window setting that was determined during a call's transmission phase when it finishes so that it can be used by the next call to the same peer, thereby shortcutting the slow-start algorithm. The value is stored in the rxrpc_peer struct and is accessed without locking. Each call takes the value that happens to be there when it starts and just overwrites the value when it finishes. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14net: don't global ICMP rate limit packets originating from loopbackJesper Dangaard Brouer
Florian Weimer seems to have a glibc test-case which requires that loopback interfaces does not get ICMP ratelimited. This was broken by commit c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited"). An ICMP response will usually be routed back-out the same incoming interface. Thus, take advantage of this and skip global ICMP ratelimit when the incoming device is loopback. In the unlikely event that the outgoing it not loopback, due to strange routing policy rules, ICMP rate limiting still works via peer ratelimiting via icmpv4_xrlim_allow(). Thus, we should still comply with RFC1812 (section 4.3.2.8 "Rate Limiting"). This seems to fix the reproducer given by Florian. While still avoiding to perform expensive and unneeded outgoing route lookup for rate limited packets (in the non-loopback case). Fixes: c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited") Reported-by: Florian Weimer <fweimer@redhat.com> Reported-by: "H.J. Lu" <hjl.tools@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14net/act_pedit: fix an error codeDan Carpenter
I'm reviewing static checker warnings where we do ERR_PTR(0), which is the same as NULL. I'm pretty sure we intended to return ERR_PTR(-EINVAL) here. Sometimes these bugs lead to a NULL dereference but I don't immediately see that problem here. Fixes: 71d0ed7079df ("net/act_pedit: Support using offset relative to the conventional network headers") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Amir Vadai <amir@vadai.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14net: use skb_unref() in napi_consume_skb()Paolo Abeni
The commit 83ada39bb79d ("net: factor out a helper to decrement the skb refcount") provided and used a helper for decrementing skb usage, but I missed at least a spot for it. This change remove some more duplicated code reusing skb_unref() in napi_consume_skb(), too. The helper uses an additional, unneeded unlikely(!skb) test - napi_consume_skb() already check it a few lines above - but the compiler is smart enough to optimize the duplicated test out. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2017-06-14 Here's another batch of Bluetooth patches for the 4.13 kernel: - Fix for Broadcom controllers not supporting Event Mask Page 2 - New QCA ROME USB ID for btusb - Fix for Security Manager Protocol to use constant-time memcmp - Improved support for TI WiLink chips Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14bpf: permits narrower load from bpf program context fieldsYonghong Song
Currently, verifier will reject a program if it contains an narrower load from the bpf context structure. For example, __u8 h = __sk_buff->hash, or __u16 p = __sk_buff->protocol __u32 sample_period = bpf_perf_event_data->sample_period which are narrower loads of 4-byte or 8-byte field. This patch solves the issue by: . Introduce a new parameter ctx_field_size to carry the field size of narrower load from prog type specific *__is_valid_access validator back to verifier. . The non-zero ctx_field_size for a memory access indicates (1). underlying prog type specific convert_ctx_accesses supporting non-whole-field access (2). the current insn is a narrower or whole field access. . In verifier, for such loads where load memory size is less than ctx_field_size, verifier transforms it to a full field load followed by proper masking. . Currently, __sk_buff and bpf_perf_event_data->sample_period are supporting narrowing loads. . Narrower stores are still not allowed as typical ctx stores are just normal stores. Because of this change, some tests in verifier will fail and these tests are removed. As a bonus, rename some out of bound __sk_buff->cb access to proper field name and remove two redundant "skb cb oob" tests. Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14net_sched: move tcf_lock down after gen_replace_estimator()WANG Cong
Laura reported a sleep-in-atomic kernel warning inside tcf_act_police_init() which calls gen_replace_estimator() with spinlock protection. It is not necessary in this case, we already have RTNL lock here so it is enough to protect concurrent writers. For the reader, i.e. tcf_act_police(), it needs to make decision based on this rate estimator, in the worst case we drop more/less packets than necessary while changing the rate in parallel, it is still acceptable. Reported-by: Laura Abbott <labbott@redhat.com> Reported-by: Nick Huber <nicholashuber@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-13net: dsa: Introduce dsa_get_cpu_port()Florian Fainelli
Introduce a helper function which will return a reference to the CPU port used in a dsa_switch_tree. Right now this is a singleton, but this will change once we introduce multi-CPU port support, so ease the transition by converting the affected code paths. Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-13net: dsa: Associate slave network device with CPU portFlorian Fainelli
In preparation for supporting multiple CPU ports with DSA, have the dsa_port structure know which CPU it is associated with. This will be important in order to make sure the correct CPU is used for transmission of the frames. If not for functional reasons, for performance (e.g: load balancing) and forwarding decisions. Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-13net: dsa: Relocate master ethtool operationsFlorian Fainelli
Relocate master_ethtool_ops and master_orig_ethtool_ops into struct dsa_port in order to be both consistent, and make things self contained within the dsa_port structure. This is a preliminary change to supporting multiple CPU port interfaces. Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-13net: dsa: Remove master_netdev and use dst->cpu_dp->netdevFlorian Fainelli
In preparation for supporting multiple CPU ports, remove dst->master_netdev and ds->master_netdev and replace them with only one instance of the common object we have for a port: struct dsa_port::netdev. ds->master_netdev is currently write only and would be helpful in the case where we have two switches, both with CPU ports, and also connected within each other, which the multi-CPU port patch series would address. While at it, introduce a helper function used in net/dsa/slave.c to immediately get a reference on the master network device called dsa_master_netdev(). Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-13caif: Add sockaddr length check before accessing sa_family in connect handlerMateusz Jurczyk
Verify that the caller-provided sockaddr structure is large enough to contain the sa_family field, before accessing it in the connect() handler of the AF_CAIF socket. Since the syscall doesn't enforce a minimum size of the corresponding memory region, very short sockaddrs (zero or one byte long) result in operating on uninitialized memory while referencing sa_family. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>