summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-04-08xfrm: store xfrm_mode directly, not its addressFlorian Westphal
This structure is now only 4 bytes, so its more efficient to cache a copy rather than its address. No significant size difference in allmodconfig vmlinux. With non-modular kernel that has all XFRM options enabled, this series reduces vmlinux image size by ~11kb. All xfrm_mode indirections are gone and all modes are built-in. before (ipsec-next master): text data bss dec filename 21071494 7233140 11104324 39408958 vmlinux.master after this series: 21066448 7226772 11104324 39397544 vmlinux.patched With allmodconfig kernel, the size increase is only 362 bytes, even all the xfrm config options removed in this series are modular. before: text data bss dec filename 15731286 6936912 4046908 26715106 vmlinux.master after this series: 15731492 6937068 4046908 26715468 vmlinux Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08xfrm: make xfrm modes builtinFlorian Westphal
after previous changes, xfrm_mode contains no function pointers anymore and all modules defining such struct contain no code except an init/exit functions to register the xfrm_mode struct with the xfrm core. Just place the xfrm modes core and remove the modules, the run-time xfrm_mode register/unregister functionality is removed. Before: text data bss dec filename 7523 200 2364 10087 net/xfrm/xfrm_input.o 40003 628 440 41071 net/xfrm/xfrm_state.o 15730338 6937080 4046908 26714326 vmlinux 7389 200 2364 9953 net/xfrm/xfrm_input.o 40574 656 440 41670 net/xfrm/xfrm_state.o 15730084 6937068 4046908 26714060 vmlinux The xfrm*_mode_{transport,tunnel,beet} modules are gone. v2: replace CONFIG_INET6_XFRM_MODE_* IS_ENABLED guards with CONFIG_IPV6 ones rather than removing them. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08xfrm: remove afinfo pointer from xfrm_modeFlorian Westphal
Adds an EXPORT_SYMBOL for afinfo_get_rcu, as it will now be called from ipv6 in case of CONFIG_IPV6=m. This change has virtually no effect on vmlinux size, but it reduces afinfo size and allows followup patch to make xfrm modes const. v2: mark if (afinfo) tests as likely (Sabrina) re-fetch afinfo according to inner_mode in xfrm_prepare_input(). Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08xfrm: remove output2 indirection from xfrm_modeFlorian Westphal
similar to previous patch: no external module dependencies, so we can avoid the indirection by placing this in the core. This change removes the last indirection from xfrm_mode and the xfrm4|6_mode_{beet,tunnel}.c modules contain (almost) no code anymore. Before: text data bss dec hex filename 3957 136 0 4093 ffd net/xfrm/xfrm_output.o 587 44 0 631 277 net/ipv4/xfrm4_mode_beet.o 649 32 0 681 2a9 net/ipv4/xfrm4_mode_tunnel.o 625 44 0 669 29d net/ipv6/xfrm6_mode_beet.o 599 32 0 631 277 net/ipv6/xfrm6_mode_tunnel.o After: text data bss dec hex filename 5359 184 0 5543 15a7 net/xfrm/xfrm_output.o 171 24 0 195 c3 net/ipv4/xfrm4_mode_beet.o 171 24 0 195 c3 net/ipv4/xfrm4_mode_tunnel.o 172 24 0 196 c4 net/ipv6/xfrm6_mode_beet.o 172 24 0 196 c4 net/ipv6/xfrm6_mode_tunnel.o v2: fold the *encap_add functions into xfrm*_prepare_output preserve (move) output2 comment (Sabrina) use x->outer_mode->encap, not inner fix a build breakage on ppc (kbuild robot) Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08xfrm: remove input2 indirection from xfrm_modeFlorian Westphal
No external dependencies on any module, place this in the core. Increase is about 1800 byte for xfrm_input.o. The beet helpers get added to internal header, as they can be reused from xfrm_output.c in the next patch (kernel contains several copies of them in the xfrm{4,6}_mode_beet.c files). Before: text data bss dec filename 5578 176 2364 8118 net/xfrm/xfrm_input.o 1180 64 0 1244 net/ipv4/xfrm4_mode_beet.o 171 40 0 211 net/ipv4/xfrm4_mode_transport.o 1163 40 0 1203 net/ipv4/xfrm4_mode_tunnel.o 1083 52 0 1135 net/ipv6/xfrm6_mode_beet.o 172 40 0 212 net/ipv6/xfrm6_mode_ro.o 172 40 0 212 net/ipv6/xfrm6_mode_transport.o 1056 40 0 1096 net/ipv6/xfrm6_mode_tunnel.o After: text data bss dec filename 7373 200 2364 9937 net/xfrm/xfrm_input.o 587 44 0 631 net/ipv4/xfrm4_mode_beet.o 171 32 0 203 net/ipv4/xfrm4_mode_transport.o 649 32 0 681 net/ipv4/xfrm4_mode_tunnel.o 625 44 0 669 net/ipv6/xfrm6_mode_beet.o 172 32 0 204 net/ipv6/xfrm6_mode_ro.o 172 32 0 204 net/ipv6/xfrm6_mode_transport.o 599 32 0 631 net/ipv6/xfrm6_mode_tunnel.o v2: pass inner_mode to xfrm_inner_mode_encap_remove to fix AF_UNSPEC selector breakage (bisected by Benedict Wong) Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08xfrm: remove gso_segment indirection from xfrm_modeFlorian Westphal
These functions are small and we only have versions for tunnel and transport mode for ipv4 and ipv6 respectively. Just place the 'transport or tunnel' conditional in the protocol specific function instead of using an indirection. Before: 3226 12 0 3238 net/ipv4/esp4_offload.o 7004 492 0 7496 net/ipv4/ip_vti.o 3339 12 0 3351 net/ipv6/esp6_offload.o 11294 460 0 11754 net/ipv6/ip6_vti.o 1180 72 0 1252 net/ipv4/xfrm4_mode_beet.o 428 48 0 476 net/ipv4/xfrm4_mode_transport.o 1271 48 0 1319 net/ipv4/xfrm4_mode_tunnel.o 1083 60 0 1143 net/ipv6/xfrm6_mode_beet.o 172 48 0 220 net/ipv6/xfrm6_mode_ro.o 429 48 0 477 net/ipv6/xfrm6_mode_transport.o 1164 48 0 1212 net/ipv6/xfrm6_mode_tunnel.o 15730428 6937008 4046908 26714344 vmlinux After: 3461 12 0 3473 net/ipv4/esp4_offload.o 7000 492 0 7492 net/ipv4/ip_vti.o 3574 12 0 3586 net/ipv6/esp6_offload.o 11295 460 0 11755 net/ipv6/ip6_vti.o 1180 64 0 1244 net/ipv4/xfrm4_mode_beet.o 171 40 0 211 net/ipv4/xfrm4_mode_transport.o 1163 40 0 1203 net/ipv4/xfrm4_mode_tunnel.o 1083 52 0 1135 net/ipv6/xfrm6_mode_beet.o 172 40 0 212 net/ipv6/xfrm6_mode_ro.o 172 40 0 212 net/ipv6/xfrm6_mode_transport.o 1056 40 0 1096 net/ipv6/xfrm6_mode_tunnel.o 15730424 6937008 4046908 26714340 vmlinux Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08xfrm: remove xmit indirection from xfrm_modeFlorian Westphal
There are only two versions (tunnel and transport). The ip/ipv6 versions are only differ in sizeof(iphdr) vs ipv6hdr. Place this in the core and use x->outer_mode->encap type to call the correct adjustment helper. Before: text data bss dec filename 15730311 6937008 4046908 26714227 vmlinux After: 15730428 6937008 4046908 26714344 vmlinux (about 117 byte increase) v2: use family from x->outer_mode, not inner Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08xfrm: remove output indirection from xfrm_modeFlorian Westphal
Same is input indirection. Only exception: we need to export xfrm_outer_mode_output for pktgen. Increases size of vmlinux by about 163 byte: Before: text data bss dec filename 15730208 6936948 4046908 26714064 vmlinux After: 15730311 6937008 4046908 26714227 vmlinux xfrm_inner_extract_output has no more external callers, make it static. v2: add IS_ENABLED(IPV6) guard in xfrm6_prepare_output add two missing breaks in xfrm_outer_mode_output (Sabrina Dubroca) add WARN_ON_ONCE for 'call AF_INET6 related output function, but CONFIG_IPV6=n' case. make xfrm_inner_extract_output static Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08xfrm: remove input indirection from xfrm_modeFlorian Westphal
No need for any indirection or abstraction here, both functions are pretty much the same and quite small, they also have no external dependencies. xfrm_prepare_input can then be made static. With allmodconfig build, size increase of vmlinux is 25 byte: Before: text data bss dec filename 15730207 6936924 4046908 26714039 vmlinux After: 15730208 6936948 4046908 26714064 vmlinux v2: Fix INET_XFRM_MODE_TRANSPORT name in is-enabled test (Sabrina Dubroca) change copied comment to refer to transport and network header, not skb->{h,nh}, which don't exist anymore. (Sabrina) make xfrm_prepare_input static (Eyal Birger) Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08xfrm: prefer family stored in xfrm_mode structFlorian Westphal
Now that we have the family available directly in the xfrm_mode struct, we can use that and avoid one extra dereference. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08xfrm: place af number into xfrm_mode structFlorian Westphal
This will be useful to know if we're supposed to decode ipv4 or ipv6. While at it, make the unregister function return void, all module_exit functions did just BUG(); there is never a point in doing error checks if there is no way to handle such error. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-03-24vti4: eliminated some duplicate code.Jeremy Sowden
The ipip tunnel introduced in commit dd9ee3444014 ("vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel") largely duplicated the existing vti_input and vti_recv functions. Refactored to deduplicate the common code. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-03-24xfrm: gso partial offload supportBoris Pismenny
This patch introduces support for gso partial ESP offload. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-03-23tcp: remove conditional branches from tcp_mstamp_refresh()Eric Dumazet
tcp_clock_ns() (aka ktime_get_ns()) is using monotonic clock, so the checks we had in tcp_mstamp_refresh() are no longer relevant. This patch removes cpu stall (when the cache line is not hot) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-23net: phy: Correct Cygnus/Omega PHY driver promptFlorian Fainelli
The tristate prompt should have been replaced rather than defined a few lines below, rebase mistake. Fixes: 17cc9821766c ("net: phy: Move Omega PHY entry to Cygnus PHY driver") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-22genetlink: make policy common to familyJohannes Berg
Since maxattr is common, the policy can't really differ sanely, so make it common as well. The only user that did in fact manage to make a non-common policy is taskstats, which has to be really careful about it (since it's still using a common maxattr!). This is no longer supported, but we can fake it using pre_doit. This reduces the size of e.g. nl80211.o (which has lots of commands): text data bss dec hex filename 398745 14323 2240 415308 6564c net/wireless/nl80211.o (before) 397913 14331 2240 414484 65314 net/wireless/nl80211.o (after) -------------------------------- -832 +8 0 -824 Which is obviously just 8 bytes for each command, and an added 8 bytes for the new policy pointer. I'm not sure why the ops list is counted as .text though. Most of the code transformations were done using the following spatch: @ops@ identifier OPS; expression POLICY; @@ struct genl_ops OPS[] = { ..., { - .policy = POLICY, }, ... }; @@ identifier ops.OPS; expression ops.POLICY; identifier fam; expression M; @@ struct genl_family fam = { .ops = OPS, .maxattr = M, + .policy = POLICY, ... }; This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing the cb->data as ops, which we want to change in a later genl patch. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-22r8169: use netif_start_queue instead of netif_wake_qeueue in rtl8169_start_xmitHeiner Kallweit
Replace the call to netif_wake_queue in rtl8169_start_xmit with netif_start_queue as we don't need to actually wake up the queue since we are still in mid transmit so we just need to reset the bit so it doesn't prevent the next transmit. (Description shamelessly copied from a mail sent by Alex.) Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-22net: phy: aquantia: add downshift supportHeiner Kallweit
Aquantia PHY's of the AQR107 family support the downshift feature. Add support for it as standard PHY tunable so that it can be controlled via ethtool. The AQCS109 supports a proprietary 2-pair 1Gbps mode. If two such PHY's are connected to each other with a 2-pair cable, they may not be able to establish a link if both advertise modes > 1Gbps. v2: - add downshift event detection - warn if downshift occurred - read downshifted rate from vendor register - enable downshift per default on all AQR107 family members Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21Merge branch 'Refactor-flower-classifier-to-remove-dependency-on-rtnl-lock'David S. Miller
Vlad Buslov says: ==================== Refactor flower classifier to remove dependency on rtnl lock Currently, all netlink protocol handlers for updating rules, actions and qdiscs are protected with single global rtnl lock which removes any possibility for parallelism. This patch set is a third step to remove rtnl lock dependency from TC rules update path. Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added. TC rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER, etc.) are already registered with this flag and only take rtnl lock when qdisc or classifier requires it. Classifiers can indicate that their ops callbacks don't require caller to hold rtnl lock by setting the TCF_PROTO_OPS_DOIT_UNLOCKED flag. The goal of this change is to refactor flower classifier to support unlocked execution and register it with unlocked flag. This patch set implements following changes to make flower classifier concurrency-safe: - Implement reference counting for individual filters. Change fl_get to take reference to filter. Implement tp->ops->put callback that was introduced in cls API patch set to release reference to flower filter. - Use tp->lock spinlock to protect internal classifier data structures from concurrent modification. - Handle concurrent tcf proto deletion by returning EAGAIN, which will cause cls API to retry and create new proto instance or return error to the user (depending on message type). - Handle concurrent insertion of filter with same priority and handle by returning EAGAIN, which will cause cls API to lookup filter again and process it accordingly to netlink message flags. - Extend flower mask with reference counting and protect masks list with masks_lock spinlock. - Prevent concurrent mask insertion by inserting temporary value to masks hash table. This is necessary because mask initialization is a sleeping operation and cannot be done while holding tp->lock. Both chain level and classifier level conflicts are resolved by returning -EAGAIN to cls API that results restart of whole operation. This retry mechanism is a result of fine-grained locking approach used in this and previous changes in series and is necessary to allow concurrent updates on same chain instance. Alternative approach would be to lock the whole chain while updating filters on any of child tp's, adding and removing classifier instances from the chain. However, since most CPU-intensive parts of filter update code are specifically in classifier code and its dependencies (extensions and hw offloads), such approach would negate most of the gains introduced by this change and previous changes in the series when updating same chain instance. Tcf hw offloads API is not changed by this patch set and still requires caller to hold rtnl lock. Refactored flower classifier tracks rtnl lock state by means of 'rtnl_held' flag provided by cls API and obtains the lock before calling hw offloads. Following patch set will lift this restriction and refactor cls hw offloads API to support unlocked execution. With these changes flower classifier is safely registered with TCF_PROTO_OPS_DOIT_UNLOCKED flag in last patch. Changes from V2 to V3: - Rebase on latest net-next Changes from V1 to V2: - Extend cover letter with explanation about retry mechanism. - Rebase on current net-next. - Patch 1: - Use rcu_dereference_raw() for tp->root dereference. - Update comment in fl_head_dereference(). - Patch 2: - Remove redundant check in fl_change error handling code. - Add empty line between error check and new handle assignment. - Patch 3: - Refactor loop in fl_get_next_filter() to improve readability. - Patch 4: - Refactor __fl_delete() to improve readability. - Patch 6: - Fix comment in fl_check_assign_mask(). - Patch 9: - Extend commit message. - Fix error code in comment. - Patch 11: - Fix fl_hw_replace_filter() to always release rtnl lock in error handlers. - Patch 12: - Don't take rtnl lock before calling __fl_destroy_filter() in workqueue context. - Extend commit message with explanation why flower still takes rtnl lock before calling hardware offloads API. Github: <https://github.com/vbuslov/linux/tree/unlocked-flower-cong3> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: sched: flower: set unlocked flag for flower proto opsVlad Buslov
Set TCF_PROTO_OPS_DOIT_UNLOCKED for flower classifier to indicate that its ops callbacks don't require caller to hold rtnl lock. Don't take rtnl lock in fl_destroy_filter_work() that is executed on workqueue instead of being called by cls API and is not affected by setting TCF_PROTO_OPS_DOIT_UNLOCKED. Rtnl mutex is still manually taken by flower classifier before calling hardware offloads API that has not been updated for unlocked execution. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: sched: flower: track rtnl lock stateVlad Buslov
Use 'rtnl_held' flag to track if caller holds rtnl lock. Propagate the flag to internal functions that need to know rtnl lock state. Take rtnl lock before calling tcf APIs that require it (hw offload, bind filter, etc.). Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: sched: flower: protect flower classifier state with spinlockVlad Buslov
struct tcf_proto was extended with spinlock to be used by classifiers instead of global rtnl lock. Use it to protect shared flower classifier data structures (handle_idr, mask hashtable and list) and fields of individual filters that can be accessed concurrently. This patch set uses tcf_proto->lock as per instance lock that protects all filters on tcf_proto. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: sched: flower: handle concurrent tcf proto deletionVlad Buslov
Without rtnl lock protection tcf proto can be deleted concurrently. Check tcf proto 'deleting' flag after taking tcf spinlock to verify that no concurrent deletion is in progress. Return EAGAIN error if concurrent deletion detected, which will cause caller to retry and possibly create new instance of tcf proto. Retry mechanism is a result of fine-grained locking approach used in this and previous changes in series and is necessary to allow concurrent updates on same chain instance. Alternative approach would be to lock the whole chain while updating filters on any of child tp's, adding and removing classifier instances from the chain. However, since most CPU-intensive parts of filter update code are specifically in classifier code and its dependencies (extensions and hw offloads), such approach would negate most of the gains introduced by this change and previous changes in the series when updating same chain instance. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: sched: flower: handle concurrent filter insertion in fl_changeVlad Buslov
Check if user specified a handle and another filter with the same handle was inserted concurrently. Return EAGAIN to retry filter processing (in case it is an overwrite request). Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: sched: flower: protect masks list with spinlockVlad Buslov
Protect modifications of flower masks list with spinlock to remove dependency on rtnl lock and allow concurrent access. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: sched: flower: handle concurrent mask insertionVlad Buslov
Without rtnl lock protection masks with same key can be inserted concurrently. Insert temporary mask with reference count zero to masks hashtable. This will cause any concurrent modifications to retry. Wait for rcu grace period to complete after removing temporary mask from masks hashtable to accommodate concurrent readers. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Suggested-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: sched: flower: add reference counter to flower maskVlad Buslov
Extend fl_flow_mask structure with reference counter to allow parallel modification without relying on rtnl lock. Use rcu read lock to safely lookup mask and increment reference counter in order to accommodate concurrent deletes. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: sched: flower: track filter deletion with flagVlad Buslov
In order to prevent double deletion of filter by concurrent tasks when rtnl lock is not used for synchronization, add 'deleted' filter field. Check value of this field when modifying filters and return error if concurrent deletion is detected. Refactor __fl_delete() to accept pointer to 'last' boolean as argument, and return error code as function return value instead. This is necessary to signal concurrent filter delete to caller. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: sched: flower: introduce reference counting for filtersVlad Buslov
Extend flower filters with reference counting in order to remove dependency on rtnl lock in flower ops and allow to modify filters concurrently. Reference to flower filter can be taken/released concurrently as soon as it is marked as 'unlocked' by last patch in this series. Use atomic reference counter type to make concurrent modifications safe. Always take reference to flower filter while working with it: - Modify fl_get() to take reference to filter. - Implement tp->put() callback as fl_put() function to allow cls API to release reference taken by fl_get(). - Modify fl_change() to assume that caller holds reference to fold and take reference to fnew. - Take reference to filter while using it in fl_walk(). Implement helper functions to get/put filter reference counter. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: sched: flower: refactor fl_changeVlad Buslov
As a preparation for using classifier spinlock instead of relying on external rtnl lock, rearrange code in fl_change. The goal is to group the code which changes classifier state in single block in order to allow following commits in this set to protect it from parallel modification with tp->lock. Data structures that require tp->lock protection are mask hashtable and filters list, and classifier handle_idr. fl_hw_replace_filter() is a sleeping function and cannot be called while holding a spinlock. In order to execute all sequence of changes to shared classifier data structures atomically, call fl_hw_replace_filter() before modifying them. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: sched: flower: don't check for rtnl on head dereferenceVlad Buslov
Flower classifier only changes root pointer during init and destroy. Cls API implements reference counting for tcf_proto, so there is no danger of concurrent access to tp when it is being destroyed, even without protection provided by rtnl lock. Implement new function fl_head_dereference() to dereference tp->root without checking for rtnl lock. Use it in all flower function that obtain head pointer instead of rtnl_dereference(). Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21nfp: remove defines for unused control bitsJakub Kicinski
NFP driver ABI contains bits for L2 switching which were never implemented in initially envisioned form. Remove the defines, and open up the possibility of reclaiming the bits for other uses. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21Merge branch 'rhashtable-cleanups'David S. Miller
NeilBrown says: ==================== Two clean-ups for rhashtable. These two patches make small improvements to rhashtable, but are otherwise unrelated. Thanks to Herbert, Miguel, and Paul for the review. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21rhashtable: rename rht_for_each*continue as *from.NeilBrown
The pattern set by list.h is that for_each..continue() iterators start at the next entry after the given one, while for_each..from() iterators start at the given entry. The rht_for_each*continue() iterators are documented as though the start at the 'next' entry, but actually start at the given entry, and they are used expecting that behaviour. So fix the documentation and change the names to *from for consistency with list.h Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21rhashtable: don't hold lock on first table throughout insertion.NeilBrown
rhashtable_try_insert() currently holds a lock on the bucket in the first table, while also locking buckets in subsequent tables. This is unnecessary and looks like a hold-over from some earlier version of the implementation. As insert and remove always lock a bucket in each table in turn, and as insert only inserts in the final table, there cannot be any races that are not covered by simply locking a bucket in each table in turn. When an insert call reaches that last table it can be sure that there is no matchinf entry in any other table as it has searched them all, and insertion never happens anywhere but in the last table. The fact that code tests for the existence of future_tbl while holding a lock on the relevant bucket ensures that two threads inserting the same key will make compatible decisions about which is the "last" table. This simplifies the code and allows the ->rehash field to be discarded. We still need a way to ensure that a dead bucket_table is never re-linked by rhashtable_walk_stop(). This can be achieved by calling call_rcu() inside the locked region, and checking with rcu_head_after_call_rcu() in rhashtable_walk_stop() to see if the bucket table is empty and dead. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21Merge branch 'net-phy-Move-Omega-PHY-entry-to-Cygnus-PHY-driver'David S. Miller
Florian Fainelli says: ==================== net: phy: Move Omega PHY entry to Cygnus PHY driver In order to pave the way for adding some specific Omega PHY features that may not be desirable on other products covered by the bcm7xxx PHY driver, split the Omega PHY entry into the Cygnus PHY driver such that the PHY drivers are reflective of product lines/business units maintaining them within Broadcom. No functional changes intended. ==================== Acked-by: Arun Parameswaran <arun.parameswaran@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: phy: Move Omega PHY entry to Cygnus PHY driverFlorian Fainelli
Cygnus and Omega are part of the same business unit and product line, it makes sense to group PHY entries by products such that a platform can select only the drivers that it needs. Bring all the functionality that the BCM7XXX_28NM_GPHY() macro hides for us and remove the Omega PHY entry from bcm7xxx.c. As an added bonus, we now have a proper mdio_device_id entry to permit auto-loading. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: phy: Prepare for moving Omega out of bcm7xxxFlorian Fainelli
The Omega PHY entry was added to bcm7xxx.c out of convenience and this breaks the one driver per product line paradigm that was applied up until now. Since the AFE initialization is shared between Omega and BCM7xxx move the relevant functions to bcm-phy-lib.[ch]. No functional changes introduced. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: dst: remove gc leftoversJulian Wiedmann
Get rid of some obsolete gc-related documentation and macros that were missed in commit 5b7c9a8ff828 ("net: remove dst gc related code"). CC: Wei Wang <weiwan@google.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21Merge branch 'net-broadcom-Remove-print-of-base-address'David S. Miller
Florian Fainelli says: ==================== net: broadcom: Remove print of base address Some broadcom MDIO/switch/Ethernet MAC drivers insist on printing the base register virtual address which has little value. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: systemport: Remove print of base addressFlorian Fainelli
Since commit ad67b74d2469 ("printk: hash addresses printed with %p") pointers are being hashed when printed. Displaying the virtual memory at bootup time is not helpful, especially given we use a dev_info() which already displays the platform device's address. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: dsa: bcm_sf2: Remove print of base addressFlorian Fainelli
Since commit ad67b74d2469 ("printk: hash addresses printed with %p") pointers are being hashed when printed. Displaying the virtual memory at bootup time is not helpful, we use a dev_info() print which already displays the platform device's address. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net: phy: mdio-bcm-unimac: Remove print of base addressFlorian Fainelli
Since commit ad67b74d2469 ("printk: hash addresses printed with %p") pointers are being hashed when printed. Displaying the virtual memory at bootup time is not helpful, especially given we use a dev_info() which already displays the platform device's address. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21ipv6: Remove fallback argument from ip6_hold_safeDavid Ahern
net and null_fallback are redundant. Remove null_fallback in favor of !net check. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21ipv4: Allow amount of dirty memory from fib resizing to be controllableDavid Ahern
fib_trie implementation calls synchronize_rcu when a certain amount of pages are dirty from freed entries. The number of pages was determined experimentally in 2009 (commit c3059477fce2d). At the current setting, synchronize_rcu is called often -- 51 times in a second in one test with an average of an 8 msec delay adding a fib entry. The total impact is a lot of slow down modifying the fib. This is seen in the output of 'time' - the difference between real time and sys+user. For example, using 720,022 single path routes and 'ip -batch'[1]: $ time ./ip -batch ipv4/routes-1-hops real 0m14.214s user 0m2.513s sys 0m6.783s So roughly 35% of the actual time to install the routes is from the ip command getting scheduled out, most notably due to synchronize_rcu (this is observed using 'perf sched timehist'). This patch makes the amount of dirty memory configurable between 64k where the synchronize_rcu is called often (small, low end systems that are memory sensitive) to 64M where synchronize_rcu is called rarely during a large FIB change (for high end systems with lots of memory). The default is 512kB which corresponds to the current setting of 128 pages with a 4kB page size. As an example, at 16MB the worst interval shows 4 calls to synchronize_rcu in a second blocking for up to 30 msec in a single instance, and a total of almost 100 msec across the 4 calls in the second. The trade off is allowing FIB entries to consume more memory in a given time window but but with much better fib insertion rates (~30% increase in prefixes/sec). With this patch and net.ipv4.fib_sync_mem set to 16MB, the same batch file runs in: $ time ./ip -batch ipv4/routes-1-hops real 0m9.692s user 0m2.491s sys 0m6.769s So the dead time is reduced to about 1/2 second or <5% of the real time. [1] 'ip' modified to not request ACK messages which improves route insertion times by about 20% Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21tun: Remove unused first parameter of tun_get_iff()Kirill Tkhai
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of tun deviceKirill Tkhai
In commit f2780d6d7475 "tun: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of tun device" it was missed that tun may change its net ns, while net ns of socket remains the same as it was created initially. SIOCGSKNS returns net ns of socket, so it is not suitable for obtaining net ns of device. We may have two tun devices with the same names in two net ns, and in this case it's not possible to determ, which of them fd refers to (TUNGETIFF will return the same name). This patch adds new ioctl() cmd for obtaining net ns of a device. Reported-by: Harald Albrecht <harald.albrecht@gmx.net> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21Merge branch 'ipv6-Change-addrconf_f6i_alloc-to-use-ip6_route_info_create'David S. Miller
David Ahern says: ==================== ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create addrconf_f6i_alloc is the last caller of fib6_info_alloc besides ip6_route_info_create. There really is no good reason for it do its own fib6_info initialization, so convert it to call ip6_route_info_create. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21ipv6: Change addrconf_f6i_alloc to use ip6_route_info_createDavid Ahern
Change addrconf_f6i_alloc to generate a fib6_config and call ip6_route_info_create. addrconf_f6i_alloc is the last caller to fib6_info_alloc besides ip6_route_info_create, and there is no reason for it to do its own initialization on a fib6_info. Host routes need to be created even if the device is down, so add a new flag, fc_ignore_dev_down, to fib6_config and update fib6_nh_init to not error out if device is not up. Notes on the conversion: - ip_fib_metrics_init is the same as fib6_config has fc_mx set to NULL and fc_mx_len set to 0 - dst_nocount is handled by the RTF_ADDRCONF flag - dst_host is handled by fc_dst_len = 128 nh_gw does not get set after the conversion to ip6_route_info_create but it should not be set in addrconf_f6i_alloc since this is a host route not a gateway route. Everything else is a straight forward map between fib6_info and fib6_config. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21ipv6: Move setting default metric for routesDavid Ahern
ip6_route_info_create is a low level function for ensuring fc_metric is set. Move the check and default setting to the 2 locations that do not already set fc_metric before calling ip6_route_info_create. This is required for the next patch which moves addrconf allocations to ip6_route_info_create and want the metric for host routes to be 0. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>