summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2019-09-12net: dsa: microchip: remove NET_DSA_TAG_KSZ_COMMONGeorge McCollister
Remove the superfluous NET_DSA_TAG_KSZ_COMMON and just use the existing NET_DSA_TAG_KSZ. Update the description to mention the three switch families it supports. No functional change. Signed-off-by: George McCollister <george.mccollister@gmail.com> Reviewed-by: Marek Vasut <marex@denx.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-11tcp: force a PSH flag on TSO packetsEric Dumazet
When tcp sends a TSO packet, adding a PSH flag on it reduces the sojourn time of GRO packet in GRO receivers. This is particularly the case under pressure, since RX queues receive packets for many concurrent flows. A sender can give a hint to GRO engines when it is appropriate to flush a super-packet, especially when pacing is in the picture, since next packet is probably delayed by one ms. Having less packets in GRO engine reduces chance of LRU eviction or inflated RTT, and reduces GRO cost. We found recently that we must not set the PSH flag on individual full-size MSS segments [1] : Under pressure (CWR state), we better let the packet sit for a small delay (depending on NAPI logic) so that the ACK packet is delayed, and thus next packet we send is also delayed a bit. Eventually the bottleneck queue can be drained. DCTCP flows with CWND=1 have demonstrated the issue. This patch allows to slowdown the aggregate traffic without involving high resolution timers on senders and/or receivers. It has been used at Google for about four years, and has been discussed at various networking conferences. [1] segments smaller than MSS already have PSH flag set by tcp_sendmsg() / tcp_mark_push(), unless MSG_MORE has been requested by the user. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Tariq Toukan <tariqt@mellanox.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-11ipv6: Don't use dst gateway directly in ip6_confirm_neigh()Stefano Brivio
This is the equivalent of commit 2c6b55f45d53 ("ipv6: fix neighbour resolution with raw socket") for ip6_confirm_neigh(): we can send a packet with MSG_CONFIRM on a raw socket for a connected route, so the gateway would be :: here, and we should pick the next hop using rt6_nexthop() instead. This was found by code review and, to the best of my knowledge, doesn't actually fix a practical issue: the destination address from the packet is not considered while confirming a neighbour, as ip6_confirm_neigh() calls choose_neigh_daddr() without passing the packet, so there are no similar issues as the one fixed by said commit. A possible source of issues with the existing implementation might come from the fact that, if we have a cached dst, we won't consider it, while rt6_nexthop() takes care of that. I might just not be creative enough to find a practical problem here: the only way to affect this with cached routes is to have one coming from an ICMPv6 redirect, but if the next hop is a directly connected host, there should be no topology for which a redirect applies here, and tests with redirected routes show no differences for MSG_CONFIRM (and MSG_PROBE) packets on raw sockets destined to a directly connected host. However, directly using the dst gateway here is not consistent anymore with neighbour resolution, and, in general, as we want the next hop, using rt6_nexthop() looks like the only sane way to fetch it. Reported-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Guillaume Nault <gnault@redhat.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-11Merge tag 'mac80211-next-for-davem-2019-09-11' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== We have a number of changes, but things are settling down: * a fix in the new 6 GHz channel support * a fix for recent minstrel (rate control) updates for an infinite loop * handle interface type changes better wrt. management frame registrations (for management frames sent to userspace) * add in-BSS RX time to survey information * handle HW rfkill properly if !CONFIG_RFKILL * send deauth on IBSS station expiry, to avoid state mismatches * handle deferred crypto tailroom updates in mac80211 better when device restart happens * fix a spectre-v1 - really a continuation of a previous patch * advertise NL80211_CMD_UPDATE_FT_IES as supported if so * add some missing parsing in VHT extended NSS support * support HE in mac80211_hwsim * let mac80211 drivers determine the max MTU themselves along with the usual cleanups etc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-11cfg80211: Purge frame registrations on iftype changeDenis Kenzior
Currently frame registrations are not purged, even when changing the interface type. This can lead to potentially weird situations where frames possibly not allowed on a given interface type remain registered due to the type switching happening after registration. The kernel currently relies on userspace apps to actually purge the registrations themselves, this is not something that the kernel should rely on. Add a call to cfg80211_mlme_purge_registrations() to forcefully remove any registrations left over prior to switching the iftype. Cc: stable@vger.kernel.org Signed-off-by: Denis Kenzior <denkenz@gmail.com> Link: https://lore.kernel.org/r/20190828211110.15005-1-denkenz@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11nl80211: Fix possible Spectre-v1 for CQM RSSI thresholdsMasashi Honma
commit 1222a1601488 ("nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds") was incomplete and requires one more fix to prevent accessing to rssi_thresholds[n] because user can control rssi_thresholds[i] values to make i reach to n. For example, rssi_thresholds = {-400, -300, -200, -100} when last is -34. Cc: stable@vger.kernel.org Fixes: 1222a1601488 ("nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Masashi Honma <masashi.honma@gmail.com> Link: https://lore.kernel.org/r/20190908005653.17433-1-masashi.honma@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11mac80211: allow drivers to set max MTUWen Gong
Make it possibly for drivers to adjust the default max_mtu by storing it in the hardware struct and using that value for all interfaces. Signed-off-by: Wen Gong <wgong@codeaurora.org> Link: https://lore.kernel.org/r/1567738137-31748-1-git-send-email-wgong@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11cfg80211: Do not compare with boolean in nl80211_common_reg_change_eventzhong jiang
With the help of boolinit.cocci, we use !nl80211_reg_change_event_fill instead of (nl80211_reg_change_event_fill == false). Meanwhile, Clean up the code. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Link: https://lore.kernel.org/r/1567657537-65472-1-git-send-email-zhongjiang@huawei.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11mac80211: IBSS: send deauth when expiring inactive STAsJohannes Berg
When we expire an inactive station, try to send it a deauth. This helps if it's actually still around, and just has issues with beacon distribution (or we do), and it will not also remove us. Then, if we have shared state, this may not be reset properly, causing problems; for example, we saw a case where aggregation sessions weren't removed properly (due to the TX start being offloaded to firmware and it relying on deauth for stop), causing a lot of traffic to get lost due to the SN reset after remove/add of the peer. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830112451.21655-9-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11mac80211: don't check if key is NULL in ieee80211_key_link()Luca Coelho
We already assume that key is not NULL and dereference it in a few other places before we check whether it is NULL, so the check is unnecessary. Remove it. Fixes: 96fc6efb9ad9 ("mac80211: IEEE 802.11 Extended Key ID support") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830112451.21655-8-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11mac80211: clear crypto tx tailroom counter upon keys enableLior Cohen
In case we got a fw restart while roaming from encrypted AP to non-encrypted one, we might end up with hitting a warning on the pending counter crypto_tx_tailroom_pending_dec having a non-zero value. The following comment taken from net/mac80211/key.c explains the rational for the delayed tailroom needed: /* * The reason for the delayed tailroom needed decrementing is to * make roaming faster: during roaming, all keys are first deleted * and then new keys are installed. The first new key causes the * crypto_tx_tailroom_needed_cnt to go from 0 to 1, which invokes * the cost of synchronize_net() (which can be slow). Avoid this * by deferring the crypto_tx_tailroom_needed_cnt decrementing on * key removal for a while, so if we roam the value is larger than * zero and no 0->1 transition happens. * * The cost is that if the AP switching was from an AP with keys * to one without, we still allocate tailroom while it would no * longer be needed. However, in the typical (fast) roaming case * within an ESS this usually won't happen. */ The next flow lead to the warning eventually reported as a bug: 1. Disconnect from encrypted AP 2. Set crypto_tx_tailroom_pending_dec = 1 for the key 3. Schedule work 4. Reconnect to non-encrypted AP 5. Add a new key, setting the tailroom counter = 1 6. Got FW restart while pending counter is set ---> hit the warning While on it, the ieee80211_reset_crypto_tx_tailroom() func was merged into its single caller ieee80211_reenable_keys (previously called ieee80211_enable_keys). Also, we reset the crypto_tx_tailroom_pending_dec and remove the counters warning as we just reset both. Signed-off-by: Lior Cohen <lior2.cohen@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830112451.21655-7-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11mac80211: remove unnecessary key conditionJohannes Berg
When we reach this point, the key cannot be NULL. Remove the condition that suggests otherwise. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830112451.21655-6-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11mac80211: list features in WEP/TKIP disable in better orderJohannes Berg
"HE/HT/VHT" is a bit confusing since really the order of development (and possible support) is different - change this to "HT/VHT/HE". Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830112451.21655-4-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11cfg80211: always shut down on HW rfkillJohannes Berg
When the RFKILL subsystem isn't available, then rfkill_blocked() always returns false. In the case of hardware rfkill this will be wrong though, as if the hardware reported being killed then it cannot operate any longer. Since we only ever call the rfkill_sync work in this case, just rename it to rfkill_block and always pass "true" for the blocked parameter, rather than passing rfkill_blocked(). We rely on the underlying driver to still reject any new attempt to bring up the device by itself. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830112451.21655-2-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11mac80211: vht: add support VHT EXT NSS BW in parsing VHTMordechay Goodstein
This fixes was missed in parsing the vht capabilities max bw support. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Fixes: e80d642552a3 ("mac80211: copy VHT EXT NSS BW Support/Capable data to station") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/20190830114057.22197-1-luca@coelho.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-11cfg80211: fix boundary value in ieee80211_frequency_to_channel()Arend van Spriel
The boundary value used for the 6G band was incorrect as it would result in invalid 6G channel number for certain frequencies. Reported-by: Amar Singhal <asinghal@codeaurora.org> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Link: https://lore.kernel.org/r/1567510772-24263-1-git-send-email-arend.vanspriel@broadcom.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-09-10devlink: add 'reset_dev_on_drv_probe' paramDirk van der Merwe
Add the 'reset_dev_on_drv_probe' devlink parameter, controlling the device reset policy on driver probe. This parameter is useful in conjunction with the existing 'fw_load_policy' parameter. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07net/tls: align non temporal copy to cache linesJakub Kicinski
Unlike normal TCP code TLS has to touch the cache lines it copies into to fill header info. On memory-heavy workloads having non temporal stores and normal accesses targeting the same cache line leads to significant overhead. Measured 3% overhead running 3600 round robin connections with additional memory heavy workload. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07net/tls: remove the record tail optimizationJakub Kicinski
For TLS device offload the tag/message authentication code are filled in by the device. The kernel merely reserves space for them. Because device overwrites it, the contents of the tag make do no matter. Current code tries to save space by reusing the header as the tag. This, however, leads to an additional frag being created and defeats buffer coalescing (which trickles all the way down to the drivers). Remove this optimization, and try to allocate the space for the tag in the usual way, leave the memory uninitialized. If memory allocation fails rewind the record pointer so that we use the already copied user data as tag. Note that the optimization was actually buggy, as the tag for TLS 1.2 is 16 bytes, but header is just 13, so the reuse may had looked past the end of the page.. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07net/tls: use RCU for the adder to the offload record listJakub Kicinski
All modifications to TLS record list happen under the socket lock. Since records form an ordered queue readers are only concerned about elements being removed, additions can happen concurrently. Use RCU primitives to ensure the correct access types (READ_ONCE/WRITE_ONCE). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07net/tls: unref frags in orderJakub Kicinski
It's generally more cache friendly to walk arrays in order, especially those which are likely not in cache. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2019-09-06 Here's the main bluetooth-next pull request for the 5.4 kernel. - Cleanups & fixes to btrtl driver - Fixes for Realtek devices in btusb, e.g. for suspend handling - Firmware loading support for BCM4345C5 - hidp_send_message() return value handling fixes - Added support for utilizing Fast Advertising Interval - Various other minor cleanups & fixes Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07ipmr: remove hard code cache_resolve_queue_len limitHangbin Liu
This is a re-post of previous patch wrote by David Miller[1]. Phil Karn reported[2] that on busy networks with lots of unresolved multicast routing entries, the creation of new multicast group routes can be extremely slow and unreliable. The reason is we hard-coded multicast route entries with unresolved source addresses(cache_resolve_queue_len) to 10. If some multicast route never resolves and the unresolved source addresses increased, there will be no ability to create new multicast route cache. To resolve this issue, we need either add a sysctl entry to make the cache_resolve_queue_len configurable, or just remove cache_resolve_queue_len limit directly, as we already have the socket receive queue limits of mrouted socket, pointed by David. >From my side, I'd perfer to remove the cache_resolve_queue_len limit instead of creating two more(IPv4 and IPv6 version) sysctl entry. [1] https://lkml.org/lkml/2018/7/22/11 [2] https://lkml.org/lkml/2018/7/21/343 v3: instead of remove cache_resolve_queue_len totally, let's only remove the hard code limit when allocate the unresolved cache, as Eric Dumazet suggested, so we don't need to re-count it in other places. v2: hold the mfc_unres_lock while walking the unresolved list in queue_count(), as Nikolay Aleksandrov remind. Reported-by: Phil Karn <karn@ka9q.net> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07tcp: ulp: fix possible crash in tcp_diag_get_aux_size()Eric Dumazet
tcp_diag_get_aux_size() can be called with sockets in any state. icsk_ulp_ops is only present for full sockets. For SYN_RECV or TIME_WAIT ones we would access garbage. Fixes: 61723b393292 ("tcp: ulp: add functions to dump ulp-specific information") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Luke Hsiao <lukehsiao@google.com> Reported-by: Neal Cardwell <ncardwell@google.com> Cc: Davide Caratti <dcaratti@redhat.com> Acked-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07net: fib_notifier: move fib_notifier_ops from struct net into per-net structJiri Pirko
No need for fib_notifier_ops to be in struct net. It is used only by fib_notifier as a private data. Use net_generic to introduce per-net fib_notifier struct and move fib_notifier_ops there. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Add nft_reg_store64() and nft_reg_load64() helpers, from Ander Juaristi. 2) Time matching support, also from Ander Juaristi. 3) VLAN support for nfnetlink_log, from Michael Braun. 4) Support for set element deletions from the packet path, also from Ander. 5) Remove __read_mostly from conntrack spinlock, from Li RongQing. 6) Support for updating stateful objects, this also includes the initial client for this infrastructure: the quota extension. A follow up fix for the control plane also comes in this batch. Patches from Fernando Fernandez Mancera. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add the ability to use unaligned chunks in the AF_XDP umem. By relaxing where the chunks can be placed, it allows to use an arbitrary buffer size and place whenever there is a free address in the umem. Helps more seamless DPDK AF_XDP driver integration. Support for i40e, ixgbe and mlx5e, from Kevin and Maxim. 2) Addition of a wakeup flag for AF_XDP tx and fill rings so the application can wake up the kernel for rx/tx processing which avoids busy-spinning of the latter, useful when app and driver is located on the same core. Support for i40e, ixgbe and mlx5e, from Magnus and Maxim. 3) bpftool fixes for printf()-like functions so compiler can actually enforce checks, bpftool build system improvements for custom output directories, and addition of 'bpftool map freeze' command, from Quentin. 4) Support attaching/detaching XDP programs from 'bpftool net' command, from Daniel. 5) Automatic xskmap cleanup when AF_XDP socket is released, and several barrier/{read,write}_once fixes in AF_XDP code, from Björn. 6) Relicense of bpf_helpers.h/bpf_endian.h for future libbpf inclusion as well as libbpf versioning improvements, from Andrii. 7) Several new BPF kselftests for verifier precision tracking, from Alexei. 8) Several BPF kselftest fixes wrt endianess to run on s390x, from Ilya. 9) And more BPF kselftest improvements all over the place, from Stanislav. 10) Add simple BPF map op cache for nfp driver to batch dumps, from Jakub. 11) AF_XDP socket umem mapping improvements for 32bit archs, from Ivan. 12) Add BPF-to-BPF call and BTF line info support for s390x JIT, from Yauheni. 13) Small optimization in arm64 JIT to spare 1 insns for BPF_MOD, from Jerin. 14) Fix an error check in bpf_tcp_gen_syncookie() helper, from Petar. 15) Various minor fixes and cleanups, from Nathan, Masahiro, Masanari, Peter, Wei, Yue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-06Bluetooth: hidp: Fix assumptions on the return value of hidp_send_messageDan Elkouby
hidp_send_message was changed to return non-zero values on success, which some other bits did not expect. This caused spurious errors to be propagated through the stack, breaking some drivers, such as hid-sony for the Dualshock 4 in Bluetooth mode. As pointed out by Dan Carpenter, hid-microsoft directly relied on that assumption as well. Fixes: 48d9cc9d85dd ("Bluetooth: hidp: Let hidp_send_message return number of queued bytes") Signed-off-by: Dan Elkouby <streetwalkermc@gmail.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-09-06net_sched: act_police: add 2 new attributes to support police 64bit rate and ↵David Dai
peakrate For high speed adapter like Mellanox CX-5 card, it can reach upto 100 Gbits per second bandwidth. Currently htb already supports 64bit rate in tc utility. However police action rate and peakrate are still limited to 32bit value (upto 32 Gbits per second). Add 2 new attributes TCA_POLICE_RATE64 and TCA_POLICE_RATE64 in kernel for 64bit support so that tc utility can use them for 64bit rate and peakrate value to break the 32bit limit, and still keep the backward binary compatibility. Tested-by: David Dai <zdai@linux.vnet.ibm.com> Signed-off-by: David Dai <zdai@linux.vnet.ibm.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-06net: openvswitch: Set OvS recirc_id from tc chain indexPaul Blakey
Offloaded OvS datapath rules are translated one to one to tc rules, for example the following simplified OvS rule: recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2) Will be translated to the following tc rule: $ tc filter add dev dev1 ingress \ prio 1 chain 0 proto ip \ flower tcp ct_state -trk \ action ct pipe \ action goto chain 2 Received packets will first travel though tc, and if they aren't stolen by it, like in the above rule, they will continue to OvS datapath. Since we already did some actions (action ct in this case) which might modify the packets, and updated action stats, we would like to continue the proccessing with the correct recirc_id in OvS (here recirc_id(2)) where we left off. To support this, introduce a new skb extension for tc, which will be used for translating tc chain to ovs recirc_id to handle these miss cases. Last tc chain index will be set by tc goto chain action and read by OvS datapath. Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05Bluetooth: mgmt: Use struct_size() helperGustavo A. R. Silva
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct mgmt_rp_get_connections { ... struct mgmt_addr_info addr[0]; } __packed; Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. So, replace the following form: sizeof(*rp) + (i * sizeof(struct mgmt_addr_info)); with: struct_size(rp, addr, i) Also, notice that, in this case, variable rp_len is not necessary, hence it is removed. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-09-05Bluetooth: 6lowpan: Make variable header_ops constantNishka Dasgupta
Static variable header_ops, of type header_ops, is used only once, when it is assigned to field header_ops of a variable having type net_device. This corresponding field is declared as const in the definition of net_device. Hence make header_ops constant as well to protect it from unnecessary modification. Issue found with Coccinelle. Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-09-05Bluetooth: Add support for utilizing Fast Advertising IntervalSpoorthi Ravishankar Koppad
Changes made to add support for fast advertising interval as per core 4.1 specification, section 9.3.11.2. A peripheral device entering any of the following GAP modes and sending either non-connectable advertising events or scannable undirected advertising events should use adv_fast_interval2 (100ms - 150ms) for adv_fast_period(30s). - Non-Discoverable Mode - Non-Connectable Mode - Limited Discoverable Mode - General Discoverable Mode Signed-off-by: Spoorthi Ravishankar Koppad <spoorthix.k@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-09-05xsk: lock the control mutex in sock_diag interfaceBjörn Töpel
When accessing the members of an XDP socket, the control mutex should be held. This commit fixes that. Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Fixes: a36b38aa2af6 ("xsk: add sock_diag interface for AF_XDP") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-05xsk: use state member for socket synchronizationBjörn Töpel
Prior the state variable was introduced by Ilya, the dev member was used to determine whether the socket was bound or not. However, when dev was read, proper SMP barriers and READ_ONCE were missing. In order to address the missing barriers and READ_ONCE, we start using the state variable as a point of synchronization. The state member read/write is paired with proper SMP barriers, and from this follows that the members described above does not need READ_ONCE if used in conjunction with state check. In all syscalls and the xsk_rcv path we check if state is XSK_BOUND. If that is the case we do a SMP read barrier, and this implies that the dev, umem and all rings are correctly setup. Note that no READ_ONCE are needed for these variable if used when state is XSK_BOUND (plus the read barrier). To summarize: The members struct xdp_sock members dev, queue_id, umem, fq, cq, tx, rx, and state were read lock-less, with incorrect barriers and missing {READ, WRITE}_ONCE. Now, umem, fq, cq, tx, rx, and state are read lock-less. When these members are updated, WRITE_ONCE is used. When read, READ_ONCE are only used when read outside the control mutex (e.g. mmap) or, not synchronized with the state member (XSK_BOUND plus smp_rmb()) Note that dev and queue_id do not need a WRITE_ONCE or READ_ONCE, due to the introduce state synchronization (XSK_BOUND plus smp_rmb()). Introducing the state check also fixes a race, found by syzcaller, in xsk_poll() where umem could be accessed when stale. Suggested-by: Hillf Danton <hdanton@sina.com> Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-05xsk: avoid store-tearing when assigning umemBjörn Töpel
The umem member of struct xdp_sock is read outside of the control mutex, in the mmap implementation, and needs a WRITE_ONCE to avoid potential store-tearing. Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Fixes: 423f38329d26 ("xsk: add umem fill queue support and mmap") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-05xsk: avoid store-tearing when assigning queuesBjörn Töpel
Use WRITE_ONCE when doing the store of tx, rx, fq, and cq, to avoid potential store-tearing. These members are read outside of the control mutex in the mmap implementation. Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Fixes: 37b076933a8e ("xsk: add missing write- and data-dependency barrier") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-05netfilter: nf_tables: fix possible null-pointer dereference in object updateFernando Fernandez Mancera
Not all objects have an update operation. If the object type doesn't implement an update operation and the user tries to update it will hit EOPNOTSUPP. Fixes: d62d0ba97b58 ("netfilter: nf_tables: Introduce stateful object update operation") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-05pppoatm: use %*ph to print small bufferAndy Shevchenko
Use %*ph format to print small buffer as hex string. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05Merge tag 'linux-can-next-for-5.4-20190904' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2019-09-04 j1939 this is a pull request for net-next/master consisting of 21 patches. the first 12 patches are by me and target the CAN core infrastructure. They clean up the names of variables , structs and struct members, convert can_rx_register() to use max() instead of open coding it and remove unneeded code from the can_pernet_exit() callback. The next three patches are also by me and they introduce and make use of the CAN midlayer private structure. It is used to hold protocol specific per device data structures. The next patch is by Oleksij Rempel, switches the &net->can.rcvlists_lock from a spin_lock() to a spin_lock_bh(), so that it can be used from NAPI (soft IRQ) context. The next 4 patches are by Kurt Van Dijck, he first updates his email address via mailmap and then extends sockaddr_can to include j1939 members. The final patch is the collective effort of many entities (The j1939 authors: Oliver Hartkopp, Bastian Stender, Elenita Hinds, kbuild test robot, Kurt Van Dijck, Maxime Jayat, Robin van der Gracht, Oleksij Rempel, Marc Kleine-Budde). It adds support of SAE J1939 protocol to the CAN networking stack. SAE J1939 is the vehicle bus recommended practice used for communication and diagnostics among vehicle components. Originating in the car and heavy-duty truck industry in the United States, it is now widely used in other parts of the world. P.S.: This pull request doesn't invalidate my last pull request: "pull-request: can-next 2019-09-03". ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05net: mpoa: Use kzfree rather than its implementation.zhong jiang
Use kzfree instead of memset() + kfree(). Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05sunrpc: Use kzfree rather than its implementation.zhong jiang
Use kzfree instead of memset() + kfree(). Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05vsock/virtio: a better comment on credit updateMichael S. Tsirkin
The comment we have is just repeating what the code does. Include the *reason* for the condition instead. Cc: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05net/tls: dedup the record cleanupJakub Kicinski
If retransmit record hint fall into the cleanup window we will free it by just walking the list. No need to duplicate the code. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05net/tls: clean up the number of #ifdefs for CONFIG_TLS_DEVICEJakub Kicinski
TLS code has a number of #ifdefs which make the code a little harder to follow. Recent fixes removed the ifdef around the TLS_HW define, so we can switch to the often used pattern of defining tls_device functions as empty static inlines in the header when CONFIG_TLS_DEVICE=n. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05net/tls: narrow down the critical area of device_offload_lockJakub Kicinski
On setsockopt path we need to hold device_offload_lock from the moment we check netdev is up until the context is fully ready to be added to the tls_device_list. No need to hold it around the get_netdev_for_sock(). Change the code and remove the confusing comment. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05net/tls: don't jump to returnJakub Kicinski
Reusing parts of error path for normal exit will make next commit harder to read, untangle the two. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05net/tls: use the full sk_proto pointerJakub Kicinski
Since we already have the pointer to the full original sk_proto stored use that instead of storing all individual callback pointers as well. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05Convert usage of IN_MULTICAST to ipv4_is_multicastDave Taht
IN_MULTICAST's primary intent is as a uapi macro. Elsewhere in the kernel we use ipv4_is_multicast consistently. This patch unifies linux's multicast checks to use that function rather than this macro. Signed-off-by: Dave Taht <dave.taht@gmail.com> Reviewed-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05net/sched: cbs: remove redundant assignment to variable port_rateColin Ian King
Variable port_rate is being initialized with a value that is never read and is being re-assigned a little later on. The assignment is redundant and hence can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>