summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2018-10-02tls: Add support for inplace records encryptionVakul Garg
Presently, for non-zero copy case, separate pages are allocated for storing plaintext and encrypted text of records. These pages are stored in sg_plaintext_data and sg_encrypted_data scatterlists inside record structure. Further, sg_plaintext_data & sg_encrypted_data are passed to cryptoapis for record encryption. Allocating separate pages for plaintext and encrypted text is inefficient from both required memory and performance point of view. This patch adds support of inplace encryption of records. For non-zero copy case, we reuse the pages from sg_encrypted_data scatterlist to copy the application's plaintext data. For the movement of pages from sg_encrypted_data to sg_plaintext_data scatterlists, we introduce a new function move_to_plaintext_sg(). This function add pages into sg_plaintext_data from sg_encrypted_data scatterlists. tls_do_encryption() is modified to pass the same scatterlist as both source and destination into aead_request_set_crypt() if inplace crypto has been enabled. A new ariable 'inplace_crypto' has been introduced in record structure to signify whether the same scatterlist can be used. By default, the inplace_crypto is enabled in get_rec(). If zero-copy is used (i.e. plaintext data is not copied), inplace_crypto is set to '0'. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Reviewed-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2018-09-30 Here's the first bluetooth-next pull request for the 4.20 kernel. - Fixes & cleanups to hci_qca driver - NULL dereference fix to debugfs - Improved L2CAP Connection-oriented Channel MTU & MPS handling - Added support for USB-based RTL8822C controller - Added device ID for BCM4335C0 UART-based controller - Various other smaller cleanups & fixes Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02net: usbnet: make driver_info constBen Dooks
The driver_info field that is used for describing each of the usb-net drivers using the usbnet.c core all declare their information as const and the usbnet.c itself does not try and modify the struct. It is therefore a good idea to make this const in the usbnet.c structure in case anyone tries to modify it. Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02tcp: do not release socket ownership in tcp_close()Eric Dumazet
syzkaller was able to hit the WARN_ON(sock_owned_by_user(sk)); in tcp_close() While a socket is being closed, it is very possible other threads find it in rtnetlink dump. tcp_get_info() will acquire the socket lock for a short amount of time (slow = lock_sock_fast(sk)/unlock_sock_fast(sk, slow);), enough to trigger the warning. Fixes: 67db3e4bfbc9 ("tcp: no longer hold ehash lock while calling tcp_get_info()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02net: remove 1 always zero parameter from ip6_redirect_no_header()Maciej Żenczykowski
(the parameter in question is mark) Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02qed: Add driver support for 20G link speed.Sudarsana Reddy Kalluru
Add driver support for configuring/reading the 20G link speed. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02net: drop unused skb_append_datato_frags()Paolo Abeni
This helper is unused since commit 988cf74deb45 ("inet: Stop generating UFO packets.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01tcp/fq: move back to CLOCK_MONOTONICEric Dumazet
In the recent TCP/EDT patch series, I switched TCP and sch_fq clocks from MONOTONIC to TAI, in order to meet the choice done earlier for sch_etf packet scheduler. But sure enough, this broke some setups were the TAI clock jumps forward (by almost 50 year...), as reported by Leonard Crestez. If we want to converge later, we'll probably need to add an skb field to differentiate the clock bases, or a socket option. In the meantime, an UDP application will need to use CLOCK_MONOTONIC base for its SCM_TXTIME timestamps if using fq packet scheduler. Fixes: 72b0094f9182 ("tcp: switch tcp_clock_ns() to CLOCK_TAI base") Fixes: 142537e41923 ("net_sched: sch_fq: switch to CLOCK_TAI") Fixes: fd2bca2aa789 ("tcp: switch internal pacing timer to CLOCK_TAI") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Leonard Crestez <leonard.crestez@nxp.com> Tested-by: Leonard Crestez <leonard.crestez@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: phy: improve handling delayed workHeiner Kallweit
Using mod_delayed_work() allows to simplify handling delayed work and removes the need for the sync parameter in phy_trigger_machine(). Also introduce a helper phy_queue_state_machine() to encapsulate the low-level delayed work calls. No functional change intended. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01netlink: add validation function to policyJohannes Berg
Add the ability to have an arbitrary validation function attached to a netlink policy that doesn't already use the validation_data pointer in another way. This can be useful to validate for example the content of a binary attribute, like in nl80211 the "(information) elements", which must be valid streams of "u8 type, u8 length, u8 value[length]". Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01netlink: add attribute range validation to policyJohannes Berg
Without further bloating the policy structs, we can overload the `validation_data' pointer with a struct of s16 min, max and use those to validate ranges in NLA_{U,S}{8,16,32,64} attributes. It may sound strange to validate NLA_U32 with a s16 max, but in many cases NLA_U32 is used for enums etc. since there's no size benefit in using a smaller attribute width anyway, due to netlink attribute alignment; in cases like that it's still useful, particularly when the attribute really transports an enum value. Doing so lets us remove quite a bit of validation code, if we can be sure that these attributes aren't used by userspace in places where they're ignored today. To achieve all this, split the 'type' field and introduce a new 'validation_type' field which indicates what further validation (beyond the validation prescribed by the type of the attribute) is done. This currently allows for no further validation (the default), as well as min, max and range checks. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: phy: Replace phy driver features u32 with link_mode bitmapAndrew Lunn
This is one step in allowing phylib to make use of link_mode bitmaps, instead of u32 for supported and advertised features. Convert the phy drivers to use bitmaps to indicates the features they support. Build bitmap equivalents of the u32 values at runtime, and have the drivers point to the appropriate bitmap. These bitmaps are shared, and we don't want a driver to modify them. So mark them __ro_after_init. Within phylib, the features bitmap is currently turned back into a u32. This will be removed once the whole of phylib, and the drivers are converted to use bitmaps. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: phy: Add limkmode equivalents to some of the MII ethtool helpersAndrew Lunn
Add helpers which take a linkmode rather than a u32 ethtool for advertising settings. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: phy: Add helper for advertise to lcl valueAndrew Lunn
Add a helper to convert the local advertising to an LCL capabilities, which is then used to resolve pause flow control settings. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: phy: Add helper to convert MII ADV register to a linkmodeAndrew Lunn
The phy_mii_ioctl can be used to write a value into the MII_ADVERTISE register in the PHY. Since this changes the state of the PHY, we need to make the same change to phydev->advertising. Add a helper which can convert the register value to a linkmode. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: phy: Add phydev_info()Andrew Lunn
Add phydev_info() and make use of it within the phy drivers and core code. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: phy: Add phydev_warn()Andrew Lunn
Not all new style LINK_MODE bits can be converted into old style SUPPORTED bits. We need to warn when such a conversion is attempted. Add a helper for this. Convert all pr_warn() calls to phydev_warn() where possible. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net: phy: Move linkmode helpers to somewhere publicAndrew Lunn
phylink has some useful helpers to working with linkmode bitmaps. Move them to there own header so other code can use them. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net/mlx5: Cache the system image guidAlaa Hleihel
The system image guid is a read-only field which is used by the TC offloads code to determine if two mlx5 devices belong to the same ASIC while adding flows. Read this once and save it on the core device rather than querying each time an offloaded flow is added. Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-29tls: Remove redundant vars from tls record structureVakul Garg
Structure 'tls_rec' contains sg_aead_in and sg_aead_out which point to a aad_space and then chain scatterlists sg_plaintext_data, sg_encrypted_data respectively. Rather than using chained scatterlists for plaintext and encrypted data in aead_req, it is efficient to store aad_space in sg_encrypted_data and sg_plaintext_data itself in the first index and get rid of sg_aead_in, sg_aead_in and further chaining. This requires increasing size of sg_encrypted_data & sg_plaintext_data arrarys by 1 to accommodate entry for aad_space. The code which uses sg_encrypted_data and sg_plaintext_data has been modified to skip first index as it points to aad_space. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-28Bluetooth: Fix debugfs NULL pointer dereferenceMatias Karhumaa
Fix crash caused by NULL pointer dereference when debugfs functions le_max_key_read, le_max_key_size_write, le_min_key_size_read or le_min_key_size_write and Bluetooth adapter was powered off. Fix is to move max_key_size and min_key_size from smp_dev to hci_dev. At the same time they were renamed to le_max_key_size and le_min_key_size. BUG: unable to handle kernel NULL pointer dereference at 00000000000002e8 PGD 0 P4D 0 Oops: 0000 [#24] SMP PTI CPU: 2 PID: 6255 Comm: cat Tainted: G D OE 4.18.9-200.fc28.x86_64 #1 Hardware name: LENOVO 4286CTO/4286CTO, BIOS 8DET76WW (1.46 ) 06/21/2018 RIP: 0010:le_max_key_size_read+0x45/0xb0 [bluetooth] Code: 00 00 00 48 83 ec 10 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 48 8b 87 c8 00 00 00 48 8d 7c 24 04 48 8b 80 48 0a 00 00 <48> 8b 80 e8 02 00 00 0f b6 48 52 e8 fb b6 b3 ed be 04 00 00 00 48 RSP: 0018:ffffab23c3ff3df0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00007f0b4ca2e000 RCX: ffffab23c3ff3f08 RDX: ffffffffc0ddb033 RSI: 0000000000000004 RDI: ffffab23c3ff3df4 RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000 R10: ffffab23c3ff3ed8 R11: 0000000000000000 R12: ffffab23c3ff3f08 R13: 00007f0b4ca2e000 R14: 0000000000020000 R15: ffffab23c3ff3f08 FS: 00007f0b4ca0f540(0000) GS:ffff91bd5e280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000002e8 CR3: 00000000629fa006 CR4: 00000000000606e0 Call Trace: full_proxy_read+0x53/0x80 __vfs_read+0x36/0x180 vfs_read+0x8a/0x140 ksys_read+0x4f/0xb0 do_syscall_64+0x5b/0x160 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Matias Karhumaa <matias.karhumaa@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-09-28netlink: add nested array policy validationJohannes Berg
Sometimes nested netlink attributes are just used as arrays, with the nla_type() of each not being used; we have this in nl80211 and e.g. NFTA_SET_ELEM_LIST_ELEMENTS. Add the ability to validate this type of message directly in the policy, by adding the type NLA_NESTED_ARRAY which does exactly this: require a first level of nesting but ignore the attribute type, and then inside each require a second level of nested and validate those attributes against a given policy (if present). Note that some nested array types actually require that all of the entries have the same index, this is possible to express in a nested policy already, apart from the validation that only the one allowed type is used. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-28netlink: allow NLA_NESTED to specify nested policy to validateJohannes Berg
Now that we have a validation_data pointer, and the len field in the policy is unused for NLA_NESTED, we can allow using them both to have nested validation. This can be nice in code, although we still have to use nla_parse_nested() or similar which would also take a policy; however, it also serves as documentation in the policy without requiring a look at the code. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-28netlink: make validation_data constJohannes Berg
The validation data is only used within the policy that should usually already be const, and isn't changed in any code that uses it. Therefore, make the validation_data pointer const. While at it, remove the duplicate variable in the bitfield validation that I'd otherwise have to change to const. Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-28netlink: remove NLA_NESTED_COMPATJohannes Berg
This isn't used anywhere, so we might as well get rid of it. Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-27Bluetooth: L2CAP: Derive rx credits from MTU and MPSLuiz Augusto von Dentz
Give enough rx credits for a full packet instead of using an arbitrary number which may not be enough depending on the MTU and MPS which can cause interruptions while waiting for more credits, also remove debugfs entry for l2cap_le_max_credits. With these changes the credits are restored after each SDU is received instead of using fixed threshold, this way it is garanteed that there will always be enough credits to send a packet without waiting more credits to arrive. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-09-27Bluetooth: L2CAP: Derive MPS from connection MTULuiz Augusto von Dentz
This ensures the MPS can fit in a single HCI fragment so each segment don't have to be reassembled at HCI level, in addition to that also remove the debugfs entry to configure the MPS. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-09-27Bluetooth: Add definitions and track LE resolve list modificationAnkit Navik
Add the definitions for adding entries to the LE resolve list and removing entries from the LE resolve list. When the LE resolve list gets changed via HCI commands make sure that the internal storage of the resolve list entries gets updated. Signed-off-by: Ankit Navik <ankit.p.navik@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-09-26net-ipv4: remove 2 always zero parameters from ipv4_redirect()Maciej Żenczykowski
(the parameters in question are mark and flow_flags) Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26net-ipv4: remove 2 always zero parameters from ipv4_update_pmtu()Maciej Żenczykowski
(the parameters in question are mark and flow_flags) Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26tcp: expose sk_state in tcp_retransmit_skb tracepointYafang Shao
After sk_state exposed, we can get in which state this retransmission occurs. That could give us more detail for dignostic. For example, if this retransmission occurs in SYN_SENT state, it may also indicates that the syn packet may be dropped on the remote peer due to syn backlog queue full and then we could check the remote peer. BTW,SYNACK retransmission is traced in tcp_retransmit_synack tracepoint. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26net/af_iucv: locate IUCV header via skb_network_header()Julian Wiedmann
This patch attempts to untangle the TX and RX code in qeth from af_iucv's respective HiperTransport path: On the TX side, pointing skb_network_header() at the IUCV header means that qeth_l3_fill_af_iucv_hdr() no longer needs a magical offset to access the header. On the RX side, qeth pulls the (fake) L2 header off the skb like any normal ethernet driver would. This makes working with the IUCV header in af_iucv easier, since we no longer have to assume a fixed skb layout. While at it, replace the open-coded length checks in af_iucv's RX path with pskb_may_pull(). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-09-25 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Allow for RX stack hardening by implementing the kernel's flow dissector in BPF. Idea was originally presented at netconf 2017 [0]. Quote from merge commit: [...] Because of the rigorous checks of the BPF verifier, this provides significant security guarantees. In particular, the BPF flow dissector cannot get inside of an infinite loop, as with CVE-2013-4348, because BPF programs are guaranteed to terminate. It cannot read outside of packet bounds, because all memory accesses are checked. Also, with BPF the administrator can decide which protocols to support, reducing potential attack surface. Rarely encountered protocols can be excluded from dissection and the program can be updated without kernel recompile or reboot if a bug is discovered. [...] Also, a sample flow dissector has been implemented in BPF as part of this work, from Petar and Willem. [0] http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf 2) Add support for bpftool to list currently active attachment points of BPF networking programs providing a quick overview similar to bpftool's perf subcommand, from Yonghong. 3) Fix a verifier pruning instability bug where a union member from the register state was not cleared properly leading to branches not being pruned despite them being valid candidates, from Alexei. 4) Various smaller fast-path optimizations in XDP's map redirect code, from Jesper. 5) Enable to recognize BPF_MAP_TYPE_REUSEPORT_SOCKARRAY maps in bpftool, from Roman. 6) Remove a duplicate check in libbpf that probes for function storage, from Taeung. 7) Fix an issue in test_progs by avoid checking for errno since on success its value should not be checked, from Mauricio. 8) Fix unused variable warning in bpf_getsockopt() helper when CONFIG_INET is not configured, from Anders. 9) Fix a compilation failure in the BPF sample code's use of bpf_flow_keys, from Prashant. 10) Minor cleanups in BPF code, from Yue and Zhong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2018-09-25 This series contains updates to i40e and xsk. Mariusz fixes an issue where the VF link state was not being updated properly when the PF is down or up. Also cleaned up the promiscuous configuration during a VF reset. Patryk simplifies the code a bit to use the variables for PF and HW that are declared, rather than using the VSI pointers. Cleaned up the message length parameter to several virtchnl functions, since it was not being used (or needed). Harshitha fixes two potential race conditions when trying to change VF settings by creating a helper function to validate that the VF is enabled and that the VSI is set up. Sergey corrects a double "link down" message by putting in a check for whether or not the link is up or going down. Björn addresses an AF_XDP zero-copy issue that buffers passed from userspace to the kernel was leaked when the hardware descriptor ring was torn down. A zero-copy capable driver picks buffers off the fill ring and places them on the hardware receive ring to be completed at a later point when DMA is complete. Similar on the transmit side; The driver picks buffers off the transmit ring and places them on the transmit hardware ring. In the typical flow, the receive buffer will be placed onto an receive ring (completed to the user), and the transmit buffer will be placed on the completion ring to notify the user that the transfer is done. However, if the driver needs to tear down the hardware rings for some reason (interface goes down, reconfiguration and such), the userspace buffers cannot be leaked. They have to be reused or completed back to userspace. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25net: sched: implement tcf_block_refcnt_{get|put}()Vlad Buslov
Implement get/put function for blocks that only take/release the reference and perform deallocation. These functions are intended to be used by unlocked rules update path to always hold reference to block while working with it. They use on new fine-grained locking mechanisms introduced in previous patches in this set, instead of relying on global protection provided by rtnl lock. Extract code that is common with tcf_block_detach_ext() into common function __tcf_block_put(). Extend tcf_block with rcu to allow safe deallocation when it is accessed concurrently. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25net: sched: change tcf block reference counter type to refcount_tVlad Buslov
As a preparation for removing rtnl lock dependency from rules update path, change tcf block reference counter type to refcount_t to allow modification by concurrent users. In block put function perform decrement and check reference counter once to accommodate concurrent modification by unlocked users. After this change tcf_chain_put at the end of block put function is called with block->refcnt==0 and will deallocate block after the last chain is released, so there is no need to manually deallocate block in this case. However, if block reference counter reached 0 and there are no chains to release, block must still be deallocated manually. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25net: sched: add helper function to take reference to QdiscVlad Buslov
Implement function to take reference to Qdisc that relies on rcu read lock instead of rtnl mutex. Function only takes reference to Qdisc if reference counter isn't zero. Intended to be used by unlocked cls API. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25net: sched: extend Qdisc with rcuVlad Buslov
Currently, Qdisc API functions assume that users have rtnl lock taken. To implement rtnl unlocked classifiers update interface, Qdisc API must be extended with functions that do not require rtnl lock. Extend Qdisc structure with rcu. Implement special version of put function qdisc_put_unlocked() that is called without rtnl lock taken. This function only takes rtnl lock if Qdisc reference counter reached zero and is intended to be used as optimization. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25net: sched: rename qdisc_destroy() to qdisc_put()Vlad Buslov
Current implementation of qdisc_destroy() decrements Qdisc reference counter and only actually destroy Qdisc if reference counter value reached zero. Rename qdisc_destroy() to qdisc_put() in order for it to better describe the way in which this function currently implemented and used. Extract code that deallocates Qdisc into new private qdisc_destroy() function. It is intended to be shared between regular qdisc_put() and its unlocked version that is introduced in next patch in this series. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25net: core: netlink: add helper refcount dec and lock functionVlad Buslov
Rtnl lock is encapsulated in netlink and cannot be accessed by other modules directly. This means that reference counted objects that rely on rtnl lock cannot use it with refcounter helper function that atomically releases decrements reference and obtains mutex. This patch implements simple wrapper function around refcount_dec_and_lock that obtains rtnl lock if reference counter value reached 0. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25net: xsk: add a simple buffer reuse queueJakub Kicinski
XSK UMEM is strongly single producer single consumer so reuse of frames is challenging. Add a simple "stash" of FILL packets to reuse for drivers to optionally make use of. This is useful when driver has to free (ndo_stop) or resize a ring with an active AF_XDP ZC socket. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-09-25Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Version bump conflict in batman-adv, take what's in net-next. iavf conflict, adjustment of netdev_ops in net-next conflicting with poll controller method removal in net. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25Revert "uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct ↵Lubomir Rintel
member name" This changes UAPI, breaking iwd and libell: ell/key.c: In function 'kernel_dh_compute': ell/key.c:205:38: error: 'struct keyctl_dh_params' has no member named 'private'; did you mean 'dh_private'? struct keyctl_dh_params params = { .private = private, ^~~~~~~ dh_private This reverts commit 8a2336e549d385bb0b46880435b411df8d8200e8. Fixes: 8a2336e549d3 ("uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name") Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: David Howells <dhowells@redhat.com> cc: Randy Dunlap <rdunlap@infradead.org> cc: Mat Martineau <mathew.j.martineau@linux.intel.com> cc: Stephan Mueller <smueller@chronox.de> cc: James Morris <jmorris@namei.org> cc: "Serge E. Hallyn" <serge@hallyn.com> cc: Mat Martineau <mathew.j.martineau@linux.intel.com> cc: Andrew Morton <akpm@linux-foundation.org> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: <stable@vger.kernel.org> Signed-off-by: James Morris <james.morris@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-25Merge gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman
Dave writes: "Networking fixes: 1) Fix multiqueue handling of coalesce timer in stmmac, from Jose Abreu. 2) Fix memory corruption in NFC, from Suren Baghdasaryan. 3) Don't write reserved bits in ravb driver, from Kazuya Mizuguchi. 4) SMC bug fixes from Karsten Graul, YueHaibing, and Ursula Braun. 5) Fix TX done race in mvpp2, from Antoine Tenart. 6) ipv6 metrics leak, from Wei Wang. 7) Adjust firmware version requirements in mlxsw, from Petr Machata. 8) Fix autonegotiation on resume in r8169, from Heiner Kallweit. 9) Fixed missing entries when dumping /proc/net/if_inet6, from Jeff Barnhill. 10) Fix double free in devlink, from Dan Carpenter. 11) Fix ethtool regression from UFO feature removal, from Maciej Żenczykowski. 12) Fix drivers that have a ndo_poll_controller() that captures the cpu entirely on loaded hosts by trying to drain all rx and tx queues, from Eric Dumazet. 13) Fix memory corruption with jumbo frames in aquantia driver, from Friedemann Gerold." * gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (79 commits) net: mvneta: fix the remaining Rx descriptor unmapping issues ip_tunnel: be careful when accessing the inner header mpls: allow routes on ip6gre devices net: aquantia: memory corruption on jumbo frames tun: remove ndo_poll_controller nfp: remove ndo_poll_controller bnxt: remove ndo_poll_controller bnx2x: remove ndo_poll_controller mlx5: remove ndo_poll_controller mlx4: remove ndo_poll_controller i40evf: remove ndo_poll_controller ice: remove ndo_poll_controller igb: remove ndo_poll_controller ixgb: remove ndo_poll_controller fm10k: remove ndo_poll_controller ixgbevf: remove ndo_poll_controller ixgbe: remove ndo_poll_controller bonding: use netpoll_poll_dev() helper netpoll: make ndo_poll_controller() optional rds: Fix build regression. ...
2018-09-24net/tls: Fixed race condition in async encryptionVakul Garg
On processors with multi-engine crypto accelerators, it is possible that multiple records get encrypted in parallel and their encryption completion is notified to different cpus in multicore processor. This leads to the situation where tls_encrypt_done() starts executing in parallel on different cores. In current implementation, encrypted records are queued to tx_ready_list in tls_encrypt_done(). This requires addition to linked list 'tx_ready_list' to be protected. As tls_decrypt_done() could be executing in irq content, it is not possible to protect linked list addition operation using a lock. To fix the problem, we remove linked list addition operation from the irq context. We do tx_ready_list addition/removal operation from application context only and get rid of possible multiple access to the linked list. Before starting encryption on the record, we add it to the tail of tx_ready_list. To prevent tls_tx_records() from transmitting it, we mark the record with a new flag 'tx_ready' in 'struct tls_rec'. When record encryption gets completed, tls_encrypt_done() has to only update the 'tx_ready' flag to true & linked list add operation is not required. The changed logic brings some other side benefits. Since the records are always submitted in tls sequence number order for encryption, the tx_ready_list always remains sorted and addition of new records to it does not have to traverse the linked list. Lastly, we renamed tx_ready_list in 'struct tls_sw_context_tx' to 'tx_list'. This is because now, the some of the records at the tail are not ready to transmit. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24neighbour: send netlink notification if NTF_ROUTER changesRoopa Prabhu
send netlink notification if neigh_update results in NTF_ROUTER change and if NEIGH_UPDATE_F_ISROUTER is on. Also move the NTF_ROUTER change function into a helper. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24net/sched: Add hardware specific counters to TC actionsEelco Chaudron
Add additional counters that will store the bytes/packets processed by hardware. These will be exported through the netlink interface for displaying by the iproute2 tc tool Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24net/core: Add new basic hardware counterEelco Chaudron
Add a new hardware specific basic counter, TCA_STATS_BASIC_HW. This can be used to count packets/bytes processed by hardware offload. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-23netpoll: make ndo_poll_controller() optionalEric Dumazet
As diagnosed by Song Liu, ndo_poll_controller() can be very dangerous on loaded hosts, since the cpu calling ndo_poll_controller() might steal all NAPI contexts (for all RX/TX queues of the NIC). This capture can last for unlimited amount of time, since one cpu is generally not able to drain all the queues under load. It seems that all networking drivers that do use NAPI for their TX completions, should not provide a ndo_poll_controller(). NAPI drivers have netpoll support already handled in core networking stack, since netpoll_poll_dev() uses poll_napi(dev) to iterate through registered NAPI contexts for a device. This patch allows netpoll_poll_dev() to process NAPI contexts even for drivers not providing ndo_poll_controller(), allowing for following patches in NAPI drivers. Also we export netpoll_poll_dev() so that it can be called by bonding/team drivers in following patches. Reported-by: Song Liu <songliubraving@fb.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-23Merge tag 'mfd-fixes-4.19' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Lee writes: "MFD fixes for v4.19 - Fix Dialog DA9063 regulator constraints issue causing failure in probe - Fix OMAP Device Tree compatible strings to match DT" * tag 'mfd-fixes-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: mfd: omap-usb-host: Fix dts probe of children mfd: da9063: Fix DT probing with constraints