summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-11-08net: sched: set TCQ_F_OFFLOADED flag for MQJakub Kicinski
PRIO and RED mark the qdisc with TCQ_F_OFFLOADED upon successful offload, make MQ do the same. The consistency will help with consistent graft callback behaviour. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08net: sched: red: remove unnecessary red_dump_offload_stats parameterJakub Kicinski
Offload dump helper does not use opt parameter, remove it. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08net: sched: add an offload dump helperJakub Kicinski
Qdisc dump operation of offload-capable qdiscs performs a few extra steps which are identical among all the qdiscs. Add a helper to share this code. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08Merge branch 'net-phy-improve-and-simplify-phylib-state-machine'David S. Miller
Heiner Kallweit says: ==================== net: phy: improve and simplify phylib state machine This patch series is based on two axioms: - During autoneg a PHY always reports the link being down - Info in clause 22/45 registers doesn't allow to differentiate between these two states: 1. Link is physically down 2. A link partner is connected and PHY is autonegotiating In both cases "link up" and "aneg finished" bits aren't set. One consequence is that having separate states PHY_NOLINK and PHY_AN isn't needed. By using these two axioms the state machine can be significantly simplified. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08net: phy: use phy_check_link_status in more places in the state machineHeiner Kallweit
Use phy_check_link_status in more places in the state machine. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08net: phy: remove state PHY_ANHeiner Kallweit
After the recent changes in the state machine state PHY_AN isn't used any longer and can be removed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08net: phy: add phy_check_link_statusHeiner Kallweit
In few places in the state machine the state is set to PHY_RUNNING or PHY_NOLINK after doing a phy_read_status(). So factor this out to phy_check_link_status(). First use it in phy_start_aneg(): By setting the state to PHY_RUNNING or PHY_NOLINK directly we can remove the code to handle the case that we're using interrupts and aneg was finished already. Definition of phy_link_up and phy_link_down needs to be moved because they are called in the new function. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08net: phy: remove useless check in state machine case PHY_RESUMINGHeiner Kallweit
If aneg isn't finished yet then the PHY reports the link as down. There's no benefit in setting the state to PHY_AN because the next state machine run would set the status to PHY_NOLINK anyway (except in the meantime aneg has been finished and link is up). Therefore we can set the state to PHY_RUNNING or PHY_NOLINK directly. In addition change the do_carrier parameter in phy_link_down() to true. If carrier was marked as up before (what should never be the case because PHY was in state PHY_HALTED before) then we should mark it as down now. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08net: phy: remove useless check in state machine case PHY_NOLINKHeiner Kallweit
If aneg is enabled and the PHY reports the link as up then definitely aneg finished successfully. Therefore this check is useless and can be removed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07Merge branch '1GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2018-11-07 This series contains updates to almost all of the Intel wired LAN drivers. Lance Roy replaces a spin lock with lockdep_assert_held() for igbvf driver in move toward trying to remove spin_is_locked(). Colin Ian King fixes a potential null pointer dereference by adding a check in ixgbe. Also fixed the igc driver by properly assigning the return error code of a function call, so that we can properly check it. Shannon Nelson updates the ixgbe driver to not block IPsec offload when in VEPA mode, in VEB mode, IPsec offload is still blocked because the device drops packets into a black hole. Jake adds support for software timestamping for packets sent over ixgbevf. Also modifies i40e, iavf, igb, igc, and ixgbe to delay calling skb_tx_timestamp() to the latest point possible, which is just prior to notifying the hardware of the new Tx packet. Todd adds the new WoL filter flag so that we properly report that we do not support this new feature. YueHaibing from Huawei fixes the igc driver by cleaning up variables that are not "really" used. Dan Carpenter cleans up igc whitespace issues. Miroslav Lichvar fixes e1000e for potential underflow issue in the timecounter, so modify the driver to use timecounter_cyc2time() to allow non-monotonic SYSTIM readings. Sasha provides additional igc cleanups based on community feedback. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07Merge branch 'nfp-add-and-use-tunnel-netdev-helpers'David S. Miller
John Hurley says: ==================== nfp: add and use tunnel netdev helpers A recent patch introduced the function netif_is_vxlan() to verify the tunnel type of a given netdev as vxlan. Add a similar function to detect geneve netdevs and make use of this function in the NFP driver. Also make use of the vxlan helper where applicable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07nfp: flower: include geneve as supported offload tunnel typeJohn Hurley
Offload of geneve decap rules is supported in NFP. Include geneve in the check for supported types. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07nfp: flower: use geneve and vxlan helpersJohn Hurley
Make use of the recently added VXLAN and geneve helper functions to determine the type of the netdev from its rtnl_link_ops. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: add netif_is_geneve()John Hurley
Add a helper function to determine if the type of a netdev is geneve based on its rtnl_link_ops. This allows drivers that may wish to offload tunnels to check the underlying type of the device. A recent patch added a similar helper to vxlan.h Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07sfc: add missing NVRAM partition types for EF10Edward Cree
Expose the MUM/SUC Firmware, UEFI Expansion ROM and MC Status partitions of the NIC's NVRAM as MTDs if found on the NIC. The first two are needed in order to properly update them when performing firmware updates; the MC Status partition is used to determine whether a signed firmware image was accepted or rejected by a Secure NIC. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07Merge branch 'vlan-prepare-for-removal-of-VLAN_TAG_PRESENT'David S. Miller
Michał Mirosław says: ==================== net/vlan: prepare for removal of VLAN_TAG_PRESENT This is a preparatory patchset before removing the use of VLAN_TAG_PRESENT bit in skb->vlan_tci as indication of VLAN offload. This set includes only cleanups that allow abstracting of code testing VLAN tag presence in drivers and networking code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net/vlan: remove unused #define HAVE_VLAN_GET_TAGMichał Mirosław
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net/vlan: include the shift in skb_vlan_tag_get_prio()Michał Mirosław
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net/vlan: introduce __vlan_hwaccel_copy_tag() helperMichał Mirosław
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net/vlan: introduce __vlan_hwaccel_clear_tag() helperMichał Mirosław
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07inet: minor optimization for backlog setting in listen(2)Yafang Shao
Set the backlog earlier in inet_dccp_listen() and inet_listen(), then we can avoid the redundant setting. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: vlan: add support for tunnel offloadDavide Caratti
GSO tunneled packets are always segmented in software before they are transmitted by a VLAN, even when the lower device can offload tunnel encapsulation and VLAN together (i.e., some bits in NETIF_F_GSO_ENCAP_ALL mask are set in the lower device 'vlan_features'). If we let VLANs have the same tunnel offload capabilities as their lower device, throughput can improve significantly when CPU is limited on the transmitter side. - set NETIF_F_GSO_ENCAP_ALL bits in the VLAN 'hw_features', to ensure that 'features' will have those bits zeroed only when the lower device has no hardware support for tunnel encapsulation. - for the same reason, copy GSO-related bits of 'hw_enc_features' from lower device to VLAN, and ensure to update that value when the lower device changes its features. - set NETIF_F_HW_CSUM bit in the VLAN 'hw_enc_features' if 'real_dev' is able to compute checksums at least for a kind of packets, like done with commit 8403debeead8 ("vlan: Keep NETIF_F_HW_CSUM similar to other software devices"). This avoids software segmentation due to mismatching checksum capabilities between VLAN's 'features' and 'hw_enc_features'. Reported-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07tun: compute the RFS hash only if needed.Paolo Abeni
The tun XDP sendmsg code path, unconditionally computes the symmetric hash of each packet for RFS's sake, even when we could skip it. e.g. when the device has a single queue. This change adds the check already in-place for the skb sendmsg path to avoid unneeded hashing. The above gives small, but measurable, performance gain for VM xmit path when zerocopy is not enabled. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net/wan/fsl_ucc_hdlc: add BQL supportMathias Thore
Add byte queue limits support in the fsl_ucc_hdlc driver. Signed-off-by: Mathias Thore <mathias.thore@infinera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: phy: realtek: load driver for all PHYs with a Realtek OUIHeiner Kallweit
Instead of listing every single PHYID, load the driver for every PHYID with a Realtek OUI, independent of model number and revision. This patch also improves two further aspects: - constify realtek_tbl[] - the mask should have been 0xffffffff instead of 0x001fffff so far, by masking out some bits a PHY from another vendor could have been matched Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: phy: make phy_trigger_machine staticHeiner Kallweit
phy_trigger_machine() is used in phy.c only, so we can make it static. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: dsa: bcm_sf2: fix semicolon.cocci warningskbuild test robot
drivers/net/dsa/bcm_sf2_cfp.c:1168:2-3: Unneeded semicolon drivers/net/dsa/bcm_sf2_cfp.c:532:2-3: Unneeded semicolon Remove unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci Fixes: ae7a5aff783c ("net: dsa: bcm_sf2: Keep copy of inserted rules") CC: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: phy: bcm7xxx: Add entry for BCM7255Justin Chen
Add support for BCM7255 EPHY. Signed-off-by: Justin Chen <justinpopo6@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07Merge branch 'udp-gro'David S. Miller
Paolo Abeni says: ==================== udp: implement GRO support This series implements GRO support for UDP sockets, as the RX counterpart of commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT"). The core functionality is implemented by the second patch, introducing a new sockopt to enable UDP_GRO, while patch 3 implements support for passing the segment size to the user space via a new cmsg. UDP GRO performs a socket lookup for each ingress packets and aggregate datagram directed to UDP GRO enabled sockets with constant l4 tuple. UDP GRO packets can land on non GRO-enabled sockets, e.g. due to iptables NAT rules, and that could potentially confuse existing applications. The solution adopted here is to de-segment the GRO packet before enqueuing as needed. Since we must cope with packet reinsertion after de-segmentation, the relevant code is factored-out in ipv4 and ipv6 specific helpers and exposed to UDP usage. While the current code can probably be improved, this safeguard ,implemented in the patches 4-7, allows future enachements to enable UDP GSO offload on more virtual devices eventually even on forwarded packets. The last 4 for patches implement some performance and functional self-tests, re-using the existing udpgso infrastructure. The problematic scenario described above is explicitly tested. This revision of the series try to address the feedback provided by Willem and Subash on previous iteration. ==================== Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07selftests: add functionals test for UDP GROPaolo Abeni
Extends the existing udp programs to allow checking for proper GRO aggregation/GSO size, and run the tests via a shell script, using a veth pair with XDP program attached to trigger the GRO code path. rfc v3 -> v1: - use ip route to attach the xdp helper to the veth rfc v2 -> rfc v3: - add missing test program options documentation - fix sporatic test failures (receiver faster than sender) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07selftests: add some benchmark for UDP GROPaolo Abeni
Run on top of veth pair, using a dummy XDP program to enable the GRO. rfc v3 -> v1: - use ip route to attach the xdp helper to the veth Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07selftests: add dummy xdp test helperPaolo Abeni
This trivial XDP program does nothing, but will be used by the next patch to test the GRO path in a net namespace, leveraging the veth XDP implementation. It's added here, despite its 'net' usage, to avoid the duplication of the llc-related makefile boilerplate. rfc v3 -> v1: - move the helper implementation into the bpf directory, don't touch udpgso_bench_rx rfc v2 -> rfc v3: - move 'x' option handling here Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07selftests: add GRO support to udp bench rx programPaolo Abeni
And fix a couple of buglets (port option processing, clean termination on SIGINT). This is preparatory work for GRO tests. rfc v2 -> rfc v3: - use ETH_MAX_MTU Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07udp: cope with UDP GRO packet misdirectionPaolo Abeni
In some scenarios, the GRO engine can assemble an UDP GRO packet that ultimately lands on a non GRO-enabled socket. This patch tries to address the issue explicitly checking for the UDP socket features before enqueuing the packet, and eventually segmenting the unexpected GRO packet, as needed. We must also cope with re-insertion requests: after segmentation the UDP code calls the helper introduced by the previous patches, as needed. Segmentation is performed by a common helper, which takes care of updating socket and protocol stats is case of failure. rfc v3 -> v1 - fix compile issues with rxrpc - when gso_segment returns NULL, treat is as an error - added 'ipv4' argument to udp_rcv_segment() rfc v2 -> rfc v3 - moved udp_rcv_segment() into net/udp.h, account errors to socket and ns, always return NULL or segs list Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07ipv6: factor out protocol delivery helperPaolo Abeni
So that we can re-use it at the UDP level in the next patch rfc v3 -> v1: - add the helper declaration into the ipv6 header Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07ip: factor out protocol delivery helperPaolo Abeni
So that we can re-use it at the UDP level in a later patch rfc v3 -> v1 - add the helper declaration into the ip header Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07udp: add support for UDP_GRO cmsgPaolo Abeni
When UDP GRO is enabled, the UDP_GRO cmsg will carry the ingress datagram size. User-space can use such info to compute the original packets layout. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07udp: implement GRO for plain UDP sockets.Paolo Abeni
This is the RX counterpart of commit bec1f6f69736 ("udp: generate gso with UDP_SEGMENT"). When UDP_GRO is enabled, such socket is also eligible for GRO in the rx path: UDP segments directed to such socket are assembled into a larger GSO_UDP_L4 packet. The core UDP GRO support is enabled with setsockopt(UDP_GRO). Initial benchmark numbers: Before: udp rx: 1079 MB/s 769065 calls/s After: udp rx: 1466 MB/s 24877 calls/s This change introduces a side effect in respect to UDP tunnels: after a UDP tunnel creation, now the kernel performs a lookup per ingress UDP packet, while before such lookup happened only if the ingress packet carried a valid internal header csum. rfc v2 -> rfc v3: - fixed typos in macro name and comments - really enforce UDP_GRO_CNT_MAX, instead of UDP_GRO_CNT_MAX + 1 - acquire socket lock in UDP_GRO setsockopt rfc v1 -> rfc v2: - use a new option to enable UDP GRO - use static keys to protect the UDP GRO socket lookup Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07udp: implement complete book-keeping for encap_neededPaolo Abeni
The *encap_needed static keys are enabled by UDP tunnels and several UDP encapsulations type, but they are never turned off. This can cause unneeded overall performance degradation for systems where such features are used transiently. This patch introduces complete book-keeping for such keys, decreasing the usage at socket destruction time, if needed, and avoiding that the same socket could increase the key usage multiple times. rfc v3 -> v1: - add socket lock around udp_tunnel_encap_enable() rfc v2 -> rfc v3: - use udp_tunnel_encap_enable() in setsockopt() Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07Merge branch ↵David S. Miller
'vrf-allow-simultaneous-service-instances-in-default-and-other-VRFs' Mike Manning says: ==================== vrf: allow simultaneous service instances in default and other VRFs Services currently have to be VRF-aware if they are using an unbound socket. One cannot have multiple service instances running in the default and other VRFs for services that are not VRF-aware and listen on an unbound socket. This is because there is no easy way of isolating packets received in the default VRF from those arriving in other VRFs. This series provides this isolation for stream sockets subject to the existing kernel parameter net.ipv4.tcp_l3mdev_accept not being set, given that this is documented as allowing a single service instance to work across all VRF domains. Similarly, net.ipv4.udp_l3mdev_accept is checked for datagram sockets, and net.ipv4.raw_l3mdev_accept is introduced for raw sockets. The functionality applies to UDP & TCP services as well as those using raw sockets, and is for IPv4 and IPv6. Example of running ssh instances in default and blue VRF: $ /usr/sbin/sshd -D $ ip vrf exec vrf-blue /usr/sbin/sshd $ ss -ta | egrep 'State|ssh' State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 0.0.0.0%vrf-blue:ssh 0.0.0.0:* LISTEN 0 128 0.0.0.0:ssh 0.0.0.0:* ESTAB 0 0 192.168.122.220:ssh 192.168.122.1:50282 LISTEN 0 128 [::]%vrf-blue:ssh [::]:* LISTEN 0 128 [::]:ssh [::]:* ESTAB 0 0 [3000::2]%vrf-blue:ssh [3000::9]:45896 ESTAB 0 0 [2000::2]:ssh [2000::9]:46398 v1: - Address Paolo Abeni's comments (patch 4/5) - Fix build when CONFIG_NET_L3_MASTER_DEV not defined (patch 1/5) v2: - Address David Aherns' comments (patches 4/5 and 5/5) - Remove patches 3/5 and 5/5 from series for individual submissions - Include a sysctl for raw sockets as recommended by David Ahern - Expand series into 10 patches and provide improved descriptions v3: - Update description for patch 1/10 and remove patch 6/10 v4: - Set default to enabled for raw socket sysctl as recommended by David Ahern v5: - Address review comments from David Ahern in patches 2-5 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07ipv6: do not drop vrf udp multicast packetsDewi Morgan
For bound udp sockets in a vrf, also check the sdif to get the index for ingress devices enslaved to an l3mdev. Signed-off-by: Dewi Morgan <morgand@vyatta.att-mail.com> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07ipv6: handling of multicast packets received in VRFMike Manning
If the skb for multicast packets marked as enslaved to a VRF are received, then the secondary device index should be used to obtain the real device. And verify the multicast address against the enslaved rather than the l3mdev device. Signed-off-by: Dewi Morgan <morgand@vyatta.att-mail.com> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07ipv6: allow ping to link-local address in VRFMike Manning
If link-local packets are marked as enslaved to a VRF, then to allow ping to the link-local from a vrf, the error handling for IPV6_PKTINFO needs to be relaxed to also allow the pkt ipi6_ifindex to be that of a slave device to the vrf. Note that the real device also needs to be retrieved in icmp6_iif() to set the ipv6 flow oif to this for icmp echo reply handling. The recent commit 24b711edfc34 ("net/ipv6: Fix linklocal to global address with VRF") takes care of this, so the sdif does not need checking here. This fix makes ping to link-local consistent with that to global addresses, in that this can now be done from within the same VRF that the address is in. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07vrf: mark skb for multicast or link-local as enslaved to VRFMike Manning
The skb for packets that are multicast or to a link-local address are not marked as being enslaved to a VRF, if they are received on a socket bound to the VRF. This is needed for ND and it is preferable for the kernel not to have to deal with the additional use-cases if ll or mcast packets are handled as enslaved. However, this does not allow service instances listening on unbound and bound to VRF sockets to distinguish the VRF used, if packets are sent as multicast or to a link-local address. The fix is for the VRF driver to also mark these skb as being enslaved to the VRF. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: fix raw socket lookup device bind matching with VRFsDuncan Eastoe
When there exist a pair of raw sockets one unbound and one bound to a VRF but equal in all other respects, when a packet is received in the VRF context, __raw_v4_lookup() matches on both sockets. This results in the packet being delivered over both sockets, instead of only the raw socket bound to the VRF. The bound device checks in __raw_v4_lookup() are replaced with a call to raw_sk_bound_dev_eq() which correctly handles whether the packet should be delivered over the unbound socket in such cases. In __raw_v6_lookup() the match on the device binding of the socket is similarly updated to use raw_sk_bound_dev_eq() which matches the handling in __raw_v4_lookup(). Importantly raw_sk_bound_dev_eq() takes the raw_l3mdev_accept sysctl into account. Signed-off-by: Duncan Eastoe <deastoe@vyatta.att-mail.com> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: provide a sysctl raw_l3mdev_accept for raw socket lookup with VRFsMike Manning
Add a sysctl raw_l3mdev_accept to control raw socket lookup in a manner similar to use of tcp_l3mdev_accept for stream and of udp_l3mdev_accept for datagram sockets. Have this default to enabled for reasons of backwards compatibility. This is so as to specify the output device with cmsg and IP_PKTINFO, but using a socket not bound to the corresponding VRF. This allows e.g. older ping implementations to be run with specifying the device but without executing it in the VRF. If the option is disabled, packets received in a VRF context are only handled by a raw socket bound to the VRF, and correspondingly packets in the default VRF are only handled by a socket not bound to any VRF. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: ensure unbound datagram socket to be chosen when not in a VRFMike Manning
Ensure an unbound datagram skt is chosen when not in a VRF. The check for a device match in compute_score() for UDP must be performed when there is no device match. For this, a failure is returned when there is no device match. This ensures that bound sockets are never selected, even if there is no unbound socket. Allow IPv6 packets to be sent over a datagram skt bound to a VRF. These packets are currently blocked, as flowi6_oif was set to that of the master vrf device, and the ipi6_ifindex is that of the slave device. Allow these packets to be sent by checking the device with ipi6_ifindex has the same L3 scope as that of the bound device of the skt, which is the master vrf device. Note that this check always succeeds if the skt is unbound. Even though the right datagram skt is now selected by compute_score(), a different skt is being returned that is bound to the wrong vrf. The difference between these and stream sockets is the handling of the skt option for SO_REUSEPORT. While the handling when adding a skt for reuse correctly checks that the bound device of the skt is a match, the skts in the hashslot are already incorrect. So for the same hash, a skt for the wrong vrf may be selected for the required port. The root cause is that the skt is immediately placed into a slot when it is created, but when the skt is then bound using SO_BINDTODEVICE, it remains in the same slot. The solution is to move the skt to the correct slot by forcing a rehash. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: ensure unbound stream socket to be chosen when not in a VRFMike Manning
The commit a04a480d4392 ("net: Require exact match for TCP socket lookups if dif is l3mdev") only ensures that the correct socket is selected for packets in a VRF. However, there is no guarantee that the unbound socket will be selected for packets when not in a VRF. By checking for a device match in compute_score() also for the case when there is no bound device and attaching a score to this, the unbound socket is selected. And if a failure is returned when there is no device match, this ensures that bound sockets are never selected, even if there is no unbound socket. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: allow binding socket in a VRF when there's an unbound socketRobert Shearman
Change the inet socket lookup to avoid packets arriving on a device enslaved to an l3mdev from matching unbound sockets by removing the wildcard for non sk_bound_dev_if and instead relying on check against the secondary device index, which will be 0 when the input device is not enslaved to an l3mdev and so match against an unbound socket and not match when the input device is enslaved. Change the socket binding to take the l3mdev into account to allow an unbound socket to not conflict sockets bound to an l3mdev given the datapath isolation now guaranteed. Signed-off-by: Robert Shearman <rshearma@vyatta.att-mail.com> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07net: hns3: Remove set but not used variable 'reset_level'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c: In function 'hclge_log_and_clear_ppp_error': drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c:821:24: warning: variable 'reset_level' set but not used [-Wunused-but-set-variable] enum hnae3_reset_type reset_level = HNAE3_NONE_RESET; It never used since introduction in commit 01865a50d78f ("net: hns3: Add enable and process hw errors of TM scheduler") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>