summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/phy/amd-xgbe-phy.c drivers/net/wireless/iwlwifi/Kconfig include/net/mac80211.h iwlwifi/Kconfig and mac80211.h were both trivial overlapping changes. The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and the bug fix that happened on the 'net' side is already integrated into the rest of the amd-xgbe driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01Merge branch 'cxgb4-next'David S. Miller
Hariprasad Shenai says: ==================== cxgb4/cxgb4vf: Adds support for Chelsio T6 adapter This patch series adds the following: Adds NIC driver support for T6 adapter Adds vNIC driver support for T6 adapter This patch series has been created against net-next tree and includes patches on cxgb4 and cxgb4vf driver. We have included all the maintainers of respective drivers. Kindly review the change and let us know in case of any review comments. Thanks V2: Fixed compilation issue, when CHELSIO_T4_FCOE is set ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01cxgb4vf: Adds SRIOV driver changes for T6 adapterHariprasad Shenai
Adds vnic driver register related changes for T6 adapter Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01cxgb4: Adds support for T6 adapterHariprasad Shenai
Adds NIC driver related changes for T6 adapter. Register related changes, MC related changes, VF related changes, doorbell related changes, debugfs changes, etc Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01cxgb4: Add is_t6 macro and T6 register rangesHariprasad Shenai
Adds new macro is_t6 and adds the register address range for T6 adapter Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Various VTI tunnel (mark handling, PMTU) bug fixes from Alexander Duyck and Steffen Klassert. 2) Revert ethtool PHY query change, it wasn't correct. The PHY address selected by the driver running the PHY to MAC connection decides what PHY address GET ethtool operations return information from. 3) Fix handling of sequence number bits for encryption IV generation in ESP driver, from Herbert Xu. 4) UDP can return -EAGAIN when we hit a bad checksum on receive, even when there are other packets in the receive queue which is wrong. Just respect the error returned from the generic socket recv datagram helper. From Eric Dumazet. 5) Fix BNA driver firmware loading on big-endian systems, from Ivan Vecera. 6) Fix regression in that we were inheriting the congestion control of the listening socket for new connections, the intended behavior always was to use the default in this case. From Neal Cardwell. 7) Fix NULL deref in brcmfmac driver, from Arend van Spriel. 8) OTP parsing fix in iwlwifi from Liad Kaufman. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits) vti6: Add pmtu handling to vti6_xmit. Revert "net: core: 'ethtool' issue with querying phy settings" bnx2x: Move statistics implementation into semaphores xen: netback: read hotplug script once at start of day. xen: netback: fix printf format string warning Revert "netfilter: ensure number of counters is >0 in do_replace()" net: dsa: Properly propagate errors from dsa_switch_setup_one tcp: fix child sockets to use system default congestion control if not set udp: fix behavior of wrong checksums sfc: free multiple Rx buffers when required bna: fix soft lock-up during firmware initialization failure bna: remove unreasonable iocpf timer start bna: fix firmware loading on big-endian machines bridge: fix br_multicast_query_expired() bug via-rhine: Resigning as maintainer brcmfmac: avoid null pointer access when brcmf_msgbuf_get_pktid() fails mac80211: Fix mac80211.h docbook comments iwlwifi: nvm: fix otp parsing in 8000 hw family iwlwifi: pcie: fix tracking of cmd_in_flight ip_vti/ip6_vti: Preserve skb->mark after rcv_cb call ...
2015-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull Sparc fixes from David Miller: 1) Setup the core/threads/sockets bitmaps correctly so that 'lscpus' and friends operate properly. Frtom Chris Hyser. 2) The bit that normally means "Cached Virtually" on sun4v systems, actually changes meaning in M7 and later chips. Fix from Khalid Aziz. 3) One some PCI-E systems we need to probe different OF properties to fill in the PCI slot information properly, from Eric Snowberg. 4) Kill an extraneous memset after kzalloc(), from Christophe Jaillet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc: Resolve conflict between sparc v9 and M7 on usage of bit 9 of TTE sparc64: pci slots information is not populated in sysfs sparc: kernel: GRPCI2: Remove a useless memset sparc64: Setup sysfs to mark LDOM sockets, cores and threads correctly
2015-06-01Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fix from Michael Tsirkin: "Last-minute virtio fix for 4.1 This tweaks an exported user-space header to fix build breakage for userspace using it" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: include/uapi/linux/virtio_balloon.h: include linux/virtio_types.h
2015-06-01geneve: allow user to specify TOS info for tunnel framesJohn W. Linville
Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01geneve: allow user to specify TTL for tunnel framesJohn W. Linville
Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01Merge branch 'rocker-next'David S. Miller
Scott Feldman says: ==================== rocker: enable by default untagged VLAN support This patch set is a followup to Simon Horman's RFC patch: [PATCH/RFC net-next] rocker: by default accept untagged packets Now, on port probe, we install untagged VLAN (vid=0) support for each port as the default. This is equivalent to the command: bridge vlan add vid 0 dev DEV self Accepting untagged VLAN pkts is a reasonable default, but the user could override this with: bridge vlan del vid 0 dev DEV self With this, we no longer need 8021q module to install vid=0 when port interface opens. In fact, we don't need support for legacy VLAN ndo ops at all since they're superseded by bridge_setlink/dellink. So remove legacy VLAN ndo ops support in driver. (The legacy VLAN ndo ops are supported by bonding/team drivers, but don't fit into the transaction model offered by switchdev, so switching all VLAN functions to bridge_setlink/dellink switchdev support gets us stacked driver + transaction model support). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01rocker: remove support for legacy VLAN ndo opsScott Feldman
Remove support for legacy ndo ops .ndo_vlan_rx_add_vid/.ndo_vlan_rx_kill_vid. Rocker will use bridge_setlink/dellink exclusively for VLAN add/del operations. The legacy ops are needed if using 8021q driver module to setup VLANs on the port. But an alternative exists in using bridge_setlink/delink to setup VLANs, which doesn't depend on 8021q module. So rocker will switch to the newer setlink/dellink ops. VLANs can added/delete from the port, regardless if port is bridged or not, using the bridge commands: bridge vlan [add|del] vid VID dev DEV self (Yes, I agree it's confusing to use the "bridge" command to set a VLAN on a non-bridged port). Using setlink/dellink over legacy ops let's us handle the stacked driver case automatically. It's built-in. setlink also pass additional flags (PVID, egress untagged) that aren't available with the legacy ops. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01rocker: install/remove router MAC for untagged VLAN when joining/leaving bridgeScott Feldman
When the port joins a bridge, the port's internal VLAN ID needs to change to the bridge's internal VLAN ID. Likewise, when leaving the bridge, the internal VLAN ID reverts back the port's original internal VLAN ID. (The internal VLAN ID is used by device to internally mark untagged pkts with some VLAN, which will eventually be removed on egress...think PVID). When the internal VLAN ID changes, we need to update the VLAN table entries and the router MAC entries for IP/IPv6 to reflect the new internal VLAN ID. This patch makes use of the common rocker_port_vlan_add/del functions to make sure the tables are updated for the current internal VLAN ID. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01rocker: install untagged VLAN (vid=0) support for each portScott Feldman
On port probe, install by default untagged VLAN support. This is equivalent to running the command: bridge vlan add vid 0 dev DEV self A user could, if they wanted, manaully removing untagged support from the port by running the command: bridge vlan del vid 0 dev DEV self But installing it by default on port initialization gives the normal expected behavior. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01rocker: cleanup vlan table on error adding vlanScott Feldman
Basic house keeping: If there is an error adding the router MAC for this vlan, removing the just installed VLAN table entry to leave device in same state as before failure. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01rocker: zero allocate ports arrayScott Feldman
When allocating the array of rocker port pointers, zero the array values so we can test for !NULL to see if port is allocated/registered. We'll need this later when installing untagged VLAN support for each port, during port probe. It's a long story, but to install a VLAN (vid=0 for untagged, in this case) on a port, we'll need to scan other ports to see if the VLAN group for that VLAN has been setup. To scan the other ports, we need to walk the port array. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fix for net The following patch reverts the ebtables chunk that enforces counters that was introduced in the recently applied d26e2c9ffa38 ('Revert "netfilter: ensure number of counters is >0 in do_replace()"') since this breaks ebtables. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01vlan: Add GRO support for non hardware accelerated vlanToshiaki Makita
Currently packets with non-hardware-accelerated vlan cannot be handled by GRO. This causes low performance for 802.1ad and stacked vlan, as their vlan tags are currently not stripped by hardware. This patch adds GRO support for non-hardware-accelerated vlan and improves receive performance of them. Test Environment: vlan device (.1Q) on vlan device (.1ad) on ixgbe (82599) Result: - Before $ netperf -t TCP_STREAM -H 192.168.20.2 -l 60 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 60.00 5233.17 Rx side CPU usage: %usr %sys %irq %soft %idle 0.27 58.03 0.00 41.70 0.00 - After $ netperf -t TCP_STREAM -H 192.168.20.2 -l 60 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 60.00 7586.85 Rx side CPU usage: %usr %sys %irq %soft %idle 0.50 25.83 0.00 59.53 14.14 [ Register VLAN offloads with priority 10 -DaveM ] Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01cxgb4: remove unused fn to enable/disable db coalescingHariprasad Shenai
Remove unused function cxgb4_enable_db_coalescing() and cxgb4_disable_db_coalescing() Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01Merge tag 'wireless-drivers-for-davem-2015-06-01' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== iwlwifi: * fix OTP parsing 8260 * fix powersave handling for 8260 brcmfmac: * fix null pointer crash ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01rocker: remove rocker parameter from functions that have rocker_port parameterSimon Horman
The rocker (switch) of a rocker_port may be trivially obtained from the latter it seems cleaner not to pass the former to a function when the latter is being passed anyway. rocker_port_rx_proc() is omitted from this change as it is a hot path case. Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01vti6: Add pmtu handling to vti6_xmit.Steffen Klassert
We currently rely on the PMTU discovery of xfrm. However if a packet is localy sent, the PMTU mechanism of xfrm tries to to local socket notification what might not work for applications like ping that don't check for this. So add pmtu handling to vti6_xmit to report MTU changes immediately. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01bnx2x: Alloc 4k fragment for each rx ring buffer elementGabriel Krisman Bertazi
The driver allocates one page for each buffer on the rx ring, which is too much on architectures like ppc64 and can cause unexpected allocation failures when the system is under stress. Now, we keep a memory pool per queue, and if the architecture's PAGE_SIZE is greater than 4k, we fragment pages and assign each 4k segment to a ring element, which reduces the overall memory consumption on such architectures. This helps avoiding errors like the example below: [bnx2x_alloc_rx_sge:435(eth1)]Can't alloc sge [c00000037ffeb900] [d000000075eddeb4] .bnx2x_alloc_rx_sge+0x44/0x200 [bnx2x] [c00000037ffeb9b0] [d000000075ee0b34] .bnx2x_fill_frag_skb+0x1ac/0x460 [bnx2x] [c00000037ffebac0] [d000000075ee11f0] .bnx2x_tpa_stop+0x160/0x2e8 [bnx2x] [c00000037ffebb90] [d000000075ee1560] .bnx2x_rx_int+0x1e8/0xc30 [bnx2x] [c00000037ffebcd0] [d000000075ee2084] .bnx2x_poll+0xdc/0x3d8 [bnx2x] (unreliable) Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Acked-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01openvswitch: include datapath actions with sampled-packet upcall to userspaceNeil McKee
If new optional attribute OVS_USERSPACE_ATTR_ACTIONS is added to an OVS_ACTION_ATTR_USERSPACE action, then include the datapath actions in the upcall. This Directly associates the sampled packet with the path it takes through the virtual switch. Path information currently includes mangling, encapsulation and decapsulation actions for tunneling protocols GRE, VXLAN, Geneve, MPLS and QinQ, but this extension requires no further changes to accommodate datapath actions that may be added in the future. Adding path information enhances visibility into complex virtual networks. Signed-off-by: Neil McKee <neil.mckee@inmon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01net: Add priority to packet_offload objects.David S. Miller
When we scan a packet for GRO processing, we want to see the most common packet types in the front of the offload_base list. So add a priority field so we can handle this properly. IPv4/IPv6 get the highest priority with the implicit zero priority field. Next comes ethernet with a priority of 10, and then we have the MPLS types with a priority of 15. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Suggested-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01Revert "net: core: 'ethtool' issue with querying phy settings"David S. Miller
This reverts commit f96dee13b8e10f00840124255bed1d8b4c6afd6f. It isn't right, ethtool is meant to manage one PHY instance per netdevice at a time, and this is selected by the SET command. Therefore by definition the GET command must only return the settings for the configured and selected PHY. Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01bnx2x: Move statistics implementation into semaphoresYuval Mintz
Commit dff173de84958 ("bnx2x: Fix statistics locking scheme") changed the bnx2x locking around statistics state into using a mutex - but the lock is being accessed via a timer which is forbidden. [If compiled with CONFIG_DEBUG_MUTEXES, logs show a warning about accessing the mutex in interrupt context] This moves the implementation into using a semaphore [with size '1'] instead. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01xen: netback: read hotplug script once at start of day.Ian Campbell
When we come to tear things down in netback_remove() and generate the uevent it is possible that the xenstore directory has already been removed (details below). In such cases netback_uevent() won't be able to read the hotplug script and will write a xenstore error node. A recent change to the hypervisor exposed this race such that we now sometimes lose it (where apparently we didn't ever before). Instead read the hotplug script configuration during setup and use it for the lifetime of the backend device. The apparently more obvious fix of moving the transition to state=Closed in netback_remove() to after the uevent does not work because it is possible that we are already in state=Closed (in reaction to the guest having disconnected as it shutdown). Being already in Closed means the toolstack is at liberty to start tearing down the xenstore directories. In principal it might be possible to arrange to unregister the device sooner (e.g on transition to Closing) such that xenstore would still be there but this state machine is fragile and prone to anger... A modern Xen system only relies on the hotplug uevent for driver domains, when the backend is in the same domain as the toolstack it will run the necessary setup/teardown directly in the correct sequence wrt xenstore changes. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01xen: netback: fix printf format string warningIan Campbell
drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’: drivers/net/xen-netback/netback.c:1253:8: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘int’ [-Wformat=] (txreq.offset&~PAGE_MASK) + txreq.size); ^ PAGE_MASK's type can vary by arch, so a cast is needed. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> ---- v2: Cast to unsigned long, since PAGE_MASK can vary by arch. Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01Revert "netfilter: ensure number of counters is >0 in do_replace()"Bernhard Thaler
This partially reverts commit 1086bbe97a07 ("netfilter: ensure number of counters is >0 in do_replace()") in net/bridge/netfilter/ebtables.c. Setting rules with ebtables does not work any more with 1086bbe97a07 place. There is an error message and no rules set in the end. e.g. ~# ebtables -t nat -A POSTROUTING --src 12:34:56:78:9a:bc -j DROP Unable to update the kernel. Two possible causes: 1. Multiple ebtables programs were executing simultaneously. The ebtables userspace tool doesn't by default support multiple ebtables programs running Reverting the ebtables part of 1086bbe97a07 makes this work again. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-01include/uapi/linux/virtio_balloon.h: include linux/virtio_types.hMikko Rapeli
Fixes userspace compilation error: error: unknown type name ‘__virtio16’ __virtio16 tag; Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-05-31xen-netfront: Use setup_timerVaishali Thakkar
Use the timer API function setup_timer instead of structure field assignments to initialize a timer. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: @change@ expression e, func, da; @@ -init_timer (&e); +setup_timer (&e, func, da); -e.data = da; -e.function = func; Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31sparc: Resolve conflict between sparc v9 and M7 on usage of bit 9 of TTEKhalid Aziz
sparc: Resolve conflict between sparc v9 and M7 on usage of bit 9 of TTE Bit 9 of TTE is CV (Cacheable in V-cache) on sparc v9 processor while the same bit 9 is MCDE (Memory Corruption Detection Enable) on M7 processor. This creates a conflicting usage of the same bit. Kernel sets TTE.cv bit on all pages for sun4v architecture which works well for sparc v9 but enables memory corruption detection on M7 processor which is not the intent. This patch adds code to determine if kernel is running on M7 processor and takes steps to not enable memory corruption detection in TTE erroneously. Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31sparc64: pci slots information is not populated in sysfsEric Snowberg
Add PCI slot numbers within sysfs for PCIe hardware. Larger PCIe systems with nested PCI bridges and slots further down on these bridges were not being populated within sysfs. This will add ACPI style PCI slot numbers for these systems since the OF 'slot-names' information is not available on all PCIe platforms. Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31sparc: kernel: GRPCI2: Remove a useless memsetChristophe Jaillet
grpci2priv is allocated using kzalloc, so there is no need to memset it. Signed-off-by: Christophe Jaillet <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31net: dsa: Properly propagate errors from dsa_switch_setup_oneFlorian Fainelli
While shuffling some code around, dsa_switch_setup_one() was introduced, and it was modified to return either an error code using ERR_PTR() or a NULL pointer when running out of memory or failing to setup a switch. This is a problem for its caler: dsa_switch_setup() which uses IS_ERR() and expects to find an error code, not a NULL pointer, so we still try to proceed with dsa_switch_setup() and operate on invalid memory addresses. This can be easily reproduced by having e.g: the bcm_sf2 driver built-in, but having no such switch, such that drv->setup will fail. Fix this by using PTR_ERR() consistently which is both more informative and avoids for the caller to use IS_ERR_OR_NULL(). Fixes: df197195a5248 ("net: dsa: split dsa_switch_setup into two functions") Reported-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31tcp: fix child sockets to use system default congestion control if not setNeal Cardwell
Linux 3.17 and earlier are explicitly engineered so that if the app doesn't specifically request a CC module on a listener before the SYN arrives, then the child gets the system default CC when the connection is established. See tcp_init_congestion_control() in 3.17 or earlier, which says "if no choice made yet assign the current value set as default". The change ("net: tcp: assign tcp cong_ops when tcp sk is created") altered these semantics, so that children got their parent listener's congestion control even if the system default had changed after the listener was created. This commit returns to those original semantics from 3.17 and earlier, since they are the original semantics from 2007 in 4d4d3d1e8 ("[TCP]: Congestion control initialization."), and some Linux congestion control workflows depend on that. In summary, if a listener socket specifically sets TCP_CONGESTION to "x", or the route locks the CC module to "x", then the child gets "x". Otherwise the child gets current system default from net.ipv4.tcp_congestion_control. That's the behavior in 3.17 and earlier, and this commit restores that. Fixes: 55d8694fa82c ("net: tcp: assign tcp cong_ops when tcp sk is created") Cc: Florian Westphal <fw@strlen.de> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: Glenn Judd <glenn.judd@morganstanley.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31Merge branch 'rds-next'David S. Miller
Sowmini Varadhan says: ==================== net/rds: SOL_RDS socket option to explicitly select transport Today the underlying transport (TCP or IB) for a PF_RDS socket is implicitly selected based on the local address used to bind(2) the PF_RDS socket. This results in some non-deterministic behavior when there are un-numbered and IPoIB interfaces sharing the same IP address. It also places the constraint that the IB interface must have an IP address (and thus, IPoIB) configured on it. The non-determinism may be avoided by providing the user-space application a socket option that allows it to explicitly select the transport prior to bind(2). Patch 1 of this series provides the constant definitions needed by the application via <linux/rds.h>. Patch 2 provides the setsockopt support, and Patch 3 provides the getsockopt support. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31net/rds Add getsockopt support for SO_RDS_TRANSPORTSowmini Varadhan
The currently attached transport for a PF_RDS socket may be obtained from user space by invoking getsockopt(2) using the SO_RDS_TRANSPORT option at the SOL_RDS level. The integer optval returned will be one of the RDS_TRANS_* constants defined in linux/rds.h. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31net/rds: Add setsockopt support for SO_RDS_TRANSPORTSowmini Varadhan
An application may deterministically attach the underlying transport for a PF_RDS socket by invoking setsockopt(2) with the SO_RDS_TRANSPORT option at the SOL_RDS level. The integer argument to setsockopt must be one of the RDS_TRANS_* transport types, e.g., RDS_TRANS_TCP. The option must be specified before invoking bind(2) on the socket, and may only be used once on the socket. An attempt to set the option on a bound socket, or to invoke the option after a successful SO_RDS_TRANSPORT attachment, will return EOPNOTSUPP. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31net/rds: Declare SO_RDS_TRANSPORT and RDS_TRANS_* constants in uapi/linux/rds.hSowmini Varadhan
User space applications that desire to explicitly select the underlying transport for a PF_RDS socket may do so by using the SO_RDS_TRANSPORT socket option at the SOL_RDS level before bind(). The integer argument provided to the socket option would be one of the RDS_TRANS_* values, e.g., RDS_TRANS_TCP. This commit exports the constant values need by such applications via <linux/rds.h> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31ethernet/intel: Use setup_timerVaishali Thakkar
Use the timer API function setup_timer instead of structure field assignments to initialize a timer. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: @change@ expression e1, e2, e3, e4, a, b; @@ -init_timer(&e1); +setup_timer(&e1, a, b); ... when != a = e2 when != b = e3 -e1.function = a; ... when != b = e4 -e1.data = b; Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31ebpf: misc core cleanupDaniel Borkmann
Besides others, move bpf_tail_call_proto to the remaining definitions of other protos, improve comments a bit (i.e. remove some obvious ones, where the code is already self-documenting, add objectives for others), simplify bpf_prog_array_compatible() a bit. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31ebpf: allow bpf_ktime_get_ns_proto also for networkingDaniel Borkmann
As this is already exported from tracing side via commit d9847d310ab4 ("tracing: Allow BPF programs to call bpf_ktime_get_ns()"), we might as well want to move it to the core, so also networking users can make use of it, e.g. to measure diffs for certain flows from ingress/egress. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31udp: fix behavior of wrong checksumsEric Dumazet
We have two problems in UDP stack related to bogus checksums : 1) We return -EAGAIN to application even if receive queue is not empty. This breaks applications using edge trigger epoll() 2) Under UDP flood, we can loop forever without yielding to other processes, potentially hanging the host, especially on non SMP. This patch is an attempt to make things better. We might in the future add extra support for rt applications wanting to better control time spent doing a recv() in a hostile environment. For example we could validate checksums before queuing packets in socket receive queue. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31isdn/capi: Use setup_timerVaishali Thakkar
Use the timer API function setup_timer instead of structure field assignments to initialize a timer. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: @change@ expression e1, e2, e3, e4, a, b; @@ -init_timer(&e1); +setup_timer(&e1, a, b); ... when != a = e2 when != b = e3 -e1.data = b; ... when != a = e4 -e1.function = a; Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31net: dl2k: Use setup_timerVaishali Thakkar
Use the timer API function setup_timer instead of structure field assignments to initialize a timer. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: @change@ expression e1, e2, e3, e4, a, b; @@ -init_timer(&e1); +setup_timer(&e1, a, b); ... when != a = e2 when != b = e3 -e1.data = b; ... when != a = e4 -e1.function = a; Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31net: mv643xx_eth: Use setup_timerVaishali Thakkar
Use the timer API function setup_timer instead of structure field assignments to initialize a timer. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: @change@ expression e, func, da; @@ -init_timer (&e); +setup_timer (&e, func, da); -e.data = da; -e.function = func; Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31Linux 4.1-rc6Linus Torvalds
2015-05-31sfc: free multiple Rx buffers when requiredDaniel Pieczko
When Rx packet data must be dropped, all the buffers associated with that Rx packet must be freed. Extend and rename efx_free_rx_buffer() to efx_free_rx_buffers() and loop through all the fragments. By doing so this patch fixes a possible memory leak. Signed-off-by: Shradha Shah <sshah@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>