linux.git

Age	Commit message	Author
2014-10-28	ethtool, net/mlx4_en: Add 100M, 20G, 56G speeds ethtool reporting support	Saeed Mahameed
	Added 100M, 20G and 56G ethtool speed reporting support. Update mlx4_en_test_speed self test with the new speeds. Defined new link speeds in include/uapi/linux/ethtool.h: +#define SPEED_20000 20000 +#define SPEED_40000 40000 +#define SPEED_56000 56000 Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28	net/mlx4_core: Add ethernet backplane autoneg device capability	Saeed Mahameed
	Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28	net/mlx4_core: Introduce ACCESS_REG CMD and eth_prot_ctrl dev cap	Saeed Mahameed
	Add the ACCESS_REG mlx4 command and use it to implement the Query method for PTYS (Port Type and Speed Register). Query and store the eth_prot_ctrl dev cap. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28	ethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool support	Saeed Mahameed
	Added get_module_info/get_module_eeprom ethtool support for reading cable info. Added new cable type definitions in include/uapi/linux/ethtool.h for ethtool use: +#define ETH_MODULE_SFF_8636 0x3 +#define ETH_MODULE_SFF_8636_LEN 256 +#define ETH_MODULE_SFF_8436 0x4 +#define ETH_MODULE_SFF_8436_LEN 256 Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28	net/mlx4_core: Introduce mlx4_get_module_info for cable module info reading	Saeed Mahameed
	Added new MAD_IFC command to read cable module info with attribute id (0xFF60). Update include/linux/mlx4/device.h with function declaration (mlx4_get_module_info) and the needed defines/enums for future use. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28	datapath: Rename last_action() as nla_is_last() and move to netlink.h	Simon Horman
	The original motivation for this change was to allow the helper to be used in files other than actions.c as part of work on an odp select group action. Thomas Graf pointed out that this helper would be best off living in netlink.h. Furthermore, I think that the generic nature of this helper means it is best off in netlink.h regardless of whether it is used in more than one .c file. Thus, I would like it considered independently of the work on an odp select group action. Cc: Thomas Graf <tgraf@suug.ch> Cc: Pravin Shelar <pshelar@nicira.com> Cc: Andy Zhou <azhou@nicira.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
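	The check behind nla_is_last() can be sketched in user space as follows. This is a minimal illustration, not the kernel code: struct attr_hdr, ATTR_ALIGN and last_attr_type() are hypothetical stand-ins for struct nlattr, NLA_ALIGN and a typical attribute walk. The idea is that an attribute is the last one when its length equals the number of bytes still remaining in the stream.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space stand-in for the kernel's struct nlattr. */
struct attr_hdr {
	uint16_t len;   /* header plus payload, in bytes */
	uint16_t type;
};

/* Attributes are laid out on 4-byte boundaries. */
#define ATTR_ALIGN(n) (((n) + 3) & ~3)

/* Same test the renamed helper performs: the attribute is the last one
 * when its length covers everything that remains in the buffer. */
static int attr_is_last(const struct attr_hdr *a, int rem)
{
	return a->len == rem;
}

/* Walk a buffer of attributes and return the type of the last one,
 * or -1 if the buffer is empty or malformed. */
int last_attr_type(const unsigned char *buf, int len)
{
	int rem = len;

	while (rem >= (int)sizeof(struct attr_hdr)) {
		struct attr_hdr a;

		memcpy(&a, buf + (len - rem), sizeof(a));
		if (a.len < sizeof(a) || a.len > rem)
			return -1;
		if (attr_is_last(&a, rem))
			return a.type;
		rem -= ATTR_ALIGN(a.len);
	}
	return -1;
}
```

	A caller that needs to treat the final action differently (for example, to avoid cloning a buffer for it) can test attr_is_last() inside its own loop instead of tracking the position separately.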
2014-10-28	net: pxa168_eth: Fix providing of phy_interface mode on platform_data	Sebastian Hesselbarth
	Do not add phy include to the board file but platform_data include instead. Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28	net: phy: Adding SGMII support for Marvell 88E1145 driver	Viet Nga Dao
	Add code to the m88e1145_config_init function to allow the driver to support SGMII mode. Signed-off-by: Viet Nga Dao <vndao@altera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28	ovs: Turn vports with dependencies into separate modules	Thomas Graf
	The internal and netdev vports remain part of openvswitch.ko. Encap vports including vxlan, gre, and geneve can be built as separate modules and are loaded on demand. Modules can be unloaded after use. Datapath ports keep a reference to the vport module during their lifetime. This allows removing the error-prone maintenance of the global list vport_ops_list. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27	Merge branch 'unnecessary_resource_check'	David S. Miller
	Varka Bhadram says: ==================== cleanup on resource check This series removes the duplicate sanity check on the resource returned by platform_get_resource(); devm_ioremap_resource() already checks it. Changes since v2: - Merge #1 and #2 patches into single patch - remove the comment. Changes since v1: - remove NULL dereference on resource_size() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27	ethernet: samsung: sxgbe: remove unnecessary check	Varka Bhadram
	devm_ioremap_resource checks platform_get_resource() return value. We can remove the duplicate check here. Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27	ethernet: renesas: remove unnecessary check	Varka Bhadram
	devm_ioremap_resource checks platform_get_resource() return value. We can remove the duplicate check here. Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27	ethernet: marvell: remove unnecessary check	Varka Bhadram
	devm_ioremap_resource checks platform_get_resource() return value. We can remove the duplicate check here. Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27	ethernet: apm: xgene: remove unnecessary check	Varka Bhadram
	devm_ioremap_resource checks platform_get_resource() return value. We can remove the duplicate check here. Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27	ethernet: wiznet: remove unnecessary check	Varka Bhadram
	devm_ioremap_resource checks platform_get_resource() return value. We can remove the duplicate check here. Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: David S. Miller <davem@davemloft.net>
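	The pattern behind this series can be sketched with user-space stand-ins. Everything below is hypothetical (struct fake_resource, fake_ioremap_resource() and fake_probe() are not kernel APIs); the only property carried over from the commit messages is that the mapping helper itself rejects a missing resource, so the caller needs a single error check.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct resource. */
struct fake_resource {
	unsigned long start;
	unsigned long end;
};

/* Stand-in for devm_ioremap_resource(): it validates its argument
 * itself and reports failure through a negative errno value. */
long fake_ioremap_resource(const struct fake_resource *res)
{
	if (res == NULL)
		return -EINVAL;
	return (long)(res->end - res->start + 1);
}

/* Probe routine after the cleanup: no separate NULL check on the
 * looked-up resource, only the check on the mapping result. */
int fake_probe(const struct fake_resource *res)
{
	long ret = fake_ioremap_resource(res);

	if (ret < 0)
		return (int)ret;
	return 0;
}
```

	Calling fake_probe(NULL) fails with -EINVAL through the helper's own check, which is why the extra test in each driver was redundant.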
2014-10-27	bridge: Add support for IEEE 802.11 Proxy ARP	Kyeyoon Park
	This feature is defined in IEEE Std 802.11-2012, 10.23.13. It allows the AP devices to keep track of the hardware-address-to-IP-address mapping of the mobile devices within the WLAN network. The AP will learn this mapping via observing DHCP, ARP, and NS/NA frames. When a request for such information is made (i.e. ARP request, Neighbor Solicitation), the AP will respond on behalf of the associated mobile device. In the process of doing so, the AP will drop the multicast request frame that was intended to go out to the wireless medium. It was recommended at the LKS workshop to do this implementation in the bridge layer. vxlan.c is already doing something very similar. The DHCP snooping code will be added to the userspace application (hostapd) per the recommendation. This RFC commit is only for IPv4. A similar approach in the bridge layer will be taken for IPv6 as well. Signed-off-by: Kyeyoon Park <kyeyoonp@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27	ipx: remove __inline__ in c file on static	Fabian Frederick
	Let compiler decide what to do with static void __ipxitf_put() Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27	ipx: remove unnecessary casting on ntohl	Fabian Frederick
	use %08X instead of %08lX and remove casting. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
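	A small user-space sketch of the same idea: ntohl() already returns a 32-bit unsigned value, so it can be printed with %08X directly, without widening it through a cast for %08lX. The function name format_ipx_net() is made up for this example.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

/* Format a network-byte-order 32-bit network number as 8 hex digits.
 * No cast is needed: ntohl() yields a uint32_t, which matches %X on
 * platforms where unsigned int is 32 bits wide. */
int format_ipx_net(char *buf, size_t len, uint32_t net_be)
{
	return snprintf(buf, len, "%08X", ntohl(net_be));
}
```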
2014-10-27	ipx: move extern sysctl_ipx_pprop_broadcasting to header file	Fabian Frederick
	include ipx.h from sysctl_net_ipx.c Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27	ipv6: include linux/uaccess.h instead of asm/uaccess.h	Fabian Frederick
	Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27	ipv6: replace min/casting by min_t	Fabian Frederick
	Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
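	For readers unfamiliar with the idiom, here is a simplified user-space sketch of what min_t expresses. The macro below is an assumption-laden stand-in (it evaluates its arguments more than once, unlike the kernel's version), and copy_len() is a hypothetical caller.

```c
#include <stddef.h>

/* Simplified stand-in: compare both operands after converting them to
 * one named type, instead of casting an operand by hand before min(). */
#define min_t(type, x, y) ((type)(x) < (type)(y) ? (type)(x) : (type)(y))

/* Pick the smaller of the available space and the requested length,
 * with the comparison done explicitly in size_t. */
size_t copy_len(size_t avail, unsigned int wanted)
{
	return min_t(size_t, avail, wanted);
}
```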
2014-10-27	ipv4: remove set but unused variable sha	Fabian Frederick
	unsigned char *sha (source) was already in original git version but was never used. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-26	Merge branch 's390-next'	David S. Miller
	Frank Blaschka says: ==================== s390: network patches for net-next looks like there was a problem with my previous posting. Hope this time it will work. Sorry for any inconvenience. The patches are mostly cleanups and small enhancements for net-next ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-26	ctcm: replace sscanf by kstrto function	Thomas Richter
	Since a single integer value is read from the supplied buffer use the kstrto functions instead of sscanf. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-26	lcs: replace sscanf by kstrto function	Thomas Richter
	Since a single integer value is read from the supplied buffer use the kstrto functions instead of sscanf. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
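	The difference between the two parsing styles can be sketched in user space. parse_int_strict() below is a hypothetical helper built on strtol(), not the kernel's kstrtoint(); it imitates the strictness that motivates the change: the whole string must be one integer, optionally followed by a single newline, whereas sscanf("%d") silently accepts input such as "12abc".

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a complete decimal integer. Returns 0 on success, -EINVAL for
 * malformed input, -ERANGE for values that do not fit in an int.
 * Unlike the kernel helpers, strtol() skips leading white space. */
int parse_int_strict(const char *s, int *out)
{
	char *end;
	long val;

	if (s == NULL || out == NULL || *s == '\0')
		return -EINVAL;

	errno = 0;
	val = strtol(s, &end, 10);
	if (end == s)
		return -EINVAL;
	if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
		return -ERANGE;

	/* Tolerate one trailing newline, as sysfs writes usually carry one. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*out = (int)val;
	return 0;
}
```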
2014-10-26	qeth: s390 ethernet device driver dependency	Thomas Richter
	Compile the s390 10GB ethernet device driver only when ETHERNET has been defined in the kernel configuration file. Right now the qeth device driver is always built regardless of which network connectivity is active. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-26	qeth: make local functions static in qeth_l3 module	Thomas Richter
	This patch makes 4 local functions static and removes the prototypes from the header file. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-26	qeth: fix some trace formatting issues	Thomas Richter
	This patch fixes trace formatting issues using the QETH_CARD_TEXT_ macro. The total size of each trace entry is 8 bytes. Some of the sprintf formats exceed these 8 bytes (for example using abcd:%d and the converted value needs more than 3 bytes). The solution is to shorten the text prepending the value or use a different format (%x). Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
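	The 8-byte limit can be checked with a short user-space sketch. The helper names are hypothetical; they only measure how many characters a "prefix:value" entry needs, using snprintf() with a zero-sized buffer.

```c
#include <stdio.h>

/* Characters needed for "<prefix>:<value>" with a decimal value. */
int trace_len_dec(const char *prefix, int value)
{
	return snprintf(NULL, 0, "%s:%d", prefix, value);
}

/* Characters needed for the same entry with a hexadecimal value. */
int trace_len_hex(const char *prefix, unsigned int value)
{
	return snprintf(NULL, 0, "%s:%x", prefix, value);
}
```

	With a four-character prefix, a decimal value of 1000 needs nine characters and no longer fits, while the same value in hexadecimal (3e8) still fits in eight; shortening the prefix is the other remedy named in the commit message.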
2014-10-26	qeth: qeth_core_main make local functions static	Thomas Richter
	This patch makes some global functions static and removes the prototypes from the header file. Also, function qeth_query_card_info is no longer exported: there is no external user for it, and it should never have been exported in the first place. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-26	xen-netfront: always keep the Rx ring full of requests	David Vrabel
	A full Rx ring only requires 1 MiB of memory. That is too little memory for it to be worth dynamically scaling the number of Rx requests in the ring based on traffic rates, because: a) Even the full 1 MiB is a tiny fraction of a typical modern Linux VM (for example, the AWS micro instance still has 1 GiB of memory). b) Netfront would have used up to 1 MiB already even with moderate data rates (there was no adjustment of target based on memory pressure). c) Small VMs are typically going to have one VCPU and hence only one queue. Keeping the ring full of Rx requests handles bursty traffic better than trying to converge on an optimal number of requests to keep filled. On a 4 core host, an iperf -P 64 -t 60 run from dom0 to a 4 VCPU guest improved from 5.1 Gbit/s to 5.6 Gbit/s. Gains with more bursty traffic are expected to be higher. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
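	The 1 MiB figure can be reproduced with simple arithmetic. The sketch below assumes 256 request slots per ring and one 4 KiB page per request; neither number is stated in the commit message, and the function names are invented for the example.

```c
#include <stddef.h>

/* Bytes of guest memory held by a completely filled Rx ring. */
size_t rx_ring_bytes(size_t slots, size_t page_size)
{
	return slots * page_size;
}

/* Ring footprint in parts per million of total guest memory. */
unsigned long rx_ring_ppm(size_t ring_bytes, unsigned long long guest_bytes)
{
	return (unsigned long)((unsigned long long)ring_bytes * 1000000ULL /
			       guest_bytes);
}
```

	Under these assumptions rx_ring_bytes(256, 4096) is 1048576 bytes, and against a 1 GiB guest that is roughly 0.1% of memory, which is the "tiny fraction" argument in point a).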
2014-10-25	Merge branch 'sunvnet-napi'	David S. Miller
	Sowmini Varadhan says: ==================== sunvnet: NAPIfy sunvnet This patchset converts the sunvnet driver to use the NAPI framework. Changes since v4, Patch 1: vnet_event accumulates LDC_EVENT_* bits into rx_event. vnet_event_napi() unrolls send_events() logic to process all rx_event bits. Changes since v5: Patch 1: use net_device.h definition for NAPI_POLL_WEIGHT. Drop sparclinux changes (patch3) per David Miller feedback. Patch 1 in the series addresses the packet-receive path: all the vnet_event() processing is moved into NAPI context. This patch is dependent on the sparc-next commit: "sparc64: Add vio_set_intr() to enable/disable Rx interrupts" (sparc commit id ca605b7dd740c8909408d67911d8ddd272c2b320). Patch 2 uses RCU to fix race conditions between vnet_port_remove and paths that access/modify port-related state, such as vnet_start_xmit. Patch 3 builds on the NAPIfied Rx path, dropping superfluous usage of irqsave/irqrestore on the vio.lock where possible. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-25	sunvnet: Remove irqsave/irqrestore on vio.lock	Sowmini Varadhan
	After the NAPIfication of sunvnet, we no longer need to synchronize by doing irqsave/restore on vio.lock in the I/O fastpath. NAPI ->poll() is non-reentrant, so all RX processing occurs strictly in a serialized environment. TX reclaim is done in NAPI context, so the netif_tx_lock can be used to serialize critical sections between Tx and Rx paths. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-25	sunvnet: Use RCU to synchronize port usage with vnet_port_remove()	Sowmini Varadhan
	A vnet_port_remove could be triggered as a result of an ldm-unbind operation by the peer, module unload, or other changes to the inter-vnet-link configuration. When this is concurrent with vnet_start_xmit(), there are several race sequences possible, such as:
	thread 1                                   thread 2
	vnet_start_xmit -> tx_port_find
	  spin_lock_irqsave(&vp->lock..)
	  ret = __tx_port_find(..)
	  spin_unlock_irqrestore(&vp->lock..)
	                                           vio_remove -> .. -> vnet_port_remove
	                                             spin_lock_irqsave(&vp->lock..)
	                                             cleanup
	                                             spin_unlock_irqrestore(&vp->lock..)
	                                             kfree(port)
	/* attempt to use ret will bomb */
	This patch adds RCU locking for port access so that vnet_port_remove will correctly clean up port-related state. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Dwight Engen <dwight.engen@oracle.com> Acked-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-25	sunvnet: NAPIfy sunvnet	Sowmini Varadhan
	Move Rx packet processing to the NAPI poll callback. Disable VIO interrupt and unconditionally go into NAPI context from vnet_event. Note that we want to minimize the number of LDC STOP/START messages sent. Specifically, do not send a STOP message if vnet_walk_rx does not read all the available descriptors because of the NAPI budget limitation. Instead, note the end index as part of port state, and resume from this index when the next poll callback is triggered. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Raghuram Kothakota <raghuram.kothakota@oracle.com> Acked-by: Dwight Engen <dwight.engen@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
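	The budget handling described above can be modelled with a toy user-space sketch. struct rx_port, its fields and rx_poll() are hypothetical and carry none of the real LDC or NAPI machinery; the sketch only shows the bookkeeping: stop at the budget, remember the index reached, and count a STOP only once the ring has actually been drained.

```c
/* Toy model of per-port receive state. */
struct rx_port {
	int ring_len;    /* descriptors currently available */
	int resume_idx;  /* next descriptor to process */
	int stops_sent;  /* STOP messages sent so far */
};

/* Process at most 'budget' descriptors, starting where the previous
 * poll left off. Returns the number of descriptors handled. */
int rx_poll(struct rx_port *p, int budget)
{
	int done = 0;

	while (done < budget && p->resume_idx < p->ring_len) {
		p->resume_idx++;   /* stand-in for handling one descriptor */
		done++;
	}

	/* Only signal STOP when this poll finished the ring; if the
	 * budget ran out first, resume_idx already records where the
	 * next poll must continue. */
	if (done > 0 && p->resume_idx == p->ring_len)
		p->stops_sent++;

	return done;
}
```

	With ten descriptors and a budget of four, the first two polls handle four descriptors each without sending STOP, and the third handles the remaining two and sends a single STOP.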
2014-10-24	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next	David S. Miller
	Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2014-10-23 This series contains updates to i40e and i40evf. Jesse modifies the i40e driver to only notify the firmware on link up/down and qualified module events. Also simplified the job of managing link state by using the admin queue receive event for link events as a signal to tell the driver to update link state. Jeff (me) cleans up the inconsistent use of tabs for indentation in the admin queue command header file. Neerav converts the use of udelay() to usleep_range(). Anjali fixes a bug where receive would stop after some stress by adding a sleep and restart as well as moving the setting of flow control because it should be done at a PF level and not a VSI level. Mitch adds code to handle link events when updating the PF switch, which allows link information to be properly provided to VFs in all cases. Catherine adds driver support for 10GBaseT and bumps driver version. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24	net: llc: include linux/errno.h instead of asm/errno.h	Fabian Frederick
	Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24	lapb: move EXPORT_SYMBOL after functions.	Fabian Frederick
	See Documentation/CodingStyle Chapter 6 Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24	Merge branch 'berlin_ethernet'	David S. Miller
	Sebastian Hesselbarth says: ==================== Marvell PXA168 libphy handling and Berlin Ethernet This patch series deals with removing an IP feature that can be found on all currently supported Marvell Ethernet IP (pxa168_eth, mv643xx_eth, mvneta). The MAC IP can automatically perform PHY auto-negotiation without software interaction. However, this feature (a) fundamentally clashes with the way libphy works and (b) is unable to deal with quirky PHYs that require special treatment. In this series, the pxa168_eth driver is rewritten to completely disable that feature and properly deal with libphy-provided PHYs. As usual, a branch on top of v3.18-rc1 can be found at git://git.infradead.org/users/hesselba/linux-berlin.git devel/bg2-bg2cd-eth-v2 Patches 1-5 should go through David's net tree, I'll pick up the DT patches 6-9. There have been some changes. Compared to the RFT: - added phy-connection-type property to BG2Q PHY DT node - bail out from pxa168_eth_adjust_link when there is no change in PHY parameters. Also, add a call to phy_print_status. Compared to v1: - move phy-connection-type to ethernet node instead of PHY node Patch 1 adds support for the Marvell 88E3016 FastEthernet PHY that is also integrated in Marvell Berlin BG2/BG2CD SoCs. Patch 2 allows passing phy_interface_t on pxa168_eth platform_data that is only used by mach-mmp/gplug. From the board setup, I guessed gplug's PHY is connected via RMII. The patch still isn't even compile tested. Patches 3-5 prepare proper libphy handling and finally remove all in-driver PHY mangling related to the feature explained above. Patches 6-9 add corresponding ethernet DT nodes to BG2, BG2CD, add a phy-connection-type property to BG2Q and enable ethernet on BG2-based Sony NSZ-GS7. I have tested all this on GS7 successfully with ip=dhcp on 100M FD. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24	net: pxa168_eth: Remove in-driver PHY mangling	Sebastian Hesselbarth
	With properly using libphy PHYs now, remove the in-driver PHY mangling. Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24	net: pxa168_eth: Remove HW auto-negotiation	Sebastian Hesselbarth
	Marvell Ethernet IP supports PHY negotiation driven by HW. This fundamentally clashes with libphy (software) driven negotiation and also cannot cope with quirky PHYs. Therefore, always disable any HW negotiation features and properly use libphy's phy_device. Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24	net: pxa168_eth: Prepare proper libphy handling	Sebastian Hesselbarth
	Current libphy handling in pxa168_eth lacks proper phy_connect. Prepare to fix this by first moving phy properties from platform_data to private driver data. Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24	net: pxa168_eth: Provide phy_interface mode on platform_data	Sebastian Hesselbarth
	The PXA168 Ethernet IP supports MII and RMII connection to its PHY. Currently, pxa168 platform_data does not provide a way to pass that and there is one user of pxa168 platform_data (mach-mmp/gplug). Given the pinctrl settings of gplug it uses RMII, so add and pass a corresponding phy_interface_t. Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24	phy: marvell: Add support for 88E3016 FastEthernet PHY	Sebastian Hesselbarth
	Marvell 88E3016 is a FastEthernet PHY that also can be found in Marvell Berlin SoCs as integrated PHY. Tested-by: Antoine Ténart <antoine.tenart@free-electrons.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24	natsemi/macsonic: Remove superfluous interrupt disable/restore	Geert Uytterhoeven
	As of commit e4dc601bf99ccd1c ("m68k: Disable/restore interrupts in hwreg_present()/hwreg_write()"), this is no longer needed. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24	cirrus/mac89x0: Remove superfluous interrupt disable/restore	Geert Uytterhoeven
	As of commit e4dc601bf99ccd1c ("m68k: Disable/restore interrupts in hwreg_present()/hwreg_write()"), this is no longer needed. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24	net: typhoon: Remove redundant casts	Rasmus Villemoes
	Both image_data and typhoon_fw->data are const u8, so the cast to u8 is unnecessary and confusing. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: David Dillow <dave@thedillows.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24	Removed unused function sctp_addr_is_valid()	Sébastien Barré
	sctp_addr_is_valid() only appeared in its definition. Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24	Merge branch 'ipv6_route'	David S. Miller
	Martin KaFai Lau says: ==================== ipv6: Reduce the number of fib6_lookup() calls from ip6_pol_route() This patch set is trying to reduce the number of fib6_lookup() calls from ip6_pol_route(). I have adapted davem's udpflood and kbench_mod test (https://git.kernel.org/pub/scm/linux/kernel/git/davem/net_test_tools.git) to support IPv6 and here is the result:
	Before:
	[root]# for i in $(seq 1 3); do time ./udpflood -l 20000000 -c 250 2401:face:face:face::2; done
	real 0m34.190s user 0m3.047s sys 0m31.108s
	real 0m34.635s user 0m3.125s sys 0m31.475s
	real 0m34.517s user 0m3.034s sys 0m31.449s
	[root]# insmod ip6_route_kbench.ko oif=2 src=2401:face:face:face::1 dst=2401:face:face:face::2
	[ 660.160976] ip6_route_kbench: ip6_route_output tdiff: 933
	[ 660.207261] ip6_route_kbench: ip6_route_output tdiff: 988
	[ 660.253492] ip6_route_kbench: ip6_route_output tdiff: 896
	[ 660.298862] ip6_route_kbench: ip6_route_output tdiff: 898
	After:
	[root]# for i in $(seq 1 3); do time ./udpflood -l 20000000 -c 250 2401:face:face:face::2; done
	real 0m32.695s user 0m2.925s sys 0m29.737s
	real 0m32.636s user 0m3.007s sys 0m29.596s
	real 0m32.797s user 0m2.866s sys 0m29.898s
	[root]# insmod ip6_route_kbench.ko oif=2 src=2401:face:face:face::1 dst=2401:face:face:face::2
	[ 881.220793] ip6_route_kbench: ip6_route_output tdiff: 684
	[ 881.253477] ip6_route_kbench: ip6_route_output tdiff: 640
	[ 881.286867] ip6_route_kbench: ip6_route_output tdiff: 630
	[ 881.320749] ip6_route_kbench: ip6_route_output tdiff: 653
	/**************************** udpflood.c ***************************/
	/* It is an adaptation of Eric Dumazet's and David Miller's
	 * udpflood tool, adding IPv6 support.
	 */
	typedef uint32_t u32;
	static int debug = 0;
	/* Allow -fstrict-aliasing */
	typedef union sa_u {
	    struct sockaddr_storage a46;
	    struct sockaddr_in a4;
	    struct sockaddr_in6 a6;
	} sa_u;
	static int usage(void)
	{
	    printf("usage: udpflood [ -l count ] [ -s message_size ] [ -c num_ip_addrs ] IP_ADDRESS\n");
	    return -1;
	}
	static u32 get_last32h(const sa_u *sa)
	{
	    if (sa->a46.ss_family == PF_INET)
	        return ntohl(sa->a4.sin_addr.s_addr);
	    else
	        return ntohl(sa->a6.sin6_addr.s6_addr32[3]);
	}
	static void set_last32h(sa_u *sa, u32 last32h)
	{
	    if (sa->a46.ss_family == PF_INET)
	        sa->a4.sin_addr.s_addr = htonl(last32h);
	    else
	        sa->a6.sin6_addr.s6_addr32[3] = htonl(last32h);
	}
	static void print_saddr(const sa_u *sa, const char *msg)
	{
	    char buf[64];
	    if (!debug)
	        return;
	    switch (sa->a46.ss_family) {
	    case PF_INET:
	        inet_ntop(PF_INET, &(sa->a4.sin_addr.s_addr), buf, sizeof(buf));
	        break;
	    case PF_INET6:
	        inet_ntop(PF_INET6, &(sa->a6.sin6_addr), buf, sizeof(buf));
	        break;
	    }
	    printf("%s: %s\n", msg, buf);
	}
	static int send_packets(const sa_u *sa, size_t num_addrs, int count, int msg_sz)
	{
	    char *msg = malloc(msg_sz);
	    sa_u saddr;
	    u32 start_addr32h, end_addr32h, cur_addr32h;
	    int fd, i, err;
	    if (!msg)
	        return -ENOMEM;
	    memset(msg, 0, msg_sz);
	    memcpy(&saddr, sa, sizeof(saddr));
	    cur_addr32h = start_addr32h = get_last32h(&saddr);
	    end_addr32h = start_addr32h + num_addrs;
	    fd = socket(saddr.a46.ss_family, SOCK_DGRAM, 0);
	    if (fd < 0) {
	        perror("socket");
	        err = fd;
	        goto out_nofd;
	    }
	    /* connect to avoid the kernel spending time in figuring
	     * out the source address (i.e. pin the src address)
	     */
	    err = connect(fd, (struct sockaddr *) &saddr, sizeof(saddr));
	    if (err < 0) {
	        perror("connect");
	        goto out;
	    }
	    print_saddr(&saddr, "start_addr");
	    for (i = 0; i < count; i++) {
	        print_saddr(&saddr, "sendto");
	        err = sendto(fd, msg, msg_sz, 0, (struct sockaddr *)&saddr, sizeof(saddr));
	        if (err < 0) {
	            perror("sendto");
	            goto out;
	        }
	        if (++cur_addr32h >= end_addr32h)
	            cur_addr32h = start_addr32h;
	        set_last32h(&saddr, cur_addr32h);
	    }
	    err = 0;
	out:
	    close(fd);
	out_nofd:
	    free(msg);
	    return err;
	}
	int main(int argc, char **argv, char **envp)
	{
	    int port, msg_sz, count, num_addrs, ret;
	    sa_u start_addr;
	    port = 6000;
	    msg_sz = 32;
	    count = 10000000;
	    num_addrs = 1;
	    while ((ret = getopt(argc, argv, "dl:s:p:c:")) >= 0) {
	        switch (ret) {
	        case 'l': sscanf(optarg, "%d", &count); break;
	        case 's': sscanf(optarg, "%d", &msg_sz); break;
	        case 'p': sscanf(optarg, "%d", &port); break;
	        case 'c': sscanf(optarg, "%d", &num_addrs); break;
	        case 'd': debug = 1; break;
	        case '?': return usage();
	        }
	    }
	    if (num_addrs < 1)
	        return usage();
	    if (!argv[optind])
	        return usage();
	    start_addr.a4.sin_port = htons(port);
	    if (inet_pton(PF_INET, argv[optind], &start_addr.a4.sin_addr))
	        start_addr.a46.ss_family = PF_INET;
	    else if (inet_pton(PF_INET6, argv[optind], &start_addr.a6.sin6_addr.s6_addr))
	        start_addr.a46.ss_family = PF_INET6;
	    else
	        return usage();
	    return send_packets(&start_addr, num_addrs, count, msg_sz);
	}
	/*************** ip6_route_kbench_mod.c ***************/
	/* We can't just use "get_cycles()" as on some platforms, such
	 * as sparc64, that gives system cycles rather than cpu clock
	 * cycles.
	 */
	static inline unsigned long long get_tick(void)
	{
	    unsigned long long t;
	    __asm__ __volatile__("rd %%tick, %0" : "=r" (t));
	    return t;
	}
	static inline unsigned long long get_tick(void)
	{
	    unsigned long long t;
	    rdtscll(t);
	    return t;
	}
	static inline unsigned long long get_tick(void)
	{
	    return get_cycles();
	}
	static int flow_oif = DEFAULT_OIF;
	static int flow_iif = DEFAULT_IIF;
	static u32 flow_mark = DEFAULT_MARK;
	static struct in6_addr flow_dst_ip_addr;
	static struct in6_addr flow_src_ip_addr;
	static int flow_tos = DEFAULT_TOS;
	static char dst_string[64];
	static char src_string[64];
	module_param_string(dst, dst_string, sizeof(dst_string), 0);
	module_param_string(src, src_string, sizeof(src_string), 0);
	static int __init flow_setup(void)
	{
	    if (dst_string[0] &&
	        !in6_pton(dst_string, -1, &flow_dst_ip_addr.s6_addr[0], -1, NULL)) {
	        pr_info("cannot parse \"%s\"\n", dst_string);
	        return -1;
	    }
	    if (src_string[0] &&
	        !in6_pton(src_string, -1, &flow_src_ip_addr.s6_addr[0], -1, NULL)) {
	        pr_info("cannot parse \"%s\"\n", src_string);
	        return -1;
	    }
	    return 0;
	}
	module_param_named(oif, flow_oif, int, 0);
	module_param_named(iif, flow_iif, int, 0);
	module_param_named(mark, flow_mark, uint, 0);
	module_param_named(tos, flow_tos, int, 0);
	static int warmup_count = DEFAULT_WARMUP_COUNT;
	module_param_named(count, warmup_count, int, 0);
	static void flow_init(struct flowi6 *fl6)
	{
	    memset(fl6, 0, sizeof(*fl6));
	    fl6->flowi6_proto = IPPROTO_ICMPV6;
	    fl6->flowi6_oif = flow_oif;
	    fl6->flowi6_iif = flow_iif;
	    fl6->flowi6_mark = flow_mark;
	    fl6->flowi6_tos = flow_tos;
	    fl6->daddr = flow_dst_ip_addr;
	    fl6->saddr = flow_src_ip_addr;
	}
	static struct sk_buff *fake_skb_get(void)
	{
	    struct ipv6hdr *hdr;
	    struct sk_buff *skb;
	    skb = alloc_skb(4096, GFP_KERNEL);
	    if (!skb) {
	        pr_info("Cannot alloc SKB for test\n");
	        return NULL;
	    }
	    skb->dev = __dev_get_by_index(&init_net, flow_iif);
	    if (skb->dev == NULL) {
	        pr_info("Input device (%d) does not exist\n", flow_iif);
	        goto err;
	    }
	    skb_reset_mac_header(skb);
	    skb_reset_network_header(skb);
	    skb_reserve(skb, MAX_HEADER + sizeof(struct ipv6hdr));
	    hdr = ipv6_hdr(skb);
	    hdr->priority = 0;
	    hdr->version = 6;
	    memset(hdr->flow_lbl, 0, sizeof(hdr->flow_lbl));
	    hdr->payload_len = htons(sizeof(struct icmp6hdr));
	    hdr->nexthdr = IPPROTO_ICMPV6;
	    hdr->saddr = flow_src_ip_addr;
	    hdr->daddr = flow_dst_ip_addr;
	    skb->protocol = htons(ETH_P_IPV6);
	    skb->mark = flow_mark;
	    return skb;
	err:
	    kfree_skb(skb);
	    return NULL;
	}
	static void do_full_output_lookup_bench(void)
	{
	    unsigned long long t1, t2, tdiff;
	    struct rt6_info *rt;
	    struct flowi6 fl6;
	    int i;
	    rt = NULL;
	    for (i = 0; i < warmup_count; i++) {
	        flow_init(&fl6);
	        rt = (struct rt6_info *)ip6_route_output(&init_net, NULL, &fl6);
	        if (IS_ERR(rt))
	            break;
	        ip6_rt_put(rt);
	    }
	    if (IS_ERR(rt)) {
	        pr_info("ip_route_output_key: err=%ld\n", PTR_ERR(rt));
	        return;
	    }
	    flow_init(&fl6);
	    t1 = get_tick();
	    rt = (struct rt6_info *)ip6_route_output(&init_net, NULL, &fl6);
	    t2 = get_tick();
	    if (!IS_ERR(rt))
	        ip6_rt_put(rt);
	    tdiff = t2 - t1;
	    pr_info("ip6_route_output tdiff: %llu\n", tdiff);
	}
	static void do_full_input_lookup_bench(void)
	{
	    unsigned long long t1, t2, tdiff;
	    struct sk_buff *skb;
	    struct rt6_info *rt;
	    int err, i;
	    skb = fake_skb_get();
	    if (skb == NULL)
	        goto out_free;
	    err = 0;
	    local_bh_disable();
	    for (i = 0; i < warmup_count; i++) {
	        ip6_route_input(skb);
	        rt = (struct rt6_info *)skb_dst(skb);
	        err = (!rt || rt == init_net.ipv6.ip6_null_entry);
	        skb_dst_drop(skb);
	        if (err)
	            break;
	    }
	    local_bh_enable();
	    if (err) {
	        pr_info("Input route lookup fails\n");
	        goto out_free;
	    }
	    local_bh_disable();
	    t1 = get_tick();
	    ip6_route_input(skb);
	    t2 = get_tick();
	    local_bh_enable();
	    rt = (struct rt6_info *)skb_dst(skb);
	    err = (!rt || rt == init_net.ipv6.ip6_null_entry);
	    skb_dst_drop(skb);
	    if (err) {
	        pr_info("Input route lookup fails\n");
	        goto out_free;
	    }
	    tdiff = t2 - t1;
	    pr_info("ip6_route_input tdiff: %llu\n", tdiff);
	out_free:
	    kfree_skb(skb);
	}
	static void do_full_lookup_bench(void)
	{
	    if (!flow_iif)
	        do_full_output_lookup_bench();
	    else
	        do_full_input_lookup_bench();
	}
	static void do_bench(void)
	{
	    do_full_lookup_bench();
	    do_full_lookup_bench();
	    do_full_lookup_bench();
	    do_full_lookup_bench();
	}
	static int __init kbench_init(void)
	{
	    if (flow_setup())
	        return -EINVAL;
	    pr_info("flow [IIF(%d),OIF(%d),MARK(0x%08x),D("IP6_FMT"),"
	            "S("IP6_FMT"),TOS(0x%02x)]\n",
	            flow_iif, flow_oif, flow_mark,
	            IP6_PRT(flow_dst_ip_addr), IP6_PRT(flow_src_ip_addr), flow_tos);
	    if (!cpu_has_tsc) {
	        pr_err("X86 TSC is required, but is unavailable.\n");
	        return -EINVAL;
	    }
	    pr_info("sizeof(struct rt6_info)==%zu\n", sizeof(struct rt6_info));
	    do_bench();
	    return -ENODEV;
	}
	static void __exit kbench_exit(void)
	{
	}
	module_init(kbench_init);
	module_exit(kbench_exit);
	MODULE_LICENSE("GPL");
	==================== Signed-off-by: David S. Miller <davem@davemloft.net>
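	To put the before/after figures from this cover letter side by side, the averages and the relative reduction can be computed with a few lines of C. This is only a convenience sketch; mean() and pct_reduction() are names chosen for the example and are not part of the test tools.

```c
#include <stddef.h>

/* Arithmetic mean of n samples; returns 0 for an empty set. */
double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += v[i];
	return n ? sum / (double)n : 0.0;
}

/* Relative reduction from 'before' to 'after', in percent. */
double pct_reduction(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

	Fed with the four ip6_route_output tdiff samples, the mean drops from 928.75 to 651.75, a reduction of about 30%; the udpflood wall-clock times drop by about 5% on average.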
2014-10-24	ipv6: Avoid redoing fib6_lookup() with reachable = 0 by saving fn	Martin KaFai Lau
	This patch saves the fn before doing rt6_backtrack. Hence, without redoing the fib6_lookup(), saved_fn can be used to redo rt6_select() with RT6_LOOKUP_F_REACHABLE off. Some minor changes I think make sense to review as a single patch: * Remove the 'out:' goto label. * Remove the 'reachable' variable. Only use the 'strict' variable instead. After this patch, "failing ip6_ins_rt()" should be the only case that requires a redo of fib6_lookup(). Cc: David Miller <davem@davemloft.net> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24	ipv6: Avoid redoing fib6_lookup() for RTF_CACHE hit case	Martin KaFai Lau
	When there is a RTF_CACHE hit, no need to redo fib6_lookup() with reachable=0. Cc: David Miller <davem@davemloft.net> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>