summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-11-11irda: Remove IRDA_<TYPE> logging macrosJoe Perches
And use the more common mechanisms directly. Other miscellanea: o Coalesce formats o Add missing newlines o Realign arguments o Remove unnecessary OOM message logging as there's a generic stack dump already on OOM. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11net: kill netif_copy_real_num_queues()WANG Cong
vlan was the only user of netif_copy_real_num_queues(), but it no longer calls it after commit 4af429d29b341bb1735f04c2fb960178 ("vlan: lockless transmit path"). So we can just remove it. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2014-11-11 This series contains updates to i40e, i40evf and ixgbe. Kamil updated the i40e and i40evf driver to poll the firmware slower since we were polling faster than the firmware could respond. Shannon updates i40e to add a check to keep the service_task from running the periodic tasks more than once per second, while still allowing quick action to service the events. Jesse cleans up the throttle rate code by fixing the minimum interrupt throttle rate and removing some unused defines. Mitch makes the early init admin queue message receive code more robust by handling messages in a loop and ignoring those that we are not interested in. This also gets rid of some scary log messages that really do not indicate a problem. Don provides several ixgbe patches, first fixes an issue with x540 completion timeout where on topologies including few levels of PCIe switching for x540 can run into an unexpected completion error. Cleans up the functionality in ixgbe_ndo_set_vf_vlan() in preparation for future work. Adds support for x550 MAC's to the driver. v2: - Remove code comment in patch 01 of the series, based on feedback from David Liaght - Updated the "goto" to "break" statements in patch 06 of the series, based on feedback from Sergei Shtylyov - Initialized the variable err due to the possibility of use before being assigned a value in patch 07 of the series - Added patch "ixgbe: add helper function for setting RSS key in preparation of X550" since it is needed for the addition of X550 MAC support ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11usbnet: smsc95xx: dereferencing NULL pointerSudip Mukherjee
we were dereferencing dev to initialize pdata. but just after that we have a BUG_ON(!dev). so we were basically dereferencing the pointer first and then tesing it for NULL. Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11irda: Simplify IRDA logging macrosJoe Perches
These are the same as net_<level>_ratelimited, so use the more common style in the macro definition. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11neigh: remove dynamic neigh table registration supportWANG Cong
Currently there are only three neigh tables in the whole kernel: arp table, ndisc table and decnet neigh table. What's more, we don't support registering multiple tables per family. Therefore we can just make these tables statically built-in. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11stmmac: split to core library and probe driversAndy Shevchenko
Instead of registering the platform and PCI drivers in one module let's move necessary bits to where it belongs. During this procedure we convert the module registration part to use module_*_driver() macros which makes code simplier. >From now on the driver consists three parts: core library, PCI, and platform drivers. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11net: Convert LIMIT_NETDEBUG to net_dbg_ratelimitedJoe Perches
Use the more common dynamic_debug capable net_dbg_ratelimited and remove the LIMIT_NETDEBUG macro. All messages are still ratelimited. Some KERN_<LEVEL> uses are changed to KERN_DEBUG. This may have some negative impact on messages that were emitted at KERN_INFO that are not not enabled at all unless DEBUG is defined or dynamic_debug is enabled. Even so, these messages are now _not_ emitted by default. This also eliminates the use of the net_msg_warn sysctl "/proc/sys/net/core/warnings". For backward compatibility, the sysctl is not removed, but it has no function. The extern declaration of net_msg_warn is removed from sock.h and made static in net/core/sysctl_net_core.c Miscellanea: o Update the sysctl documentation o Remove the embedded uses of pr_fmt o Coalesce format fragments o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11PPC: bpf_jit_comp: add SKF_AD_HATYPE instructionDenis Kirjanov
Add BPF extension SKF_AD_HATYPE to ppc JIT to check the hw type of the interface Before: [ 57.723666] test_bpf: #20 LD_HATYPE [ 57.723675] BPF filter opcode 0020 (@0) unsupported [ 57.724168] 48 48 PASS After: [ 103.053184] test_bpf: #20 LD_HATYPE 7 6 PASS CC: Alexei Starovoitov<alexei.starovoitov@gmail.com> CC: Daniel Borkmann<dborkman@redhat.com> CC: Philippe Bergheaud<felix@linux.vnet.ibm.com> Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> v2: address Alexei's comments Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11Merge branch 'net_next_ovs' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch Pravin B Shelar says: ==================== Open vSwitch Following batch of patches brings feature parity between upstream ovs and out of tree ovs module. Two features are added, first adds support to export egress tunnel information for a packet. This is used to improve visibility in network traffic. Second feature allows userspace vswitchd process to probe ovs module features. Other patches are optimization and code cleanup. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11dsa: Use netdev_<level> instead of printkJoe Perches
Neaten and standardize the logging output. Other miscellanea: o Use pr_notice_once instead of a guard flag. o Convert existing pr_<level> uses too. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11Merge branch 'mlx4-next'David S. Miller
Or Gerlitz says: ==================== mlx4: Add CHECKSUM_COMPLETE support These patches from Shani, Matan and myself add support for CHECKSUM_COMPLETE reporting on non TCP/UDP packets such as GRE and ICMP. I'd like to deeply thank Jerry Chu for his innovation and support in that effort. Based on the feedback from Eric and Ido Shamay, in V2 we dropped the patch which removed the calls to napi_gro_frags() and added a patch which makes the RX code to go through that path regardless of the checksum status. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETEShani Michaeli
When processing received traffic, pass CHECKSUM_COMPLETE status to the stack, with calculated checksum for non TCP/UDP packets (such as GRE or ICMP). Although the stack expects checksum which doesn't include the pseudo header, the HW adds it. To address that, we are subtracting the pseudo header checksum from the checksum value provided by the HW. In the IPv6 case, we also compute/add the IP header checksum which is not added by the HW for such packets. Cc: Jerry Chu <hkchu@google.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11net/mlx4_en: Extend usage of napi_gro_fragsShani Michaeli
We can call napi_gro_frags for all the received traffic regardless of the checksum status. Specifically, received packets whose status is CHECKSUM_NONE (and soon to be added CHECKSUM_COMPLETE) are eligible for napi_gro_frags as well. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11Merge branch 'so_incoming_cpu'David S. Miller
Eric Dumazet says: ==================== net: SO_INCOMING_CPU support SO_INCOMING_CPU socket option (read by getsockopt()) provides an alternative to RPS/RFS for high performance servers using multi queues NIC. TCP should use sk_mark_napi_id() for established sockets only. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11net: introduce SO_INCOMING_CPUEric Dumazet
Alternative to RPS/RFS is to use hardware support for multiple queues. Then split a set of million of sockets into worker threads, each one using epoll() to manage events on its own socket pool. Ideally, we want one thread per RX/TX queue/cpu, but we have no way to know after accept() or connect() on which queue/cpu a socket is managed. We normally use one cpu per RX queue (IRQ smp_affinity being properly set), so remembering on socket structure which cpu delivered last packet is enough to solve the problem. After accept(), connect(), or even file descriptor passing around processes, applications can use : int cpu; socklen_t len = sizeof(cpu); getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len); And use this information to put the socket into the right silo for optimal performance, as all networking stack should run on the appropriate cpu, without need to send IPI (RPS/RFS). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11tcp: move sk_mark_napi_id() at the right placeEric Dumazet
sk_mark_napi_id() is used to record for a flow napi id of incoming packets for busypoll sake. We should do this only on established flows, not on listeners. This was 'working' by virtue of the socket cloning, but doing this on SYN packets in unecessary cache line dirtying. Even if we move sk_napi_id in the same cache line than sk_lock, we are working to make SYN processing lockless, so it is desirable to set sk_napi_id only for established flows. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11ixgbe: add helper function for setting RSS key in preparation of X550Don Skidmore
Split off the setting of the RSS key into its own function. This will help when we add support for X550 which can have different RSS keys per pool. Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11ixgbe: Add new support for X550 MAC'sDon Skidmore
This patch will add in the new MAC defines and fit it into the switch cases throughout the driver. New functionality and enablement support will be added in following patches. Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11ixgbe: cleanup move setting PFQDE.HIDE_VLAN to support function.Don Skidmore
Move setting of drop enable to support function. This not only makes the code more readable but is also prep for following patches that add additional MAC support. Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11ixgbe: cleanup ixgbe_ndo_set_vf_vlanDon Skidmore
Clean up functionality in ixgbe_ndo_set_vf_vlan that will simplify later patches. Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11ixgbe: fix X540 Completion timeoutDon Skidmore
On topologies including few levels of PCIe switching X540 can run into an unexpected completion error. We get around this by waiting after enabling loopback a sufficient amount of time until Tx Data Fetch is sent. We then poll the pending transaction bit to ensure we received the completion. Only then do we go on to clear the buffers. Signed-of-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11i40evf: don't use more queues than CPUsMitch Williams
It's kind of silly to configure and attempt to use a bunch of queue pairs when you're running on a single (virtual) CPU. Instead of unconditionally configuring all of the queues that the PF gives us, clamp the number of queue pairs to the number of CPUs. Change-ID: I321714c9e15072ee76de8f95ab9a81f86ed347d1 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Patrick Lu <patrick.lu@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11i40evf: make early init processing more robustMitch Williams
In early init, if we get an unexpected message from the PF (such as link status), we just kick an error back to the init task, causing it to restart its state machine and delaying initialization. Make the early init AQ message receive code more robust by handling messages in a loop, and ignoring those that we aren't interested in. This also gets rid of some scary log messages that really didn't indicate a problem. Change-ID: I620e8c72e49c49c665ef33eeab2425dd10e721cf Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Patrick Lu <patrick.lu@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11i40e: clean up throttle rate codeJesse Brandeburg
The interrupt throttle rate minimum is actually 2us, so fix that define and while we are there, remove some unused defines. Change some strings in the function to be a bit less wrappy, and express the correct limits. Change-ID: I96829bbc77935e0b57c6f0fc1439fb4152b2960a Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Patrick Lu <patrick.lu@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11i40e: don't do link_status or stats collection on every ARQShannon Nelson
The ARQ events cause a service_task execution, and we do a link_status check and full stats gathering for each service_task. However, when there are a lot of ARQ events, such as when doing an NVM update, we end up doing 10's if not 100's of these per second, thereby heavily abusing the PCI bus and especially the Firmware. This patch adds a check to keep the service_task from running these periodic tasks more than once per second, while still allowing quick action to service the events. Change-ID: Iec7670c37bfae9791c43fec26df48aea7f70b33e Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Patrick Lu <patrick.lu@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-11i40e: poll firmware slowerKamil Krawczyk
The code was polling the firmware tail register for completion every 10 microseconds, which is way faster than the firmware can respond. This changes the poll interval to 1ms, which reduces polling CPU utilization, and the number of times we loop. The maximum delay is still 100ms. Change-ID: I4bbfa6b66d802890baf8b4154061e55942b90958 Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com> Acked-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-11-10mlx4: restore conditional call to napi_complete_done()Eric Dumazet
After commit 1a28817282 ("mlx4: use napi_complete_done()") we ended up calling napi_complete_done() in the case NAPI poll consumed all its budget. This added extra interrupt pressure, this patch restores proper behavior. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 1a28817282 ("mlx4: use napi_complete_done()") Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10Merge branch 'sunvnet-next'David S. Miller
Sowmini Varadhan says: ==================== sunvnet: edge-case/race-conditions bug fixes This patch series contains fixes for race-conditions in sunvnet, that can encountered when there is a difference in latency between producer and consumer. Patch 1 addresses a case when the STOPPED LDC ack from a peer is processed before vnet_start_xmit can finish updating the dr->prod state. Patch 2 fixes the edge-case when outgoing data and incoming stopped-ack cross each other in flight. Patch 3 adds a missing rcu_read_unlock(), found by code-inspection. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10sunvnet: Add missing rcu_read_unlock() in vnet_start_xmitSowmini Varadhan
The out_dropped label will only do rcu_read_unlock for non-null port. So add the missing rcu_read_unlock() when bailing due to non-null port. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10sunvnet: vnet_ack() should check if !start_cons to send a missed triggerSowmini Varadhan
As per comments in vnet_start_xmit, for the edge case when outgoing vnet_start_xmit() data and an incoming STOPPED ACK cross each other in flight, we may need to send the missed START trigger from maybe_tx_wakeup() after checking for a false value of start_cons Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10sunvnet: Fix race between vnet_start_xmit() and vnet_ack()Sowmini Varadhan
When vnet_start_xmit() is concurrent with vnet_ack(), we may have a race that looks like: thread 1 thread 2 vnet_start_xmit vnet_event_napi -> vnet_rx __vnet_tx_trigger for some desc X at this point dr->prod == X peer sends back a stopped ack for X we process X, but X == dr->prod so we bail out in vnet_ack with !idx_is_pending update dr->prod As a result of the fact that we never processed the stopped ack for X, the Tx path is led to incorrectly believe that the peer is still "started" and reading, but the peer has stopped reading, which will ultimately end in flow-control assertions. The fix is to synchronize the above 2 paths on the netif_tx_lock. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-108139too: Allow using the largest possible MTUAlban Bedel
This driver allows MTU up to 1518 bytes which is not enought to run batman-adv. Simply raise the maximum packet size up to the maximum allowed by the transmit descriptor, 1792 bytes, giving a maximum MTU of 1774 bytes. Signed-off-by: Alban Bedel <albeu@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-108139too: Allow setting MTU larger than 1500Alban Bedel
Replace the default ndo_change_mtu callback with one that allow setting MTU that the driver can handle. Signed-off-by: Alban Bedel <albeu@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10Merge tag 'master-2014-11-04' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-11-07 Please pull this batch of updates intended for the 3.19 stream! For the mac80211 bits, Johannes says: "This relatively large batch of changes is comprised of the following: * large mac80211-hwsim changes from Ben, Jukka and a bit myself * OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical University in Prague and Volkswagen Group Research * minstrel VHT work from Karl * more CSA work from Luca * WMM admission control support in mac80211 (myself) * various smaller fixes, spelling corrections, and minor API additions" For the Bluetooth bits, Johan says: "Here's the first bluetooth-next pull request for 3.19. The vast majority of patches are for ieee802154 from Alexander Aring with various fixes and cleanups. There are also several LE/SMP fixes as well as improved support for handling LE devices that have lost their pairing information (the patches from Alfonso). Jukka provides a couple of stability fixes for 6lowpan and Szymon conformance fixes for RFCOMM. For the HCI drivers we have one new USB ID for an Acer controller as well as a reset handling fix for H5." For the Atheros bits, Kalle says: "Major changes are: o ethtool support (Ben) o print dev string prefix with debug hex buffers dump (Michal) o debugfs file to read calibration data from the firmware verification purposes (me) o fix fw_stats debugfs file, now results are more reliable (Michal) o firmware crash counters via debugfs (Ben&me) o various tracing points to debug firmware (Rajkumar) o make it possible to provide firmware calibration data via a file (me) And we have quite a lot of smaller fixes and clean up." For the iwlwifi bits, Emmanuel says: "The big new thing here is netdetect which allows the firmware to wake up the platform when a specific network is detected. Along with that I have fixes for d3 operation. The usual amount of rate scaling stuff - we now support STBC. The other commit that stands out is Johannes's work on devcoredump. He basically starts to use the standard infrastructure he built." Along with that are the usual sort of updates and such for ath9k, brcmfmac, wil6210, and a handful of other bits here and there... Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10Merge branch 'raw_probe_proto_opt'David S. Miller
Herbert Xu says: ==================== ipv4: Simplify raw_probe_proto_opt and avoid reading user iov twice This series rewrites the function raw_probe_proto_opt in a more readable fasion, and then fixes the long-standing bug where we read the probed bytes twice which means that what we're using to probe may in fact be invalid. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10ipv4: Avoid reading user iov twice after raw_probe_proto_optHerbert Xu
Ever since raw_probe_proto_opt was added it had the problem of causing the user iov to be read twice, once during the probe for the protocol header and once again in ip_append_data. This is a potential security problem since it means that whatever we're probing may be invalid. This patch plugs the hole by firstly advancing the iov so we don't read the same spot again, and secondly saving what we read the first time around for use by ip_append_data. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10ipv4: Use standard iovec primitive in raw_probe_proto_optHerbert Xu
The function raw_probe_proto_opt tries to extract the first two bytes from the user input in order to seed the IPsec lookup for ICMP packets. In doing so it's processing iovec by hand and overcomplicating things. This patch replaces the manual iovec processing with a call to memcpy_fromiovecend. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10net: Move bonding headers under include/netDavid S. Miller
This ways drivers like cxgb4 don't need to do ugly relative includes. Reported-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10cxgb4: Remove unnecessary struct in6_addr * castsJoe Perches
Just use the address of the in6_addr. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10Merge branch 'cxgb4-next'David S. Miller
Hariprasad Shenai says: ==================== RDMA/cxgb4,cxgb4vf,cxgb4i,csiostor: Cleanup macros This series moves the debugfs code to a new file debugfs.c and cleans up macros/register defines. Various patches have ended up changing the style of the symbolic macros/register defines and some of them used the macros/register defines that matches the output of the script from the hardware team. As a result, the current kernel.org files are a mix of different macro styles. Since this macro/register defines is used by five different drivers, a few patch series have ended up adding duplicate macro/register define entries with different styles. This makes these register define/macro files a complete mess and we want to make them clean and consistent. Will post few more series so that we can cover all the macros so that they all follow the same style to be consistent. The patches series is created against 'net-next' tree. And includes patches on cxgb4, cxgb4vf, iw_cxgb4, csiostor and cxgb4i driver. We have included all the maintainers of respective drivers. Kindly review the change and let us know in case of any review comments. V3: Use suffix instead of prefix for macros/register defines V2: Changes the description and cover-letter content to answer David Miller's question ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10cxgb4: Cleanup macros so they follow the same style and look consistent, part 2Hariprasad Shenai
Various patches have ended up changing the style of the symbolic macros/register defines to different style. As a result, the current kernel.org files are a mix of different macro styles. Since this macro/register defines is used by different drivers a few patch series have ended up adding duplicate macro/register define entries with different styles. This makes these register define/macro files a complete mess and we want to make them clean and consistent. This patch cleans up a part of it. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10cxgb4: Cleanup macros so they follow the same style and look consistentHariprasad Shenai
Various patches have ended up changing the style of the symbolic macros/register to different style. As a result, the current kernel.org files are a mix of different macro styles. Since this macro/register defines is used by different drivers a few patch series have ended up adding duplicate macro/register define entries with different styles. This makes these register define/macro files a complete mess and we want to make them clean and consistent. This patch cleans up a part of it. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10cxgb4: Add cxgb4_debugfs.c, move all debugfs code to new fileHariprasad Shenai
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10mlx4: use napi_complete_done()Eric Dumazet
To enable gro_flush_timeout, a driver has to use napi_complete_done() instead of napi_complete(). Tested: Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues) Without this feature, we send back about 305,000 ACK per second. GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet) Setting a timer of 2000 nsec is enough to increase GRO packet sizes and reduce number of ACK packets. (811/19.2 = 42) Receiver performs less calls to upper stacks, less wakes up. This also reduces cpu usage on the sender, as it receives less ACK packets. Note that reducing number of wakes up increases cpu efficiency, but can decrease QPS, as applications wont have the chance to warmup cpu caches doing a partial read of RPC requests/answers if they fit in one skb. B:~# sar -n DEV 1 10 | grep eth0 | tail -1 Average: eth0 811269.80 305732.30 1199462.57 19705.72 0.00 0.00 0.50 B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout B:~# sar -n DEV 1 10 | grep eth0 | tail -1 Average: eth0 811577.30 19230.80 1199916.51 1239.80 0.00 0.00 0.50 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10net: gro: add a per device gro flush timerEric Dumazet
Tuning coalescing parameters on NIC can be really hard. Servers can handle both bulk and RPC like traffic, with conflicting goals : bulk flows want as big GRO packets as possible, RPC want minimal latencies. To reach big GRO packets on 10Gbe NIC, one can use : ethtool -C eth0 rx-usecs 4 rx-frames 44 But this penalizes rpc sessions, with an increase of latencies, up to 50% in some cases, as NICs generally do not force an interrupt when a packet with TCP Push flag is received. Some NICs do not have an absolute timer, only a timer rearmed for every incoming packet. This patch uses a different strategy : Let GRO stack decides what do do, based on traffic pattern. Packets with Push flag wont be delayed. Packets without Push flag might be held in GRO engine, if we keep receiving data. This new mechanism is off by default, and shall be enabled by setting /sys/class/net/ethX/gro_flush_timeout to a value in nanosecond. To fully enable this mechanism, drivers should use napi_complete_done() instead of napi_complete(). Tested: Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues) Without this feature, we send back about 305,000 ACK per second. GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet) Setting a timer of 2000 nsec is enough to increase GRO packet sizes and reduce number of ACK packets. (811/19.2 = 42) Receiver performs less calls to upper stacks, less wakes up. This also reduces cpu usage on the sender, as it receives less ACK packets. Note that reducing number of wakes up increases cpu efficiency, but can decrease QPS, as applications wont have the chance to warmup cpu caches doing a partial read of RPC requests/answers if they fit in one skb. B:~# sar -n DEV 1 10 | grep eth0 | tail -1 Average: eth0 811269.80 305732.30 1199462.57 19705.72 0.00 0.00 0.50 B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout B:~# sar -n DEV 1 10 | grep eth0 | tail -1 Average: eth0 811577.30 19230.80 1199916.51 1239.80 0.00 0.00 0.50 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10rtnetlink: add babel protocol recognitionDave Taht
Babel uses rt_proto 42. Add to userspace visible header file. Signed-off-by: Dave Taht <dave.taht@bufferbloat.net> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-09openvswitch: Add support for OVS_FLOW_ATTR_PROBE.Jarno Rajahalme
This new flag is useful for suppressing error logging while probing for datapath features using flow commands. For backwards compatibility reasons the commands are executed normally, but error logging is suppressed. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-09openvswitch: Constify various function argumentsThomas Graf
Help produce better optimized code. Signed-off-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-09openvswitch: Remove redundant key ref from upcall_info.Pravin B Shelar
struct dp_upcall_info has pointer to pkt_key which is already available in OVS_CB. This also simplifies upcall handling for gso packet. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>