summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-08-05net-timestamp: TCP timestampingWillem de Bruijn
TCP timestamping extends SO_TIMESTAMPING to bytestreams. Bytestreams do not have a 1:1 relationship between send() buffers and network packets. The feature interprets a send call on a bytestream as a request for a timestamp for the last byte in that send() buffer. The choice corresponds to a request for a timestamp when all bytes in the buffer have been sent. That assumption depends on in-order kernel transmission. This is the common case. That said, it is possible to construct a traffic shaping tree that would result in reordering. The guarantee is strong, then, but not ironclad. This implementation supports send and sendpages (splice). GSO replaces one large packet with multiple smaller packets. This patch also copies the option into the correct smaller packet. This patch does not yet support timestamping on data in an initial TCP Fast Open SYN, because that takes a very different data path. If ID generation in ee_data is enabled, bytestream timestamps return a byte offset, instead of the packet counter for datagrams. The implementation supports a single timestamp per packet. It silenty replaces requests for previous timestamps. To avoid missing tstamps, flush the tcp queue by disabling Nagle, cork and autocork. Missing tstamps can be detected by offset when the ee_data ID is enabled. Implementation details: - On GSO, the timestamping code can be included in the main loop. I moved it into its own loop to reduce the impact on the common case to a single branch. - To avoid leaking the absolute seqno to userspace, the offset returned in ee_data must always be relative. It is an offset between an skb and sk field. The first is always set (also for GSO & ACK). The second must also never be uninitialized. Only allow the ID option on sockets in the ESTABLISHED state, for which the seqno is available. Never reset it to zero (instead, move it to the current seqno when reenabling the option). Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05net-timestamp: SCHED timestamp on entering packet schedulerWillem de Bruijn
Kernel transmit latency is often incurred in the packet scheduler. Introduce a new timestamp on transmission just before entering the scheduler. When data travels through multiple devices (bonding, tunneling, ...) each device will export an individual timestamp. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05net-timestamp: add key to disambiguate concurrent datagramsWillem de Bruijn
Datagrams timestamped on transmission can coexist in the kernel stack and be reordered in packet scheduling. When reading looped datagrams from the socket error queue it is not always possible to unique correlate looped data with original send() call (for application level retransmits). Even if possible, it may be expensive and complex, requiring packet inspection. Introduce a data-independent ID mechanism to associate timestamps with send calls. Pass an ID alongside the timestamp in field ee_data of sock_extended_err. The ID is a simple 32 bit unsigned int that is associated with the socket and incremented on each send() call for which software tx timestamp generation is enabled. The feature is enabled only if SOF_TIMESTAMPING_OPT_ID is set, to avoid changing ee_data for existing applications that expect it 0. The counter is reset each time the flag is reenabled. Reenabling does not change the ID of already submitted data. It is possible to receive out of order IDs if the timestamp stream is not quiesced first. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05net-timestamp: move timestamp flags out of sk_flagsWillem de Bruijn
sk_flags is reaching its limit. New timestamping options will not fit. Move all of them into a new field sk->sk_tsflags. Added benefit is that this removes boilerplate code to convert between SOF_TIMESTAMPING_.. and SOCK_TIMESTAMPING_.. in getsockopt/setsockopt. SOCK_TIMESTAMPING_RX_SOFTWARE is also used to toggle the receive timestamp logic (netstamp_needed). That can be simplified and this last key removed, but will leave that for a separate patch. Signed-off-by: Willem de Bruijn <willemb@google.com> ---- The u16 in sock can be moved into a 16-bit hole below sk_gso_max_segs, though that scatters tstamp fields throughout the struct. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05net-timestamp: extend SCM_TIMESTAMPING ancillary data structWillem de Bruijn
Applications that request kernel tx timestamps with SO_TIMESTAMPING read timestamps as recvmsg() ancillary data. The response is defined implicitly as timespec[3]. 1) define struct scm_timestamping explicitly and 2) add support for new tstamp types. On tx, scm_timestamping always accompanies a sock_extended_err. Define previously unused field ee_info to signal the type of ts[0]. Introduce SCM_TSTAMP_SND to define the existing behavior. The reception path is not modified. On rx, no struct similar to sock_extended_err is passed along with SCM_TIMESTAMPING. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05cxgb4i : Move stray CPL definitions to cxgb4 driverAnish Bhatt
These belong to the t4 msg header, will ensure there is no accidental code duplication in the future Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05tcp: reduce spurious retransmits due to transient SACK renegingNeal Cardwell
This commit reduces spurious retransmits due to apparent SACK reneging by only reacting to SACK reneging that persists for a short delay. When a sequence space hole at snd_una is filled, some TCP receivers send a series of ACKs as they apparently scan their out-of-order queue and cumulatively ACK all the packets that have now been consecutiveyly received. This is essentially misbehavior B in "Misbehaviors in TCP SACK generation" ACM SIGCOMM Computer Communication Review, April 2011, so we suspect that this is from several common OSes (Windows 2000, Windows Server 2003, Windows XP). However, this issue has also been seen in other cases, e.g. the netdev thread "TCP being hoodwinked into spurious retransmissions by lack of timestamps?" from March 2014, where the receiver was thought to be a BSD box. Since snd_una would temporarily be adjacent to a previously SACKed range in these scenarios, this receiver behavior triggered the Linux SACK reneging code path in the sender. This led the sender to clear the SACK scoreboard, enter CA_Loss, and spuriously retransmit (potentially) every packet from the entire write queue at line rate just a few milliseconds before the ACK for each packet arrives at the sender. To avoid such situations, now when a sender sees apparent reneging it does not yet retransmit, but rather adjusts the RTO timer to give the receiver a little time (max(RTT/2, 10ms)) to send us some more ACKs that will restore sanity to the SACK scoreboard. If the reneging persists until this RTO then, as before, we clear the SACK scoreboard and enter CA_Loss. A 10ms delay tolerates a receiver sending such a stream of ACKs at 56Kbit/sec. And to allow for receivers with slower or more congested paths, we wait for at least RTT/2. We validated the resulting max(RTT/2, 10ms) delay formula with a mix of North American and South American Google web server traffic, and found that for ACKs displaying transient reneging: (1) 90% of inter-ACK delays were less than 10ms (2) 99% of inter-ACK delays were less than RTT/2 In tests on Google web servers this commit reduced reneging events by 75%-90% (as measured by the TcpExtTCPSACKReneging counter), without any measurable impact on latency for user HTTP and SPDY requests. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05Merge branch 'xen-netback-next'David S. Miller
Zoltan Kiss says: ==================== xen-netback: Changes around carrier handling This series starts using carrier off as a way to purge packets when the guest is not able (or willing) to receive them. It is a much faster way to get rid of packets waiting for an overwhelmed guest. The first patch changes current netback code where it relies currently on netif_carrier_ok. The second turns off the carrier if the guest times out on a queue, and only turn it on again if that queue (or queues) resurrects. ==================== Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05xen-netback: Turn off the carrier if the guest is not able to receiveZoltan Kiss
Currently when the guest is not able to receive more packets, qdisc layer starts a timer, and when it goes off, qdisc is started again to deliver a packet again. This is a very slow way to drain the queues, consumes unnecessary resources and slows down other guests shutdown. This patch change the behaviour by turning the carrier off when that timer fires, so all the packets are freed up which were stucked waiting for that vif. Instead of the rx_queue_purge bool it uses the VIF_STATUS_RX_PURGE_EVENT bit to signal the thread that either the timeout happened or an RX interrupt arrived, so the thread can check what it should do. It also disables NAPI, so the guest can't transmit, but leaves the interrupts on, so it can resurrect. Only the queues which brought down the interface can enable it again, the bit QUEUE_STATUS_RX_STALLED makes sure of that. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05xen-netback: Using a new state bit instead of carrierZoltan Kiss
This patch introduces a new state bit VIF_STATUS_CONNECTED to track whether the vif is in a connected state. Using carrier will not work with the next patch in this series, which aims to turn the carrier temporarily off if the guest doesn't seem to be able to receive packets. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org v2: - rename the bitshift type to "enum state_bit_shift" here, not in the next patch Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05Merge tag 'master-2014-07-31' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next Conflicts: net/6lowpan/iphc.c Minor conflicts in iphc.c were changes overlapping with some style cleanups. John W. Linville says: ==================== Please pull this last(?) batch of wireless change intended for the 3.17 stream... For the NFC bits, Samuel says: "This is a rather quiet one, we have: - A new driver from ST Microelectronics for their NCI ST21NFCB, including device tree support. - p2p support for the ST21NFCA driver - A few fixes an enhancements for the NFC digital laye" For the Atheros bits, Kalle says: "Michal and Janusz did some important RX aggregation fixes, basically we were missing RX reordering altogether. The 10.1 firmware doesn't support Ad-Hoc mode and Michal fixed ath10k so that it doesn't advertise Ad-Hoc support with that firmware. Also he implemented a workaround for a KVM issue." For the Bluetooth bits, Gustavo and Johan say: "To quote Gustavo from his previous request: 'Some last minute fixes for -next. We have a fix for a use after free in RFCOMM, another fix to an issue with ADV_DIRECT_IND and one for ADV_IND with auto-connection handling. Last, we added support for reading the codec and MWS setting for controllers that support these features.' Additionally there are fixes to LE scanning, an update to conform to the 4.1 core specification as well as fixes for tracking the page scan state. All of these fixes are important for 3.17." And, "We've got: - 6lowpan fixes/cleanups - A couple crash fixes, one for the Marvell HCI driver and another in LE SMP. - Fix for an incorrect connected state check - Fix for the bondable requirement during pairing (an issue which had crept in because of using "pairable" when in fact the actual meaning was "bondable" (these have different meanings in Bluetooth)" Along with those are some late-breaking hardware support patches in brcmfmac and b43 as well as a stray ath9k patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05net/usb/hso: Add support for Option GTM671WFSRicardo Ribalda
After this patch: [ 32.985530] hso: drivers/net/usb/hso.c: Option Wireless [ 33.000452] hso 2-1.4:1.7: Not our interface [ 33.001849] usbcore: registered new interface driver hso root@qt5022:~# ls /dev/ttyHS* /dev/ttyHS0 /dev/ttyHS1 /dev/ttyHS2 /dev/ttyHS3 /dev/ttyHS4 /dev/ttyHS5 root@qt5022:~# lsusb -d 0af0: -vvv Bus 002 Device 003: ID 0af0:9200 Option Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 2.00 bDeviceClass 255 Vendor Specific Class bDeviceSubClass 255 Vendor Specific Subclass bDeviceProtocol 255 Vendor Specific Protocol bMaxPacketSize0 64 idVendor 0x0af0 Option idProduct 0x9200 bcdDevice 0.00 iManufacturer 3 Option N.V. iProduct 2 Globetrotter HSUPA Modem iSerial 0 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 200 bNumInterfaces 8 bConfigurationValue 1 iConfiguration 1 Option Configuration bmAttributes 0xe0 Self Powered Remote Wakeup MaxPower 100mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 255 Vendor Specific Subclass bInterfaceProtocol 255 Vendor Specific Protocol iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 32 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x01 EP 1 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 32 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 1 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 255 Vendor Specific Subclass bInterfaceProtocol 255 Vendor Specific Protocol iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x82 EP 2 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 32 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x02 EP 2 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 32 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 2 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 255 Vendor Specific Subclass bInterfaceProtocol 255 Vendor Specific Protocol iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x83 EP 3 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 32 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x03 EP 3 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 32 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 3 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 255 Vendor Specific Subclass bInterfaceProtocol 255 Vendor Specific Protocol iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x84 EP 4 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 32 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x04 EP 4 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 32 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 4 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 255 Vendor Specific Subclass bInterfaceProtocol 255 Vendor Specific Protocol iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x85 EP 5 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 32 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x05 EP 5 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 32 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 5 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 255 Vendor Specific Subclass bInterfaceProtocol 255 Vendor Specific Protocol iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x06 EP 6 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 32 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x86 EP 6 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 32 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 6 bAlternateSetting 0 bNumEndpoints 3 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 255 Vendor Specific Subclass bInterfaceProtocol 255 Vendor Specific Protocol iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x87 EP 7 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0040 1x 64 bytes bInterval 5 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x88 EP 8 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 32 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x07 EP 7 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 32 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 7 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 8 Mass Storage bInterfaceSubClass 6 SCSI bInterfaceProtocol 80 Bulk-Only iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x08 EP 8 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 1 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x89 EP 9 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 1 Device Qualifier (for other device speed): bLength 10 bDescriptorType 6 bcdUSB 2.00 bDeviceClass 255 Vendor Specific Class bDeviceSubClass 255 Vendor Specific Subclass bDeviceProtocol 255 Vendor Specific Protocol bMaxPacketSize0 64 bNumConfigurations 1 Device Status: 0x0001 Self Powered Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Acked-by: Dan Williams <dcbw@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05net: smc911x: fix %d confusingly prefixed with 0x in format stringHans Wennborg
Signed-off-by: Hans Wennborg <hans@hanshq.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05drivers: atm: fix %d confusingly prefixed with 0x in format stringsHans Wennborg
Signed-off-by: Hans Wennborg <hans@hanshq.net> Acked-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-04netlink: fix lockdep splatsEric Dumazet
With netlink_lookup() conversion to RCU, we need to use appropriate rcu dereference in netlink_seq_socket_idx() & netlink_seq_next() Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: e341694e3eb57fc ("netlink: Convert netlink_lookup() to use RCU protected hash table") Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-04tcp: md5: remove unneeded check in tcp_v4_parse_md5_keysDmitry Popov
tcpm_key is an array inside struct tcp_md5sig, there is no need to check it against NULL. Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-04Drivers: net-next: hyperv: Increase the size of the sendbuf regionKY Srinivasan
Intel did some benchmarking on our network throughput when Linux on Hyper-V is as used as a gateway. This fix gave us almost a 1 Gbps additional throughput on about 5Gbps base throughput we hadi, prior to increasing the sendbuf size. The sendbuf mechanism is a copy based transport that we have which is clearly more optimal than the copy-free page flipping mechanism (for small packets). In the forwarding scenario, we deal only with MTU sized packets, and increasing the size of the senbuf area gave us the additional performance. For what it is worth, Windows guests on Hyper-V, I am told use similar sendbuf size as well. The exact value of sendbuf I think is less important than the fact that it needs to be larger than what Linux can allocate as physically contiguous memory. Thus the change over to allocating via vmalloc(). We currently allocate 16MB receive buffer and we use vmalloc there for allocation. Also the low level channel code has already been modified to deal with physically dis-contiguous memory in the ringbuffer setup. Based on experimentation Intel did, they say there was some improvement in throughput as the sendbuf size was increased up to 16MB and there was no effect on throughput beyond 16MB. Thus I have chosen 16MB here. Increasing the sendbuf value makes a material difference in small packet handling In this version of the patch, based on David's feedback, I have added additional details in the commit log. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-04net: remove spurious zd1201 rule.Francois Romieu
Leftover from 5c601d0c942f5aaf7f3cff7e08f61047d70a964e ("wireless: move zd1201 where it belongs"). Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-04net: phy: spi_ks8995: Introduce the use of devm_kzallocHimangi Saraogi
This patch introduces the use of devm_kzalloc and does away with the kfrees in the probe and remove functions. Also, a label is removed. Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-04cxgb4: only free allocated flsHariprasad Shenai
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-04bridge: remove a useless commentMichael S. Tsirkin
commit 6cbdceeb1cb12c7d620161925a8c3e81daadb2e4 bridge: Dump vlan information from a bridge port introduced a comment in an attempt to explain the code logic. The comment is unfinished so it confuses more than it explains, remove it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02cxgb4i : remove spurious use of rcuAnish Bhatt
As pointed out by the intel guys, there is no need to hold rcu read lock in cxgbi_inet6addr_handler(), this patch removes it. Fixes: 759a0cc5a3e1 ("cxgb4i: Add ipv6 code to driver, call into libcxgbi ipv6 api") Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02Merge branch 'concurrent_hash_tables'David S. Miller
Thomas Graf says: ==================== Lockless netlink_lookup() with new concurrent hash table Netlink sockets are maintained in a hash table to allow efficient lookup via the port ID for unicast messages. However, lookups currently require a read lock to be taken. This series adds a new generic, resizable, scalable, concurrent hash table based on the paper referenced in the first patch. It then makes use of the new data type to implement lockless netlink_lookup(). Patch 3/3 to convert nft_hash is included for reference but should be merged via the netfilter tree. Inclusion in this series is to provide context for the suggested API. Against net-next since the initial user of the new hash table is in net/ Changes: v4-v5: - use GFP_KERNEL to alloc Netlink buckets as suggested by Nikolay Aleksandrov - free nft hash element on removal as spotted by Nikolay Aleksandrov and Patrick McHardy v3-v4: - fixed wrong shift assignment placement as spotted by Nikolay Aleksandrov - reverted default size of nft_hash to 4 as requested by Patrick McHardy, default size for other hash tables remains at 64 if no hint is given - fixed copyright as requested by Patrick McHardy v2-v3: - fixed typo in nft_hash_destroy() when passing rhashtable handle v1-v2: - fixed traversal off-by-one as spotted by Tobias Klauser - removed unlikely() from BUG_ON() as spotted by Josh Triplett - new 3rd patch to convert nft_hash to rhashtable - make rhashtable_insert() return void - nl_sk_hash_lock must be a mutex - fixed wrong name of rht_shrink_below_30() - exported symbols rht_grow_above_75() and rht_shrink_below_30() - allow table freeing with RCU callback ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02nftables: Convert nft_hash to use generic rhashtableThomas Graf
The sizing of the hash table and the practice of requiring a lookup to retrieve the pprev to be stored in the element cookie before the deletion of an entry is left intact. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Patrick McHardy <kaber@trash.net> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02netlink: Convert netlink_lookup() to use RCU protected hash tableThomas Graf
Heavy Netlink users such as Open vSwitch spend a considerable amount of time in netlink_lookup() due to the read-lock on nl_table_lock. Use of RCU relieves the lock contention. Makes use of the new resizable hash table to avoid locking on the lookup. The hash table will grow if entries exceeds 75% of table size up to a total table size of 64K. It will automatically shrink if usage falls below 30%. Also splits nl_table_lock into a separate mutex to protect hash table mutations and allow synchronize_rcu() to sleep while waiting for readers during expansion and shrinking. Before: 9.16% kpktgend_0 [openvswitch] [k] masked_flow_lookup 6.42% kpktgend_0 [pktgen] [k] mod_cur_headers 6.26% kpktgend_0 [pktgen] [k] pktgen_thread_worker 6.23% kpktgend_0 [kernel.kallsyms] [k] memset 4.79% kpktgend_0 [kernel.kallsyms] [k] netlink_lookup 4.37% kpktgend_0 [kernel.kallsyms] [k] memcpy 3.60% kpktgend_0 [openvswitch] [k] ovs_flow_extract 2.69% kpktgend_0 [kernel.kallsyms] [k] jhash2 After: 15.26% kpktgend_0 [openvswitch] [k] masked_flow_lookup 8.12% kpktgend_0 [pktgen] [k] pktgen_thread_worker 7.92% kpktgend_0 [pktgen] [k] mod_cur_headers 5.11% kpktgend_0 [kernel.kallsyms] [k] memset 4.11% kpktgend_0 [openvswitch] [k] ovs_flow_extract 4.06% kpktgend_0 [kernel.kallsyms] [k] _raw_spin_lock 3.90% kpktgend_0 [kernel.kallsyms] [k] jhash2 [...] 0.67% kpktgend_0 [kernel.kallsyms] [k] netlink_lookup Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02lib: Resizable, Scalable, Concurrent Hash TableThomas Graf
Generic implementation of a resizable, scalable, concurrent hash table based on [0]. The implementation supports both, fixed size keys specified via an offset and length, or arbitrary keys via own hash and compare functions. Lookups are lockless and protected as RCU read side critical sections. Automatic growing/shrinking based on user configurable watermarks is available while allowing concurrent lookups to take place. Objects to be hashed must include a struct rhash_head. The reason for not using the existing struct hlist_head is that the expansion and shrinking will have two buckets point to a single entry which would lead in obscure reverse chaining behaviour. Code includes a boot selftest if CONFIG_TEST_RHASHTABLE is defined. [0] https://www.usenix.org/legacy/event/atc11/tech/final_files/Triplett.pdf Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02Merge branch 'intel-next'David S. Miller
Aaron Brown says: ==================== Intel Wired LAN Driver Updates This series contains updates to the i40e and i40evf drivers. Vasu adds FCOE support, build options and a documentation pointer to i40e. Shannon exposes a Firmware API request used to do register writes on the driver's behalf and disables local loopback on VMDQ VSI in order to stop the VEB from echoing the VMDQ packets back at the VSI. Ashish corrects the vf_id offset for virtchnl messages in the case of multiple PFs, removes support for vf unicast promiscuos mode to disallow VFs from receiving traffic intended for another VF, updates the vfr_stat state check to handle the existing and future mechanism and adds an adapter state check to prevent re-arming the watchdog timer after i40evf_remove has been called and the timer has been deleted. Serey fixes an issue where a guest OS would panic when removing the vf driver while the device is being reset due to an attempt to clean a non initialized mac_filter_list. Akeem makes a minor comment change. Jessie changes an instance of sprintf to snprintf that was missed when the driver was converted to use snprintf everywhere. Mitch plugs a few memory leaks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02i40evf: Fixed guest OS panic when removing vf driverSerey Kong
Removing VF driver during device still in reset caused guest OS panic. in the i40evf_remove(), we're trying to clean mac_filter_list which has not been initialized since the device is still stuck at the reset. The change is to initialize the filter_list before setting any task. Change-ID: I8b59df7384416c7e6f2d264b598f447e1c2c92b0 Signed-off-by: Serey Kong <serey.kong@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02i40evf: fix memory leak on unused interfacesMitch Williams
If the driver is loaded and then unloaded before the interface is brought up, then it will allocate a MAC filter entry and never free it. To fix this, on unload, run through the mac filter list and free all the entries. We also do this during reset recovery when the driver cannot contact the PF and needs to shut down completely. Change-ID: I15fabd67eb4a1bfc57605a7db60d0b5d819839db Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02i40evf: don't leak queue vectorsMitch Williams
Fix a memory leak. Driver was allocating memory for queue vectors on init but not freeing them on shutdown. These need to be freed at two different times: during module unload, and during reset recovery when the driver cannot contact the PF driver and needs to give up. Change-ID: I7c1d0157a776e960d4da432dfe309035aad7c670 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02i40evf: do not re-arm watchdog after removeAshish Shah
Add in an adapter state check to prevent re-arming watchdog timer after i40evf_remove has been called and timer has been deleted. Change-ID: I636ba7c6322be8cbf053231959f90c0a2d8d803a Signed-off-by: Ashish Shah <ashish.n.shah@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02i40evf: future-proof vfr_stat state checkAshish Shah
Previously defined state I40E_VFR_VFACTIVE uses bit 1 which is now set to "reserved." Update the state checks to also include I40E_VFR_COMPLETED. This change will allow the VF to work with both existing and future PFs. Change-ID: Ifd1d34f79f3b0ffd6d2550ee4dadc55825ff52f8 Signed-off-by: Ashish Shah <ashish.n.shah@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02i40e: remove support for vf unicast promiscuous modeAshish Shah
Remove the ability of a VF to set unicast promiscuous mode. Considered to be a security risk to allow VFs to receive traffic intended for other VFs so don't allow it, simply ignore the flag. Also fix it to send the correct seid to aq for multicast promiscuous set. Change-ID: Icb9c49a281a8e9d3aeebf991ef1533ac82b84b14 Signed-off-by: Ashish Shah <ashish.n.shah@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02i40e: Minor comment changesAkeem G Abodunrin
Fixes comment for reset reason Change-ID: I6fda4fa292255e6eb0f874502b4d38d722149b10 Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02i40evf: fix scan warning on sprintfJesse Brandeburg
The driver was converted to use snprintf everywhere but this one function. Just use snprintf, instead of sprintf. Also a small spelling correction in a comment. Change-ID: I59d45f94a52754c7b4cd6034df9a61d8132b7f77 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02i40e: disable local loopback on vmdq vsiShannon Nelson
The local loopback should only be enabled for VSIs that are supporting cascaded VEBs or VEPA setups. This is not the case here, and we need to stop the VEB from echoing the VMDQ VSI packets back at the VSI. Change-ID: I9dfb6ac79db24d04360d7efde62d81e20abc5090 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02i40e: use correct vf_id offset for virtchnl messageAshish Shah
The vf_id needs to be offset by the vf_base_id from hw function capabilities for the case of multiple PFs. Change-ID: I20ca8621f98e9cdf98649380b8eeaa35db52677c Signed-off-by: Ashish Shah <ashish.n.shah@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02i40e: expose debug_write_register requestShannon Nelson
Now that the HW registers are no longer in debug mode and many are locked down for writes, we need to expose the Firmware API request used to do writes on the driver's behalf. Change-ID: I09a05c4dc9ea0b24c00193faac34d7799eaa8496 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Jim Young <jamesx.m.young> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02i40e: adds FCoE to build and updates its documentationVasu Dev
Adds newly added FCoE files to the build but only if FCoE module is configured. Also, updates i40e document for added FCoE support. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Jack Morgan<jack.morgan@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02i40e: Adds FCoE related code to i40e core driverVasu Dev
Adds FCoE specific code to existing i40e core driver to:- 1. have separate FCoE VSI with additional FCoE queues pairs. 2. have FCoE related hash defines. 3. have additional FCoE related stats code. 4. export and then re-use existing functions required by FCoE build. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Jack Morgan<jack.morgan@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02i40e: adds FCoE code to the i40e driverVasu Dev
This patch adds FCoE ( Fibre Channel Over Ethernet ) code for Intel XL710 adapters. This patch is limited to only new FCoE offloads code in newly added files by this patch and then following patches in the series modifies rest of the existing driver to enable FCoE with i40e driver. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Jack Morgan<jack.morgan@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02Merge branch 'amd-xgbe-next'David S. Miller
Tom Lendacky says: ==================== amd-xgbe: AMD XGBE driver update 2014-08-01 The following series of patches includes minor fixes/updates to the driver. - Remove some uses of spinlock around ethtool/phylib areas - Update Rx/Tx ready check logic ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02amd-xgbe-phy: Allow more time for Rx/Tx to become readyLendacky, Thomas
The current time range waiting for Rx/Tx to become ready can sometimes be too short if a connection is not present. Increase the number of retries and the sleep to give a bit more time. Also, change level of the message issued from _err to _dbg if Rx/Tx do not become ready since the underlying logic will function as if no link is established and retry eventually. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02amd-xgbe: Remove unnecessary spinlocksLendacky, Thomas
Remove the spinlocks around the ethtool get and set settings functions and within the link adjustment callback routine. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02net: dnet: Use managed interfacesHimangi Saraogi
This patch introduces the use of managed interfaces like devm_ioremap_resource and does away with the calls to free the allocated memory in the probe and remove functions. Also, some labels and variable are done away with. This fixes a bug as there was a missing release_mem_region in the remove function. Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02net: ks8851-ml: Use devm_ioremap_resourceHimangi Saraogi
This patch introduces the use of devm_ioremap_resource, devm_kmalloc and does away with the functions to free the allocated memory in the probe and remove functions. Also, some labels are done away with. A bug is fixed as two regions are allocated in the probe function, but only one is freed in the remove function. Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02cirrus: cs89x0: Use managed interfacesHimangi Saraogi
This patch introduces the use of managed interfaces like devm_ioremap_resource and does away with the functions to free the allocated memory in the probe and remove functions. Also, many labels are done away with. The field size in no longer needed and is hence removed from the struct net_local. Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02net: sh_eth: Add r8a7794 supportHisashi Nakamura
Signed-off-by: Hisashi Nakamura <hisashi.nakamura.ak@renesas.com> [uli: added bindings documentation] Signed-off-by: Ulrich Hecht <ulrich.hecht+renesas@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02Merge branch 'be2net-next'David S. Miller
Sathya Perla says: ==================== be2net: patch set Patch 1 fixes a regression caused by a previous commit on net-next. Old versions of BE3 FW may not support cmds to re-provision (and hence optimize) resources/queues in SR-IOV config. Do not treat this FW cmd failure as fatal and fail the function initialization. Instead, just enable SR-IOV with the resources provided by the FW. Patch 2 ignores a VF mac address setting if the new mac is already active on the VF. Patch 3 adds support to delete a FW-dump via ethtool on Lancer adapters. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02be2net: support deleting FW dump via ethtool (only for Lancer)Kalesh AP
This patch adds support to delete an existing FW-dump in Lancer via ethtool. Initiating a new dump is not allowed if a FW dump is already present in the adapter. The existing dump has to be first explicitly deleted. Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>