summaryrefslogtreecommitdiff
path: root/drivers/net/macvtap.c
AgeCommit message (Collapse)Author
2013-12-12macvlan: Remove custom recieve and forward handlersVlad Yasevich
Since now macvlan and macvtap use the same receive and forward handlers, we can remove them completely and use netif_rx and dev_forward_skb() directly. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12macvtap: Add support of packet capture on macvtap device.Vlad Yasevich
Macvtap device currently doesn not allow a user to capture traffic on due to the fact that it steals the packets from the network stack before the skb->dev is set correctly on the receive side, and that use uses macvlan transmit path directly on the send side. As a result, we never get a change to give traffic to the taps while the correct device is set in the skb. This patch makes macvtap device behave almost exaclty like macvlan. On the send side, we switch to using dev_queue_xmit(). On the receive side, to deliver packets to macvtap, we now use netif_rx and dev_forward_skb just like macvlan. The only differnce now is that macvtap has its own rx_handler which is attached to the macvtap netdev. It is here that we now steal the packet and provide it to the socket. As a result, we can now capture traffic on the macvtap device: tcpdump -i macvtap0 It also gives us the abilit to add tc actions to the macvtap device and actually utilize different bandwidth management queues on output. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11macvtap: signal truncated packetsJason Wang
macvtap_put_user() never return a value grater than iov length, this in fact bypasses the truncated checking in macvtap_recvmsg(). Fix this by always returning the size of packet plus the possible vlan header to let the trunca checking work. Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10net: Revert macvtap/tun truncation signalling changes.David S. Miller
Jason Wang and Michael S. Tsirkin are still discussing how to properly fix this. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10macvtap: signal truncated packetsJason Wang
macvtap_put_user() never return a value grater than iov length, this in fact bypasses the truncated checking in macvtap_recvmsg(). Fix this by always returning the size of packet plus the possible vlan header to let the truncated checking work. Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10Revert "macvtap: remove useless codes in macvtap_aio_read() and ↵David S. Miller
macvtap_recvmsg()" This reverts commit 41e4af69a5984a3193ba3108fb4e067b0e34dc73. MSG_TRUNC handling was broken and is going to be fixed in the 'net' tree, so revert this. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09macvtap: remove useless codes in macvtap_aio_read() and macvtap_recvmsg()Zhi Yong Wu
By checking related codes, it is impossible that ret > len or total_len, so we should remove some useless coeds in both above functions. Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Merge 'net' into 'net-next' to get the AF_PACKET bug fix that Daniel's direct transmit changes depend upon. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06macvtap: remove unused parameter in macvtap_do_read()Zhi Yong Wu
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06macvtap: remove the dead branchZhi Yong Wu
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06macvtap: update file current positionZhi Yong Wu
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-29macvtap: Do not double-count received packetsVlad Yasevich
Currently macvlan will count received packets after calling each vlans receive handler. Macvtap attempts to count the packet yet again when the user reads the packet from the tap socket. This code doesn't do this consistently either. Remove the counting from macvtap and let only macvlan count received packets. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-28macvtap: fix tx_dropped counting errorJason Wang
After commit 8ffab51b3dfc54876f145f15b351c41f3f703195 (macvlan: lockless tx path), tx stat counter were converted to percpu stat structure. So we need use to this also for tx_dropped in macvtap. Otherwise, the management won't notice the dropping packet in macvtap tx path. Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14macvtap: limit head length of skb allocatedJason Wang
We currently use hdr_len as a hint of head length which is advertised by guest. But when guest advertise a very big value, it can lead to an 64K+ allocating of kmalloc() which has a very high possibility of failure when host memory is fragmented or under heavy stress. The huge hdr_len also reduce the effect of zerocopy or even disable if a gso skb is linearized in guest. To solves those issues, this patch introduces an upper limit (PAGE_SIZE) of the head, which guarantees an order 0 allocation each time. Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c include/linux/inetdevice.h The inetdevice.h conflict involves moving the IPV4_DEVCONF values into a UAPI header, overlapping additions of some new entries. The iwlwifi conflict is a context overlap. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20macvtap: Ignore tap features when VNET_HDR is offVlad Yasevich
When the user turns off VNET_HDR support on the macvtap device, there is no way to provide any offload information to the user. So, it's safer to ignore offload setting then depend on the user setting them correctly. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20macvtap: Correctly set tap features when IFF_VNET_HDR is disabled.Vlad Yasevich
When the user turns off IFF_VNET_HDR flag, attempts to change offload features via TUNSETOFFLOAD do not work. This could cause GSO packets to be delivered to the user when the user is not prepared to handle them. To solve, allow processing of TUNSETOFFLOAD when IFF_VNET_HDR is disabled. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20macvtap: simplify usage of tap_featuresVlad Yasevich
In macvtap, tap_features specific the features of that the user has specified via ioctl(). If we treat macvtap as a macvlan+tap then we could all the tap a pseudo-device and give it other features like SG and GSO. Then we can stop using the features of lower device (macvlan) when forwarding the traffic the tap. This solves the issue of possible checksum offload mismatch between tap feature and macvlan features. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2013-08-11macvtap: fix two racesEric Dumazet
Since commit ac4e4af1e59e1 ("macvtap: Consistently use rcu functions"), Thomas gets two different warnings : BUG: using smp_processor_id() in preemptible [00000000] code: vhost-45891/45892 caller is macvtap_do_read+0x45c/0x600 [macvtap] CPU: 1 PID: 45892 Comm: vhost-45891 Not tainted 3.11.0-bisecttest #13 Call Trace: ([<00000000001126ee>] show_trace+0x126/0x144) [<00000000001127d2>] show_stack+0xc6/0xd4 [<000000000068bcec>] dump_stack+0x74/0xd8 [<0000000000481066>] debug_smp_processor_id+0xf6/0x114 [<000003ff802e9a18>] macvtap_do_read+0x45c/0x600 [macvtap] [<000003ff802e9c1c>] macvtap_recvmsg+0x60/0x88 [macvtap] [<000003ff80318c5e>] handle_rx+0x5b2/0x800 [vhost_net] [<000003ff8028f77c>] vhost_worker+0x15c/0x1c4 [vhost] [<000000000015f3ac>] kthread+0xd8/0xe4 [<00000000006934a6>] kernel_thread_starter+0x6/0xc [<00000000006934a0>] kernel_thread_starter+0x0/0xc And BUG: using smp_processor_id() in preemptible [00000000] code: vhost-45897/45898 caller is macvlan_start_xmit+0x10a/0x1b4 [macvlan] CPU: 1 PID: 45898 Comm: vhost-45897 Not tainted 3.11.0-bisecttest #16 Call Trace: ([<00000000001126ee>] show_trace+0x126/0x144) [<00000000001127d2>] show_stack+0xc6/0xd4 [<000000000068bdb8>] dump_stack+0x74/0xd4 [<0000000000481132>] debug_smp_processor_id+0xf6/0x114 [<000003ff802b72ca>] macvlan_start_xmit+0x10a/0x1b4 [macvlan] [<000003ff802ea69a>] macvtap_get_user+0x982/0xbc4 [macvtap] [<000003ff802ea92a>] macvtap_sendmsg+0x4e/0x60 [macvtap] [<000003ff8031947c>] handle_tx+0x494/0x5ec [vhost_net] [<000003ff8028f77c>] vhost_worker+0x15c/0x1c4 [vhost] [<000000000015f3ac>] kthread+0xd8/0xe4 [<000000000069356e>] kernel_thread_starter+0x6/0xc [<0000000000693568>] kernel_thread_starter+0x0/0xc 2 locks held by vhost-45897/45898: #0: (&vq->mutex){+.+.+.}, at: [<000003ff8031903c>] handle_tx+0x54/0x5ec [vhost_net] #1: (rcu_read_lock){.+.+..}, at: [<000003ff802ea53c>] macvtap_get_user+0x824/0xbc4 [macvtap] In the first case, macvtap_put_user() calls macvlan_count_rx() in a preempt-able context, and this is not allowed. In the second case, macvtap_get_user() calls macvlan_start_xmit() with BH enabled, and this is not allowed. Reported-by: Thomas Huth <thuth@linux.vnet.ibm.com> Bisected-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Thomas Huth <thuth@linux.vnet.ibm.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10net: attempt high order allocations in sock_alloc_send_pskb()Eric Dumazet
Adding paged frags skbs to af_unix sockets introduced a performance regression on large sends because of additional page allocations, even if each skb could carry at least 100% more payload than before. We can instruct sock_alloc_send_pskb() to attempt high order allocations. Most of the time, it does a single page allocation instead of 8. I added an additional parameter to sock_alloc_send_pskb() to let other users to opt-in for this new feature on followup patches. Tested: Before patch : $ netperf -t STREAM_STREAM STREAM STREAM TEST Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 2304 212992 212992 10.00 46861.15 After patch : $ netperf -t STREAM_STREAM STREAM STREAM TEST Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 2304 212992 212992 10.00 57981.11 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07net: move zerocopy_sg_from_iovec() to net/core/datagram.cJason Wang
To let it be reused and reduce code duplication. Also document this function. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07net: move iov_pages() to net/core/iovec.cJason Wang
To let it be reused and reduce code duplication. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-18macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGSJason Wang
We try to linearize part of the skb when the number of iov is greater than MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than one pages, so zerocopy_sg_fromiovec() may still fail and may break the guest network. Solve this problem by calculate the pages needed for iov before trying to do zerocopy and switch to use copy instead of zerocopy if it needs more than MAX_SKB_FRAGS. This is done through introducing a new helper to count the pages for iov, and call uarg->callback() manually when switching from zerocopy to copy to notify vhost. We can do further optimization on top. This bug were introduced from b92946e2919134ebe2a4083e4302236295ea2a73 (macvtap: zerocopy: validate vectors before building skb). Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16macvtap: do not assume 802.1Q when send vlan packetsJason Wang
The hard-coded 8021.q proto will break 802.1ad traffic. So switch to use vlan->proto. Cc: Basil Gor <basil.gor@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16macvtap: fix the missing ret value of TUNSETQUEUEJason Wang
Commit 441ac0fcaadc76ad09771812382345001dd2b813 (macvtap: Convert to using rtnl lock) forget to return what macvtap_ioctl_set_queue() returns to its caller. This may break multiqueue API by always falling through to TUNGETFEATURES. Cc: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-10macvtap: correctly linearize skb when zerocopy is usedJason Wang
Userspace may produce vectors greater than MAX_SKB_FRAGS. When we try to linearize parts of the skb to let the rest of iov to be fit in the frags, we need count copylen into linear when calling macvtap_alloc_skb() instead of partly counting it into data_len. Since this breaks zerocopy_sg_from_iovec() since its inner counter assumes nr_frags should be zero at beginning. This cause nr_frags to be increased wrongly without setting the correct frags. This bug were introduced from b92946e2919134ebe2a4083e4302236295ea2a73 (macvtap: zerocopy: validate vectors before building skb). Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/freescale/fec_main.c drivers/net/ethernet/renesas/sh_eth.c net/ipv4/gre.c The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list) and the splitting of the gre.c code into seperate files. The FEC conflict was two sets of changes adding ethtool support code in an "!CONFIG_M5272" CPP protected block. Finally the sh_eth.c conflict was between one commit add bits set in the .eesr_err_check mask whilst another commit removed the .tx_error_check member and assignments. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25macvtap: Perform GSO on forwarding path.Vlad Yasevich
When macvtap forwards skb to its tap, it needs to check if GSO needs to be performed. This is sometimes necessary when the HW device performed GRO, but the guest reading from the tap does not support it (ex: Windows 7). Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25macvtap: Let TUNSETOFFLOAD actually controll offload features.Vlad Yasevich
When the user issues TUNSETOFFLOAD ioctl, macvtap does not do anything other then to verify arguments. This patch adds functionality to allow users to actually control offload features. NETIF_F_GSO and NETIF_F_GRO are always on, but the rest of the features can be controlled. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25macvtap: Consistently use rcu functionsVlad Yasevich
Currently macvtap uses rcu_bh functions in its user facing fuction macvtap_get_user() and macvtap_put_user(). However, its packet handlers use normal rcu as the rcu_read_lock() is taken in netif_receive_skb(). We can safely discontinue the usage or rcu with bh disabled. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25macvtap: Convert to using rtnl lockVlad Yasevich
Macvtap uses a private lock to protect the relationship between macvtap_queue and macvlan_dev. The private lock is not needed since the relationship is managed by user via open(), release(), and dellink() calls. dellink() already happens under rtnl, so we can safely convert open() and release(), and use it in ioctl() as well. Suggested by Eric Dumazet. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25macvtap: fix recovery from gup errorsMichael S. Tsirkin
get user pages might fail partially in macvtap zero copy mode. To recover we need to put all pages that we got, but code used a wrong index resulting in double-free errors. Reported-by: Brad Hubbard <bhubbard@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13macvtap: fix uninitialized return value macvtap_ioctl_set_queue()Jason Wang
Return -EINVAL on illegal flag instead of uninitialized value. This fixes the kbuild test warning. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13macvtap: slient sparse warningsJason Wang
This patch silents the following sparse warnings: drivers/net/macvtap.c:98:9: warning: incorrect type in assignment (different address spaces) drivers/net/macvtap.c:98:9: expected struct macvtap_queue *<noident> drivers/net/macvtap.c:98:9: got struct macvtap_queue [noderef] <asn:4>*<noident> drivers/net/macvtap.c:120:9: warning: incorrect type in assignment (different address spaces) drivers/net/macvtap.c:120:9: expected struct macvtap_queue *<noident> drivers/net/macvtap.c:120:9: got struct macvtap_queue [noderef] <asn:4>*<noident> drivers/net/macvtap.c:151:22: error: incompatible types in comparison expression (different address spaces) drivers/net/macvtap.c:233:23: error: incompatible types in comparison expression (different address spaces) drivers/net/macvtap.c:243:23: error: incompatible types in comparison expression (different address spaces) drivers/net/macvtap.c:247:15: error: incompatible types in comparison expression (different address spaces) CC [M] drivers/net/macvtap.o drivers/net/macvlan.c:232:24: error: incompatible types in comparison expression (different address spaces) Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07macvtap: enable multiqueue flagJason Wang
To notify the userspace about our capability of multiqueue. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07macvtap: add TUNSETQUEUE ioctlJason Wang
This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or enable a queue of macvtap. This is used to be compatible at API layer of tuntap to simplify the userspace to manage the queues. This is done through introducing a linked list to track all taps while using vlan->taps array to only track active taps. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07macvtap: eliminate linear searchJason Wang
Linear search were used in both get_slot() and macvtap_get_queue(), this is because: - macvtap didn't reshuffle the array of taps when create or destroy a queue, so when adding a new queue, macvtap must do linear search to find a location for the new queue. This will also complicate the TUNSETQUEUE implementation for multiqueue API. - the queue itself didn't track the queue index, so the we must do a linear search in the array to find the location of a existed queue. The solution is straightforward: reshuffle the array and introduce a queue_index to macvtap_queue. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07macvtap: introduce macvtap_get_vlan()Jason Wang
Factor out the device holding logic to a macvtap_get_vlan(), this will be also used by multiqueue API. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07macvtap: do not add self to waitqueue if doing a nonblock readJason Wang
There's no need to add self to waitqueue if doing a nonblock read. This could help to avoid the spinlock contention. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07macvtap: fix a possible race between queue selection and changing queuesJason Wang
Complier may generate codes that re-read the vlan->numvtaps during macvtap_get_queue(). This may lead a race if vlan->numvtaps were changed in the same time and which can lead unexpected result (e.g. very huge value). We need prevent the compiler from generating such codes by adding an ACCESS_ONCE() to make sure vlan->numvtaps were only read once. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-28net: pass info struct via netdevice notifierJiri Pirko
So far, only net_device * could be passed along with netdevice notifier event. This patch provides a possibility to pass custom structure able to provide info that event listener needs to know. Signed-off-by: Jiri Pirko <jiri@resnulli.us> v2->v3: fix typo on simeth shortened dev_getter shortened notifier_info struct name v1->v2: fix notifier_call parameter in call_netdevice_notifier() Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27net: switch to use skb_probe_transport_header()Jason Wang
Switch to use the new help skb_probe_transport_header() to do the l4 header probing for untrusted sources. For packets with partial csum, the header should already been set by skb_partial_csum_set(). Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26macvtap: set transport header before passing skb to lower deviceJason Wang
Set the transport header for 1) some drivers (e.g ixgbe) needs l4 header 2) precise packet length estimation (introduced in 1def9238) needs l4 header to compute header length. For the packets with partial checksum, the patch just set the transport header to csum_start. Otherwise tries to use skb_flow_dissect() to get l4 offset, if it fails, just pretend no l4 header. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-27macvtap: convert to idr_alloc()Tejun Heo
Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-13net: Fix possible wrong checksum generation.Pravin B Shelar
Patch cef401de7be8c4e (net: fix possible wrong checksum generation) fixed wrong checksum calculation but it broke TSO by defining new GSO type but not a netdev feature for that type. net_gso_ok() would not allow hardware checksum/segmentation offload of such packets without the feature. Following patch fixes TSO and wrong checksum. This patch uses same logic that Eric Dumazet used. Patch introduces new flag SKBTX_SHARED_FRAG if at least one frag can be modified by the user. but SKBTX_SHARED_FRAG flag is kept in skb shared info tx_flags rather than gso_type. tx_flags is better compared to gso_type since we can have skb with shared frag without gso packet. It does not link SHARED_FRAG to GSO, So there is no need to define netdev feature for this. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-28net: fix possible wrong checksum generationEric Dumazet
Pravin Shelar mentioned that GSO could potentially generate wrong TX checksum if skb has fragments that are overwritten by the user between the checksum computation and transmit. He suggested to linearize skbs but this extra copy can be avoided for normal tcp skbs cooked by tcp_sendmsg(). This patch introduces a new SKB_GSO_SHARED_FRAG flag, set in skb_shinfo(skb)->gso_type if at least one frag can be modified by the user. Typical sources of such possible overwrites are {vm}splice(), sendfile(), and macvtap/tun/virtio_net drivers. Tested: $ netperf -H 7.7.8.84 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 3959.52 $ netperf -H 7.7.8.84 -t TCP_SENDFILE TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 3216.80 Performance of the SENDFILE is impacted by the extra allocation and copy, and because we use order-0 pages, while the TCP_STREAM uses bigger pages. Reported-by: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-12macvtap: rcu_dereference outside read-lock sectionDenis Efremov
rcu_dereference occurs in update section. Replacement by rcu_dereference_protected in order to prevent lockdep complaint. Found by Linux Driver Verification project (linuxtesting.org) Signed-off-by: Denis Efremov <yefremov.denis@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-07macvtap: use prepare_to_wait/finish_wait to ensure mbHong zhi guo
instead of raw assignment to current->state Signed-off-by: Hong Zhiguo <honkiko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller