summaryrefslogtreecommitdiff
path: root/drivers/net/virtio_net.c
AgeCommit message (Collapse)Author
2017-07-25virtio-net: fix module unloadingAndrew Jones
Unregister the driver before removing multi-instance hotplug callbacks. This order avoids the warning issued from __cpuhp_remove_state_cpuslocked when the number of remaining instances isn't yet zero. Fixes: 8017c279196a ("net/virtio-net: Convert to hotplug state machine") Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-07-07virtio-net: fix leaking of ctx arrayJason Wang
Fixes: commit d45b897b11ea ("virtio_net: allow specifying context for rx") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
A set of overlapping changes in macvlan and the rocker driver, nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29virtio-net: serialize tx routine during resetJason Wang
We don't hold any tx lock when trying to disable TX during reset, this would lead a use after free since ndo_start_xmit() tries to access the virtqueue which has already been freed. Fix this by using netif_tx_disable() before freeing the vqs, this could make sure no tx after vq freeing. Reported-by: Jean-Philippe Menil <jpmenil@gmail.com> Tested-by: Jean-Philippe Menil <jpmenil@gmail.com> Fixes commit f600b6905015 ("virtio_net: Add XDP support") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Robert McCabe <robert.mccabe@rockwellcollins.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16bpf: virtio_net: Report bpf_prog ID during XDP_QUERY_PROGMartin KaFai Lau
Add support to virtio_net to report bpf_prog ID during XDP_QUERY_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jason Wang <jasowang@redhat.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16networking: introduce and use skb_put_data()Johannes Berg
A common pattern with skb_put() is to just want to memcpy() some data into the new space, introduce skb_put_data() for this. An spatch similar to the one for skb_put_zero() converts many of the places using it: @@ identifier p, p2; expression len, skb, data; type t, t2; @@ ( -p = skb_put(skb, len); +p = skb_put_data(skb, data, len); | -p = (t)skb_put(skb, len); +p = skb_put_data(skb, data, len); ) ( p2 = (t2)p; -memcpy(p2, data, len); | -memcpy(p, data, len); ) @@ type t, t2; identifier p, p2; expression skb, data; @@ t *p; ... ( -p = skb_put(skb, sizeof(t)); +p = skb_put_data(skb, data, sizeof(t)); | -p = (t *)skb_put(skb, sizeof(t)); +p = skb_put_data(skb, data, sizeof(t)); ) ( p2 = (t2)p; -memcpy(p2, data, sizeof(*p)); | -memcpy(p, data, sizeof(*p)); ) @@ expression skb, len, data; @@ -memcpy(skb_put(skb, len), data, len); +skb_put_data(skb, data, len); (again, manually post-processed to retain some comments) Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Just some simple overlapping changes in marvell PHY driver and the DSA core code. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04virtio_net: check return value of skb_to_sgvec alwaysJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-02virtio_net: lower limit on buffer sizeMichael S. Tsirkin
commit d85b758f72b0 ("virtio_net: fix support for small rings") was supposed to increase the buffer size for small rings but had an unintentional side effect of decreasing it for large rings. This seems to break some setups - it's not yet clear why, but increasing buffer size back to what it was before helps. Fixes: d85b758f72b0 ("virtio_net: fix support for small rings") Reported-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: "J. Bruce Fields" <bfields@fieldses.org> Tested-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-24virtio-net: enable TSO/checksum offloads for Q-in-Q vlansVlad Yasevich
Since virtio does not provide it's own ndo_features_check handler, TSO, and now checksum offload, are disabled for stacked vlans. Re-enable the support and let the host take care of it. This restores/improves Guest-to-Guest performance over Q-in-Q vlans. Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-10Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "Fixes, cleanups, performance A bunch of changes to virtio, most affecting virtio net. Also ptr_ring batched zeroing - first of batching enhancements that seems ready." * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: s390/virtio: change maintainership tools/virtio: fix spelling mistake: "wakeus" -> "wakeups" virtio_net: tidy a couple debug statements ptr_ring: support testing different batching sizes ringtest: support test specific parameters ptr_ring: batch ring zeroing virtio: virtio_driver doc virtio_net: don't reset twice on XDP on/off virtio_net: fix support for small rings virtio_net: reduce alignment for buffers virtio_net: rework mergeable buffer handling virtio_net: allow specifying context for rx virtio: allow extra context per descriptor tools/virtio: fix build breakage virtio: add context flag to find vqs virtio: wrap find_vqs ringtest: fix an assert statement
2017-05-09virtio_net: tidy a couple debug statementsDan Carpenter
We are printing a decimal value for truesize so we shouldn't use an "0x" prefix. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09virtio_net: don't reset twice on XDP on/offMichael S. Tsirkin
We already do a reset once in remove_vq_common - there appears to be no point in doing another one when we add/remove XDP. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09virtio_net: fix support for small ringsMichael S. Tsirkin
When ring size is small (<32 entries) making buffers smaller means a full ring might not be able to hold enough buffers to fit a single large packet. Make sure a ring full of buffers is large enough to allow at least one packet of max size. Fixes: 2613af0ed18a ("virtio_net: migrate mergeable rx buffers to page frag allocators") Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09virtio_net: reduce alignment for buffersMichael S. Tsirkin
We don't need to align length to any particular value anymore. Aligning to L1 cache size probably sill makes sense to reduce false sharing. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09virtio_net: rework mergeable buffer handlingMichael S. Tsirkin
Use the new _ctx virtio API to maintain true length for each buffer. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09virtio_net: allow specifying context for rxMichael S. Tsirkin
With mergeable buffers we never use s/g for rx, so allow specifying context in that case. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-03xdp: use common helper for netlink extended ack reportingDaniel Borkmann
Small follow-up to d74a32acd59a ("xdp: use netlink extended ACK reporting") in order to let drivers all use the same NL_SET_ERR_MSG_MOD() helper macro for reporting. This also ensures that we consistently add the driver's prefix for dumping the report in user space to indicate that the error message is driver specific and not coming from core code. Furthermore, NL_SET_ERR_MSG_MOD() now reuses NL_SET_ERR_MSG() and thus makes all macros check the pointer as suggested. References: https://www.spinics.net/lists/netdev/msg433267.html Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-02virtio: wrap find_vqsMichael S. Tsirkin
We are going to add more parameters to find_vqs, let's wrap the call so we don't need to tweak all drivers every time. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-01virtio_net: make use of extended ack message reportingJakub Kicinski
Try to carry error messages to the user via the netlink extended ack message attribute. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30virtio-net: use netif_tx_napi_add for tx napiWillem de Bruijn
Avoid hashing the tx napi struct into napi_hash[], which is used for busy polling receive queues. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26virtio-net: on tx, only call napi_disable if tx napi is onWillem de Bruijn
As of tx napi, device down (`ip link set dev $dev down`) hangs unless tx napi is enabled. Else napi_enable is not called, so napi_disable will spin on test_and_set_bit NAPI_STATE_SCHED. Only call napi_disable if tx napi is enabled. Fixes: 5a719c2552ca ("virtio-net: transmit napi") Reported-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24virtio-net: keep tx interrupts disabled unless kickWillem de Bruijn
Tx napi mode increases the rate of transmit interrupts. Suppress some by masking interrupts while more packets are expected. The interrupts will be reenabled before the last packet is sent. This optimization reduces the througput drop with tx napi for unidirectional flows such as UDP_STREAM that do not benefit from cleaning tx completions in the the receive napi handler. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24virtio-net: clean tx descriptors from rx napiWillem de Bruijn
Amortize the cost of virtual interrupts by doing both rx and tx work on reception of a receive interrupt if tx napi is enabled. With VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion interrupts for bidirectional workloads. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24virtio-net: move free_old_xmit_skbsWillem de Bruijn
An upcoming patch will call free_old_xmit_skbs indirectly from virtnet_poll. Move the function above this to avoid having to introduce a forward declaration. This is a pure move: no code changes. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24virtio-net: transmit napiWillem de Bruijn
Convert virtio-net to a standard napi tx completion path. This enables better TCP pacing using TCP small queues and increases single stream throughput. The virtio-net driver currently cleans tx descriptors on transmission of new packets in ndo_start_xmit. Latency depends on new traffic, so is unbounded. To avoid deadlock when a socket reaches its snd limit, packets are orphaned on tranmission. This breaks socket backpressure, including TSQ. Napi increases the number of interrupts generated compared to the current model, which keeps interrupts disabled as long as the ring has enough free descriptors. Keep tx napi optional and disabled for now. Follow-on patches will reduce the interrupt cost. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24virtio-net: napi helper functionsWillem de Bruijn
Prepare virtio-net for tx napi by converting existing napi code to use helper functions. This also deduplicates some logic. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts were simply overlapping changes. In the net/ipv4/route.c case the code had simply moved around a little bit and the same fix was made in both 'net' and 'net-next'. In the net/sched/sch_generic.c case a fix in 'net' happened at the same time that a new argument was added to qdisc_hash_add(). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-07virtio_net: clear MTU when out of rangeMichael S. Tsirkin
virtio attempts to clear the MTU feature bit if the value is out of the supported range, but this has no real effect since FEATURES_OK has already been set. Fix this up by checking the MTU in the new validate callback. Fixes: 14de9d114a82 ("virtio-net: Add initial MTU advice feature") Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-04-07virtio_net: enable big packets for large MTU valuesMichael S. Tsirkin
If one enables e.g. jumbo frames without mergeable buffers, packets won't fit in 1500 byte buffers we use. Switch to big packet mode instead. TODO: make sizing more exact, possibly extend small packet mode to use larger pages. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-03-22net: virtio_net: use new api ethtool_{get|set}_link_ksettingsPhilippe Reynes
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix double-free in batman-adv, from Sven Eckelmann. 2) Fix packet stats for fast-RX path, from Joannes Berg. 3) Netfilter's ip_route_me_harder() doesn't handle request sockets properly, fix from Florian Westphal. 4) Fix sendmsg deadlock in rxrpc, from David Howells. 5) Add missing RCU locking to transport hashtable scan, from Xin Long. 6) Fix potential packet loss in mlxsw driver, from Ido Schimmel. 7) Fix race in NAPI handling between poll handlers and busy polling, from Eric Dumazet. 8) TX path in vxlan and geneve need proper RCU locking, from Jakub Kicinski. 9) SYN processing in DCCP and TCP need to disable BH, from Eric Dumazet. 10) Properly handle net_enable_timestamp() being invoked from IRQ context, also from Eric Dumazet. 11) Fix crash on device-tree systems in xgene driver, from Alban Bedel. 12) Do not call sk_free() on a locked socket, from Arnaldo Carvalho de Melo. 13) Fix use-after-free in netvsc driver, from Dexuan Cui. 14) Fix max MTU setting in bonding driver, from WANG Cong. 15) xen-netback hash table can be allocated from softirq context, so use GFP_ATOMIC. From Anoob Soman. 16) Fix MAC address change bug in bgmac driver, from Hari Vyas. 17) strparser needs to destroy strp_wq on module exit, from WANG Cong. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits) strparser: destroy workqueue on module exit sfc: fix IPID endianness in TSOv2 sfc: avoid max() in array size rds: remove unnecessary returned value check rxrpc: Fix potential NULL-pointer exception nfp: correct DMA direction in XDP DMA sync nfp: don't tell FW about the reserved buffer space net: ethernet: bgmac: mac address change bug net: ethernet: bgmac: init sequence bug xen-netback: don't vfree() queues under spinlock xen-netback: keep a local pointer for vif in backend_disconnect() netfilter: nf_tables: don't call nfnetlink_set_err() if nfnetlink_send() fails netfilter: nft_set_rbtree: incorrect assumption on lower interval lookups netfilter: nf_conntrack_sip: fix wrong memory initialisation can: flexcan: fix typo in comment can: usb_8dev: Fix memory leak of priv->cmd_msg_buffer can: gs_usb: fix coding style can: gs_usb: Don't use stack memory for USB transfers ixgbe: Limit use of 2K buffers on architectures with 256B or larger cache lines ixgbe: update the rss key on h/w, when ethtool ask for it ...
2017-03-02Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull vhost updates from Michael Tsirkin: "virtio, vhost: optimizations, fixes Looks like a quiet cycle for vhost/virtio, just a couple of minor tweaks. Most notable is automatic interrupt affinity for blk and scsi. Hopefully other devices are not far behind" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio-console: avoid DMA from stack vhost: introduce O(1) vq metadata cache virtio_scsi: use virtio IRQ affinity virtio_blk: use virtio IRQ affinity blk-mq: provide a default queue mapping for virtio device virtio: provide a method to get the IRQ affinity mask for a virtqueue virtio: allow drivers to request IRQ affinity when creating VQs virtio_pci: simplify MSI-X setup virtio_pci: don't duplicate the msix_enable flag in struct pci_dev virtio_pci: use shared interrupts for virtqueues virtio_pci: remove struct virtio_pci_vq_info vhost: try avoiding avail index access when getting descriptor virtio_mmio: expose header to userspace
2017-03-02average: change to declare precision, not factorJohannes Berg
Declaring the factor is counter-intuitive, and people are prone to using small(-ish) values even when that makes no sense. Change the DECLARE_EWMA() macro to take the fractional precision, in bits, rather than a factor, and update all users. While at it, add some more documentation. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-02-27virtio: allow drivers to request IRQ affinity when creating VQsChristoph Hellwig
Add a struct irq_affinity pointer to the find_vqs methods, which if set is used to tell the PCI layer to create the MSI-X vectors for our I/O virtqueues with the proper affinity from the start. Compared to after the fact affinity hints this gives us an instantly working setup and allows to allocate the irq descritors node-local and avoid interconnect traffic. Last but not least this will allow blk-mq queues are created based on the interrupt affinity for storage drivers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-21virtio-net: switch to use build_skb() for small bufferJason Wang
This patch switch to use build_skb() for small buffer which can have better performance for both TCP and XDP (since we can work at page before skb creation). It also remove lots of XDP codes since both mergeable and small buffer use page frag during refill now. Before | After XDP_DROP(xdp1) 64B : 11.1Mpps | 14.4Mpps Tested with xdp1/xdp2/xdp_ip_tx_tunnel and netperf. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-20virito-net: set queues after reset during xdp_setJason Wang
We set queues before reset which will cause a crash[1]. This is because is_xdp_raw_buffer_queue() depends on the old xdp queue pairs number to do the correct detection. So fix this by - passing xdp queue pairs and current queue pairs to virtnet_reset() - change vi->xdp_qp after reset but before refill, to make sure both free_unused_bufs() and refill can make correct detection of XDP. - remove the duplicated queue pairs setting before virtnet_reset() since we will do it inside virtnet_reset() [1] [ 74.328168] general protection fault: 0000 [#1] SMP [ 74.328625] Modules linked in: nfsd xfs libcrc32c virtio_net virtio_pci [ 74.329117] CPU: 0 PID: 2849 Comm: xdp2 Not tainted 4.10.0-rc7+ #499 [ 74.329577] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014 [ 74.330424] task: ffff88007a894000 task.stack: ffffc90004388000 [ 74.330844] RIP: 0010:skb_release_head_state+0x28/0x80 [ 74.331298] RSP: 0018:ffffc9000438b8d0 EFLAGS: 00010206 [ 74.331676] RAX: 0000000000000000 RBX: ffff88007ad96300 RCX: 0000000000000000 [ 74.332217] RDX: ffff88007fc137a8 RSI: ffff88007fc0db28 RDI: 0001bf00000001be [ 74.332758] RBP: ffffc9000438b8d8 R08: 000000000005008f R09: 00000000000005f9 [ 74.333274] R10: ffff88007d001700 R11: ffffffff820a8a4d R12: ffff88007ad96300 [ 74.333787] R13: 0000000000000002 R14: ffff880036604000 R15: 000077ff80000000 [ 74.334308] FS: 00007fc70d8a7b40(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [ 74.334891] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 74.335314] CR2: 00007fff4144a710 CR3: 000000007ab56000 CR4: 00000000003406f0 [ 74.335830] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 74.336373] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 74.336895] Call Trace: [ 74.337086] skb_release_all+0xd/0x30 [ 74.337356] consume_skb+0x2c/0x90 [ 74.337607] free_unused_bufs+0x1ff/0x270 [virtio_net] [ 74.337988] ? vp_synchronize_vectors+0x3b/0x60 [virtio_pci] [ 74.338398] virtnet_xdp+0x21e/0x440 [virtio_net] [ 74.338741] dev_change_xdp_fd+0x101/0x140 [ 74.339048] do_setlink+0xcf4/0xd20 [ 74.339304] ? symcmp+0xf/0x20 [ 74.339529] ? mls_level_isvalid+0x52/0x60 [ 74.339828] ? mls_range_isvalid+0x43/0x50 [ 74.340135] ? nla_parse+0xa0/0x100 [ 74.340400] rtnl_setlink+0xd4/0x120 [ 74.340664] ? cpumask_next_and+0x30/0x50 [ 74.340966] rtnetlink_rcv_msg+0x7f/0x1f0 [ 74.341259] ? sock_has_perm+0x59/0x60 [ 74.341586] ? napi_consume_skb+0xe2/0x100 [ 74.342010] ? rtnl_newlink+0x890/0x890 [ 74.342435] netlink_rcv_skb+0x92/0xb0 [ 74.342846] rtnetlink_rcv+0x23/0x30 [ 74.343277] netlink_unicast+0x162/0x210 [ 74.343677] netlink_sendmsg+0x2db/0x390 [ 74.343968] sock_sendmsg+0x33/0x40 [ 74.344233] SYSC_sendto+0xee/0x160 [ 74.344482] ? SYSC_bind+0xb0/0xe0 [ 74.344806] ? sock_alloc_file+0x92/0x110 [ 74.345106] ? fd_install+0x20/0x30 [ 74.345360] ? sock_map_fd+0x3f/0x60 [ 74.345586] SyS_sendto+0x9/0x10 [ 74.345790] entry_SYSCALL_64_fastpath+0x1a/0xa9 [ 74.346086] RIP: 0033:0x7fc70d1b8f6d [ 74.346312] RSP: 002b:00007fff4144a708 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 74.346785] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007fc70d1b8f6d [ 74.347244] RDX: 000000000000002c RSI: 00007fff4144a720 RDI: 0000000000000003 [ 74.347683] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 [ 74.348544] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff4144bd90 [ 74.349082] R13: 0000000000000002 R14: 0000000000000002 R15: 00007fff4144cda0 [ 74.349607] Code: 00 00 00 55 48 89 e5 53 48 89 fb 48 8b 7f 58 48 85 ff 74 0e 40 f6 c7 01 74 3d 48 c7 43 58 00 00 00 00 48 8b 7b 68 48 85 ff 74 05 <f0> ff 0f 74 20 48 8b 43 60 48 85 c0 74 14 65 8b 15 f3 ab 8d 7e [ 74.351008] RIP: skb_release_head_state+0x28/0x80 RSP: ffffc9000438b8d0 [ 74.351625] ---[ end trace fe6e19fd11cfc80b ]--- Fixes: 2de2f7f40ef9 ("virtio_net: XDP support for adjust_head") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19virtio-net: batch stats updatingJason Wang
We already have counters for sent/recv packets and sent/recv bytes. Doing a batched update to reduce the number of u64_stats_update_begin/end(). Take care not to bother with stats update when called speculatively. Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07virtio_net: XDP support for adjust_headJohn Fastabend
Add support for XDP adjust head by allocating a 256B header region that XDP programs can grow into. This is only enabled when a XDP program is loaded. In order to ensure that we do not have to unwind queue headroom push queue setup below bpf_prog_add. It reads better to do a prog ref unwind vs another queue setup call. At the moment this code must do a full reset to ensure old buffers without headroom on program add or with headroom on program removal are not used incorrectly in the datapath. Ideally we would only have to disable/enable the RX queues being updated but there is no API to do this at the moment in virtio so use the big hammer. In practice it is likely not that big of a problem as this will only happen when XDP is enabled/disabled changing programs does not require the reset. There is some risk that the driver may either have an allocation failure or for some reason fail to correctly negotiate with the underlying backend in this case the driver will be left uninitialized. I have not seen this ever happen on my test systems and for what its worth this same failure case can occur from probe and other contexts in virtio framework. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07virtio_net: refactor freeze/restore logic into virtnet reset logicJohn Fastabend
For XDP we will need to reset the queues to allow for buffer headroom to be configured. In order to do this we need to essentially run the freeze()/restore() code path. Unfortunately the locking requirements between the freeze/restore and reset paths are different however so we can not simply reuse the code. This patch refactors the code path and adds a reset helper routine. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07virtio_net: remove duplicate queue pair binding in XDPJohn Fastabend
Factor out qp assignment. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07virtio_net: factor out xdp handler for readabilityJohn Fastabend
At this point the do_xdp_prog is mostly if/else branches handling the different modes of virtio_net. So remove it and handle running the program in the per mode handlers. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07virtio_net: wrap rtnl_lock in test for calling with lock already heldJohn Fastabend
For XDP use case and to allow ethtool reset tests it is useful to be able to use reset paths from contexts where rtnl lock is already held. This requries updating virtnet_set_queues and free_receive_bufs the two places where rtnl_lock is taken in virtio_net. To do this we use the following pattern, _foo(...) { do stuff } foo(...) { rtnl_lock(); _foo(...); rtnl_unlock()}; this allows us to use freeze()/restore() flow from both contexts. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-04virtio_net: exploit napi_complete_done() return valueEric Dumazet
Since commit 364b6055738b ("net: busy-poll: return busypolling status to drivers"), napi_complete_done() returns a boolean that can be used by drivers to conditionally rearm interrupts. This patch changes virtio_net to use this boolean to avoid a bit of overhead for busy-poll users. Jason reports about 1.1% improvement for 1 byte TCP_RR (burst 100). Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03virtio_net: remove custom busy_pollEric Dumazet
Generic NAPI busy polling allows us to remove custom implementations found in drivers. It is possible further optimization could be done by testing napi_complete_done() return value. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two trivial overlapping changes conflicts in MPLS and mlx5. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25virtio_net: reject XDP programs using header adjustmentJakub Kicinski
commit 17bedab27231 ("bpf: xdp: Allow head adjustment in XDP prog") added a new XDP helper to prepend and remove data from a frame. Make virtio_net reject programs making use of this helper until proper support is added. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25virtio_net: use dev_kfree_skb for small buffer XDP receiveJohn Fastabend
In the small buffer case during driver unload we currently use put_page instead of dev_kfree_skb. Resolve this by adding a check for virtnet mode when checking XDP queue type. Also name the function so that the code reads correctly to match the additional check. Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers") Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25bpf: add initial bpf tracepointsDaniel Borkmann
This work adds a number of tracepoints to paths that are either considered slow-path or exception-like states, where monitoring or inspecting them would be desirable. For bpf(2) syscall, tracepoints have been placed for main commands when they succeed. In XDP case, tracepoint is for exceptions, that is, f.e. on abnormal BPF program exit such as unknown or XDP_ABORTED return code, or when error occurs during XDP_TX action and the packet could not be forwarded. Both have been split into separate event headers, and can be further extended. Worst case, if they unexpectedly should get into our way in future, they can also removed [1]. Of course, these tracepoints (like any other) can be analyzed by eBPF itself, etc. Example output: # ./perf record -a -e bpf:* sleep 10 # ./perf script sock_example 6197 [005] 283.980322: bpf:bpf_map_create: map type=ARRAY ufd=4 key=4 val=8 max=256 flags=0 sock_example 6197 [005] 283.980721: bpf:bpf_prog_load: prog=a5ea8fa30ea6849c type=SOCKET_FILTER ufd=5 sock_example 6197 [005] 283.988423: bpf:bpf_prog_get_type: prog=a5ea8fa30ea6849c type=SOCKET_FILTER sock_example 6197 [005] 283.988443: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[06 00 00 00] val=[00 00 00 00 00 00 00 00] [...] sock_example 6197 [005] 288.990868: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[01 00 00 00] val=[14 00 00 00 00 00 00 00] swapper 0 [005] 289.338243: bpf:bpf_prog_put_rcu: prog=a5ea8fa30ea6849c type=SOCKET_FILTER [1] https://lwn.net/Articles/705270/ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24virtio_net: fix PAGE_SIZE > 64kMichael S. Tsirkin
I don't have any guests with PAGE_SIZE > 64k but the code seems to be clearly broken in that case as PAGE_SIZE / MERGEABLE_BUFFER_ALIGN will need more than 8 bit and so the code in mergeable_ctx_to_buf_address does not give us the actual true size. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>